Original article: How to Choose Loss Functions When Training Deep Learning Neural Networks (Jason Brownlee, January 30, 2019)

Updated Oct/2019: Updated for Keras 2.3 and TensorFlow 2.0.
Update Jan/2020: Updated for changes in scikit-learn v0.22 API.




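How to configure a model for mean squared error and its variants for regression problems.
How to configure a model for the cross-entropy and hinge loss functions for binary classification.
How to configure a model for the cross-entropy and KL divergence loss functions for multi-class classification.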




Regression Loss Functions
    Mean Squared Error Loss
    Mean Squared Logarithmic Error Loss
    Mean Absolute Error Loss
Binary Classification Loss Functions
    Binary Cross-Entropy
    Hinge Loss
    Squared Hinge Loss
Multi-Class Classification Loss Functions
    Multi-Class Cross-Entropy Loss
    Sparse Multiclass Cross-Entropy Loss
    Kullback Leibler Divergence Loss

We will focus on how to choose and implement different loss functions. For more theory on loss functions, see the post: Loss and Loss Functions for Training Deep Learning Neural Networks



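We will use scikit-learn's make_regression() function to define a problem with 20 input features: 10 of them are meaningful and 10 are irrelevant. A total of 1,000 examples are randomly generated. The pseudorandom number generator is seeded so that we get the same 1,000 examples each time the code is run.

from sklearn.datasets import make_regression
# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)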

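Neural networks generally perform better when real-valued input and output variables are scaled to a sensible range. For this problem, each input variable and the target variable have a Gaussian distribution, so standardizing the data is desirable. We can do this with the StandardScaler transformer class from scikit-learn (sklearn.preprocessing.StandardScaler). On a real problem, we would prepare the scaler on the training dataset and apply it to both the train and test sets, but for simplicity we scale all of the data together before splitting it into train and test sets.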
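The remark about preparing the scaler on the training data only can be sketched as follows. This is a minimal illustration in plain Python with made-up numbers; `fit_scaler` and `apply_scaler` are hypothetical helpers, not scikit-learn functions. With scikit-learn one would instead fit a StandardScaler on the training split and transform both splits with it.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Estimate the mean and (population) standard deviation from the given values only."""
    return mean(values), pstdev(values)

def apply_scaler(values, mu, sigma):
    """Standardize values with previously estimated statistics."""
    return [(v - mu) / sigma for v in values]

# toy data, made up for illustration
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 11.0]

# the statistics come from the training split only ...
mu, sigma = fit_scaler(train)
train_scaled = apply_scaler(train, mu, sigma)
# ... and are reused unchanged on the test split
test_scaled = apply_scaler(test, mu, sigma)

print(mu, sigma)
print(train_scaled)
print(test_scaled)
```

Because the test split never contributes to `mu` and `sigma`, no information about the test data leaks into the preparation step.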

# standardize dataset
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]


# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]


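Input layer: 20 neurons (one per input feature).
Hidden layer: 25 neurons, using the rectified linear activation function (ReLU).
Output layer: one neuron with a linear activation, which outputs the real value to be predicted.

# define model
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))

Dense is the ordinary fully connected layer. It implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function, kernel is the weight matrix of the layer, and bias is the bias vector, which is only added when use_bias=True. (Keras documentation, Dense layer)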
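The operation output = activation(dot(input, kernel) + bias) can be written out by hand for a single input vector. The sketch below is plain Python for illustration only, with made-up weights; `dense` and `relu` are hypothetical helper names and this is not how Keras implements the layer internally.

```python
def relu(x):
    """Rectified linear activation: negative values are clipped to zero."""
    return max(0.0, x)

def dense(inputs, kernel, bias, activation):
    """Compute activation(dot(inputs, kernel) + bias) for one input vector.

    kernel has one row per input and one column per output unit.
    """
    outputs = []
    for j in range(len(bias)):
        z = sum(x * row[j] for x, row in zip(inputs, kernel)) + bias[j]
        outputs.append(activation(z))
    return outputs

# toy layer with 3 inputs and 2 units (weights made up for illustration)
inputs = [1.0, 2.0, 3.0]
kernel = [[0.5, -1.0],
          [0.5, -1.0],
          [0.5,  1.0]]
bias = [0.0, -0.5]

out = dense(inputs, kernel, bias, relu)
print(out)
```

The first unit receives a weighted sum of 3.0 and passes it through unchanged; the second receives -0.5, which ReLU clips to 0.0.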


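opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='...', optimizer=opt)
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)

Now that we have the basis of a problem and a model, we can evaluate three common loss functions that are appropriate for regression predictive modeling problems. Although an MLP is used in these examples, the same loss functions can be used when training CNN and RNN models for regression.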

Mean Squared Error Loss


Mean squared error is the average of the squared differences between the predicted and actual values:

$$MSE=\frac{1}{N}\sum_{i=1}^N(y_i-\hat{y}_i)^2$$
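As a small worked example of the definition, the following plain-Python sketch computes the mean squared error for four made-up target and prediction pairs. The function name is chosen for illustration; in the tutorial itself Keras computes this loss.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# made-up values for illustration
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.5, 2.0, 6.0]

# squared errors are 0, 0.25, 1 and 4, so the mean is 5.25 / 4
mse = mean_squared_error(y_true, y_pred)
print(mse)
```

Squaring means the single error of 2.0 contributes 4.0 of the 5.25 total, which is why larger mistakes dominate this loss.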




model.add(Dense(1, activation='linear'))


# mlp for regression with mse loss function
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)
# standardize dataset
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='mean_squared_error', optimizer=opt)
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
# evaluate the model
train_mse = model.evaluate(trainX, trainy, verbose=0)
test_mse = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))
# plot loss during training
pyplot.title('Loss / Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()


Train: 0.000, Test: 0.001

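A line plot is also created showing the mean squared error loss over the training epochs for both the train (blue) and test (orange) sets. We can see that the model converges reasonably quickly and that train and test performance remain equivalent. The performance and convergence behavior of the model suggest that mean squared error is a good match for a neural network learning this problem.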

Mean Squared Logarithmic Error Loss

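There may be regression problems in which the target values are spread over a wide range, and when predicting a large value you may not want to punish the model as heavily as mean squared error does. Instead, you can first take the natural logarithm of each predicted and actual value (after adding 1) and then calculate the mean squared error of the results. This is called the Mean Squared Logarithmic Error loss, or MSLE for short.

$$MSLE=\frac{1}{N}\sum_{i=1}^N(\log (\hat{y}_i+1)-\log (y_i+1))^2$$

MSLE relaxes the punishing effect of the large differences that come with large predicted values. It may be a more appropriate loss measure when the model predicts unscaled quantities directly. We can demonstrate this loss function with the simple regression problem. The model can be updated to use the 'mean_squared_logarithmic_error' loss function while keeping the same output layer configuration. When fitting the model, we will also track MSE as a metric for evaluating the model during training and testing, and use it to plot a learning curve.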
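To make the relaxed penalty on large values concrete, the sketch below compares MSE and MSLE on two single-value cases with the same 10% relative error, one at a small scale and one at a large scale. It is plain Python with made-up numbers and illustrative function names, not the Keras implementation.

```python
from math import log1p

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_log_error(y_true, y_pred):
    """Average of the squared differences of log(1 + value)."""
    return sum((log1p(p) - log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# the same 10% over-prediction at two different scales (made-up values)
small_true, small_pred = [10.0], [11.0]
large_true, large_pred = [1000.0], [1100.0]

mse_small = mean_squared_error(small_true, small_pred)
mse_large = mean_squared_error(large_true, large_pred)
msle_small = mean_squared_log_error(small_true, small_pred)
msle_large = mean_squared_log_error(large_true, large_pred)

print('MSE : small=%.1f large=%.1f' % (mse_small, mse_large))
print('MSLE: small=%.4f large=%.4f' % (msle_small, msle_large))
```

MSE grows by a factor of 10,000 between the two cases, while MSLE stays in the same order of magnitude because it depends on the ratio of prediction to target rather than on their absolute difference.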

model.compile(loss='mean_squared_logarithmic_error', optimizer=opt, metrics=['mse'])

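compile(self, optimizer, loss, metrics=[], loss_weights=None, sample_weight_mode=None) (Keras documentation, Model)
metrics: a list of the metrics used to evaluate the model during training and testing. Typical usage is metrics=['accuracy']. To specify different metrics for different outputs of a multi-output model, pass a dictionary instead, for example metrics={'output_a': 'accuracy'}.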


# mlp for regression with msle loss function
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)
# standardize dataset
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='mean_squared_logarithmic_error', optimizer=opt, metrics=['mse'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
# evaluate the model
_, train_mse = model.evaluate(trainX, trainy, verbose=0)
_, test_mse = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot mse during training
# (Keras 2.3+ reports a metric under the name passed to compile(), here 'mse')
pyplot.subplot(212)
pyplot.title('Mean Squared Error')
pyplot.plot(history.history['mse'], label='train')
pyplot.plot(history.history['val_mse'], label='test')
pyplot.legend()
pyplot.show()


Train: 0.165, Test: 0.184


Mean Absolute Error Loss

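On some regression problems, the distribution of the target variable may be mostly Gaussian but may have outliers, that is, large or small values far from the mean. The Mean Absolute Error (MAE) loss is an appropriate loss function in this case because it is more robust to outliers. It is calculated as the average of the absolute differences between the actual and predicted values.

$$MAE=\frac{1}{N}\sum_{i=1}^N|y_i-\hat{y}_i|$$

The model can be updated to use the 'mean_absolute_error' loss function while keeping the same configuration for the output layer.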
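The robustness to outliers can be seen by scoring the same predictions with both losses when one target is an outlier. This is a plain-Python sketch with made-up numbers and illustrative function names.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# four targets are predicted exactly; the fifth target is an outlier (made-up values)
y_true = [1.0, 2.0, 3.0, 4.0, 100.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 5.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print('MAE: %.1f, MSE: %.1f' % (mae, mse))
```

The single error of 95 contributes linearly to MAE (95 / 5) but quadratically to MSE (9025 / 5), so under MSE the outlier would dominate the gradient far more strongly.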

# mlp for regression with mae loss function
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)
# standardize dataset
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='mean_absolute_error', optimizer=opt, metrics=['mse'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
# evaluate the model
_, train_mse = model.evaluate(trainX, trainy, verbose=0)
_, test_mse = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot mse during training
# (Keras 2.3+ reports a metric under the name passed to compile(), here 'mse')
pyplot.subplot(212)
pyplot.title('Mean Squared Error')
pyplot.plot(history.history['mse'], label='train')
pyplot.plot(history.history['val_mse'], label='test')
pyplot.legend()
pyplot.show()


Train: 0.002, Test: 0.002




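In this section, we will investigate loss functions that are appropriate for binary classification predictive modeling problems. We will generate examples from the circles test problem in scikit-learn as the basis for this investigation. The circles problem involves samples drawn from two concentric circles on a two-dimensional plane, where points on the outer circle belong to class 0 and points on the inner circle belong to class 1. Statistical noise is added to the samples to add ambiguity and make the problem more challenging to learn.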


from sklearn.datasets import make_circles
# generate circles
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)


# scatter plot of the circles dataset with points colored by class
from sklearn.datasets import make_circles
from numpy import where
from matplotlib import pyplot
# generate circles
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
# select indices of points with each class label
for i in range(2):
    samples_ix = where(y == i)
    pyplot.scatter(X[samples_ix, 0], X[samples_ix, 1], label=str(i))
pyplot.legend()
pyplot.show()


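The points are already reasonably scaled around 0, almost within [-1, 1], so we will not rescale them. The dataset is split evenly into train and test sets.

# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]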


# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='...'))


opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='...', optimizer=opt, metrics=['accuracy'])


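# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=200, verbose=0)

Now that we have the basis of a problem and a model, we can evaluate three common loss functions that are appropriate for binary classification predictive modeling problems. Although an MLP is used in these examples, the same loss functions can be used when training CNN and RNN models for binary classification.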

Binary Cross-Entropy

Cross-entropy is the default loss function for binary classification problems. It is intended for binary classification where the target values are in the set $\{0,1\}$. Mathematically, it is the preferred loss function under the inference framework of maximum likelihood. It is the loss function to evaluate first, and to change only if you have a good reason.

Cross-entropy calculates a score that summarizes the average difference between the actual and predicted probability distributions for predicting class 1. The score is minimized, and a perfect cross-entropy value is 0.

$$BCELoss=-\frac{1}{N}\sum_{i=1}^N\left[y_i \log P(\hat{y}_i=1)+(1-y_i)\log (1-P(\hat{y}_i=1))\right]$$
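The formula can be evaluated by hand on a few predictions. The following is a minimal plain-Python sketch with made-up probabilities; the `eps` clipping is an assumption added here to keep log() finite and is not part of the formula above.

```python
from math import log

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Average negative log-likelihood of the true class under the predicted P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0); assumption for illustration
        total += y * log(p) + (1 - y) * log(1.0 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
good_probs = [0.9, 0.1, 0.8, 0.2]   # mostly correct and confident
bad_probs = [0.1, 0.9, 0.2, 0.8]    # confident but wrong

good = binary_cross_entropy(y_true, good_probs)
bad = binary_cross_entropy(y_true, bad_probs)
print('good: %.3f, bad: %.3f' % (good, bad))
```

Confident wrong predictions are penalized much more heavily than confident correct ones are rewarded, and the loss approaches 0 only as the predicted probabilities approach the true labels.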


model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])


model.add(Dense(1, activation='sigmoid'))


# mlp for the circles problem with cross entropy loss
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='sigmoid'))
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=200, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()


Train: 0.836, Test: 0.852


Hinge Loss

Let the label be $y_i\in\{-1,1\}$ and the predicted value be $\hat{y}=wx+b\in\mathbb{R}$. For any single sample, the hinge loss is defined as:

$$hingeLoss=\max\{0,1-y_i\cdot \hat{y}_i\}=\max\{0,1-y_i\cdot (wx_i+b)\}$$


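The hinge loss function is intended for binary classification where the target values are in the set $\{-1,1\}$. It encourages examples to have the correct sign, assigning more error when the sign of the actual class value differs from that of the predicted value. Reports of performance with the hinge loss are mixed; it sometimes performs better than cross-entropy on binary classification problems.

First, the target variable must be modified to have values in the set $\{-1,1\}$.

# change y from {0,1} to {-1,1}
y[where(y == 0)] = -1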
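A small worked example may help show how the sign of the prediction drives the hinge loss. The sketch below is plain Python with made-up scores; it averages the per-sample loss over the batch, and the function name is illustrative.

```python
def hinge_loss(y_true, y_pred):
    """Mean of max(0, 1 - y * y_hat) over the samples; labels must be -1 or 1."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, y_pred)) / len(y_true)

# labels given as {0, 1} are first mapped to {-1, 1}
labels01 = [1, 0, 1, 0]
y_true = [1 if y == 1 else -1 for y in labels01]

# made-up model outputs in [-1, 1]
y_pred = [1.0, -1.0, 0.5, 0.5]

# per-sample losses: 0 (correct), 0 (correct), 0.5 (correct sign, inside the margin), 1.5 (wrong sign)
loss = hinge_loss(y_true, y_pred)
print(loss)
```

A prediction with the right sign but a magnitude below 1 still incurs a small loss, while a prediction with the wrong sign incurs a loss greater than 1.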


model.compile(loss='hinge', optimizer=opt, metrics=['accuracy'])

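Finally, the output layer of the network must be configured as a single node with a hyperbolic tangent activation function, capable of outputting a single value in the range $[-1,1]$.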

model.add(Dense(1, activation='tanh'))


# mlp for the circles problem with hinge loss
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
from numpy import where
# generate 2d classification dataset
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
# change y from {0,1} to {-1,1}
y[where(y == 0)] = -1
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='tanh'))
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='hinge', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=200, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()


Train: 0.792, Test: 0.740


Squared Hinge Loss


As with the hinge loss function, the target variable must be modified to have values in the set $\{-1,1\}$.

# change y from {0,1} to {-1,1}
y[where(y == 0)] = -1
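For reference, the squared hinge loss is the square of the per-sample hinge term, $\max\{0,1-y_i\cdot \hat{y}_i\}^2$. The sketch below is plain Python with made-up scores and illustrative function names; it compares the two losses on the same predictions.

```python
def hinge_loss(y_true, y_pred):
    """Mean of max(0, 1 - y * y_hat); labels must be -1 or 1."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, y_pred)) / len(y_true)

def squared_hinge_loss(y_true, y_pred):
    """Mean of max(0, 1 - y * y_hat) squared; labels must be -1 or 1."""
    return sum(max(0.0, 1.0 - y * s) ** 2 for y, s in zip(y_true, y_pred)) / len(y_true)

y_true = [1, -1, 1, -1]
y_pred = [1.0, -1.0, 0.5, 0.5]   # made-up model outputs in [-1, 1]

# per-sample hinge terms are 0, 0, 0.5 and 1.5; squaring gives 0, 0, 0.25 and 2.25
hinge = hinge_loss(y_true, y_pred)
squared = squared_hinge_loss(y_true, y_pred)
print('hinge: %.3f, squared hinge: %.3f' % (hinge, squared))
```

Squaring shrinks the penalty for small margin violations (0.5 becomes 0.25) and enlarges it for predictions with the wrong sign (1.5 becomes 2.25).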


model.compile(loss='squared_hinge', optimizer=opt, metrics=['accuracy'])

Finally, the output layer must use a single node with a hyperbolic tangent activation function, capable of outputting continuous values in the range $[-1,1]$.

model.add(Dense(1, activation='tanh'))


# mlp for the circles problem with squared hinge loss
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
from numpy import where
# generate 2d classification dataset
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
# change y from {0,1} to {-1,1}
y[where(y == 0)] = -1
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='tanh'))
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='squared_hinge', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=200, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()


Train: 0.682, Test: 0.646



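Multi-class classification refers to predictive modeling problems in which examples are assigned to one of more than two classes. The problem is often framed as predicting an integer value, where each class is assigned a unique integer from 0 to $(num\_classes - 1)$. It is often implemented as predicting the probability that the example belongs to each known class.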
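Predicting a probability for each class and then reading off an integer label can be sketched with a softmax over raw scores. This is plain Python with made-up scores for three classes; the `softmax` helper is written here for illustration and is not the Keras implementation.

```python
from math import exp

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# made-up raw outputs for classes 0, 1 and 2
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# the predicted integer label is the index of the largest probability
predicted_class = probs.index(max(probs))

print(['%.3f' % p for p in probs])
print(predicted_class)
```

The class with the largest raw score also has the largest probability, so the integer prediction is simply the argmax of the probability vector.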


# generate dataset
X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)


# scatter plot of blobs dataset
from sklearn.datasets import make_blobs
from numpy import where
from matplotlib import pyplot
# generate dataset
X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
# select indices of points with each class label
for i in range(3):
    samples_ix = where(y == i)
    pyplot.scatter(X[samples_ix, 0], X[samples_ix, 1])
pyplot.show()




# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]


# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(..., activation='...'))


# compile model
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='...', optimizer=opt, metrics=['accuracy'])


# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)


Multi-Class Cross-Entropy Loss

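Cross-entropy is the default loss function for multi-class classification problems. In this case, it is intended for multi-class classification where the target values are in the set $\{0,1,2,\ldots,n\}$ and each class is assigned a unique integer value. Mathematically, it is the preferred loss function under the inference framework of maximum likelihood. It is the loss function to evaluate first, and to change only if you have a good reason.

$$-\frac{1}{N}\sum_{i=1}^N\sum_{k=1}^Ky_{i,k}\log P(\hat{y}_i=k)$$

Here $y_{i,k}$ is a 0/1 indicator variable whose value 1 means that the $i$-th sample belongs to class $k$, $K$ is the number of classes, and $N$ is the number of samples.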
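Because the indicator is 1 only for the true class, each sample contributes just the negative log of the probability predicted for its true class. The plain-Python sketch below evaluates the double sum on three made-up samples; the `eps` floor is an assumption added to keep log() finite.

```python
from math import log

def categorical_cross_entropy(y_onehot, p_pred, eps=1e-7):
    """Mean over samples of -sum_k y[i][k] * log(p[i][k])."""
    total = 0.0
    for targets, probs in zip(y_onehot, p_pred):
        total += sum(t * log(max(p, eps)) for t, p in zip(targets, probs))
    return -total / len(y_onehot)

# three samples, three classes; targets are one hot encoded (made-up values)
y_onehot = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
p_pred = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1],
          [0.3, 0.3, 0.4]]

# only the probabilities 0.7, 0.8 and 0.4 assigned to the true classes matter
loss = categorical_cross_entropy(y_onehot, p_pred)
print('%.4f' % loss)
```

The third sample is classified correctly but with low confidence (0.4), and it contributes the largest share of the loss.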

model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])


model.add(Dense(3, activation='softmax'))

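In turn, this means that the target variable must be one hot encoded. This ensures that each example has an expected probability of 1.0 for its actual class value and an expected probability of 0.0 for all other class values. This can be achieved with the to_categorical() function in Keras.

# one hot encode output variable
y = to_categorical(y)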
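To show what the encoding produces, here is a plain-Python sketch of one hot encoding for integer class labels. The `one_hot` helper is hypothetical and written for illustration; it is not the Keras to_categorical() implementation.

```python
def one_hot(labels, num_classes=None):
    """Encode integer class labels as rows with a single 1.0 at the class index."""
    if num_classes is None:
        num_classes = max(labels) + 1   # infer the number of classes from the data
    return [[1.0 if k == y else 0.0 for k in range(num_classes)] for y in labels]

labels = [0, 2, 1, 2]
encoded = one_hot(labels)
for row in encoded:
    print(row)

# the original labels can be recovered as the index of the 1.0 in each row
decoded = [row.index(1.0) for row in encoded]
print(decoded)
```

Each row has probability 1.0 at the true class and 0.0 elsewhere, which is exactly the target distribution that categorical cross-entropy compares the softmax output against.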


# mlp for the blobs multi-class classification problem with cross-entropy loss
from sklearn.datasets import make_blobs
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.utils import to_categorical
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
# one hot encode output variable
y = to_categorical(y)
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='softmax'))
# compile model
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()


Train: 0.840, Test: 0.822


Sparse Multiclass Cross-Entropy Loss



model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])


model.add(Dense(3, activation='softmax'))
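The sparse variant takes integer class labels directly instead of one hot encoded rows, but it computes the same quantity. The plain-Python sketch below, with made-up probabilities and illustrative function names, checks that the two forms agree.

```python
from math import log

def sparse_categorical_cross_entropy(labels, p_pred, eps=1e-7):
    """Mean of -log(probability predicted for the true class), using integer labels."""
    return -sum(log(max(probs[y], eps)) for y, probs in zip(labels, p_pred)) / len(labels)

def categorical_cross_entropy(y_onehot, p_pred, eps=1e-7):
    """Mean over samples of -sum_k y[i][k] * log(p[i][k]), using one hot targets."""
    total = 0.0
    for targets, probs in zip(y_onehot, p_pred):
        total += sum(t * log(max(p, eps)) for t, p in zip(targets, probs))
    return -total / len(y_onehot)

# made-up softmax outputs for three samples and three classes
p_pred = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1],
          [0.3, 0.3, 0.4]]
labels = [0, 1, 2]                              # integer targets
y_onehot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # the same targets, one hot encoded

sparse_loss = sparse_categorical_cross_entropy(labels, p_pred)
dense_loss = categorical_cross_entropy(y_onehot, p_pred)
print('%.4f %.4f' % (sparse_loss, dense_loss))
```

The practical difference is memory: with integer labels there is no need to materialize a row of mostly zeros per example, which matters when the number of classes is large.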



# mlp for the blobs multi-class classification problem with sparse cross-entropy loss
from sklearn.datasets import make_blobs
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='softmax'))
# compile model
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()


Train: 0.832, Test: 0.818

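Two line plots are also created. The top plot shows the sparse cross-entropy loss over the training epochs for the train (blue) and test (orange) sets, and the bottom plot shows the classification accuracy over the epochs. In this case, both the loss curves and the classification accuracy curves show good convergence.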

Kullback Leibler Divergence Loss

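Kullback Leibler divergence, or KL divergence for short, measures how one probability distribution differs from a baseline distribution. A KL divergence loss of 0 indicates that the distributions are identical. In practice, the behavior of KL divergence is very similar to that of cross-entropy. It calculates how much information is lost (in bits) if the predicted probability distribution is used to approximate the desired target probability distribution.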
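The similarity to cross-entropy can be checked numerically: with a one hot target, the KL divergence reduces to the negative log of the probability predicted for the true class. The plain-Python sketch below uses made-up probabilities and the natural logarithm, so the result is in nats; divide by log(2) to convert to bits. The `eps` floor is an assumption added to keep the division finite.

```python
from math import log

def kl_divergence(p, q, eps=1e-7):
    """KL(p || q) = sum_k p[k] * log(p[k] / q[k]); terms with p[k] == 0 contribute 0."""
    return sum(pk * log(pk / max(qk, eps)) for pk, qk in zip(p, q) if pk > 0)

target = [0.0, 1.0, 0.0]   # one hot target: the true class is 1
pred = [0.1, 0.8, 0.1]     # made-up softmax output

kl_onehot = kl_divergence(target, pred)
cross_entropy = -log(pred[1])
kl_identical = kl_divergence(pred, pred)

print('KL: %.4f, cross-entropy: %.4f' % (kl_onehot, cross_entropy))
print('KL between identical distributions: %.4f' % kl_identical)
```

The two losses differ only when the target distribution is not one hot, in which case KL divergence subtracts the entropy of the target from the cross-entropy.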



model.compile(loss='kullback_leibler_divergence', optimizer=opt, metrics=['accuracy'])


# one hot encode output variable
y = to_categorical(y)


# mlp for the blobs multi-class classification problem with kl divergence loss
from sklearn.datasets import make_blobs
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.utils import to_categorical
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
# one hot encode output variable
y = to_categorical(y)
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='softmax'))
# compile model
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='kullback_leibler_divergence', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()


Train: 0.822, Test: 0.822





