


2023-04-09 23:53 | Source: compiled from the web | Views: 265

Environment: Windows 10; Python 3.7.3 @ MSC v.1915 64 bit (AMD64); latest build date 2020.05.10; sklearn version 0.22.1

Evaluation metrics


| Function | Description |
| --- | --- |
| accuracy_score(y_true, y_pred[, …]) | ✅ Compute accuracy |
| balanced_accuracy_score(y_true, y_pred) | ✅ Compute balanced accuracy |
| precision_score(y_true, y_pred[, …]) | ✅ Compute precision |
| recall_score(y_true, y_pred[, …]) | ✅ Compute recall |
| f1_score(y_true, y_pred[, labels, …]) | ✅ Compute the F1 score |
| fbeta_score(y_true, y_pred, beta[, …]) | ✅ Compute the F-beta score |
| classification_report(y_true, y_pred) | ✅ Show the main classification metrics |
| precision_recall_fscore_support(…) | ✅ Compute precision, recall, F-measure and support for each class |
| confusion_matrix(y_true, y_pred[, …]) | ✅ Compute the confusion matrix |
| precision_recall_curve(y_true, …) | ✅ Compute precision and recall at different probability thresholds |
| roc_curve(y_true, y_score[, …]) | ✅ Compute the Receiver operating characteristic (ROC) |
| roc_auc_score(y_true, y_score[, …]) | ✅ Compute the area under the ROC curve (AUC) |
| auc(x, y) | ✅ Compute AUC using the trapezoidal rule |
| average_precision_score(y_true, y_score) | ✅ Compute average precision (AP) |
| brier_score_loss(y_true, y_prob[, …]) | ✅ Compute the Brier score |
| cohen_kappa_score(y1, y2[, labels, …]) | 🔲 Cohen's kappa: a statistic that measures inter-annotator agreement |
| dcg_score(y_true, y_score[, k, …]) | 🔲 Compute Discounted Cumulative Gain |
| hamming_loss(y_true, y_pred[, …]) | ✅ Compute the average Hamming loss |
| hinge_loss(y_true, pred_decision[, …]) | 🔲 Compute the average hinge loss (non-regularized) |
| jaccard_score(y_true, y_pred[, …]) | ✅ Compute the Jaccard similarity coefficient score |
| log_loss(y_true, y_pred[, eps, …]) | ✅ Log loss, aka logistic loss or cross-entropy loss |
| matthews_corrcoef(y_true, y_pred[, …]) | 🔲 Compute the Matthews correlation coefficient (MCC) |
| multilabel_confusion_matrix(y_true, …) | ✅ Compute a confusion matrix for each class or sample |
| ndcg_score(y_true, y_score[, k, …]) | 🔲 Compute Normalized Discounted Cumulative Gain |
| zero_one_loss(y_true, y_pred[, …]) | ✅ Compute the zero-one classification loss |

Application scenarios


precision_recall_curve roc_curve brier_score_loss


confusion_matrix balanced_accuracy_score hinge_loss cohen_kappa_score matthews_corrcoef


accuracy_score recall_score precision_score f1_score fbeta_score classification_report precision_recall_fscore_support hamming_loss jaccard_similarity_score log_loss zero_one_loss


average_precision_score roc_auc_score


dcg_score ndcg_score

accuracy

$$ \texttt{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} \mathbb{I}(\hat{y}_i = y_i) $$


$$ \texttt{accuracy}(y, \hat{y}) = \frac{1}{\sum_{i=0}^{n_\text{samples}-1}w_i} \sum_{i=0}^{n_\text{samples}-1} \mathbb{I}(\hat{y}_i = y_i)w_i $$

    from sklearn.metrics import accuracy_score

accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)

normalize: if False, return the number of correctly classified samples; if True, return the fraction of correctly classified samples.

    y_true = [0, 1, 2, 3]
    y_pred = [0, 2, 1, 3]
    print(accuracy_score(y_true, y_pred))
    print(accuracy_score(y_true, y_pred, normalize=False))
    print(accuracy_score(y_true, y_pred, sample_weight=(1, 1, 1, 10)))

Output:

    0.5
    2
    0.8461538461538461

balanced accuracy

$$ \texttt{balanced accuracy}(y, \hat{y}) = \frac{1}{n_\text{class}} \sum_{i=1}^{n_\text{class}} \text{accuracy}_i $$
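In the formula above, accuracy_i is the accuracy measured only on the samples whose true class is i, which is the recall of class i. The following is a minimal pure-Python sketch of that reading, not scikit-learn's implementation. The helper name `balanced_accuracy` is hypothetical, and the `adjusted` branch assumes the usual chance correction, in which a score of 1/n_class is rescaled to 0 and a score of 1 stays at 1.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred, adjusted=False):
    """Mean of per-class recall; optionally rescaled so chance level maps to 0."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    per_class = [correct[c] / total[c] for c in total]
    score = sum(per_class) / len(per_class)
    if adjusted:
        chance = 1 / len(per_class)  # assumed chance level: 1 / n_class
        score = (score - chance) / (1 - chance)
    return score

y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1]
plain = balanced_accuracy(y_true, y_pred)
adj = balanced_accuracy(y_true, y_pred, adjusted=True)
print(plain, adj)  # 0.625 0.25
```

Class 0 has recall 3/4 and class 1 has recall 1/2, so the unadjusted score is 0.625; with two classes the chance level is 0.5, which gives 0.25 after rescaling.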


    from sklearn.metrics import balanced_accuracy_score

balanced_accuracy_score(y_true, y_pred, sample_weight=None, adjusted=False)

adjusted: when adjusted=False, the best value is 1 and the worst value is 0. When True, the result is adjusted for chance, so that random performance scores 0 and perfect performance scores 1.

    y_true = [0, 1, 0, 0, 1, 0]
    y_pred = [0, 1, 0, 0, 0, 1]
    balanced_accuracy_score(y_true, y_pred)

Output:

    0.625

confusion matrix

    # from toolkit import H  # the author's local helper module; not used by the snippets shown here
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import multilabel_confusion_matrix
    from sklearn.metrics import plot_confusion_matrix
    from sklearn.metrics import ConfusionMatrixDisplay
    import numpy as np

confusion_matrix(y_true, y_pred, labels=None, sample_weight=None, normalize=None)

    y_true = [0, 1, 0, 0, 1, 0, 2, 1, 2, 2]
    y_pred = [0, 1, 0, 0, 0, 1, 2, 1, 0, 2]
    confusion_matrix(y_true, y_pred, labels=[1, 0, 2])

Output:

    array([[2, 1, 0],
           [1, 3, 0],
           [0, 1, 2]], dtype=int64)

multilabel confusion matrix

multilabel_confusion_matrix(y_true, y_pred, sample_weight=None, labels=None, samplewise=False)

    y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
    y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
    multilabel_confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])

    y_true = np.array([[1, 0, 1], [0, 1, 0]])
    y_pred = np.array([[1, 0, 0], [0, 1, 1]])
    multilabel_confusion_matrix(y_true, y_pred, samplewise=True)

Output (of the last call):

    array([[[1, 0],
            [1, 1]],

           [[1, 1],
            [0, 1]]], dtype=int64)

confusion matrix visualization


plot_confusion_matrix(estimator, X, y_true, labels=None, sample_weight=None, normalize=None, display_labels=None, include_values=True, xticks_rotation='horizontal', values_format=None, cmap='viridis', ax=None)


    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import svm, datasets
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import plot_confusion_matrix

    # import some data to play with
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    class_names = iris.target_names

    # Split the data into a training set and a test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Run classifier, using a model that is too regularized (C too low) to see
    # the impact on the results
    classifier = svm.SVC(kernel='linear', C=0.01).fit(X_train, y_train)

    np.set_printoptions(precision=2)

    # Plot non-normalized confusion matrix
    disp = plot_confusion_matrix(classifier, X_test, y_test,
                                 display_labels=class_names,
                                 cmap=plt.cm.Blues,
                                 include_values=True,
                                 normalize=None)
    disp.ax_.set_title("Confusion matrix")
    print(disp.confusion_matrix)
    plt.show()

Output:

    [[13  0  0]
     [ 0 10  6]
     [ 0  0  9]]

The Axes object is available through the ax_ attribute of the ConfusionMatrixDisplay object returned by plot_confusion_matrix.


ConfusionMatrixDisplay(confusion_matrix, display_labels)

ConfusionMatrixDisplay.plot(include_values=True, cmap='viridis', xticks_rotation='horizontal', values_format=None, ax=None)

    disp_2 = ConfusionMatrixDisplay(disp.confusion_matrix, display_labels=[0, 1, 2]).plot()
    # disp_2.figure_.savefig()
    disp_2.figure_


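Some metrics are essentially defined for binary classification tasks (for example f1_score and roc_auc_score). In these cases only the positive label is evaluated by default.

When a binary metric is extended to a multiclass or multilabel problem, the data is treated as a collection of binary problems, one per class, each with its own metric. For example, both of the following are split into 3 binary tasks:

multiclass task: [1, 2, 3]
multilabel task: [1, 0, 1]

The per-class metrics are then averaged in some way, and the result serves as the metric for the multiclass or multilabel problem. The averaging method is selected with the average parameter.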
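The split into binary tasks can be sketched in a few lines of plain Python. This is an illustrative reading of the two examples above, not library code: the helper `one_vs_rest` is hypothetical, the multiclass case is assumed to be three samples with labels 1, 2 and 3, and the multilabel case is assumed to be a single sample with a three-column indicator vector.

```python
def one_vs_rest(y, classes):
    """Return {class: binary label list}, where 1 marks samples of that class."""
    return {c: [1 if label == c else 0 for label in y] for c in classes}

# Multiclass: each sample carries exactly one of the classes 1, 2, 3.
multiclass = one_vs_rest([1, 2, 3], classes=[1, 2, 3])

# Multilabel: each row is one sample and each column is already a binary task.
Y = [[1, 0, 1]]
multilabel = {j: [row[j] for row in Y] for j in range(len(Y[0]))}

print(multiclass)  # {1: [1, 0, 0], 2: [0, 1, 0], 3: [0, 0, 1]}
print(multilabel)  # {0: [1], 1: [0], 2: [1]}
```

Each dictionary entry is one binary problem, to which a binary metric can be applied before averaging.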

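- macro: take the plain arithmetic mean of the binary metrics, so every class has the same weight. The binary metrics come from the per-class confusion matrices (one confusion matrix per class). In problems where infrequent classes are nonetheless important, macro-averaging is one way to highlight their performance. On the other hand, the assumption that all classes are equally important is often untrue, so macro-averaging over-emphasizes the low performance on infrequent classes.
- weighted: every sample is given a weight, and the weight of each class is determined by the weights of its samples; without sample weights this is the number of true samples of the class (its support). In effect, the weighted average is a weighted macro average.
- micro: sum the confusion matrices of all classes and average them (every sample has the same weight) to obtain an average confusion matrix, then compute the metric from it. The averaging factor cancels, so this is the same as computing the metric from the pooled counts. Micro-averaging may be preferred in multilabel tasks and in multiclass tasks where a frequent class is to be ignored.
- samples: applies only to multilabel problems. The binary metrics come from the per-sample confusion matrices (one confusion matrix per sample), and their (sample_weight-weighted) mean is returned.
- None: return the score of each class.

precision, recall, F1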



$$ \text{precision} = \frac{TP}{TP+FP} $$

$$ \text{recall} = \frac{TP}{TP+FN} $$

$$ F1 = \frac{2\times\text{precision}\times \text{recall}}{\text{precision}+ \text{recall}} $$




$$ {P_{\text {macro}}=\frac{1}{n} \sum_{i=1}^{n} P_{i}} $$

$$ {R_{\text {macro}}=\frac{1}{n} \sum_{i=1}^{n} R_{i}} $$

$$ {F_{\text {macro}}=\frac{1}{n} \sum_{i=1}^{n} F_{i}} $$



$$ \begin{aligned} P_{\text {micro }}=\frac{\overline{T P}}{\overline{T P}+\overline{F P}}& =\frac{\frac{1}{n_{\text {class}}} \sum_{i=1}^{n} T P_{i}}{\frac{1}{n_{\text {class}}} \sum_{i=1}^{n} T P_{i}+\frac{1}{n_{\text {class}}} \sum_{i=1}^{n} F P_{i}} \\ &=\frac{\sum_{i=1}^{n} T P_{i}}{\sum_{i=1}^{n} T P_{i}+\sum_{i=1}^{n} F P_{i}} \end{aligned} $$

$$ {R_{\text {micro}}=\frac{\overline{TP}}{\overline{TP} + \overline{F N}}=\frac{\sum_{i=1}^{n} T P_{i}}{\sum_{i=1}^{n} T P_{i}+\sum_{i=1}^{n} F N_{i}}} $$

$$ {F_{\text {micro}}=\frac{2 \times P_{\text {micro}} \times R_{\text {micro}}}{P_{\text {micro}}+R_{\text {micro}}}} $$
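The difference between macro- and micro-averaging is easiest to see on concrete counts. The sketch below is a pure-Python illustration of the precision and recall formulas above, not scikit-learn's implementation; the helper `per_class_counts` is hypothetical, the ten-sample data set is the one used in the worked example of this section, and no guard against empty classes (division by zero) is included.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (TP, FP, FN)} by treating each label as a binary task."""
    counts = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        counts[c] = (tp, fp, fn)
    return counts

y_true = [0, 0, 0, 2, 1, 2, 0, 1, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 0, 2, 1, 2]
counts = per_class_counts(y_true, y_pred, labels=[0, 1, 2])

# Macro: compute the score per class, then average the scores.
p_macro = sum(tp / (tp + fp) for tp, fp, fn in counts.values()) / len(counts)
r_macro = sum(tp / (tp + fn) for tp, fp, fn in counts.values()) / len(counts)

# Micro: pool the counts over all classes, then compute a single score.
TP = sum(tp for tp, fp, fn in counts.values())
FP = sum(fp for tp, fp, fn in counts.values())
FN = sum(fn for tp, fp, fn in counts.values())
p_micro = TP / (TP + FP)
r_micro = TP / (TP + FN)
f_micro = 2 * p_micro * r_micro / (p_micro + r_micro)

print(counts)
print(round(p_macro, 4), round(r_macro, 4))
print(round(p_micro, 4), round(r_micro, 4), round(f_micro, 4))
```

In single-label multiclass data every false positive for one class is a false negative for another, so the pooled FP and FN are equal and micro precision, recall and F1 coincide.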


precision_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

recall_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

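- labels : list, optional. Labels present in the data can be excluded, while labels that do not occur in the data are filled with 0 at the corresponding position. By default, all labels in y_true and y_pred are used, in sorted order.
- pos_label : str or int, 1 by default. Takes effect for binary data; if the data is multiclass, this parameter is ignored.
- average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']. This parameter is required for multiclass and multilabel data. If None, the score of each class is returned; otherwise the specified type of averaging is performed.
  - 'binary': return only the score of pos_label; applicable only to binary classification.
  - 'micro': return the micro average.
  - 'macro': return the macro average (unweighted, so class imbalance is not taken into account).
  - 'weighted': the weighted macro average, which accounts for class imbalance; it can result in an F1 that is not between precision and recall.
  - 'samples': compute the metric for each instance and take the mean; only meaningful for multilabel classification.
- sample_weight : array-like of shape = [n_samples], optional. Sample weights.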


    y_true = [0, 0, 0, 2, 1, 2, 0, 1, 1, 2]
    y_pred = [0, 0, 2, 1, 0, 2, 0, 2, 1, 2]

    from sklearn.metrics import precision_score
    from sklearn.metrics import recall_score
    from sklearn.metrics import f1_score
    from sklearn.metrics import confusion_matrix

    confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

Output:

    array([[3, 0, 1],
           [1, 1, 1],
           [0, 1, 2]], dtype=int64)

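From the formulas, the metrics of each class are computed as follows:

$$ \begin{aligned} P_{0} &= \frac{\text{samples correctly predicted as 0}}{\text{samples predicted as 0}} = \frac{3}{4} \\ R_{0} &= \frac{\text{samples correctly predicted as 0}}{\text{samples whose true class is 0}} = \frac{3}{4} \\ F_{0} &= \frac{2\times P_{0}\times R_{0}}{P_{0} + R_{0}} = \frac{3}{4} \end{aligned} $$

$$ \begin{aligned} P_{1} &= \frac{\text{samples correctly predicted as 1}}{\text{samples predicted as 1}} = \frac{1}{2} \\ R_{1} &= \frac{\text{samples correctly predicted as 1}}{\text{samples whose true class is 1}} = \frac{1}{3} \\ F_{1} &= \frac{2\times P_{1}\times R_{1}}{P_{1} + R_{1}} = \frac{2}{5} \end{aligned} $$

$$ \begin{aligned} P_{2} &= \frac{\text{samples correctly predicted as 2}}{\text{samples predicted as 2}} = \frac{2}{4} = \frac{1}{2} \\ R_{2} &= \frac{\text{samples correctly predicted as 2}}{\text{samples whose true class is 2}} = \frac{2}{3} \\ F_{2} &= \frac{2\times P_{2}\times R_{2}}{P_{2} + R_{2}} = \frac{4}{7} \end{aligned} $$

In summary:

$$ \begin{aligned} P_{0} &= \frac{3}{4}, & P_{1} &= \frac{1}{2}, & P_{2} &= \frac{1}{2} \\ R_{0} &= \frac{3}{4}, & R_{1} &= \frac{1}{3}, & R_{2} &= \frac{2}{3} \\ F_{0} &= \frac{3}{4}, & F_{1} &= \frac{2}{5}, & F_{2} &= \frac{4}{7} \end{aligned} $$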
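These per-class values can be read directly off the confusion matrix shown earlier: the diagonal entry is TP, the column sum is the number of samples predicted as the class, and the row sum is the number of samples that truly belong to it. The following is a small sketch using exact fractions so the results match the hand calculation; the variable names are illustrative only.

```python
from fractions import Fraction

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = [[3, 0, 1],
      [1, 1, 1],
      [0, 1, 2]]

n = len(cm)
results = []
for i in range(n):
    tp = cm[i][i]
    predicted_as_i = sum(cm[r][i] for r in range(n))  # column sum = TP + FP
    truly_i = sum(cm[i])                              # row sum = TP + FN
    p = Fraction(tp, predicted_as_i)
    r = Fraction(tp, truly_i)
    f = 2 * p * r / (p + r)
    results.append((p, r, f))
    print(i, p, r, f)
# 0 3/4 3/4 3/4
# 1 1/2 1/3 2/5
# 2 1/2 2/3 4/7
```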

    precision_score(y_true, y_pred, average=None)
    # array([0.75, 0.5 , 0.5 ])

    recall_score(y_true, y_pred, average=None)
    # array([0.75, 0.33, 0.67])

    f1_score(y_true, y_pred, average=None)
    # array([0.75, 0.4 , 0.57])


$$ \begin{aligned} P_{\text {macro}} &= \frac{1}{3}\times \left(\frac{3}{4} + \frac{1}{2} + \frac{1}{2}\right) \approx 0.58 \\ R_{\text {macro}} &= \frac{1}{3}\times \left(\frac{3}{4} + \frac{1}{3} + \frac{2}{3}\right) \approx 0.58 \\ F_{\text {macro}} &= \frac{1}{3}\times \left(\frac{3}{4} + \frac{2}{5} + \frac{4}{7}\right) \approx 0.57 \end{aligned} $$


    print(precision_score(y_true, y_pred, average='macro'))
    print(recall_score(y_true, y_pred, average='macro'))
    print(f1_score(y_true, y_pred, average='macro'))

Output:

    0.5833333333333334
    0.5833333333333334
    0.5738095238095239

$$ \begin{aligned} P_{\text {micro}} &= \frac{3+1+2}{4+2+4} = 0.6 \\ R_{\text {micro}} &= \frac{3+1+2}{4+3+3} = 0.6 \\ F_{\text {micro}} &= 0.6 \end{aligned} $$

    print(precision_score(y_true, y_pred, average='micro'))
    print(recall_score(y_true, y_pred, average='micro'))
    print(f1_score(y_true, y_pred, average='micro'))

Output:

    0.6
    0.6
    0.6


$$ \begin{aligned} P_{\text {weighted}} &= \frac{1}{10}\times \left(\frac{3}{4}\times4 + \frac{1}{2}\times3 + \frac{1}{2}\times3\right) = 0.6 \\ R_{\text {weighted}} &= \frac{1}{10}\times \left(\frac{3}{4}\times4 + \frac{1}{3}\times3 + \frac{2}{3}\times3\right) = 0.6 \\ F_{\text {weighted}} &= \frac{1}{10}\times \left(\frac{3}{4}\times4 + \frac{2}{5}\times3 + \frac{4}{7}\times3\right) \approx 0.59 \end{aligned} $$
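The support-weighted averages in the formula above can be checked with exact arithmetic. This is a minimal sketch rather than library code: `weighted_mean` is a hypothetical helper, and the per-class scores and supports (4, 3 and 3 true samples for classes 0, 1 and 2) are taken from the worked example.

```python
from fractions import Fraction

# Per-class scores and supports (number of true samples) for classes 0, 1, 2.
precision = [Fraction(3, 4), Fraction(1, 2), Fraction(1, 2)]
recall = [Fraction(3, 4), Fraction(1, 3), Fraction(2, 3)]
f1 = [Fraction(3, 4), Fraction(2, 5), Fraction(4, 7)]
support = [4, 3, 3]

def weighted_mean(scores, weights):
    """Average in which each class counts in proportion to its weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

p_w = weighted_mean(precision, support)
r_w = weighted_mean(recall, support)
f_w = weighted_mean(f1, support)
print(p_w, r_w, f_w)  # 3/5 3/5 207/350
print(float(f_w))
```

Here the averaged F1 (207/350, about 0.5914) is lower than both the averaged precision and the averaged recall (3/5 each), a concrete case of an averaged F-score that does not lie between precision and recall.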

    print(precision_score(y_true, y_pred, average='weighted'))
    print(recall_score(y_true, y_pred, average='weighted'))
    print(f1_score(y_true, y_pred, average='weighted'))

Output:

    0.6
    0.6
    0.5914285714285714

F-beta

$$ F_{\beta} = \frac{(1+\beta^2)\times P\times R}{\beta^2\times P + R} $$
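To see how β shifts the balance between precision and recall, the formula can be evaluated directly. The sketch below uses a hypothetical helper `f_beta` and, as an assumed example input, the precision 1/2 and recall 1/3 of class 1 from the worked example; it is an illustration of the formula, not scikit-learn's implementation.

```python
from fractions import Fraction

def f_beta(p, r, beta):
    """F-beta from precision p and recall r; beta > 1 weights recall more heavily."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

p, r = Fraction(1, 2), Fraction(1, 3)
f1 = f_beta(p, r, 1)                   # beta = 1 reduces to the F1 formula
f2 = f_beta(p, r, 2)                   # pulled towards recall
f_half = f_beta(p, r, Fraction(1, 2))  # pulled towards precision
print(f1, f2, f_half)  # 2/5 5/14 5/11
```

Because recall is lower than precision here, β = 2 lowers the score (5/14) and β = 1/2 raises it (5/11) relative to F1 (2/5).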

fbeta_score(y_true, y_pred, beta, labels=None, pos_label=1, average='binary', sample_weight=None)

Displaying the main classification metrics



classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False)

- labels: the classes whose scores are shown; the default None shows all classes
- target_names: display names for the labels
- digits: number of decimal places; only takes effect when output_dict=False
- output_dict: the default False returns a string; if True, a dict is returned

    from sklearn.metrics import classification_report

    y_true = [0, 0, 0, 2, 1, 2, 0, 1, 1, 2]
    y_pred = [0, 0, 2, 1, 0, 2, 0, 2, 1, 2]
    print(classification_report(y_true, y_pred, target_names=["A", "B", "C"]))

Output:

                  precision    recall  f1-score   support

               A       0.75      0.75      0.75         4
               B       0.50      0.33      0.40         3
               C       0.50      0.67      0.57         3

        accuracy                           0.60        10
       macro avg       0.58      0.58      0.57        10
    weighted avg       0.60      0.60      0.59        10


precision_recall_fscore_support(y_true, y_pred, beta=1.0, labels=None, pos_label=1, average=None, warn_for=('precision', 'recall', 'f-score'), sample_weight=None)

    from sklearn.metrics import precision_recall_fscore_support

    y_true = [0, 0, 0, 2, 1, 2, 0, 1, 1, 2]
    y_pred = [0, 0, 2, 1, 0, 2, 0, 2, 1, 2]
    print(precision_recall_fscore_support(y_true, y_pred, average='macro'))
    print(precision_recall_fscore_support(y_true, y_pred, average='micro'))
    print(precision_recall_fscore_support(y_true, y_pred, average='weighted'))

Output:

    (0.5833333333333334, 0.5833333333333334, 0.5738095238095239, None)
    (0.6, 0.6, 0.6, None)
    (0.6, 0.6, 0.5914285714285714, None)

P-R curve

precision_recall_curve(y_true, probas_pred, pos_label=None, sample_weight=None)

    import numpy as np
    from sklearn.metrics import precision_recall_curve

    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    # precision, recall, thresholds = precision_recall_curve(y_true, y_scores,
    #                                                        pos_label=0)
    print(thresholds)
    print(precision)
    print(recall)

Output:

    [0.35 0.4  0.8 ]
    [0.67 0.5  1.   1.  ]
    [1.  0.5 0.5 0. ]
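Each point on the P-R curve is just precision and recall computed after thresholding the scores. The sketch below recomputes the points for the three thresholds in the output above in plain Python; `precision_recall_at` is a hypothetical helper, and it assumes that a sample is predicted positive when its score is greater than or equal to the threshold. The final pair (precision 1, recall 0) in the scikit-learn output has no corresponding threshold and is not reproduced here.

```python
def precision_recall_at(y_true, y_scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    pairs = list(zip(y_true, y_scores))
    tp = sum(1 for t, s in pairs if s >= threshold and t == 1)
    fp = sum(1 for t, s in pairs if s >= threshold and t == 0)
    fn = sum(1 for t, s in pairs if s < threshold and t == 1)
    return tp / (tp + fp), tp / (tp + fn)

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
points = [precision_recall_at(y_true, y_scores, t) for t in (0.35, 0.4, 0.8)]
for threshold, (p, r) in zip((0.35, 0.4, 0.8), points):
    print(threshold, round(p, 2), round(r, 2))
# 0.35 0.67 1.0
# 0.4 0.5 0.5
# 0.8 1.0 0.5
```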



plot_precision_recall_curve(estimator, X, y, sample_weight=None, response_method='auto', name=None, ax=None, **kwargs)

    from sklearn.metrics import plot_precision_recall_curve
    from sklearn import svm, datasets
    from sklearn.model_selection import train_test_split
    import numpy as np
    import matplotlib.pyplot as plt

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    # Add noisy features
    random_state = np.random.RandomState(0)
    n_samples, n_features = X.shape
    X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

    # Limit to the two first classes, and split into training and test
    X_train, X_test, y_train, y_test = train_test_split(X[y



