python + sklearn ︱ Evaluating Classification Performance


2023-10-06 06:21 | Source: compiled from the web

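Earlier posts covered how to evaluate cluster quality after clustering: "Clustering ︱ Six cluster-quality metrics implemented in Python (Rand index, mutual information, silhouette coefficient)". For evaluating classification in R, see: "R ︱ Evaluating classifier performance (confusion matrix, accuracy, recall, F1, mAP, ROC curve)".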

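Contents

I. acc, recall, F1, confusion matrix, classification report
    1. Accuracy
        Method 1: accuracy_score
        Method 2: metrics
    2. Recall
    3. F1
    4. Confusion matrix
    5. Classification report
    6. kappa score
II. ROC
    1. Computing the ROC AUC value
    2. ROC curve
III. Distances
    1. Hamming distance
    2. Jaccard distance
IV. Regression
    1. Explained variance score
    2. Mean absolute error
    3. Mean squared error
    4. Median absolute error
    5. R² score (coefficient of determination)
V. Plotting sensibly (confusion matrix / ROC)
6 AUC vs. F1: similarities and differences
7 Interpreting precision and recall in different scenarios
    7.1 Precision and recall in recommender systems (see the article "Recommendation strategy: recall")
    7.2 Trading off recall and precision in quality inspection
References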

I. acc, recall, F1, confusion matrix, classification report

1. Accuracy

Method 1: accuracy_score

    # Accuracy
    import numpy as np
    from sklearn.metrics import accuracy_score

    y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
    y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]

    accuracy_score(y_true, y_pred)
    # Out[127]: 0.33333333333333331

    # normalize=False returns the number of correctly classified samples
    # instead of the fraction
    accuracy_score(y_true, y_pred, normalize=False)
    # Out[128]: 3

Method 2: metrics

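Macro-averaging is often considered more reasonable than micro-averaging, but that does not make micro-averaging useless. Which one to use depends on how the samples are distributed across classes in the dataset.

Macro-averaging first computes the metric for each class and then takes the arithmetic mean over all classes. Micro-averaging pools every instance in the dataset regardless of class into one global confusion matrix and then computes the metric from it. (Source: "On macro-averaging and micro-averaging in evaluation metrics")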

    from sklearn import metrics

    # Micro-averaged precision
    metrics.precision_score(y_true, y_pred, average='micro')
    # Out[130]: 0.33333333333333331

    # Macro-averaged precision
    metrics.precision_score(y_true, y_pred, average='macro')
    # Out[131]: 0.375

    # Macro-averaged precision restricted to the given class labels
    metrics.precision_score(y_true, y_pred, labels=[0, 1, 2, 3], average='macro')
    # Out[133]: 0.5

Besides the default 'binary', the average parameter accepts five values: None, 'micro', 'macro', 'weighted', 'samples'.
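To make the difference between the two averaging modes concrete, the sketch below recomputes the precision values above by hand, using only the standard library. Treating a class that is never predicted as having precision 0 is an assumption made here so that the result matches the reported macro value.

```python
from collections import Counter

y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]

labels = sorted(set(y_true) | set(y_pred))

# Per-class true positives and per-class number of predictions
tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
predicted = Counter(y_pred)

# Macro: precision per class first, then the unweighted mean over classes.
# Assumption: a class that is never predicted counts as precision 0.
per_class = [tp[c] / predicted[c] if predicted[c] else 0.0 for c in labels]
macro_precision = sum(per_class) / len(per_class)

# Micro: pool all predictions and compute one global precision.
micro_precision = sum(tp.values()) / sum(predicted.values())

print(macro_precision)  # 0.375
print(micro_precision)  # 0.3333333333333333
```

With single-label multiclass data, every sample receives exactly one prediction, so micro-averaged precision equals accuracy.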

2. Recall

    metrics.recall_score(y_true, y_pred, average='micro')
    # Out[134]: 0.33333333333333331

    metrics.recall_score(y_true, y_pred, average='macro')
    # Out[135]: 0.3125

3. F1

    metrics.f1_score(y_true, y_pred, average='weighted')
    # Out[136]: 0.37037037037037035
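The macro recall and weighted F1 values above can be checked by hand. This is a minimal sketch using only the standard library; the helper `safe_div` is a name introduced here for illustration, and returning 0 for an undefined ratio is an assumption chosen to match the reported numbers.

```python
from collections import Counter

y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]

labels = sorted(set(y_true) | set(y_pred))
tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
predicted = Counter(y_pred)   # predictions per class
support = Counter(y_true)     # true samples per class


def safe_div(a, b):
    """Return a / b, or 0.0 when the denominator is zero (assumption)."""
    return a / b if b else 0.0


precision = {c: safe_div(tp[c], predicted[c]) for c in labels}
recall = {c: safe_div(tp[c], support[c]) for c in labels}
f1 = {c: safe_div(2 * precision[c] * recall[c], precision[c] + recall[c])
      for c in labels}

# Macro recall: unweighted mean of the per-class recalls
macro_recall = sum(recall.values()) / len(labels)

# Weighted F1: per-class F1 weighted by the number of true samples per class
weighted_f1 = sum(f1[c] * support[c] for c in labels) / len(y_true)

print(macro_recall)           # 0.3125
print(round(weighted_f1, 4))  # 0.3704
```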

4. Confusion matrix

    # Confusion matrix
    from sklearn.metrics import confusion_matrix

    confusion_matrix(y_true, y_pred)
    # Out[137]:
    # array([[1, 0, 0, ..., 0, 0, 0],
    #        [0, 0, 1, ..., 0, 0, 0],
    #        [0, 1, 0, ..., 0, 0, 1],
    #        ...,
    #        [0, 0, 0, ..., 0, 0, 1],
    #        [0, 0, 0, ..., 0, 0, 0],
    #        [0, 0, 0, ..., 0, 1, 0]])

Rows correspond to the true labels and columns to the predicted labels.

[Figure: confusion matrix plot from the scikit-learn documentation, http://scikit-learn.org/stable/_images/sphx_glr_plot_confusion_matrix_0011.png]
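The row and column convention is easiest to see by building a small confusion matrix by hand. The sketch below uses a short illustrative label list (not the nine-sample example above) and plain Python lists.

```python
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

labels = sorted(set(y_true) | set(y_pred))
index = {label: i for i, label in enumerate(labels)}

# cm[i][j] counts samples whose true label is labels[i]
# and whose predicted label is labels[j].
cm = [[0] * len(labels) for _ in labels]
for t, p in zip(y_true, y_pred):
    cm[index[t]][index[p]] += 1

for row in cm:
    print(row)
# [2, 0, 0]
# [0, 0, 1]
# [1, 0, 2]
```

Each row sums to the number of true samples of that class, and the diagonal holds the correctly classified samples.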

5. Classification report

    # Classification report: precision / recall / f1-score / averages / support
    from sklearn.metrics import classification_report

    y_true = [0, 1, 2, 2, 0]
    y_pred = [0, 0, 2, 2, 0]
    target_names = ['class 0', 'class 1', 'class 2']
    print(classification_report(y_true, y_pred, target_names=target_names))

The output:

                 precision    recall  f1-score   support

        class 0       0.67      1.00      0.80         2
        class 1       0.00      0.00      0.00         1
        class 2       1.00      1.00      1.00         2

    avg / total       0.67      0.80      0.72         5

The report contains precision, recall, f1-score, their averages, and the number of samples per class (support).
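The averaged row of the report shown above is the support-weighted mean of the per-class values. A minimal check, with the per-class figures entered by hand from the report (class 0 precision is 2/3 before rounding); `weighted_mean` is a helper name introduced here for illustration:

```python
# Per-class values taken from the report above, in the order class 0, 1, 2
support = [2, 1, 2]
precision = [2 / 3, 0.0, 1.0]
recall = [1.0, 0.0, 1.0]
f1 = [0.8, 0.0, 1.0]


def weighted_mean(values, weights):
    """Mean of values, each weighted by the matching entry in weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


avg_precision = weighted_mean(precision, support)
avg_recall = weighted_mean(recall, support)
avg_f1 = weighted_mean(f1, support)

print(round(avg_precision, 2))  # 0.67
print(round(avg_recall, 2))     # 0.8
print(round(avg_f1, 2))         # 0.72
```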

6. kappa score

The kappa score is a number between -1 and 1. A score above 0.8 is generally considered good agreement; 0 or lower means no agreement (effectively random labels).

    from sklearn.metrics import cohen_kappa_score

    y_true = [2, 0, 2, 2, 0, 1]
    y_pred = [0, 0, 2, 2, 0, 2]
    cohen_kappa_score(y_true, y_pred)
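As a rough illustration of what the kappa score measures, the sketch below computes unweighted Cohen's kappa by hand for the same labels: the observed agreement is compared with the agreement expected if the two label sequences were independent. The variable names are chosen here for illustration.

```python
from collections import Counter

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
n = len(y_true)

# Observed agreement: fraction of positions where the labels match
p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

# Expected agreement by chance, from the marginal label frequencies
true_counts = Counter(y_true)
pred_counts = Counter(y_pred)
classes = set(y_true) | set(y_pred)
p_expected = sum(true_counts[c] * pred_counts[c] for c in classes) / n ** 2

kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(kappa, 4))  # 0.4286
```

Here the observed agreement is 4/6 and the chance agreement is 15/36, so kappa is 3/7: well below the 0.8 level mentioned above.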

II. ROC

1. Computing the ROC AUC value

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])
    roc_auc_score(y_true, y_scores)

2. ROC curve

    from sklearn.metrics import roc_curve

    y = np.array([1, 1, 2, 2])
    scores = np.array([0.1, 0.4, 0.35, 0.8])
    # pos_label=2 marks class 2 as the positive class
    fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)
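One way to read the ROC AUC value is as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The sketch below checks this pairwise interpretation on the same four samples as the roc_auc_score call; counting ties as half a win is the usual convention and is stated here as an assumption.

```python
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

pos = [s for s, t in zip(y_scores, y_true) if t == 1]
neg = [s for s, t in zip(y_scores, y_true) if t == 0]

# Count positive/negative pairs where the positive is ranked higher.
# Assumption: a tie counts as half a win.
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
           for p in pos for n in neg)
auc_value = wins / (len(pos) * len(neg))

print(auc_value)  # 0.75
```

Three of the four positive/negative pairs are ordered correctly (only 0.35 vs 0.4 is not), hence 0.75.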

Here is an example from the official documentation. Only part of the code is shown; for the full code see: Receiver Operating Characteristic (ROC)

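    import numpy as np
    import matplotlib.pyplot as plt
    from itertools import cycle

    from sklearn import svm, datasets
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import label_binarize
    from sklearn.multiclass import OneVsRestClassifier

    # Import some data to play with
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    # Steps omitted in the excerpt, following the linked official example:
    # binarize the labels, add noisy features, split, fit one-vs-rest SVMs
    y = label_binarize(y, classes=[0, 1, 2])
    n_classes = y.shape[1]
    random_state = np.random.RandomState(0)
    n_samples, n_features = X.shape
    X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=.5, random_state=0)
    classifier = OneVsRestClassifier(
        svm.SVC(kernel='linear', probability=True, random_state=random_state))
    y_score = classifier.fit(X_train, y_train).decision_function(X_test)

    # ROC curve and AUC per class, plus the micro-average over all decisions
    fpr, tpr, roc_auc = dict(), dict(), dict()
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])
    fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
    lw = 2

    # Plotting
    # First aggregate all false positive rates
    all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

    # Then interpolate all ROC curves at these points
    # (np.interp replaces the removed scipy.interp alias)
    mean_tpr = np.zeros_like(all_fpr)
    for i in range(n_classes):
        mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])

    # Finally average it and compute AUC
    mean_tpr /= n_classes

    fpr["macro"] = all_fpr
    tpr["macro"] = mean_tpr
    roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

    # Plot all ROC curves
    plt.figure()
    plt.plot(fpr["micro"], tpr["micro"],
             label='micro-average ROC curve (area = {0:0.2f})'
                   ''.format(roc_auc["micro"]),
             color='deeppink', linestyle=':', linewidth=4)

    plt.plot(fpr["macro"], tpr["macro"],
             label='macro-average ROC curve (area = {0:0.2f})'
                   ''.format(roc_auc["macro"]),
             color='navy', linestyle=':', linewidth=4)

    colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=lw,
                 label='ROC curve of class {0} (area = {1:0.2f})'
                       ''.format(i, roc_auc[i]))

    plt.plot([0, 1], [0, 1], 'k--', lw=lw)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Some extension of Receiver operating characteristic to multi-class')
    plt.legend(loc="lower right")
    plt.show()

[Figure: multi-class ROC curves from the scikit-learn documentation, http://scikit-learn.org/stable/_images/sphx_glr_plot_roc_002.png]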

III. Distances

1. Hamming distance

    from sklearn.metrics import hamming_loss

    y_pred = [1, 2, 3, 4]
    y_true = [2, 2, 3, 4]
    hamming_loss(y_true, y_pred)
    # 0.25

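2. Jaccard distance

    # jaccard_similarity_score was deprecated in scikit-learn 0.21 and removed
    # in 0.23 (jaccard_score is its successor, with different averaging
    # semantics). For multiclass input it was equivalent to accuracy_score.
    from sklearn.metrics import jaccard_similarity_score

    y_pred = [0, 2, 1, 3, 4]
    y_true = [0, 1, 2, 3, 4]
    jaccard_similarity_score(y_true, y_pred)
    # 0.6
    jaccard_similarity_score(y_true, y_pred, normalize=False)
    # 3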
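For reference, the Jaccard similarity itself is defined on sets as the size of the intersection divided by the size of the union, and the Jaccard distance is one minus that. A minimal sketch follows; the tag sets are hypothetical, and returning 1.0 for two empty sets is a convention assumed here.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical (assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical label sets for one multi-label sample
true_tags = {"python", "sklearn", "metrics"}
pred_tags = {"python", "metrics", "roc"}

similarity = jaccard_similarity(true_tags, pred_tags)
distance = 1 - similarity

print(similarity)  # 0.5
print(distance)    # 0.5
```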

IV. Regression

1. Explained variance score

    from sklearn.metrics import explained_variance_score

    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    explained_variance_score(y_true, y_pred)

2. Mean absolute error

    from sklearn.metrics import mean_absolute_error

    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    mean_absolute_error(y_true, y_pred)

3. Mean squared error

    from sklearn.metrics import mean_squared_error

    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    mean_squared_error(y_true, y_pred)

4. Median absolute error

    from sklearn.metrics import median_absolute_error

    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    median_absolute_error(y_true, y_pred)

5. R² score (coefficient of determination)

    from sklearn.metrics import r2_score

    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    r2_score(y_true, y_pred)
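The five regression metrics above all derive from the residuals. The sketch below recomputes them by hand for the same data using only the standard library; the helper names `mean` and `variance` are introduced here for illustration, and the variance is the population variance.

```python
from statistics import median

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def variance(values):
    """Population variance: mean squared deviation from the mean."""
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)


residuals = [t - p for t, p in zip(y_true, y_pred)]

mae = mean(abs(r) for r in residuals)            # mean absolute error
mse = mean(r ** 2 for r in residuals)            # mean squared error
medae = median(abs(r) for r in residuals)        # median absolute error
r2 = 1 - mse / variance(y_true)                  # R² = 1 - SS_res / SS_tot
explained_variance = 1 - variance(residuals) / variance(y_true)

print(mae, mse, medae)               # 0.5 0.375 0.5
print(round(r2, 3))                  # 0.949
print(round(explained_variance, 3))  # 0.957
```

R² and explained variance differ only in that explained variance ignores a constant bias in the residuals; here the residuals average -0.25, so the two values are close but not equal.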

V. Plotting sensibly (confusion matrix / ROC)

    # %matplotlib inline   # uncomment when running in a Jupyter notebook
    import itertools
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import (f1_score, accuracy_score, recall_score,
                                 classification_report, confusion_matrix)


    def plot_confusion_matrix(cm, classes,
                              normalize=False,
                              title='Confusion matrix',
                              cmap=plt.cm.Blues):
        """
        This function prints and plots the confusion matrix.
        Normalization can be applied by setting `normalize=True`.
        """
        if normalize:
            # Divide each row by its sum so every row shows per-class rates
            cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
            print("Normalized confusion matrix")
        else:
            print('Confusion matrix, without normalization')

        print(cm)

        plt.imshow(cm, interpolation='nearest', cmap=cmap)
        plt.title(title)
        plt.colorbar()
        tick_marks = np.arange(len(classes))
        plt.xticks(tick_marks, classes, rotation=45)
        plt.yticks(tick_marks, classes)

        # Write each count in its cell, in white on dark cells
        fmt = '.2f' if normalize else 'd'
        thresh = cm.max() / 2.
        for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
            plt.text(j, i, format(cm[i, j], fmt),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")

        plt.tight_layout()
        plt.ylabel('True label')
        plt.xlabel('Predicted label')


    def CalculationResults(val_y, y_val_pred, simple=False,
                           target_names=['class_-2_Not_mentioned', 'class_-1_Negative',
                                         'class_0_Neutral', 'class_1_Positive']):
        # Compute the evaluation metrics
        F1_score = f1_score(val_y, y_val_pred, average='macro')
        if simple:
            return F1_score
        else:
            acc = accuracy_score(val_y, y_val_pred)
            recall_score_ = recall_score(val_y, y_val_pred, average='macro')
            confusion_matrix_ = confusion_matrix(val_y, y_val_pred)
            class_report = classification_report(val_y, y_val_pred,
                                                 target_names=target_names)
            print('f1_score:', F1_score, 'ACC_score:', acc, 'recall:', recall_score_)
            print('\n----class report ---:\n', class_report)
            # print('----confusion matrix ---:\n', confusion_matrix_)

            # Plot the confusion matrix
            plt.figure()
            plot_confusion_matrix(confusion_matrix_, classes=target_names,
                                  title='Confusion matrix, without normalization')
            plt.show()
            return F1_score, acc, recall_score_, confusion_matrix_, class_report

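plot_confusion_matrix draws the confusion matrix. CalculationResults takes the predicted and true values of y, plus the names of the class labels, and outputs the F1 score, accuracy, recall, and the classification report in one call.

Part of the output: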

    f1_score: 0.6111193724134587 ACC_score: 0.9414 recall: 0.5941485524896096

    ----class report ---:
                             precision    recall  f1-score   support

    class_-2_Not_mentioned       0.96      0.97      0.97     11757
         class_-1_Negative       0.68      0.51      0.58       182
           class_0_Neutral       1.00      0.01      0.01       136
          class_1_Positive       0.87      0.89      0.88      2925

               avg / total       0.94      0.94      0.94     15000

    Confusion matrix, without normalization
    [[11437    27     0   293]
     [   72    93     0    17]
     [   63    10     1    62]
     [  328     7     0  2590]]

6 AUC vs. F1: similarities and differences

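Differences between AUC and F1

Similarities: the analysis starts from the fact that the two metrics share a common goal. In fact, $recall = TPR = \frac{TP}{TP + FN}$. In other words, both AUC and the F1 score aim to detect the samples that are actually positive (test positive).

Differences:

AUC favors a model that raises as few false alarms as possible, that is, one that extrapolates conservatively. F1 favors a model that lets no possible positive slip through, that is, one that extrapolates aggressively. This is the core difference between the two metrics.

How to choose: in practice, choosing between the two comes down to a trade-off. If false alarms are costly, AUC is the more suitable metric. If missed positives are costly, prefer the F1 score.

For example, when screening for an infectious disease, we would rather quarantine a few extra suspected cases than let one possible carrier go, so the F1 score is the preferred evaluation metric.

In recommendation, by contrast, a company's video or news library is very large and contains many items a user might be interested in. The bigger worry is recommending a video the user dislikes and hurting the user experience, not missing a video the user might have liked. AUC is therefore the more suitable choice for recommendation.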
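The trade-off between false alarms and missed positives can be made concrete by sweeping the decision threshold over a set of scores. The data below is hypothetical and the helper `metrics_at` is a name introduced for illustration; this is only a sketch of how precision, recall, and F1 move as the threshold changes.

```python
# Hypothetical labels and model scores, for illustration only
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]


def metrics_at(threshold):
    """Precision, recall, and F1 when scores >= threshold count as positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


results = {th: metrics_at(th) for th in (0.2, 0.5, 0.8)}
for th, (precision, recall, f1) in results.items():
    print(f"threshold={th}: precision={precision:.2f} "
          f"recall={recall:.2f} f1={f1:.2f}")
```

On this toy data a low threshold misses nothing but raises false alarms, while a high threshold raises none but misses two of the three positives. F1 is computed at one chosen threshold, whereas AUC summarizes the ranking over all thresholds.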

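7 Interpreting precision and recall in different scenarios

Recall: how many of the positive samples were found (how many were recalled). Recall describes the model's ability to "call" the positive samples "back" (re-call).

Precision: of the samples you labeled positive, how many were correct (how precise the guesses are). Precision describes how many of the samples that were "called back" are truly positive.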
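The two definitions above translate directly into counts. A minimal sketch on hypothetical binary labels, where 1 marks the positive class:

```python
# Hypothetical binary labels, for illustration only (1 = positive)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # found positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms

# Recall: share of the actual positives that were called back
recall = tp / (tp + fn)
# Precision: share of the called-back samples that are truly positive
precision = tp / (tp + fp)

print(recall)               # 0.5
print(round(precision, 3))  # 0.667
```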

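High recall, low precision: many samples are predicted positive, so most positives are found but many of the predictions are wrong.
Common when: feature engineering is poor; the validation and training sets are distributed differently; the training set is too small; there is a bug in the code; or positive and negative samples overlap heavily and are easy to confuse.
Fix: analyze bad cases, the PR curve, and the weight distribution of each feature; remove ineffective features and add effective ones; correct mislabeled samples.

Low recall, high precision: many samples are predicted negative, so recall is low while precision is high. The model lacks features and only finds a subset of the positives, though very precisely; some feature engineering may help.
Common when: positives and negatives are highly imbalanced, with many more negatives.
Fix: run offline cross-validation on the specific problem to find the best positive-to-negative ratio; add positive samples. (How exactly to run that offline cross-validation is left as an open question.)

Low recall, low precision. Common when: positive and negative samples overlap heavily and are easy to confuse.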

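7.1 Precision and recall in recommender systems (see the article "Recommendation strategy: recall")

Recall and precision sometimes constrain each other. A good candidate-retrieval ("recall") strategy should achieve high recall while keeping precision high. Content-based retrieval has relatively high recall but lower precision, and is a good fit for cold-start scenarios that rely on semantic matching.

Searching by content interest may surface interests you have no record of, which is why recall is high but precision is low.

It may look for points of interest outside your historical interests.

Collaborative-filtering retrieval builds a behavior matrix between users and content and distributes content by "similarity". This approach has relatively high precision but suffers to some degree from the cold-start problem.

Recommending what you are likely to enjoy based on your past interests, that is, within your historical behavior, tends to be more precise.

Combining the two: recommendations from historical behavior (collaborative filtering) + interest-expanding recommendations (content matching/recommendation).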
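A toy sketch of why combining the two retrieval channels helps. All item ids and candidate sets below are hypothetical, and the helper `precision_recall` is a name introduced here; the point is only how the union of a precise channel and a broad channel changes the two metrics.

```python
# Hypothetical data: items the user actually likes and two candidate sets
relevant = {"a", "b", "c", "d"}
cf_candidates = {"a", "b"}                            # collaborative filtering
content_candidates = {"b", "c", "d", "x", "y", "z"}   # content matching


def precision_recall(candidates, relevant_items):
    """Precision and recall of a candidate set against the relevant items."""
    hits = len(candidates & relevant_items)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall


cf = precision_recall(cf_candidates, relevant)
content = precision_recall(content_candidates, relevant)
combined = precision_recall(cf_candidates | content_candidates, relevant)

print("collaborative filtering:", cf)   # (1.0, 0.5)
print("content matching:", content)     # (0.5, 0.75)
print("combined:", combined)
```

In this made-up case the combined candidate set reaches full recall at a precision between the two channels, leaving the later ranking stage to restore precision.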

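7.2 Trading off recall and precision in quality inspection

Negative inspection items: recall usually takes priority. Negative inspection means finding what is substandard or non-compliant. Put simply, in sales and customer-service quality inspection it means finding where a representative "said something they should not have". Negative inspection is a common requirement for companies, especially in tightly regulated fields.

Positive inspection items: precision usually takes priority. Positive inspection means awarding points to representatives for what they do in line with the standards. In recent years companies have paid more and more attention to positive inspection. Negative inspection only judges whether a representative has made a mistake, a "punish the bad" philosophy, whereas positive inspection can motivate representatives to become more professional and more compliant, a "reward the good" philosophy that is more conducive to a virtuous cycle.

References:

Model evaluation in sklearn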
