sklearn: auc and roc


2024-01-17 02:02 | Source: compiled from the web

sklearn.metrics.auc

Purpose: compute the AUC (Area Under the Curve) of a curve given as a set of (x, y) points, using the trapezoidal rule.
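As a rough illustration of the trapezoidal rule behind this function, here is a dependency-free sketch. The helper name `trapezoid_auc` is hypothetical and this is not scikit-learn's implementation; it assumes the x values are already sorted in non-decreasing order.

```python
def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through (x[i], y[i]).

    Hypothetical helper for illustration only; assumes x is sorted in
    non-decreasing order and that x and y have the same length (>= 2).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two points with matching lengths")
    area = 0.0
    for i in range(1, len(x)):
        width = x[i] - x[i - 1]
        mean_height = (y[i] + y[i - 1]) / 2.0
        area += width * mean_height  # area of one trapezoid
    return area


# ROC points for labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8]
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
area = trapezoid_auc(fpr, tpr)
print(area)  # 0.75
```

Vertical segments (repeated x values) contribute zero width, so only the horizontal extent of the curve adds area.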

sklearn.metrics.roc_curve

Purpose: compute the ROC (Receiver Operating Characteristic) curve. Note: this implementation is restricted to the binary classification task.

sklearn.metrics.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)

Parameters:

y_true : array, shape = [n_samples]
    True binary labels. If the labels are not either {-1, 1} or {0, 1}, then pos_label should be given explicitly.

y_score : array, shape = [n_samples]
    Target scores: probability estimates of the positive class, confidence values, or non-thresholded decision values.

pos_label : int or str, default=None
    The label treated as positive; all other labels are treated as negative.

Returns:

fpr : false positive rates

tpr : true positive rates

thresholds : array, shape = [n_thresholds]
    Decreasing thresholds on the score used to compute fpr and tpr.
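To make the three return values concrete, the following dependency-free sketch sweeps every distinct score as a threshold and records the resulting rates. The helper `roc_points` is hypothetical and is not how scikit-learn implements `roc_curve`: the real function also prepends an extra threshold for the (0, 0) point and, with `drop_intermediate=True`, may drop some intermediate points.

```python
def roc_points(y_true, y_score, pos_label=1):
    """Return (fpr, tpr, thresholds) by sweeping every distinct score.

    Hypothetical helper for illustration; a sample is predicted
    positive when its score is >= the threshold.
    """
    is_pos = [label == pos_label for label in y_true]
    n_pos = sum(is_pos)
    n_neg = len(is_pos) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    thresholds = sorted(set(y_score), reverse=True)
    fpr, tpr = [0.0], [0.0]  # start of the curve: nothing predicted positive
    for t in thresholds:
        tp = sum(1 for s, p in zip(y_score, is_pos) if s >= t and p)
        fp = sum(1 for s, p in zip(y_score, is_pos) if s >= t and not p)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr, thresholds


fpr, tpr, thresholds = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(fpr)         # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)         # [0.0, 0.5, 0.5, 1.0, 1.0]
print(thresholds)  # [0.8, 0.4, 0.35, 0.1]
```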

Example:

pos_label = 1 means that samples labeled 1 are positive and all other samples are negative, because roc_curve only handles binary classification.

    import numpy as np
    from sklearn import metrics

    y = np.array([1, 1, 2, 2, 3, 3])
    pred = np.array([0.1, 0.4, 0.35, 0.8, 0.1, 0.8])
    fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=1)
    metrics.auc(fpr, tpr)
    # 0.3125

sklearn.metrics.roc_auc_score

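Purpose: compute the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores. Note: in the version documented here, this implementation is restricted to the binary classification task or the multilabel classification task in label indicator format. scikit-learn 0.22 and later also accept multiclass targets through the multi_class parameter.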

sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None, max_fpr=None)

Parameters:

y_true : array, shape = [n_samples] or [n_samples, n_classes]

y_score : array, shape = [n_samples] or [n_samples, n_classes]

average : string, one of [None, 'micro', 'macro' (default), 'samples', 'weighted']
    If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

Returns:

auc : float
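As a hedged illustration of two of the `average` options described above, the sketch below combines per-class AUC values in the 'macro' way (unweighted mean) and the 'weighted' way (mean weighted by the number of true samples per class). The helper names and all the numbers are made up for the example.

```python
def macro_average(per_class_scores):
    """Unweighted mean of the per-class scores ('macro')."""
    return sum(per_class_scores) / len(per_class_scores)


def weighted_average(per_class_scores, support):
    """Mean weighted by each class's number of true samples ('weighted')."""
    total = sum(support)
    return sum(s * n for s, n in zip(per_class_scores, support)) / total


per_class_auc = [0.9, 0.6, 0.75]  # made-up per-class AUC values
support = [50, 10, 40]            # made-up count of true samples per class

macro = macro_average(per_class_auc)
weighted = weighted_average(per_class_auc, support)
print(round(macro, 4))     # 0.75
print(round(weighted, 4))  # 0.81
```

The rare class (support 10) pulls the macro value down more than the weighted one, which is why the two can differ noticeably on imbalanced data.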

roc_auc_score example:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])
    roc_auc_score(y_true, y_scores)
    # 0.75
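The 0.75 in the example above can be cross-checked with the ranking interpretation of ROC AUC: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The helper `pairwise_auc` below is a hypothetical, dependency-free sketch of that definition, not the library's algorithm.

```python
def pairwise_auc(y_true, y_score, pos_label=1):
    """Fraction of (positive, negative) pairs ranked correctly.

    Hypothetical helper for illustration; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == pos_label]
    neg = [s for y, s in zip(y_true, y_score) if y != pos_label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Data from the roc_auc_score example
binary_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(binary_auc)  # 0.75

# Data from the roc_curve example with pos_label=1
relabeled_auc = pairwise_auc([1, 1, 2, 2, 3, 3],
                             [0.1, 0.4, 0.35, 0.8, 0.1, 0.8])
print(relabeled_auc)  # 0.3125
```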

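roc_auc_score is the area under the ROC curve built from the prediction scores; internally it calls auc. In older scikit-learn releases the binary helper looked like this:

    def _binary_roc_auc_score(y_true, y_score, sample_weight=None):
        if len(np.unique(y_true)) != 2:
            raise ValueError("Only one class present in y_true. ROC AUC score "
                             "is not defined in that case.")

        fpr, tpr, tresholds = roc_curve(y_true, y_score,
                                        sample_weight=sample_weight)
        # The reorder argument was deprecated in 0.20 and removed in 0.22.
        return auc(fpr, tpr, reorder=True)

Because this helper requires exactly two classes in y_true, it cannot be applied directly to a multiclass problem.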

Example of computing AUC for a multiclass problem:

    import numpy as np
    import matplotlib.pyplot as plt
    from itertools import cycle

    from sklearn import svm, datasets
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import label_binarize
    from sklearn.multiclass import OneVsRestClassifier
    # scipy.interp is deprecated (it was an alias of numpy.interp),
    # so import the NumPy function directly.
    from numpy import interp

Load the data:

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

Binarize the labels (one-hot encoding):

    # Binarize the output
    y = label_binarize(y, classes=[0, 1, 2])
    n_classes = y.shape[1]
    n_classes
    # 3
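For readers unfamiliar with the indicator format that `label_binarize` produces, here is a small dependency-free sketch of the same idea for three or more classes. The helper `binarize_labels` is hypothetical; note that for exactly two classes scikit-learn returns a single column instead, which this sketch does not reproduce.

```python
def binarize_labels(y, classes):
    """One row per sample, one 0/1 column per class (one-vs-rest targets).

    Hypothetical helper for illustration only.
    """
    return [[1 if label == c else 0 for c in classes] for label in y]


indicator = binarize_labels([0, 2, 1, 2], classes=[0, 1, 2])
for row in indicator:
    print(row)
# [1, 0, 0]
# [0, 0, 1]
# [0, 1, 0]
# [0, 0, 1]
```

Each column is then a separate binary problem, which is what lets the binary-only `roc_curve` be applied class by class.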

Append noise features to each sample:

    # Add noisy features to make the problem harder
    random_state = np.random.RandomState(0)
    n_samples, n_features = X.shape
    X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

Note: np.c_ concatenates arrays column-wise, for example:

    np.c_[random_state.randn(2, 2), [[0, 0], [1, 1]]]
    # array([[ 0.73381936,  0.26909417,  0.        ,  0.        ],
    #        [ 1.07274021, -0.9826661 ,  1.        ,  1.        ]])

Split the dataset:

    # Shuffle and split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=.5, random_state=0)

Fit a classifier:

    # Learn to predict each class against the others
    classifier = OneVsRestClassifier(
        svm.SVC(kernel='linear', probability=True, random_state=random_state))
    y_score = classifier.fit(X_train, y_train).decision_function(X_test)

Note: decision_function(X) returns, for each sample and each class, a signed score proportional to the sample's distance from that class's decision boundary.
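These decision values are not probabilities, but that does not matter for ROC analysis, which depends only on how the scores rank the samples. The sketch below illustrates this with made-up margins: squashing them through a sigmoid (a strictly increasing function) leaves the AUC unchanged. `pairwise_auc` is a hypothetical helper, not a scikit-learn function.

```python
import math


def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Hypothetical helper; labels are 0/1 and ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


labels = [0, 1, 0, 1, 1]
margins = [-1.5, 0.3, 0.4, 2.0, -0.2]  # made-up decision values

auc_raw = pairwise_auc(labels, margins)
auc_squashed = pairwise_auc(labels, [sigmoid(m) for m in margins])
print(round(auc_raw, 4), round(auc_squashed, 4))  # 0.6667 0.6667
```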

Compute the ROC curve and AUC for each class:

    # Compute ROC curve and ROC area for each class
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for i in range(n_classes):
        # Column i holds the true labels and the predicted scores for class i
        fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Compute micro-average ROC curve and ROC area:
    # pool every (sample, class) entry into a single binary problem
    fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
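To show what flattening with `ravel()` does to the result, the following dependency-free sketch compares per-class AUC with the micro-average on a tiny made-up score matrix. `pairwise_auc` is a hypothetical helper that computes AUC as the fraction of correctly ranked positive/negative pairs; the numbers are invented for illustration.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Hypothetical helper; labels are 0/1 and ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up indicator labels and scores: 3 samples x 3 classes
y_test = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_score = [[0.9, 0.2, 0.1], [0.3, 0.4, 0.6], [0.2, 0.1, 0.8]]
n_classes = 3

# One binary problem per column
per_class = [
    pairwise_auc([row[i] for row in y_test], [row[i] for row in y_score])
    for i in range(n_classes)
]

# Micro-average: flatten both matrices into one long binary problem
flat_labels = [v for row in y_test for v in row]
flat_scores = [v for row in y_score for v in row]
micro = pairwise_auc(flat_labels, flat_scores)

print(per_class)        # [1.0, 1.0, 1.0]
print(round(micro, 4))  # 0.9444
```

Every class is ranked perfectly on its own, yet the micro-average is lower because pooling compares scores across classes: the 0.6 given to a wrong class outranks the 0.4 given to a correct one.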

Plot the ROC curve of one class:

    plt.rcParams['savefig.dpi'] = 300  # resolution of saved figures
    plt.rcParams['figure.dpi'] = 300   # resolution of displayed figures
    plt.figure()
    lw = 2  # line width
    plt.plot(fpr[2], tpr[2], color='darkorange',
             lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic example')
    plt.legend(loc="lower right")
    plt.show()


Compute the macro-average ROC curve and plot all curves:

    # Compute macro-average ROC curve and ROC area

    # First aggregate all false positive rates
    all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

    # Then interpolate all ROC curves at these points
    # (np.interp replaces the deprecated scipy.interp)
    mean_tpr = np.zeros_like(all_fpr)
    for i in range(n_classes):
        mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])

    # Finally average it and compute AUC
    mean_tpr /= n_classes

    fpr["macro"] = all_fpr
    tpr["macro"] = mean_tpr
    roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

    # Plot all ROC curves
    lw = 2  # line width
    plt.figure()
    plt.plot(fpr["micro"], tpr["micro"],
             label='micro-average ROC curve (area = {0:0.2f})'
                   ''.format(roc_auc["micro"]),
             color='deeppink', linestyle=':', linewidth=4)

    plt.plot(fpr["macro"], tpr["macro"],
             label='macro-average ROC curve (area = {0:0.2f})'
                   ''.format(roc_auc["macro"]),
             color='navy', linestyle=':', linewidth=4)

    colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=lw,
                 label='ROC curve of class {0} (area = {1:0.2f})'
                       ''.format(i, roc_auc[i]))

    plt.plot([0, 1], [0, 1], 'k--', lw=lw)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Some extension of Receiver operating characteristic to multi-class')
    plt.legend(loc="lower right")
    plt.show()
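The macro-averaging step can be followed by hand on two tiny made-up curves. The sketch below is dependency-free; `interp_linear` and `trapezoid_auc` are hypothetical stand-ins for the interpolation and area steps, and the interpolation assumes strictly increasing x values, which real ROC curves with vertical segments do not always satisfy.

```python
def interp_linear(x, xp, fp):
    """Linear interpolation of the points (xp, fp) at x.

    Hypothetical helper; assumes xp is strictly increasing and
    xp[0] <= x <= xp[-1].
    """
    for k in range(1, len(xp)):
        if x <= xp[k]:
            frac = (x - xp[k - 1]) / (xp[k] - xp[k - 1])
            return fp[k - 1] + frac * (fp[k] - fp[k - 1])
    return fp[-1]


def trapezoid_auc(x, y):
    """Trapezoidal area under the points (x, y), x non-decreasing."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0
               for i in range(1, len(x)))


# Two made-up ROC curves given as (fpr, tpr) point lists
curves = {
    "a": ([0.0, 0.5, 1.0], [0.0, 1.0, 1.0]),
    "b": ([0.0, 1.0], [0.0, 1.0]),
}

# 1. Pool every FPR value that appears in any curve
all_fpr = sorted({v for fpr, _ in curves.values() for v in fpr})

# 2. Interpolate each curve on that grid and average the TPR values
mean_tpr = [
    sum(interp_linear(x, fpr, tpr) for fpr, tpr in curves.values())
    / len(curves)
    for x in all_fpr
]

# 3. Area under the averaged curve
macro_auc = trapezoid_auc(all_fpr, mean_tpr)

print(all_fpr)    # [0.0, 0.5, 1.0]
print(mean_tpr)   # [0.0, 0.75, 1.0]
print(macro_auc)  # 0.625
```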
