Feature selection with the SelectKBest() function


2024-07-11 04:03 | Source: compiled from the web

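SelectKBest() selects the K best features and returns the data restricted to those features. It does not score features itself: the scores come from the function passed as its first argument. Three scoring methods are tried here:

(1) Correlation coefficient. Compute the correlation coefficient between each feature and the target, together with the p-value of that coefficient, then filter the features by a threshold. With SelectKBest the cut is the K highest scores rather than a fixed threshold.

(2) Chi-squared test. The classical chi-squared test checks for association between a categorical feature and a categorical target.

(3) Maximal information coefficient (MIC). MIC is built on mutual information theory. Classical mutual information is likewise a way to assess the association between a categorical feature and a categorical target.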
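The threshold step of the correlation coefficient method can be sketched without SelectKBest. The cut-offs `r_min = 0.9` and `p_max = 0.05` below are illustrative assumptions, not values from the original article, and the class label is treated as a number, as the article's own code does.

```python
from scipy.stats import pearsonr
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Hypothetical thresholds, chosen only for illustration.
r_min, p_max = 0.9, 0.05

kept = []
for j in range(X.shape[1]):
    # pearsonr returns the correlation coefficient and its p-value.
    r, p = pearsonr(X[:, j], y)
    print(f"{iris.feature_names[j]}: r={r:.3f}, p={p:.3g}")
    # Keep features that are strongly (positively or negatively) correlated
    # with the target and whose correlation is statistically significant.
    if abs(r) >= r_min and p <= p_max:
        kept.append(j)

X_selected = X[:, kept]
print("kept column indices:", kept)
```

Unlike SelectKBest, a threshold does not fix the number of features in advance: a stricter `r_min` may keep none, a looser one may keep all four.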

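import numpy as np
from minepy import MINE
from scipy.stats import pearsonr
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
print("iris.data:\n", iris.data)
print("iris.target:\n", iris.target)

# (1) Correlation coefficient method.
# Compute the correlation coefficient (and its p-value) between each feature
# and the target, then rank the features by that score.
# The first argument of SelectKBest is the scoring function. It takes the
# feature matrix X and the target vector y and returns either an array of
# scores (item i is the score of feature i) or a (scores, pvalues) pair.
# The parameter k is the number of features to keep.
# corr, p = pearsonr(x, y):
# The Pearson correlation coefficient measures the linear correlation between
# two variables. It lies between -1 and 1: above 0 is positive correlation,
# below 0 is negative correlation.
# e.g. PearsonRResult(statistic=-0.8233869695926184, pvalue=0.17661303040738163)
def pearson_scores(X, y):
    # SelectKBest keeps the highest scores, so rank by absolute value;
    # otherwise strongly negatively correlated features would rank last.
    return np.array([abs(pearsonr(x, y)[0]) for x in X.T])

data_cor = SelectKBest(pearson_scores, k=2).fit_transform(iris.data, iris.target)
print("data_cor:\n", data_cor)

###################################################################################
# (2) Chi-squared test.
# The classical chi-squared test checks for association between a categorical
# feature and a categorical target.
# Core idea: measure how far the observed values deviate from the expected ones.
# That deviation determines the chi-squared statistic: the smaller the
# statistic, the closer the observations are to the expected values. A larger
# statistic therefore means stronger dependence, and such features rank higher.
# sklearn's chi2 requires non-negative feature values.
data_chi2 = SelectKBest(chi2, k=2).fit_transform(iris.data, iris.target)
print("data_chi2:\n", data_chi2)

####################################################################################
# (3) Maximal information coefficient (MIC).
# MIC is built on mutual information theory. Classical mutual information is
# likewise a way to assess the association between a categorical feature and a
# categorical target.
# The mutual information of two random variables measures their mutual
# dependence, i.e. the information they share. Equivalently, it is the
# reduction in the uncertainty (entropy) of y obtained by knowing x.
# MINE does not have a functional interface, so mic() wraps it as a function.
# It returns a pair whose second item is a fixed placeholder p-value of 0.5.
def mic(x, y):
    m = MINE()
    m.compute_score(x, y)
    # m.mic() returns the Maximal Information Coefficient (MIC or MIC_e).
    return (m.mic(), 0.5)

def mic_scores(X, y):
    # Only the score is used; the placeholder p-value is dropped.
    return np.array([mic(x, y)[0] for x in X.T])

data_mic = SelectKBest(mic_scores, k=2).fit_transform(iris.data, iris.target)
print("data_mic:\n", data_mic)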
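If minepy is not available, the classical mutual information mentioned above can be tried instead. The sketch below swaps in scikit-learn's `mutual_info_classif`, which is a different estimator from MIC, so its scores are not comparable with `m.mic()`. The name `data_mi` and the choice `random_state=0` are assumptions made for this example.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

iris = load_iris()

# mutual_info_classif uses a randomized nearest-neighbour estimate for
# continuous features; fixing random_state makes the scores reproducible.
score_func = partial(mutual_info_classif, random_state=0)

# A score function may return scores only, without p-values.
selector = SelectKBest(score_func, k=2)
data_mi = selector.fit_transform(iris.data, iris.target)

print("MI scores:", selector.scores_)
print("data_mi shape:", data_mi.shape)
```

Because this scorer returns a single array, no placeholder p-value is needed.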

Output:

iris.data:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 ...
 [6.5 3.  5.2 2. ]
 [6.2 3.4 5.4 2.3]
 [5.9 3.  5.1 1.8]]
iris.target:
 [0 0 0 ... 0 1 1 1 ... 1 2 2 2 ... 2]
data_cor:
 [[1.4 0.2]
 [1.4 0.2]
 [1.3 0.2]
 ...
 [5.2 2. ]
 [5.4 2.3]
 [5.1 1.8]]
data_chi2:
 [[1.4 0.2]
 [1.4 0.2]
 [1.3 0.2]
 ...
 [5.2 2. ]
 [5.4 2.3]
 [5.1 1.8]]
data_mic:
 [[1.4 0.2]
 [1.4 0.2]
 [1.3 0.2]
 ...
 [5.2 2. ]
 [5.4 2.3]
 [5.1 1.8]]

(Rows are elided with "..." for brevity: iris.data has 150 rows, and iris.target has 50 samples of each class. data_cor, data_chi2 and data_mic are identical. All three methods keep the last two columns of iris.data, petal length and petal width.)

Process finished with exit code 0
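fit_transform returns only the selected columns, so the output alone does not say which features were kept or how they scored. A minimal sketch of how to inspect that, using the chi-squared scorer and the fitted selector's `scores_` attribute and `get_support` method (the variable names are chosen for this example):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
selector = SelectKBest(chi2, k=2).fit(iris.data, iris.target)

# scores_ holds one score per input feature; higher scores rank first.
for name, score in zip(iris.feature_names, selector.scores_):
    print(f"{name}: {score:.2f}")

# get_support(indices=True) returns the column indices that were kept.
selected_idx = selector.get_support(indices=True)
selected_names = [iris.feature_names[i] for i in selected_idx]
print("selected:", selected_names)
```

The same two calls work for a selector fitted with any other scoring function.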


