Machine Learning: Worked Examples of Multiple Linear Regression (a Simple Case + Judging Red Wine Quality)
This catch-up post is still being updated 🤟 and the updates will keep coming 🙌
Let's start with a simple example 🍯

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    X_train = [[6], [8], [10], [14], [18]]
    y_train = [[7], [9], [13], [17.5], [18]]
    X_test = [[6], [8], [11], [16]]
    y_test = [[8], [12], [15], [18]]

    # No preprocessing at all: a plain single-variable linear model
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)  # fit directly on the raw feature

    # np.linspace(0, 26, 100) takes 100 evenly spaced points from 0 to 26
    xx = np.linspace(0, 26, 100)
    yy = regressor.predict(xx.reshape(xx.shape[0], 1))
    plt.plot(xx, yy)  # the fitted model is a straight line

    # The enhanced version: add degree-2 polynomial features
    quadratic_featurizer = PolynomialFeatures(degree=2)
    X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
    # Important: X_train gets fit_transform, and X_test must also be
    # transformed (transform only, never fit again)
    X_test_quadratic = quadratic_featurizer.transform(X_test)

    regressor_quadratic = LinearRegression()
    regressor_quadratic.fit(X_train_quadratic, y_train)
    # Reshape xx into a column so it has the 2D shape the featurizer expects
    xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))

    plt.plot(xx, regressor_quadratic.predict(xx_quadratic), c='r', linestyle='--')
    plt.title('Pizza price regressed on diameter')
    plt.xlabel('Diameter in inches')
    plt.ylabel('Price in dollars')
    plt.axis([0, 25, 0, 25])
    plt.grid(True)
    plt.scatter(X_train, y_train)  # scatter plot of the training points (the blue dots)
    plt.show()

    print(X_train)
    print(X_train_quadratic)
    print(X_test)
    print(X_test_quadratic)
    print('Simple linear regression r-squared', regressor.score(X_test, y_test))
    print('Quadratic regression r-squared', regressor_quadratic.score(X_test_quadratic, y_test))
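To make the fit_transform / transform step in the pizza example more concrete, here is a minimal sketch of what PolynomialFeatures(degree=2) produces for a single input column. It reuses the same diameters; the variable name featurizer is just shorthand chosen for this illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Same training diameters as in the pizza example.
X_train = np.array([[6], [8], [10], [14], [18]])

featurizer = PolynomialFeatures(degree=2)
X_train_quadratic = featurizer.fit_transform(X_train)

# Each row becomes [1, x, x^2]: a bias column, the original feature, its square.
print(X_train_quadratic)

# New data goes through transform() only, reusing what was fitted above.
X_test = np.array([[6], [8], [11], [16]])
X_test_quadratic = featurizer.transform(X_test)
print(X_test_quadratic)
```

The linear model fitted on these three columns is still linear in its coefficients, which is why an ordinary LinearRegression can draw a curve.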
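Now for the red wine quality problem. The dataset came from my teacher's class, so I don't have a copy to share either, but the point is to learn the process and then apply it to a dataset you already have 🥪

Dataset path for this project: /data/shixunfiles/ddca0fd2b1025671866a9344eca4dac7_1633960156078.csv

Reading the data:

    # Jupyter magic command (only works inside a notebook)
    %matplotlib inline
    import numpy as np
    import pandas as pd

    # Read the data
    filename = '/data/shixunfiles/ddca0fd2b1025671866a9344eca4dac7_1633960156078.csv'
    df = pd.read_csv(filename)  # no need to pass sep=";" for this file
    df.describe()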
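The note about sep=";" matters because pd.read_csv assumes commas by default. The sketch below uses two tiny in-memory samples to show the difference; the sample rows and the column names alcohol and quality are made up for illustration and are not taken from the real file, whose contents are not shown in this post.

```python
import io
import pandas as pd

# Hypothetical two-row samples, one comma-separated and one semicolon-separated.
comma_text = "alcohol,quality\n9.4,5\n9.8,6\n"
semicolon_text = "alcohol;quality\n9.4;5\n9.8;6\n"

df_comma = pd.read_csv(io.StringIO(comma_text))               # default sep=","
df_wrong = pd.read_csv(io.StringIO(semicolon_text))           # wrong separator
df_right = pd.read_csv(io.StringIO(semicolon_text), sep=";")  # correct separator

print(df_comma.shape)  # two columns
print(df_wrong.shape)  # everything collapsed into one column
print(df_right.shape)  # two columns again
```

If df.describe() on your own file shows far fewer columns than expected, checking the separator is a reasonable first step.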
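Output (the column names): ['固定酸', '挥发性酸', '柠檬酸', '糖分', '氯化物', '游离二氧化硫', '总二氧化硫', '密度', 'pH值', '硫酸盐', '酒精', '质量等级']

In English these are: fixed acidity, volatile acidity, citric acid, sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, quality grade.

❗ One very important point: when you get a dataset like this with many variables, be clear about what X and y actually are. Here X is every column except the last (the features) and y is the last column, the quality grade (the target). Concretely:

    # ========= begin ========
    X = df[list(df.columns)[:-1]]  # all columns except the last: the features
    y = df[list(df.columns)[-1]]   # the last column: the quality grade
    print(X)
    print(y)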
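As a small illustration of the X / y split, the sketch below applies the same slicing to a made-up three-row table and compares it with the positional iloc form. The table and the names feature_a, feature_b and quality are placeholders for this example, not the real dataset.

```python
import pandas as pd

# Hypothetical miniature table; the last column plays the role of the target.
df_demo = pd.DataFrame({
    "feature_a": [7.4, 7.8, 11.2],
    "feature_b": [0.70, 0.88, 0.28],
    "quality": [5, 5, 6],
})

# Same slicing as in the post: every column but the last, then the last one.
X_demo = df_demo[list(df_demo.columns)[:-1]]
y_demo = df_demo[list(df_demo.columns)[-1]]

# Equivalent positional form with iloc.
X_alt = df_demo.iloc[:, :-1]
y_alt = df_demo.iloc[:, -1]

print(X_demo.shape, y_demo.shape)
```

X stays two-dimensional (a DataFrame) while y is one-dimensional (a Series), which is the shape scikit-learn estimators expect for a single target.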
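Since we want a simple model, the natural first choice is plain linear regression. Fit it and get the score (for a regressor, score is the R² on the data you pass in):

    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # Hold out 20% of the rows as a test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = LinearRegression()
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    print(score)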
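To show what that score means, here is a rough sketch that computes R² by hand and compares it with model.score. It runs on synthetic data generated just for this example (the coefficients and noise level are arbitrary), and it fixes random_state so the split is repeatable, which the code in this post does not do.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 3 features, a linear target plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
model_demo = LinearRegression().fit(X_tr, y_tr)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
y_pred = model_demo.predict(X_te)
ss_res = np.sum((y_te - y_pred) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, model_demo.score(X_te, y_te))
```

Without random_state, every run draws a different split, so the printed score will move around a little from run to run.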
The last step is to find a suitable model and preprocessing method, and this part you have to explore yourself.

Hints:
- for preprocessing, use PolynomialFeatures and/or StandardScaler;
- for the model, choose among LinearRegression, Ridge, Lasso, SGDRegressor and so on;
- a pipeline is recommended to keep things tidy;
- use cross_val_score together with ShuffleSplit for cross-validation.

This part is for you to play with. Here is my test code. It is incomplete because I kept trimming it along the way, so treat it only as a reference.

    # ========= begin ========
    # Import the libraries we need
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import Lasso
    from sklearn.linear_model import Ridge
    from sklearn.linear_model import SGDRegressor
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import ShuffleSplit
    from sklearn.model_selection import train_test_split
    import numpy as np

    # Preprocessing function: build and return the model pipeline
    def polynomial_model(degree=1):
        polynomial_features = PolynomialFeatures(degree=degree)
        ridge = Ridge()
        pipeline = Pipeline([("polynomial_features", polynomial_features),
                             ("ridge", ridge)])
        return pipeline

    # Start with something simple
    # Cross-validation: 5-fold on the training split here (the ShuffleSplit
    # variant from the hints would be 10 random shuffles with an 80/20 split)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = polynomial_model(degree=2)
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(scores)
    scores_mean = np.mean(scores)
    scores_std = np.std(scores)
    print(scores_mean)
    print(scores_std)

    # Compare train and held-out scores for degrees 1 to 4
    # for i in range(1, 5):
    #     pip = polynomial_model(i)
    #     pip.fit(X_train, y_train)
    #     train_score = pip.score(X_train, y_train)
    #     cv_score = pip.score(X_test, y_test)
    #     print('train_score: {0:0.6f}; cv_score: {1:.6f}'.format(train_score, cv_score))
    #     print("********************************")
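Putting the hints together, one possible setup looks like the sketch below: a pipeline with StandardScaler, PolynomialFeatures and Ridge, scored with cross_val_score over a ShuffleSplit of 10 random 80/20 splits. This is only an illustration, not the tuned answer for the wine data. It runs on synthetic data, and the helper name make_pipeline_demo, the alpha value and the degrees tried are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data with one squared term, so degree 2 should help.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo[:, 0] ** 2 + 2.0 * X_demo[:, 1] + rng.normal(scale=0.3, size=300)

def make_pipeline_demo(degree=2, alpha=1.0):
    """Hypothetical helper: scale, expand to polynomial features, fit Ridge."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("polynomial_features", PolynomialFeatures(degree=degree)),
        ("ridge", Ridge(alpha=alpha)),
    ])

# 10 random shuffles, each holding out 20% of the rows for validation.
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

results = {}
for degree in (1, 2, 3):
    scores = cross_val_score(make_pipeline_demo(degree), X_demo, y_demo, cv=cv)
    results[degree] = scores
    print(degree, scores.mean(), scores.std())
```

Because the scaler and the polynomial expansion live inside the pipeline, they are refitted on the training part of every split, so no information from the validation rows leaks into preprocessing. Swapping Ridge for Lasso or SGDRegressor only means changing the last pipeline step.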