42 Regularization and Penalized Regression



42.1 Introduction

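This chapter uses the Hitters dataset to demonstrate linear regression, variable selection in regression, ridge regression, lasso regression, and hyperparameter tuning.

Consider the Hitters dataset from the ISLR package. It contains 20 variables measured on 322 players; the variable of interest is Salary. The variables are: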

library(tidyverse)
library(ISLR) # package accompanying the reference book
data(Hitters)
names(Hitters)
## [1] "AtBat" "Hits" "HmRun" "Runs" "RBI" "Walks" "Years" "CAtBat" "CHits" "CHmRun" "CRuns" "CRBI" "CWalks" "League" "Division" "PutOuts" "Assists" "Errors" "Salary" "NewLeague"

The variables in detail:

glimpse(Hitters)
## Rows: 322
## Columns: 20
## $ AtBat     <int> 293, 315, 479, 496, 321, 594, 185, 298, 323, 401, …
## $ Hits      <int> 66, 81, 130, 141, 87, 169, 37, 73, 81, 92, …
## $ HmRun     <int> 1, 7, 18, 20, 10, 4, 1, 0, 6, 17, …
## $ Runs      <int> 30, 24, 66, 65, 39, 74, 23, 24, 26, 49, …
## $ RBI       <int> 29, 38, 72, 78, 42, 51, 8, 24, 32, 66, …
## $ Walks     <int> 14, 39, 76, 37, 30, 35, 21, 7, 8, 65, …
## $ Years     <int> 1, 14, 3, 11, 2, 11, 2, 3, 2, 13, …
## $ CAtBat    <int> 293, 3449, 1624, 5628, 396, 4408, 214, 509, 341, 5206, …
## $ CHits     <int> 66, 835, 457, 1575, 101, 1133, 42, 108, 86, 1332, …
## $ CHmRun    <int> 1, 69, 63, 225, 12, 19, 1, 0, 6, 253, …
## $ CRuns     <int> 30, 321, 224, 828, 48, 501, 30, 41, 32, 784, …
## $ CRBI      <int> 29, 414, 266, 838, 46, 336, 9, 37, 34, 890, …
## $ CWalks    <int> 14, 375, 263, 354, 33, 194, 24, 12, 8, 866, …
## $ League    <fct> A, N, A, N, N, A, N, A, N, A, …
## $ Division  <fct> E, W, W, E, E, W, E, W, W, E, …
## $ PutOuts   <int> 446, 632, 880, 200, 805, 282, 76, 121, 143, 0, …
## $ Assists   <int> 33, 43, 82, 11, 40, 421, 127, 283, 290, 0, …
## $ Errors    <int> 20, 10, 14, 3, 4, 25, 7, 9, 19, 0, …
## $ Salary    <dbl> NA, 475.000, 480.000, 500.000, 91.500, 750.000, 70.000, 100.000, 75.000, 1100.000, …
## $ NewLeague <fct> A, N, A, N, N, A, A, A, N, A, …

We want to use Salary as the response variable, so we first count its missing values:

sum(is.na(Hitters$Salary))
## [1] 59
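The same kind of check can be run on every column at once, which shows whether the missing values are confined to a single variable. The sketch below uses a small made-up data frame, `toy` (hypothetical values, not taken from Hitters), so that it runs without the ISLR package. Replacing `toy` with `Hitters` should give the corresponding counts for the real data.

```r
# Toy stand-in for Hitters; the values are hypothetical and chosen only
# to contain a few NAs.
toy <- data.frame(
  Hits   = c(66, 81, NA, 141),
  Salary = c(NA, 475, 480, NA)
)

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the TRUE entries (missing values) per column.
na_per_col <- colSums(is.na(toy))
na_per_col

# complete.cases() flags rows with no missing value in any column.
complete_rows <- sum(complete.cases(toy))
complete_rows
```

Counting complete rows beforehand indicates how many observations would remain once incomplete rows are dropped.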

For simplicity, drop the observations that have missing values:

da_hit
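The statement above is cut off in this copy, so the exact call the author used is not known. One common way to drop incomplete observations in base R is `na.omit()`. The sketch below shows that approach on a small made-up data frame, `toy` (hypothetical values). Using `na.omit()` for `da_hit` is an assumption here, not the confirmed original code.

```r
# Toy stand-in for Hitters; the values are hypothetical.
toy <- data.frame(
  Hits   = c(66, 81, 130, 141),
  Salary = c(NA, 475, 480, NA)
)

# na.omit() removes every row that contains at least one NA.
toy_complete <- na.omit(toy)

n_before <- nrow(toy)          # 4 rows in the toy data
n_after  <- nrow(toy_complete) # 2 rows remain after dropping NAs

# Assumed analogue for the real data (the original line is truncated):
# da_hit <- na.omit(Hitters)
```

Since Salary has 59 missing values among 322 rows, such a call on Hitters would leave at most 263 observations.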

