Scraping the Last Three Years of Admission Score Lines for 2,822 Chinese Universities in Every Province with Python

2024-01-22 08:26 | Source: compiled from the web

Data update: the scraped data for 2022, 2021 and 2020 is available here.
Link: https://pan.baidu.com/s/1UrYmrE5chYuJ6VeJCLbdzA
Extraction code: ozu5

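The national college entrance exam (gaokao) has just finished and candidates are waiting for their scores. I had wanted for a while to scrape information on every university to help candidates choose, so I finally wrote the code. It scrapes the score lines in each province for 2,822 institutions nationwide, covering both undergraduate universities and higher vocational colleges.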

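The figure below shows the admission scores of each university in Hubei province over the last three years, sorted by ShanghaiRanking (软科) position.

Full data download:
Link: https://pan.baidu.com/s/1uohDZQk2SPSjI0htZBJd1g
Extraction code: z1db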

[Figure: universities' admission scores over the last three years]
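The ranking-based ordering can be reproduced on any of the per-province CSV files. The following is a minimal sketch, assuming the file has the `软科排名` column that the script writes as a header and that unranked schools have a blank cell there. The names `rank_key` and `sort_by_ruanke` are hypothetical, and the sample rows are made up purely for illustration.

```python
import csv
import io


def rank_key(row):
    """Sort key: ranked schools first (ascending rank), unranked schools last."""
    rank = (row.get("软科排名") or "").strip()
    return (0, int(rank)) if rank.isdigit() else (1, 0)


def sort_by_ruanke(rows):
    """Return the rows ordered by ShanghaiRanking position."""
    return sorted(rows, key=rank_key)


# Made-up sample standing in for a province file such as 湖北.csv.
sample = io.StringIO(
    "名称,软科排名,2022分数线\n"
    "School A,35,\n"
    "School B,,\n"
    "School C,4,\n"
)
rows = sort_by_ruanke(csv.DictReader(sample))
print([r["名称"] for r in rows])  # ['School C', 'School A', 'School B']
```

For a real file, replace the `io.StringIO` object with `open(path, encoding="utf-8-sig", newline="")`, since the script writes the header row with a UTF-8 byte order mark.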

A blank cell in the score columns means that the school does not recruit in that province.
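That rule makes it easy to narrow a province file down to the schools that actually recruit there. Here is a minimal sketch, assuming the three score columns are named `2022分数线`, `2021分数线` and `2020分数线` as in the script's header row. The helper `recruits_in_province` is a hypothetical name and the sample rows, including the score value, are made up.

```python
SCORE_COLUMNS = ("2022分数线", "2021分数线", "2020分数线")


def recruits_in_province(row):
    """True if at least one score column holds something other than whitespace."""
    return any((row.get(col) or "").strip() for col in SCORE_COLUMNS)


# Made-up rows; the second one has only blank or whitespace-only score cells.
rows = [
    {"名称": "School A", "2022分数线": " 物理类:500 ", "2021分数线": "", "2020分数线": ""},
    {"名称": "School B", "2022分数线": "", "2021分数线": " ", "2020分数线": ""},
]
recruiting = [r["名称"] for r in rows if recruits_in_province(r)]
print(recruiting)  # ['School A']
```

The values are stripped before testing because the script pads each score string with spaces, so a cell can contain a single space and still count as blank.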

Part of the code follows. It has not been optimized… (code updated)

import csv
import json
import random

import requests

# Output directory for the per-province CSV files (path from the original script).
OUTPUT_DIR = 'D:/PYTHON_CODE/高校分数线/'


def save_data(s, data):
    """Append one row to the CSV file of province s."""
    with open(OUTPUT_DIR + s + '.csv', encoding='UTF-8', mode='a+', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerow(data)


# Pool of mobile user-agent headers; one is picked at random for each request.
headers_list = [
    {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 10; SM-G981B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (iPad; CPU OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.109 Safari/537.36 CrKey/1.54.248666'},
    {'user-agent': 'Mozilla/5.0 (X11; Linux aarch64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.188 Safari/537.36 CrKey/1.54.250320'},
    {'user-agent': 'Mozilla/5.0 (BB10; Touch) AppleWebKit/537.10+ (KHTML, like Gecko) Version/10.0.9.2372 Mobile Safari/537.10+'},
    {'user-agent': 'Mozilla/5.0 (PlayBook; U; RIM Tablet OS 2.1.0; en-US) AppleWebKit/536.2+ (KHTML like Gecko) Version/7.2.1.0 Safari/536.2+'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.3; en-us; SM-N900T Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.1; en-us; GT-N7100 Build/JRO03C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.0; en-us; GT-I9300 Build/IMM76D) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 7.0; SM-G950U Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G965U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.111 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.1.0; SM-T837A) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.80 Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; en-us; KFAPWI Build/JDQ39) AppleWebKit/535.19 (KHTML, like Gecko) Silk/3.13 Safari/535.19 Silk-Accelerated=true'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.4.2; en-us; LGMS323 Build/KOT49I.MS32310c) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 550) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Mobile Safari/537.36 Edge/14.14263'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Moto G (4)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 10 Build/MOB31T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; Nexus 5X Build/OPR4.170623.006) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 7.1.1; Nexus 6 Build/N6F26U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; Nexus 6P Build/OPP3.170518.006) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 7 Build/MOB30X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 520)'},
    {'user-agent': 'Mozilla/5.0 (MeeGo; NokiaN9) AppleWebKit/534.13 (KHTML, like Gecko) NokiaBrowser/8.5.0 Mobile Safari/534.13'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 9; Pixel 3 Build/PQ1A.181105.017.A1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.158 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 10; Pixel 4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 11; Pixel 3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.181 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; Pixel 2 XL Build/OPD1.170816.004) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1'},
    {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1'},
    {'user-agent': 'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1'},
]


def get_url(url):
    """Fetch url and return the response body, or '' if every attempt fails."""
    # The first attempt uses a 1-second timeout; the four retries allow 20 seconds each.
    for timeout in (1, 20, 20, 20, 20):
        try:
            response = requests.get(url, headers=random.choice(headers_list), timeout=timeout)
        except requests.RequestException:
            continue  # network error or timeout: try again
        if response.status_code == 200:
            return response.text
    return ''


# Score-category codes found in pro_type_min and the labels written to the CSV:
# physics track, history track, science, liberal arts, comprehensive.
type_labels = {'2073': '物理类', '2074': '历史类', '1': '理科', '2': '文科', '3': '综合类'}


def format_scores(types):
    """Turn one year's {category code: score} dict into a labelled string."""
    s = ' '
    for j in types.keys():
        if j in type_labels:
            s = s + type_labels[j] + ':' + types[j] + ' '
    return s


print("######### Copyright: 殷宗敏 & data API source: https://www.gaokao.cn/school/search & many thanks! ##########")

# List of all schools (id and name).
url = 'https://static-data.gaokao.cn/www/2.0/school/name.json'
html = requests.get(url).text
unicodestr = json.loads(html)  # parse the JSON string into a dict
dat = unicodestr["data"]

# Province codes ("name") and province names ("value"), as in the original script.
province_id = [{"name":11,"value":"北京"},{"name":12,"value":"天津"},{"name":13,"value":"河北"},{"name":14,"value":"山西"},{"name":15,"value":"内蒙古"},{"name":21,"value":"辽宁"},{"name":22,"value":"吉林"},{"name":23,"value":"黑龙江"},{"name":31,"value":"上海"},{"name":32,"value":"江苏"},{"name":33,"value":"浙江"},{"name":34,"value":"安徽"},{"name":35,"value":"福建"},{"name":36,"value":"江西"},{"name":37,"value":"山东"},{"name":41,"value":"河南"},{"name":42,"value":"湖北"},{"name":43,"value":"湖南"},{"name":44,"value":"广东"},{"name":45,"value":"广西"},{"name":46,"value":"海南"},{"name":50,"value":"重庆"},{"name":51,"value":"四川"},{"name":52,"value":"贵州"},{"name":53,"value":"云南"},{"name":54,"value":"西藏"},{"name":61,"value":"陕西"},{"name":62,"value":"甘肃"},{"name":63,"value":"青海"},{"name":64,"value":"宁夏"},{"name":65,"value":"新疆"}]

for l in province_id:
    # Columns: name, province, city, district, address, introduction, 985, 211,
    # Ruanke rank, school type, school nature, featured majors, 2022/2021/2020 score lines.
    header = ['名称', '省', '市', '区', '地址', '介绍', '985', '211', '软科排名', '学校类型', '学校属性', '特色专业', "2022分数线", "2021分数线", "2020分数线"]
    # Create (or overwrite) the province file and write the header row.
    with open(OUTPUT_DIR + l["value"] + '.csv', encoding='utf-8-sig', mode='w', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerow(header)
    # Note: every school's info.json is downloaded again for each province.
    for i in dat:
        schoolid = i['school_id']
        schoolname = i['name']
        url1 = 'https://static-data.gaokao.cn/www/2.0/school/' + str(schoolid) + '/info.json'
        print("Downloading " + schoolname)
        html1 = get_url(url1)
        if not html1:
            continue  # all attempts failed; skip this school
        unicodestr1 = json.loads(html1)  # parse the JSON string into a dict
        if len(unicodestr1) != 0:
            dat1 = unicodestr1["data"]
            name = dat1["name"]
            content = dat1["content"]
            f985 = "是" if dat1["f985"] == "1" else "否"  # yes / no
            f211 = "是" if dat1["f211"] == "1" else "否"  # yes / no
            ruanke_rank = dat1["ruanke_rank"]
            if ruanke_rank == '0':
                ruanke_rank = ''  # '0' means the school is unranked
            type_name = dat1["type_name"]
            school_nature_name = dat1["school_nature_name"]
            province_name = dat1["province_name"]
            city_name = dat1["city_name"]
            town_name = dat1["town_name"]
            address = dat1["address"]
            # csv.writer stores this list as its string representation.
            special = [j["special_name"] for j in dat1["special"]]
            pro_type_min = dat1["pro_type_min"]
            # Score strings per year; a year stays blank if there is no entry for this province.
            fen = {2022: '', 2021: '', 2020: ''}
            for k in pro_type_min.keys():
                if int(k) == l["name"]:
                    for m in pro_type_min[k]:
                        # The original stored every year other than 2022 and 2021 as 2020;
                        # other years are ignored here.
                        if m['year'] in fen:
                            fen[m['year']] = format_scores(m['type'])
            tap = (name, province_name, city_name, town_name, address, content, f985, f211, ruanke_rank, type_name, school_nature_name, special, fen[2022], fen[2021], fen[2020])
            save_data(l["value"], tap)
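The score lookup is the least obvious step, so here it is in isolation. This is a minimal sketch that assumes `pro_type_min` has the shape the script implies: a dict keyed by province code (as a string), each value a list of entries with an integer `year` and a `type` dict mapping category codes to score strings. The function name `scores_by_year` is hypothetical, and the sample payload uses made-up scores rather than real data.

```python
# Category codes handled by the script and the labels it writes:
# physics track, history track, science, liberal arts, comprehensive.
TYPE_LABELS = {"2073": "物理类", "2074": "历史类", "1": "理科", "2": "文科", "3": "综合类"}


def scores_by_year(pro_type_min, province_code):
    """Map each year to a labelled score string for one province."""
    result = {}
    for entry in pro_type_min.get(str(province_code), []):
        parts = [
            f"{TYPE_LABELS[code]}:{score}"
            for code, score in entry["type"].items()
            if code in TYPE_LABELS  # skip category codes the script does not label
        ]
        result[entry["year"]] = " ".join(parts)
    return result


# Made-up payload in the assumed shape; 42 is the code for Hubei.
sample = {
    "42": [
        {"year": 2022, "type": {"2073": "520", "2074": "530"}},
        {"year": 2021, "type": {"1": "510"}},
    ]
}
print(scores_by_year(sample, 42))  # {2022: '物理类:520 历史类:530', 2021: '理科:510'}
print(scores_by_year(sample, 11))  # {} (no entry for this province)
```

Looking the province up directly with `dict.get` avoids scanning every key, and an empty result corresponds to the blank score cells in the CSV.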

