Scraping All of an NBA Team's Game Records with Python


Contents: Summary · 1. First, analyze the URL · 2. Getting every NBA team's Team_id and Chinese name · 3. Scraping every NBA team's detailed 2018-2019 season game records · 4. Afterword

Summary

Goal: scrape every NBA team's complete game records for the 2018-2019 season from the stat-nba website.

Tools: Python, the requests library to fetch pages, the lxml library to parse HTML, and XPath to match the data.

1. First, analyze the URL

Lakers game-record page URL:

http://www.stat-nba.com/query_team.php?crtcol=date_out&order=0&QueryType=game&GameType=season&Team_id=LAL&PageNum=1000&Season0=2018&Season1=2019

Warriors game-record page URL:

http://www.stat-nba.com/query_team.php?crtcol=date_out&order=0&QueryType=game&GameType=season&Team_id=GSW&PageNum=1000&Season0=2018&Season1=2019

Hawks game-record page URL:

http://www.stat-nba.com/query_team.php?crtcol=date_out&order=0&QueryType=game&GameType=season&Team_id=ATL&PageNum=1000&Season0=2018&Season1=2019

Comparing the three URLs, only the Team_id parameter changes: plugging in a different Team_id takes you straight to the corresponding team's game records.
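To make the pattern concrete, here is a minimal sketch that rebuilds the same URL from a parameter dict and lets requests assemble the query string; the parameter names and values come straight from the URLs above, while the `build_params` helper and its defaults are just for illustration:

```python
import requests

# A minimal sketch (not from the original post): rebuild the game-record URL
# from a parameter dict. The parameter names come from the URLs shown above.
BASE_URL = "http://www.stat-nba.com/query_team.php"

def build_params(team_id, season0="2018", season1="2019"):
    return {
        "crtcol": "date_out",
        "order": "0",
        "QueryType": "game",
        "GameType": "season",
        "Team_id": team_id,      # e.g. "LAL", "GSW", "ATL"
        "PageNum": "1000",
        "Season0": season0,
        "Season1": season1,
    }

if __name__ == "__main__":
    res = requests.get(BASE_URL, params=build_params("LAL"))
    print(res.url)  # the assembled URL should match the Lakers URL above
```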

2. Getting every NBA team's Team_id and Chinese name

The team list page http://www.stat-nba.com/teamList.php contains each team's Team_id and Chinese name.

(Screenshot: NBA team list)

Scraping the Team_id and Chinese name of every NBA team:

```python
import requests
from lxml import etree


def team_name():
    # Request the HTML
    url = "http://www.stat-nba.com/teamList.php"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0'}
    res = requests.get(url=url, headers=headers)
    res.encoding = "utf-8"
    text = res.text

    # Parse the HTML
    parse_html = etree.HTML(text)
    base_xpath1 = '//td/div[@class="team"]/a/div/text()'
    base_xpath2 = '//td/div[@class="team"]/a/@href'
    chinese_name = parse_html.xpath(base_xpath1)
    english_name = parse_html.xpath(base_xpath2)
    print(chinese_name)

    # Store the team names: each href ends with "XXX.html", where XXX is the
    # three-letter Team_id, so the slice [-8:-5] pulls it out
    name = {}
    for e, c in zip(english_name, chinese_name):
        e = e[-8:-5]
        name[e] = c
    with open("teamName.csv", "w", encoding="utf-8") as f:
        for key, value in name.items():
            # Python handles the newline differences between operating systems
            line = key + "," + value + "\n"
            f.write(line)


team_name()
```

3. Scraping every NBA team's detailed 2018-2019 season game records

```python
import requests
from lxml import etree
import time


def team_record(f, english_name, season0, season1):
    # Request the HTML
    url = "http://www.stat-nba.com/query_team.php?crtcol=date_out&order=0&QueryType=game&GameType=season" \
          "&Team_id=" + english_name + "&PageNum=1000&Season0=" + season0 + "&Season1=" + season1
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0'}
    res = requests.get(url=url, headers=headers)
    res.encoding = "utf-8"
    html = res.text
    print(url)
    # print(html)  # uncomment to inspect the raw page while debugging

    # Parse the HTML
    parse_html = etree.HTML(html)
    base_xpath_th = "//th/text()"
    base_xpath_tr = "//tbody/tr"
    all_th = parse_html.xpath(base_xpath_th)
    all_tr = parse_html.xpath(base_xpath_tr)

    # Write the header row to the file
    title = ",".join(all_th) + "\n"
    f.write(title)
    for tr in all_tr:
        # td_list = tr.xpath(".//*/text()")
        td_list = tr.xpath("./td/text() | ./td/a/text()")
        data = ",".join(td_list) + "\n"
        # Write each of this team's games to the file
        f.write(data)


def main(season_start, season_end):
    # Open the team-name file
    team_name = open("teamName.csv", "r", encoding="utf-8")
    # Open the team-record file
    record = open(season_start + "-" + season_end + "teamRecord.csv", "w", encoding="utf-8")
    # Scrape every team's records
    for name in team_name:
        name = name.replace("\n", "")
        english_name = name.split(",")[0]
        chinese_name = name.split(",")[1]
        record.write(chinese_name + "队\n")
        team_record(record, english_name, season_start, season_end)
        record.write("\n")
        time.sleep(1)
    # Close the team-record file
    record.close()
    # Close the team-name file
    team_name.close()


if __name__ == "__main__":
    start = time.time()
    season_begin = "2018"
    season_finish = "2019"
    main(season_begin, season_finish)
    end = time.time()
    print("Total running time:", end - start)
```
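As a quick standalone check, the `team_record` function above can also be called for a single team; a minimal sketch, assuming the function is defined in the same file (the output file name is just an example):

```python
# Illustrative usage of the team_record function defined above:
# fetch only the Lakers' 2018-2019 games and write them to a separate CSV file.
with open("LAL_2018-2019.csv", "w", encoding="utf-8") as f:
    team_record(f, "LAL", "2018", "2019")
```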

Note: Comparing the URLs of the game-record pages for different seasons shows that changing the values in "&Season0=2018&Season1=2019" switches the season, which is why the code above replaces the two strings 2018 and 2019 with variables.
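Since the season bounds are plain string variables, the same `main` function could be looped to cover several seasons in one run; a hedged sketch, assuming `main` from the script above is available in the same file (the extra pause between seasons is my own addition):

```python
import time

# Illustrative: reuse main(season_start, season_end) from the script above to
# scrape several consecutive seasons; each NBA season spans two calendar years.
for year in range(2015, 2019):       # covers 2015-2016 through 2018-2019
    main(str(year), str(year + 1))
    time.sleep(5)                     # extra pause between seasons, to stay polite
```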

After running the code above, two new files will appear in your working folder; open them to confirm that the data was actually scraped.

(Screenshot: the scraped NBA Team_id file)

Bonus: the URLs on the site only return each NBA team's 82 regular-season games, but by modifying the "&GameType=season" field in the URL you can get each team's regular-season and playoff games at the same time!

The change is simple: append a newline or a space after "&GameType=season", or just delete that field entirely.
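If you prefer to make that change in code rather than by hand, the field can simply be stripped from the query string; a minimal sketch using the Lakers URL from above:

```python
# Illustrative: removing "&GameType=season" makes the page return regular-season
# and playoff games together, as described above.
url = ("http://www.stat-nba.com/query_team.php?crtcol=date_out&order=0"
       "&QueryType=game&GameType=season&Team_id=LAL&PageNum=1000"
       "&Season0=2018&Season1=2019")

url_all_games = url.replace("&GameType=season", "")
print(url_all_games)
```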

4. Afterword

I recently needed some NBA data and stumbled upon the stat-nba website. Its color scheme is a bit retro, but the data is remarkably complete, and the site is really easy to scrape. Don't worry: the code above fetches only a small amount of data, and it does so slowly, so it won't put any strain on the site.

That wraps up this post. It's my first time writing a blog post in Markdown, and it felt pretty good. Feel free to leave a comment if you have questions, and like or bookmark it if you found it useful.


