Web Scraping Practice

2024-06-24 14:11 | Source: compiled from the web

Contents: Task | Page Analysis | Scraping with selenium | Scraping with requests | Result | Common Errors

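Task

Use any method to scrape the Honor of Kings (王者荣耀) match schedule, collecting the data shown in the figure below.

[Figure: the schedule data to be scraped]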

Page Analysis

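[Figure]

As the figure shows, the page displayed in the browser is the result of JavaScript processing the data, and that data is loaded via Ajax. In this situation you cannot scrape the information with requests directly, because the original HTML page does not contain the data at first. After the page has finished loading, it sends a further request to a server endpoint to fetch the data, and only then is the data processed and rendered into the page.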
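One way to check the claim above is to download the raw HTML (what requests would receive, with no JavaScript executed) and look for schedule text inside the container that the selenium code targets. The sketch below is a hedged illustration, not the author's code: it uses the standard library (`urllib.request`, `html.parser`) instead of requests so it has no dependencies. The class name `kpl_schedule_date clearfix` is taken from the XPath in the selenium code; whether that container is really empty in the raw HTML is something to verify, not a documented fact. `fetch_json` assumes you have copied the real Ajax endpoint URL from the browser DevTools Network panel (XHR/Fetch filter); the article does not give that URL here.

```python
import json
import urllib.request
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect non-blank text inside elements whose class attribute equals `target`."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.depth = 0   # > 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("class") == self.target:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())


def texts_in_class(html: str, css_class: str) -> list[str]:
    """Return the text found inside elements with exactly this class attribute."""
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.texts


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a URL without running any JavaScript (what requests would see)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_json(api_url: str, timeout: float = 10.0):
    """Call the Ajax endpoint directly; api_url is hypothetical and must come from DevTools."""
    return json.loads(fetch_text(api_url, timeout))
```

Usage: `texts_in_class(fetch_text("https://pvp.qq.com/match/kpl/index.shtml"), "kpl_schedule_date clearfix")`. An empty or near-empty result is consistent with the data arriving later via Ajax. If the endpoint returns JSONP (JSON wrapped in a callback function), strip the wrapper before calling `json.loads`.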

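Scraping with selenium

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By


class match:
    def __init__(self):
        self.time = ''    # match time
        self.status = ''  # match status
        self.place = ''   # host city
        self.team1 = ''   # name of team 1
        self.team2 = ''   # name of team 2
        self.score = ''   # score

    def print_match_info(self):
        print(self.time, self.status, self.place, self.team1, self.team2, self.score)


def main():
    match_info_list = []
    with webdriver.Chrome() as driver:
        driver.implicitly_wait(10)  # implicit wait
        driver.get("https://pvp.qq.com/match/kpl/index.shtml")
        # Season tabs in the schedule nav (the author targets the 2020 Autumn Season regular season).
        # find_elements_by_xpath was removed in Selenium 4.3; use find_elements(By.XPATH, ...).
        match_type = driver.find_elements(By.XPATH, '//ul[@class="kpl_schedule_nav"]/li/a')
        for i, m in enumerate(match_type):
            m.click()
            time.sleep(1)
            # Week links belonging to the current tab
            week = driver.find_elements(
                By.XPATH, '//div[@class="kpl_schedule_date clearfix"][%d]/a' % (i + 1))
            for j, w in enumerate(week):
                # The source listing is cut off here ("if j+1 >= 8 and j ... = 3 and n ...");
                # the remaining week/match parsing logic is missing.
                ...
```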
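The `match` class above is a plain record of six strings. As a hedged alternative, not the author's code, the same holder can be written as a dataclass, which generates `__init__`, `__repr__` and `==` automatically. The renamed class `Match` and the `to_csv` helper for saving a list such as `match_info_list` are illustrative assumptions.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class Match:
    """One schedule entry; fields mirror the attributes of the original `match` class."""
    time: str = ""    # match time
    status: str = ""  # match status
    place: str = ""   # host city
    team1: str = ""   # name of team 1
    team2: str = ""   # name of team 2
    score: str = ""   # score

    def print_match_info(self) -> None:
        # Same space-separated output as the original method
        print(*astuple(self))


def to_csv(matches) -> str:
    """Hypothetical helper: render a header row plus one row per Match as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f.name for f in fields(Match)])
    for m in matches:
        writer.writerow(astuple(m))
    return buf.getvalue()
```

To save the result so that Excel displays Chinese team names correctly, write the string with `open("kpl_schedule.csv", "w", encoding="utf-8-sig", newline="")` (the file name is hypothetical); the BOM added by `utf-8-sig` lets Excel detect UTF-8.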
