Scraping a Website's Top 250 Movies with a Python Crawler and Saving Them to an Excel File

2023-05-21 20:16 | Source: compiled from the web

Introduction

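This article shows how to use a Python crawler together with the spreadsheet library openpyxl to collect a website's Top 250 movie information and save it to an Excel file. It has three parts: scraping the data, saving it to Excel, and the complete code with its result.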

1. Scraping the Top 250 movie information

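First, we use a Python crawler to fetch the website's Top 250 movie information. To reduce the chance of being blocked by the site's anti-scraping checks, we send a browser-like User-Agent request header. The requests library downloads each page, and BeautifulSoup parses it.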

    import requests
    from bs4 import BeautifulSoup

    # Browser-like User-Agent so the request is less likely to be rejected
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36'}

    # The first row is the header; every later row is one movie
    lst = [['No.', 'Title', 'Quote', 'Rating', 'Link']]

    for i in range(10):
        # Each page lists 25 movies, so the start offset is 0, 25, ..., 225
        url = 'https://movie.douban.com/top250?start=' + str(i * 25) + '&filter='
        resp = requests.get(url, headers=headers)
        bs = BeautifulSoup(resp.text, 'html.parser')
        grid_view = bs.find('ol', class_='grid_view')
        all_li = grid_view.find_all('li')
        for item in all_li:
            no = item.find('em').text
            title = item.find('span', class_='title').text
            inq = item.find('span', class_='inq')  # None when a movie has no quote
            rat = item.find('span', class_='rating_num').text
            url_films = item.find('a')['href']
            lst.append([no, title, inq.text if inq is not None else '', rat, url_films])

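The loop above visits each of the ten pages of the Top 250 list. Each page shows 25 movies, so the start parameter in the URL grows by 25 on every iteration (0, 25, ..., 225). For each page we fetch the HTML with requests, parse it with BeautifulSoup, and read each movie's number, title, one-line quote, rating, and link from its list item. A movie without a quote gets an empty string instead. Each movie is then appended to lst as one row.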
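The parsing step described above can be tried offline before touching the live site. The sketch below is only an illustration: SAMPLE_HTML is hand-written, hypothetical markup that mimics the structure the scraper assumes (the real page may differ), and parse_movies is a helper name introduced here, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written HTML that mimics the structure the scraper
# assumes; the markup of the live page may differ.
SAMPLE_HTML = """
<ol class="grid_view">
  <li>
    <em>1</em>
    <a href="https://example.com/film/1"><span class="title">Film A</span></a>
    <span class="rating_num">9.7</span>
    <span class="inq">A one-line quote.</span>
  </li>
  <li>
    <em>2</em>
    <a href="https://example.com/film/2"><span class="title">Film B</span></a>
    <span class="rating_num">9.6</span>
  </li>
</ol>
"""


def parse_movies(html):
    """Return one [no, title, quote, rating, link] row per <li> in the list."""
    soup = BeautifulSoup(html, "html.parser")
    grid_view = soup.find("ol", class_="grid_view")
    rows = []
    for item in grid_view.find_all("li"):
        inq = item.find("span", class_="inq")  # None when there is no quote
        rows.append([
            item.find("em").text,
            item.find("span", class_="title").text,
            inq.text if inq is not None else "",
            item.find("span", class_="rating_num").text,
            item.find("a")["href"],
        ])
    return rows


rows = parse_movies(SAMPLE_HTML)
for row in rows:
    print(row)
```

The second sample item has no quote element, so its quote column falls back to an empty string, which is the same case the None check in the scraper handles.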

2. Saving the data to an Excel file

We now have the website's Top 250 movie information in lst. Next, we use the openpyxl library to save it to an Excel file.

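    import openpyxl

    wb = openpyxl.Workbook()    # new workbook with one default worksheet
    sheet = wb.active
    sheet.title = 'My Movies'
    for item in lst:
        sheet.append(item)      # one row per entry; the first row is the header
    # openpyxl writes the .xlsx format, so the file uses the .xlsx extension
    wb.save('films.xlsx')

In the code above, we first create a workbook with openpyxl and rename its active worksheet to "My Movies". We then loop over lst and append each row to the worksheet. Finally, we save the workbook as "films.xlsx"; openpyxl writes the .xlsx format, so the file should carry that extension rather than .xls.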
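One way to check the save step is to write a few rows and read them back. The following is a minimal sketch under stated assumptions: sample_rows is made-up data in the same shape as lst, save_rows and load_rows are helper names introduced here for illustration, and the file is written to a temporary directory rather than the working directory.

```python
import os
import tempfile

import openpyxl

# Hypothetical sample rows in the same shape as lst: header first, then data.
sample_rows = [
    ["No.", "Title", "Quote", "Rating", "Link"],
    ["1", "Film A", "A one-line quote.", "9.7", "https://example.com/film/1"],
    ["2", "Film B", "Another quote.", "9.6", "https://example.com/film/2"],
]


def save_rows(rows, path, sheet_title="My Movies"):
    """Write rows to a new workbook, one list per worksheet row."""
    wb = openpyxl.Workbook()
    sheet = wb.active
    sheet.title = sheet_title
    for row in rows:
        sheet.append(row)
    wb.save(path)


def load_rows(path):
    """Read every row of the active worksheet back as a list of values."""
    wb = openpyxl.load_workbook(path)
    sheet = wb.active
    return sheet.title, [list(row) for row in sheet.iter_rows(values_only=True)]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "films_sample.xlsx")
    save_rows(sample_rows, path)
    loaded_title, loaded_rows = load_rows(path)

print(loaded_title, len(loaded_rows))
```

Because the scraped values are strings, the number and rating columns are stored as text. If numeric cells are preferred, converting them (for example with int() and float()) before calling append is one option.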

3. Complete code and result

    import requests
    from bs4 import BeautifulSoup
    import openpyxl

    # Browser-like User-Agent so the request is less likely to be rejected
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36'}

    # The first row is the header; every later row is one movie
    lst = [['No.', 'Title', 'Quote', 'Rating', 'Link']]

    for i in range(10):
        # Each page lists 25 movies, so the start offset is 0, 25, ..., 225
        url = 'https://movie.douban.com/top250?start=' + str(i * 25) + '&filter='
        resp = requests.get(url, headers=headers)
        bs = BeautifulSoup(resp.text, 'html.parser')
        grid_view = bs.find('ol', class_='grid_view')
        all_li = grid_view.find_all('li')
        for item in all_li:
            no = item.find('em').text
            title = item.find('span', class_='title').text
            inq = item.find('span', class_='inq')  # None when a movie has no quote
            rat = item.find('span', class_='rating_num').text
            url_films = item.find('a')['href']
            lst.append([no, title, inq.text if inq is not None else '', rat, url_films])

    # Write all rows to a workbook; openpyxl produces the .xlsx format
    wb = openpyxl.Workbook()
    sheet = wb.active
    sheet.title = 'My Movies'
    for item in lst:
        sheet.append(item)
    wb.save('films.xlsx')
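Before running the complete script against the live site, the ten requests it would send can be inspected offline. The sketch below is an illustration, not part of the original script: build_page_requests is a helper name introduced here, and it uses requests.Request(...).prepare() to build each request without sending anything over the network. It also passes the query string through params instead of string concatenation, which is an alternative to the approach in the script.

```python
import requests

BASE_URL = "https://movie.douban.com/top250"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36"
    )
}


def build_page_requests(pages=10, page_size=25):
    """Prepare one GET request per page; nothing is sent over the network."""
    prepared = []
    for i in range(pages):
        req = requests.Request(
            "GET",
            BASE_URL,
            params={"start": i * page_size, "filter": ""},
            headers=HEADERS,
        )
        prepared.append(req.prepare())
    return prepared


page_requests = build_page_requests()
for p in page_requests[:3]:
    print(p.method, p.url)
```

Printing the prepared URLs makes it easy to confirm that the start offsets run from 0 to 225 in steps of 25 and that the User-Agent header is attached to every request.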

Result:

The script retrieves the website's Top 250 movie information and saves it to an Excel file.
