Fetching Images with a Python Crawler (Crawler Template and Hands-On Project Included): What to Watch For When Scraping with BeautifulSoup


2023-08-11 16:52 | Source: compiled from the web | Views: 265

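This example parses pages with bs4 (BeautifulSoup) and was run in PyCharm. Third-party libraries used: bs4 and requests. The standard-library time module is also imported so the crawler can pause between requests, which lowers the chance of being blocked (recommended).

Before writing any code, lay out the crawl plan:
1. Fetch the source of the main page and extract the child-page links (href).
2. Request each href to get the child page, then find the image download address in it (img -> src).
3. Download the image.

The most important point: in bs4, read an attribute's value with the get() method!

Code template:

    import time

    import requests
    from bs4 import BeautifulSoup

    url = '……request URL……'
    # headers disguise the script as a browser. This request is a GET;
    # for a POST, call requests.post() and pass the form fields via data.
    headers = {
        'user-agent': '……………'
    }
    resp = requests.get(url, headers=headers)
    resp.encoding = 'utf-8'  # avoid garbled text

    # Hand the page source to bs4.
    # Naming the parser explicitly ('html.parser') avoids the warning
    # bs4 emits when it has to guess one.
    main_page = BeautifulSoup(resp.text, 'html.parser')
    # class is a reserved word in Python, so bs4 uses class_ to match
    # the HTML class attribute.
    alist = main_page.find('div', class_='…').find_all('a')
    for a in alist:
        href = a.get('href')
        child_page_resp = requests.get(href)
        child_page_resp.encoding = 'utf-8'
        child_page_text = child_page_resp.text  # source of the child page

        # Parse the child page and locate the download address.
        child_page = BeautifulSoup(child_page_text, 'html.parser')
        p = child_page.find('tag')  # placeholder: the tag that wraps the image
        img = p.find('img')
        src = img.get('src')  # get() reads the attribute value; find() only searches for tags

        # Download the image.
        img_resp = requests.get(src)
        # Use whatever follows the last slash as the file name.
        img_name = src.split('/')[-1]

        # Write the image to disk.
        with open('img/' + img_name, mode='wb') as f:
            f.write(img_resp.content)
        print('over!', img_name)
        # Pause for one second after each image.
        time.sleep(1)
    print('all over!')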
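The advice about get() can be checked offline. The sketch below is only an illustration: the HTML string and the example.com address are made up, and it assumes bs4 is installed. It shows that find() looks for a child tag, while get() (or subscript access) reads an attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a child page; not taken from any real site.
html = '<p><img src="https://example.com/pics/cat.jpg" alt="cat"></p>'

soup = BeautifulSoup(html, "html.parser")
img = soup.find("img")

# find() searches for a child *tag* named "src"; <img> has no children, so this is None.
wrong = img.find("src")

# get() reads the attribute value and returns None when the attribute is absent.
src = img.get("src")
missing = img.get("data-original")

# Subscript access also reads the attribute, but raises KeyError if it is absent.
same_src = img["src"]

# The same file-name rule as the template: text after the last slash.
img_name = src.split("/")[-1]
print(wrong, src, missing, img_name)
```

Because get() returns None instead of raising, it is the more forgiving choice when some img tags on a page may lack the attribute.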

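Ready to try this on a real site? Enough talk, let's move on to the hands-on part.

Scraping images from Enterdesk (回车桌面)

Step 1: Open Enterdesk, press F12 to open the developer tools, then press Ctrl+R to reload the page and inspect the request and the page source.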


From the request details we can read the URL and the User-Agent, and see that the request method is GET.
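To see how these three pieces (URL, User-Agent, method) fit together in requests, a request can be built and inspected without sending it. This is a minimal sketch: the POST form field "keyword" is invented for illustration and is not something the site is known to accept.

```python
import requests

url = "https://mm.enterdesk.com/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}

# Build the GET request without sending it, so nothing touches the network.
get_req = requests.Request("GET", url, headers=headers).prepare()
print(get_req.method, get_req.url)
print(get_req.headers["User-Agent"])

# For a POST, form fields go in data=. The field name here is hypothetical.
post_req = requests.Request(
    "POST", url, headers=headers, data={"keyword": "cat"}
).prepare()
print(post_req.method, post_req.body)
print(post_req.headers["Content-Type"])
```

Calling requests.get(url, headers=headers) builds the same kind of GET request and sends it in one step.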

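Step 2: Click the Response tab and locate the markup that holds the data to scrape. Notice the dd and dt tags: wherever they appear they are wrapped in a dl, and together they form a description list (it is laid out like a table, but it is not a table element).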
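The dl/dt/dd relationship can be tried on a small fragment before touching the real page. The markup below is invented for illustration and only assumes that the image sits inside a dd, as observed in the step above; the real page's structure may differ in detail.

```python
from bs4 import BeautifulSoup

# Made-up fragment that mimics a dl wrapping a dt (term) and a dd (description).
html = """
<dl>
  <dt>Wallpaper title</dt>
  <dd><img src="https://example.com/img/a.jpg"></dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag in document order.
dd = soup.find("dd")
parent_name = dd.parent.name          # the tag that wraps the dd
title = soup.find("dt").get_text(strip=True)
src = dd.find("img").get("src")       # attribute value of the nested img

print(parent_name, title, src)
```

Note that find('dd') returns only the first dd on the page, so on a page with several lists it may be safer to narrow the search to a specific container first.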

Step 3: Full source code (thanks for waiting)

    import time

    import requests
    from bs4 import BeautifulSoup

    url = 'https://mm.enterdesk.com/'
    # headers disguise the script as a browser to reduce the risk of being blocked.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
    }
    resp = requests.get(url, headers=headers)
    resp.encoding = 'utf-8'
    main_page = BeautifulSoup(resp.text, 'html.parser')
    alist = main_page.find('div', class_='egeli_pic_m center').find_all('a')
    # print(alist)
    for a in alist:
        href = a.get('href')
        child_page_resp = requests.get(href)
        child_page_resp.encoding = 'utf-8'
        child_page_text = child_page_resp.text  # source of the child page

        # Locate the download address.
        child_page = BeautifulSoup(child_page_text, 'html.parser')
        p = child_page.find('dd')
        img = p.find('img')   # the img tag
        src = img.get('src')  # its src attribute

        # Download the image.
        img_resp = requests.get(src)
        img_name = src.split('/')[-1]
        # Images are written to the img folder, which must be created beforehand.
        with open('img/' + img_name, mode='wb') as f:
            f.write(img_resp.content)
        print('over!', img_name)
        time.sleep(1)

    print('all over!')
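The script depends on three details: href must be an absolute URL, the file name is whatever follows the last slash in src, and the img folder has to exist already. The standard-library sketch below shows one possible way to make those steps more forgiving. The helper names, the "/bizhi/123.html" path, and the example.com address are all hypothetical, and none of this is part of the original script.

```python
import os
import posixpath
import tempfile
from urllib.parse import urljoin, urlsplit


def absolute_url(page_url, href):
    """Resolve a possibly relative href against the page it came from."""
    return urljoin(page_url, href)


def image_filename(src):
    """Take the last path segment of src, ignoring any query string."""
    return posixpath.basename(urlsplit(src).path)


def save_bytes(folder, name, data):
    """Create the folder if needed, then write the bytes to folder/name."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


# Hypothetical inputs, used only to exercise the helpers offline.
full = absolute_url("https://mm.enterdesk.com/", "/bizhi/123.html")
name = image_filename("https://example.com/pics/cat.jpg?size=large")
saved = save_bytes(os.path.join(tempfile.mkdtemp(), "img"), name, b"fake image bytes")
print(full, name, saved)
```

In the loop above, img_resp.content would take the place of the fake bytes, and 'img' the place of the temporary folder.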

Step 4: Result. Open the img folder to see the downloaded images. And with that, the image scrape is complete!

I am still learning web scraping, so suggestions and corrections are very welcome.
