Python Web Scraping in Practice (5)



2024-06-19 23:21 | Source: compiled from the web

Introduction

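Today we consolidate what we have learned so far. We use Selenium + Firefox to drive a simulated browser that pages through search results automatically, scrape the images, and save them to local files.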

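Using a search for "女装" (women's clothing) as the example, the script automatically scrapes the images shown on the first five pages of results. First, here are the images it downloaded: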

And here is what the browser shows while the script runs (I captured only part of it):

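Key points covered in this article:

Common Selenium + Firefox usage

Scrolling down and paging through results with Selenium

Inspecting elements in the browser (right-click the page --> Inspect)

For example, locating the search box:

XPath syntax

Writing images to local files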
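For the XPath point, here is a small offline sketch that needs no browser. The full script runs lxml's `etree.HTML(...).xpath('//div/div/div/div/a/img/@src')` on the live page; to stay dependency-free, this version uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (there is no `@src` attribute step, so the attribute is read with `.get("src")`) and which requires well-formed markup. The sample page, the `item` class and the `image_urls` helper are hypothetical and do not reflect Taobao's real page structure. The sketch shows how a `[@class='item']` predicate narrows a match compared with a chain of bare `div` steps, and how `urljoin` turns protocol-relative `//host/...` sources into absolute URLs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, well-formed stand-in for a search results page
SAMPLE_PAGE = """
<html><body>
  <div class="items">
    <div class="item"><a href="/i/1"><img src="//img.example.com/a/TB1aaa.jpg_230x230.jpg"/></a></div>
    <div class="item"><a href="/i/2"><img src="//img.example.com/a/TB1bbb.jpg_230x230.jpg"/></a></div>
    <div class="ad"><a href="/x"><img src="//img.example.com/ad.png"/></a></div>
  </div>
</body></html>
"""


def image_urls(page: str, base: str = "https://www.taobao.com/") -> list[str]:
    """Return absolute URLs of the <img> tags under div.item > a (hypothetical helper)."""
    root = ET.fromstring(page.strip())
    # ElementTree's XPath subset: './/' searches all descendants,
    # [@class='item'] keeps only divs whose class attribute is exactly "item"
    imgs = root.findall(".//div[@class='item']/a/img")
    # Protocol-relative src values ("//host/...") take their scheme from base
    return [urljoin(base, img.get("src")) for img in imgs if img.get("src")]


if __name__ == "__main__":
    for url in image_urls(SAMPLE_PAGE):
        print(url)
```

Run directly, it prints the two item image URLs with an `https:` scheme and skips the image inside the `ad` div. On real, non-well-formed HTML you would stay with lxml's HTML parser, as the script does.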

Hands-on

Here is the complete source code:

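# coding=utf-8
import os
import time

import requests
from lxml import etree
from selenium import webdriver
# By is used for element lookups (the find_element_by_* helpers were removed in Selenium 4.3)
from selenium.webdriver.common.by import By


# Page interaction
def getPhoto():
    # Output directory; change this to a path of your own
    path = 'C://Users/Administrator/Desktop/美女图片/'
    # Create the directory once, before scraping starts
    if not os.path.exists(path):
        os.makedirs(path)
        print("path created")
    driver = webdriver.Firefox()
    try:
        driver.get('https://www.taobao.com/')
        # Implicit wait: element lookups retry for up to 5 s while the page loads
        driver.implicitly_wait(5)
        # Locate the search box
        write = driver.find_element(By.CLASS_NAME, "search-combobox-input")
        # Type "女装" (women's clothing)
        write.send_keys("女装")
        # Click the search button
        driver.find_element(By.CLASS_NAME, 'search-button').click()
        time.sleep(2)
        # Scrape the images on the first five result pages for "女装"
        for i in range(1, 6):
            time.sleep(1)
            # Scroll down in 3 steps to reach the bottom, so lazy-loaded images render
            for j in range(1, 4):
                driver.execute_script("window.scrollBy(0,1600)")
                time.sleep(2)
            # Parse the rendered page source
            selector = etree.HTML(driver.page_source)
            # Collect the URLs of the clothing images
            photo_urls = selector.xpath('//div/div/div/div/a/img/@src')
            # Write each image to a local file
            for item in photo_urls:
                # The file name is cut from the part after "TB"; skip URLs without it
                if 'TB' not in item:
                    continue
                # src values are usually protocol-relative ("//..."), so add a scheme
                url = item if item.startswith('http') else "http:" + item
                data = requests.get(url)
                with open(path + item.split('TB')[1][0:18] + ".jpg", 'wb') as f:
                    f.write(data.content)
                time.sleep(1)
            # Click "next page"; locating by name, ID, etc. did not work,
            # so a CSS selector is used instead
            next_button = driver.find_element(
                By.CSS_SELECTOR,
                "#mainsrp-pager > div > div > div > ul > li.item.next > a > span:nth-child(1)")
            next_button.click()
    except Exception as e:
        print(e)
    finally:
        # Close the browser whether or not scraping succeeded
        driver.quit()


if __name__ == '__main__':
    getPhoto()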
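Naming files with `item.split('TB')[1][0:18]` ties the script to Taobao's URL format. Below is a minimal sketch of a more general approach, assuming you are free to pick your own naming scheme: use the last segment of the URL path as the file name, fall back to a SHA-1 hash of the URL when there is none, and let `os.makedirs(..., exist_ok=True)` handle a missing directory. `filename_from_url` and `save_image` are hypothetical helpers, not part of the original code, and the demo uses fake bytes so it runs without network access.

```python
import hashlib
import os
import posixpath
import tempfile
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Hypothetical helper: derive a file name from the URL's last path segment."""
    name = posixpath.basename(urlparse(url).path)
    if not name:
        # No usable segment (e.g. the URL ends in "/"): hash the whole URL instead
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()
    if not os.path.splitext(name)[1]:
        # No extension at all: assume JPEG, as the original script does
        name += ".jpg"
    return name


def save_image(content: bytes, directory: str, url: str) -> str:
    """Hypothetical helper: write image bytes into directory, return the file path."""
    # exist_ok=True creates missing parent folders and tolerates an existing one
    os.makedirs(directory, exist_ok=True)
    file_path = os.path.join(directory, filename_from_url(url))
    with open(file_path, "wb") as f:
        f.write(content)
    return file_path


if __name__ == "__main__":
    # Demo with fake bytes in a throwaway directory (no network access needed)
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_image(b"fake-image-bytes", os.path.join(tmp, "images"),
                           "https://img.example.com/a/TB1aaa.jpg_230x230.jpg")
        print(os.path.basename(saved))  # TB1aaa.jpg_230x230.jpg
```

In the scraping loop, a call such as `save_image(requests.get(url, timeout=10).content, path, url)` could replace the `open(...)`/`write` pair. Because the name is derived from the URL, downloading the same image twice overwrites one file rather than creating a duplicate.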

I hope this helps! Feel free to follow my WeChat official account 「秦子帅」, a public account that aims for quality and has a point of view!
