Doc88 (道客巴巴) Crawler

2023-09-09 10:30 | Source: compiled from the web | Views: 265


Author: 奇点_python_nlp, 2022-04-28 23:12:50

Tags: python, django, list, tornado, html. Category: Python, backend development

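© Copyright belongs to the author: this is an original work by 51CTO blog author 奇点_python_nlp. Contact the author for permission to repost; otherwise legal action will be pursued.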


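Use the XPath Helper browser extension to work out the XPath expressions. The script below walks the listing pages, collects the document links, and writes them to url.txt.

import time
from urllib.parse import urljoin

from lxml import etree
from selenium import webdriver  # Selenium 2.48.0 supports PhantomJS

# Listing page:  https://www.doc88.com/list-8308-0-1.html
# Document page: https://www.doc88.com/p-9139147359378.html
BASE_URL = "https://www.doc88.com/"

driver = webdriver.PhantomJS(
    executable_path=r'C:\Users\wang\Desktop\phantomjs-2.1.1-windows (1)\bin\phantomjs.exe'
)

file_urls_list = []
for page in range(1, 30):
    time.sleep(3)  # pause between requests
    # The page number goes directly before ".html", matching the sample listing URL above.
    url = BASE_URL + "list-8308-0-" + str(page) + ".html"
    driver.get(url)
    tree = etree.HTML(driver.page_source)
    hrefs = tree.xpath(".//h3[@class='sd-type-title']/a/@href")
    # urljoin avoids the doubled slash when an href already starts with "/".
    file_urls = [urljoin(BASE_URL, str(href)) for href in hrefs]
    file_urls_list.extend(file_urls)
    print(file_urls)

driver.quit()

# Keep only URLs whose length matches that of a document-page URL.
with open("url.txt", "w", encoding="utf-8") as f:
    for file_url in file_urls_list:  # links from every page, not only the last one
        if len(file_url) == len("https://www.doc88.com/p-7367816610215.html"):
            f.write(file_url + "\n")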
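The script tells document links apart from other links by comparing string lengths, which breaks as soon as a document ID has a different number of digits. As a rough sketch of an alternative, the helper below filters with a regular expression and removes duplicates. The names `build_list_url`, `normalize_hrefs`, and `DOC_URL_PATTERN` are hypothetical and not part of the original script, and the pattern `p-<digits>.html` is only an assumption inferred from the sample URLs quoted in the script's comments.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.doc88.com/"

# Assumed shape of a document page, inferred from the sample URL
# https://www.doc88.com/p-9139147359378.html (path "p-<digits>.html").
DOC_URL_PATTERN = re.compile(r"^https://www\.doc88\.com/p-\d+\.html$")


def build_list_url(page, category=8308):
    """Build a listing-page URL such as .../list-8308-0-1.html."""
    return f"{BASE_URL}list-{category}-0-{page}.html"


def normalize_hrefs(hrefs):
    """Make hrefs absolute, keep only document pages, and drop duplicates.

    The order of first appearance is preserved.
    """
    seen = set()
    result = []
    for href in hrefs:
        url = urljoin(BASE_URL, href)
        if DOC_URL_PATTERN.match(url) and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

In the crawler, the list returned by the XPath query could be passed through `normalize_hrefs` before extending `file_urls_list`, which would make the length check at write time unnecessary.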
