Python spider (5): Scrapy pipeline operations


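1. Prepare the environment

Only the Windows setup needs explaining here; on other systems Scrapy can be installed directly from the command line.

    pip install wheel

Twisted must also be installed. Check which build matches your Python version and architecture, otherwise it may be incompatible.

    pip install pywin32
    pip install scrapy

2. How to use it

    scrapy startproject pro_name
    scrapy genspider spider_name www.xxx.com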

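* Persistent storage (via terminal command)

Approach:

1. Process the crawled data and extract the fields you want.
2. Add the data to a dictionary (not a string).
3. Return the dictionaries.
4. Run the export with a terminal command.

    import scrapy

    class TestSpider(scrapy.Spider):
        name = 'test'
        start_urls = ['https://www.xuexila.com/duanzi/nahanduanzi/2870287.html']

        def parse(self, response):
            # Text of every paragraph inside the content container.
            page_text_list = response.xpath('//*[@id="contentText"]//p/text()').extract()
            all_datas = []
            for text in page_text_list:
                text = ''.join(text)
                dic = {
                    'context': text
                }
                all_datas.append(dic)
            # Return a list of dicts so the terminal export can serialize it.
            return all_datas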
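The terminal step is normally Scrapy's feed export, for example `scrapy crawl test -o data.csv` (the file name is an assumption; the post does not give the exact command). The sketch below is only a rough stand-in for what such an export does with the returned dictionaries, not Scrapy's own exporter: it uses the standard `csv` module and hypothetical sample rows to show why each record has to be a field/value mapping rather than a bare string.

```python
import csv
import io

# Hypothetical sample of what parse() returns: one dict per paragraph.
all_datas = [{'context': 'first paragraph'}, {'context': 'second paragraph'}]


def to_csv(rows):
    """Serialize a list of flat dicts to CSV text, one column per key."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        lineterminator='\n',
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(all_datas)
print(csv_text)
```

The dictionary keys become the column names, which is the information a plain string would not carry.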

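Note: running the spider prints a large amount of log output that is mostly not useful here, so reduce it by adding a log level setting to the settings file.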
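A likely form of that setting is sketched below. `LOG_LEVEL` is a standard Scrapy setting; the value `'ERROR'` is an assumption, since the post does not say which level it uses.

```python
# settings.py (fragment)
# Show only errors and above; other accepted values are
# 'CRITICAL', 'WARNING', 'INFO' and 'DEBUG'.
LOG_LEVEL = 'ERROR'
```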

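Callback functions

    url = 'https://www.xuexila.com/duanzi/huangduanzi/2913969_%d.html'
    page_num = 2

    def parse(self, response):
        ...
        # The comparison operator was lost in the source; "<" is assumed here.
        if self.page_num < 3:
            new_url = self.url % self.page_num
            self.page_num += 1
            # Request the next page and parse it with this same method.
            yield scrapy.Request(url=new_url, callback=self.parse)
            # To hand data to a different callback, attach it through meta:
            # yield scrapy.Request(url=new_url, callback=self.other_func, meta={'key': value})

* Persistent storage (via pipelines)

    import scrapy
    from pro_test.items import ProTestItem  # package name assumed from the settings shown later

    class TestSpider(scrapy.Spider):
        name = 'test'
        start_urls = ['https://www.xuexila.com/duanzi/huangduanzi/2913969.html']

        def parse(self, response):
            page_text_list = response.xpath('//*[@id="contentText"]//p/text()').extract()
            for text in page_text_list:
                text = ''.join(text)
                item = ProTestItem()
                item['text'] = text
                # Each yielded item is passed to the enabled pipelines.
                yield item

items file

    import scrapy

    class ProTestItem(scrapy.Item):
        text = scrapy.Field()

pipelines file

    class BoosproPipeline:
        fp = None

        def open_spider(self, spider):
            # Called once when the spider starts: open the output file.
            print("Spider started")
            self.fp = open('./text.txt', 'w', encoding='utf-8')

        def process_item(self, item, spider):
            # Called once per item yielded by the spider.
            text = item['text']
            self.fp.write(text)
            # Return the item so any later pipeline also receives it.
            return item

        def close_spider(self, spider):
            # Called once when the spider finishes: release the file handle.
            print("Spider finished")
            self.fp.close()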
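To make the call order of the three pipeline hooks concrete, here is a minimal sketch that drives a pipeline by hand, without importing Scrapy. `TextFilePipeline`, `run_pipeline`, and the sample items are hypothetical stand-ins; in a real project Scrapy calls these hooks itself. Unlike the pipeline above, this sketch writes one item per line so the result is easy to inspect.

```python
import os
import tempfile


class TextFilePipeline:
    """Stand-in exposing the same three hooks as a Scrapy item pipeline."""

    def __init__(self, path):
        self.path = path
        self.fp = None

    def open_spider(self, spider):
        # Runs once, before any item is processed.
        self.fp = open(self.path, 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # Runs once per item; returning the item passes it on.
        self.fp.write(item['text'] + '\n')
        return item

    def close_spider(self, spider):
        # Runs once, after the last item.
        self.fp.close()


def run_pipeline(pipeline, items):
    """Call the hooks in the order open, process (per item), close."""
    pipeline.open_spider(spider=None)
    try:
        passed = [pipeline.process_item(item, spider=None) for item in items]
    finally:
        pipeline.close_spider(spider=None)
    return passed


out_path = os.path.join(tempfile.mkdtemp(), 'text.txt')
items = [{'text': 'first'}, {'text': 'second'}]
returned = run_pipeline(TextFilePipeline(out_path), items)

with open(out_path, encoding='utf-8') as fp:
    saved = fp.read()
print(saved)
```

Opening the file in `open_spider` rather than in `process_item` means the file is opened once per crawl instead of once per item.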

Finally, run the spider to persist the data.
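The pipeline only runs if it is registered in the settings file, a step the post does not show for the text pipeline. A minimal fragment, assuming the project package is named `pro_test` (the name used in this article's ImagesPipeline settings) and the class lives in `pipelines.py`:

```python
# settings.py (fragment)
# Keys are import paths of pipeline classes; values are priorities,
# conventionally in the 0-1000 range, where lower numbers run first.
ITEM_PIPELINES = {
    'pro_test.pipelines.BoosproPipeline': 300,
}
```

With this in place, running `scrapy crawl test` from the project directory should produce `text.txt`.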

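3. Using ImagesPipeline

Approach:

1. In the spider file, locate the image download URL.
2. Write the image download pipeline.
3. In the settings file, enable the pipeline and set the image storage path.
4. The images are downloaded.

    import scrapy
    from pro_test.items import ProTestItem

    class TestSpider(scrapy.Spider):
        name = 'test'
        start_urls = ['https://qq.yh31.com/sx/zw/']

        def img_detail(self, response):
            # Detail page: extract the image URL and hand it to the pipeline.
            img_src = response.xpath('//div[@id="imgBox"]/img/@src').extract_first()
            item = ProTestItem()
            item['src'] = img_src
            yield item

        def parse(self, response):
            # List page: follow the link to each image's detail page.
            img_url_list = response.xpath('//*[@id="main_bblm"]//div[@class="za"]//dt/a/@href').extract()
            for img_url in img_url_list:
                yield scrapy.Request(img_url, callback=self.img_detail)

items file

    import scrapy

    class ProTestItem(scrapy.Item):
        src = scrapy.Field()

pipelines file

    import scrapy
    from scrapy.pipelines.images import ImagesPipeline

    class imgPipeLine(ImagesPipeline):
        def get_media_requests(self, item, info):
            # Request the image by its URL.
            yield scrapy.Request(item['src'])

        def file_path(self, request, response=None, info=None, *, item=None):
            # Store the image under the last segment of its URL.
            # The keyword-only item argument is accepted for newer Scrapy versions.
            imgName = request.url.split('/')[-1]
            return imgName

        def item_completed(self, results, item, info):
            # Pass the item on to the next pipeline, if any.
            return item

settings file

    ITEM_PIPELINES = {
        'pro_test.pipelines.imgPipeLine': 300,
    }
    IMAGES_STORE = 'xxx'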
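The `file_path` override above names each file after the last `/`-separated segment of the image URL. That works for plain URLs, but a query string would end up in the file name. The sketch below compares the two approaches using only the standard library; `image_name_from_url` and the example URLs are hypothetical and not part of the original project.

```python
import posixpath
from urllib.parse import urlparse


def image_name_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    return posixpath.basename(urlparse(url).path)


plain = 'https://example.com/img/cat.gif'
with_query = 'https://example.com/img/cat.gif?size=large'

# Approach used in file_path above: fine for a plain URL.
print(plain.split('/')[-1])
# Same approach on a URL with a query string: the query stays in the name.
print(with_query.split('/')[-1])
# Parsing the URL first keeps only the path's final segment.
print(image_name_from_url(with_query))
```

If the target site never appends query strings to image URLs, the simpler `split` version is sufficient.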

The download is complete.

Original: https://blog.csdn.net/weixin_43409994/article/details/123770230
Author: Adorable_Rocy
Title: Python spider (5): Scrapy pipeline operations

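Related reading
Title: An approach to logging in to a third-party platform with a DingTalk account

1. Become a DingTalk Open Platform developer

To implement third-party login with a DingTalk account, first register as a DingTalk Open Platform developer to obtain an APPID and an appSecret.

2. When the DingTalk login button is clicked, the front end sends a login request to the back end, carrying a code (the temporary authorization code)

3. The back end builds the request URL from the code sent by the front end together with the APPID and appSecret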

    import base64
    import hmac
    import json
    import time
    import urllib.parse

    import requests
    from flask import current_app

    appid = current_app.config.get('DINGDING_APPID')
    appSecret = current_app.config.get('DINGDING_SECRET')
    # Millisecond timestamp, signed with the app secret using HMAC-SHA256.
    timestamp = str(int(time.time() * 1000))
    base_url = "https://oapi.dingtalk.com/sns/getuserinfo_bycode?signature="
    signature = base64.b64encode(
        hmac.new(appSecret.encode('utf-8'), timestamp.encode('utf-8'), digestmod='sha256').digest())
    url = base_url + urllib.parse.quote(signature.decode()) + '&timestamp=' + timestamp + '&accessKey=' + appid

4. Log in with the DingTalk account

5. Fetch the user information with the temporary authorization code

    # Body of get_dingding_user(self, code); url is the signed URL built in step 3.
    data = json.dumps({'tmp_auth_code': code})
    try:
        resp = requests.post(url, data, headers={'Content-Type': 'application/json'})
        print('resp>>>', resp.json())
    except Exception as e:
        return {'code': 500, 'message': 'Failed to get user identity information'}
    user_info = resp.json()
    if user_info['errcode'] != 0:
        return {'code': 500, 'message': 'Failed to get user identity information'}
    user_info_dict = user_info['user_info']
    return user_info_dict

Example response

    {
        "errcode": 0,
        "user_info": {
            "nick": "name",
            "unionid": "dingdkjjojoixxxx",
            "openid": "dingsdsqwlklklxxxx",
            "main_org_auth_high_level": true
        },
        "errmsg": "ok"
    }

6. Check whether the DingTalk account is bound to a user

    user_info = self.get_dingding_user(code)
    # Use .get(): on failure the helper returns an error dict without 'openid'.
    openid = user_info.get('openid')
    if openid:
        oauth_user = OAuthUserModel.query.filter_by(oauth_id=openid).first()
        if oauth_user:
            user = User.query.filter_by(uid=oauth_user.user).first()
            data = {
                'username': user.username,
                'uid': user.uid
            }
            token = _generate_token(data)
            return {'code': 200, 'account': user.username, 'uid': user.uid, 'token': token}
        else:
            return {'code': 500, 'message': 'No user is bound; please bind an account', 'uid': openid}
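The signing step can be checked in isolation. The sketch below wraps it in a function that takes a fixed timestamp so the result is reproducible; `build_signed_url` and the demo credentials are hypothetical, and it uses `urlencode` to assemble the query string instead of manual concatenation, which is a substitution rather than the original author's code.

```python
import base64
import hmac
import urllib.parse

BASE_URL = "https://oapi.dingtalk.com/sns/getuserinfo_bycode"


def build_signed_url(app_id, app_secret, timestamp_ms):
    """Sign the millisecond timestamp with HMAC-SHA256 and build the URL."""
    digest = hmac.new(
        app_secret.encode('utf-8'),
        timestamp_ms.encode('utf-8'),
        digestmod='sha256',
    ).digest()
    signature = base64.b64encode(digest).decode('ascii')
    # urlencode percent-escapes '+', '/' and '=' in the Base64 signature.
    query = urllib.parse.urlencode({
        'signature': signature,
        'timestamp': timestamp_ms,
        'accessKey': app_id,
    })
    return f"{BASE_URL}?{query}"


# Demo values only; real credentials come from the DingTalk developer console.
signed_url = build_signed_url('demo_app_id', 'demo_secret', '1700000000000')
params = urllib.parse.parse_qs(urllib.parse.urlparse(signed_url).query)
print(signed_url)
```

Parsing the URL back with `parse_qs` is a quick way to confirm that the signature survives URL encoding unchanged.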

If the account is already bound, the login succeeds. If it is not bound, the user must first log in and bind the account.
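The bound or unbound decision can be sketched without a database. In the example below, the dictionaries `bindings` and `users` stand in for the `OAuthUserModel` and `User` tables, and `login_with_openid` is a hypothetical helper; token generation is left out because `_generate_token` is not shown in the post.

```python
def login_with_openid(openid, bindings, users):
    """Return a response dict shaped like the ones in step 6."""
    if not openid:
        # No openid means the user lookup at DingTalk failed.
        return {'code': 500, 'message': 'Failed to get user identity information'}
    uid = bindings.get(openid)
    if uid is None:
        # Known DingTalk account, but not yet linked to a local user.
        return {'code': 500, 'message': 'No user is bound; please bind an account', 'uid': openid}
    return {'code': 200, 'account': users[uid], 'uid': uid}


# Hypothetical in-memory data: openid -> local uid, and uid -> username.
bindings = {'ding_openid_1': 42}
users = {42: 'alice'}

bound = login_with_openid('ding_openid_1', bindings, users)
unbound = login_with_openid('ding_openid_2', bindings, users)
missing = login_with_openid(None, bindings, users)
print(bound, unbound, missing)
```

Returning the openid in the unbound case lets the front end send it back when the user completes the binding step.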

Original: https://blog.csdn.net/weixin_69282249/article/details/125472985
Author: 大可不必872
Title: Logging in to a third-party platform with a DingTalk account

