pyspider: A Powerful Visual Spider (Web Crawler) System (Chinese-Localized)


2024-07-07 06:20 | Source: compiled from the web

pyspider is a powerful spider (web crawler) system in Python. This is the Chinese-localized edition.

Write scripts in Python

Powerful WebUI with script editor, task monitor, project manager, and result viewer

MySQL, MongoDB, Redis, SQLite, and Elasticsearch, plus PostgreSQL via SQLAlchemy, as database backends

RabbitMQ, Redis, and Kombu as message queues

Task priority, retry, periodic crawling, recrawl by age, etc.

Distributed architecture, crawling of JavaScript pages, Python 2.{6,7} and 3.{3,4,5,6} support, etc.
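To make "task priority" and "retry" from the feature list concrete, here is a minimal, self-contained sketch of a queue that serves higher-priority tasks first and re-queues failed tasks a limited number of times. It is only an illustration under stated assumptions and not pyspider's actual scheduler: `TaskQueue`, `max_retries`, `flaky_fetch`, and the example URLs are all hypothetical names invented for this sketch.

```python
import heapq
import itertools


class TaskQueue:
    """Toy in-memory queue (hypothetical, not pyspider code).

    Higher-priority tasks are served first; a failed task is re-queued
    until it has been retried `max_retries` times.
    """

    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self._heap = []
        # Monotonic counter: keeps FIFO order among tasks of equal priority.
        self._counter = itertools.count()

    def put(self, url, priority=0, retried=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), url, retried))

    def run(self, fetch):
        """Drain the queue. `fetch(url)` must return True on success."""
        done, failed = [], []
        while self._heap:
            neg_priority, _, url, retried = heapq.heappop(self._heap)
            if fetch(url):
                done.append(url)
            elif retried < self.max_retries:
                self.put(url, priority=-neg_priority, retried=retried + 1)
            else:
                failed.append(url)
        return done, failed


# Usage: one URL always fails, one succeeds on its second attempt.
attempts = {}


def flaky_fetch(url):
    attempts[url] = attempts.get(url, 0) + 1
    if "dead" in url:
        return False
    if "flaky" in url:
        return attempts[url] >= 2
    return True


queue = TaskQueue(max_retries=2)
queue.put("http://example.com/low", priority=1)
queue.put("http://example.com/flaky", priority=5)
queue.put("http://example.com/dead", priority=9)
done, failed = queue.run(flaky_fetch)
```

In this sketch a retried task keeps its original priority, so it is attempted again before lower-priority work. A real crawler would normally add a delay between retries; that is left out here to keep the example short.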

Tutorial: http://docs.pyspider.org/en/latest/tutorial/

Documentation: http://docs.pyspider.org/

Release notes: https://github.com/binux/pyspider/releases

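Sample Code

    from pyspider.libs.base_handler import *


    class Handler(BaseHandler):
        # Project-wide crawl settings (empty here).
        crawl_config = {
        }

        # Run on_start once a day (24 * 60 minutes).
        @every(minutes=24 * 60)
        def on_start(self):
            self.crawl('http://scrapy.org/', callback=self.index_page)

        # Treat a fetched index page as valid for 10 days (age is in seconds).
        @config(age=10 * 24 * 60 * 60)
        def index_page(self, response):
            # Follow every link whose href starts with "http".
            for each in response.doc('a[href^="http"]').items():
                self.crawl(each.attr.href, callback=self.detail_page)

        def detail_page(self, response):
            # The returned dict is stored as the result for this page.
            return {
                "url": response.url,
                "title": response.doc('title').text(),
            }

Installation

    pip install -r requirements.txt

Run the command

    python run.py -c config.json

then open http://localhost:5000/ in a browser.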
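The two numbers in the sample handler are easy to misread: `@every(minutes=24 * 60)` is expressed in minutes and `@config(age=10 * 24 * 60 * 60)` in seconds. The sketch below works out both values and shows one plausible reading of "recrawl by age": a page is fetched again only when its last crawl is older than `age`. The helper `should_crawl` and its parameters are hypothetical names for illustration; this is not the scheduler's actual code.

```python
import time

DAY = 24 * 60 * 60  # seconds in one day

# Values taken from the sample handler.
every_minutes = 24 * 60          # 1440 minutes, i.e. once a day
age_seconds = 10 * 24 * 60 * 60  # 864000 seconds, i.e. 10 days


def should_crawl(last_crawled_at, age, now=None):
    """Hypothetical helper: decide whether a URL is due for another crawl.

    last_crawled_at: Unix timestamp of the previous crawl, or None if never crawled.
    age: validity period of the previous result, in seconds.
    """
    if now is None:
        now = time.time()
    if last_crawled_at is None:
        return True  # never crawled, so always fetch
    return now - last_crawled_at > age


# A page crawled at t=0 is skipped 9 days later and re-fetched 11 days later.
skipped = should_crawl(0, age_seconds, now=9 * DAY)
refetched = should_crawl(0, age_seconds, now=11 * DAY)
```

Under this reading, `on_start` fires daily, but index pages reached from it are only re-fetched once their previous result is more than 10 days old.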

WARNING: By default the WebUI is open to the public, and it can be used to execute arbitrary commands that may harm your system. Run it on an internal network, or enable need-auth for the webui.
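As a rough sketch of enabling authentication, the script below writes a `config.json` containing a `webui` section with `need-auth` turned on. The key names (`webui`, `username`, `password`, `need-auth`) are assumed to match the upstream pyspider deployment documentation; whether `run.py` in this localized edition reads the same keys is an assumption that should be checked against the project before relying on it. The function name, credentials, and output path are placeholders.

```python
import json
import os
import tempfile


def write_config(path, username, password):
    """Write a minimal config file with WebUI authentication enabled.

    Key names are assumed from upstream pyspider's deployment docs.
    """
    config = {
        "webui": {
            "username": username,
            "password": password,
            "need-auth": True,
        }
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config


# Usage: written to a temporary directory here; in practice the file would
# sit next to run.py. Replace the placeholder credentials.
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
write_config(config_path, username="admin", password="change-me")

with open(config_path, encoding="utf-8") as f:
    loaded = json.load(f)
```

The resulting file would then be passed with `-c config.json` as in the installation step.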

Quickstart: http://docs.pyspider.org/en/latest/Quickstart/

Contribute

Use it

Open an issue, send a PR

User group

Chinese Q&A

TODO

v0.4.0

A visual scraping interface like portia

License

Licensed under the Apache License, Version 2.0


