A Python Script for Parsing Question Answers from an Exam Site (Web Crawler)


2023-09-07 01:50 | Source: compiled from the web | Views: 265

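Preface

Writing web crawlers in Python is a common practice. The idea is to download a web page, clean the data with regular expressions, and extract the target resources, which may be text, images, or other URLs. The results are then stored by category. This article covers only simple text extraction.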
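The extract-and-categorize step can be sketched as follows. This is a minimal illustration in Python 3, not the article's script: the HTML fragment, the `extract_resources` helper, and the `example.com` URLs are all hypothetical, and the page is an inline string so that no network access is needed.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
SAMPLE_PAGE = """
<html><body>
<h1 class="title">Sample question</h1>
<a href="https://example.com/a.html">First link</a>
<img src="https://example.com/pic.png">
<a href="https://example.com/b.html">Second link</a>
</body></html>
"""


def extract_resources(page):
    """Sort regex matches into text, image, and link buckets."""
    return {
        # Heading text between <h1 ...> and </h1>
        "text": re.findall(r'<h1[^>]*>(.*?)</h1>', page, re.S),
        # Value of the src attribute of each <img> tag
        "images": re.findall(r'<img[^>]*\bsrc="(.*?)"', page, re.S),
        # Value of the href attribute of each <a> tag
        "links": re.findall(r'<a[^>]*\bhref="(.*?)"', page, re.S),
    }


resources = extract_resources(SAMPLE_PAGE)
for category, items in resources.items():
    print(category, items)
```

Regular expressions are enough for a fixed, known page layout like this one; for markup that varies, an HTML parser is usually the more robust choice.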

Main text

The code was written for Python 2.7 and has been tested successfully. See the test results below.

    # -*- coding: UTF-8 -*-
    # NOTE: the HTML tag literals inside most of the regex patterns and the
    # replace()/split() arguments below were stripped when this page was
    # extracted. They are left as found and must be restored from the live
    # page markup before the script will run.
    import urllib2
    import re

    def ppkao_getanwser(questionURL):
        # Download the question page
        questionPage = urllib2.urlopen(questionURL).read()

        # Parse the question ('以下试题来自：' is the page's "The following question is from:" label)
        questionTypeURL = re.search('以下试题来自：(.*?)(.*?)', questionPage, re.S).group(2)
        questionType = re.search('以下试题来自：(.*?)(.*?)', questionPage, re.S).group(3)
        questionTypeID = re.search('ViewAnswers\(\'//(.*?)\',\'(\d+)\'', questionPage, re.S).group(2)
        questionModel = re.search('(.*?)', questionPage, re.S).group(1)
        questionName = re.search('(.*?)', questionPage, re.S).group(2)
        if questionModel == '简答题':  # short-answer question: no options to parse
            questionItems = []
        else:
            questionItems = re.search('(.*?)', questionPage, re.S).group(1).replace('', '').replace('\t', '').replace('\r\n', '').replace('，', '').split('')
            #questionItems = re.search('(.*?)', questionPage, re.S).group(1).replace('', '').replace('\t', '').replace('\r\n', '').strip().replace('，', '').split('')

        # Fetch the answer page
        anwserURL = re.search('ViewAnswers\(\'//(.*?)\',', questionPage, re.S).group(1)
        anwserPage = urllib2.urlopen('https://' + anwserURL).read()
        if questionModel == '简答题':  # '试题答案' is the page's "Answer" label
            questionAnwser = re.search('试题答案(.*?)(.*?)(.*?)', anwserPage, re.S).group(3).strip()
        else:
            questionAnwser = re.search('(.*?)', anwserPage, re.S).group(1).strip()

        # Display the results
        print 'Question bank:', questionType
        print 'Question bank ID:', questionTypeID
        print 'Question bank URL:', questionTypeURL
        print 'Question URL:', questionURL
        print 'Question type:', questionModel
        print 'Question:', questionName
        print 'Options:', ','.join(questionItems)
        print 'Answer URL:', 'https://' + anwserURL
        print 'Answer:', questionAnwser

Test results

    ppkao_getanwser("https://www.ppkao.com/tiku/shiti/1594015.html")

Test results

    ppkao_getanwser("https://www.ppkao.com/tiku/shiti/1594014.html")
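The one pattern in the script that survives intact is the `ViewAnswers(...)` match, which yields both the answer URL and the question bank ID. The following is a rough Python 3 sketch of that step only, run against an inline string instead of the live site. The `parse_view_answers` helper, the sample markup, and the `www.example.com` URL are assumptions made for illustration. Unlike the script, which calls `.group()` directly on the result of `re.search`, the sketch checks for a failed match first, since `re.search` returns `None` when the pattern is not found.

```python
import re

# Same pattern as in the script, written as a raw string for Python 3.
VIEW_ANSWERS = re.compile(r"ViewAnswers\('//(.*?)','(\d+)'", re.S)


def parse_view_answers(page):
    """Return (answer_url, bank_id), or None if the call is not on the page."""
    match = VIEW_ANSWERS.search(page)
    if match is None:
        return None
    # The page stores a scheme-relative address, so prepend the scheme.
    return "https://" + match.group(1), match.group(2)


# Hypothetical markup; the real page layout may differ.
sample = """<a onclick="ViewAnswers('//www.example.com/answer/1.html','42')">View answer</a>"""

result = parse_view_answers(sample)
print(result)
```

Returning `None` instead of raising lets the caller decide how to handle a page whose layout has changed, which is a common failure mode for regex-based crawlers.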
