Python


This is the website I am trying to scrape: http://livingwage.mit.edu/

The specific URLs are

http://livingwage.mit.edu/states/01
http://livingwage.mit.edu/states/02
http://livingwage.mit.edu/states/04 (For some reason they skipped 03)
...all the way to...
http://livingwage.mit.edu/states/56
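The URL pattern above can be generated rather than typed out. This is only a minimal sketch: the helper name `state_urls` and its `skip` parameter are hypothetical, and the only gap known from the list above is 03, so other missing numbers would have to be added to `skip` or handled as failed requests.

```python
BASE = "http://livingwage.mit.edu"


def state_urls(first=1, last=56, skip=(3,)):
    """Build zero-padded state URLs, leaving out the numbers in `skip`.

    Only 03 is known to be missing; other gaps are an open assumption.
    """
    return [
        f"{BASE}/states/{number:02d}"  # :02d pads 1 -> "01"
        for number in range(first, last + 1)
        if number not in skip
    ]


urls = state_urls()
print(urls[:3])
```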

On each of these URLs, I need the last row of the second table:

Example for http://livingwage.mit.edu/states/01

Required annual income before taxes $20,260 $42,786 $51,642 $64,767 $34,325 $42,305 $47,345 $53,206 $34,325 $47,691 $56,934 $66,997

Desired output:

Alabama $20,260 $42,786 $51,642 $64,767 $34,325 $42,305 $47,345 $53,206 $34,325 $47,691 $56,934 $66,997

Alaska $24,070 $49,295 $60,933 $79,871 $38,561 $47,136 $52,233 $61,531 $38,561 $54,433 $66,316 $82,403

...

Wyoming $20,867 $42,689 $52,007 $65,892 $34,988 $41,887 $46,983 $53,549 $34,988 $47,826 $57,391 $68,424

After 2 hours of messing around, this is what I have so far (I am a beginner):

import requests, bs4
res = requests.get('http://livingwage.mit.edu/states/01')
res.raise_for_status()
states = bs4.BeautifulSoup(res.text)
state_name=states.select('h1')
table = states.find_all('table')[1]
rows = table.find_all('tr', 'odd')[4:]
result=[]
result.append(state_name)
result.append(rows)

When I look at state_name and rows in the Python Console, they give me the HTML elements

[Living Wag...Alabama]

and

[...]

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin  # python2 -> from urlparse import urljoin

base = "http://livingwage.mit.edu"
res = requests.get(base)
res.raise_for_status()

states = []
# Get all state urls and state names from the anchor tags on the base page:
# select the anchors inside each li that is a descendant of the
# ul with the css classes "states" and "list-unstyled".
for a in BeautifulSoup(res.text, "html.parser").select("ul.states.list-unstyled li a"):
    # The hrefs look like "/states/51/locations".
    # We want everything before /locations, so we split once on / from the right -> "/states/51"
    # and join to the base url. The anchor text also holds the state name,
    # so we store the full url and the state, i.e. ("http://livingwage.mit.edu/states/01", "Alabama").
    states.append((urljoin(base, a["href"].rsplit("/", 1)[0]), a.text))


def parse(soup):
    # Get the second table. Indexing in css starts at 1, so "table:nth-of-type(2)" gets the second table.
    table = soup.select_one("table:nth-of-type(2)")
    # To get the text, we just need to find all the tds and call .text on each.
    # The row we want has the css classes "odd results"; "td + td" starts from the second td,
    # skipping the first one, which is *Required annual income before taxes*.
    return [td.text.strip() for td in table.select_one("tr.odd.results").select("td + td")]


# Unpack the url and state from each tuple in our states list.
for url, state in states:
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    print(state, parse(soup))
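The selector logic can be tried offline before hitting the site. The sketch below runs the same selectors against a small hand-written HTML fragment. The markup in `SAMPLE_HTML` is hypothetical (it only imitates the structure described above, with the row shortened to three values), and `last_row_values` is an illustrative name. It also shows that calling `.get_text()` on an element returns plain text, which is what was missing when the console showed HTML elements.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables, the wanted row carries classes "odd results".
SAMPLE_HTML = """
<body>
<h1>Living Wage Calculation for Alabama</h1>
<table>
  <tr class="odd"><td>Living Wage</td><td>n/a</td></tr>
</table>
<table>
  <tr class="even"><td>Other row</td><td>n/a</td><td>n/a</td><td>n/a</td></tr>
  <tr class="odd results"><td>Required annual income before taxes</td><td>$20,260</td><td>$42,786</td><td>$51,642</td></tr>
</table>
</body>
"""


def last_row_values(soup):
    # Second <table> among its siblings (CSS counting starts at 1).
    table = soup.select_one("table:nth-of-type(2)")
    row = table.select_one("tr.odd.results")
    # "td + td" matches every td that directly follows another td,
    # so the first (label) cell is skipped.
    return [td.get_text(strip=True) for td in row.select("td + td")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
heading = soup.select_one("h1").get_text(strip=True)
values = last_row_values(soup)
print(heading, values)
```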

If you run the code, you will see output like this:

Alabama ['$21,144', '$43,213', '$53,468', '$67,788', '$34,783', '$41,847', '$46,876', '$52,531', '$34,783', '$48,108', '$58,748', '$70,014']
Alaska ['$24,070', '$49,295', '$60,933', '$79,871', '$38,561', '$47,136', '$52,233', '$61,531', '$38,561', '$54,433', '$66,316', '$82,403']
Arizona ['$21,587', '$47,153', '$59,462', '$78,112', '$36,332', '$44,913', '$50,200', '$58,615', '$36,332', '$52,483', '$65,047', '$80,739']
Arkansas ['$19,765', '$41,000', '$50,887', '$65,091', '$33,351', '$40,337', '$45,445', '$51,377', '$33,351', '$45,976', '$56,257', '$67,354']
California ['$26,249', '$55,810', '$64,262', '$81,451', '$42,433', '$52,529', '$57,986', '$68,826', '$42,433', '$61,328', '$70,088', '$84,192']
Colorado ['$23,573', '$51,936', '$61,989', '$79,343', '$38,805', '$47,627', '$52,932', '$62,313', '$38,805', '$57,283', '$67,593', '$81,978']
Connecticut ['$25,215', '$54,932', '$64,882', '$80,020', '$39,636', '$48,787', '$53,857', '$61,074', '$39,636', '$60,074', '$70,267', '$82,606']
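The printed lists can be turned into the one-line-per-state layout asked for in the question, or into numbers for further processing. This is a small sketch with hypothetical helper names (`format_row`, `to_int`), using the first two Alabama values from the output above.

```python
def format_row(state, values):
    """Join a state name and its dollar strings into one space-separated line."""
    return " ".join([state, *values])


def to_int(amount):
    """Convert a string such as '$21,144' to the integer 21144."""
    return int(amount.replace("$", "").replace(",", "").strip())


line = format_row("Alabama", ["$21,144", "$43,213"])
numbers = [to_int(v) for v in ["$21,144", "$43,213"]]
print(line)
print(numbers)
```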

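You could instead loop over a range of numbers (say 1-53, although the IDs in the question run up to 56 and skip 03), but extracting the anchors from the base page also gives us the state names in a single step. Using the h1 from each state page would give you output like "Living Wage Calculation for Alabama", which you would then have to parse to get just the name, and that is not trivial given that some states have names of more than one word.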
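For comparison, here is roughly what the h1 route would involve. It is a sketch that only works if every heading starts with the exact same prefix, which is an assumption and not something verified against the site. `PREFIX` and `state_from_heading` are hypothetical names. Stripping a fixed prefix keeps multi-word names intact, whereas taking the last word would not.

```python
PREFIX = "Living Wage Calculation for "  # assumed fixed heading prefix


def state_from_heading(heading):
    """Return the state name from an h1 text, or None if the prefix is absent."""
    text = " ".join(heading.split())  # collapse stray whitespace and newlines
    if text.startswith(PREFIX):
        return text[len(PREFIX):]
    return None


print(state_from_heading("Living Wage Calculation for New Hampshire"))
```

Taking the names from the anchors on the base page avoids this dependency on the heading wording altogether.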
