Python
This is the website I'm trying to scrape: http://livingwage.mit.edu/

The specific URLs are:

    http://livingwage.mit.edu/states/01
    http://livingwage.mit.edu/states/02
    http://livingwage.mit.edu/states/04 (for some reason they skipped 03)
    ...all the way to...
    http://livingwage.mit.edu/states/56

On each of these URLs, I need the last row of the second table. Example for http://livingwage.mit.edu/states/01:

    Required annual income before taxes $20,260 $42,786 $51,642 $64,767 $34,325 $42,305 $47,345 $53,206 $34,325 $47,691 $56,934 $66,997

Desired output:

    Alabama $20,260 $42,786 $51,642 $64,767 $34,325 $42,305 $47,345 $53,206 $34,325 $47,691 $56,934 $66,997
    Alaska $24,070 $49,295 $60,933 $79,871 $38,561 $47,136 $52,233 $61,531 $38,561 $54,433 $66,316 $82,403
    ...
    ...
    Wyoming $20,867 $42,689 $52,007 $65,892 $34,988 $41,887 $46,983 $53,549 $34,988 $47,826 $57,391 $68,424

After two hours of messing around, this is what I have so far (I'm a beginner):

    import requests, bs4

    res = requests.get('http://livingwage.mit.edu/states/01')
    res.raise_for_status()
    states = bs4.BeautifulSoup(res.text)
    state_name = states.select('h1')
    table = states.find_all('table')[1]
    rows = table.find_all('tr', 'odd')[4:]
    result = []
    result.append(state_name)
    result.append(rows)

When I look at state_name and rows in the Python console, they give me HTML elements ([Living Wag...Alabama] for state_name, and likewise for rows).

You can get every state URL and state name from the anchor tags on the base page, then parse each state page in turn:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin  # python2 -> from urlparse import urljoin

    base = "http://livingwage.mit.edu"
    res = requests.get(base)
    res.raise_for_status()
    states = []

    # Get all state urls and state names from the anchor tags on the base page:
    # select the anchors inside each li that is a descendant of the
    # ul with the css classes "states" and "list-unstyled".
    for a in BeautifulSoup(res.text, "html.parser").select("ul.states.list-unstyled li a"):
        # The hrefs look like "/states/51/locations".
        # We want everything before /locations, so we split once on / from
        # the right -> "/states/51" and join that to the base url.
        # The anchor text also holds the state name, so we store the full url
        # and the state, i.e. ("http://livingwage.mit.edu/states/01", "Alabama").
        states.append((urljoin(base, a["href"].rsplit("/", 1)[0]), a.text))


    def parse(soup):
        # Get the second table. Indexing in css starts at 1,
        # so "table:nth-of-type(2)" gets the second table.
        table = soup.select_one("table:nth-of-type(2)")
        # The row we want has the css classes "odd results". To get the text,
        # we find its tds and call .text on each. "td + td" starts from the
        # second td, skipping the first one, which holds the label
        # *Required annual income before taxes*.
        return [td.text.strip() for td in table.select_one("tr.odd.results").select("td + td")]


    # Unpack the url and state from each tuple in our states list.
    for url, state in states:
        soup = BeautifulSoup(requests.get(url).content, "html.parser")
        print(state, parse(soup))

If you run the code, you will see output like the following:

    Alabama ['$21,144', '$43,213', '$53,468', '$67,788', '$34,783', '$41,847', '$46,876', '$52,531', '$34,783', '$48,108', '$58,748', '$70,014']
    Alaska ['$24,070', '$49,295', '$60,933', '$79,871', '$38,561', '$47,136', '$52,233', '$61,531', '$38,561', '$54,433', '$66,316', '$82,403']
    Arizona ['$21,587', '$47,153', '$59,462', '$78,112', '$36,332', '$44,913', '$50,200', '$58,615', '$36,332', '$52,483', '$65,047', '$80,739']
    Arkansas ['$19,765', '$41,000', '$50,887', '$65,091', '$33,351', '$40,337', '$45,445', '$51,377', '$33,351', '$45,976', '$56,257', '$67,354']
    California ['$26,249', '$55,810', '$64,262', '$81,451', '$42,433', '$52,529', '$57,986', '$68,826', '$42,433', '$61,328', '$70,088', '$84,192']
    Colorado ['$23,573', '$51,936', '$61,989', '$79,343', '$38,805', '$47,627', '$52,932', '$62,313', '$38,805', '$57,283', '$67,593', '$81,978']
    Connecticut ['$25,215', '$54,932', '$64,882', '$80,020', '$39,636', '$48,787', '$53,857', '$61,074', '$39,636', '$60,074', '$70,267', '$82,606']

You could instead loop over the numeric state IDs (01 to 56 in the URLs above, with gaps such as 03), but extracting the anchors from the base page also gives us the state name in the same step. Using the h1 from each state page would give you "Living Wage Calculation for Alabama", which you would then have to parse to get just the name. That is not trivial, since some states have names of more than one word.
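The URL handling and the one-line-per-state output requested in the question can be tried without any network access. The following is a minimal sketch using only the standard library; the helper names `state_url` and `format_row` are hypothetical (they do not appear in the original code), and the sample inputs are shaped like the hrefs and cell values shown above.

```python
from urllib.parse import urljoin

BASE = "http://livingwage.mit.edu"


def state_url(href, base=BASE):
    """Turn an href such as "/states/51/locations" into the state page URL."""
    # rsplit("/", 1) splits once from the right, so [0] drops the last
    # path segment ("locations") and keeps "/states/51".
    return urljoin(base, href.rsplit("/", 1)[0])


def format_row(state, values):
    """Join a state name and its cell values into one space-separated line."""
    return " ".join([state, *values])


if __name__ == "__main__":
    # Hypothetical sample inputs for illustration only.
    print(state_url("/states/01/locations"))
    print(format_row("Alabama", ["$21,144", "$43,213", "$53,468"]))
```

Passing each state name and its parsed list of cells through `format_row` instead of printing the list directly would give lines in the "Alabama $21,144 $43,213 ..." shape the question asks for.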