Advanced Usage of the Python Requests Library


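Requests is one of the most practical HTTP libraries in any programming language. It is simple, intuitive, and ubiquitous in the Python community. Most Python programs that talk to HTTP use either requests or urllib3, the library requests is built on. Neither ships with the standard library.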

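Thanks to its simple API, requests is productive right away, but the library also offers extensibility for advanced needs. If you are writing an API-heavy client or a web scraper, you will probably need tolerance for network failures, reliable debugging traces, and a few conveniences that cut down on boilerplate.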

Request hooks

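When working with a third-party API, you usually need to verify that the returned response is actually valid. Requests provides a simple helper for this, raise_for_status(), which asserts that the HTTP status code of the response is not 4xx or 5xx, that is, that the request did not result in a client or server error. If it did, the method raises requests.HTTPError.

For example:

    import requests

    response = requests.get('https://api.github.com/user/repos?page=1')
    # Assert that there were no errors
    response.raise_for_status()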
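The behaviour of raise_for_status() can be seen without any network access. The following is a minimal sketch, assuming a Response object built by hand in place of a real API call; the helper name check and the example.test URL are hypothetical and only serve the illustration.

```python
import requests


def check(status_code, reason):
    """Build a bare Response by hand and report what raise_for_status() does."""
    response = requests.Response()
    response.status_code = status_code
    response.reason = reason
    response.url = "https://api.example.test/user/repos"  # hypothetical URL
    try:
        response.raise_for_status()
    except requests.HTTPError as exc:
        # The exception keeps a reference to the offending response
        return f"raised: {exc} (status {exc.response.status_code})"
    return "ok"


print(check(200, "OK"))
print(check(401, "Unauthorized"))
print(check(503, "Service Unavailable"))
```

Only the 4xx and 5xx cases raise. Because the exception carries the response, an except block can still inspect the status code, headers, or body.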

Calling raise_for_status() after every request quickly becomes repetitive. Fortunately, requests provides a hooks interface that lets you attach callbacks to certain parts of the request process, so that every request sent from the same session object is checked.

We can use a hook to make sure raise_for_status() is called for every response object.

    import requests

    # Create a session whose response hook raises on HTTP errors
    http = requests.Session()

    def assert_status_hook(response, *args, **kwargs):
        response.raise_for_status()

    http.hooks["response"] = [assert_status_hook]

    http.get("https://api.github.com/user/repos?page=1")

    > HTTPError: 401 Client Error: Unauthorized for url: https://api.github.com/user/repos?page=1

Setting base URLs

There are two ways to specify URLs in requests:

1. Suppose you only use one API, hosted at api.org. You can spell out the full URL on every call:

    requests.get('https://api.org/list/')
    requests.get('https://api.org/list/3/item')

2. Install the requests_toolbelt library and use BaseUrlSession to set a base_url once:

    from requests_toolbelt import sessions

    http = sessions.BaseUrlSession(base_url="https://api.org")
    http.get("/list")
    http.get("/list/item")

Setting a default timeout

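The Requests documentation recommends setting a timeout on essentially every request. If your Python program is synchronous, forgetting to set a timeout can cause a request, and with it your application, to hang. Again there are two ways to set one:

1. Pass timeout on every call (not recommended, because it applies only to that single request):

    requests.get('https://github.com/', timeout=0.001)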

2. Use a Transport Adapter to set a uniform timeout. With a Transport Adapter we can give every HTTP call a default timeout, so a sensible limit applies even when a developer forgets to pass timeout=1 on an individual call, while each call can still override it. Below is a custom Transport Adapter with a default timeout: the constructor is overridden to accept the default, and send() is overridden to apply it whenever no timeout argument was given.

    from requests.adapters import HTTPAdapter

    DEFAULT_TIMEOUT = 5  # seconds

    class TimeoutHTTPAdapter(HTTPAdapter):
        def __init__(self, *args, **kwargs):
            # Remove our custom "timeout" argument before HTTPAdapter sees it
            self.timeout = kwargs.pop("timeout", DEFAULT_TIMEOUT)
            super().__init__(*args, **kwargs)

        def send(self, request, **kwargs):
            # Fall back to the adapter default only when the caller gave no timeout
            if kwargs.get("timeout") is None:
                kwargs["timeout"] = self.timeout
            return super().send(request, **kwargs)
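The fallback logic can be checked without sending anything over the network. This is a rough sketch, assuming unittest.mock is used to replace HTTPAdapter.send so that only the timeout handed to the parent class is observed; the helper effective_timeout and the example.invalid URL are hypothetical, and the adapter is repeated so the block runs on its own.

```python
from unittest import mock

import requests
from requests.adapters import HTTPAdapter

DEFAULT_TIMEOUT = 5  # seconds


class TimeoutHTTPAdapter(HTTPAdapter):
    # Same adapter as above, repeated so this sketch is self-contained
    def __init__(self, *args, **kwargs):
        self.timeout = kwargs.pop("timeout", DEFAULT_TIMEOUT)
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def effective_timeout(adapter, **call_kwargs):
    """Return the timeout the adapter hands to HTTPAdapter.send (no network I/O)."""
    request = requests.Request("GET", "https://example.invalid/").prepare()
    # Replace the parent send() with a mock so nothing is actually transmitted
    with mock.patch.object(HTTPAdapter, "send") as parent_send:
        adapter.send(request, **call_kwargs)
    return parent_send.call_args[1]["timeout"]


print(effective_timeout(TimeoutHTTPAdapter()))                          # module default
print(effective_timeout(TimeoutHTTPAdapter(timeout=2.5)))               # adapter default
print(effective_timeout(TimeoutHTTPAdapter(timeout=2.5), timeout=10))   # per-call override
```

The three calls cover the three cases the adapter is meant to handle: no timeout anywhere, a default set on the adapter, and an explicit per-call value that takes precedence.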

The adapter is then mounted on a session like this:

    import requests

    http = requests.Session()

    # Mount the adapter for both http and https
    adapter = TimeoutHTTPAdapter(timeout=2.5)
    http.mount("https://", adapter)
    http.mount("http://", adapter)

    # Uses the default timeout of 2.5 seconds
    response = http.get("https://api.twilio.com/")

    # Override the timeout for a specific request
    response = http.get("https://api.twilio.com/", timeout=10)

Retrying on failure

Network connections drop packets and get congested, and servers fail. To build a truly robust program, we have to expect failures and define a retry strategy.

Adding a retry strategy to the HTTP client is straightforward: we create an HTTPAdapter and pass our strategy to it.

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    retry_strategy = Retry(
        total=3,
        status_forcelist=[429, 500, 502, 503, 504],
        # this parameter was called method_whitelist before urllib3 1.26
        allowed_methods=["HEAD", "GET", "OPTIONS"],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)

    http = requests.Session()
    http.mount("https://", adapter)
    http.mount("http://", adapter)

    response = http.get("https://en.wikipedia.org/w/api.php")
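One practical wrinkle with the retry strategy is the name of the parameter that lists retryable methods: urllib3 1.26 renamed method_whitelist to allowed_methods, and urllib3 2.0 removed the old name. The following is a hedged sketch of one way to build the same strategy on either side of that change; the helper build_retry and the two constants are hypothetical names introduced for illustration.

```python
from urllib3.util.retry import Retry

RETRYABLE_METHODS = ["HEAD", "GET", "OPTIONS"]
RETRYABLE_STATUSES = [429, 500, 502, 503, 504]


def build_retry(total=3):
    """Create a Retry with whichever keyword the installed urllib3 accepts."""
    try:
        # urllib3 1.26 and later
        return Retry(total=total, status_forcelist=RETRYABLE_STATUSES,
                     allowed_methods=RETRYABLE_METHODS)
    except TypeError:
        # urllib3 older than 1.26 only knows the previous name
        return Retry(total=total, status_forcelist=RETRYABLE_STATUSES,
                     method_whitelist=RETRYABLE_METHODS)


retry_strategy = build_retry()

# is_retry() reports whether a given method and status code qualify for a retry
print(bool(retry_strategy.is_retry("GET", 503)))   # listed method, listed status
print(bool(retry_strategy.is_retry("POST", 503)))  # POST is not in the allowed methods
print(bool(retry_strategy.is_retry("GET", 404)))   # 404 is not in status_forcelist
```

Leaving POST out of the retryable methods is deliberate in strategies like this one, since repeating a non-idempotent request can apply the same change twice.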

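Other parameters:

Maximum number of retries: total=10
HTTP status codes that trigger a retry: status_forcelist=[413, 429, 503]
Request methods that may be retried: allowed_methods=["HEAD", "GET", "PUT", "DELETE", "OPTIONS", "TRACE"] (called method_whitelist before urllib3 1.26)
Factor controlling the delay between retries: backoff_factor=0

Combining timeouts and retries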

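Putting together what we have covered so far, timeouts and retries can be combined in a single adapter:

    # Assumes http, Retry and TimeoutHTTPAdapter from the earlier examples
    retries = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
    http.mount("https://", TimeoutHTTPAdapter(max_retries=retries))
    http.mount("http://", TimeoutHTTPAdapter(max_retries=retries))

Debugging HTTP requests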

If an HTTP request fails, there are two ways to get information about the failure:

1. Use the built-in debug logging
2. Use request hooks

Printing HTTP headers

Setting the debuglevel of http.client.HTTPConnection to any value greater than 0 prints the HTTP request and response headers. This is useful when the response body is too large, or is a byte stream, and is therefore impractical to log.

    import http.client

    import requests

    http.client.HTTPConnection.debuglevel = 1

    requests.get("https://www.google.com/")

    # Output
    send: b'GET / HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
    reply: 'HTTP/1.1 200 OK\r\n'
    header: Date: Fri, 28 Feb 2020 12:13:26 GMT
    header: Expires: -1
    header: Cache-Control: private, max-age=0

Printing everything

When the API response is not too large, we can combine request hooks with the dump utilities from requests_toolbelt to print the entire HTTP request and response.

    import requests
    from requests_toolbelt.utils import dump

    def logging_hook(response, *args, **kwargs):
        data = dump.dump_all(response)
        print(data.decode('utf-8'))

    http = requests.Session()
    http.hooks["response"] = [logging_hook]

    http.get("https://api.openaq.org/v1/cities", params={"country": "BA"})

    # Output (excerpt)
    > Content-Type: application/json; charset=utf-8
    > Transfer-Encoding: chunked
    > Connection: keep-alive
    >
    {
      "meta":{
        "name":"openaq-api",
        "license":"CC BY 4.0",
        "website":"https://docs.openaq.org/",
        "page":1,
        "limit":100,
        "found":1
      },
      "results":[
        {
          "country":"BA",
          "name":"Goražde",
          "city":"Goražde",
          "count":70797,
          "locations":1
        }
      ]
    }

Documentation for the dump utilities: https://toolbelt.readthedocs.io/en/latest/dumputils.html
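Where adding requests_toolbelt is not an option, a response hook can print a lighter summary using only attributes that requests itself exposes. The following is a minimal sketch that covers headers only, not bodies; the helper format_exchange, the "<" and ">" prefixes, and the example.test URL are assumptions made for illustration, and the demonstration uses a hand-built Response so it runs offline.

```python
import requests


def format_exchange(response):
    """Render the request line, status line, and both header sets as text."""
    request = response.request
    lines = [f"< {request.method} {request.url}"]
    lines += [f"< {name}: {value}" for name, value in request.headers.items()]
    lines.append(f"> {response.status_code} {response.reason}")
    lines += [f"> {name}: {value}" for name, value in response.headers.items()]
    return "\n".join(lines)


def logging_hook(response, *args, **kwargs):
    # Signature that requests uses when calling "response" hooks
    print(format_exchange(response))


# Offline demonstration on a hand-built response (hypothetical URL and headers)
demo = requests.Response()
demo.status_code = 200
demo.reason = "OK"
demo.headers["Content-Type"] = "application/json"
demo.request = requests.Request(
    "GET", "https://api.example.test/v1/cities", params={"country": "BA"}
).prepare()

logging_hook(demo)
```

On a real session the hook would be attached in the usual way, with http.hooks["response"] = [logging_hook].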

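Testing and mocking requests

When testing against a third-party API, it is not always possible to send real requests (for example, when the API is billed per call, or is not finished yet). In tests we can use getsentry/responses as a stub that intercepts the requests our program sends and returns predefined data, so the code under test behaves as if the call had succeeded.

    import unittest

    import requests
    import responses

    class TestAPI(unittest.TestCase):
        @responses.activate  # intercept HTTP calls within this method
        def test_simple(self):
            response_data = {
                "id": "ch_1GH8so2eZvKYlo2CSMeAfRqt",
                "object": "charge",
                "customer": {"id": "cu_1GGwoc2eZvKYlo2CL2m31GRn", "object": "customer"},
            }
            # mock the Stripe API
            responses.add(
                responses.GET,
                "https://api.stripe.com/v1/charges",
                json=response_data,
            )

            response = requests.get("https://api.stripe.com/v1/charges")
            self.assertEqual(response.json(), response_data)

Once interception is active, a request to any URL that has not been registered raises a ConnectionError.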

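Mimicking browser behavior

Some websites serve different HTML depending on the browser (to deter scraping or to adapt to the device). You can set the User-Agent header when sending requests so that the client presents itself as a specific browser.

    import requests

    http = requests.Session()
    http.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"
    })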
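To confirm which User-Agent a session would send without making a request, the request can be prepared through the session and its headers inspected. This is a small sketch; the example.test URL, the FIREFOX_UA constant, and the my-client/1.0 value are hypothetical.

```python
import requests

FIREFOX_UA = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) "
              "Gecko/20100101 Firefox/68.0")

http = requests.Session()
http.headers.update({"User-Agent": FIREFOX_UA})

# Prepare (but do not send) a request through the session
prepared = http.prepare_request(requests.Request("GET", "https://example.test/"))
print(prepared.headers["User-Agent"])

# A header passed on the individual request overrides the session default
override = http.prepare_request(
    requests.Request("GET", "https://example.test/",
                     headers={"User-Agent": "my-client/1.0"})
)
print(override.headers["User-Agent"])
```

Session-level headers act as defaults that are merged into every request, which is why setting the User-Agent once on the session is enough.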

Putting all of the above together, real-world usage looks like this:

    from requests.adapters import HTTPAdapter
    from requests_toolbelt import sessions
    from urllib3.util.retry import Retry


    class TimeoutHTTPAdapter(HTTPAdapter):
        def __init__(self, *args, **kwargs):
            self.timeout = 5
            if "timeout" in kwargs:
                self.timeout = kwargs["timeout"]
                del kwargs["timeout"]
            super().__init__(*args, **kwargs)

        def send(self, request, **kwargs):
            timeout = kwargs.get("timeout")
            if timeout is None:
                kwargs["timeout"] = self.timeout
            return super().send(request, **kwargs)


    class XxxxxXxxxx(object):
        def __init__(self, host, port, enable_https=True):
            # Connection settings (host and port are elided as xxxx in the original)
            self._enable_https = enable_https
            self.host = host
            self.port = port

        def _get_api_session(self, timeout=30):
            address_prefix = 'https://' if self._enable_https else 'http://'

            # Set the base URL
            sess = sessions.BaseUrlSession(base_url=f"{address_prefix}{self.host}:{self.port}")

            # Set the hook that raises on HTTP errors
            assert_status_hook = lambda response, *args, **kwargs: response.raise_for_status()
            sess.hooks["response"] = [assert_status_hook]

            # Retries plus default timeout
            retries = Retry(total=3, backoff_factor=1, status_forcelist=[429])
            sess.mount(address_prefix, TimeoutHTTPAdapter(max_retries=retries, timeout=timeout))
            return sess

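Summary: these are the advanced uses of the Python Requests library. They are useful in real code: whether you are developing an API client or writing automated tests, they make the code you write considerably more robust.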
