Two Ways to Capture XHR Response Bodies from Asynchronous Requests in Selenium


2023-12-11 14:00 | Source: compiled from the web

Contents

Reading XHR via the log

Simple usage example

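With asynchronous loading, the browser does not perform a full page load, so Selenium has nothing to wait for and moves straight on to the next statement. As a result, the script keeps running before the asynchronous response has come back.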
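One common way to handle this race is to poll for the result with a deadline instead of assuming it is already there. The following is a minimal sketch and not part of the original post; `wait_until` and `response_arrived` are hypothetical names, and the stand-in condition only simulates a response that shows up on the third check.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Stand-in condition (hypothetical): pretends the async result
# becomes available on the third poll.
attempts = []


def response_arrived():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(response_arrived, timeout=2.0, interval=0.01)
```

In a real script the condition would inspect the driver, for example by checking whether any XHR entries have appeared in the log yet. Selenium's own `WebDriverWait(driver, timeout).until(...)` from `selenium.webdriver.support.ui` follows the same polling idea.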

Method 1: Reading XHR via the log

Build the Chrome driver:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    chrome_options = Options()
    # --------------------------------------------------------------------
    chrome_options.add_argument("--allow-running-insecure-content")
    chrome_options.add_argument("--ignore-certificate-errors")
    chrome_options.add_argument("--disable-single-click-autofill")
    chrome_options.add_argument("--disable-autofill-keyboard-accessory-view")
    chrome_options.add_argument("--disable-full-form-autofill-ios")
    # Record Network domain events in the performance log; skip Page events.
    chrome_options.add_experimental_option('perfLoggingPrefs', {
        'enableNetwork': True,
        'enablePage': False,
    })
    # Enable the browser and performance logs. Recent Selenium 4 releases no
    # longer accept desired_capabilities, so the capability is set on the
    # options object instead.
    chrome_options.set_capability('goog:loggingPrefs', {
        'browser': 'ALL',
        'performance': 'ALL',
    })
    # --------------------------------------------------------------------
    driver = webdriver.Chrome(options=chrome_options)

Collect the XHR entries from the log:

    import json

    def get_xhr_logs(driver):
        """Return the requestIds of all XHR responses found in the driver logs."""
        log_xhr_array = []
        for typelog in driver.log_types:
            perfs = driver.get_log(typelog)
            for row in perfs:
                message_ = row['message']
                try:
                    log_json = json.loads(message_)
                    log = log_json['message']
                    if log['method'] == 'Network.responseReceived':
                        # Drop static JS, CSS, etc.; keep only XHR requests.
                        type_ = log['params']['type']
                        request_id = log['params']['requestId']
                        if type_.upper() == "XHR":
                            # log_xhr_array.append(log)
                            log_xhr_array.append(request_id)
                except (ValueError, KeyError, TypeError):
                    # Not a JSON performance-log entry (e.g. a browser console line).
                    pass
        return log_xhr_array

The "message" value parsed in the code above looks like this:

    {
        'method': 'Network.responseReceived',
        'params': {
            'frameId': '77E0FFEEDA6B3CE3ADACCD6133701429',
            'loaderId': 'DA184885509BC77DB2426FCDB768E5FA',
            'requestId': '5620.89',
            'response': {
                'connectionId': 512,
                'connectionReused': False,
                'encodedDataLength': 295,
                'fromDiskCache': False,
                'fromPrefetchCache': False,
                'fromServiceWorker': False,
                'headers': {
                    'access-control-allow-origin': '*',
                    'cache-control': 'no-cache',
                    'content-length': '271',
                    'content-type': 'application/json',
                    'date': 'Thu, 14 Apr 2022 08:15:24 GMT',
                    'via': '1.1 4fe583422d0b309b9b1d4505e54b137c.cloudfront.net (CloudFront)',
                    'x-amz-cf-id': 'bhkU5eqTsWXmJRXa1AUu2mto5kMsWoWR-ePxEFpXHeS3uUIRd-7seA==',
                    'x-amz-cf-pop': 'JFK51-C1',
                    'x-branch-request-id': '95066afcbce046c482bdea654034402a-2022041408',
                    'x-cache': 'Miss from cloudfront'
                },
                'mimeType': 'application/json',
                'protocol': 'h2',
                'remoteIPAddress': '192.154.249.210',
                'remotePort': 9000,
                'responseTime': 1649924123904.849,
                'securityDetails': {
                    'certificateId': 0,
                    'certificateTransparencyCompliance': 'unknown',
                    'cipher': 'AES_128_GCM',
                    'issuer': 'DigiCert TLS RSA SHA256 2020 CA1',
                    'keyExchange': '',
                    'keyExchangeGroup': 'X25519',
                    'protocol': 'TLS 1.3',
                    'sanList': ['*.branch.io', 'branch.io'],
                    'signedCertificateTimestampList': [],
                    'subjectName': '*.branch.io',
                    'validFrom': 1635292800,
                    'validTo': 1669593599
                },
                'securityState': 'secure',
                'status': 200,
                'statusText': '',
                'timing': {
                    'connectEnd': 1470.717,
                    'connectStart': 0.071,
                    'dnsEnd': -1,
                    'dnsStart': -1,
                    'proxyEnd': -1,
                    'proxyStart': -1,
                    'pushEnd': 0,
                    'pushStart': 0,
                    'receiveHeadersEnd': 2177.895,
                    'requestTime': 233026.325475,
                    'sendEnd': 1471.578,
                    'sendStart': 1471.22,
                    'sslEnd': 1470.707,
                    'sslStart': 961.743,
                    'workerFetchStart': -1,
                    'workerReady': -1,
                    'workerRespondWithSettled': -1,
                    'workerStart': -1
                },
                'url': 'https://api2.branch.io/v1/open'
            },
            'timestamp': 233028.504486,
            'type': 'XHR'
        }
    }
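Most of that structure is rarely needed. As a rough sketch, assuming a message shaped like the one shown above, the handful of fields that usually matter can be pulled out as follows. `summarize_response` is a hypothetical helper name, and the sample is a trimmed copy of the message above.

```python
def summarize_response(message):
    """Return the key fields of a Network.responseReceived message, or None."""
    if message.get('method') != 'Network.responseReceived':
        return None
    params = message['params']
    response = params['response']
    return {
        'requestId': params['requestId'],
        'type': params['type'],
        'url': response['url'],
        'status': response['status'],
        'mimeType': response['mimeType'],
    }


# Trimmed copy of the message shown above; fields not needed here are omitted.
sample = {
    'method': 'Network.responseReceived',
    'params': {
        'requestId': '5620.89',
        'response': {
            'mimeType': 'application/json',
            'status': 200,
            'url': 'https://api2.branch.io/v1/open',
        },
        'type': 'XHR',
    },
}

summary = summarize_response(sample)
print(summary)
```

Filtering on `url` or `mimeType` in addition to `type` is one way to narrow the list down to the single request of interest before asking for its body.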

The requestId can then be used to fetch the full response body:

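    def get_xhr_body(driver, requestId):
        # Ask Chrome DevTools for the body of the response with this requestId.
        response_body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': requestId})
        return response_body

Simple usage example

    import json
    import time
    from enum import Enum

    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    class LoginType(Enum):
        # Assumed definition; the original post does not show this class.
        Fail = 0
        Success = 1

    def submit_and_wait_for_login(driver):
        # Hypothetical wrapper name; the original shows only the function body.
        driver.find_element(by=By.XPATH, value='//*[@id="main"]/div[1]/form/button').send_keys(Keys.ENTER)
        response = None
        login_type = LoginType.Fail
        while True:
            ids = get_xhr_logs(driver)
            print('>> Waiting for the asynchronous load...')
            if ids:
                for request_id in ids:
                    try:
                        body = get_xhr_body(driver, request_id)
                        # Parse as JSON instead of eval(): eval() would execute the
                        # server's response as Python and fails on true/false/null.
                        response = json.loads(body['body'])
                        print(response)
                        if response.get('token'):
                            login_type = LoginType.Success
                            break
                    except Exception:
                        # The body may be unavailable or not a JSON object; try the next one.
                        pass
                break
            time.sleep(0.5)
        return login_type, response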
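The dictionary returned by `Network.getResponseBody` carries the payload in `body` plus a `base64Encoded` flag that is set when the body is base64-encoded, as it can be for binary content. The sketch below is one possible way to normalize that result, not the original author's code; `decode_cdp_body` is a hypothetical helper name.

```python
import base64
import json


def decode_cdp_body(result):
    """Turn a Network.getResponseBody result into JSON when possible, else text."""
    body = result.get('body', '')
    if result.get('base64Encoded'):
        # Binary or otherwise non-text bodies are delivered base64-encoded.
        body = base64.b64decode(body).decode('utf-8', errors='replace')
    try:
        return json.loads(body)
    except ValueError:
        # Not JSON (e.g. HTML or plain text): hand back the raw text.
        return body


print(decode_cdp_body({'body': '{"token": "abc", "ok": true}', 'base64Encoded': False}))
```

Returning the raw text for non-JSON bodies keeps the caller's loop simple: it can check `isinstance(value, dict)` before looking for a key such as `token`.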

Method 2: Using the open-source tool selenium-wire

GitHub: https://github.com/wkeeling/selenium-wire

It integrates seamlessly with Selenium and is very convenient to use.

Example code will be added later; in the meantime, see the project page.
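Until then, here is a minimal sketch of how selenium-wire might be used for the same task, based on its documented `driver.requests` list and `driver.wait_for_request()` method. It is not the original author's code: `json_responses` and `demo` are hypothetical names, the URLs are placeholders, and the stand-in objects only imitate the `url`, `response.headers` and `response.body` attributes of captured requests so that the helper can be tried without a browser.

```python
import json
import re
from types import SimpleNamespace


def json_responses(requests, url_part):
    """Return (url, parsed_json) pairs for captured requests whose URL contains url_part."""
    found = []
    for request in requests:
        response = request.response
        if response is None or url_part not in request.url:
            continue  # no response captured yet, or not the request we want
        if 'json' not in response.headers.get('Content-Type', ''):
            continue
        try:
            # Assumes an uncompressed body; compressed bodies need decoding
            # first (selenium-wire documents seleniumwire.utils.decode for that).
            found.append((request.url, json.loads(response.body)))
        except ValueError:
            continue
    return found


def demo(page_url, url_part):
    """Needs selenium-wire and Chrome installed; not called in this sketch."""
    from seleniumwire import webdriver  # drop-in replacement for selenium's webdriver
    driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        # wait_for_request treats its first argument as a regular expression.
        driver.wait_for_request(re.escape(url_part), timeout=10)
        return json_responses(driver.requests, url_part)
    finally:
        driver.quit()


# Offline stand-ins for captured requests, only to show the shape of the data.
fake_requests = [
    SimpleNamespace(url='https://example.test/static/app.js',
                    response=SimpleNamespace(headers={'Content-Type': 'text/javascript'},
                                             body=b'var a;')),
    SimpleNamespace(url='https://example.test/api/login',
                    response=SimpleNamespace(headers={'Content-Type': 'application/json'},
                                             body=b'{"token": "abc"}')),
    SimpleNamespace(url='https://example.test/api/pending', response=None),
]
found = json_responses(fake_requests, '/api/')
print(found)
```

Compared with Method 1, there is no log parsing and no requestId lookup: each captured request already carries its response, at the cost of routing browser traffic through selenium-wire's proxy.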


