
Using Python and Cookies to Fetch Information from a Chip Vendor's Website

2022-10-08 Category: Programming Knowledge

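Everyone is familiar with chips. During the current pandemic, the sharp drop in chip output for graphics cards and in-car head units has affected many people's purchases (you can't buy them anyway), and it has also made many people take a fresh look at the semiconductor industry. With some spare time on our hands, let's fetch the chip inventory and chip information from site T.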

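1. List Page Request Analysis

Open the page and the information we need is right there.

However, before the page finishes loading, something looks slightly off: different parts of the page load at different speeds.

So the data we need most likely does not come from the plain GET request for the page itself, and we have to look further. Sure enough, the Fetch/XHR tab of the developer tools shows one request whose response is exactly the content we need.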


This single URL returns all of the data with no pagination, so let's start requesting it.

2. List Page Request

Based on the URL above, send a GET request directly and parse the JSON. Here is the code:

    def getItemList():
        url = "https://www.xx.com.cn/selectiontool/paramdata/family/3658/results?lang=cn&output=json"
        headers = {
            'authority': 'www.xx.com.cn',
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        }
        # getRes is the author's own request helper (see the full code in section 6)
        res = getRes(url, headers, '', '', 'GET')
        nodes = res.json()['ParametricResults']
        for node in nodes:
            data = {}
            data["itemName"] = node["o3"]  # name
            data["inventory"] = node["p3318"]  # inventory
            data["price"] = node["p1130"]['multipair1']['l']  # price
            data["infoUrl"] = f"https://www.xx.com.cn/product/cn/{node['o1']}"  # detail URL
            print(data)

Looking at the JSON above, o3 is the product name, p3318 is the inventory, p1130 contains a price with its unit, and o1 is the part number, which can be used to build the detail-page URL.
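The field mapping can be tried offline on a small hand-made payload. The sketch below is only an illustration: the sample values and the helper name parse_items are made up, and the only things taken from the article are the key names (ParametricResults, o1, o3, p3318, p1130, multipair1, l). It uses .get() so that a node without a price does not raise KeyError.

```python
# Hypothetical sample shaped like the fields described above; all values are invented.
SAMPLE = {
    "ParametricResults": [
        {"o1": "PART-A", "o3": "Example amplifier", "p3318": 1200,
         "p1130": {"multipair1": {"l": "US$1.23"}}},
        {"o1": "PART-B", "o3": "Example regulator", "p3318": 0,
         "p1130": {}},  # this node has no price
    ]
}


def parse_items(payload, base_url="https://www.xx.com.cn/product/cn/"):
    """Map the short JSON keys to readable field names."""
    items = []
    for node in payload.get("ParametricResults", []):
        # Walk the nested price structure defensively; any missing level gives None.
        price = node.get("p1130", {}).get("multipair1", {}).get("l")
        items.append({
            "itemName": node.get("o3"),
            "inventory": node.get("p3318"),
            "price": price,
            "infoUrl": f"{base_url}{node.get('o1')}",
        })
    return items


items = parse_items(SAMPLE)
for item in items:
    print(item)
```

Whether real responses ever omit the price is an assumption; the defensive lookup simply costs nothing if they never do.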


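3. Detail Page Analysis

Now that we finally have the detail-page URLs, it is time to get the rest of the content.

Open the developer tools: there are no extra requests, just a single GET request that contains the content.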


So just request it directly. Here is the code:

    def getItemInfo(url):
        logger.info(f'Requesting detail url-{url}')
        headers = {
            'authority': 'www.xx.com.cn',
            'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
            'referer': 'https://www.xx.com.cn/product/cn/THS4541-DIE',
        }
        # getRes is the author's own request helper (see the full code in section 6)
        res = getRes(url, headers, '', '', 'GET')
        content = res.content.decode('utf-8')

However, the detail page we fetched does not look like the preview in the developer tools.

My first reaction was: uh-oh, this needs a cookie.

To dig further, clear the developer-tools log, clear the cookies, visit the detail URL again, and look at the All tab.

The detail-page URL, which I expected to be requested once, was actually requested twice, with an XHR request in between.

Previewing the first request shows that it is almost identical to what the local request returned.

So the difference lies in the second request. Looking further:

The second GET request differs from the first by a pile of cookies.

After trimming the cookies down, the key one turns out to be ak_bmsc, and it comes from the Set-Cookie response header of the preceding XHR request.
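Pulling one named cookie out of a Set-Cookie header can also be done with the standard library instead of a regular expression. The sketch below is an illustration on an invented header value; the helper name extract_cookie is hypothetical and is not part of the article's code.

```python
from http.cookies import SimpleCookie


def extract_cookie(set_cookie_header, name):
    """Return 'name=value' for the named cookie, or None if it is absent."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get(name)
    return f"{name}={morsel.value}" if morsel else None


# Invented header value, shaped like a typical Set-Cookie line.
header = "ak_bmsc=ABC123~def; Domain=.example.com; Path=/; Max-Age=7200; HttpOnly"
print(extract_cookie(header, "ak_bmsc"))
print(extract_cookie(header, "missing"))
```

One caveat: when a client library merges several Set-Cookie headers into a single string, parsing that string can be less reliable. With requests, reading the response's cookie jar (res.cookies) is an alternative.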

Now look at this XHR request and its URL.

It is a POST request, so start with the payload parameters.

Doesn't the bm-verify parameter look familiar? It is exactly what the first GET request returned. Below it there is also a pow parameter.

"pow": j. The variable j is defined just above it: it is i plus the integer obtained by concatenating two digit strings and converting the result to a number.

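The arithmetic behind pow is easy to check on a small sample. The page fragment below is invented for illustration, and the real page layout may differ. Only the two patterns (var i = ...; and Number("..." + "...");) follow the description above. Note that \d and the literal parentheses must be escaped in the regular expressions.

```python
import re

# Invented page fragment; only its shape follows the description in the article.
SAMPLE_PAGE = '''
var i = 1646903478;
var j = i + Number("7162" + "32193");
xhr.send(JSON.stringify({"bm-verify": "TOKEN-PLACEHOLDER", "pow": j}));
'''


def compute_pow(page_text):
    """Return i plus the integer formed by joining the quoted digit strings."""
    i_match = re.search(r'var i = (\d+);', page_text)
    n_match = re.search(r'Number\("(.*?)"\);', page_text)
    if not i_match or not n_match:
        return None
    # '7162" + "32193' becomes '716232193' once the separator is removed.
    joined = n_match.group(1).replace('" + "', '')
    return int(i_match.group(1)) + int(joined)


print(compute_pow(SAMPLE_PAGE))  # 1646903478 + 716232193 = 2363135671
```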

This series of requests returns the cookie that the final GET request needs. The explanation was rather fragmented, so here is the code:

    import json
    import re
    from loguru import logger

    # The detail page needs a cookie
    def getVerify(url):
        infourl = url
        headers = {
            'authority': 'www.xx.com.cn',
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        }
        proxies = getApiIp()  # fetch a proxy
        if proxies:
            # Request the detail page without a cookie to get bm-verify and pow
            res = getRes(infourl, headers, proxies, '', 'GET')
            if res:
                # ak_bmsc from the first response
                cookie = re.findall("ak_bmsc=.*?;", res.headers['set-cookie'])[0]
                # bm-verify
                verifys = re.findall('"bm-verify": "(.*?)"', res.text)[0]
                # Join the string pieces, convert to int, and add them to get pow
                a = re.findall(r'var i = (\d+);', res.text)[0]
                b = re.findall(r'Number\("(.*?)"\);', res.text)[0]
                b = int(b.replace('" + "', ''))
                pow_value = int(a) + b
                post_data = {
                    'bm-verify': verifys,
                    'pow': pow_value
                }
                # Serialize to JSON
                post_data = json.dumps(post_data)
                if verifys:
                    logger.info('First-request parameters obtained')
                    return post_data, proxies, cookie
                else:
                    print('Failed to extract verify')
            else:
                print('verify request failed')

    # Second request: call the verification URL with the parameters
    def getCookie(url):
        post_headers = {
            "authority": "www.xx.com.cn",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
            "accept": "*/*",
            "content-type": "application/json",
            "origin": "https://www.xx.com.cn",
            "referer": url,
        }
        verify = getVerify(url)
        if not verify:  # getVerify returns None on failure
            return None
        post_data, proxies, c_cookie = verify
        post_headers['Cookie'] = c_cookie
        posturl = "https://www.xx.com.cn/_sec/verify?provider=interstitial"
        check = getRes(posturl, post_headers, proxies, post_data, 'POST')
        if check:
            # Take the ak_bmsc cookie from the response headers
            cookie = check.headers['Set-Cookie']
            cookie = re.findall("ak_bmsc=.*?;", cookie)[0]
            if cookie:
                logger.info('Cookie obtained')
                return cookie, proxies
            else:
                print('Failed to extract cookie')
        else:
            print('Cookie request failed')

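A brief summary of the detail-page request flow:

First request: obtain the parameters bm-verify and pow and the cookie that the following POST request needs (the getVerify function).

Second request: send the POST request with those values and take ak_bmsc from the Set-Cookie response header (the getCookie function).

Remember that, of the three requests involved, the second and third must each carry the ak_bmsc from the previous request as their cookie: the second request needs the first response's ak_bmsc, and the final request needs the second response's ak_bmsc.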
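The order in which the cookies are handed over can be sketched without touching the network. In the sketch below the transport functions are fakes, and every name (fetch_with_cookie_chain, fake_get, fake_post, the dict-shaped responses) is hypothetical. It only illustrates which cookie accompanies which request.

```python
def fetch_with_cookie_chain(url, do_get, do_post, build_payload):
    """Run the three requests, passing each response's cookie to the next one."""
    # Request 1: GET without a cookie, which yields the challenge page and a first cookie.
    first = do_get(url, cookie=None)
    # Request 2: POST the derived payload with the first cookie, which yields a second cookie.
    second = do_post(build_payload(first["body"]), cookie=first["cookie"])
    # Request 3: GET the same URL with the second cookie to receive the real page.
    final = do_get(url, cookie=second["cookie"])
    return final["body"]


calls = []  # records (method, cookie) so the order can be inspected


def fake_get(url, cookie):
    calls.append(("GET", cookie))
    if cookie is None:
        return {"body": "challenge", "cookie": "ak_bmsc=first"}
    return {"body": "detail page", "cookie": None}


def fake_post(payload, cookie):
    calls.append(("POST", cookie))
    return {"body": "", "cookie": "ak_bmsc=second"}


page = fetch_with_cookie_chain(
    "https://www.xx.com.cn/product/cn/PART-A",
    fake_get,
    fake_post,
    lambda body: {"bm-verify": body},
)
print(page)
print(calls)
```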

4. Detail Page Request

    from loguru import logger
    from scrapy import Selector

    def getItemInfo(url):
        logger.info(f'Requesting detail url-{url}')
        result = getCookie(url)
        if not result:  # getCookie returns None on failure
            return None
        cookie, proxies = result
        headers = {
            'authority': 'www.xx.com.cn',
            'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
            'referer': 'https://www.xx.com.cn/product/cn/THS4541-DIE',
            'cookie': cookie
        }
        res = getRes(url, headers, proxies, '', 'GET')
        content = res.content.decode('utf-8')
        sel = Selector(text=content)
        # The tab titles are the Chinese labels on the page: 参数 (parameters), 特性 (features), 描述 (description)
        Parameters = sel.xpath('//ti-tab-panel[@tab-title="参数"]/ti-view-more/div').extract_first()
        Features = sel.xpath('//ti-tab-panel[@tab-title="特性"]/ti-view-more/div').extract_first()
        Description = sel.xpath('//ti-tab-panel[@tab-title="描述"]/ti-view-more').extract_first()
        if Parameters and Features and Description:
            return Parameters, Features, Description

With the cookie obtained in the previous step, requesting the detail URL again returns the real content, which can then be parsed with XPath to extract what we need.
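The XPath lookups can be tried on a small hand-written fragment. The article's code uses scrapy's Selector. This sketch substitutes the standard library's xml.etree.ElementTree, which supports only a limited XPath subset and needs well-formed markup, so it is an illustration and not a drop-in replacement. The sample markup and the helper name panel_text are invented.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed fragment that mimics the tab-panel structure.
SAMPLE = """
<root>
  <ti-tab-panel tab-title="特性"><ti-view-more><div>Feature text</div></ti-view-more></ti-tab-panel>
  <ti-tab-panel tab-title="描述"><ti-view-more>Description text</ti-view-more></ti-tab-panel>
</root>
""".strip()

root = ET.fromstring(SAMPLE)


def panel_text(tree, title, tail=""):
    """Return the text under the panel with the given tab-title, or None."""
    node = tree.find(f".//ti-tab-panel[@tab-title='{title}']/ti-view-more{tail}")
    if node is None:
        return None
    return "".join(node.itertext()).strip()


print(panel_text(root, "特性", "/div"))
print(panel_text(root, "描述"))
print(panel_text(root, "参数", "/div"))  # no such panel in the sample
```

Returning None for a missing panel makes it easy for the caller to skip incomplete pages instead of failing on them.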

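5. Proxy Settings

Site T requires more than 100 detail-page requests with cookies. If you keep sending them from your local IP, it may get blocked and the data can no longer be fetched. For efficient requests, a good and stable proxy IP service is essential. I used ipidea proxies to request site T here, and the data came back quickly.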

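Address: http://www.ipidea.net/?utm-source=csdn&utm-keyword=?wb (first-time users get some free traffic). This time I fetch proxies through the API; the code is as follows:

    import requests

    # Get a proxy IP from the API
    def getApiIp():
        # Fetch exactly one IP
        api_url = 'http://tiqu.ipidea.io:81/abroad?num=1&type=2&lb=1&sb=0&flow=1&regions=&port=1'
        try:
            res = requests.get(api_url, timeout=5)
            if res.status_code == 200:
                api_data = res.json()['data'][0]
                proxies = {
                    'http': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                    'https': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                }
                print(proxies)
                return proxies
            else:
                print('Failed to get proxy')
        except (requests.RequestException, KeyError, IndexError, ValueError):
            # Network error, or a response that does not have the expected shape
            print('Failed to get proxy')
        return None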
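The step that turns the API response into a proxies mapping can be checked separately from the network call. The sketch below assumes the response shape implied by the code above (a data list whose entries have ip and port). The sample values and the helper name build_proxies are invented.

```python
def build_proxies(api_json):
    """Build a requests-style proxies dict from the first entry, or return None."""
    entries = api_json.get("data") or []
    if not entries:
        return None
    entry = entries[0]
    address = f"http://{entry['ip']}:{entry['port']}"
    # The same HTTP proxy address is used for both schemes, as in the article's code.
    return {"http": address, "https": address}


sample = {"data": [{"ip": "203.0.113.7", "port": 8080}]}  # documentation-range IP
print(build_proxies(sample))
print(build_proxies({"data": []}))
```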

6. Full Code

    # coding=utf-8
    import requests
    from scrapy import Selector
    import re
    import json
    from loguru import logger

    # Get a proxy IP from the API
    def getApiIp():
        # Fetch exactly one IP
        api_url = '<proxy API URL>'  # replace with your own proxy API address
        try:
            res = requests.get(api_url, timeout=5)
            if res.status_code == 200:
                api_data = res.json()['data'][0]
                proxies = {
                    'http': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                    'https': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                }
                print(proxies)
                return proxies
            else:
                print('Failed to get proxy')
        except (requests.RequestException, KeyError, IndexError, ValueError):
            print('Failed to get proxy')
        return None

    def getItemList():
        url = "https://www.xx.com.cn/selectiontool/paramdata/family/3658/results?lang=cn&output=json"
        headers = {
            'authority': 'www.xx.com.cn',
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        }
        proxies = getApiIp()
        if proxies:
            res = getRes(url, headers, proxies, '', 'GET')
            nodes = res.json()['ParametricResults']
            for node in nodes:
                data = {}
                data["itemName"] = node["o3"]  # name
                data["inventory"] = node["p3318"]  # inventory
                data["price"] = node["p1130"]['multipair1']['l']  # price
                data["infoUrl"] = f"https://www.ti.com.cn/product/cn/{node['o1']}"  # detail URL
                info = getItemInfo(data["infoUrl"])
                if info:  # getItemInfo returns None when a field is missing
                    data['Parameters'], data['Features'], data['Description'] = info
                print(data)

    # The detail page needs a cookie
    def getVerify(url):
        infourl = url
        headers = {
            'authority': 'www.xx.com.cn',
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        }
        proxies = getApiIp()
        if proxies:
            # Request the detail page to get bm-verify and pow
            res = getRes(infourl, headers, proxies, '', 'GET')
            if res:
                # ak_bmsc from the first response
                cookie = re.findall("ak_bmsc=.*?;", res.headers['set-cookie'])[0]
                # bm-verify
                verifys = re.findall('"bm-verify": "(.*?)"', res.text)[0]
                # Convert the strings to int and add them to get pow
                a = re.findall(r'var i = (\d+);', res.text)[0]
                b = re.findall(r'Number\("(.*?)"\);', res.text)[0]
                b = int(b.replace('" + "', ''))
                pow_value = int(a) + b
                post_data = {
                    'bm-verify': verifys,
                    'pow': pow_value
                }
                # Serialize to JSON
                post_data = json.dumps(post_data)
                if verifys:
                    logger.info('First-request parameters obtained')
                    return post_data, proxies, cookie
                else:
                    print('Failed to extract verify')
            else:
                print('verify request failed')

    # Second request: call the verification URL with the parameters
    def getCookie(url):
        post_headers = {
            "authority": "www.xx.com.cn",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
            "accept": "*/*",
            "content-type": "application/json",
            "origin": "https://www.xx.com.cn",
            "referer": url,
        }
        verify = getVerify(url)
        if not verify:  # getVerify returns None on failure
            return None
        post_data, proxies, c_cookie = verify
        post_headers['Cookie'] = c_cookie
        posturl = "https://www.xx.com.cn/_sec/verify?provider=interstitial"
        check = getRes(posturl, post_headers, proxies, post_data, 'POST')
        if check:
            # Take the ak_bmsc cookie from the response headers
            cookie = check.headers['Set-Cookie']
            cookie = re.findall("ak_bmsc=.*?;", cookie)[0]
            if cookie:
                logger.info('Cookie obtained')
                return cookie, proxies
            else:
                print('Failed to extract cookie')
        else:
            print('Cookie request failed')

    def getItemInfo(url):
        logger.info(f'Requesting detail url-{url}')
        result = getCookie(url)
        if not result:  # getCookie returns None on failure
            return None
        cookie, proxies = result
        headers = {
            'authority': 'www.xx.com.cn',
            'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
            'referer': 'https://www.xx.com.cn/product/cn/THS4541-DIE',
            'cookie': cookie
        }
        res = getRes(url, headers, proxies, '', 'GET')
        content = res.content.decode('utf-8')
        sel = Selector(text=content)
        # The tab titles are the Chinese labels on the page: 参数 (parameters), 特性 (features), 描述 (description)
        Parameters = sel.xpath('//ti-tab-panel[@tab-title="参数"]/ti-view-more/div').extract_first()
        Features = sel.xpath('//ti-tab-panel[@tab-title="特性"]/ti-view-more/div').extract_first()
        Description = sel.xpath('//ti-tab-panel[@tab-title="描述"]/ti-view-more').extract_first()
        if Parameters and Features and Description:
            return Parameters, Features, Description

    # Request helper: try up to three times through a proxy, return None if all attempts fail
    def getRes(url, headers, proxies, post_data, method):
        for i in range(3):
            # Use the given proxy, or fetch a fresh one for each attempt
            current = proxies or getApiIp()
            try:
                if method == 'POST':
                    res = requests.post(url, headers=headers, data=post_data, proxies=current)
                else:
                    res = requests.get(url, headers=headers, proxies=current)
                if res:
                    return res
            except requests.RequestException:
                print(f'Request attempt {i + 1} failed')
        return None

    if __name__ == '__main__':
        getItemList()

With the steps above, we can now obtain the content we need.

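Summary

The only difficulty in fetching data from site T is the cookie for the detail page (it is not really that hard, the cookie is just long and tiring to read). Once the request flow is sorted out, the rest is just sending the requests. A stable, efficient IP proxy makes the job much easier, and rotating proxies obtained through an API are also less likely to be blocked by the site, so the data can be fetched more reliably. When trimming the cookie down, a suitable request tool such as postman or burp makes things more convenient.

That's the whole process. The explanation is a bit long-winded; if you spot mistakes or know a better approach, corrections are welcome!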
