Python Web Scraping Essentials: A Beginner's Guide to Beautiful Soup, with a Hands-On Summary!

Reader submission 1248 2022-05-29

1. Introduction

2. Parsing library

3. Walkthrough

3.1. Tag (tag selectors)

3.2. Standard selectors (find, find_all)

3.2.1. find_all()

3.2.2. find()

3.3. Select selectors

4. Hands-on practice

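1. Introduction

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It can save you hours or even days of work.

2. Parsing library

Beautiful Soup is a flexible, convenient web-page parsing library. It is efficient and supports multiple parsers.

With it, you can extract information from web pages without writing regular expressions.

3. Walkthrough

3.1. Tag (tag selectors)

==Selecting elements==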

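    from bs4 import BeautifulSoup

    # Sample document (the "three sisters" example from the Beautiful Soup docs)
    html = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>

    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
    '''

    # Parse the page source with BeautifulSoup.
    # The parser used here is html.parser from the Python standard library.
    soup = BeautifulSoup(html, "html.parser")

    # Get the <title> tag from the HTML
    print(soup.title)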

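Note: by default this matches only the first tag with that name. If the document contains several tags with the same name and you want one of the later ones, you can locate it by its class value or by other means, which are covered one by one below.

==Getting the name==

    print(soup.title.name)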

==Getting attributes==
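A minimal sketch of attribute access, assuming a small fragment modeled on the three-sisters sample (the class name, id, and URL are illustrative assumptions). A tag's attributes behave like a dictionary: `.attrs` returns all of them, while square brackets or `.get()` read a single one.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the class name, id, and URL are illustrative.
html = """
<p class="title"><b>The Dormouse's story</b></p>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
"""
soup = BeautifulSoup(html, "html.parser")

p_attrs = soup.p.attrs              # dict of every attribute on the first <p>
p_class = soup.p["class"]           # class is multi-valued, so this is a list
link_href = soup.a["href"]          # raises KeyError if the attribute is absent
link_title = soup.a.get("title")    # .get() returns None instead of raising

print(p_attrs)
print(p_class)
print(link_href)
print(link_title)
```

Using `.get()` is usually the safer choice when an attribute may be missing on some tags.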

==Getting the content==
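A short sketch of reading a tag's text, again on assumed sample markup. `.string` works when a tag has exactly one child; for tags with mixed children it returns `None`, and `get_text()` is the more forgiving option.

```python
from bs4 import BeautifulSoup

# Assumed sample markup for illustration.
html = """
<html><head><title>The Dormouse's story</title></head>
<body><p class="story">Once upon a time <b>there</b> were three little sisters</p></body></html>
"""
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string    # the single text child of <title>
p_string = soup.p.string          # None, because <p> has several children
p_text = soup.p.get_text()        # concatenates all text inside <p>

print(title_text)
print(p_string)
print(p_text)
```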

==Nested selection==
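Tag names can be chained, because each attribute access returns another Tag. The sketch below assumes the same kind of sample markup as the earlier examples.

```python
from bs4 import BeautifulSoup

# Assumed sample markup for illustration.
html = """
<html><head><title>The Dormouse's story</title></head>
<body><p class="title"><b>The Dormouse's story</b></p></body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Each step narrows the search to the first matching tag inside the previous one.
head_title = soup.head.title.string
bold_text = soup.body.p.b.string

print(head_title)
print(bold_text)
```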

==Child nodes==

A tag's .contents attribute returns the tag's child nodes as a list.

A tag's .children generator lets you loop over the tag's child nodes.

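    from bs4 import BeautifulSoup

    # Sample document (the "three sisters" example from the Beautiful Soup docs)
    html = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>

    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
    '''

    soup = BeautifulSoup(html, "html.parser")

    # .contents returns the children of the first <p> as a list
    print(soup.p.contents)
    print("=" * 30)

    # .children yields the same children one at a time
    for i in soup.p.children:
        print(i)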

==Parent nodes==

Use the .parent attribute to get an element's parent node.

Use the .parents attribute to walk up through all of an element's ancestors in turn.
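A small sketch of both attributes, on an assumed fragment with one link inside a paragraph. Note that the last ancestor is the BeautifulSoup object itself, whose name is `[document]`.

```python
from bs4 import BeautifulSoup

# Assumed sample markup for illustration.
html = '<html><body><p class="story"><a id="link1">Elsie</a></p></body></html>'
soup = BeautifulSoup(html, "html.parser")

link = soup.a
parent_name = link.parent.name                                # direct parent only
ancestor_names = [ancestor.name for ancestor in link.parents]  # innermost first

print(parent_name)
print(ancestor_names)
```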

==Sibling nodes==
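A sketch of sibling navigation with `.next_sibling`, `.previous_sibling`, and their plural generator forms. The fragment is an assumption, written without whitespace between the links on purpose: in real pages the whitespace between tags is itself a text node and shows up as a sibling.

```python
from bs4 import BeautifulSoup

# Assumed markup; no whitespace between tags, so every sibling is a Tag.
html = '<p><a id="link1">Elsie</a><a id="link2">Lacie</a><a id="link3">Tillie</a></p>'
soup = BeautifulSoup(html, "html.parser")

middle = soup.find(id="link2")
next_id = middle.next_sibling["id"]
previous_id = middle.previous_sibling["id"]

# The plural forms are generators over all following / preceding siblings.
following_ids = [tag["id"] for tag in soup.a.next_siblings]
preceding_ids = [tag["id"] for tag in soup.find(id="link3").previous_siblings]

print(next_id, previous_id)
print(following_ids)
print(preceding_ids)
```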

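3.2. Standard selectors (find, find_all)

3.2.1. find_all()

find_all( name , attrs , recursive , string , **kwargs )

The find_all() method searches all tag descendants of the current tag and returns those that match the filter conditions.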
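A sketch of the `name`, `limit`, and `recursive` arguments, assuming a trimmed version of the three-sisters sample (the markup and URLs are illustrative assumptions).

```python
from bs4 import BeautifulSoup

# Assumed sample markup for illustration.
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")                  # every <a> tag
links_and_bold = soup.find_all(["a", "b"])  # a list matches any of the names
first_two = soup.find_all("a", limit=2)     # stop after two matches

# recursive=False only looks at direct children; <title> sits inside <head>,
# so it is not a direct child of <html>.
direct_titles = soup.html.find_all("title", recursive=False)

print(len(links), [tag.name for tag in links_and_bold], len(first_two), direct_titles)
```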

==Keyword arguments==

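If a keyword argument's name is not one of the built-in search parameters, the search treats it as a filter on the tag attribute with that name. For example, if you pass an argument named id, Beautiful Soup filters against each tag's "id" attribute.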
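A sketch of keyword filters on assumed sample markup. Because `class` is a reserved word in Python, Beautiful Soup uses `class_` for it; a compiled regular expression or `True` can also be passed as the value.

```python
import re

from bs4 import BeautifulSoup

# Assumed sample markup for illustration.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

by_id = soup.find_all(id="link2")                     # exact attribute value
by_class = soup.find_all(class_="sister")             # class_ avoids the keyword
by_pattern = soup.find_all(href=re.compile("elsie"))  # regex against href
has_id = soup.find_all(id=True)                       # any tag that has an id

print(by_id)
print(len(by_class), by_pattern[0]["id"], len(has_id))
```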

==Custom attribute lookup: attrs==
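Some attribute names cannot be written as Python keyword arguments, for example `data-*` attributes (which contain a hyphen) or an attribute called `name` (which clashes with the first parameter of find_all). A sketch of passing them through the `attrs` dictionary instead, on a hypothetical fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical markup chosen to show attributes that need the attrs dict.
html = """
<div data-foo="value">foo!</div>
<div data-foo="other">bar</div>
<input name="email"/>
"""
soup = BeautifulSoup(html, "html.parser")

exact = soup.find_all(attrs={"data-foo": "value"})       # exact value match
any_value = soup.find_all("div", attrs={"data-foo": True})  # attribute present
by_name_attr = soup.find_all(attrs={"name": "email"})    # the HTML name attribute

print(exact)
print(len(any_value))
print(by_name_attr[0].name)
```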

3.2.2. find()

find( name , attrs , recursive , string , **kwargs )

find returns a single element (the first match), while find_all returns all matching elements.
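A sketch contrasting the two return types on assumed markup. The difference matters most when nothing matches: find gives back `None`, find_all gives back an empty list.

```python
from bs4 import BeautifulSoup

# Assumed sample markup for illustration.
html = """
<p class="story">
<a class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a")          # a single Tag
all_links = soup.find_all("a")       # a list-like ResultSet of Tags

missing_one = soup.find("table")     # None when nothing matches
missing_all = soup.find_all("table") # empty list when nothing matches

print(first_link["id"], len(all_links), missing_one, list(missing_all))
```

Because of this, check the result of find for `None` before chaining further calls on it.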

3.3. Select selectors

==select==

select returns every element that matches the CSS selector.

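    from bs4 import BeautifulSoup

    # Sample document (the "three sisters" example from the Beautiful Soup docs)
    html = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>

    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
    '''

    soup = BeautifulSoup(html, "html.parser")

    # Descendant selectors: every <b> inside a <p>, every <a> inside a <p>,
    # and the <title> inside <head>
    print(soup.select("p b"))
    print(soup.select("p a"))
    print(soup.select("head title"))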

==select_one==

select_one returns only the first element that matches the selector.
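A sketch of select_one with class and id selectors, on assumed sample markup. Like find, it returns `None` when the selector matches nothing.

```python
from bs4 import BeautifulSoup

# Assumed sample markup for illustration.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.select_one("p.story a")  # first <a> inside <p class="story">
by_id = soup.select_one("#link2")          # CSS id selector
missing = soup.select_one("p.story table") # no match

print(first_link["id"], by_id.get_text(), missing)
```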

4. Hands-on practice

This exercise uses the Baidu home page as an example.

    import requests
    from bs4 import BeautifulSoup

    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36"
    }
    url = "https://www.baidu.com"

    response = requests.get(url=url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    # Get every tag whose class is "mnav c-font-normal c-color-t" and loop over them
    divs = soup.find_all(class_="mnav c-font-normal c-color-t")
    for div in divs:
        print(div)
        print("=" * 40)

The matching tags are retrieved successfully.

Next, get the URL and text of each module.

    for div in divs:
        print(div['href'])
        print(div.text)

    import requests
    from bs4 import BeautifulSoup

    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36"
    }
    url = "https://www.baidu.com"

    response = requests.get(url=url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    # Method 1: use .contents to get the child nodes
    a_data = soup.find(class_="hot-title").contents
    print(a_data[0].text)

    # Method 2: locate the element by its class with find, then call find
    # again to get the div beneath it, which is the one we need
    a_data2 = soup.find(class_="hot-title").find("div")
    print(a_data2.text)
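Live pages change often, and find returns `None` when nothing matches, so chained calls like the ones above can raise AttributeError if the page layout shifts. Here is a defensive sketch that runs offline. `sample_page`, `extract_links`, and `hot_title_text` are hypothetical names, and the markup is a stand-in, not Baidu's actual HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; the real markup may differ.
sample_page = """
<div id="s-top-left">
  <a class="mnav c-font-normal c-color-t" href="http://news.baidu.com">News</a>
  <a class="mnav c-font-normal c-color-t" href="https://map.baidu.com">Map</a>
</div>
"""


def extract_links(page_html):
    """Return (href, text) pairs for the navigation links, or [] if none match."""
    soup = BeautifulSoup(page_html, "html.parser")
    links = []
    for tag in soup.select("a.mnav"):
        href = tag.get("href")  # None if the attribute is missing
        if href:
            links.append((href, tag.get_text(strip=True)))
    return links


def hot_title_text(page_html):
    """Return the text of the first .hot-title element, or None if absent."""
    soup = BeautifulSoup(page_html, "html.parser")
    node = soup.find(class_="hot-title")
    return node.get_text(strip=True) if node is not None else None


nav_links = extract_links(sample_page)
hot_title = hot_title_text(sample_page)
print(nav_links)
print(hot_title)
```

In a real run, `response.text` from requests would be passed in place of `sample_page`.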
