Using BeautifulSoup to generate post summaries and filter malicious tags (XSS attacks)

1. The BeautifulSoup module
2. Post summary
3. Filtering malicious tags
 
 
1. The BeautifulSoup module
pip install beautifulsoup4  # install BeautifulSoup (the package is imported as bs4)
 
from bs4 import BeautifulSoup  # import BeautifulSoup
 
2. Post summary
from bs4 import BeautifulSoup
 
content = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(content, 'html.parser')
overview = soup.text[0:9]  # strip the tags and keep the first 9 characters of plain text
print(overview)
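 
A fixed-length slice can cut the text mid-word and leaves no hint that it was truncated. The sketch below wraps the same slicing idea in a small helper that collapses whitespace and appends an ellipsis when the text is shortened. The names make_overview, max_chars, and suffix are hypothetical and introduced only for illustration; they are not part of bs4.

```python
from bs4 import BeautifulSoup


def make_overview(html, max_chars=9, suffix="..."):
    """Return a plain-text summary of at most max_chars characters (plus suffix)."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() joins all text nodes; split/join collapses runs of whitespace
    text = " ".join(soup.get_text().split())
    if len(text) <= max_chars:
        return text  # short enough, no truncation marker needed
    # Drop a trailing space left by the cut before appending the suffix
    return text[:max_chars].rstrip() + suffix


content = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
print(make_overview(content))                 # truncated summary
print(make_overview(content, max_chars=100))  # full text, no suffix
```

In a real blog, max_chars would likely be much larger (for example 150); the value 9 here only mirrors the slice used above.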
 
3. Filtering malicious tags
from bs4 import BeautifulSoup
 
content = '<a href="http://example.com/">I linked to <i>example.com</i></a><div><img src=""></img>image</div><a>link</a><script>alert(123)</script>'
soup = BeautifulSoup(content, 'html.parser')
print(soup)  # the output still contains the script tag and its code
 
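# find_all accepts a list of tag names, so only the unwanted tags are visited
for tag in soup.find_all(['script', 'link']):
    tag.decompose()  # remove the tag and everything inside it
 
print(soup)  # the script tag and its code have now been removed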
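 
Removing a fixed list of tags leaves other XSS vectors in place, such as an onclick attribute or a javascript: URL in href. A common alternative is a whitelist: keep only known-safe tags and attributes and drop everything else. The following is a minimal sketch of that idea, not the approach used above. ALLOWED_TAGS and sanitize are hypothetical names, and the whitelist contents are an assumption to be adjusted per site.

```python
from bs4 import BeautifulSoup

# Hypothetical whitelist: tag name -> set of attributes allowed on that tag
ALLOWED_TAGS = {
    "a": {"href"},
    "i": set(),
    "div": set(),
    "img": {"src"},
    "p": set(),
}


def sanitize(html):
    """Return html with non-whitelisted tags and attributes removed."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(True):  # True matches every tag
        if tag.name not in ALLOWED_TAGS:
            # extract() detaches the tag and its children from the tree
            tag.extract()
            continue
        allowed = ALLOWED_TAGS[tag.name]
        cleaned = {}
        for key, value in tag.attrs.items():
            if key not in allowed:
                continue  # e.g. onclick, onerror, style
            if isinstance(value, str) and value.strip().lower().startswith("javascript:"):
                continue  # drop script URLs in href/src
            cleaned[key] = value
        tag.attrs = cleaned
    return str(soup)


dirty = ('<a href="http://example.com/" onclick="steal()">I linked to '
         '<i>example.com</i></a><script>alert(123)</script>')
print(sanitize(dirty))
```

For production use, a maintained HTML sanitizing library is usually a safer choice than a hand-written filter, since there are many more vectors than this sketch covers.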