BeautifulSoup Basics

Date: 2019-12-19

Tags: beautifulsoup, basics


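from bs4 import BeautifulSoup

# html_doc is the HTML string to parse
soup = BeautifulSoup(html_doc, features='lxml')

tag1 = soup.find(name='a')        # first <a> tag, returned as a Tag object
tag2 = soup.find_all(name='a')    # all <a> tags, returned as a list of Tag objects
tag3 = soup.select_one('#link2')  # the tag with id="link2" (select() would return a list instead)

name = tag3.name                      # get the tag name
attrs = tag3.attrs                    # get the attributes, returned as a dict
tag3.attrs['href'] = 'www.baidu.com'  # modify or add an attribute
del tag3.attrs['href']                # delete an attribute

# Check whether a child is a tag object or text
from bs4.element import Tag
tags = soup.find('body').children
for tag in tags:
    if isinstance(tag, Tag):
        print(tag)
    else:
        print('text...')

# children: the direct children of body
# descendants: every nested descendant of body
body = soup.find('body')
v = body.descendants

# clear: removes everything inside the tag (the body tag itself is kept)
soup.find('body').clear()
print(soup)

# decompose: removes the tag together with everything inside it (the body tag itself is removed too)
soup.find('body').decompose()
print(soup)

# extract: removes the tag together with everything inside it, and returns the removed tag (similar to pop)

# find_all
v = soup.find_all(name=['a', 'div'])     # all <a> tags and all <div> tags
v = soup.find_all(id=['link1', 'link2'])  # all tags with id="link1" or id="link2"

import re
rep = re.compile('^p')
v = soup.find_all(name=rep)    # all tags whose name starts with "p"
rep = re.compile('^sister.*')  # .* matches any character except a newline, zero or more times
v = soup.find_all(class_=rep)  # tags whose class starts with "sister"
rep = re.compile('http://www.baidu.com/static/.*')
v = soup.find_all(href=rep)    # typically used to match pagination links

# get: read an attribute of a tag
tag = soup.find('a')
v = tag.get('id')       # value of the id attribute of the <a> tag

# has_attr: check whether a tag has a given attribute
tag = soup.find('a')
v = tag.has_attr('id')  # whether the <a> tag has an id attribute

# get_text: get the text inside a tag
tag = soup.find('a')
v = tag.get_text()      # text inside the <a> tag

# index: position of a child inside its parent tag
tag = soup.find('body')
v = tag.index(tag.find('div'))  # position of the <div> in body (the <div> must be a direct child)

# is_empty_element: checks whether a tag is an empty (self-closing) tag
# Applies to tags such as: br hr input img meta spacer link frame base

# Elements related to the current tag
soup.next             # alias of next_element
soup.next_element     # search form: soup.find_next(...)
soup.next_elements
soup.next_sibling
soup.next_siblings
tag.previous
tag.previous_element
tag.previous_elements
tag.previous_sibling
tag.previous_siblings
tag.parent
tag.parents

# select, select_one: CSS selectors
# append: add a tag at the end
# insert: insert a tag at a given position
# wrap: wrap a tag in another tag
# unwrap: remove the wrapping tag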
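
A short self-contained run may make the difference between find, find_all and select_one, and between clear, decompose and extract, more concrete. The sketch below is not from the original notes: the sample HTML, the make_soup helper and all variable names are made up for illustration, and it uses Python's built-in html.parser instead of lxml so that it runs without an extra dependency.

```python
import re

from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample document, made up for this sketch.
html_doc = """
<html><body>
<div id="box"><p class="title">Hello</p></div>
<a id="link1" class="sister" href="http://example.com/a">First</a>
<a id="link2" class="sister" href="http://example.com/b">Second</a>
<br/>
</body></html>
"""


def make_soup():
    # Assumption: html.parser (bundled with Python) instead of features='lxml'.
    return BeautifulSoup(html_doc, features="html.parser")


soup = make_soup()

# find returns a single Tag; find_all returns a list-like ResultSet.
first_link = soup.find(name="a")
all_links = soup.find_all(name="a")
link_ids = [a.get("id") for a in all_links]

# select_one returns a single Tag, so .name and .attrs can be used directly.
link2 = soup.select_one("#link2")
link2.attrs["href"] = "www.baidu.com"  # modify the attribute
new_href = link2.get("href")
del link2.attrs["href"]                # delete the attribute
still_has_href = link2.has_attr("href")

# Filters: a list of tag names, and a compiled regex on the class attribute.
names = [t.name for t in soup.find_all(name=["a", "div"])]
sisters = soup.find_all(class_=re.compile("^sister"))

# Direct children of body that are tags rather than text nodes.
body = soup.find("body")
child_tags = [c.name for c in body.children if isinstance(c, Tag)]

# clear() empties the tag but keeps the tag itself in the tree.
s1 = make_soup()
s1.find("div").clear()
div_after_clear = str(s1.find("div"))

# decompose() removes the tag itself along with its contents.
s2 = make_soup()
s2.find("div").decompose()
div_after_decompose = s2.find("div")

# extract() also removes the tag, but returns it to the caller.
s3 = make_soup()
removed = s3.find("div").extract()
removed_text = removed.get_text()
```

Building a fresh soup for each of the three removal calls keeps them independent, since each one changes the tree in place.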
