Crawler data location 2: regular expressions <1>

Date: 2019-12-19


# -*- coding:utf-8 -*-
import re
# re is Python's built-in module for regular expression support

# Regular expressions
"""

"""
string = "hello word"
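# 1. Prepare (compile) the regular expression
pattern = re.compile("hello")
# 2. Use the pattern to search a larger string for text that matches it
# match() takes two arguments: (1) the pattern, (2) the string to search
# match() returns a match object if a match is found, otherwise None
# The match must start at the very beginning of the string; if it does not,
# matching fails and None is returned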
res = re.match(pattern, string)
# print(res)
if res:
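    # group() with no argument returns the whole matched text;
    # numbered groups are defined by parentheses in the pattern passed to compile()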
    print(res.group())
else:
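    print("No match found")
# search() takes two arguments: (1) the pattern, (2) the string to search
# search() returns a match object if a match is found, otherwise None
# The match may start at any position in the string; if nothing matches
# anywhere, None is returned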

res = re.search(pattern, string)
print(res)
if res:
    print(res.group())
string2 = "bacccccsbbafwerewdgfddef"
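# . matches any character (except a newline by default); * repeats the preceding token 0 or more times
# .* is greedy by default (it matches as much as possible)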
pattern = re.compile("a.*b")
res = re.search(pattern, string2)
print("3", res.group())
# Non-greedy mode (match as little as possible) is the one usually used
# .*? is the non-greedy form of .*
pattern = re.compile("a.*?b")
res = re.search(pattern, string2)
print("4", res.group())
# if res:
#     print(res.group())
# + repeats the preceding token 1 or more times; .+? is its non-greedy form
pattern = re.compile("a.+?b")
res = re.search(pattern, string2)
print("5", res.group())
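# | means "or": a match on either side is enough. At each position the
# alternatives are tried left to right, so if both could match there, the left one wins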
pattern = re.compile("a.*b|c.*?b")
res = re.search(pattern, string2)
print(res.group())
"""
hello
<_sre.SRE_Match object; span=(0, 5), match='hello'>
hello
3 acccccsbb
4 acccccsb
5 acccccsb
acccccsbb
"""
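
The behaviours walked through in the script can be condensed into a short sketch. The helper name `first_match` is an assumption made for illustration and is not part of the original script; it also guards against `None`, which the script's later `res.group()` calls do not. The last two lines suggest that, for this sample string, swapping the alternatives does not change the result, because the scan reaches the `a` before any `c`.

```python
import re

text = "bacccccsbbafwerewdgfddef"  # same sample string as string2 in the script


def first_match(pattern, s):
    """Hypothetical helper: return the first matching text in s, or None."""
    m = re.search(pattern, s)
    return m.group() if m else None


greedy = first_match(r"a.*b", text)    # .* extends to the last possible 'b'
lazy = first_match(r"a.*?b", text)     # .*? stops at the first possible 'b'
anchored = re.match(r"a.*b", text)     # match() only tries position 0, where text has 'b'

# Alternation: the earliest starting position wins, whatever the order of the alternatives
alt_left = first_match(r"a.*b|c.*?b", text)
alt_swapped = first_match(r"c.*?b|a.*b", text)

print(greedy, lazy, anchored, alt_left, alt_swapped)
```

Returning `None` instead of calling `.group()` directly keeps the calling code from raising `AttributeError` when a pattern does not match.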

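Concept:

A regular expression is a logical formula for operating on strings: a set of predefined special characters, and combinations of those characters, make up a "rule string", and this "rule string" expresses a filtering logic that is applied to strings.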
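
As a minimal sketch of this "filtering" idea in a crawler setting, the example below pulls link targets and link text out of a small piece of HTML. The HTML snippet and the names `html`, `link_pattern`, and `links` are made up for illustration; real pages are usually messier, and a dedicated HTML parser is often the more robust choice.

```python
import re

# Hypothetical sample markup, not taken from any real page
html = '<a href="/page/1">First</a><a href="/page/2">Second</a>'

# Two non-greedy capture groups: the href value and the link text
link_pattern = re.compile(r'<a href="(.*?)">(.*?)</a>')

# With more than one group, findall() returns a list of tuples, one per match
links = link_pattern.findall(html)

for href, label in links:
    print(href, label)
```

The non-greedy `.*?` matters here: a greedy `.*` in the first group would run on to the last `">` in the string and merge both links into a single match.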