Python Lesson 14

Common Regular Expression Syntax

Get familiar with the most commonly used regular expression syntax.

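  • Single-character matching
    . matches any single character except a newline.
    [...] matches one character from the given set; for example, [A-Za-z0-9] matches any single letter or digit. [^...] matches any single character not in the given set; for example, [^0-9] matches any character that is not a digit.
    \ is the escape character; it removes the special meaning of a special character so that the character stands for itself.
  • Predefined character sets
    \d matches a digit
    \D matches a non-digit
    \s matches a whitespace character
    \S matches a non-whitespace character
    \w matches a letter, digit, or underscore
    \W matches any character that is not a letter, digit, or underscore
  • Repetition
    * matches the preceding character zero or more times
    + matches the preceding character one or more times
    ? matches the preceding character zero or one time
  • Boundary matching
    ^ matches the start of the string
    $ matches the end of the string
  • Grouping
    (...) defines a group.
    (?P<NAME>...) defines a group and gives it the name NAME.
    (?P=NAME) refers back to the text matched by the group named NAME; it is used together with the previous form.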
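
The syntax listed above can be tried out directly with the standard `re` module. The following is a minimal sketch; the sample strings and variable names are made up for illustration and do not come from the lesson.

```python
import re

text = "order_42 shipped 2 items"

# Character set, predefined sets, and the + quantifier
words = re.findall(r"[A-Za-z]+", text)    # runs of letters only
numbers = re.findall(r"\d+", text)        # runs of digits
tokens = re.findall(r"\w+", text)         # letters, digits, and underscore

# ? makes the preceding character optional
spellings = re.findall(r"colou?r", "color colour")

# ^ and $ anchor the pattern to the whole string
all_digits = bool(re.search(r"^\d+$", "12345"))
not_all_digits = bool(re.search(r"^\d+$", "123a5"))

# Named group plus a backreference to it: the closing tag must repeat the opening tag
m = re.search(r"<(?P<tag>\w+)>(.*)</(?P=tag)>", "<b>bold</b>")
tag, body = m.group("tag"), m.group(2)

print(words)       # ['order', 'shipped', 'items']
print(numbers)     # ['42', '2']
print(tokens)      # ['order_42', 'shipped', '2', 'items']
print(spellings)   # ['color', 'colour']
print(all_digits, not_all_digits)   # True False
print(tag, body)   # b bold
```

Note that `\w` also accepts the underscore, which is why `order_42` comes back as a single token.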

Exercise 1

Fetch the JSON string returned by the address http://qwd.jd.com/fcgi-bin/qwd_searchitem_ex?skuid=26878432382%7C1658610413%7C26222795271%7C25168000024%7C11731514723%7C26348513019%7C20000220615%7C4813030%7C25965247088%7C5327182%7C19588651151%7C1780924%7C15495544751%7C10114188069%7C27036535156%7C10123099847%7C26016197600%7C10503200866%7C16675691362%7C15904713681 and use a regular expression to find each product's skuid (unique product code) and skuimgurl (product image).
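  • Analysis
  1. First, fetch the data to be matched with a simple crawler request;
  2. Write a regular expression that follows the structure of the JSON string;
  3. Print the result.
  • Implementation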
    import re
    import requests
    
    url = "http://qwd.jd.com/fcgi-bin/qwd_searchitem_ex?skuid=26878432382%7C1658610413%7C26222795271%7C25168000024%7C11731514723%7C26348513019%7C20000220615%7C4813030%7C25965247088%7C5327182%7C19588651151%7C1780924%7C15495544751%7C10114188069%7C27036535156%7C10123099847%7C26016197600%7C10503200866%7C16675691362%7C15904713681"
    session = requests.session()
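    r = session.get(url)    # simple crawler request for illustration; covered in a later lesson
    html = r.text
    
    # Group 1 captures the skuid digits, group 2 the image URL; \. matches a literal dot before "jpg"
    reg = re.compile(r"\s*\"skuid\":\"(\d+)\",\s*\S*\s*\S*\s*\"skuimgurl\":\"(\S*\.jpg)\"")
    result = reg.findall(html)
    print(result)    # with two () groups, findall returns a list of 2-tuples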

Output

[('26878432382', 'https://img20.360buyimg.com/n7/jfs/t18226/169/1318243724/390477/5b0718ff/5ac44edcNa350dbd9.jpg'), ('5327182', 'https://img20.360buyimg.com/n7/jfs/t17461/138/1837663326/68820/5f8da5cd/5ad9b1e2N42bce837.jpg'), ('11731514723', 'https://img20.360buyimg.com/n7/jfs/t19231/337/2147939016/196162/4210a6ae/5aea6250N0235cd05.jpg'), ('19588651151', 'https://img20.360buyimg.com/n7/jfs/t11341/60/1553062810/120774/ab9534ff/5a02c3f4Naebe34b7.jpg'), ('15495544751', 'https://img20.360buyimg.com/n7/jfs/t18088/43/2048465630/167669/dd3c8b7b/5ae12c40N57c98ea8.jpg'), ('16675691362', 'https://img20.360buyimg.com/n7/jfs/t18490/21/2141098141/120513/b3ca521a/5ae90247N3b4909ae.jpg'), ('26222795271', 'https://img20.360buyimg.com/n7/jfs/t19441/291/1597121495/310550/9bc2e141/5ad05fc0N1510cae5.jpg'), ('1780924', 'https://img20.360buyimg.com/n7/jfs/t17167/97/1957869461/43204/d064647b/5adda3e0Ne1d3aa86.jpg'), ('4813030', 'https://img20.360buyimg.com/n7/jfs/t19198/83/1908967366/189260/7538e84b/5adda865N8f547981.jpg'), ('27036535156', 'https://img20.360buyimg.com/n7/jfs/t19399/140/2175516321/123017/41e6d6a8/5aea87d3N9736cc9d.jpg'), ('26348513019', 'https://img20.360buyimg.com/n7/jfs/t14857/240/2643838980/220943/c982fda1/5aaf2002Ndd25bc52.jpg'), ('26016197600', 'https://img20.360buyimg.com/n7/jfs/t19894/76/195725612/190103/23c60ca1/5aeabb94N3e0266bc.jpg'), ('25168000024', 'https://img20.360buyimg.com/n7/jfs/t17629/301/2062161127/434152/aa3560a5/5ae319f9N1ae1146c.jpg'), ('25965247088', 'https://img20.360buyimg.com/n7/jfs/t19270/67/2232771964/253207/25f41fd9/5aea61b0Nfd21a809.jpg'), ('10123099847', 'https://img20.360buyimg.com/n7/jfs/t15511/14/1469153129/729958/b0af0ca1/5a533063N15fea56c.jpg'), ('20000220615', 'https://img20.360buyimg.com/n7/jfs/t16426/172/2638358261/151693/87020840/5ab869ddN30621fec.jpg'), ('15904713681', 'https://img20.360buyimg.com/n7/jfs/t17287/197/2249621651/366556/d36ae213/5aeadb4cN97f413f3.jpg'), ('10114188069', 
'https://img20.360buyimg.com/n7/jfs/t19927/88/179058964/386205/afd08ef1/5ae9717fN07f116d9.jpg'), ('10503200866', 'https://img20.360buyimg.com/n7/jfs/t18139/246/1628563908/114414/9315ac7c/5ad0647eNa9f1e2af.jpg'), ('1658610413', 'https://img20.360buyimg.com/n7/jfs/t19411/79/1017814440/108641/1b185d6d/5ab8b479Nd2417e97.jpg')]
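
The same extraction can be tried offline, without the network request. The sketch below runs on a small hand-written JSON string: the top-level key `sku`, the `title` field, and the example.com URLs are hypothetical stand-ins, since the real response layout is not shown here. It uses a non-greedy `.*?` between the two fields, which does not depend on how many tokens separate them, and compares the result with parsing the string through the `json` module.

```python
import json
import re

# Hypothetical sample; only the field names skuid and skuimgurl come from the exercise.
sample = (
    '{"sku": ['
    '{"skuid":"100001", "title":"demo-a", "skuimgurl":"https://example.com/a.jpg"}, '
    '{"skuid":"100002", "title":"demo-b", "skuimgurl":"https://example.com/b.jpg"}'
    ']}'
)

# .*? skips as little as possible between skuid and the next skuimgurl;
# [^"]+ keeps the URL inside its own pair of quotes.
pattern = re.compile(r'"skuid":"(\d+)".*?"skuimgurl":"([^"]+\.jpg)"')
pairs_re = pattern.findall(sample)

# Alternative: parse the JSON and read the fields directly.
data = json.loads(sample)
pairs_json = [(item["skuid"], item["skuimgurl"]) for item in data["sku"]]

print(pairs_re)
print(pairs_re == pairs_json)   # True
```

When the response is valid JSON, reading the parsed fields is usually less fragile than a pattern. The regular expression version remains a useful exercise in groups and quantifiers.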

 

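Exercise 2

Based on the contents of the file ga10.wms5.jd.com.txt, match the contents of each upstream {} block and each location {} block, and write them into the folders upstream and location respectively. Each folder holds one file per configuration block, named after that block and containing its content. The expected result is shown below.
[Image: regular]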

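  • Analysis
  1. Match the upstream blocks with a regular expression. The groups should capture both the name and the full content: the name is used as the file name, and the full content is written to the file.
  2. Use the os module to check for, create, and switch into the folders.
  3. Finally, write the files.
  4. The location blocks are handled in essentially the same way.
  • Implementation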
    import codecs
    import re
    import os
    
    # Group 1 is the whole upstream block (file content); group 2 is its name (file name).
    # [^}]+ stops at the first "}", so this assumes the blocks contain no nested braces.
    regupstream = re.compile(r"\s*(upstream\s+(\S+)\s+\{[^}]+\})")
    with codecs.open("ga10.wms5.jd.com.txt") as fum:
        upstmlist = regupstream.findall(fum.read())
        if not os.path.exists("upstream"):
            os.mkdir("upstream")
        os.chdir("upstream")
        for item in upstmlist:
            with codecs.open(item[1], "w") as fumw:
                fumw.write(item[0])
        os.chdir("..")    # return to the starting folder
    
    
    # Same approach for location blocks; group 2 is the path between the outer slashes.
    reglocation = re.compile(r"\s*(location\s+\/(\S+)\/\s+\{[^}]+\})")
    with codecs.open("ga10.wms5.jd.com.txt") as flc:
        lcalist = reglocation.findall(flc.read())
        if not os.path.exists("location"):
            os.mkdir("location")
        os.chdir("location")
        for ilocal in lcalist:
            filename1 = ilocal[1]+".conf"
            with codecs.open(filename1, "w") as flcw:
                flcw.write(ilocal[0])
        os.chdir("..")    # return to the starting folder

Output
[Image: regular_rex]
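
Since ga10.wms5.jd.com.txt is not reproduced here, the approach can be checked on a small in-memory configuration instead. The sketch below is an illustration under assumptions: the sample text, the helper `split_blocks`, and the names `backend_a` and `/api/` are all hypothetical. It builds paths with `pathlib` rather than switching folders with `os.chdir`, and writes into a temporary directory so nothing is left behind.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical nginx-style sample; the real file's contents are not shown in the lesson.
SAMPLE_CONF = """
upstream backend_a {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    location /api/ {
        proxy_pass http://backend_a;
    }
}
"""

# Group 1: the whole block; group 2: the name used for the file.
UPSTREAM_RE = re.compile(r"(upstream\s+(\S+)\s+\{[^}]+\})")
LOCATION_RE = re.compile(r"(location\s+/(\S+)/\s+\{[^}]+\})")


def split_blocks(text, pattern, out_dir, suffix=""):
    """Write each matched block to out_dir/<name><suffix> and return the names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    names = []
    for block, name in pattern.findall(text):
        (out_dir / (name + suffix)).write_text(block, encoding="utf-8")
        names.append(name)
    return names


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    upstream_names = split_blocks(SAMPLE_CONF, UPSTREAM_RE, root / "upstream")
    location_names = split_blocks(SAMPLE_CONF, LOCATION_RE, root / "location", ".conf")
    written = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())

print(upstream_names)   # ['backend_a']
print(location_names)   # ['api']
print(written)          # ['location/api.conf', 'upstream/backend_a']
```

A location path with inner slashes, such as `/a/b/`, would produce a name containing `/` and would need extra handling before being used as a file name.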
