Python Text Processing

Date: 2019-12-10

Tags: python, text processing. Category: Python


1. Extracting URLs from text

This technique is mainly used in web crawling:

Save the crawled HTML page as a string, then extract the URLs from that string.

For example, suppose the following text is saved in a file:

Now a days you can learn almost anything by just visiting http://www.google.com. But if you are completely new to computers or internet then first you need to leanr those fundamentals. Next
you can visit a good e-learning site like - https://www.codingdict.com to learn further on a variety of subjects.

Then use the findall() function from the re module to find every substring that matches the regular expression.
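import re

# Raw strings keep the backslash in the Windows path and in the regex literal
# (without the r prefix, "\u" in the path is a syntax error in Python 3).
with open(r"path\url_example.txt") as file:
    for line in file:
        # findall() returns a list of every URL-like match on this line.
        urls = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', line)
        print(urls)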
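
The crawling case described earlier, where the page is already held as a string, can be sketched without a file. This is only an illustration: the helper name extract_urls and the shortened sample string are assumptions, not part of the original post. Because the character class in the pattern contains '.', a period that ends a sentence is captured as part of the match, so the sketch strips trailing dots.

```python
import re

# Same pattern as in the article: the scheme, then hostname characters
# or %XX escapes. Compiled once so it can be reused across calls.
URL_PATTERN = re.compile(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+')


def extract_urls(text):
    """Return every URL-like match in text, with trailing dots removed.

    Hypothetical helper for illustration only.
    """
    # The pattern has no capturing groups, so findall() returns the
    # full text of each match.
    return [match.rstrip('.') for match in URL_PATTERN.findall(text)]


# Shortened stand-in for a crawled page stored as a string (assumed data).
page = ("Start at http://www.google.com. "
        "Then visit https://www.codingdict.com to learn further.")

print(extract_urls(page))
```

Note that this pattern stops at the first '/' after the host name, so it returns the scheme and host only, not the path or query string. If full URLs are needed, the pattern would have to be extended.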
