Without further ado, let's get straight to it:
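See that? The font is encrypted, so let's track it down. Locate an element that uses the encrypted font, then look at the Styles panel on the right. There you'll see a font-related property, font-family. That's the one. Copy its value and search the whole page source for it, and you'll find something like this: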
See the woff? That's the font file. Pull it out with a regular expression, and now on to the main part.
```python
import re

# Extract the base64-encoded font embedded in the page source
base_str = re.findall(r"base64,(.*?)\)", response.text)[0]
```
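The regex above simply takes the first base64 payload on the page. If a page ever embeds more than one font, a slightly stricter variant can first pick the @font-face rule whose font-family matches the name copied from DevTools. A minimal sketch: `extract_font_base64` is a hypothetical helper, and the CSS in `sample` is an invented example, not Maoyan's real markup.

```python
import re
from typing import Optional

# Matches the body of each @font-face { ... } rule (re.S lets "." span newlines)
FONT_FACE_RE = re.compile(r"@font-face\s*\{(.*?)\}", re.S)


def extract_font_base64(page_source: str, family: Optional[str] = None) -> Optional[str]:
    """Return the base64 payload of the first @font-face rule,
    restricted to rules mentioning `family` when one is given."""
    for block in FONT_FACE_RE.findall(page_source):
        if family is not None and family not in block:
            continue
        # Same pattern as the article's regex: everything between "base64," and the next ")"
        m = re.search(r"base64,(.*?)\)", block)
        if m:
            return m.group(1)
    return None


# Invented sample markup, for illustration only
sample = """
<style>
@font-face { font-family: "cs-demo"; src: url(data:application/font-woff;charset=utf-8;base64,d09GRgABAAAA) format("woff"); }
</style>
"""
print(extract_font_base64(sample, family="cs-demo"))  # d09GRgABAAAA
```

The family check is a plain substring test, which is fine for a one-off scrape but would also match a longer family name that contains the one you pass.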
Decode the base64 font string into binary, write it out as a .woff file, and use BytesIO to treat the in-memory binary block as a file:
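```python
import base64
from io import BytesIO

from fontTools.ttLib import TTFont


def make_font_file(base64_string: str):
    # Decode the base64-encoded font string into binary data
    bin_data = base64.decodebytes(base64_string.encode())
    # Also write it to disk so the font can be opened in other tools
    with open('testotf.woff', 'wb') as f:
        f.write(bin_data)
    return bin_data


def convert_font_to_xml(bin_data):
    # BytesIO lets an in-memory binary block be used like a file
    font = TTFont(BytesIO(bin_data))
    # Save the decoded font as XML so you can see what is inside
    font.saveXML('test1.xml')
```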
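Before handing the bytes to fontTools, it can help to confirm that the regex really captured a font and which container it is (fontTools reads WOFF directly, while WOFF2 additionally needs the brotli package). A small standard-library sketch; `sniff_font_format` and `FONT_MAGIC` are illustrative names, and the four-byte signatures are the standard ones from the WOFF, WOFF2 and OpenType formats.

```python
import base64
from io import BytesIO

# First four bytes ("magic number") of common font containers
FONT_MAGIC = {
    b"wOFF": "woff",
    b"wOF2": "woff2",
    b"OTTO": "otf",
    b"\x00\x01\x00\x00": "ttf",
    b"true": "ttf",
}


def sniff_font_format(base64_string: str) -> str:
    """Decode a base64 font string and report its container type from the magic bytes."""
    bin_data = base64.decodebytes(base64_string.encode())
    # BytesIO wraps the bytes so they can be read like an opened file
    with BytesIO(bin_data) as f:
        magic = f.read(4)
    return FONT_MAGIC.get(magic, "unknown")


# The first 16 characters of the font string quoted in this post decode to b"wOFF..."
print(sniff_font_format("d09GRgABAAAAAAjo"))  # woff
```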
Then draw the text with that font using Pillow, and recognize it with the pytesseract library:
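```python
from io import BytesIO

from PIL import Image, ImageDraw, ImageFont
from pytesseract import pytesseract


def draw(path):
    # Draw the encrypted text with the downloaded font.
    # text (the scraped string) and base_str (the font) are module-level globals.
    # Create a blank 90x30 canvas filled with white
    im = Image.new('RGB', (90, 30), (255, 255, 255))
    # Get a drawing context for the canvas
    dr = ImageDraw.Draw(im)
    # Load the decoded font at size 18
    font = ImageFont.truetype(BytesIO(make_font_file(base_str)), 18)
    # Draw starting at (10, 5); fill sets the text colour to black
    dr.text((10, 5), text, font=font, fill="#000000")
    # Save to the given path and OCR that same file
    im.save(path)
    # Tesseract may read the decimal point as a space, so turn spaces back into dots
    result = pytesseract.image_to_string(path).strip().replace(' ', '.')
    return result
```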
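Because OCR output is never guaranteed to be exact, a light sanity check on what draw() returns can catch obvious misreads. A hedged sketch: the `CONFUSABLES` substitution table is an assumption about typical tesseract mistakes, not something measured on this site, and `clean_ocr_number` is a hypothetical helper.

```python
import re

# Assumed table of characters tesseract tends to confuse with digits (not measured on this site)
CONFUSABLES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8", ",": "."})
# A plain decimal number: digits, optionally followed by a dot and more digits
NUMBER_RE = re.compile(r"\d+(\.\d+)?")


def clean_ocr_number(raw: str):
    """Normalize raw OCR output; return the number string, or None if it doesn't look like one."""
    # Like draw(), treat a space as a misread decimal point
    text = raw.strip().replace(" ", ".").translate(CONFUSABLES)
    return text if NUMBER_RE.fullmatch(text) else None


print(clean_ocr_number("1234 56\n"))  # 1234.56
```

Returning None instead of a best guess makes it easy to retry (for example with a larger font size) rather than silently storing a wrong figure.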
That gives you the decoded digits.
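Of course, OCR can never be 100% reliable. There is another approach that can decode everything exactly, but I haven't worked out its glyph mapping yet, so it's only half done (this article looks usable: https://www.jianshu.com/p/5aa978e9823d). However, after checking again I found that the site's font keeps changing, so that approach no longer seems to work as-is. Here is a download link for the software used to inspect a font's glyph mapping: http://www.mydown.com/soft/359/509448859.shtml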
```python
from io import BytesIO

from fontTools.ttLib import TTFont

# make_font_file() is the helper defined earlier.
# Reference font saved from an earlier page load; baseUnicode/baseNumList below
# record which digit each of its glyphs draws (worked out by hand).
base_str1 = (
    'd09GRgABAAAAAAjoAAsAAAAADMAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAABHU1VCAAABCAAAADMAAABCsP6z7U9TLzIAAAE8AAAARAAAAFZW7lgXY21hcAAAAYAAAAC8AAACTG3tR1lnbHlmAAACPAAABFsAAAU8N3aJFmhlYWQAAAaYAAAALwAAADYW4w2KaGhlYQAABsgAAAAcAAAAJAeKAzlobXR4AAAG5AAAABIAAAAwGp4AAGxvY2EAAAb4AAAAGgAAABoIkgcwbWF4cAAABxQAAAAfAAAAIAEZAEZuYW1lAAAHNAAAAVcAAAKFkAhoC3Bvc3QAAAiMAAAAWgAAAI/VTcOReJxjYGRgYOBikGPQYWB0cfMJYeBgYGGAAJAMY05meiJQDMoDyrGAaQ4gZoOIAgCKIwNPAHicY2BksmCcwMDKwMHUyXSGgYGhH0IzvmYwYuRgYGBiYGVmwAoC0lxTGBwYKn5MZdb5r8MQw6zDcAUozAiSAwDmCAuqeJzFksEJg0AQRf9Gs4maQ44BO5AUYQuCVViBDZgKYiM5BavxLoIK4t7MX8dLQK/JLG9h/iwzw8wCOAJwyJ24gHpDwdqLqlp0B/6iu3jQv+FK5Yy8rpq4jTrdZ0M4FlNiUlPOM1/sR7ZMMePWsRHN7hzWPSHAgTV9eJT1TqYfmPpf6W+7LPdz9QKSr7DFuhI4PzSxYPfcRoLdd6cFu/M+EzhnDKFgc46FwNljSgRuASYV7L8wpQDvA79tPsh4nE2US2/jVBTH73ViO86rDXHstEmTxklsx07cvHydJnUebSbptJO+J606LW1pNUPpFDqPBbNgpIEZDSA0mh18ANjwWhUkNgioEBukQUi8xAJYsOMrkOHm0RHXkqV7ZN/z+//PORdAAJ7+A7KABQQARs7HhlgZ4GXB8e8JF/EdGAEJMAmAV9TzBsplec7H0iGIX5RESTiIDI7nclkDGXkxStEUb2FxgC9DA+UlUaCpz16enNxbmisd1xu1uavHC2hWpYQIysSCZD2HxgW3GhWjyrjuU1T1XqAAbyWOV4qZQrF1lOBemNmttFx00zxo7n1kaM9Fwpl0nR5XQp1P5SbLWq0cKz/QYgrqMne1/Eo4MbOOtQyodAxBC7RB8bksMjQYxUS0hPpSeA7LoLmuJioqiJLxwB+MGdWkKitmIO61WKjiFYapVlJRwbHpKZbaU5l8IDDCoVos9s7Rzj1m87QlCDGzUJZTVvsGA6caSjXMLQxvpbMBb7bYFiNBG5qY3Xl4a8nV4/uN8BBfgPiAD/X5pB6TN0KTNOIxGHfurvSWXZzQaqlUMnkhOaL6Vi/AY2vnTzkiROM79/OtOzczVfKThiC4o/mNaU2Fw3zYD48uur6NQuf9w/1W2+Eo1NawN0Qvtxt7EwM5bJUoiT0n3NCHIXjEczSLN1EBl1QcmAPZEOyaNvgWvu/k/NNqRHe5XBFtIne5MmsZC12budGqDm3Xb68bhrPzWG5KxirDyAlTKyYIRufFkVAxtFbQPAJD16YPa5VSZWX5659215DS7pyl1pNbi/PpRDI5qN9fuH5nwI/7zSB9XTzM2K/X4CFTFvi7EBMaYZ931LYNSfeYxwxFIkLnPWJ/l2nGM+m789LC1g2rhFD54X5JKOvu/tl/EyTWnwEgTkdwLxvis1N1JEbF//VF1//z3Lg/aJKwwk7QYR+DibFEYEpOuxVi0axJEtxiyvrN6yXT457gZjKJUW7IyVA8vDbV5uTRqCs9zkUyohiazOlzbj3etB/qSK+ua4ZfC9YSAs878XKRfb6f4V3iCRgCgKS93bx9tgr0wlfm7XJBL3AzRZ4fI/NUjCJaYS7LXvWXw5Tss/3rB89qzGCNPMBD7O393m9zHzsE8ZHd8ko5vDfwZEgIb3DQp5K+mFqav20GPcMUG0wXll4yZ4m2TRm7ZCqq3a4mKpW4Bhffbl7MqtvqtHlSRsvPy+qlzR8+nGnQ2cb'
    'pqVlimHz9y3OG7r0hAtTvcpwNXwZlaMK80ZWlQXxp5LLdOwQ3HG4uyYD9IaV4tju0SH/kDIUzaigsuyka10opXHnt9YPGq+bUnfqBbjjaMKAok6aWJBSUKy+/2/By5JDgrabHC34HSamBUZvtjesnb7YeP3j0wYYNCet22KhqGWtmYmlWSW7O9zl/wV6dAUfPcYOUIO/J0RIv6R6LAcu+zld2uGKe2LaHX/zjx9HPmT34xNXJrlq/sT2FlzsfM+A/GuL2mQB4nGNgZGBgAOLNnQdmxvPbfGXgZmEAgZsPn/xE0P/PsDAwnQdyORiYQKIAi1MOHgB4nGNgZGBg1vmvwxDDwgACQJKRARXwAAAzYgHNeJxjYQCCFAYGJkviMABCNgK3AAAAAAAAAAwAZgC2APQBRgF0AcQB5gIoAnwCngAAeJxjYGRgYOBhsGJgZgABJiDmAkIGhv9gPgMAD30BYAB4nGWRu27CQBRExzzyAClCiZQmirRN0hDMQ6lQOiQoI1HQG7MGI7+0XpBIlw/Id+UT0qXLJ6TPYK4bxyvvnjszd30lA7jGNxycnnu+J3ZwwerENZzjQbhO/Um4QX4WbqKNF+Ez6jPhFrp4FW7jBm+8wWlcshrjQ9hBB5/CNVzhS7hO/Ue4Qf4VbuLWaQqfoePcCbewcLrCbTw67y2lJkZ7Vq/U8qCCNLE93zMm1IZO6KfJUZrr9S7yTFmW50KbPEwTNXQHpTTTiTblbfl+PbI2UIFJYzWlq6MoVZlJt9q37sbabNzvB6K7fhpzPMU1gYGGB8t9xXqJA/cAKRJqPfj0DFdI30hPSPXol6k5vTV2iIps1a3Wi+KmnPqxVhjCxeBfasZUUiSrs+XY82sjqpbp46yGPTFpKr2ak0RkhazwtlR86i42RVfGn93nCip5t5gh/gPYnXLBAHicbcg7DoAgEIThHV8o4l2AsIilRL2LjZ2JxzcurX/zZYYqKmn6z6BCjQYtOij0GKAxwmAiPOq+zoMTf+6bX2VbG2XnYEWby+9icWYnpoWLHETvMtELFxwXaAAA'
)
# Build the reference font from base_str1
baseFont = TTFont(BytesIO(make_font_file(base_str1)))


def decode_font_advance(font_str, text):
    # The font served with the current page
    match_font = TTFont(BytesIO(make_font_file(font_str)))
    numDic = {}
    # Glyph names of the current font, skipping the first placeholder glyph
    uniList = match_font.getGlyphOrder()[1:]
    # Digits drawn by the reference font's glyphs, in the same order as baseUnicode
    baseNumList = ['.', '4', '7', '8', '9', '0', '6', '2', '1', '5', '3']
    baseUnicode = ['x', 'uniEBDD', 'uniECB5', 'uniEBC9', 'uniF0D3', 'uniECCE',
                   'uniEE54', 'uniE90C', 'uniE027', 'uniE4BF', 'uniE791']
    for name in uniList:
        # Outline object of this glyph in the current font
        matchGlyph = match_font['glyf'][name]
        for baseName, digit in zip(baseUnicode, baseNumList):
            baseGlyph = baseFont['glyf'][baseName]
            # If the outlines are identical, treat the two glyphs as the same character
            if matchGlyph == baseGlyph:
                numDic[name] = digit
                break
    result = ''
    dot = -1  # position of the decimal point in text, if there is one
    for i, ch in enumerate(text):
        if ch != '.':
            # e.g. '\ue8d4' -> 'uniE8D4', the glyph-name form used as keys in numDic
            num = f'uni{ord(ch):04X}'
            if num in numDic:
                result += numDic[num]
        else:
            dot = i
    if dot >= 0:
        result = result[:dot] + '.' + result[dot:]
    return result
```
The mapping in this part isn't right yet, so you can skip it for now!
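The idea behind decode_font_advance is to recognise each glyph of the freshly served font by comparing its outline with a reference font whose digits are already known, and then translate the page text through that mapping. Below is a stripped-down sketch of just that idea, using plain coordinate lists so it runs without any font file. Assumptions: outlines are given as lists of (x, y) points (in practice they would come from the font's glyf table), the `tol` tolerance is hypothetical and only there in case outlines differ slightly between loads, and `build_mapping`/`decode_text` are illustrative names, not part of fontTools.

```python
def outlines_match(a, b, tol=0):
    """True if both point lists have the same length and every coordinate differs by at most tol."""
    if len(a) != len(b):
        return False
    return all(abs(x1 - x2) <= tol and abs(y1 - y2) <= tol
               for (x1, y1), (x2, y2) in zip(a, b))


def build_mapping(ref_outlines, ref_digits, new_outlines, tol=0):
    """Map each glyph name of the new font to a digit by finding its twin in the reference font."""
    mapping = {}
    for new_name, new_pts in new_outlines.items():
        for ref_name, ref_pts in ref_outlines.items():
            if outlines_match(new_pts, ref_pts, tol):
                mapping[new_name] = ref_digits[ref_name]
                break
    return mapping


def decode_text(text, mapping):
    """Replace each obfuscated character with its digit; '.' passes through, unknowns become '?'."""
    out = []
    for ch in text:
        if ch == ".":
            out.append(".")
        else:
            out.append(mapping.get(f"uni{ord(ch):04X}", "?"))
    return "".join(out)


# Toy data: two reference glyphs with known digits, and a "new" font whose names changed
ref_outlines = {"uniEBDD": [(0, 0), (10, 0), (10, 20)], "uniECB5": [(0, 0), (5, 30)]}
ref_digits = {"uniEBDD": "4", "uniECB5": "7"}
new_outlines = {"uniF001": [(0, 0), (5, 31)], "uniF002": [(1, 0), (10, 0), (10, 20)]}

mapping = build_mapping(ref_outlines, ref_digits, new_outlines, tol=1)
print(decode_text("\uf002\uf001.\uf001", mapping))  # 47.7
```

Passing the decimal point straight through avoids the separate "remember the dot position and reinsert it" step, at the cost of assuming the page always uses a literal "." rather than an obfuscated glyph for it.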
By this point you surely want the full source, so here it is in one piece:
```python
import re
import base64
import requests
import lxml.html
from io import BytesIO
from fontTools.ttLib import TTFont
from PIL import Image, ImageDraw, ImageFont
from pytesseract import pytesseract

# fontTools: parses, analyses and creates font files
# PIL (Pillow): image-processing library
# base64: encodes and decodes base64 strings
# pytesseract: a Python wrapper around the tesseract engine
# tesseract: a machine-learning OCR engine (turns text in images into text)
# A font file is essentially a mapping from character codes to drawing instructions,
# and it also stores the outline data used to draw each glyph

# Fetch base_str dynamically
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0"}
# Append your own isid_key here; you get it after logging in
url = 'http://piaofang.maoyan.com/?ver=normal&isid_key='
# url = 'http://piaofang.maoyan.com/?ver=normal'
response = requests.get(url=url, headers=headers)
tree = lxml.html.fromstring(response.text)
# Choose which number in the encrypted list to decode; only one at a time
text = tree.xpath('//div[@id="ticket_tbody"]/ul[@class="canTouch"]/li/b/i/text()')[0].strip()
# Extract the base64-encoded font embedded in the page source
base_str = re.findall(r"base64,(.*?)\)", response.text)[0]


def make_font_file(base64_string: str):
    # Decode the base64-encoded font string into binary data
    bin_data = base64.decodebytes(base64_string.encode())
    with open('testotf.woff', 'wb') as f:
        f.write(bin_data)
    return bin_data


def convert_font_to_xml(bin_data):
    # BytesIO lets an in-memory binary block be used like a file
    font = TTFont(BytesIO(bin_data))
    # Save the decoded font as XML
    font.saveXML('test1.xml')


def draw(path):
    # Now draw the text with the decoded font
    # Create a blank 90x30 canvas filled with white
    im = Image.new('RGB', (90, 30), (255, 255, 255))
    # Get a drawing context for the canvas
    dr = ImageDraw.Draw(im)
    # Load the decoded font at size 18
    font = ImageFont.truetype(BytesIO(make_font_file(base_str)), 18)
    # Draw starting at (10, 5); fill sets the text colour to black
    dr.text((10, 5), text, font=font, fill="#000000")
    # Save to the given path and OCR that same file
    im.save(path)
    result = pytesseract.image_to_string(path).strip().replace(' ', '.')
    return result


if __name__ == '__main__':
    # First dump the font as XML; have a look inside to see what it contains
    convert_font_to_xml(make_font_file(base_str))
    print(draw('hh.jpg'))
```
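I know you're still missing tesseract, the engine that pytesseract wraps; just download it from https://digi.bib.uni-mannheim.de/tesseract/. After installing it, remember to change tesseract_cmd in pytesseract.py so that it points to the path of your tesseract executable.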
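Instead of editing pytesseract.py itself, you can also set the same tesseract_cmd variable from your own script before calling image_to_string. A minimal sketch: the Windows install path is an assumed default (adjust it to your machine), and `find_tesseract`/`configure_pytesseract` are hypothetical helpers.

```python
import shutil
from pathlib import Path

# Assumed default install location on Windows; change it to wherever you installed tesseract
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(WINDOWS_DEFAULT,)):
    """Return a tesseract executable path: first from PATH, then from the candidate paths."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None


def configure_pytesseract():
    # pytesseract reads the executable location from its module-level tesseract_cmd,
    # so assigning it here has the same effect as editing pytesseract.py
    from pytesseract import pytesseract
    cmd = find_tesseract()
    if cmd is None:
        raise RuntimeError("tesseract executable not found; install it or adjust WINDOWS_DEFAULT")
    pytesseract.tesseract_cmd = cmd
```

Call configure_pytesseract() once at start-up, before draw().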