Useful language translation methods in Python

Date: 2019-12-06

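Recently I needed to translate tens of thousands of records from Japanese to Chinese. Since the data is already collected and processed with Python code, I wanted to first try doing the translation step in Python as well.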

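The translation methods I found online are Baidu, Youdao, and Google Translate. Below I run a simple test and analysis of these three. If you know of a better method (fast, with accurate results), please share!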

Module imports

import re
import urllib.parse
import urllib.request
import hashlib
import random
import json
import time
from translate import Translator

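Libraries that do not ship with Python, such as the Python Google translator module, must be installed manually with the command pip install module_name. (For the Translator import above, this is presumably pip install translate, assuming it refers to the translate package on PyPI.)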

1. Baidu Translate

appid = 'your_appid'
secretKey = 'your_secretKey'
url_baidu = 'http://api.fanyi.baidu.com/api/trans/vip/translate'

def translateBaidu(text, f='ja', t='zh'):
    # Random salt that is sent with the request and mixed into the signature
    salt = random.randint(32768, 65536)
    # Signature: MD5 of appid + original (not URL-encoded) text + salt + secret key
    sign = appid + text + str(salt) + secretKey
    sign = hashlib.md5(sign.encode()).hexdigest()
    # Only the query text is URL-encoded here
    url = url_baidu + '?appid=' + appid + '&q=' + urllib.parse.quote(text) + '&from=' + f + '&to=' + t + \
            '&salt=' + str(salt) + '&sign=' + sign
    response = urllib.request.urlopen(url)
    content = response.read().decode('utf-8')
    data = json.loads(content)
    # The translated text is the 'dst' field of the first result entry
    result = str(data['trans_result'][0]['dst'])
    print(result)

Parameters: text is the text to translate, f is the source language, and t is the target language. The later methods follow the same pattern.
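
The signing step can also be looked at in isolation. The following is a minimal sketch, not the author's code: the helper names baidu_sign and build_baidu_url and the fixed salt value are hypothetical, and urlencode is used in place of manual string concatenation. It only builds the request URL and performs no network access.

```python
import hashlib
import urllib.parse

BAIDU_ENDPOINT = 'http://api.fanyi.baidu.com/api/trans/vip/translate'

def baidu_sign(appid, text, salt, secret_key):
    """MD5 hex digest of appid + raw text + salt + secret key (hypothetical helper)."""
    raw = appid + text + str(salt) + secret_key
    return hashlib.md5(raw.encode('utf-8')).hexdigest()

def build_baidu_url(appid, secret_key, text, salt, f='ja', t='zh'):
    """Build the GET URL; urlencode percent-encodes every parameter value."""
    params = {
        'appid': appid,
        'q': text,
        'from': f,
        'to': t,
        'salt': salt,
        # The signature is computed from the raw text, before URL-encoding
        'sign': baidu_sign(appid, text, salt, secret_key),
    }
    return BAIDU_ENDPOINT + '?' + urllib.parse.urlencode(params)

# Placeholder credentials and an arbitrary fixed salt, for illustration only
url = build_baidu_url('your_appid', 'your_secretKey', 'こんにちは', 40000)
print(url)
```

Separating the signature from the request makes it easy to check the signing logic without calling the service.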

2. Youdao Translate

url_youdao = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=' \
      'http://www.youdao.com/'
# Fixed form fields sent with every request (renamed so the built-in dict is not shadowed)
youdao_params = {
    'type': 'AUTO',
    'doctype': 'json',
    'xmlVersion': '1.8',
    'keyfrom': 'fanyi.web',
    'ue': 'UTF-8',
    'action': 'FY_BY_CLICKBUTTON',
    'typoResult': 'true',
}

def translateYoudao(text):
    # Copy the fixed fields and add the text to translate under the key 'i'
    form = dict(youdao_params, i=text)
    data = urllib.parse.urlencode(form).encode('utf-8')
    # Passing a data argument makes urlopen send a POST request
    response = urllib.request.urlopen(url_youdao, data)
    content = response.read().decode('utf-8')
    data = json.loads(content)
    # First segment of the first paragraph of the result
    result = data['translateResult'][0][0]['tgt']
    print(result)

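The parameters are mostly specified through the request dictionary. I found no place to specify the language (maybe I just missed it); in my tests the output was always Chinese, whatever the language of the input text.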
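
The function above reads only the first segment of the first paragraph of the response. As a minimal sketch, assuming the response really is a list of paragraphs that each hold a list of segments (an inference from the [0][0] indexing, not something confirmed by Youdao documentation), every segment could be collected as follows. The helper name join_youdao_segments and the hand-written sample response, including its 'src' key, are hypothetical.

```python
def join_youdao_segments(data):
    """Concatenate the 'tgt' text of every segment, one output line per paragraph."""
    paragraphs = data.get('translateResult') or []
    return '\n'.join(
        ''.join(segment.get('tgt', '') for segment in paragraph)
        for paragraph in paragraphs
    )

# Hand-written sample in the assumed response shape, not real service output
sample = {
    'translateResult': [
        [{'src': 'こんにちは', 'tgt': '你好'}, {'src': 'こんばんは', 'tgt': '晚安'}],
    ]
}
print(join_youdao_segments(sample))
```

A response without the translateResult key yields an empty string instead of raising a KeyError.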

3. Google Translate

url_google = 'http://translate.google.cn'
# Captures everything between 'TRANSLATED_TEXT=' and the next ';' in the returned page
reg_text = re.compile(r'(?<=TRANSLATED_TEXT=).*?;')
user_agent = r'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) ' \
             r'Chrome/44.0.2403.157 Safari/537.36'

def translateGoogle(text, f='ja', t='zh-cn'):
    values = {'hl': 'zh-cn', 'ie': 'utf-8', 'text': text, 'langpair': '%s|%s' % (f, t)}
    value = urllib.parse.urlencode(values)
    req = urllib.request.Request(url_google + '?' + value)
    # Send a browser-like User-Agent with the request
    req.add_header('User-Agent', user_agent)
    response = urllib.request.urlopen(req)
    content = response.read().decode('utf-8')
    data = reg_text.search(content)
    # Fail with a clear message if the page does not contain the expected marker
    if data is None:
        raise ValueError('TRANSLATED_TEXT not found in the response page')
    # Drop the trailing ';' and the surrounding single quotes
    result = data.group(0).strip(';').strip('\'')
    print(result)

Like the two methods above, this one performs the translation by requesting a web page.
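
Because this method scrapes the page, the extraction step is the fragile part. Here is a minimal sketch of that step on its own, using the same regular expression. The function name extract_translation and the sample HTML string are hypothetical and written by hand for illustration; they are not real Google output.

```python
import re

# Same pattern as in the listing: text after 'TRANSLATED_TEXT=' up to the next ';'
reg_text = re.compile(r'(?<=TRANSLATED_TEXT=).*?;')

def extract_translation(html):
    """Return the quoted value after TRANSLATED_TEXT=, or None if it is absent."""
    match = reg_text.search(html)
    if match is None:
        return None
    # Remove the trailing ';' and then the surrounding single quotes
    return match.group(0).strip(';').strip("'")

# Hand-written page fragment in the shape the pattern expects
sample_html = "<script>var TRANSLATED_TEXT='您好';var x=1;</script>"
print(extract_translation(sample_html))
print(extract_translation('<html></html>'))
```

Returning None when the marker is missing lets the caller decide how to handle a changed page layout or a blocked request.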

Another option is to use the Python Google translation module, Translator:

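# The original listing never creates translator; this instance is an assumption
translator = Translator(from_lang='ja', to_lang='zh')

def translateGoogle2(text):
    result = translator.translate(text)
    print(result)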

4. Test code

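Test procedure:

Translating the 5 strings is one small unit, and its elapsed time is printed;

a loop of 10 such runs is one large unit, after which the average time per run is printed.

I ran the test several times with strings in different languages and with different loop counts. The results were broadly similar, so I settled on 10 runs here.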
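
The procedure can be expressed as one reusable helper. This is a minimal sketch rather than the test code used for the measurements: the name time_rounds is hypothetical, time.perf_counter is swapped in for time.time, and str.upper stands in for a translation function so the example runs without network access.

```python
import time

def time_rounds(func, texts, rounds=10):
    """Call func on every text, `rounds` times; return per-round durations and their mean."""
    durations = []
    for _ in range(rounds):
        start = time.perf_counter()
        for text in texts:
            func(text)
        durations.append(time.perf_counter() - start)
    return durations, sum(durations) / len(durations)

# str.upper is only a stand-in for a function such as translateBaidu
durations, average = time_rounds(str.upper, ['a', 'b', 'c', 'd', 'e'], rounds=3)
print(durations, average)
```

Passing the translation function as an argument avoids repeating the same timing loop once per service.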

text_list = ['こんにちは', 'こんばんは', 'おはようございます', 'お休(やす)みなさい', 'お元気(げんき)ですか']

# Accumulated time per method over all runs
time_baidu = 0
time_youdao = 0
time_google = 0
time_google2 = 0

for i in range(1, 11):
    # Each run translates all 5 strings with one method and prints the elapsed time
    time1 = time.time()
    for text in text_list:
        translateBaidu(text)
    time2 = time.time()
    print('Baidu Translate run %s time: %s' % (i, time2 - time1))
    time_baidu += (time2 - time1)

    time1 = time.time()
    for text in text_list:
        translateYoudao(text)
    time2 = time.time()
    print('Youdao Translate run %s time: %s' % (i, time2 - time1))
    time_youdao += (time2 - time1)

    time1 = time.time()
    for text in text_list:
        translateGoogle(text)
    time2 = time.time()
    print('Google Translate run %s time: %s' % (i, time2 - time1))
    time_google += (time2 - time1)

    time1 = time.time()
    for text in text_list:
        translateGoogle2(text)
    time2 = time.time()
    print('Google 2 Translate run %s time: %s' % (i, time2 - time1))
    time_google2 += (time2 - time1)


# Average time per run (10 runs)
print('Baidu Translate average time: %s' % (time_baidu / 10))
print('Youdao Translate average time: %s' % (time_youdao / 10))
print('Google Translate average time: %s' % (time_google / 10))
print('Google 2 Translate average time: %s' % (time_google2 / 10))

5. Result analysis

The Japanese strings mean ['你好', '晚上好', '早上好', '晚安', '您還好吧'], that is: hello, good evening, good morning, good night, how are you.

Test code output. Each method returned the same translations in all 10 runs, so they are listed once per method, in the order of text_list:

Baidu:     您好 | 晚上好 | 早上好！ | 請您休息。 | 您身體好嗎？
Youdao:    你好 | 晚安 | 早上好。 | 您休息吧、) | 好(身體)好嗎?
Google:    您好 | 晚上好 | 早上好 | 看看你的假期（康） | 當心（元氣）是
Google 2:  你好 | 問候 | 問候 | 請休息 | 照顧 （玄龜） 嗎？

Time per run in seconds (each run translates the 5 strings):

Run       Baidu                  Youdao                 Google                 Google 2
1         0.5849709510803223     0.46173906326293945    3.84399676322937       6.819758892059326
2         0.4968142509460449     0.3870818614959717     3.5689375400543213     6.108794450759888
3         0.4832003116607666     0.40560245513916016    3.875128984451294      5.547708034515381
4         0.4904344081878662     0.3860180377960205     3.5466465950012207     7.052653551101685
5         0.4754292964935303     0.37929368019104004    3.503594160079956      4.944894552230835
6         0.4637324810028076     0.3679838180541992     3.4939000606536865     4.786132335662842
7         0.4783976078033447     0.3760185241699219     3.485666036605835      6.591272592544556
8         0.4756813049316406     0.4083871841430664     3.3123676776885986     5.902927875518799
9         0.46607208251953125    0.5259883403778076     3.919294834136963      6.256660223007202
10        0.5158905982971191     0.38652658462524414    3.3273775577545166     6.155425071716309
Average   0.4930623292922974     0.4084639549255371     3.5876910209655763     6.016622757911682
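
As a cross-check on the reported averages, the per-run times above can be averaged again and divided by the 5 strings per run to get a rough time per string. This is a minimal sketch: the run_times dictionary simply copies the measured values from the output, and the per-string figure assumes each string in a run takes about the same time.

```python
from statistics import mean

# Per-run times in seconds, copied from the test output above
run_times = {
    'Baidu': [0.5849709510803223, 0.4968142509460449, 0.4832003116607666,
              0.4904344081878662, 0.4754292964935303, 0.4637324810028076,
              0.4783976078033447, 0.4756813049316406, 0.46607208251953125,
              0.5158905982971191],
    'Youdao': [0.46173906326293945, 0.3870818614959717, 0.40560245513916016,
               0.3860180377960205, 0.37929368019104004, 0.3679838180541992,
               0.3760185241699219, 0.4083871841430664, 0.5259883403778076,
               0.38652658462524414],
    'Google': [3.84399676322937, 3.5689375400543213, 3.875128984451294,
               3.5466465950012207, 3.503594160079956, 3.4939000606536865,
               3.485666036605835, 3.3123676776885986, 3.919294834136963,
               3.3273775577545166],
    'Google 2': [6.819758892059326, 6.108794450759888, 5.547708034515381,
                 7.052653551101685, 4.944894552230835, 4.786132335662842,
                 6.591272592544556, 5.902927875518799, 6.256660223007202,
                 6.155425071716309],
}
STRINGS_PER_RUN = 5  # each run translates the 5 strings in text_list

for name, times in run_times.items():
    average = mean(times)
    # Time per string assumes the strings in a run take roughly equal time
    print('%s: %.4f s per run, %.4f s per string' % (name, average, average / STRINGS_PER_RUN))
```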