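Neural Machine Translation
Machine translation aims to automatically translate text from one language into another. Given a text sequence in the source language, there is no single translation that is the best one for that text.
This follows from the inherent ambiguity and flexibility of human language. It makes automatic machine translation difficult, and perhaps one of the hardest challenges in artificial intelligence.
The conventional approaches are statistical machine translation and neural machine translation; here we focus mainly on neural machine translation.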
![](http://static.javashuo.com/static/loading.gif)
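As the figure above shows, the main task in translation is to learn a mapping from source-side words to target-side words. It also involves reordering: for example, translating "read a book" before "on Sunday".
So how do we evaluate translation quality?
- Human evaluation by professional translators (more accurate, but time-consuming and labor-intensive)
- Automatic evaluation (fast and convenient for model iteration, but imperfect)
Experiment 1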
    # run prediction
    !tar -zxf /home/aistudio/data/data13032/ddle_ai_course.t -C /home/aistudio
    WORK_PATH = "/home/aistudio/paddle_ai_course"
    # decompress pretrained models
    !tar -zxf {WORK_PATH}/model_big.tgz -C {WORK_PATH}
    !tar -zxf {WORK_PATH}/model_small.tgz -C {WORK_PATH}
    !cd {WORK_PATH} && sh infer_small.sh model_small trans_res eval/test_enzh FILE
    !cd {WORK_PATH}/eval && sh eval.sh {WORK_PATH}/trans_res/small_trans_res test_reference FILE
    !cd {WORK_PATH}/eval && sed -r 's/(@@ )|(@@ ?$)//g' test_enzh > input.tok.txt && head -1 input.tok.txt && head -1 predict.tok.txt

    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 13:58:19.488519 2083 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 13:58:19.492918 2083 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 13:58:19.634228 2083 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 13:58:19.638108 2083 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    BLEU = 22.21, 57.5/29.8/16.6/8.6 (BP=1.000, ratio=1.037, hyp_len=2318, ref_len=2236)
    last week the US Senate Finance Committee overwhelmingly approved a bill requiring the Treasury to identify a list of countries with " fundamentally misaligned " currency exchange rates . this opens the door for potential economic sanctions to be brought against Beijing .
    上週 , 美國參議院 財務 委員會 以 壓倒性 多數 批准 了 一項 法案 , 要求 財政部 肯定 一個 「 根本 錯位 」 匯率 國家 名單 , 這 開啓 了 可能 對 北京 實施 經濟制裁 的 大門 。
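Automatic Evaluation
In machine translation, the most common automatic evaluation metric is BLEU. Before describing how it is computed, we first introduce some basic concepts:
- N-gram
- Penalty factor (brevity penalty, BP)
- The BLEU algorithm
1. N-gram
An N-gram model is a statistical language model. It represents a sentence as sequences of n consecutive words and uses the co-occurrence of adjacent words in context to compute the probability of the sentence, which indicates whether the sentence is fluent.
BLEU also uses N-gram matching: it computes the proportion of n-word groups in the machine translation that also appear in the reference translation.
For example: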
1.1 1-gram
![](http://static.javashuo.com/static/loading.gif)
The machine translation has 6 words, 5 of which appear in the reference translation, so its 1-gram match rate is 5/6.
1.2 2-gram
![](http://static.javashuo.com/static/loading.gif)
The 2-gram match rate is 3/5.
1.3 3-gram
![](http://static.javashuo.com/static/loading.gif)
The 3-gram match rate is 1/4, and it is 0 for 4-grams and longer.
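The three match rates above can be reproduced with a few lines of Python. This is only a minimal sketch: the sentence pair is an assumption (a commonly used pair that happens to give 5/6, 3/5 and 1/4), and `ngrams` and `match_rate` are hypothetical helper names, not part of the course code.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_rate(candidate, reference, n):
    """Count how many candidate n-grams also occur in the reference.

    Returns (hits, total). This is the plain, unclipped count.
    """
    cand = ngrams(candidate.split(), n)
    ref = set(ngrams(reference.split(), n))
    hits = sum(1 for gram in cand if gram in ref)
    return hits, len(cand)


# Assumed example pair, chosen to reproduce the numbers in the text.
candidate = "the cat sat on the mat"
reference = "the cat is on the mat"

for n in range(1, 5):
    hits, total = match_rate(candidate, reference, n)
    print(f"{n}-gram match rate: {hits}/{total}")
```

Whitespace splitting stands in for real tokenization here; the experiments in this article work on pre-tokenized text.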
1.4 Correcting the Calculation
However, there are cases where plain n-gram matching does not reflect whether the translation is correct. For example:
![](http://static.javashuo.com/static/loading.gif)
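If we compute the 1-gram match rate here, every "the" is matched and the rate is 7/7, which is clearly wrong. BLEU therefore corrects the calculation: the count of each n-gram is clipped to the minimum of its number of occurrences in the machine translation and in the reference translation.
So in the example above, the 1-gram match rate is 2/7.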
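The n-gram precision is therefore computed as follows (the subscript i denotes i-grams):

$$P_i = \frac{\sum_{g} \min\big(\mathrm{count}_{\mathrm{hyp}}(g),\ \mathrm{count}_{\mathrm{ref}}(g)\big)}{\sum_{g} \mathrm{count}_{\mathrm{hyp}}(g)}$$

Here $g$ ranges over the distinct i-grams of the machine translation. The term $\min\big(\mathrm{count}_{\mathrm{hyp}}(g), \mathrm{count}_{\mathrm{ref}}(g)\big)$ is the smaller of the number of times $g$ occurs in the machine translation and in the reference translation, and $\mathrm{count}_{\mathrm{hyp}}(g)$ is the number of times $g$ occurs in the machine translation.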
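The clipped counting described above can be sketched with `collections.Counter`. The sentences are an assumption: the classic "the the the ..." example, which is consistent with the 7/7 and 2/7 figures in the text. `clipped_precision` is a hypothetical helper name.

```python
from collections import Counter


def clipped_precision(candidate_tokens, reference_tokens, n):
    """Modified n-gram precision: clip each count by the reference count.

    Returns (clipped_hits, total_candidate_ngrams).
    """
    def count(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = count(candidate_tokens)
    ref_counts = count(reference_tokens)
    # A Counter returns 0 for n-grams that never occur in the reference.
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped, total


# Assumed example: "the" occurs 7 times in the candidate but only twice in the reference.
candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()
print(clipped_precision(candidate, reference, 1))  # (2, 7)
```

With a single reference this is exactly "minimum of the two counts"; with several references, BLEU usually clips against the largest count found in any one reference.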
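2. Penalty Factor
Consider another example: the machine translation is "The dog" and the reference translation is "The dog is on the floor." Computed with the formula above, the final score would be 1.
But this translation is incomplete and should in principle score low, so we introduce the following brevity penalty (BP) to penalize the score:

$$BP = \begin{cases} 1, & c > r \\ e^{\,1 - r/c}, & c \le r \end{cases}$$

Here c is the number of words in the machine translation and r is the number of words in the reference translation. If c is much smaller than r, BP becomes very small, and so does the final score.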
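As a quick check of the penalty, here is a minimal sketch that assumes the standard BLEU definition of BP (1 when the translation is longer than the reference, otherwise exp(1 - r/c)). `brevity_penalty` is a hypothetical helper name, and punctuation is left out of the word counts.

```python
import math


def brevity_penalty(c, r):
    """Standard BLEU brevity penalty for hypothesis length c and reference length r."""
    if c == 0:
        return 0.0  # an empty translation gets no credit
    if c > r:
        return 1.0
    return math.exp(1.0 - r / c)


hyp = "The dog".split()                  # c = 2
ref = "The dog is on the floor".split()  # r = 6
print(round(brevity_penalty(len(hyp), len(ref)), 4))  # 0.1353

# Lengths reported by eval.sh in Experiment 1: hyp_len=2318, ref_len=2236.
print(brevity_penalty(2318, 2236))  # 1.0, matching BP=1.000 in the log
```

So the perfect n-gram precision of "The dog" is scaled down to roughly 0.14, while the system output in Experiment 1 is slightly longer than the reference and is not penalized.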
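3. The BLEU Algorithm
The final BLEU score is computed as:

$$BLEU = BP \cdot \exp\Big(\sum_{i=1}^{N} W_i \log P_i\Big)$$

Here $P_i$ is the i-gram precision, BP is the penalty factor, and $W_i$ is the weight of the i-gram precision. The weights are usually taken to be equal, i.e. $W_i = 1/N$.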
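Putting the pieces together, the sketch below combines n-gram precisions with uniform weights 1/N. `bleu` is a hypothetical helper name. The second call assumes that the four slash-separated numbers printed by `eval.sh` are the 1-gram to 4-gram precisions in percent, as in the usual multi-bleu style output.

```python
import math


def bleu(precisions, bp):
    """Combine n-gram precisions (fractions in [0, 1]) with uniform weights 1/N."""
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; the geometric mean collapses to 0
    n = len(precisions)
    log_mean = sum(math.log(p) for p in precisions) / n
    return bp * math.exp(log_mean)


# 1-gram and 2-gram match rates from section 1 (5/6 and 3/5), with BP = 1.
print(round(bleu([5 / 6, 3 / 5], bp=1.0), 4))  # 0.7071

# Experiment 1 log line: BLEU = 22.21, 57.5/29.8/16.6/8.6 (BP=1.000, ...)
score = 100 * bleu([0.575, 0.298, 0.166, 0.086], bp=1.0)
print(round(score, 2))
```

The second value comes out at roughly 22.2. The small gap to the reported 22.21 is expected, because the precisions in the log are rounded to one decimal place.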
Experiment 2

    # Train the model
    !cd {WORK_PATH} && sh train_small.sh
    + export FLAGS_eager_delete_tensor_gb=0.0
    + export FLAGS_fraction_of_gpu_memory_to_use=0.98
    + CUDA_VISIBLE_DEVICES=0 python -u src/train.py --src_vocab_fpath ./data/vocab.source --trg_vocab_fpath ./data/vocab.target --train_file_pattern ./data/translate-train-000* --token_delimiter --use_token_batch True --batch_size 4096 --sort_type pool --pool_size 200000 --fetch_steps 50 save_freq 50 n_head 8 d_model 256 d_inner_hid 1024 n_layer 3 prepostprocess_dropout 0.1 ckpt_path ./model_small
    14
    ['save_freq', '50', 'n_head', '8', 'd_model', '256', 'd_inner_hid', '1024', 'n_layer', '3', 'prepostprocess_dropout', '0.1', 'ckpt_path', './model_small']
    10
    [2019-09-20 13:58:34,031 INFO train.py:656] Namespace(batch_size=4096, device='GPU', enable_ce=False, fetch_steps=50, local=True, opts=['save_freq', '50', 'n_head', '8', 'd_model', '256', 'd_inner_hid', '1024', 'n_layer', '3', 'prepostprocess_dropout', '0.1', 'ckpt_path', './model_small'], pool_size=200000, shuffle=True, shuffle_batch=True, sort_type='pool', special_token=['0', '<EOS>', 'UNK'], src_vocab_fpath='./data/vocab.source', sync=True, token_delimiter=' ', train_file_pattern='./data/translate-train-000*', trg_vocab_fpath='./data/vocab.target', update_method='pserver', use_mem_opt=True, use_py_reader=False, use_token_batch=True, val_file_pattern=None)
    [2019-09-20 13:58:34,158 INFO train.py:707] before adam
    memory_optimize is deprecated. Use CompiledProgram and Executor
    [2019-09-20 13:58:40,254 INFO train.py:725] local start_up:
    W0920 13:58:41.012140 2131 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 13:58:41.016459 2131 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    [2019-09-20 13:58:41,043 INFO train.py:505] load checkpoint from ./model_small
    [2019-09-20 13:58:41,165 INFO train.py:512] begin reader
    [2019-09-20 13:58:46,721 INFO train.py:539] begin executor
    I0920 13:58:46.764124 2131 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 13:58:46.832293 2131 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    [2019-09-20 13:58:46,870 INFO train.py:561] begin train
    [2019-09-20 13:58:47,462 INFO train.py:594] step_idx: 0, epoch: 0, batch: 0, avg loss: 3.179328, normalized loss: 1.775048
    [2019-09-20 13:58:52,159 INFO train.py:602] step_idx: 50, epoch: 0, batch: 50, avg loss: 3.173360, normalized loss: 1.769080, speed: 10.64 step/s
    [2019-09-20 13:58:57,980 INFO train.py:602] step_idx: 100, epoch: 0, batch: 100, avg loss: 3.126150, normalized loss: 1.721869, speed: 8.59 step/s
    [2019-09-20 13:59:06,231 INFO train.py:602] step_idx: 150, epoch: 0, batch: 150, avg loss: 3.107801, normalized loss: 1.703520, speed: 6.06 step/s
    [2019-09-20 13:59:14,703 INFO train.py:602] step_idx: 200, epoch: 0, batch: 200, avg loss: 3.102083, normalized loss: 1.697803, speed: 5.90 step/s
    [2019-09-20 13:59:23,036 INFO train.py:602] step_idx: 250, epoch: 0, batch: 250, avg loss: 3.259453, normalized loss: 1.855173, speed: 6.00 step/s
    ^C
    Traceback (most recent call last):
      File "src/train.py", line 807, in <module>
        train(args)
      File "src/train.py", line 727, in train
        token_num, predict, pyreader)
      File "src/train.py", line 579, in train_loop
        feed=feed_dict_list)
      File "/opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 280, in run
        return_numpy=return_numpy)
      File "/opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages/paddle/fluid/executor.py", line 666, in run
        return_numpy=return_numpy)
      File "/opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages/paddle/fluid/executor.py", line 521, in _run_parallel
        tmp.set(tensor, program._places[i])
    KeyboardInterrupt
    # Select a model using the development set
    !cd {WORK_PATH} && rm trans_res/*
    !cd {WORK_PATH} && sh infer_small.sh trained_models trans_res eval/dev_enzh DIR
    !cd {WORK_PATH}/eval && sh eval.sh ../trans_res dev_reference DIR
    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 13:59:38.703399 2178 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 13:59:38.708161 2178 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 13:59:38.845857 2178 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 13:59:38.849900 2178 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 13:59:45.884814 2216 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 13:59:45.889055 2216 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 13:59:46.049649 2216 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 13:59:46.053624 2216 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 13:59:53.189880 2254 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 13:59:53.194249 2254 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 13:59:53.344911 2254 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 13:59:53.348949 2254 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 14:00:00.514881 2292 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 14:00:00.518779 2292 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 14:00:00.664440 2292 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 14:00:00.668455 2292 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 14:00:07.745339 2330 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 14:00:07.749605 2330 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 14:00:07.892396 2330 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 14:00:07.896674 2330 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    iter_100.infer.model BLEU = 11.22, 43.9/16.2/6.9/3.2 (BP=1.000, ratio=1.057, hyp_len=2436, ref_len=2305)
    iter_150.infer.model BLEU = 11.54, 44.0/16.5/7.2/3.4 (BP=1.000, ratio=1.057, hyp_len=2437, ref_len=2305)
    iter_200.infer.model BLEU = 10.84, 43.8/16.0/6.6/3.0 (BP=1.000, ratio=1.050, hyp_len=2420, ref_len=2305)
    iter_250.infer.model BLEU = 11.23, 44.0/16.3/6.9/3.2 (BP=1.000, ratio=1.051, hyp_len=2422, ref_len=2305)
    iter_50.infer.model BLEU = 11.56, 44.0/16.4/7.1/3.5 (BP=1.000, ratio=1.054, hyp_len=2430, ref_len=2305)
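Selecting a checkpoint means picking the one with the highest development-set BLEU. The sketch below parses the BLEU lines printed above and returns the best checkpoint. `best_checkpoint` and the regular expression are illustrative assumptions, not part of the course scripts.

```python
import re

# BLEU lines copied from the eval.sh output on the development set.
EVAL_OUTPUT = """\
iter_100.infer.model BLEU = 11.22, 43.9/16.2/6.9/3.2 (BP=1.000, ratio=1.057, hyp_len=2436, ref_len=2305)
iter_150.infer.model BLEU = 11.54, 44.0/16.5/7.2/3.4 (BP=1.000, ratio=1.057, hyp_len=2437, ref_len=2305)
iter_200.infer.model BLEU = 10.84, 43.8/16.0/6.6/3.0 (BP=1.000, ratio=1.050, hyp_len=2420, ref_len=2305)
iter_250.infer.model BLEU = 11.23, 44.0/16.3/6.9/3.2 (BP=1.000, ratio=1.051, hyp_len=2422, ref_len=2305)
iter_50.infer.model BLEU = 11.56, 44.0/16.4/7.1/3.5 (BP=1.000, ratio=1.054, hyp_len=2430, ref_len=2305)
"""

# Capture the checkpoint name and the overall BLEU score on each line.
PATTERN = re.compile(r"^(\S+)\s+BLEU = ([\d.]+),", re.MULTILINE)


def best_checkpoint(text):
    """Return (checkpoint_name, bleu) for the highest-scoring line."""
    scores = {name: float(score) for name, score in PATTERN.findall(text)}
    return max(scores.items(), key=lambda item: item[1])


print(best_checkpoint(EVAL_OUTPUT))  # ('iter_50.infer.model', 11.56)
```

The scores differ by well under one BLEU point after only a few hundred steps, so the choice of `iter_50.infer.model` here is close to a tie.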
    # Run the training code
    # Run prediction with the selected model and check its performance on the test set
    !cd {WORK_PATH} && rm trans_res/*
    !cd {WORK_PATH} && sh infer_small.sh trained_models/iter_50.infer.model trans_res eval/test_enzh FILE
    !cd {WORK_PATH}/eval && sh eval.sh {WORK_PATH}/trans_res/small_trans_res test_reference FILE
    !cd {WORK_PATH}/eval && sed -r 's/(@@ )|(@@ ?$)//g' test_enzh > input.tok.txt && head -1 input.tok.txt && head -1 predict.tok.txt
    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 14:00:24.761777 2385 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 14:00:24.765815 2385 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 14:00:24.912528 2385 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 14:00:24.916476 2385 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    BLEU = 22.04, 57.3/29.6/16.4/8.5 (BP=1.000, ratio=1.038, hyp_len=2320, ref_len=2236)
    last week the US Senate Finance Committee overwhelmingly approved a bill requiring the Treasury to identify a list of countries with " fundamentally misaligned " currency exchange rates . this opens the door for potential economic sanctions to be brought against Beijing .
    上週 , 美國參議院 財務 委員會 以 壓倒性 多數 批准 了 一項 法案 , 要求 財政部 肯定 一個 「 根本 錯位 」 匯率 國家 名單 , 這 開啓 了 可能 對 北京 實施 經濟制裁 的 大門 。
Understanding the Transformer
This section introduces the overall flow of the Transformer and the self-attention mechanism.
![](http://static.javashuo.com/static/loading.gif)
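The architecture has two parts. The encoder turns the source-language input into a semantic vector representation; the decoder generates the target-side sentence. As the figure shows, the Transformer consists mainly of Multi-Head Attention and MLP blocks. "Nx" in the figure means the block is repeated N times, that is, N layers are stacked.
Let us now look at how Multi-Head Attention is implemented: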
![](http://static.javashuo.com/static/loading.gif)
![](http://static.javashuo.com/static/loading.gif)
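Multi-Head Attention splits the inputs Q, K and V into multiple channels (heads), computes Scaled Dot-Product Attention on each channel separately, and finally concatenates the results. The main purpose of Scaled Dot-Product Attention is to use Q and K to compute the weights applied to the values in V. In translation, this means: when producing the T-th target word, which source words should the model focus on to decide the output?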
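The split, attend, and concatenate steps can be sketched in plain Python. This is an illustrative toy, not the course's `src/model.py`: all function names are hypothetical, and the learned linear projections and masking of the real Transformer are omitted for brevity.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(Q, K):
    """One row of weights per query: softmax(q . k / sqrt(d_k)) over all keys."""
    d_k = len(K[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K])
        for q in Q
    ]


def scaled_dot_product_attention(Q, K, V):
    """Weighted sum of the rows of V, using the weights computed from Q and K."""
    weights = attention_weights(Q, K)
    return [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]


def multi_head_attention(Q, K, V, n_head):
    """Split the feature dimension into n_head slices, attend per slice, concatenate."""
    d_model = len(Q[0])
    assert d_model % n_head == 0, "d_model must be divisible by n_head"
    d_head = d_model // n_head
    heads = []
    for h in range(n_head):
        sl = slice(h * d_head, (h + 1) * d_head)
        heads.append(scaled_dot_product_attention(
            [q[sl] for q in Q], [k[sl] for k in K], [v[sl] for v in V]))
    # Concatenate the per-head outputs position by position.
    return [sum((head[t] for head in heads), []) for t in range(len(Q))]


# One target position attending over three source positions, d_model = 4.
Q = [[1.0, 0.0, 1.0, 0.0]]
K = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
V = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
print(attention_weights(Q, K))             # how strongly each source word is attended to
print(multi_head_attention(Q, K, V, n_head=2))
```

Each row of `attention_weights` sums to 1, so every output vector is a weighted average of the source-side values.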
![](http://static.javashuo.com/static/loading.gif)
BeamSearch
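During decoding, the search space is huge (exponential in the output length), so pruning is generally used to reduce it. A common algorithm is BeamSearch; the figure below shows an example with a beam size of 2.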
![](http://static.javashuo.com/static/loading.gif)
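To make the pruning concrete, here is a minimal beam search over a toy next-token table. The table `TOY_MODEL` is a made-up stand-in for the decoder, and all names are hypothetical. It is built so that greedy decoding (beam size 1) misses the best sequence, while a beam of 2 finds it.

```python
import math

EOS = "</s>"

# Hypothetical toy "model": maps a prefix to next-token probabilities.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_token_probs(prefix):
    # Any prefix not in the table ends the sentence with probability 1.
    return TOY_MODEL.get(tuple(prefix), {EOS: 1.0})


def beam_search(beam_size, max_len=5):
    beams = [(0.0, [])]  # (log-probability, tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for tok, p in next_token_probs(tokens).items():
                cand = (score + math.log(p), tokens + [tok])
                (finished if tok == EOS else candidates).append(cand)
        if not candidates:
            break
        # Prune: keep only the beam_size best partial hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    best = max(finished or beams, key=lambda c: c[0])
    return best[1], math.exp(best[0])


print(beam_search(beam_size=1))  # greedy: picks "a" first, ends with probability 0.30
print(beam_search(beam_size=2))  # keeps "b" alive and finds "b x" with probability 0.36
```

Scores are summed in log space to avoid underflow on long sequences. Production decoders usually add length normalization as well, which this sketch leaves out.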
In short, Transformer + BeamSearch is enough to perform prediction for sequence generation tasks.
Deeper/Bigger is Better
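With the Transformer architecture, wider (hidden_size) and deeper (number of layers) models generally change the results significantly. Next, we simply increase the hidden_size, layers and heads parameters and observe how BLEU changes on the English-to-Chinese translation task.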
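To get a feel for how much "wider" costs, the sketch below estimates the weight count of one encoder layer for the two configurations used in the training commands of this article (`d_model 256, d_inner_hid 1024` for the small model and `d_model 1024, d_inner_hid 4096` for the big one). It assumes the standard Transformer layer layout, with four `d_model x d_model` attention matrices and a two-layer feed-forward block, and ignores biases and layer normalization. `layer_params` is a hypothetical helper name.

```python
def layer_params(d_model, d_inner_hid):
    """Rough weight count of one encoder layer (biases and layer norm ignored)."""
    attention = 4 * d_model * d_model          # Q, K, V and output projections
    feed_forward = 2 * d_model * d_inner_hid   # two linear layers
    return attention + feed_forward


small = layer_params(d_model=256, d_inner_hid=1024)
big = layer_params(d_model=1024, d_inner_hid=4096)
print(small, big, big / small)  # 786432 12582912 16.0
```

Multiplying the width by 4 multiplies the per-layer weights by 16, which is consistent with the much lower training speed (step/s) logged for the big model. The number of heads does not change this count, because the heads share the same `d_model`-wide projections.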
Go Further
- BPE (a newer subword segmentation method that largely alleviates the OOV problem): https://arxiv.org/abs/1508.07909; GitHub: https://github.com/rsennrich/subword-nmt
- T2T model paper: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
- Change the model structure: src/model.py -> def dense_encoder
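BPE also explains the `sed -r 's/(@@ )|(@@ ?$)//g'` step in the experiment cells: subword-nmt marks every non-final piece of a split word with a trailing `@@`, and the expression glues the pieces back together. A Python equivalent of that one substitution might look like this (`remove_bpe` is a hypothetical helper name, and the sample sentence is made up):

```python
import re

# Same pattern as the sed command: "@@ " inside a line, or "@@" at the end.
BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)")


def remove_bpe(line):
    """Merge BPE pieces such as 'trans@@ lation' back into whole words."""
    return BPE_MARKER.sub("", line)


print(remove_bpe("neural machine trans@@ lation is fun@@"))
# neural machine translation is fun
```

Because rare words are broken into frequent pieces, the model rarely sees a truly unknown token, which is how BPE alleviates the OOV problem.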
Experiment 3

    # Train a deeper and wider model
    !cd {WORK_PATH} && rm -rf trained_*
    !cd {WORK_PATH} && sh train.sh
    + export FLAGS_eager_delete_tensor_gb=0.0
    + export FLAGS_fraction_of_gpu_memory_to_use=0.98
    + CUDA_VISIBLE_DEVICES=0 python -u src/train.py --src_vocab_fpath ./data/vocab.source --trg_vocab_fpath ./data/vocab.target --train_file_pattern ./data/translate-train-000* --token_delimiter --use_token_batch True --batch_size 4096 --sort_type pool --pool_size 200000 --fetch_steps 10 save_freq 20 n_head 16 d_model 1024 d_inner_hid 4096 prepostprocess_dropout 0.3 ckpt_path ./model_big
    12
    ['save_freq', '20', 'n_head', '16', 'd_model', '1024', 'd_inner_hid', '4096', 'prepostprocess_dropout', '0.3', 'ckpt_path', './model_big']
    10
    [2019-09-20 14:00:42,344 INFO train.py:656] Namespace(batch_size=4096, device='GPU', enable_ce=False, fetch_steps=10, local=True, opts=['save_freq', '20', 'n_head', '16', 'd_model', '1024', 'd_inner_hid', '4096', 'prepostprocess_dropout', '0.3', 'ckpt_path', './model_big'], pool_size=200000, shuffle=True, shuffle_batch=True, sort_type='pool', special_token=['0', '<EOS>', 'UNK'], src_vocab_fpath='./data/vocab.source', sync=True, token_delimiter=' ', train_file_pattern='./data/translate-train-000*', trg_vocab_fpath='./data/vocab.target', update_method='pserver', use_mem_opt=True, use_py_reader=False, use_token_batch=True, val_file_pattern=None)
    [2019-09-20 14:00:42,665 INFO train.py:707] before adam
    memory_optimize is deprecated. Use CompiledProgram and Executor
    [2019-09-20 14:01:04,859 INFO train.py:725] local start_up:
    W0920 14:01:05.594372 2435 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 14:01:05.598951 2435 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    [2019-09-20 14:01:05,667 INFO train.py:505] load checkpoint from ./model_big
    [2019-09-20 14:01:06,284 INFO train.py:512] begin reader
    [2019-09-20 14:01:11,938 INFO train.py:539] begin executor
    I0920 14:01:12.019623 2435 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 14:01:12.219225 2435 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    [2019-09-20 14:01:12,312 INFO train.py:561] begin train
    [2019-09-20 14:01:13,236 INFO train.py:594] step_idx: 0, epoch: 0, batch: 0, avg loss: 2.630473, normalized loss: 1.226193
    [2019-09-20 14:01:19,283 INFO train.py:602] step_idx: 10, epoch: 0, batch: 10, avg loss: 2.690705, normalized loss: 1.286424, speed: 1.65 step/s
    [2019-09-20 14:01:25,591 INFO train.py:602] step_idx: 20, epoch: 0, batch: 20, avg loss: 2.393026, normalized loss: 0.988745, speed: 1.59 step/s
    [2019-09-20 14:01:39,230 INFO train.py:602] step_idx: 30, epoch: 0, batch: 30, avg loss: 2.518420, normalized loss: 1.114139, speed: 0.73 step/s
    [2019-09-20 14:01:45,339 INFO train.py:602] step_idx: 40, epoch: 0, batch: 40, avg loss: 2.413766, normalized loss: 1.009486, speed: 1.64 step/s
    [2019-09-20 14:02:40,083 INFO train.py:602] step_idx: 50, epoch: 0, batch: 50, avg loss: 2.590959, normalized loss: 1.186678, speed: 0.18 step/s
    ^C
    Traceback (most recent call last):
      File "src/train.py", line 807, in <module>
        train(args)
      File "src/train.py", line 727, in train
        token_num, predict, pyreader)
      File "src/train.py", line 575, in train_loop
        init_flag, dev_count)
      File "src/train.py", line 375, in prepare_feed_dict_list
        ModelHyperParams.d_model)
      File "src/train.py", line 250, in prepare_batch_input
        [inst[0] for inst in insts], src_pad_idx, n_head, is_target=False)
      File "src/train.py", line 233, in pad_batch_data
        return_list += [slf_attn_bias_data.astype("float32")]
    KeyboardInterrupt
    # Select a model using the development set, then verify it on the test set
    !cd {WORK_PATH} && rm trans_res/*
    !cd {WORK_PATH} && sh infer.sh trained_models trans_res eval/dev_enzh DIR
    !cd {WORK_PATH}/eval && sh eval.sh ../trans_res dev_reference DIR
    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 14:02:52.170383 2482 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 14:02:52.174947 2482 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 14:02:52.953883 2482 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 14:02:52.961632 2482 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 14:03:05.511006 2520 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 14:03:05.514842 2520 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 14:03:06.248040 2520 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 14:03:06.256811 2520 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
    iter_20.infer.model BLEU = 14.35, 47.1/19.3/9.4/5.0 (BP=1.000, ratio=1.046, hyp_len=2412, ref_len=2305)
    iter_40.infer.model BLEU = 14.51, 47.1/19.5/9.5/5.1 (BP=1.000, ratio=1.046, hyp_len=2411, ref_len=2305)
    # Run prediction with the selected model and check its performance on the test set
    !cd {WORK_PATH} && rm trans_res/*
    !cd {WORK_PATH} && sh infer.sh trained_models/iter_40.infer.model trans_res eval/test_enzh FILE
    !cd {WORK_PATH}/eval && sh eval.sh {WORK_PATH}/trans_res/big_trans_res test_reference FILE
    !cd {WORK_PATH}/eval && sed -r 's/(@@ )|(@@ ?$)//g' test_enzh > input.tok.txt && head -1 input.tok.txt && head -1 predict.tok.txt
    memory_optimize is deprecated. Use CompiledProgram and Executor
    W0920 14:05:38.702764 2713 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
    W0920 14:05:38.706779 2713 device_context.cc:267] device: 0, cuDNN Version: 7.3.
    I0920 14:05:39.428685 2713 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
    I0920 14:05:39.436707 2713 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
Results

| | Base Model | Big Model |
|---|---|---|
| BLEU | 22.21 | 30.39 |

As the table shows, the wider and deeper model performs markedly better.
Follow the link to try this project hands-on in AI Studio: https://aistudio.baidu.com/aistudio/projectdetail/120044
Download and installation commands
## Install command for the CPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/cpu paddlepaddle
## Install command for the GPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/gpu paddlepaddle-gpu
>> Visit the PaddlePaddle website to learn more.