Generation tasks: translating between Chinese and foreign languages, an NLP generation task you should know

Neural machine translation

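The goal of machine translation is to automatically translate text from one language into another. Given a text sequence in the language to be translated, no single translation is the best translation of that text.
This is due to the inherent ambiguity and flexibility of human language, which makes automatic machine translation difficult; it is perhaps among the hardest challenges in artificial intelligence.
The conventional approaches to machine translation are statistical machine translation and neural machine translation; here we mainly discuss neural machine translation.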

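As the figure above shows, the main task of translation is to learn a mapping from source-side words to target-side words. It also involves reordering, for example translating "read a book" first rather than "on Sunday".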

So how do we evaluate translation quality?

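  1. Human evaluation by professional translators (more accurate, but time-consuming and labor-intensive)
  2. Automatic evaluation (fast and convenient for model iteration, but flawed)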
 

Hands-on experiment 1

In[36]
# run prediction

!tar -zxf /home/aistudio/data/data13032/ddle_ai_course.t -C /home/aistudio
WORK_PATH = "/home/aistudio/paddle_ai_course"

# decompress pretrained models
!tar -zxf {WORK_PATH}/model_big.tgz -C {WORK_PATH}
!tar -zxf {WORK_PATH}/model_small.tgz -C {WORK_PATH}

!cd {WORK_PATH} && sh infer_small.sh model_small trans_res eval/test_enzh FILE
!cd {WORK_PATH}/eval && sh eval.sh {WORK_PATH}/trans_res/small_trans_res test_reference FILE
!cd {WORK_PATH}/eval && sed -r 's/(@@ )|(@@ ?$)//g' test_enzh > input.tok.txt && head -1 input.tok.txt && head -1 predict.tok.txt
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 13:58:19.488519  2083 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 13:58:19.492918  2083 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 13:58:19.634228  2083 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 13:58:19.638108  2083 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
BLEU = 22.21, 57.5/29.8/16.6/8.6 (BP=1.000, ratio=1.037, hyp_len=2318, ref_len=2236)
last week the US Senate Finance Committee overwhelmingly approved a bill requiring the Treasury to identify a list of countries with " fundamentally misaligned " currency exchange rates . this opens the door for potential economic sanctions to be brought against Beijing .
上週 , 美國參議院 財務 委員會 以 壓倒性 多數 批准 了 一項 法案 , 要求 財政部 肯定 一個 「 根本 錯位 」 匯率 國家 名單 , 這 開啓 了 可能 對 北京 實施 經濟制裁 的 大門 。
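
The `sed` expression in the cell above strips the `@@` continuation markers that BPE segmentation leaves on subword pieces, so the pieces are joined back into whole words before the first line is printed. A rough Python equivalent is sketched below; the function name `remove_bpe` and the sample string are illustrative assumptions, not part of the course code.

```python
import re

# Same pattern as the sed call: "@@ " inside a line, or "@@" (optionally
# followed by one space) at the very end of the line.
_BPE_MARK = re.compile(r"(@@ )|(@@ ?$)")


def remove_bpe(line):
    """Join BPE subword pieces back into whole words."""
    return _BPE_MARK.sub("", line.rstrip("\n"))


# Hypothetical BPE-segmented input, for illustration only.
sample = "fund@@ ament@@ ally mis@@ aligned currency exchange rates"
print(remove_bpe(sample))
```

Lines that contain no `@@` marker pass through unchanged, which is why the same command can be applied to the whole test file.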
 

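Automatic evaluation

In machine translation, a common automatic evaluation metric is BLEU. Before describing how it is computed, we first introduce some basic concepts:

N-gram

Brevity penalty

BLEU algorithm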

1. N-gram

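An N-gram model is a statistical language model. It represents a sentence as sequences of n consecutive words and uses the co-occurrence information of adjacent words in context to compute the probability of the sentence, and thereby judge whether the sentence is fluent.
BLEU also uses N-gram matching: it computes the proportion of n-word groups in the machine translation that also appear in the reference translation.

For example: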

1.1 1-gram


 
 

The machine translation has 6 words, 5 of which hit the reference translation, so its match rate is 5/6.

1.2 2-gram


 
The match rate for 2-grams is 3/5.

 

1.3 3-gram


 
The match rate for 3-grams is 1/4, and it is 0 for 4-grams and above.
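
The counting in sections 1.1 to 1.3 can be sketched in a few lines of Python. The original example sentences were shown as images and are not available here, so the sentence pair below is an assumption: it was chosen only because it reproduces the stated ratios 5/6, 3/5 and 1/4. The helper names `ngrams` and `match_rate` are likewise illustrative.

```python
def ngrams(tokens, n):
    """All contiguous n-word groups of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_rate(candidate, reference, n):
    """Return (matched, total): candidate n-grams that also occur in the reference."""
    cand = ngrams(candidate.split(), n)
    ref = set(ngrams(reference.split(), n))
    matched = sum(1 for g in cand if g in ref)
    return matched, len(cand)


# Assumed sentence pair (not taken from the original figures).
candidate = "the cat sat on the mat"
reference = "the cat is on the mat"

for n in range(1, 5):
    matched, total = match_rate(candidate, reference, n)
    print("%d-gram: %d/%d" % (n, matched, total))
```

Note that this simple version only checks membership in the reference, which is exactly the weakness that section 1.4 addresses.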

 

1.4 Correcting the calculation

However, there are cases where plain n-gram matching cannot reflect the correctness of a translation, for example:

 

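If we compute the 1-gram match, every "the" is matched and the match rate is 7/7, which is clearly wrong. BLEU therefore corrects the calculation: the count of each N-gram becomes the minimum of its number of occurrences in the machine translation and in the reference translation.

So in the example above, the 1-gram match rate is 2/7.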

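So the n-gram precision is computed as follows (the i in the formula denotes the i-gram length):

$$P_i = \frac{\sum_{g} \min\big(\mathrm{Count}_{\mathrm{hyp}}(g),\ \mathrm{Count}_{\mathrm{ref}}(g)\big)}{\sum_{g} \mathrm{Count}_{\mathrm{hyp}}(g)}$$

where the sums run over the distinct i-grams $g$ of the machine translation;

$\min\big(\mathrm{Count}_{\mathrm{hyp}}(g), \mathrm{Count}_{\mathrm{ref}}(g)\big)$ is the smaller of the number of times the n-gram occurs in the machine translation and in the reference translation;

$\mathrm{Count}_{\mathrm{hyp}}(g)$ is the number of times the n-gram occurs in the machine translation.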
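
A minimal sketch of the corrected (clipped) count follows. The text gives only the ratios 7/7 and 2/7, so the sentence pair used here is an assumption that reproduces them; `ngram_counts` and `modified_precision` are hypothetical helper names.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count how often each n-gram occurs in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Return (clipped, total) for the n-grams of the candidate.

    Each candidate n-gram count is clipped to the number of times the
    same n-gram occurs in the reference (Counter returns 0 if absent).
    """
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    clipped = sum(min(count, ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return clipped, total


# Assumed sentence pair consistent with the 7/7 -> 2/7 example.
candidate = "the the the the the the the"
reference = "the cat is on the mat"

print(modified_precision(candidate, reference, 1))
```

Because "the" occurs only twice in the reference, its seven occurrences in the candidate are credited at most twice.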

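2. Brevity penalty

Consider another example: the machine translation is "The dog" and the reference translation is "The dog is on the floor." Computed with the formula above, the final score would be 1.
But this translation is incomplete and should in principle score low, so we introduce the following expression to penalize the score:

$$BP = \begin{cases} 1, & c > r \\ e^{\,1 - r/c}, & c \le r \end{cases}$$

Here c is the number of words in the machine translation and r is the number of words in the reference translation. If c is much smaller than r, BP becomes very small, and so does the final score.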
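
The penalty can be sketched as below, assuming the standard BLEU definition (BP = 1 when c > r, otherwise exp(1 - r/c)). The function name `brevity_penalty` is illustrative, and the word counts ignore the final period of the reference.

```python
import math


def brevity_penalty(c, r):
    """Brevity penalty for a translation of c words and a reference of r words."""
    if c == 0:
        return 0.0          # an empty translation gets no credit
    if c > r:
        return 1.0          # longer than the reference: no penalty
    return math.exp(1.0 - float(r) / c)


c = len("The dog".split())                   # 2 words
r = len("The dog is on the floor".split())   # 6 words
print(round(brevity_penalty(c, r), 4))       # exp(-2), roughly 0.1353

# Lengths reported in the evaluation log of experiment 1:
print(brevity_penalty(2318, 2236))           # hyp_len > ref_len, so 1.0
```

The second call is consistent with the `BP=1.000` shown in the log line `ratio=1.037, hyp_len=2318, ref_len=2236`.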

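3. BLEU algorithm

The final BLEU score is computed as follows:

$$BLEU = BP \cdot \exp\Big(\sum_{i=1}^{N} W_i \log P_i\Big)$$

Here $P_i$ is the corrected i-gram precision, BP is the brevity penalty, and $W_i$ is the weight of the i-gram precision; the weights are usually taken to be equal, $W_i = 1/N$.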
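
As a sanity check, the final score (the brevity penalty times the weighted geometric mean of the n-gram precisions) can be recomputed from the numbers printed in the evaluation log of experiment 1: `BLEU = 22.21, 57.5/29.8/16.6/8.6 (BP=1.000, ...)`. The sketch below assumes equal weights 1/N with N = 4; the function name `bleu` is illustrative.

```python
import math


def bleu(precisions, bp, weights=None):
    """BP times the weighted geometric mean of the n-gram precisions."""
    n = len(precisions)
    if weights is None:
        weights = [1.0 / n] * n      # equal weights, as in the text
    if min(precisions) <= 0:
        return 0.0                   # log(0) is undefined; score is 0
    log_avg = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return bp * math.exp(log_avg)


# Precisions (in percent) and BP copied from the experiment 1 log line.
score = 100 * bleu([0.575, 0.298, 0.166, 0.086], bp=1.0)
print(round(score, 2))
```

The result is about 22.2; the small gap to the logged 22.21 comes from the precisions being rounded to one decimal place in the log.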

 

Hands-on experiment 2

In[37]
# train the model
!cd {WORK_PATH} && sh train_small.sh
+ export FLAGS_eager_delete_tensor_gb=0.0
+ export FLAGS_fraction_of_gpu_memory_to_use=0.98
+ CUDA_VISIBLE_DEVICES=0 python -u src/train.py --src_vocab_fpath ./data/vocab.source --trg_vocab_fpath ./data/vocab.target --train_file_pattern ./data/translate-train-000* --token_delimiter   --use_token_batch True --batch_size 4096 --sort_type pool --pool_size 200000 --fetch_steps 50 save_freq 50 n_head 8 d_model 256 d_inner_hid 1024 n_layer 3 prepostprocess_dropout 0.1 ckpt_path ./model_small
14
['save_freq', '50', 'n_head', '8', 'd_model', '256', 'd_inner_hid', '1024', 'n_layer', '3', 'prepostprocess_dropout', '0.1', 'ckpt_path', './model_small']
10
[2019-09-20 13:58:34,031 INFO train.py:656] Namespace(batch_size=4096, device='GPU', enable_ce=False, fetch_steps=50, local=True, opts=['save_freq', '50', 'n_head', '8', 'd_model', '256', 'd_inner_hid', '1024', 'n_layer', '3', 'prepostprocess_dropout', '0.1', 'ckpt_path', './model_small'], pool_size=200000, shuffle=True, shuffle_batch=True, sort_type='pool', special_token=['0', '<EOS>', 'UNK'], src_vocab_fpath='./data/vocab.source', sync=True, token_delimiter=' ', train_file_pattern='./data/translate-train-000*', trg_vocab_fpath='./data/vocab.target', update_method='pserver', use_mem_opt=True, use_py_reader=False, use_token_batch=True, val_file_pattern=None)
[2019-09-20 13:58:34,158 INFO train.py:707] before adam
memory_optimize is deprecated. Use CompiledProgram and Executor
[2019-09-20 13:58:40,254 INFO train.py:725] local start_up:
W0920 13:58:41.012140  2131 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 13:58:41.016459  2131 device_context.cc:267] device: 0, cuDNN Version: 7.3.
[2019-09-20 13:58:41,043 INFO train.py:505] load checkpoint from ./model_small
[2019-09-20 13:58:41,165 INFO train.py:512] begin reader
[2019-09-20 13:58:46,721 INFO train.py:539] begin executor
I0920 13:58:46.764124  2131 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 13:58:46.832293  2131 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[2019-09-20 13:58:46,870 INFO train.py:561] begin train
[2019-09-20 13:58:47,462 INFO train.py:594] step_idx: 0, epoch: 0, batch: 0, avg loss: 3.179328, normalized loss: 1.775048
[2019-09-20 13:58:52,159 INFO train.py:602] step_idx: 50, epoch: 0, batch: 50, avg loss: 3.173360, normalized loss: 1.769080, speed: 10.64 step/s
[2019-09-20 13:58:57,980 INFO train.py:602] step_idx: 100, epoch: 0, batch: 100, avg loss: 3.126150, normalized loss: 1.721869, speed: 8.59 step/s
[2019-09-20 13:59:06,231 INFO train.py:602] step_idx: 150, epoch: 0, batch: 150, avg loss: 3.107801, normalized loss: 1.703520, speed: 6.06 step/s
[2019-09-20 13:59:14,703 INFO train.py:602] step_idx: 200, epoch: 0, batch: 200, avg loss: 3.102083, normalized loss: 1.697803, speed: 5.90 step/s
[2019-09-20 13:59:23,036 INFO train.py:602] step_idx: 250, epoch: 0, batch: 250, avg loss: 3.259453, normalized loss: 1.855173, speed: 6.00 step/s
^C
Traceback (most recent call last):
  File "src/train.py", line 807, in <module>
    train(args)
  File "src/train.py", line 727, in train
    token_num, predict, pyreader)
  File "src/train.py", line 579, in train_loop
    feed=feed_dict_list)
  File "/opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 280, in run
    return_numpy=return_numpy)
  File "/opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages/paddle/fluid/executor.py", line 666, in run
    return_numpy=return_numpy)
  File "/opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages/paddle/fluid/executor.py", line 521, in _run_parallel
    tmp.set(tensor, program._places[i])
KeyboardInterrupt
In[38]
# select a model using the dev set
!cd {WORK_PATH} && rm trans_res/*
!cd {WORK_PATH} && sh infer_small.sh trained_models trans_res eval/dev_enzh DIR
!cd {WORK_PATH}/eval && sh eval.sh ../trans_res dev_reference DIR
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 13:59:38.703399  2178 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 13:59:38.708161  2178 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 13:59:38.845857  2178 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 13:59:38.849900  2178 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 13:59:45.884814  2216 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 13:59:45.889055  2216 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 13:59:46.049649  2216 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 13:59:46.053624  2216 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 13:59:53.189880  2254 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 13:59:53.194249  2254 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 13:59:53.344911  2254 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 13:59:53.348949  2254 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 14:00:00.514881  2292 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 14:00:00.518779  2292 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 14:00:00.664440  2292 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 14:00:00.668455  2292 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 14:00:07.745339  2330 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 14:00:07.749605  2330 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 14:00:07.892396  2330 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 14:00:07.896674  2330 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
iter_100.infer.model
BLEU = 11.22, 43.9/16.2/6.9/3.2 (BP=1.000, ratio=1.057, hyp_len=2436, ref_len=2305)
iter_150.infer.model
BLEU = 11.54, 44.0/16.5/7.2/3.4 (BP=1.000, ratio=1.057, hyp_len=2437, ref_len=2305)
iter_200.infer.model
BLEU = 10.84, 43.8/16.0/6.6/3.0 (BP=1.000, ratio=1.050, hyp_len=2420, ref_len=2305)
iter_250.infer.model
BLEU = 11.23, 44.0/16.3/6.9/3.2 (BP=1.000, ratio=1.051, hyp_len=2422, ref_len=2305)
iter_50.infer.model
BLEU = 11.56, 44.0/16.4/7.1/3.5 (BP=1.000, ratio=1.054, hyp_len=2430, ref_len=2305)
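
Picking the checkpoint by dev-set BLEU can also be done programmatically. The sketch below parses output in the format printed above (a checkpoint name followed by its `BLEU = ...` line) and returns the best one; `best_checkpoint` is a hypothetical helper, not part of the course scripts.

```python
import re

# Dev-set evaluation output copied from the cell above.
EVAL_OUTPUT = """\
iter_100.infer.model
BLEU = 11.22, 43.9/16.2/6.9/3.2 (BP=1.000, ratio=1.057, hyp_len=2436, ref_len=2305)
iter_150.infer.model
BLEU = 11.54, 44.0/16.5/7.2/3.4 (BP=1.000, ratio=1.057, hyp_len=2437, ref_len=2305)
iter_200.infer.model
BLEU = 10.84, 43.8/16.0/6.6/3.0 (BP=1.000, ratio=1.050, hyp_len=2420, ref_len=2305)
iter_250.infer.model
BLEU = 11.23, 44.0/16.3/6.9/3.2 (BP=1.000, ratio=1.051, hyp_len=2422, ref_len=2305)
iter_50.infer.model
BLEU = 11.56, 44.0/16.4/7.1/3.5 (BP=1.000, ratio=1.054, hyp_len=2430, ref_len=2305)
"""


def best_checkpoint(eval_output):
    """Return (name, bleu) of the checkpoint with the highest BLEU."""
    scores = {}
    name = None
    for line in eval_output.splitlines():
        line = line.strip()
        m = re.match(r"BLEU = ([0-9.]+)", line)
        if m and name is not None:
            scores[name] = float(m.group(1))
            name = None
        elif line.endswith(".infer.model"):
            name = line
    return max(scores.items(), key=lambda kv: kv[1])


print(best_checkpoint(EVAL_OUTPUT))
```

On this output the helper selects `iter_50.infer.model`, the same checkpoint that the next cell uses for the test-set run.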
In[39]
# run prediction with the selected model and check its performance on the test set
!cd {WORK_PATH} && rm trans_res/*
!cd {WORK_PATH} && sh infer_small.sh trained_models/iter_50.infer.model trans_res eval/test_enzh FILE
!cd {WORK_PATH}/eval && sh eval.sh {WORK_PATH}/trans_res/small_trans_res test_reference FILE
!cd {WORK_PATH}/eval && sed -r 's/(@@ )|(@@ ?$)//g' test_enzh > input.tok.txt && head -1 input.tok.txt && head -1 predict.tok.txt
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 14:00:24.761777  2385 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 14:00:24.765815  2385 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 14:00:24.912528  2385 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 14:00:24.916476  2385 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
BLEU = 22.04, 57.3/29.6/16.4/8.5 (BP=1.000, ratio=1.038, hyp_len=2320, ref_len=2236)
last week the US Senate Finance Committee overwhelmingly approved a bill requiring the Treasury to identify a list of countries with &quot; fundamentally misaligned &quot; currency exchange rates . this opens the door for potential economic sanctions to be brought against Beijing .
上週 , 美國參議院 財務 委員會 以 壓倒性 多數 批准 了 一項 法案 , 要求 財政部 肯定 一個 「 根本 錯位 」 匯率 國家 名單 , 這 開啓 了 可能 對 北京 實施 經濟制裁 的 大門 。
 

Transformer explained

This section introduces the overall flow of the Transformer and the self-attention mechanism.

 

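The overall architecture has two parts. One is the encoder, which produces a semantic vector representation of the source-language input; the other is the decoder, which generates the target-side sentence. As the figure shows, the Transformer consists mainly of Multi-Head Attention and MLP blocks. The Nx in the figure means repeating N times, where repeating means stacking multiple layers.
Let us now look at how Multi-Head Attention is implemented: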

 
 

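Multi-Head Attention splits the inputs Q, K and V into multiple channels (heads), computes Scaled Dot-Product Attention on each channel separately, and finally concatenates the results. The main purpose of Scaled Dot-Product Attention is to use Q and K to compute the weights of the values in V. In the translation task this answers the question: when translating the T-th target-side word, which source-side words should I focus on to decide the translation?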
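
The mechanism can be illustrated with a dependency-free Python sketch. It is a simplification under stated assumptions: plain lists instead of tensors, no masking, and none of the learned linear projections that a real Transformer applies before splitting and after concatenating. All function names are illustrative and do not come from `src/model.py`.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q: [n_q][d], K: [n_k][d], V: [n_k][d_v] -> (outputs, weights)."""
    d = len(Q[0])
    outputs, all_weights = [], []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


def split_heads(X, n_head):
    """Cut every row into n_head contiguous chunks, one matrix per head."""
    d_head = len(X[0]) // n_head
    return [[row[h * d_head:(h + 1) * d_head] for row in X] for h in range(n_head)]


def multi_head_attention(Q, K, V, n_head):
    """Attention per head, then concatenation along the feature axis."""
    heads = [
        scaled_dot_product_attention(q, k, v)[0]
        for q, k, v in zip(split_heads(Q, n_head), split_heads(K, n_head), split_heads(V, n_head))
    ]
    return [sum((head[i] for head in heads), []) for i in range(len(Q))]


# Toy input: one target-side query attending over two source-side positions.
Q = [[1.0, 0.0, 0.0, 1.0]]
K = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
V = K
_, weights = scaled_dot_product_attention(Q, K, V)
print(weights[0])                                  # more weight on the first source position
print(multi_head_attention(Q, K, V, n_head=2))
```

The printed weights are the "which source words to look at" distribution described above; each row sums to 1.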

 

BeamSearch

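During decoding the search space is huge (exponential in size), so pruning is generally used to reduce it. A common algorithm is BeamSearch; the figure below shows an example with a beam size of 2.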
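
A minimal beam search sketch is shown below. The next-token distribution is a hand-written toy table standing in for the decoder (an assumption for illustration only), and the scoring is a plain sum of log-probabilities without the length normalization that practical systems often add.

```python
import math

# Toy "model": next-token probabilities keyed by the last token only.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "x": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}


def step_fn(tokens):
    """Return {next_token: probability} for a partial hypothesis."""
    return TABLE[tokens[-1]]


def beam_search(step_fn, bos, eos, beam_size=2, max_len=10):
    """Keep only the beam_size best partial hypotheses at every step."""
    beams = [([bos], 0.0)]          # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, prob in step_fn(tokens).items():
                candidates.append((tokens + [tok], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:   # pruning step
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)          # hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])


greedy_tokens, greedy_score = beam_search(step_fn, "<s>", "</s>", beam_size=1)
beam_tokens, beam_score = beam_search(step_fn, "<s>", "</s>", beam_size=2)
print(greedy_tokens, round(math.exp(greedy_score), 2))
print(beam_tokens, round(math.exp(beam_score), 2))
```

With a beam of 1 (greedy decoding) the search commits to "a" and ends with probability 0.3, while a beam of 2 keeps "b" alive and finds the sequence with probability 0.36.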

 
 

In summary, Transformer + BeamSearch is enough to perform prediction for sequence generation tasks.

 

Deeper/Bigger is Better

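With the Transformer architecture, a wider (hidden_size) and deeper (number of layers) model generally brings a significant change in quality. Next, we simply increase the hidden_size, layers and heads parameters and observe how BLEU changes on the English-to-Chinese translation task.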

Go Further

  1. bpe (a subword segmentation method that can largely alleviate the OOV problem):
    https://arxiv.org/abs/1508.07909;
    github: https://github.com/rsennrich/subword-nmt
  2. T2T model paper: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
  3. Modify the model structure: src/model.py -> def dense_encoder
 

Hands-on experiment 3

In[40]
# train a deeper and wider model
!cd {WORK_PATH} && rm -rf trained_*
!cd {WORK_PATH} && sh train.sh
+ export FLAGS_eager_delete_tensor_gb=0.0
+ export FLAGS_fraction_of_gpu_memory_to_use=0.98
+ CUDA_VISIBLE_DEVICES=0 python -u src/train.py --src_vocab_fpath ./data/vocab.source --trg_vocab_fpath ./data/vocab.target --train_file_pattern ./data/translate-train-000* --token_delimiter   --use_token_batch True --batch_size 4096 --sort_type pool --pool_size 200000 --fetch_steps 10 save_freq 20 n_head 16 d_model 1024 d_inner_hid 4096 prepostprocess_dropout 0.3 ckpt_path ./model_big
12
['save_freq', '20', 'n_head', '16', 'd_model', '1024', 'd_inner_hid', '4096', 'prepostprocess_dropout', '0.3', 'ckpt_path', './model_big']
10
[2019-09-20 14:00:42,344 INFO train.py:656] Namespace(batch_size=4096, device='GPU', enable_ce=False, fetch_steps=10, local=True, opts=['save_freq', '20', 'n_head', '16', 'd_model', '1024', 'd_inner_hid', '4096', 'prepostprocess_dropout', '0.3', 'ckpt_path', './model_big'], pool_size=200000, shuffle=True, shuffle_batch=True, sort_type='pool', special_token=['0', '<EOS>', 'UNK'], src_vocab_fpath='./data/vocab.source', sync=True, token_delimiter=' ', train_file_pattern='./data/translate-train-000*', trg_vocab_fpath='./data/vocab.target', update_method='pserver', use_mem_opt=True, use_py_reader=False, use_token_batch=True, val_file_pattern=None)
[2019-09-20 14:00:42,665 INFO train.py:707] before adam
memory_optimize is deprecated. Use CompiledProgram and Executor
[2019-09-20 14:01:04,859 INFO train.py:725] local start_up:
W0920 14:01:05.594372  2435 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 14:01:05.598951  2435 device_context.cc:267] device: 0, cuDNN Version: 7.3.
[2019-09-20 14:01:05,667 INFO train.py:505] load checkpoint from ./model_big
[2019-09-20 14:01:06,284 INFO train.py:512] begin reader
[2019-09-20 14:01:11,938 INFO train.py:539] begin executor
I0920 14:01:12.019623  2435 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 14:01:12.219225  2435 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[2019-09-20 14:01:12,312 INFO train.py:561] begin train
[2019-09-20 14:01:13,236 INFO train.py:594] step_idx: 0, epoch: 0, batch: 0, avg loss: 2.630473, normalized loss: 1.226193
[2019-09-20 14:01:19,283 INFO train.py:602] step_idx: 10, epoch: 0, batch: 10, avg loss: 2.690705, normalized loss: 1.286424, speed: 1.65 step/s
[2019-09-20 14:01:25,591 INFO train.py:602] step_idx: 20, epoch: 0, batch: 20, avg loss: 2.393026, normalized loss: 0.988745, speed: 1.59 step/s
[2019-09-20 14:01:39,230 INFO train.py:602] step_idx: 30, epoch: 0, batch: 30, avg loss: 2.518420, normalized loss: 1.114139, speed: 0.73 step/s
[2019-09-20 14:01:45,339 INFO train.py:602] step_idx: 40, epoch: 0, batch: 40, avg loss: 2.413766, normalized loss: 1.009486, speed: 1.64 step/s
[2019-09-20 14:02:40,083 INFO train.py:602] step_idx: 50, epoch: 0, batch: 50, avg loss: 2.590959, normalized loss: 1.186678, speed: 0.18 step/s
^C
Traceback (most recent call last):
  File "src/train.py", line 807, in <module>
    train(args)
  File "src/train.py", line 727, in train
    token_num, predict, pyreader)
  File "src/train.py", line 575, in train_loop
    init_flag, dev_count)
  File "src/train.py", line 375, in prepare_feed_dict_list
    ModelHyperParams.d_model)
  File "src/train.py", line 250, in prepare_batch_input
    [inst[0] for inst in insts], src_pad_idx, n_head, is_target=False)
  File "src/train.py", line 233, in pad_batch_data
    return_list += [slf_attn_bias_data.astype("float32")]
KeyboardInterrupt
In[41]
# select a model using the dev set, then verify its quality on the test set
!cd {WORK_PATH} && rm trans_res/*
!cd {WORK_PATH} && sh infer.sh trained_models trans_res eval/dev_enzh DIR
!cd {WORK_PATH}/eval && sh eval.sh ../trans_res dev_reference DIR
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 14:02:52.170383  2482 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 14:02:52.174947  2482 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 14:02:52.953883  2482 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 14:02:52.961632  2482 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 14:03:05.511006  2520 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 14:03:05.514842  2520 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 14:03:06.248040  2520 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 14:03:06.256811  2520 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
iter_20.infer.model
BLEU = 14.35, 47.1/19.3/9.4/5.0 (BP=1.000, ratio=1.046, hyp_len=2412, ref_len=2305)
iter_40.infer.model
BLEU = 14.51, 47.1/19.5/9.5/5.1 (BP=1.000, ratio=1.046, hyp_len=2411, ref_len=2305)
In[44]
# run prediction with the selected model and check its performance on the test set
!cd {WORK_PATH} && rm trans_res/*
!cd {WORK_PATH} && sh infer.sh trained_models/iter_40.infer.model trans_res eval/test_enzh FILE
!cd {WORK_PATH}/eval && sh eval.sh {WORK_PATH}/trans_res/big_trans_res test_reference FILE
!cd {WORK_PATH}/eval && sed -r 's/(@@ )|(@@ ?$)//g' test_enzh > input.tok.txt && head -1 input.tok.txt && head -1 predict.tok.txt
memory_optimize is deprecated. Use CompiledProgram and Executor
W0920 14:05:38.702764  2713 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0920 14:05:38.706779  2713 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0920 14:05:39.428685  2713 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0920 14:05:39.436707  2713 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
 

Results

        Base Model   Big Model
BLEU    22.21        30.39

As the table shows, the wider and deeper model performs markedly better.

 


Follow the link to try this hands-on project in AI Studio with one click: https://aistudio.baidu.com/aistudio/projectdetail/120044

Download and installation commands

## Install command for the CPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/cpu paddlepaddle

## Install command for the GPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/gpu paddlepaddle-gpu
