TensorFlow word2vec: building and running

  A more complete version of the word2vec demo lives in

    tensorflow/models/embedding/

   

  1. First, install Bazel in order to build the code.

    Download the latest Bazel binary installer; version 0.1.0 is used here:

    https://github.com/bazelbuild/bazel/releases/download/0.1.0/bazel-0.1.0-installer-linux-x86_64.sh

    The installer appears to require root:

    sh bazel-0.1.0-installer-linux-x86_64.sh

       

  2. Build word2vec

    Following README.md:

    bazel build -c opt tensorflow/models/embedding:all

       

  3. Download the training and evaluation data

    wget http://mattmahoney.net/dc/text8.zip -O text8.gz

    gzip -d text8.gz -f

    wget https://word2vec.googlecode.com/svn/trunk/questions-words.txt
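The questions-words.txt file holds analogy questions, one per line (e.g. "Athens Greece Baghdad Iraq"), grouped under section headers that start with ":". At evaluation time, any question containing a word outside the model's vocabulary is dropped, which is where the "Skipped" count in the run log comes from. A minimal sketch of that filtering, assuming this parsing logic (it is an illustration, not the actual code in the demo):

```python
def load_analogies(lines, vocab):
    """Parse analogy questions, keeping only those fully in-vocabulary."""
    kept, skipped = [], 0
    for line in lines:
        if line.startswith(":"):  # section header, e.g. ": capital-common-countries"
            continue
        words = line.strip().lower().split()
        if len(words) != 4:       # malformed line
            continue
        if all(w in vocab for w in words):
            kept.append(words)
        else:
            skipped += 1          # contributes to the "Skipped" count
    return kept, skipped

# Toy example: one in-vocabulary question, one skipped question.
lines = [
    ": capital-common-countries",
    "Athens Greece Baghdad Iraq",
    "Athens Greece Havana Cuba",
]
vocab = {"athens", "greece", "baghdad", "iraq"}
kept, skipped = load_analogies(lines, vocab)
# kept == [["athens", "greece", "baghdad", "iraq"]], skipped == 1
```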

       

  4. Run word2vec

pwd

/home/users/chenghuige/other/tensorflow/bazel-bin/tensorflow/models/embedding

Run the command:

./word2vec_optimized --train_data ./data/text8 --eval_data ./data/questions-words.txt --save_path ./data/result/

   

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 24

I tensorflow/core/common_runtime/direct_session.cc:60] Direct session inter op parallelism threads: 24

I tensorflow/models/embedding/word2vec_kernels.cc:149] Data file: ./data/text8 contains 100000000 bytes, 17005207 words, 253854 unique words, 71290 unique frequent words.

Data file: ./data/text8

Vocab size: 71290 + UNK

Words per epoch: 17005207

Eval analogy file: ./data/questions-words.txt

Questions: 17827

Skipped: 1717

Epoch 1 Step 151381: lr = 0.023 words/sec = 25300

Eval 1419/17827 accuracy = 8.0%

Epoch 2 Step 302768: lr = 0.022 words/sec = 48503

Eval 2445/17827 accuracy = 13.7%

Epoch 3 Step 454147: lr = 0.020 words/sec = 46666

Eval 3211/17827 accuracy = 18.0%

Epoch 4 Step 605540: lr = 0.018 words/sec = 53928

Eval 3608/17827 accuracy = 20.2%

Epoch 5 Step 756907: lr = 0.017 words/sec = 81255

Eval 4081/17827 accuracy = 22.9%

Epoch 6 Step 908251: lr = 0.015 words/sec = 46954
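The "Eval ... accuracy" lines above report how many analogy questions (a : b :: c : ?) the model answers correctly. The standard prediction rule, which this sketch assumes, picks the vocabulary word whose embedding is nearest by cosine similarity to b - a + c, excluding the three query words. Toy embeddings stand in for the real ones, which would come from the checkpoint saved under --save_path:

```python
import numpy as np

def predict_analogy(emb, word_to_id, a, b, c):
    """Answer a : b :: c : ? by maximizing cos(emb[d], emb[b] - emb[a] + emb[c])."""
    # Normalize rows so dot products are cosine similarities.
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    ia, ib, ic = word_to_id[a], word_to_id[b], word_to_id[c]
    target = norm[ib] - norm[ia] + norm[ic]
    scores = norm @ target
    scores[[ia, ib, ic]] = -np.inf  # never answer with a query word
    best = int(np.argmax(scores))
    return list(word_to_id)[best]

# Toy embeddings constructed so that king - man + woman ≈ queen.
word_to_id = {"man": 0, "woman": 1, "king": 2, "queen": 3}
emb = np.array([
    [1.0, 0.0, 0.0],   # man
    [0.0, 1.0, 0.0],   # woman
    [1.0, 0.0, 1.0],   # king
    [0.0, 1.0, 1.0],   # queen
])
print(predict_analogy(emb, word_to_id, "man", "king", "woman"))  # queen
```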
