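PredictionIO is an open-source machine learning server written in Scala that makes it easy to build a recommendation engine and serve it through a RESTful API. Its core is a scalable machine learning library built on Spark, and it provides a complete end-to-end pipeline that lets users build a recommender system from scratch with very little effort.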
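PredictionIO consists of three components: the PredictionIO platform (the machine learning stack for building, evaluating, and deploying engines), the Event Server (which collects event data from your applications), and the Engine Template Gallery (ready-made engine templates to start from).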
The project provides a quick one-line installer; you can of course also install it manually.
$ bash -c "$(curl -s https://install.prediction.io/install.sh)"
$ PATH=$PATH:/home/yourname/PredictionIO/bin; export PATH
Run the following command to check whether the installation succeeded. It reports the connection status of each component:
$ pio status
### Return:
[INFO] [Console$] Inspecting PredictionIO...
[INFO] [Console$] PredictionIO 0.9.6 is installed at ...
[INFO] [Console$] Inspecting Apache Spark...
[INFO] [Console$] Apache Spark is installed at ...
[INFO] [Console$] Apache Spark 1.6.0 detected ...
[INFO] [Console$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: MYSQL)...
[INFO] [Storage$] Verifying Model Data Backend (Source: MYSQL)...
[INFO] [Storage$] Verifying Event Data Backend (Source: MYSQL)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [Console$] (sleeping 5 seconds for all messages to show up...)
[INFO] [Console$] Your system is all ready to go.
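If you script the setup, the same check can be automated. The following is a minimal Python sketch, assuming that a healthy install ends with the "Your system is all ready to go." line shown above and that failed checks are logged with an [ERROR] tag. It also verifies that pio can be found on PATH, which is what the export PATH step of the installation sets up. The helper names status_is_ready and run_pio_status are hypothetical and not part of PredictionIO.

```python
import shutil
import subprocess

# Final line printed by `pio status` when every check passes (taken from the output above).
READY_LINE = "Your system is all ready to go."


def status_is_ready(output: str) -> bool:
    """Return True if the output reports readiness and contains no [ERROR] lines."""
    lines = output.splitlines()
    if any("[ERROR]" in line for line in lines):
        return False
    return any(READY_LINE in line for line in lines)


def run_pio_status() -> bool:
    """Run `pio status` and report whether PredictionIO is ready to use."""
    pio = shutil.which("pio")
    if pio is None:
        # Usually means the PredictionIO bin directory has not been added to PATH.
        raise FileNotFoundError("pio not found on PATH")
    result = subprocess.run([pio, "status"], capture_output=True, text=True)
    # Log lines may be written to either stream, so both are checked.
    return result.returncode == 0 and status_is_ready(result.stdout + "\n" + result.stderr)
```

Keeping the text check in a separate function makes it easy to test without a PredictionIO installation.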
First start the PredictionIO services. The command differs depending on which storage backend you use:
# If you are using PostgreSQL or MySQL, run the following to start the PredictionIO Event Server
$ pio eventserver &
# or, if you are running HBase and Elasticsearch instead, run the following to start the PredictionIO Event Server, HBase, and Elasticsearch
$ pio-start-all
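Before importing data it can help to confirm that the Event Server is actually up. A minimal Python sketch follows. It assumes the Event Server listens on its default address http://localhost:7070 and answers a GET on its root URL with the JSON body {"status": "alive"}; neither detail appears in this article, so adjust them to your setup. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: default Event Server address; change it if you started the server elsewhere.
EVENT_SERVER_URL = "http://localhost:7070"


def parse_status(body: str) -> bool:
    """Return True if a status response body is the JSON object {"status": "alive"}."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and data.get("status") == "alive"


def event_server_alive(base_url: str = EVENT_SERVER_URL, timeout: float = 2.0) -> bool:
    """GET the Event Server root URL; any connection failure counts as not alive."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return parse_status(resp.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are all OSError subclasses.
        return False
```

Treating every network error as "not alive" keeps the helper usable in a simple retry loop while the server starts up.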
Next, pick a suitable engine from the Engine Templates and download it:
$ pio template get <template-repo-path> <your-app-directory>
$ cd MyRecommendation
You can choose an engine from the Engine Templates or write your own; here we use the Universal Recommender as the example.
Run the following command to create an app for the engine and obtain its access key:
$ pio app new MyRecommendation
### Return:
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyRecommendation
[INFO] [App$] ID: 1
[INFO] [App$] Access Key: ...
$ pio app list
### Return:
[INFO] [App$] Name | ID | Access Key | Allowed Event(s)
[INFO] [App$] MyRecommendation | 1 | ... | (all)
[INFO] [App$] Finished listing 1 app(s).
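The access key is needed again in the import step. If you want to pick it up from a script, a minimal Python sketch can parse the table printed by pio app list. It assumes the output format shown above (a "[INFO] [App$] " prefix and four "|"-separated columns), which may differ between PredictionIO versions. parse_app_list and access_key_for are hypothetical helpers.

```python
from typing import Dict, List


def parse_app_list(output: str) -> List[Dict[str, str]]:
    """Turn the table printed by `pio app list` into a list of dicts."""
    apps = []
    for line in output.splitlines():
        # Drop the "[INFO] [App$] " log prefix if present.
        text = line.split("] ", 2)[-1] if line.startswith("[") else line
        cells = [cell.strip() for cell in text.split("|")]
        # Keep only four-column rows and skip the header row.
        if len(cells) != 4 or cells[0] == "Name":
            continue
        name, app_id, access_key, allowed_events = cells
        apps.append({
            "name": name,
            "id": app_id,
            "access_key": access_key,
            "allowed_events": allowed_events,
        })
    return apps


def access_key_for(output: str, app_name: str) -> str:
    """Return the access key of the named app, or raise KeyError if it is not listed."""
    for app in parse_app_list(output):
        if app["name"] == app_name:
            return app["access_key"]
    raise KeyError(app_name)
```

The output text can come from subprocess.run(["pio", "app", "list"], capture_output=True, text=True); depending on how logging is configured, the table may appear on stdout or stderr.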
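Next, import the data. The most basic recommendation algorithm, collaborative filtering (CF), works with data made of three elements:
- user
- action
- item
The script data/import_eventserver.py imports data in this format into the database: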
$ curl <sample_data> --create-dirs -o data/<sample_data>
$ python data/import_eventserver.py --access_key <access-key>
The sample data looks like this:
...
0::2::3
0::3::1
3::9::4
6::9::1
...
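To see roughly what an import involves, here is a minimal Python sketch that turns sample lines into Event Server events and builds the HTTP requests for them without sending anything. It rests on several assumptions: each line is read as user::item::rating, every line is recorded as a "rate" event from a user to an item with the rating stored in properties, and events are POSTed as JSON to events.json on the Event Server at http://localhost:7070 with the access key as a query parameter. This is an illustration, not the contents of import_eventserver.py, and parse_ratings and event_request are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Iterator


def parse_ratings(lines: Iterable[str]) -> Iterator[Dict]:
    """Turn "user::item::rating" lines into event objects (field meaning is assumed)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating = line.split("::")
        yield {
            "event": "rate",  # assumed event name
            "entityType": "user",
            "entityId": user,
            "targetEntityType": "item",
            "targetEntityId": item,
            "properties": {"rating": float(rating)},
        }


def event_request(event: Dict, access_key: str,
                  base_url: str = "http://localhost:7070") -> urllib.request.Request:
    """Build (but do not send) a JSON POST of one event to the Event Server."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        f"{base_url}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each request could then be sent with urllib.request.urlopen(event_request(event, access_key)). Separating parsing from request building keeps both parts testable offline.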
Before deploying the application, set the basic parameters in engine.json, such as appName or how many iterations the algorithm should run:
...
"datasource": {
  "params" : {
    "appName": "MyRecommendation"
    # make sure the appName parameter matches your App Name
  }
},
...
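A mismatched or unquoted appName is an easy mistake to make here. A minimal Python sketch that checks it before building, assuming the datasource.params.appName layout of the fragment above; check_app_name is a hypothetical helper. Note that Python's json module accepts no comments, so the # comment in the fragment above must not be in the file you check.

```python
import json


def check_app_name(engine_json: str, expected: str) -> None:
    """Raise ValueError unless datasource.params.appName equals expected.

    json.loads raises json.JSONDecodeError (a ValueError subclass) if the text
    is not valid JSON, e.g. when the appName value is left unquoted.
    """
    config = json.loads(engine_json)
    actual = config.get("datasource", {}).get("params", {}).get("appName")
    if actual != expected:
        raise ValueError(f"engine.json appName is {actual!r}, expected {expected!r}")
```

Usage: check_app_name(open("engine.json").read(), "MyRecommendation") returns silently when the name matches the app created earlier.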
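Deploying the engine as a web service takes three steps: pio build -> pio train -> pio deploy.
- Building prepares the basic Spark environment and the data.
- Training runs the algorithm to build the model.
- Deployment runs the result as a web service and exposes it through a RESTful API.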
$ pio build
### Return:
[INFO] [Console$] Your engine is ready for training.
$ pio train
### Return:
[INFO] [CoreWorkflow$] Training completed successfully.
$ pio deploy
### Return:
[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.
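Because each step depends on the previous one, a script should stop as soon as build or train fails. A minimal Python sketch of that ordering, assuming it is run from the engine directory with pio on PATH. Since pio deploy keeps serving queries until it is stopped, it is started in the background rather than waited on. build_train_deploy is a hypothetical helper; the commands are parameters so they can be swapped out.

```python
import subprocess
from typing import Sequence

BUILD = ["pio", "build"]
TRAIN = ["pio", "train"]
DEPLOY = ["pio", "deploy"]


def build_train_deploy(engine_dir: str,
                       build: Sequence[str] = BUILD,
                       train: Sequence[str] = TRAIN,
                       deploy: Sequence[str] = DEPLOY) -> subprocess.Popen:
    """Run build and train to completion, then start deploy in the background.

    Raises subprocess.CalledProcessError if build or train exits non-zero,
    so a failed step stops the pipeline before anything is deployed.
    """
    for cmd in (build, train):
        subprocess.run(list(cmd), cwd=engine_dir, check=True)
    # The deploy process keeps running; the caller decides when to terminate it.
    return subprocess.Popen(list(deploy), cwd=engine_dir)
```

Usage: server = build_train_deploy("MyRecommendation"), and later server.terminate() to stop the engine.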
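Now the engine can be queried. By default it listens on port 8000, and the query takes the user (user) and the number of items to recommend (num):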
$ curl -H "Content-Type: application/json" \
-d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json
### Return:
{
"itemScores":[
{"item":"22","score":4.072304374729956},
{"item":"62","score":4.058482414005789},
{"item":"75","score":4.046063009943821},
{"item":"68","score":3.8153661512945325}
]
}
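The same query can be sent from application code. A minimal Python sketch using only the standard library, assuming the engine is deployed at http://localhost:8000 and returns the itemScores structure shown above; build_query, parse_item_scores and recommend are hypothetical helper names.

```python
import json
import urllib.request
from typing import List, Tuple


def build_query(user: str, num: int,
                base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same JSON POST as the curl call above."""
    body = json.dumps({"user": user, "num": num}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/queries.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_item_scores(response_text: str) -> List[Tuple[str, float]]:
    """Extract (item, score) pairs from an itemScores response."""
    result = json.loads(response_text)
    return [(entry["item"], float(entry["score"])) for entry in result.get("itemScores", [])]


def recommend(user: str, num: int,
              base_url: str = "http://localhost:8000") -> List[Tuple[str, float]]:
    """Send the query to the deployed engine and return its recommendations."""
    with urllib.request.urlopen(build_query(user, num, base_url), timeout=10) as resp:
        return parse_item_scores(resp.read().decode("utf-8"))
```

With the engine running, recommend("1", 4) returns the same four items as the curl example, as (item, score) tuples.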
This work by Chang Wei-Yaun (v123582) is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.