Project:
crab, a recommender system that is still under development
http://www.oschina.net/p/crab
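Installation:
(prefer easy_install)
* numpy
Q: The build fails because vcvarsall.bat is missing
A: If a vs20xx release is already installed, set an environment variable so that the VS 2008 tools variable the build looks for points at the installed version (VS100COMNTOOLS belongs to VS 2010; adjust it to your version):
SET VS90COMNTOOLS=%VS100COMNTOOLS%
Q: Various compilation errors
A: Use the prebuilt Windows installer instead, and make sure it matches your Python version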
http://sourceforge.net/projects/numpy/files/NumPy/
* Scipy
Use the prebuilt Windows installer, and make sure it matches your Python version
http://sourceforge.net/projects/scipy/files/Scipy
* scikits.learn
Depends on numpy
* matplotlib
Needed only to build the documentation and some of the example code
It has many dependencies and can be skipped
* crab
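Installing with easy_install works poorly: afterwards, import scikits.crab fails because the module cannot be found
Instead, download the source package from https://github.com/muricoca/crab and install it with python setup.py install
(the same happens on linux)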
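Before running anything, it can help to confirm which of the packages above actually import. The following is a minimal sketch using only the standard library; `check_modules` is a hypothetical helper name, and the import name `scikits.learn` is assumed from the package name listed above.

```python
import importlib


def check_modules(names):
    """Return {module_name: bool}: True if the module imports, False otherwise."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # A missing module raises ImportError; a broken build may raise
            # something else, and both count as "not usable" here.
            status[name] = False
    return status


if __name__ == "__main__":
    # Names follow the installation list above; matplotlib is optional.
    wanted = ["numpy", "scipy", "scikits.learn", "matplotlib", "scikits.crab"]
    for name, ok in sorted(check_modules(wanted).items()):
        print("%-15s %s" % (name, "OK" if ok else "MISSING"))
```

If `scikits.crab` shows as MISSING after an easy_install run, that matches the problem described above, and the source install is the next thing to try.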
----------------------------------
* Test
Save the following as a file and run it directly with python
#!/usr/bin/env python
# coding=utf-8


def base_demo():
    # Base data: the bundled sample movie ratings
    from scikits.crab import datasets
    movies = datasets.load_sample_movies()
    # print(movies.data)
    # print(movies.user_ids)
    # print(movies.item_ids)

    # Build the model
    from scikits.crab.models import MatrixPreferenceDataModel
    model = MatrixPreferenceDataModel(movies.data)

    # Build the similarity, using pearson_correlation as the metric
    from scikits.crab.metrics import pearson_correlation
    from scikits.crab.similarities import UserSimilarity
    similarity = UserSimilarity(model, pearson_correlation)

    # User-based recommendation
    from scikits.crab.recommenders.knn import UserBasedRecommender
    recommender = UserBasedRecommender(model, similarity, with_preference=True)
    # Print the result: recommended items for user 5 (Toby)
    print(recommender.recommend(5))

    # Item-based recommendation (same base data, different perspective).
    # An item-based recommender compares items, so build an ItemSimilarity
    # here instead of reusing the UserSimilarity above.
    from scikits.crab.similarities.basic_similarities import ItemSimilarity
    from scikits.crab.recommenders.knn import ItemBasedRecommender
    item_similarity = ItemSimilarity(model, pearson_correlation)
    recommender = ItemBasedRecommender(model, item_similarity, with_preference=True)
    # Print the result: recommended items for user 5 (Toby)
    print(recommender.recommend(5))


def itembase_demo():
    from scikits.crab.models.classes import MatrixPreferenceDataModel
    from scikits.crab.recommenders.knn.classes import ItemBasedRecommender
    from scikits.crab.similarities.basic_similarities import ItemSimilarity
    from scikits.crab.recommenders.knn.item_strategies import ItemsNeighborhoodStrategy
    from scikits.crab.metrics.pairwise import euclidean_distances

    # Hand-written ratings: {user: {movie: rating}}
    movies = {
        'Marcel Caraciolo': {
            'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
            'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
        'Paola Pow': {
            'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
            'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
        'Leopoldo Pires': {
            'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
            'Superman Returns': 3.5, 'The Night Listener': 4.0},
        'Lorena Abreu': {
            'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'The Night Listener': 4.5,
            'Superman Returns': 4.0, 'You, Me and Dupree': 2.5},
        'Steve Gates': {
            'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'Just My Luck': 2.0,
            'Superman Returns': 3.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
        'Sheldom': {
            'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'The Night Listener': 3.0,
            'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
        'Penny Frewman': {
            'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0},
        'Maria Gabriela': {},
    }

    model = MatrixPreferenceDataModel(movies)
    items_strategy = ItemsNeighborhoodStrategy()
    similarity = ItemSimilarity(model, euclidean_distances)
    recsys = ItemBasedRecommender(model, similarity, items_strategy)

    print(recsys.most_similar_items('Lady in the Water'))
    # Return the recommendations for the given user.
    print(recsys.recommend('Leopoldo Pires'))
    # Return the 2 explanations for the given recommendation.
    print(recsys.recommended_because('Leopoldo Pires', 'Just My Luck', 2))
    # Return the items most similar to the given one.
    print(recsys.most_similar_items('Lady in the Water'))
    # Estimate the rating this user would give this movie.
    print(recsys.estimate_preference('Leopoldo Pires', 'Lady in the Water'))


base_demo()
itembase_demo()
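The demo selects pearson_correlation as its similarity metric without showing what it measures. As a rough illustration, here is a pure-Python sketch of the Pearson correlation between two users, computed over the movies both have rated. The function name `pearson` and the handling of missing ratings are assumptions made for this sketch; crab's own implementation may treat these cases differently.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over items rated by both users; 0.0 when undefined."""
    shared = [item for item in ratings_a if item in ratings_b]
    n = len(shared)
    if n < 2:
        return 0.0  # not enough common items to correlate
    xs = [ratings_a[item] for item in shared]
    ys = [ratings_b[item] for item in shared]
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # one user gave identical ratings, correlation is undefined
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Two users copied from the hand-written ratings in the demo above.
    sheldom = {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
               'The Night Listener': 3.0, 'Superman Returns': 5.0,
               'You, Me and Dupree': 3.5}
    penny = {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0}
    print(pearson(sheldom, penny))
```

A value near 1 means the two users rank their common movies similarly, and a value near -1 means they rank them in opposite order.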
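Recommendation algorithms:
This section does not go into the algorithms themselves; it only introduces the concepts, to make crab's implementation easier to follow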
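* kNN algorithm
A simple classification algorithm: find the k records in the training set that are closest to the new data point, then assign the new point the class that is most common among them.
Three main factors: the training set, the distance or similarity measure, and the value of k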
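The three factors map directly onto the arguments of a small classifier. The sketch below is a generic kNN in pure Python, not crab's code; the function names and the toy data points are invented for illustration.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training_set, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training records.

    training_set is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training_set, key=lambda rec: distance(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy training set (made up): two well-separated groups.
    training = [
        ((1.0, 1.0), 'a'), ((1.2, 0.8), 'a'), ((0.9, 1.1), 'a'),
        ((5.0, 5.0), 'b'), ((5.2, 4.9), 'b'), ((4.8, 5.1), 'b'),
    ]
    print(knn_classify(training, (1.1, 1.0), k=3))
```

Swapping `distance` for another measure, or changing `k`, changes the neighbourhood and therefore the result, which is why those two choices matter as much as the training data. A kNN recommender applies the same idea with users or items as the records.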
* SVD
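Models ratings through hidden (latent) factors: from the existing ratings, it works out how much each rater likes each factor and how strongly each movie exhibits each factor, then uses those results in reverse to predict ratings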
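To make the factor idea concrete, here is a minimal sketch that learns per-user and per-movie factor vectors from known ratings and multiplies them to predict a missing one. It uses stochastic-gradient matrix factorization (often called Funk SVD in recommender discussions) rather than a literal singular value decomposition, and it is not taken from crab. All function names and hyperparameter values are assumptions; the ratings are a subset of the demo data above.

```python
import random


def predict(p, q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(p[user], q[item]))


def factorize(ratings, n_factors=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn factor vectors from {user: {item: rating}} by gradient descent."""
    rng = random.Random(seed)
    users = sorted(ratings)
    items = sorted({item for prefs in ratings.values() for item in prefs})
    p = {u: [rng.random() for _ in range(n_factors)] for u in users}
    q = {i: [rng.random() for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u in users:
            for i, r in ratings[u].items():
                err = r - predict(p, q, u, i)
                for f in range(n_factors):
                    pu, qi = p[u][f], q[i][f]
                    # Move both factors to reduce the error, with L2 shrinkage.
                    p[u][f] += lr * (err * qi - reg * pu)
                    q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def rmse(ratings, p, q):
    """Root-mean-square error over the known ratings."""
    errs = [(r - predict(p, q, u, i)) ** 2
            for u, prefs in ratings.items() for i, r in prefs.items()]
    return (sum(errs) / float(len(errs))) ** 0.5


# Subset of the ratings used in the demo above.
ratings = {
    'Leopoldo Pires': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                       'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Sheldom': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                'The Night Listener': 3.0, 'Superman Returns': 5.0,
                'You, Me and Dupree': 3.5},
    'Penny Frewman': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
                      'Superman Returns': 4.0},
}

if __name__ == "__main__":
    p, q = factorize(ratings)
    print(rmse(ratings, p, q))
    # Penny Frewman has not rated this movie; the factors give an estimate.
    print(predict(p, q, 'Penny Frewman', 'Lady in the Water'))
```

With only two factors and a handful of ratings the estimate is rough, but the structure is the one described above: one vector per rater, one vector per movie, and a prediction formed from their product.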