1. Collaborative Filtering
2. Finding Similar Users
Dataset
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                  'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
                     'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                         'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'The Night Listener': 4.5,
                     'Superman Returns': 4.0, 'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'Just My Luck': 2.0,
                     'Superman Returns': 3.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'The Night Listener': 3.0,
                      'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0},
}
Critic | Lady in the Water | Snakes on a Plane | Just My Luck | Superman Returns | You, Me and Dupree | The Night Listener |
Rose | 2.5 | 3.5 | 3.0 | 3.5 | 2.5 | 3.0 |
Seymour | 3.0 | 3.5 | 1.5 | 5.0 | 3.5 | 3.0 |
Phillips | 2.5 | 3.0 | - | 3.5 | - | 4.0 |
Puig | - | 3.5 | 3.0 | 4.0 | 2.5 | 4.5 |
LaSalle | 3.0 | 4.0 | 2.0 | 3.0 | 2.0 | 3.0 |
Matthews | 3.0 | 4.0 | - | 5.0 | 3.5 | 3.0 |
Toby | ? | 4.5 | ? | 4.0 | 1.0 | ? |
(- = not rated; ? = rating to be predicted for Toby)
Euclidean distance
>>> from math import sqrt
>>> sqrt(pow(x1 - x2, 2) + pow(y1 - y2, 2))
>>> 1 / (1 + sqrt(pow(x1 - x2, 2) + pow(y1 - y2, 2)))  # normalized to the range 0~1; 1 means identical ratings
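To make the formula concrete, here is a small worked example. It is only a sketch: it treats two films as the x and y axes and plugs in the ratings that Toby and LaSalle gave them in the dataset above. The variable names are chosen here for illustration.

```python
from math import sqrt

# Ratings taken from the critics data: (Snakes on a Plane, You, Me and Dupree)
toby = (4.5, 1.0)
lasalle = (4.0, 2.0)

# Straight-line distance between the two points in "preference space"
distance = sqrt(pow(toby[0] - lasalle[0], 2) + pow(toby[1] - lasalle[1], 2))

# Map the distance into (0, 1]: the closer the tastes, the nearer the score is to 1
score = 1 / (1 + distance)

print(round(distance, 4), round(score, 4))
```

The distance comes out at roughly 1.118 and the normalized score at roughly 0.472.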
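from math import sqrt

def sim_distance(prefs, person1, person2):
    # Find the movies that both person1 and person2 have rated
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1
    # No movies in common: similarity is 0
    if len(si) == 0:
        return 0
    # Sum the squared rating differences over the shared movies
    sum_of_squares = 0.0
    for item in prefs[person1]:
        if item in prefs[person2]:
            sum_of_squares += pow(prefs[person1][item] - prefs[person2][item], 2)
    # Normalize to 0~1, taking the square root as in the distance formula above
    return 1 / (1 + sqrt(sum_of_squares))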
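The distance formula shown earlier takes the square root of the summed squares, while a common variant of this function normalizes the sum of squares directly. The sketch below compares the two on Rose and Seymour, using their ratings from the dataset. The variable names are illustrative assumptions. Both scores fall as the sum of squares grows, so they rank critics in the same order and differ only in scale.

```python
from math import sqrt

# Ratings for the six films both critics rated, taken from the critics data
rose = {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
        'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0}
seymour = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
           'Superman Returns': 5.0, 'You, Me and Dupree': 3.5, 'The Night Listener': 3.0}

# Sum of squared rating differences over the films both have rated
shared = [movie for movie in rose if movie in seymour]
sum_of_squares = sum(pow(rose[m] - seymour[m], 2) for m in shared)

# Variant 1: true Euclidean distance (with square root), then normalize
with_root = 1 / (1 + sqrt(sum_of_squares))
# Variant 2: normalize the sum of squares directly (no square root)
without_root = 1 / (1 + sum_of_squares)

print(sum_of_squares, round(with_root, 4), round(without_root, 4))
```

For this pair the sum of squares is 5.75, giving about 0.294 with the square root and about 0.148 without it.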
Pearson correlation
http://lobert.iteye.com/blog/2024999
def sim_pearson(prefs, p1, p2):
    # Find the movies that both p1 and p2 have rated
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1
    n = len(si)
    # No movies in common: treat the pair as uncorrelated
    if n == 0:
        return 0
    sum1 = 0.0
    sum2 = 0.0
    sum1Sq = 0.0
    sum2Sq = 0.0
    pSum = 0.0
    for it in si:
        # Sums of ratings, sums of squared ratings, and sum of products
        sum1 += prefs[p1][it]
        sum2 += prefs[p2][it]
        sum1Sq += pow(prefs[p1][it], 2)
        sum2Sq += pow(prefs[p2][it], 2)
        pSum += prefs[p1][it] * prefs[p2][it]
    # Pearson score: covariance term over the product of the spread terms
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    if den == 0:
        return 0
    return num / den
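A similarity function like this is typically used to rank the other critics against one person. The sketch below is an illustration only: `pearson` is a compact restatement of the same formula, and `top_matches` is a hypothetical helper that does not appear in the original post. To keep it short, the ratings are restricted to the three films Toby rated, which is all the Pearson score considers when comparing anyone against Toby.

```python
from math import sqrt

# Subset of the critics data: only the films Toby has rated
ratings = {
    'Lisa Rose': {'Snakes on a Plane': 3.5, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5},
    'Gene Seymour': {'Snakes on a Plane': 3.5, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Michael Phillips': {'Snakes on a Plane': 3.0, 'Superman Returns': 3.5},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Snakes on a Plane': 4.0, 'Superman Returns': 3.0, 'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Snakes on a Plane': 4.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 1.0},
}

def pearson(prefs, p1, p2):
    # Ratings of the films both people have rated, in matching order
    shared = [m for m in prefs[p1] if m in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0
    xs = [prefs[p1][m] for m in shared]
    ys = [prefs[p2][m] for m in shared]
    num = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    den = sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) *
               (sum(y * y for y in ys) - sum(ys) ** 2 / n))
    return num / den if den != 0 else 0

def top_matches(prefs, person, n=3):
    # Score everyone except the person, then keep the n most similar
    scores = [(pearson(prefs, person, other), other) for other in prefs if other != person]
    scores.sort(reverse=True)
    return scores[:n]

matches = top_matches(ratings, 'Toby')
for score, name in matches:
    print(name, round(score, 2))
```

With this data the three closest critics to Toby are Lisa Rose, Mick LaSalle and Claudia Puig, with scores of about 0.99, 0.92 and 0.89.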
Recommending items
Recommendations for Toby:
Compute the similarity between Toby and every other user (using sim_distance or sim_pearson).
def getRecommendations(prefs, person, similarity=sim_pearson):
    totals = {}
    simSums = {}
    for other in prefs:
        # Do not compare the person with themselves
        if other == person:
            continue
        # Similarity between person and this other user
        sim = similarity(prefs, person, other)
        # Ignore users with zero or negative similarity
        if sim <= 0:
            continue
        for item in prefs[other]:
            # Only score movies the person has not seen yet
            if item not in prefs[person] or prefs[person][item] == 0:
                # Running sum of rating * similarity
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item] * sim
                # Running sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim
    # Weighted average: total divided by the sum of similarities
    rankings = [(total / simSums[item], item) for item, total in totals.items()]
    # Sort by predicted score, highest first
    rankings.sort()
    rankings.reverse()
    return rankings
Critic | Similarity | Night | sim * Night | Lady | sim * Lady | Luck | sim * Luck |
Rose | 0.99 | 3.0 | 0.99 * 3.0 | 2.5 | 0.99 * 2.5 | 3.0 | 0.99 * 3.0 |
Seymour | 0.38 | 3.0 | 0.38 * 3.0 | 3.0 | 0.38 * 3.0 | 1.5 | 0.38 * 1.5 |
Puig | 0.89 | 4.5 | 0.89 * 4.5 | - | - | 3.0 | 0.89 * 3.0 |
LaSalle | 0.92 | 3.0 | 0.92 * 3.0 | 3.0 | 0.92 * 3.0 | 2.0 | 0.92 * 2.0 |
Matthews | 0.66 | 3.0 | 0.66 * 3.0 | 3.0 | 0.66 * 3.0 | - | - |
Total | | | 12.89 | | 8.38 | | 8.07 |
Sum of similarities | | | 0.99+0.38+0.89+0.92+0.66=3.84 | | 0.99+0.38+0.92+0.66=2.95 | | 0.99+0.38+0.89+0.92=3.18 |
Total / Sum of similarities | | | 3.35 | | 2.83 | | 2.53 |
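The weighted-average step in the table can be reproduced in a few lines. This is a sketch that uses the similarity values rounded to two decimals, so the totals may differ from the table in the second decimal place. The short film names and the variable names are chosen here for illustration.

```python
# Similarity of each critic to Toby, rounded to two decimals
similarity = {'Rose': 0.99, 'Seymour': 0.38, 'Puig': 0.89, 'LaSalle': 0.92, 'Matthews': 0.66}

# Ratings each critic gave to the films Toby has not seen (a missing key = not rated)
ratings = {
    'Rose': {'Night': 3.0, 'Lady': 2.5, 'Luck': 3.0},
    'Seymour': {'Night': 3.0, 'Lady': 3.0, 'Luck': 1.5},
    'Puig': {'Night': 4.5, 'Luck': 3.0},
    'LaSalle': {'Night': 3.0, 'Lady': 3.0, 'Luck': 2.0},
    'Matthews': {'Night': 3.0, 'Lady': 3.0},
}

predicted = {}
for film in ('Night', 'Lady', 'Luck'):
    # Only critics who actually rated this film contribute to either sum
    raters = [critic for critic in ratings if film in ratings[critic]]
    total = sum(similarity[c] * ratings[c][film] for c in raters)
    sim_sum = sum(similarity[c] for c in raters)
    # Dividing by the similarity sum keeps films with more raters from being favored
    predicted[film] = total / sim_sum
    print(film, round(total, 2), round(sim_sum, 2), round(predicted[film], 2))
```

The predicted scores come out at about 3.35 for Night, 2.83 for Lady and 2.53 for Luck, so The Night Listener would be recommended to Toby first.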
3. Item-Based Filtering
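User-based collaborative filtering requires a dataset built from every user's complete set of ratings. This works well enough for a few thousand users or items, but on a large site with millions of customers and products, comparing one user against every other user, and then comparing every product each of those users has rated, may be unacceptably slow. Likewise, on a site that sells millions of different products, users' preferences may rarely overlap, which can make it very hard to judge how similar two users are.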
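With very large datasets, item-based collaborative filtering can produce better results, and it allows much of the computation to be carried out in advance, so that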