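1. Basics
Creating your own prediction algorithm is simple: an algorithm is nothing more than a class derived from AlgoBase that has an estimate method. This is the method called by predict(). It takes an inner user id and an inner item id, and returns the estimated rating: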
    from surprise import AlgoBase
    from surprise import Dataset
    from surprise.model_selection import cross_validate


    class MyOwnAlgorithm(AlgoBase):

        def __init__(self):
            # Always call base method before doing anything.
            AlgoBase.__init__(self)

        def estimate(self, u, i):
            # Optionally, return a dict with extra details about the
            # prediction alongside the estimate itself.
            details = {'info1': 'That was', 'info2': 'easy stuff :)'}
            return 3, details


    data = Dataset.load_builtin('ml-100k')
    algo = MyOwnAlgorithm()

    cross_validate(algo, data, verbose=True)
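The code above implements the simplest possible custom prediction algorithm: it predicts a rating of 3 for every user and item.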
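An estimate method may return either a bare number or an (estimate, details) pair. The library-free sketch below shows one way a caller could normalize both forms. The helper unpack_estimate is hypothetical and is not part of Surprise.

```python
def unpack_estimate(result):
    """Hypothetical helper: split an estimate() result into (est, details)."""
    if isinstance(result, tuple):
        # estimate() returned both the rating and a details dict.
        est, details = result
    else:
        # estimate() returned only the rating; use an empty details dict.
        est, details = result, {}
    return est, details


est, details = unpack_estimate((3, {'info1': 'That was'}))
print(est, details)

est_only, no_details = unpack_estimate(3.5)
print(est_only, no_details)
```

Normalizing the return value in one place keeps the rest of the calling code independent of which form a given algorithm chose.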
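2. The fit() method
Now let's make a slightly smarter algorithm that predicts the average of all the ratings in the trainset. Since this is a constant that does not depend on the current user or item, we would rather compute it once and for all. This can be done by defining the fit method: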
    import numpy as np
    from surprise import AlgoBase


    class MyOwnAlgorithm(AlgoBase):

        def __init__(self):
            # Always call base method before doing anything.
            AlgoBase.__init__(self)

        def fit(self, trainset):
            # Here again: call base method before doing anything.
            AlgoBase.fit(self, trainset)

            # Compute the average rating. We might as well use the
            # trainset.global_mean attribute ;)
            self.the_mean = np.mean([r for (_, _, r) in
                                     self.trainset.all_ratings()])

            return self

        def estimate(self, u, i):
            return self.the_mean
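The fit method is called, for example, by the cross_validate function at each fold of the cross-validation procedure (you can also call it yourself). Before doing anything else, you should call the base class fit() method.
Note that the fit() method returns self. This allows expressions such as algo.fit(trainset).test(testset).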
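The following library-free sketch illustrates why returning self matters. MeanPredictor is a hypothetical stand-in class, not a Surprise algorithm, and the ratings are made-up toy data.

```python
class MeanPredictor:
    """Toy stand-in for an algorithm; not part of Surprise."""

    def fit(self, ratings):
        # Compute the constant once, at training time.
        self.the_mean = sum(ratings) / len(ratings)
        # Returning self is what makes call chaining possible.
        return self

    def estimate(self, u, i):
        # The prediction ignores the user and the item entirely.
        return self.the_mean


# fit() hands back the fitted object, so another call can follow directly.
prediction = MeanPredictor().fit([4, 2, 3]).estimate(u=0, i=0)
print(prediction)
```

If fit() returned None instead, the chained call would fail with an AttributeError, and the caller would have to fit and predict in two separate statements.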
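3. The trainset attribute
Once the base class fit() method has returned, all the information you need about the current training set (rating values, etc.) is stored in the self.trainset attribute. This is a Trainset object with many attributes and methods that are useful for prediction.
To illustrate its usage, let's make an algorithm that predicts the average of three quantities: the mean of all ratings, the mean rating of the user, and the mean rating of the item: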
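    # Method of the algorithm class; np is numpy (import numpy as np).
    def estimate(self, u, i):

        # Start from the global mean, which is always available.
        sum_means = self.trainset.global_mean
        div = 1

        # Add the user's mean rating if the user is in the trainset.
        if self.trainset.knows_user(u):
            sum_means += np.mean([r for (_, r) in self.trainset.ur[u]])
            div += 1
        # Add the item's mean rating if the item is in the trainset.
        if self.trainset.knows_item(i):
            sum_means += np.mean([r for (_, r) in self.trainset.ir[i]])
            div += 1

        return sum_means / div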
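To make the arithmetic concrete, here is a library-free sketch of the same averaging rule on made-up toy data. The dictionaries user_ratings and item_ratings are assumptions standing in for trainset.ur and trainset.ir; they are not Surprise objects.

```python
def mean(values):
    return sum(values) / len(values)


# Made-up toy data: alice rated film_a 5 and film_b 3; bob rated film_a 1.
user_ratings = {'alice': [5, 3], 'bob': [1]}
item_ratings = {'film_a': [5, 1], 'film_b': [3]}
global_mean = mean([5, 3, 1])


def estimate(user, item):
    # Always start from the global mean.
    sum_means = global_mean
    div = 1
    # Include the user's mean only if the user was seen in training.
    if user in user_ratings:
        sum_means += mean(user_ratings[user])
        div += 1
    # Include the item's mean only if the item was seen in training.
    if item in item_ratings:
        sum_means += mean(item_ratings[item])
        div += 1
    return sum_means / div


print(estimate('alice', 'film_a'))  # averages three means
print(estimate('bob', 'film_c'))    # unknown item: averages two means
print(estimate('carol', 'film_c'))  # nothing known: global mean only
```

Because the divisor grows only when a term is actually added, the estimate degrades gracefully to the global mean for users and items that were never seen.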
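4. When prediction is impossible
It is up to your algorithm to decide whether it can produce a prediction. If a prediction is impossible, you can raise the PredictionImpossible exception. You need to import it first:
    from surprise import PredictionImpossible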
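This exception is caught by the predict() method, and the estimated rating is then set according to the default_prediction() method, which can be overridden. By default, it returns the mean of all ratings in the trainset.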
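The fallback behaviour described above can be modelled with a short library-free sketch. Both classes here are local stand-ins written for illustration: this is a simplified model of the control flow, not Surprise's own implementation, and the values are made up.

```python
class PredictionImpossible(Exception):
    """Local stand-in for surprise.PredictionImpossible."""


class ToyAlgo:
    """Simplified model of the fallback logic; not Surprise's own code."""

    global_mean = 3.5     # made-up value
    known_users = {0, 1}  # made-up inner user ids

    def estimate(self, u, i):
        if u not in self.known_users:
            raise PredictionImpossible('User is unknown.')
        return 4.0

    def default_prediction(self):
        # Overridable fallback; here, the mean of all training ratings.
        return self.global_mean

    def predict(self, u, i):
        # Returns (estimate, was_impossible).
        try:
            return self.estimate(u, i), False
        except PredictionImpossible:
            return self.default_prediction(), True


algo = ToyAlgo()
print(algo.predict(0, 7))   # known user: estimate() succeeds
print(algo.predict(99, 7))  # unknown user: falls back to the default
```

Overriding default_prediction() in a subclass changes the fallback value without touching predict() or estimate().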
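5. Similarities and baselines
If your algorithm uses a similarity measure or baseline estimates, you need to accept bsl_options and sim_options as parameters of the __init__ method and pass them on to the base class.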
    from surprise import AlgoBase
    from surprise import PredictionImpossible


    class MyOwnAlgorithm(AlgoBase):

        def __init__(self, sim_options={}, bsl_options={}):
            AlgoBase.__init__(self, sim_options=sim_options,
                              bsl_options=bsl_options)

        def fit(self, trainset):
            AlgoBase.fit(self, trainset)

            # Compute baselines and similarities
            self.bu, self.bi = self.compute_baselines()
            self.sim = self.compute_similarities()

            return self

        def estimate(self, u, i):
            if not (self.trainset.knows_user(u) and
                    self.trainset.knows_item(i)):
                raise PredictionImpossible('User and/or item is unknown.')

            # Compute similarities between u and v, where v describes all
            # other users that have also rated item i.
            neighbors = [(v, self.sim[u, v]) for (v, r) in self.trainset.ir[i]]
            # Sort these neighbors by similarity
            neighbors = sorted(neighbors, key=lambda x: x[1], reverse=True)

            print('The 3 nearest neighbors of user', str(u), 'are:')
            for v, sim_uv in neighbors[:3]:
                print('user {0:} with sim {1:1.2f}'.format(v, sim_uv))

            # ... Aaaaand return the baseline estimate anyway ;)
            # The source is cut off here; the return below assumes the
            # usual baseline form: global mean + user bias + item bias.
            return self.trainset.global_mean + self.bu[u] + self.bi[i]
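The neighbor-ranking step can be tried in isolation with a library-free sketch. The similarity matrix and the rating pairs are made-up toy data shaped like self.sim and trainset.ir[i], and nearest_neighbors is a hypothetical helper, not a Surprise function.

```python
# Made-up symmetric similarity matrix for 4 users (inner ids 0..3).
sim = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.4, 0.1],
    [0.9, 0.4, 1.0, 0.3],
    [0.5, 0.1, 0.3, 1.0],
]
# Made-up (user, rating) pairs for one item, shaped like trainset.ir[i].
item_ratings = [(1, 4.0), (2, 5.0), (3, 2.0)]


def nearest_neighbors(u, k):
    """Return the k users who rated the item and are most similar to u."""
    # Pair every rater v with its similarity to u.
    neighbors = [(v, sim[u][v]) for (v, _) in item_ratings]
    # Most similar first.
    neighbors.sort(key=lambda pair: pair[1], reverse=True)
    return neighbors[:k]


top2 = nearest_neighbors(u=0, k=2)
print(top2)
```

Only users who rated the item are considered, so the candidate list comes from the item's ratings rather than from the full similarity matrix.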