https://blog.csdn.net/LuYi_WeiLin/article/details/88019102 (reposted from WeChat)
Scorecards can be divided into application scorecards (A-card), behavior scorecards (B-card), and collection scorecards (C-card). They are used in different scenarios: the A-card is used at the pre-loan application stage to separate good applicants from bad ones; the B-card is used during the loan term, using observed behavior to predict the probability of delinquency/default over a future time window; the C-card is mainly used in the post-loan stage. This post summarizes the knowledge points related to the C-card.
Introduction to collections work
To understand collection scorecards, we first need to look at the credit customer management cycle.
Delinquency should not be treated as uniformly bad: the penalty interest charged on overdue accounts can itself be a source of revenue for the lender. We therefore need to segment delinquent customers, because different customer categories call for different collection tactics.
The risk levels rank as follows: Category 4 > Category 3 > Category 2 > Category 1.
催收的流程大體能夠分爲如下幾步:函數
Collection scorecards differ from application and behavior scorecards. An application or behavior scorecard usually needs only a single model, whereas a collection scorecard is built from three models, each serving a different purpose; among them, the lost-contact prediction model is particularly important.
The purpose of each of these three models, and the data features they commonly use, are introduced below:
Repayment-rate (recovery-rate) model
Roll-rate (account ageing roll) model
Lost-contact prediction model
This post walks through the repayment-rate model. Before doing so, we first need to look at a Python implementation of the random forest model.
A random forest built on regression trees: the ensemble consists of many regression trees (the base estimators); each base estimator runs in parallel and produces its own prediction, and the average of all base estimators' predictions is taken as the final prediction.
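As a quick illustration of this averaging behaviour (not part of the original post), the following minimal sketch fits scikit-learn's RandomForestRegressor on toy data and checks that the forest's prediction equals the mean of the predictions of its individual regression trees, which are exposed via estimators_:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy regression data, only to illustrate how the forest aggregates its trees
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(X, y)

# Each base estimator (a regression tree) produces its own prediction ...
tree_preds = np.stack([tree.predict(X) for tree in rf.estimators_])
# ... and the forest's prediction is the average over all trees
forest_pred = rf.predict(X)

print(np.allclose(tree_preds.mean(axis=0), forest_pred))  # True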
The code is given below; the dataset can be downloaded from my resources. The repayment-rate model can also be extended afterwards: pick a threshold on the predicted recovery rate (80% here, but you can choose your own), treat accounts above it as recoverable and those below as not recoverable, and then use a binary logistic regression to predict whether an account is recoverable (a sketch of this extension follows the main script):
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def MakeupMissingCategorical(x):
    # Fill missing categorical values with 'Unknown'
    if str(x) == 'nan':
        return 'Unknown'
    else:
        return x


def MakeupMissingNumerical(x, replacement):
    # Fill missing numerical values with the given replacement
    if np.isnan(x):
        return replacement
    else:
        return x


'''
Step 1: prepare the data file
'''
foldOfData = 'H:/'
mydata = pd.read_csv(foldOfData + "還款率模型.csv", header=0, engine='python')
# Recovery rate = amount recovered / (delinquent principal & interest + collection fees).
# Collection fees are recorded as expenses (negative values), hence the subtraction below.
mydata['rec_rate'] = mydata.apply(
    lambda x: x.LP_NonPrincipalRecoverypayments / (x.AmountDelinquent - x.LP_CollectionFees), axis=1)
# If the recovery rate exceeds 1, cap it at 1
mydata['rec_rate'] = mydata['rec_rate'].map(lambda x: min(x, 1))
# Split the development data into a training set and a test set
trainData, testData = train_test_split(mydata, test_size=0.4)

'''
Step 2: data preprocessing
'''
# Since there is no data dictionary, only some of the fields are categorized here
categoricalFeatures = ['CreditGrade', 'Term', 'BorrowerState', 'Occupation', 'EmploymentStatus',
                       'IsBorrowerHomeowner', 'CurrentlyInGroup', 'IncomeVerifiable']
numFeatures = ['BorrowerAPR', 'BorrowerRate', 'LenderYield', 'ProsperRating (numeric)', 'ProsperScore',
               'ListingCategory (numeric)', 'EmploymentStatusDuration', 'CurrentCreditLines',
               'OpenCreditLines', 'TotalCreditLinespast7years', 'CreditScoreRangeLower',
               'OpenRevolvingAccounts', 'OpenRevolvingMonthlyPayment', 'InquiriesLast6Months',
               'TotalInquiries', 'CurrentDelinquencies', 'DelinquenciesLast7Years',
               'PublicRecordsLast10Years', 'PublicRecordsLast12Months', 'BankcardUtilization',
               'TradesNeverDelinquent (percentage)', 'TradesOpenedLast6Months', 'DebtToIncomeRatio',
               'LoanFirstDefaultedCycleNumber', 'LoanMonthsSinceOrigination', 'PercentFunded',
               'Recommendations', 'InvestmentFromFriendsCount', 'Investors']

'''
Categorical variables are encoded with the mean of the target variable (mean/target encoding)
'''
encodedFeatures = []
encodedDict = {}
for var in categoricalFeatures:
    trainData[var] = trainData[var].map(MakeupMissingCategorical)
    avgTarget = trainData.groupby([var])['rec_rate'].mean()
    avgTarget = avgTarget.to_dict()
    newVar = var + '_encoded'
    trainData[newVar] = trainData[var].map(avgTarget)
    encodedFeatures.append(newVar)
    encodedDict[var] = avgTarget

# Fill missing values of the numerical features
trainData['ProsperRating (numeric)'] = trainData['ProsperRating (numeric)'].map(lambda x: MakeupMissingNumerical(x, 0))
trainData['ProsperScore'] = trainData['ProsperScore'].map(lambda x: MakeupMissingNumerical(x, 0))
avgDebtToIncomeRatio = np.mean(trainData['DebtToIncomeRatio'])
trainData['DebtToIncomeRatio'] = trainData['DebtToIncomeRatio'].map(
    lambda x: MakeupMissingNumerical(x, avgDebtToIncomeRatio))
numFeatures2 = numFeatures + encodedFeatures

'''
Step 3: hyperparameter tuning
For a CART-based random forest, the main hyperparameters are:
1. the number of trees
2. the maximum tree depth
3. the minimum samples to split an internal node and the minimum samples per leaf
4. the number of features considered at each split
The scoring function used during tuning is the (negative) mean squared error, with 5-fold cross-validation.
'''
X, y = trainData[numFeatures2], trainData['rec_rate']

# 1. number of trees
param_test1 = {'n_estimators': range(60, 91, 5)}
gsearch1 = GridSearchCV(
    estimator=RandomForestRegressor(min_samples_split=50, min_samples_leaf=10, max_depth=8,
                                    max_features='sqrt', random_state=10),
    param_grid=param_test1, scoring='neg_mean_squared_error', cv=5)
gsearch1.fit(X, y)
print(gsearch1.best_params_, gsearch1.best_score_)
best_n_estimators = gsearch1.best_params_['n_estimators']

# 2. maximum depth and minimum samples to split an internal node
param_test2 = {'max_depth': range(3, 15), 'min_samples_split': range(10, 101, 10)}
gsearch2 = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=best_n_estimators, min_samples_leaf=10,
                                    max_features='sqrt', random_state=10, oob_score=True),
    param_grid=param_test2, scoring='neg_mean_squared_error', cv=5)
gsearch2.fit(X, y)
print(gsearch2.best_params_, gsearch2.best_score_)
best_max_depth = gsearch2.best_params_['max_depth']
best_min_samples_split = gsearch2.best_params_['min_samples_split']

# 3. minimum samples per leaf
param_test3 = {'min_samples_leaf': range(1, 20, 2)}
gsearch3 = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=best_n_estimators, max_depth=best_max_depth,
                                    max_features='sqrt', min_samples_split=best_min_samples_split,
                                    random_state=10, oob_score=True),
    param_grid=param_test3, scoring='neg_mean_squared_error', cv=5)
gsearch3.fit(X, y)
print(gsearch3.best_params_, gsearch3.best_score_)
best_min_samples_leaf = gsearch3.best_params_['min_samples_leaf']

# 4. number of features considered at each split
numOfFeatures = len(numFeatures2)
mostSelectedFeatures = numOfFeatures // 2  # not used below
param_test4 = {'max_features': range(3, numOfFeatures + 1)}
gsearch4 = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=best_n_estimators, max_depth=best_max_depth,
                                    min_samples_leaf=best_min_samples_leaf,
                                    min_samples_split=best_min_samples_split,
                                    random_state=10, oob_score=True),
    param_grid=param_test4, scoring='neg_mean_squared_error', cv=5)
gsearch4.fit(X, y)
print(gsearch4.best_params_, gsearch4.best_score_)
best_max_features = gsearch4.best_params_['max_features']

# Fit the random forest with all of the tuned parameters
cls = RandomForestRegressor(n_estimators=best_n_estimators, max_depth=best_max_depth,
                            min_samples_leaf=best_min_samples_leaf,
                            min_samples_split=best_min_samples_split,
                            max_features=best_max_features, random_state=10, oob_score=True)
cls.fit(X, y)
trainData['pred'] = cls.predict(trainData[numFeatures2])
# Share of accounts whose predicted recovery rate exceeds the actual one
trainData['less_rr'] = trainData.apply(lambda x: int(x.pred > x.rec_rate), axis=1)
print(np.mean(trainData['less_rr']))
# Mean absolute error on the training set
err = trainData.apply(lambda x: np.abs(x.pred - x.rec_rate), axis=1)
print(np.mean(err))

# Variable importance as estimated by the random forest
importance = cls.feature_importances_
featureImportance = dict(zip(numFeatures2, importance))
featureImportance = sorted(featureImportance.items(), key=lambda x: x[1], reverse=True)

'''
Step 4: evaluate the model on the test set
'''
# Categorical features: apply the encodings learned on the training set
for var in categoricalFeatures:
    testData[var] = testData[var].map(MakeupMissingCategorical)
    newVar = var + '_encoded'
    testData[newVar] = testData[var].map(encodedDict[var])
    avgnewVar = np.mean(trainData[newVar])
    testData[newVar] = testData[newVar].map(lambda x: MakeupMissingNumerical(x, avgnewVar))

# Numerical features: fill missing values with the training-set statistics
testData['ProsperRating (numeric)'] = testData['ProsperRating (numeric)'].map(lambda x: MakeupMissingNumerical(x, 0))
testData['ProsperScore'] = testData['ProsperScore'].map(lambda x: MakeupMissingNumerical(x, 0))
testData['DebtToIncomeRatio'] = testData['DebtToIncomeRatio'].map(
    lambda x: MakeupMissingNumerical(x, avgDebtToIncomeRatio))

testData['pred'] = cls.predict(testData[numFeatures2])
testData['less_rr'] = testData.apply(lambda x: int(x.pred > x.rec_rate), axis=1)
print(np.mean(testData['less_rr']))
err = testData.apply(lambda x: np.abs(x.pred - x.rec_rate), axis=1)
print(np.mean(err))
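As mentioned before the script, the repayment-rate model can be turned into a binary "recoverable vs. not recoverable" classifier by thresholding the recovery rate. The snippet below is only a rough sketch of that idea, not code from the original post: it reuses trainData, testData and numFeatures2 from the script above, assumes an 80% threshold, and fills any remaining missing values with 0 so the logistic regression can be fitted.

from sklearn.linear_model import LogisticRegression

# Label accounts by the chosen threshold (80% here; pick your own cut-off):
# 1 = recoverable, 0 = not recoverable
threshold = 0.8
trainData['recoverable'] = (trainData['rec_rate'] >= threshold).astype(int)
testData['recoverable'] = (testData['rec_rate'] >= threshold).astype(int)

# Fill any remaining missing values so the linear model can be fitted
X_train = trainData[numFeatures2].fillna(0)
X_test = testData[numFeatures2].fillna(0)

# A plain logistic regression on the same feature set used by the random forest
logit = LogisticRegression(max_iter=1000)
logit.fit(X_train, trainData['recoverable'])

# Predicted probability of being recoverable, plus a simple accuracy check
testData['prob_recoverable'] = logit.predict_proba(X_test)[:, 1]
print((logit.predict(X_test) == testData['recoverable']).mean())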