There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.
– Albert Einstein
Advanced Analytics Professional: An Unbiased Observer – by Roopam
I think the best way to appreciate and enjoy the trivial is to travel. By trivial I mean doorknobs, posters, letterboxes, graffiti and everything else we never bother to turn our heads for in our own city. I experienced this last week while traveling with my wife across Florence and Tuscany. I think one's level of awareness and curiosity goes up many-fold while traveling. In Florence, we stayed at a lovely bed-and-breakfast named Fiorenza. The breakfast was good and the people even better. There we met an amicable family from the UK with a one-year-old baby named Owen and his 7-year-old sister Kyra. Owen and Kyra were playing hide and seek over breakfast. Kyra hid behind the same chair repeatedly and jumped out to reveal herself to her younger brother. Owen was pleasantly surprised every single time. All humans are born curious. However, they lose that curiosity as they grow older and become familiar with things. This could be why we never turn our heads for the trivial in our own city.
Being curious and aware requires constant energy and effort. Perhaps humans have a natural tendency to slip into a low-energy state. This is particularly dangerous for analysts, since their job requires finding meaning in something that seems mundane to others. In my opinion, the biggest challenge for analytics is not the sophistication of statistical algorithms or the growth of computing power, but for its practitioners to stay curious and constantly ask questions. Zen Buddhists try to achieve cosmic awareness by living in the moment. If that is too difficult, I would recommend that you treat your job like a wonderful travel destination and be a good tourist – curious and aware.
Ok, so that was a bit of a detour from our original discussion on scorecards. However, there are a couple of reasons for telling you the above. First, to explain why I was late in posting this part of the series. Second, I would like us to have a discussion on the importance and challenges of being curious at work and in life in general. I already have a few examples in mind, such as Louis Pasteur and Edward Lorenz, but that is for later.
Now, let's continue with the topic for this part, i.e. model evaluation.
Model Evaluation & Validation: the test of the pudding is in the eating – by Roopam
When I was in high school, I joined a cricket academy during the summer vacation. Cricket is a game quite similar to baseball, and I shall give baseball terminology in parentheses for everyone to understand. The training camp was designed as about a month of training followed by a full game against kids of the same skill level from another club. There was a tall, lean kid with us in the camp; he was the star bowler (pitcher) throughout the training sessions. He used to bowl (pitch) some of the best yorkers (curve balls). We were quite sure he would outperform everyone in the game. We asked him to open the bowling. His first ball went for a six (home run), followed by several more. Maybe it was a mix of match pressure, expectations, and the crowd, but his performance was an absolute disaster. Later the coach told us that what happened was not unusual and that he had seen it several times before. At higher levels, the game is played not on the ground but in the space between the ears. Clearly, he was referring to players' presence of mind and temperament.
As the famous saying goes, the test of the pudding is in the eating. One could be a star on the training field but a complete flop in a match situation. The same is true for an analytical model. After going through a round of training (Part 5 of the series), a model goes through several rounds of testing.
1. Out of sample test: remember Part 2, where we divided our sample into a training sample and a test sample. The first level of testing happens on the holdout, or test, sample. The model needs to perform as well on the test sample as it does on the training sample. Let us come back to this in the next section, where I discuss the measures of performance and the ROC curve.
2. Out of time sample test: since the model was built on a sample of the portfolio with reasonable vintage (refer to Part 2), the analyst would like to test its performance on a more recent portfolio. The number of bad borrowers (90+ DPD) in this out of time sample will certainly be lower, but the overall trend of the good/bad ratio against scores will still be a good indicator of model performance. Additionally, the analyst could relax the condition for bad loans and treat 30+ DPD as bad. Again, the overall trend should match the scorecard estimates.
3. On field test: this is where the test of the pudding is. The analyst needs to be completely aware of any credit policy changes the bank has gone through since the scorecard was developed and, more importantly, of the impact those changes will have on the scorecard. Always remember that not every policy change will influence the scorecard – a good business understanding and a bit of common sense really help here. Regular monitoring, with recalibration when needed, is a good way to keep the scorecard up to date.
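To make the out of time check in point 2 concrete, here is a minimal sketch in Python. It assumes that a higher score means lower risk, groups accounts into score bands, and checks that the bad rate does not rise as the score goes up. The band width of 50, the function names and the tiny sample are all hypothetical and only illustrate the idea; they are not taken from the series.

```python
from collections import defaultdict

def bad_rate_by_band(scores, is_bad, band_width=50):
    """Return [(band_start, n_accounts, bad_rate)] sorted by band_start."""
    totals = defaultdict(int)
    bads = defaultdict(int)
    for score, bad in zip(scores, is_bad):
        band = (score // band_width) * band_width  # e.g. 575 -> 550
        totals[band] += 1
        bads[band] += bad
    return [(band, totals[band], bads[band] / totals[band]) for band in sorted(totals)]

def is_non_increasing(rates, tolerance=0.0):
    """True if no band has a higher bad rate than the band just below it."""
    return all(later <= earlier + tolerance for earlier, later in zip(rates, rates[1:]))

# Hypothetical out of time sample (1 = bad, e.g. 30+ DPD; 0 = good).
oot_scores = [510, 530, 545, 560, 575, 590, 610, 625, 640, 660, 675, 690]
oot_is_bad = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]

bands = bad_rate_by_band(oot_scores, oot_is_bad)
rates = [rate for _, _, rate in bands]
trend_ok = is_non_increasing(rates)

for band, n_accounts, rate in bands:
    print(f"{band}-{band + 49}: n={n_accounts}, bad rate={rate:.2f}")
print("Trend consistent with scorecard ordering:", trend_ok)
```

On a real portfolio the bands would be much larger, and a small tolerance is probably sensible, since sparse bands can break strict monotonicity by chance.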
There are several ways to test the performance of a scorecard, such as the confusion matrix, the KS statistic, the Gini coefficient and the area under the ROC curve (AUROC). The KS statistic is a widely used metric in scorecard development. However, I personally prefer the AUROC to the others. I must add that the Gini is a variant of the AUROC (Gini = 2 × AUROC − 1). The reason for my liking of the AUROC could be my formal training in physics and engineering. I think it is a more holistic measure, and it lets the analyst analyze the model's performance visually. I prefer graphs and visual statistics to raw numbers any day.
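As a rough illustration of how these measures relate, the sketch below computes the AUROC, the Gini and the KS statistic in plain Python. It assumes a higher score means lower risk, so the AUROC is taken as the probability that a randomly chosen good account outscores a randomly chosen bad one. The function names and the eight-account sample are hypothetical, and the brute-force loops are meant for readability, not for production-sized data.

```python
def auroc(scores, is_bad):
    """Chance that a random good account outscores a random bad one (ties count half)."""
    goods = [s for s, b in zip(scores, is_bad) if not b]
    bads = [s for s, b in zip(scores, is_bad) if b]
    wins = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    return wins / (len(goods) * len(bads))

def gini(scores, is_bad):
    """Gini coefficient as a rescaling of the AUROC."""
    return 2 * auroc(scores, is_bad) - 1

def ks_statistic(scores, is_bad):
    """Largest gap between the cumulative shares of bads and goods at or below a score."""
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    largest_gap = 0.0
    for cutoff in sorted(set(scores)):
        cum_bad = sum(1 for s, b in zip(scores, is_bad) if b and s <= cutoff) / n_bad
        cum_good = sum(1 for s, b in zip(scores, is_bad) if not b and s <= cutoff) / n_good
        largest_gap = max(largest_gap, abs(cum_bad - cum_good))
    return largest_gap

# Hypothetical holdout sample (1 = bad, 0 = good).
sample_scores = [520, 540, 560, 580, 600, 620, 640, 660]
sample_is_bad = [1, 1, 0, 1, 0, 0, 0, 0]

sample_auroc = auroc(sample_scores, sample_is_bad)
sample_gini = gini(sample_scores, sample_is_bad)
sample_ks = ks_statistic(sample_scores, sample_is_bad)
print(f"AUROC={sample_auroc:.3f}  Gini={sample_gini:.3f}  KS={sample_ks:.3f}")
```

For the out of sample test, the same functions can be run on the training and the test sample; a large drop from training to test would suggest the model has memorised its training data.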
ROC Curve: for Credit Scorecard Model Validation and Evaluation – by Roopam
The adjacent graph shows an ROC curve. Its two axes are the true positive rate and the false positive rate. As expected, the plot tells us how well the model predicts. A perfect model segregates good and bad cases completely, so you get 100% true positives right at the beginning (i.e. absolute lift), as shown by the green curve in the graph. However, like anything in life, perfection does not exist. As they say, if it is too good to be true, it probably is. At the other extreme is a worthless model, the curve marked in red. Anything close to or below the red curve is no better than tossing a coin, so why bother with the effort of building a model? Finally, a typical scorecard ROC looks like the blue curve. The AUROC for a typical credit-scoring model lies between 70% and 85%; the higher, the better. For some fraud and insurance models, however, an AUROC slightly above 60% is acceptable. Again, analysts should be sure about the business benefits of the scorecard before settling on an acceptable ROC. A simple cost-benefit analysis helps significantly before finalizing the model and reporting it to top management.
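For readers who want to see where the curve comes from, here is a minimal sketch that builds the ROC points by sweeping a score cutoff and then measures the area with the trapezoid rule. It assumes "bad" is the positive class and that every account scoring at or below the cutoff is flagged as bad. The function names and the three toy samples (perfect, worthless, typical) are hypothetical stand-ins for the green, red and blue curves.

```python
def roc_points(scores, is_bad):
    """Return (false positive rate, true positive rate) pairs, one per score cutoff."""
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    points = [(0.0, 0.0)]  # cutoff below every score: nothing is flagged
    for cutoff in sorted(set(scores)):
        flagged_bad = sum(1 for s, b in zip(scores, is_bad) if b and s <= cutoff)
        flagged_good = sum(1 for s, b in zip(scores, is_bad) if not b and s <= cutoff)
        points.append((flagged_good / n_good, flagged_bad / n_bad))
    return points

def area_under(points):
    """Trapezoid-rule area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Perfect model: every bad account scores below every good account.
perfect_area = area_under(roc_points([1, 2, 3, 4, 5, 6], [1, 1, 1, 0, 0, 0]))

# Worthless model: everyone gets the same score, so the curve is the diagonal.
worthless_area = area_under(roc_points([600, 600, 600, 600], [1, 0, 1, 0]))

# Something in between, closer to a real scorecard.
typical_points = roc_points(
    [520, 540, 560, 580, 600, 620, 640, 660],
    [1, 1, 0, 1, 0, 0, 0, 0],
)
typical_area = area_under(typical_points)

print(f"perfect={perfect_area:.2f}  worthless={worthless_area:.2f}  typical={typical_area:.2f}")
```

Plotting `typical_points` with any charting library gives the stepped curve; the area is the same number the AUROC summarises.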
I hope after reading this, you will pick up your camera and visit that unexplored nook at the corner of the street – and be ready for some wonderful surprises!