https://www.researchgate.net/post/How_can_I_study_the_past_spending_behaviour_of_a_customer_in_a_banking_perspective_and_predict_the_next_purchase_category_and_amount_of_buy
To predict whether a first-time buyer will purchase next month, the model has to evaluate non-transaction customer data, such as how many times a customer clicked on an email or how the customer interacts with your website. These models can also take into account certain demographic data. For example, in consumer marketing they may compare gender, age, and zip code to those of other likely buyers. In business marketing, relevant demographics may include industry, job title, and geography.
Here’s how it works: the models compare the pre-purchase behavior of prospective buyers to the pre-purchase behavior of thousands or millions of previous customers who ended up buying, comparing attributes like what emails they opened and what products they spent the most time looking at. The prospects that behave most like the previous buyers are tagged as "high-likelihood buyers".
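The comparison described above can be sketched as a nearest-neighbour lookup. This is only an illustration, not how any particular vendor does it: the two features (emails opened, minutes on product pages), the toy history, `k`, and the 0.5 threshold are all assumptions made up for the example.

```python
import math

# Hypothetical history of past prospects:
# (emails_opened, minutes_on_product_pages, bought)
HISTORY = [
    (0, 1.0, 0), (1, 2.0, 0), (1, 0.5, 0), (2, 3.0, 0),
    (5, 12.0, 1), (6, 15.0, 1), (4, 10.0, 1), (7, 9.0, 1),
]

def buy_likelihood(prospect, history=HISTORY, k=3):
    """Fraction of the k most similar past prospects who ended up buying."""
    # Euclidean distance on raw features; real data would need feature scaling.
    nearest = sorted(history, key=lambda row: math.dist(prospect, row[:2]))[:k]
    return sum(row[2] for row in nearest) / k

def tag(prospect, threshold=0.5):
    """Label a prospect by comparing its score to an assumed threshold."""
    if buy_likelihood(prospect) >= threshold:
        return "high-likelihood buyer"
    return "low-likelihood buyer"

print(tag((5, 11.0)))
print(tag((1, 1.0)))
```

With millions of past customers, a fitted classifier such as logistic regression or gradient boosting would usually replace the brute-force neighbour search, but the idea of scoring a prospect by similarity to past buyers stays the same.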
Predicting likelihood to buy for repeat buyers is a lot easier than for first-time buyers because there is a lot more information to go on. Repeat-purchase predictions use all of the customer's interactions, such as purchased item type (some items are purchased more frequently than others), last purchase date for an item type, returned purchases, order interval, general events (holidays, seasons), and phone calls to customer service.
Q1 - Which statistical method do you think is most overused?
Q2 - Suppose the company is awarding bonuses, and you are given the task of selecting the awardees. How would you do it? Describe your analytics as specifically as possible.
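Q5 - What does the curse of dimensionality mean? Also asked about hash tables.
Q9 - Feature selection
Q13 - Two models have classification accuracies of 80% and 81%. Can you say that the 81% model is better? Why or why not?
Weighted accuracy (WA) vs. unweighted accuracy (UA):
If there is class imbalance, only UA can select a model that is not biased toward the big class.
(Here I got a casual follow-up on how to preprocess data with an unbalanced class distribution: random sampling, class weights, etc.)
Also consider whether the number of test samples is large enough for the difference to be significant, test data diversity, etc.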
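A small sketch of both points in the Q13 answer, under stated assumptions: WA is taken to be plain overall accuracy and UA the mean of per-class recall (the usual reading of the two terms), and the significance check is a simple two-proportion z statistic that assumes the two models were scored on independent test sets of equal size `n`. The toy labels are made up.

```python
from collections import defaultdict
import math

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: every sample counts equally, so big classes dominate."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall: every class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += (t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def accuracy_gap_z(acc_a, acc_b, n):
    """z statistic for acc_b - acc_a, each measured on n independent samples."""
    pooled = (acc_a + acc_b) / 2
    standard_error = math.sqrt(2 * pooled * (1 - pooled) / n)
    return (acc_b - acc_a) / standard_error

# A model that always predicts the majority class on 90/10 imbalanced data.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
print(weighted_accuracy(y_true, y_pred))    # 0.9
print(unweighted_accuracy(y_true, y_pred))  # 0.5

# 80% vs 81%: compare |z| against 1.96 (5% two-sided level).
print(abs(accuracy_gap_z(0.80, 0.81, 1_000)) > 1.96)
print(abs(accuracy_gap_z(0.80, 0.81, 100_000)) > 1.96)
```

When both models are scored on the same test set, a paired test such as McNemar's is the more appropriate choice; the z statistic here is only meant to show how much the answer depends on test set size.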
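Q17 - Then I was asked to build an ML model for a given situation: pick out the unpaid accounts among AWS users. At first I didn't quite understand what "unpaid" meant. I asked what data would be provided and analyzed it a bit. Finally they asked how to validate the model and how to be sure the model is workable.
Q18 - What is the difference between maximum likelihood and maximum a posteriori?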
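One way to make the Q18 difference concrete is a coin-flip example (my own illustration, not from the interview). MLE maximizes the likelihood of the data alone; MAP maximizes likelihood times a prior. The Beta(2, 2) prior below is an assumed choice, and the closed form used is the mode of the resulting Beta posterior.

```python
def mle_heads(heads, tosses):
    """Maximum likelihood estimate of P(heads): the observed frequency."""
    return heads / tosses

def map_heads(heads, tosses, alpha=2.0, beta=2.0):
    """MAP estimate under a Beta(alpha, beta) prior; needs alpha, beta > 1."""
    return (heads + alpha - 1) / (tosses + alpha + beta - 2)

# Three heads in three tosses: MLE says the coin always lands heads,
# while the prior pulls the MAP estimate back toward 0.5.
print(mle_heads(3, 3))   # 1.0
print(map_heads(3, 3))   # 0.8

# With a uniform prior, Beta(1, 1), MAP reduces to MLE.
print(map_heads(3, 3, alpha=1.0, beta=1.0))  # 1.0
```

As the number of tosses grows, the prior's pseudo-counts matter less and the two estimates converge, which is the usual short answer: MAP is MLE plus a prior acting as a regularizer.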
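Q19 - Feature extraction, Word2Vec-related content
Q20 -
Suppose you are given a pile of data, and the plot turns out to be a noisy sine function. How do you train a model from the data? At first I didn't quite understand the question and kept missing the point. Later the interviewer guided me toward a regression approach. I had to write out the derivation (optimization function, derivatives, etc.), how to train the parameters, and how to deal with problems such as overfitting.
model: Y = a + sin(bX + c), where a, b, c are parameters.
optimization/cost function: mean squared error, 1/m * sum((y - y_pred)^2)
use gradient descent to minimize the cost function (first derivatives needed)
overfitting can be addressed by regularization.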
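The Q20 answer can be sketched end to end as follows. This is a minimal version under assumptions: the data is synthetic, the "true" parameters (0.5, 2.0, 0.3), the learning rate, and the step count are made up, and the starting frequency is placed near the truth because the loss is non-convex in b (in practice a rough guess could come from a grid search or the spectrum of the data). The gradients follow from the MSE cost with residual r = y_pred - y: dL/da = 2/m * sum(r), dL/db = 2/m * sum(r * cos(bx + c) * x), dL/dc = 2/m * sum(r * cos(bx + c)).

```python
import math
import random

def fit_sine(xs, ys, a=0.0, b=1.0, c=0.0, lr=0.01, steps=5000):
    """Fit y = a + sin(b*x + c) by batch gradient descent on mean squared error."""
    m = len(xs)
    for _ in range(steps):
        ga = gb = gc = 0.0
        for x, y in zip(xs, ys):
            r = a + math.sin(b * x + c) - y   # residual: y_pred - y
            d = math.cos(b * x + c)           # derivative of sin at (b*x + c)
            ga += 2 * r / m                   # dL/da
            gb += 2 * r * d * x / m           # dL/db
            gc += 2 * r * d / m               # dL/dc
        a, b, c = a - lr * ga, b - lr * gb, c - lr * gc
    return a, b, c

def mse(xs, ys, a, b, c):
    """Mean squared error of the sine model on (xs, ys)."""
    return sum((a + math.sin(b * x + c) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic noisy data from assumed true parameters a=0.5, b=2.0, c=0.3.
rng = random.Random(0)
xs = [0.1 * i for i in range(63)]
ys = [0.5 + math.sin(2.0 * x + 0.3) + rng.gauss(0, 0.1) for x in xs]

# Start near a rough frequency guess, since the loss has many local minima in b.
a, b, c = fit_sine(xs, ys, a=0.0, b=1.9, c=0.0)
print(round(a, 2), round(b, 2), round(c, 2))
print(mse(xs, ys, a, b, c))
```

With only three parameters this model has little room to overfit; regularization matters more if the model is made flexible, for example a sum of many sine terms or a high-degree polynomial.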
Q21 - How do you choose between random forest and linear regression, given that you want to figure out the feature importance?
Q25 - Computing the Hessian, and its relationship to eigenvalues and eigenvectors.
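For Q25, a small sketch of the standard connection: at a critical point, the signs of the Hessian's eigenvalues tell you whether it is a minimum, maximum, or saddle, and each eigenvalue is the curvature along its eigenvector. The two test functions, the finite-difference step `h`, and the tolerance are assumptions for the example, which is limited to two variables so the eigenvalues have a closed form.

```python
import math

def hessian_2d(f, x, y, h=1e-3):
    """Central finite-difference Hessian of f(x, y); returns (fxx, fxy, fyy)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def sym_eigenvalues(fxx, fxy, fyy):
    """Eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]], smallest first."""
    mean = (fxx + fyy) / 2
    radius = math.hypot((fxx - fyy) / 2, fxy)
    return mean - radius, mean + radius

def classify(f, x, y, tol=1e-6):
    """Classify a critical point of f by the signs of the Hessian eigenvalues."""
    lo, hi = sym_eigenvalues(*hessian_2d(f, x, y))
    if lo > tol:
        return "local minimum"
    if hi < -tol:
        return "local maximum"
    if lo < -tol and hi > tol:
        return "saddle point"
    return "inconclusive"

bowl = lambda x, y: x**2 + x * y + y**2    # Hessian [[2, 1], [1, 2]]
saddle = lambda x, y: x**2 - y**2          # Hessian [[2, 0], [0, -2]]
print(classify(bowl, 0.0, 0.0))
print(classify(saddle, 0.0, 0.0))
```

The ratio of the largest to smallest eigenvalue (the condition number) is also what makes plain gradient descent slow on elongated valleys.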
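Q26 - Describe an example of a data error and how you resolved it.
Q27 - Sensitivity analysis
Q28 - PCA: high-level understanding -> implementation principle -> relationship between PCA and SVD -> why implementing it with SVD is better -> comparison of latent analysis methods -> other types of matrix decomposition.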
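Q29 -
First they asked: after you have built a model, new data arrives to be used for predicting the future, but you do not know the labels of this new data. How do you judge whether the model can predict accurately and whether it needs to be retrained?
(1) During training, set aside a validation set so that the model does not overfit; (2) compare whether the feature distributions of the newly added data are similar to those of the earlier training data. Could a t-test be used for this?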
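Point (2) of the Q29 answer could look roughly like this: compute Welch's t statistic per feature between the training data and the new unlabeled data, and flag features whose mean has moved. The feature names, the synthetic data, and the threshold of 3 standard errors are assumptions for the sketch.

```python
import random
import statistics

def welch_t(train_values, new_values):
    """Welch's t statistic for the difference in means of one feature."""
    m1, m2 = statistics.fmean(train_values), statistics.fmean(new_values)
    v1, v2 = statistics.variance(train_values), statistics.variance(new_values)
    return (m2 - m1) / ((v1 / len(train_values) + v2 / len(new_values)) ** 0.5)

def drifted_features(train, new, threshold=3.0):
    """Names of features whose mean moved by more than `threshold` standard errors."""
    return [name for name in train
            if abs(welch_t(train[name], new[name])) > threshold]

# Hypothetical data: "amount" shifts upward in the new batch, "age" does not.
rng = random.Random(0)
train = {
    "amount": [rng.gauss(100, 15) for _ in range(500)],
    "age": [rng.gauss(40, 10) for _ in range(500)],
}
new = {
    "amount": [rng.gauss(120, 15) for _ in range(500)],
    "age": [rng.gauss(40, 10) for _ in range(500)],
}
print(drifted_features(train, new))
```

A t-test only compares means, so a change in spread or shape can slip through; a two-sample Kolmogorov-Smirnov test or a population stability index compares the whole distribution. Even with stable features, the relationship between features and label can change, so collecting some fresh labels periodically remains the most direct check.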
ML SDE, ML Scientist
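1. When an Amazon seller uploads a product, it needs a category. How do you recommend a suitable category and related sub-categories based on the product name, description, brand, and other information?
2. How to handle unbalanced data
3. How do you train logistic regression, and what is the objective function?
4. How do you combine multiple very similar listed products? For example, searching Amazon for a certain laptop may return 3 results that are most of the time actually the same thing, differing only in seller, description, and images.
5. When is naive Bayes better than logistic regression?
6. Overfitting, cross-validation, etc.
7. Briefly describe an ML-related project you have done: which ML methods you used, what the data looked like, how many features, how you handled overfitting/underfitting, the difference between L1 and L2, and feature selection.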
2018-9-27
onsite
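8. The questions were based entirely on the projects I had done. He asked high-level questions, such as which project you completed on your own that was very meaningful, and what its significance was from a product perspective.
Which project did you complete with other people, especially people from different fields, and how did you collaborate? Also, what was the ML metric (for example AUC), and why did you use it? AUC may be hard to understand for customers or marketing people, so which metric would be better for them?
Then he would suddenly ask you to explain some ML methods, such as GBM. Also, because I interviewed with the Alexa team, he asked me to talk about how to identify the skill from language.
As I understood it at the time, a skill is a specific category, such as game or pizza. For example, if I ask "Alexa, can you suggest pizza?", it should recommend pizza places near my home based on my location. If I ask "Alexa, can you suggest a game?", it should ask "What kind of game? Video game or something else?" I say "Video", and it keeps asking for more specific things (RPG?) until it has enough detail to make a suggestion. So how do you design a method that lets Alexa ask questions this way?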
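9. Model building, via a given problem: suppose you ran a survey and know people's names, heights, and so on. How would you build a model to predict their income?
10. The difference between naive Bayes and logistic regression. There is said to be a trade-off between them; what is it?
11. If a vector contains only binary features, which one is better to use?
Answer: I still said it depends. The interviewer said it doesn't depend: you only get one try, so what do you use? I said that if everything is binary I would use logistic regression…
12. Evaluating performance:
13. Regularization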
2017-2-8 applied scientist
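14. Explain deep learning models, their advantages, etc.
15. The difference between generative and discriminative models, with examples
16. Explain generative adversarial networks. I had read the paper but never used them, so I roughly explained the principle.
17. Ways to avoid overfitting: regularization, dropout, cross-validation, early stopping, etc.
18. Two models have classification accuracies of 80% and 81%. Can you say that the 81% model is better? Why or why not?
19. Weighted accuracy (WA) vs. unweighted accuracy (UA): if there is class imbalance, only UA can select a model that is not biased toward the big class. (Here I got a casual follow-up on how to preprocess data with an unbalanced class distribution: random sampling, class weights, etc.) Also consider whether the number of test samples is large enough for the difference to be significant, test data diversity, etc.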