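In day-to-day business work, we often need to model data and make predictions. In simple cases, an if/else check (a single tree) is enough. But when the prediction depends on many factors, and each feature carries a different weight, a hand-written rule no longer works.
So how do we find reasonable weights for these features? xgboost is an algorithm built for exactly this.
Roughly speaking, xgboost builds multiple decision trees to improve prediction accuracy. Forgive my weak math; there is plenty of material on the theory: (https://www.jianshu.com/p/7467e616f227)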
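To make the "many trees" idea a bit more concrete, here is a minimal sketch of boosting with one-split trees (stumps) on a toy regression problem. It is plain Python and not xgboost's actual implementation, which adds regularization, second-order gradients, and much more. The names `fit_stump`, `boost`, and `predict`, the toy data, and the learning rate of 0.5 are all made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single split on xs that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must put at least one point on each side
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Fit stumps one after another, each on the errors left by the earlier ones."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: y = x^2, which a single split cannot fit well
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

one_tree = boost(xs, ys, n_trees=1)
many_trees = boost(xs, ys, n_trees=20)
print("MSE with 1 tree:  ", mse(one_tree, xs, ys))
print("MSE with 20 trees:", mse(many_trees, xs, ys))
```

Each new stump is fitted to what the previous stumps still get wrong, so the training error shrinks as trees are added. That is the intuition behind "multiple trees improve the prediction".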
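Here is a Python demo for the record.
Reference: https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/
Lesson learned: in model training, tuning parameters certainly matters, but the discriminative power of the features matters even more. The features you add must separate the classes well; only then can the trained model reach high accuracy.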
# First XGBoost model for Pima Indians dataset
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")

# split data into X (first 8 columns) and y (last column)
X = dataset[:, 0:8]
Y = dataset[:, 8]

# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)

# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))