Applying XGBoost

Date: 2019-12-09

Tags: xgboost, application

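In day-to-day business work, we often need to model data and make predictions. In simple cases, an if/else check (effectively a single tree) is enough. But what if the prediction depends on many factors, and each feature carries a different weight?

How, then, do we find sensible weights for these features? XGBoost is exactly that kind of algorithm.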

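Roughly speaking, XGBoost builds multiple decision trees to improve prediction accuracy. Forgive my shaky math; there is plenty of material on the details, for example: https://www.jianshu.com/p/7467e616f227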
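As a rough illustration of the "many trees" idea, the sketch below fits a sequence of one-split trees (stumps), each to the errors left by the trees before it, and sums their outputs. It is a simplified gradient-boosting loop for squared-error regression in plain Python, not XGBoost's actual algorithm, which adds regularization and second-order gradient information. The function names (`fit_stump`, `boost`, `predict`, `training_mse`), the toy data, and the learning rate are all assumptions made for illustration.

```python
# Simplified gradient boosting with decision stumps (illustration only;
# this is NOT how XGBoost is implemented internally).

def fit_stump(xs, residuals):
    """Find the one-threshold split on xs that minimizes squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Fit n_trees stumps, each one on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    """Sum the base value and the scaled output of every stump."""
    total = sum(stump_predict(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * total


def training_mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumed for illustration): y = x^2 on ten points.
xs = list(range(10))
ys = [x * x for x in xs]

for n_trees in (0, 1, 20):
    model = boost(xs, ys, n_trees=n_trees)
    print(n_trees, "trees -> training MSE", round(training_mse(model, xs, ys), 2))
```

On the training data the error shrinks as trees are added, which is the sense in which several trees improve on a single if/else rule.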

Here is a Python demo for the record.

Reference: https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/

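Lesson learned: parameter tuning certainly matters in model training, but how discriminative the features are matters even more. The features you add should therefore be highly discriminative; only then can the trained model reach high accuracy.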

# First XGBoost model for Pima Indians dataset
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data (expects pima-indians-diabetes.csv in the working directory)
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
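The note before the demo, that discriminative features matter more than parameter tuning, can be illustrated with a small standard-library sketch. It scores a feature by the accuracy of the best single-threshold rule on that feature alone, then compares a feature that shifts with the label against pure noise. The synthetic data, the helper name `best_threshold_accuracy`, and the scoring rule itself are assumptions chosen for illustration; this is not a metric that XGBoost reports.

```python
import random


def best_threshold_accuracy(values, labels):
    """Accuracy of the best one-threshold rule (in either direction) on one feature."""
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values)):
        # Rule: predict class 1 when the value is above the threshold.
        correct = sum(1 for v, y in zip(values, labels)
                      if (v > threshold) == (y == 1))
        # Also consider the flipped rule (predict 1 when at or below the threshold).
        best = max(best, correct / n, 1 - correct / n)
    return best


rng = random.Random(7)
labels = [rng.randint(0, 1) for _ in range(200)]

# Hypothetical features: one whose mean shifts with the label, one unrelated to it.
informative = [rng.gauss(2.0 * y, 1.0) for y in labels]
noise = [rng.gauss(0.0, 1.0) for _ in labels]

informative_acc = best_threshold_accuracy(informative, labels)
noise_acc = best_threshold_accuracy(noise, labels)

print("informative feature: %.2f" % informative_acc)
print("noise feature:       %.2f" % noise_acc)
```

A feature whose best single split is barely better than chance gives the trees little to work with, whatever the hyperparameters.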
