sklearn.model_selection.train_test_split randomly splits a dataset into a training set and a test set according to a given proportion.
1. Usage:
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size=0.2, random_state=0)
```
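The usage form above assumes `train_data` and `train_target` already exist. The sketch below fills them with small hypothetical NumPy arrays (10 samples, 2 features each) so the call can actually run, and checks that each row stays paired with its own label after shuffling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for train_data / train_target:
# row i is [2*i, 2*i + 1] and its label is i.
train_data = np.arange(20).reshape(10, 2)
train_target = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    train_data, train_target, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
print(y_train.shape, y_test.shape)  # (8,) (2,)

# Features and labels are shuffled together, so every test row
# still matches its label.
for row, label in zip(X_test, y_test):
    assert row[0] == 2 * label
```

NumPy arrays in give NumPy arrays out; plain Python lists in give lists out.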
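2. Parameters:
train_data: the feature set of the samples
train_target: the label set of the samples
test_size: the size of the test set. A float between 0.0 and 1.0 is the fraction of the dataset used for testing; an integer is the absolute number of test samples. If neither test_size nor train_size is given, it defaults to 0.25.
random_state: the seed for the random number generator. On the same dataset, the same seed always produces the same split, while different seeds generally produce different splits. If it is left as None, each call may produce a different split.
X_train, y_train: together form the training set
X_test, y_test: together form the test set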
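To make the test_size and random_state rules above concrete, here is a small sketch on a hypothetical 100-sample list dataset: a float test_size is read as a fraction, an integer as a sample count, and repeating the same seed reproduces the same split.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 one-feature samples with labels 0..99.
X = [[i] for i in range(100)]
y = list(range(100))

# Float test_size: a fraction of the dataset (25% of 100 -> 25 samples).
_, X_test_frac, _, _ = train_test_split(X, y, test_size=0.25, random_state=0)
print(len(X_test_frac))  # 25

# Integer test_size: an absolute number of test samples.
_, X_test_abs, _, _ = train_test_split(X, y, test_size=30, random_state=0)
print(len(X_test_abs))  # 30

# Same seed on the same data -> identical split (lists in, lists out).
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)
print(split_a[3] == split_b[3])  # True

# A different seed typically gives a different split.
split_c = train_test_split(X, y, test_size=0.2, random_state=7)
print(split_a[3] == split_c[3])
```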
3. Example:
Generate a dataset of 100 samples and randomly split off 20% of it as the test set.
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written for Python 3.6

# Older scikit-learn versions used: from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

# Generate 100 samples: 100 two-dimensional feature vectors and their 100 labels
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

# Randomly hold out 20% of the samples as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print("train:", len(X_train), "test:", len(X_test))

# Inspect the samples assigned to the test set
for i in range(len(X_test)):
    print("".join(X_test[i]), y_test[i])

# Output of the original run:
'''
train: 80 test: 20
feature two 2
feature two 2
feature one 1
feature two 2
feature two 2
feature one 1
feature one 1
feature two 2
feature two 2
feature two 2
feature two 2
feature one 1
feature two 2
feature two 2
feature two 2
feature one 1
feature one 1
feature one 1
feature two 2
feature one 1
'''
```
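In the run above the full dataset is balanced (50 samples per class), but the 20 test samples come out as 8 of class 1 and 12 of class 2. If the test set should keep the class proportions, train_test_split also accepts a `stratify` argument. A minimal sketch reusing the same toy data:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Same toy data as the example above: 50 samples of class 1, 50 of class 2.
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

# stratify=y keeps the class ratio of y in both the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

print("train:", Counter(y_train), "test:", Counter(y_test))
```

With 50/50 classes and test_size=0.2, the stratified test set contains 10 samples of each class and the training set 40 of each.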