Splitting a dataset with sklearn's train_test_split

sklearn.model_selection.train_test_split randomly splits a dataset into a training set and a test set in a given proportion.

1. Usage:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size=0.2, random_state=0)

2. Parameters:

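train_data: the feature set of the samples

train_target: the label set of the samples

test_size: the share of the dataset assigned to the test set; a float is treated as a proportion, while an integer is treated as the absolute number of test samples

random_state: the seed for the random number generator. On the same dataset, the same seed always produces the same split, while different seeds generally produce different splits

X_train, y_train: together form the training set

X_test, y_test: together form the test set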
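
The behaviour of test_size and random_state described above can be checked with a small sketch. The toy data here (10 single-feature samples with alternating labels) is made up for illustration and is not part of the original example:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with one feature each, plus alternating labels.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

# A float test_size is a proportion: 0.2 of 10 samples gives 2 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))      # 8 2

# An integer test_size is an absolute number of test samples.
X_train_n, X_test_n, y_train_n, y_test_n = train_test_split(
    X, y, test_size=3, random_state=0)
print(len(X_train_n), len(X_test_n))  # 7 3

# Repeating the call with the same data and the same seed gives the same split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_test == X_test_again)         # True
```

Fixing random_state is mainly useful when a split has to be reproduced later, for example when comparing several models on the same test set.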

3. Example:

Generate a dataset of 100 samples and randomly set aside 20% of them as the test set.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written for Python 3.6

# Older scikit-learn releases exposed this function in sklearn.cross_validation:
# from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

# Build 100 records: 100 two-dimensional feature vectors with 100 matching labels
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

# Randomly hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print("train:", len(X_train), "test:", len(X_test))

# Inspect the samples that ended up in the test set
for features, label in zip(X_test, y_test):
    print("".join(features), label)

'''
train: 80 test: 20
feature two  2
feature two  2
feature one  1
feature two  2
feature two  2
feature one  1
feature one  1
feature two  2
feature two  2
feature two  2
feature two  2
feature one  1
feature two  2
feature two  2
feature two  2
feature one  1
feature one  1
feature one  1
feature two  2
feature one  1
'''