sklearn.model_selection.train_test_split usage

Date: 2020-05-08

When doing machine learning in Python, the sklearn.model_selection.train_test_split function is commonly used to divide a dataset into training data (training samples) and test data (testing samples).
Usage of train_test_split:
sklearn.model_selection.train_test_split(*arrays, **options)
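Here is a minimal sketch of the call pattern. The feature matrix `X` and label vector `y` are hypothetical toy data, not from the original article. Every array passed in is split with the same shuffled indices. The function returns one train/test pair per input, here in the order `X_train, X_test, y_train, y_test`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each, plus one label per sample
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Both arrays are split with the same shuffled indices
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
print(y_train.shape, y_test.shape)  # (7,) (3,)

# Rows stay aligned: feature row i is [2*i, 2*i + 1] and its label is i
assert (X_train[:, 0] // 2 == y_train).all()
```

This example uses 10 samples and the default test_size of 0.25. sklearn rounds the test count up (ceil(2.5) = 3), which leaves 7 training rows.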
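Commonly used arguments of train_test_split:
arrays: the sequences to split, all of the same length. Lists, numpy arrays, matrices and pandas DataFrames are accepted.
test_size: can be given in two ways. 1: as a float between 0.0 and 1.0, giving the proportion of the dataset assigned to the test set. 2: as an int, giving the absolute number of test samples; it must be smaller than the total number of samples in the dataset. If test_size is not specified, it is derived from train_size (the two are complementary). If train_size is not specified either, test_size defaults to 0.25.
train_size: works like test_size, but for the training set.
random_state: the seed that controls the shuffling applied before the split. Passing a fixed integer makes the split reproducible across runs. If it is not specified, the global numpy.random generator is used, so the split changes from run to run unless numpy.random is seeded.
shuffle and stratify are used less often. shuffle (default True) controls whether the data are shuffled before splitting. stratify performs a stratified split: given class labels (e.g. stratify=y), it keeps each class's proportion the same in the training and test sets. If shuffle=False, stratify must be left as None.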
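The following is a minimal sketch of the parameters described above. It uses a hypothetical 20-sample array whose labels have a 15:5 class imbalance, invented for illustration. The random_state values are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, labels 0/1 in a 3:1 ratio (15 vs 5)
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# Float test_size: a proportion of the samples (0.2 * 20 = 4 test rows)
_, X_test_frac = train_test_split(X, test_size=0.2, random_state=0)

# Int test_size: an absolute number of test rows
_, X_test_abs = train_test_split(X, test_size=6, random_state=0)

# Only train_size given: test_size becomes the complement (20 - 15 = 5)
X_train_only, X_test_only = train_test_split(X, train_size=15, random_state=0)

# Same random_state -> identical split on every call
a_train, a_test = train_test_split(X, random_state=42)
b_train, b_test = train_test_split(X, random_state=42)
same_split = np.array_equal(a_test, b_test)

# stratify=y keeps the 3:1 class ratio in both subsets
_, _, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(len(X_test_frac), len(X_test_abs), len(X_test_only), same_split)
print(np.bincount(y_train), np.bincount(y_test))
```

The stratified split puts 12 class-0 and 4 class-1 labels in the training set, and 3 and 1 in the test set. Both subsets therefore keep the original 3:1 ratio.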
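train_test_split example:
This dataset has 4 columns and 12 rows (records). The output below lists the columns alphabetically, as older pandas versions did. pandas 0.23+ on Python 3.6+ keeps the dict's insertion order instead (name, age, department, attendance).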
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split
>>>
>>> namelist = pd.DataFrame({
... "name" : ["Suzuki", "Tanaka", "Yamada", "Watanabe", "Yamamoto",
... "Okada", "Ueda", "Inoue", "Hayashi", "Sato",
... "Hirayama", "Shimada"],
... "age": [30, 40, 55, 29, 41, 28, 42, 24, 33, 39, 49, 53],
... "department": ["HR", "Legal", "IT", "HR", "HR", "IT",
... "Legal", "Legal", "IT", "HR", "Legal", "Legal"],
... "attendance": [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1]
... })
>>> print(namelist)
    age  attendance  department      name
0    30           1          HR    Suzuki
1    40           1       Legal    Tanaka
2    55           1          IT    Yamada
3    29           0          HR  Watanabe
4    41           1          HR  Yamamoto
5    28           1          IT     Okada
6    42           1       Legal      Ueda
7    24           0       Legal     Inoue
8    33           0          IT   Hayashi
9    39           1          HR      Sato
10   49           1       Legal  Hirayama
11   53           1       Legal   Shimada
Set test_size=0.3 so that 30% of the rows go to the test set and the remaining rows form the training set.
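The text does not show the split step itself. Below is a minimal sketch of what it could look like on the same DataFrame. random_state=0 is an assumption added here for reproducibility, and which rows land in the test set depends on that choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same 12-row table as in the example above
namelist = pd.DataFrame({
    "name": ["Suzuki", "Tanaka", "Yamada", "Watanabe", "Yamamoto",
             "Okada", "Ueda", "Inoue", "Hayashi", "Sato",
             "Hirayama", "Shimada"],
    "age": [30, 40, 55, 29, 41, 28, 42, 24, 33, 39, 49, 53],
    "department": ["HR", "Legal", "IT", "HR", "HR", "IT",
                   "Legal", "Legal", "IT", "HR", "Legal", "Legal"],
    "attendance": [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1],
})

# 30% test: a DataFrame in gives two DataFrames out (train, test)
train, test = train_test_split(namelist, test_size=0.3, random_state=0)

print(train.shape, test.shape)  # (8, 4) (4, 4)
print(test)

# The original row labels are kept, so train and test never share an index
assert set(train.index).isdisjoint(test.index)
```

sklearn rounds the test count up: ceil(0.3 × 12) = 4 test rows, leaving 8 training rows.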
————————————————
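Copyright notice: This article is an original work by CSDN blogger 「大魚霸吃小魚兒」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/datascientist_chen/article/details/79024020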
