scikit-learn

Date: 2019-12-05

Tags: scikit learn


(1) Data standardization (Standardization, or Mean Removal and Variance Scaling)

After standardization, each feature (column) of the data has zero mean and unit variance.
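Written out, for a data matrix with n samples, feature j with values x_{ij} is transformed as shown below. The standard deviation uses the 1/n (population) form, which is also what NumPy's `std` computes by default, so the `X_scaled.std(axis=0)` check in the example that follows returns exactly 1 per column.

```latex
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
```

For the first column of the example below, (1, 2, 0), this gives \mu = 1 and \sigma = \sqrt{2/3} \approx 0.8165, hence z = (0, 1.2247, -1.2247).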

The scale function provides a convenient way to perform this transformation:


>>> from sklearn import preprocessing  # import the data preprocessing module
>>> X = [[1., -1., 2.],
...      [2., 0., 0.],
...      [0., 1., -1.]]
>>> X_scaled = preprocessing.scale(X)
>>> X_scaled
array([[ 0. , -1.22474487, 1.33630621],
[ 1.22474487, 0. , -0.26726124],
[-1.22474487, 1.22474487, -1.06904497]])


>>> X_scaled.mean(axis=0)
array([ 0., 0., 0.])
>>> X_scaled.std(axis=0)
array([ 1., 1., 1.])
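To make explicit what `scale` computes, the sketch below standardizes the same matrix by hand with NumPy (subtract each column's mean, divide by its population standard deviation) and checks that the result matches `preprocessing.scale` up to floating-point rounding.

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# Standardize each column by hand: subtract the column mean and
# divide by the column standard deviation (NumPy's default ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# preprocessing.scale should produce the same values.
print(np.allclose(manual, preprocessing.scale(X)))  # True
```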

The same result can also be obtained with the Scaler utility class provided by the preprocessing module (StandardScaler in version 0.15 and later):


>>> scaler = preprocessing.StandardScaler().fit(X)
>>> scaler
StandardScaler(copy=True, with_mean=True, with_std=True)
>>> scaler.mean_
array([ 1. , 0. , 0.33333333])
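>>> scaler.scale_  # per-feature standard deviation; older releases exposed this as std_, which has since been removed
array([ 0.81649658, 0.81649658, 1.24721913])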
>>> scaler.transform(X)
array([[ 0. , -1.22474487, 1.33630621],
[ 1.22474487, 0. , -0.26726124],
[-1.22474487, 1.22474487, -1.06904497]])
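The practical advantage of the class over the `scale` function is that the statistics learned by `fit` can be reused on data seen later, such as a test set. In the sketch below, `X_new` is a hypothetical new sample; it is transformed with the mean and standard deviation of `X_train`, not its own, and `inverse_transform` recovers the original values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
# Hypothetical new sample, e.g. from a held-out test set.
X_new = np.array([[-1., 1., 0.]])

scaler = StandardScaler().fit(X_train)   # learns mean_ and scale_ from X_train only
X_new_scaled = scaler.transform(X_new)   # reuses the training statistics

# The same arithmetic done by hand with the learned attributes.
manual = (X_new - scaler.mean_) / scaler.scale_
print(np.allclose(X_new_scaled, manual))  # True

# inverse_transform undoes the scaling.
print(np.allclose(scaler.inverse_transform(X_new_scaled), X_new))  # True
```

Fitting only on training data and then transforming everything else with the same scaler avoids leaking information from the test set into the preprocessing step.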

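(2) Normalization
Normalization rescales each sample (row) of the data set individually so that it has unit norm (the L2 norm by default); as a result, every value ends up in the range [-1, 1].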


>>> X = [[ 1., -1., 2.],
...      [ 2., 0., 0.],
...      [ 0., 1., -1.]]
>>> X_normalized = preprocessing.normalize(X, norm='l2')
>>> X_normalized
array([[ 0.40824829, -0.40824829, 0.81649658],
[ 1. , 0. , 0. ],
[ 0. , 0.70710678, -0.70710678]])
>>> normalizer = preprocessing.Normalizer().fit(X) # fit does nothing
>>> normalizer
Normalizer(copy=True, norm='l2')
>>> normalizer.transform(X)
array([[ 0.40824829, -0.40824829, 0.81649658],
[ 1. , 0. , 0. ],
[ 0. , 0.70710678, -0.70710678]])
>>> normalizer.transform([[-1., 1., 0.]])
array([[-0.70710678, 0.70710678, 0. ]])
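As a cross-check, the sketch below reproduces L2 normalization by dividing each row by its Euclidean length, and does the same for the `'l1'` norm (sum of absolute values), which `normalize` also accepts. None of the rows in this example are all zeros, so the manual division is safe here.

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# L2: divide each row by its Euclidean length.
manual_l2 = X / np.linalg.norm(X, axis=1, keepdims=True)
print(np.allclose(manual_l2, preprocessing.normalize(X, norm='l2')))  # True

# L1: divide each row by the sum of its absolute values.
manual_l1 = X / np.abs(X).sum(axis=1, keepdims=True)
print(np.allclose(manual_l1, preprocessing.normalize(X, norm='l1')))  # True
```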

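(3) Binarization
Binarization converts numerical data to boolean (0/1) values using a configurable threshold: values greater than the threshold become 1, all other values become 0.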


>>> X = [[ 1., -1., 2.],
...      [ 2., 0., 0.],
...      [ 0., 1., -1.]]
>>> binarizer = preprocessing.Binarizer().fit(X) # fit does nothing
>>> binarizer
Binarizer(copy=True, threshold=0.0)  # the default threshold is 0.0
>>> binarizer.transform(X)
array([[ 1., 0., 1.],
[ 1., 0., 0.],
[ 0., 1., 0.]])
>>> binarizer = preprocessing.Binarizer(threshold=1.1)  # set the threshold to 1.1
>>> binarizer.transform(X)
array([[ 0., 0., 1.],
[ 1., 0., 0.],
[ 0., 0., 0.]])
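Since `Binarizer` is just a comparison against the threshold, the sketch below reproduces the `threshold=1.1` result with a plain NumPy expression (`X > threshold`, cast to float) and checks that both agree.

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

threshold = 1.1
# Values strictly greater than the threshold become 1.0, the rest 0.0.
manual = (X > threshold).astype(float)
via_sklearn = preprocessing.Binarizer(threshold=threshold).fit_transform(X)
print(np.array_equal(manual, via_sklearn))  # True
```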

(4) Label preprocessing

4.1) Label binarization

LabelBinarizer is typically used to build a label indicator matrix from a list of multiclass labels:


>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
[0, 0, 0, 1]])
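Two behaviours of `LabelBinarizer` are worth seeing alongside this example: `inverse_transform` maps indicator rows back to the original labels, and with only two classes the output is a single column rather than two. The `"no"`/`"yes"` labels below are illustrative.

```python
from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
lb.fit([1, 2, 6, 4, 2])
Y = lb.transform([1, 6])
# Map the indicator rows back to the original labels.
print(lb.inverse_transform(Y))

# With exactly two classes, the indicator matrix has one column.
lb_binary = preprocessing.LabelBinarizer()
B = lb_binary.fit_transform(["no", "yes", "no"])
print(B.shape)  # (3, 1)
```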

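In the example above each instance has exactly one label. Older versions of LabelBinarizer also accepted instances carrying several labels at once; current versions raise a ValueError for that input, and MultiLabelBinarizer handles it instead:

>>> mlb = preprocessing.MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])  # the instance (1, 2) carries two labels
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])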
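For multi-label data, scikit-learn's `MultiLabelBinarizer` also supports a round trip: `inverse_transform` returns one tuple of labels per row. The sketch below shows that, plus string labels (the genre names are illustrative); `classes_` comes out sorted.

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform([(1, 2), (3,)])
# One tuple of labels per indicator row.
restored = mlb.inverse_transform(Y)
print(restored)

# String labels work too; the columns follow the sorted classes_.
mlb_str = MultiLabelBinarizer()
Y_str = mlb_str.fit_transform([{"sci-fi", "thriller"}, {"comedy"}])
print(list(mlb_str.classes_))
print(Y_str)
```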

4.2) Label encoding

LabelEncoder maps each distinct label to an integer between 0 and n_classes - 1:


>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2])
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
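The encoding is the position of each label in the sorted list of distinct labels. The sketch below shows that NumPy's `np.unique(..., return_inverse=True)` produces the same `classes_` and codes; it is meant as an illustration of the mapping, not a claim about how LabelEncoder is implemented internally.

```python
import numpy as np
from sklearn import preprocessing

y = np.array([1, 1, 2, 6])

# Sorted distinct values, plus each element's index into that sorted array.
classes, codes = np.unique(y, return_inverse=True)

le = preprocessing.LabelEncoder().fit(y)
print(np.array_equal(classes, le.classes_))    # True
print(np.array_equal(codes, le.transform(y)))  # True
```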

It can also be used to convert non-numerical labels to numerical labels:


>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1])
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
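One practical consequence: a fitted LabelEncoder only knows the labels it saw during `fit`, so transforming an unseen label raises a `ValueError`. In the sketch below, `"berlin"` is a hypothetical label that was not in the training list.

```python
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

raised = False
try:
    # "berlin" was never seen during fit, so it has no integer code.
    le.transform(["berlin"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```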
