Python: normalizing label values to [0, n_classes-1] (re-encoding labels)

Date: 2019-12-14


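When handling labels in Python, you often need to map a set of labels to a set of integers, and the integers must be consecutive.
For example, ['a', 'b', 'c', 'a', 'a', 'b', 'c'] ==(a->0, b->1, c->2)=> [0, 1, 2, 0, 0, 1, 2]. To make this post easier to find, here is a search keyword: re-encode python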
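To make the mapping above concrete, here is a minimal pure-Python sketch of the idea. The helper name `encode_labels` is hypothetical and not part of any library; it assumes the labels are hashable and sortable, and it assigns codes in sorted order of the distinct labels.

```python
def encode_labels(labels):
    """Map each distinct label to an integer in [0, n_classes - 1]."""
    # Sort the distinct labels so the assignment is deterministic.
    classes = sorted(set(labels))
    # Build the label -> code lookup table.
    index = {label: i for i, label in enumerate(classes)}
    # Replace every label with its code.
    codes = [index[label] for label in labels]
    return classes, codes


classes, codes = encode_labels(['a', 'b', 'c', 'a', 'a', 'b', 'c'])
print(classes)  # ['a', 'b', 'c']
print(codes)    # [0, 1, 2, 0, 0, 1, 2]
```

Keeping the `classes` list around is what makes decoding possible later: `classes[code]` gives back the original label.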

You can use the sklearn.preprocessing.LabelEncoder class for this.

Taking numeric labels as an example:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit([1,2,2,6,3])

Get the label values (the distinct classes, sorted)

In [2]: le.classes_
Out[2]: array([1, 2, 3, 6])

Normalize the label values

In [3]: le.transform([1,1,3,6,2])
Out[3]: array([0, 0, 2, 3, 1], dtype=int64)
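One thing worth knowing about transform: it only accepts labels that were seen during fit. The sketch below reuses the same fitted labels as above and passes in a value (4) that was never fitted; the variable name `raised` is only there for illustration.

```python
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit([1, 2, 2, 6, 3])

# 4 was never passed to fit(), so it has no assigned code.
try:
    le.transform([1, 4])
    raised = False
except ValueError as err:
    raised = True
    print("transform failed:", err)
```

If new labels can show up at prediction time, they have to be handled before calling transform, for example by refitting on the full label set.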

Invert the normalized label values

That is, "reverse encoding" (decoding):

In [4]: le.inverse_transform([0, 0, 2, 3, 1])
Out[4]: array([1, 1, 3, 6, 2])

Normalizing non-numeric label values:

In [5]: from sklearn import preprocessing
   ...: le = preprocessing.LabelEncoder()
   ...: le.fit(["paris", "paris", "tokyo", "amsterdam"])
   ...: print('Label classes: %s' % le.classes_)
   ...: print('Encoded labels: %s' % le.transform(["tokyo", "tokyo", "paris"]))
   ...: print('Decoded labels: %s' % le.inverse_transform([2, 2, 1]))
   ...:

Label classes: ['amsterdam' 'paris' 'tokyo']
Encoded labels: [2 2 1]
Decoded labels: ['tokyo' 'tokyo' 'paris']
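If scikit-learn is not available, roughly the same encoding can be obtained with NumPy alone. This is an alternative sketch, not a description of how LabelEncoder is implemented; it assumes a one-dimensional list of labels and relies on np.unique returning the sorted distinct values together with the index of each input element in that sorted array.

```python
import numpy as np

labels = ["paris", "paris", "tokyo", "amsterdam"]

# classes: sorted distinct labels; codes: position of each label in classes.
classes, codes = np.unique(labels, return_inverse=True)
print(classes)
print(codes)

# Decoding is plain indexing into the classes array.
decoded = classes[codes]
print(decoded)
```

Unlike LabelEncoder, this keeps no fitted state, so encoding a second batch consistently means looking the new labels up against the saved `classes` array yourself.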
