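Python imblearn: A detailed guide to the introduction, installation, and usage of the imblearn/imbalanced-learn library
Contents
Introduction to the imblearn/imbalanced-learn library
Installing the imblearn/imbalanced-learn library
Using the imblearn/imbalanced-learn library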
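Introduction to the imblearn/imbalanced-learn library
imblearn/imbalanced-learn is a Python package that provides a number of re-sampling techniques commonly used on datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.
imbalanced-learn is tested under Python 3.6+. The dependency requirements are based on the last scikit-learn release: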
- scipy(>=0.19.1)
- numpy(>=1.13.3)
- scikit-learn(>=0.22)
- joblib(>=0.11)
- keras 2 (optional)
- tensorflow (optional)
Installing the imblearn/imbalanced-learn library
pip install imblearn
pip install imbalanced-learn
pip install -U imbalanced-learn
conda install -c conda-forge imbalanced-learn
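After installing, it can be worth confirming that the package is visible to the interpreter. Note that the distribution is named `imbalanced-learn` while the module is imported as `imblearn`. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of the library.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(dist_name="imbalanced-learn", module_name="imblearn"):
    """Return the installed version string, or None if the module is missing."""
    # find_spec locates a top-level module without importing it.
    if find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a source checkout).
        return "unknown"


version = installed_version()
print("imbalanced-learn version:", version if version else "not installed")
```

If this prints "not installed", the install command probably ran against a different Python environment than the one executing the script.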
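Using the imblearn/imbalanced-learn library
Most classification algorithms only perform optimally when the number of samples in each class is roughly the same. Highly skewed datasets, where the minority is heavily outnumbered by one or more classes, have proven to be a challenge while at the same time becoming more and more common.
One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than would otherwise be obtained.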
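To see why skewed classes are a problem, consider a small worked example. The labels below are hypothetical toy data (95 majority samples, 5 minority samples), and the "classifier" is a deliberately trivial baseline that always predicts the majority class.

```python
from collections import Counter

# Hypothetical toy labels: 95 majority samples (0) and 5 minority samples (1).
y_true = [0] * 95 + [1] * 5

counts = Counter(y_true)
majority_label, majority_count = counts.most_common(1)[0]
minority_count = min(counts.values())
imbalance_ratio = majority_count / minority_count

# A baseline that ignores its input and always predicts the majority class.
y_pred = [majority_label] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority_recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / counts[1]

print(f"imbalance ratio: {imbalance_ratio:.0f}:1")  # 19:1
print(f"accuracy:        {accuracy:.2f}")           # 0.95
print(f"minority recall: {minority_recall:.2f}")    # 0.00
```

The baseline reaches 95% accuracy without ever detecting a minority sample, which is why plain accuracy is a poor guide on imbalanced data and why re-sampling is considered in the first place.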
Re-sampling techniques are divided into four categories:
- Under-sampling the majority class(es).
- Over-sampling the minority class.
- Combining over- and under-sampling.
- Create ensemble balanced sets.
Below is a list of the methods currently implemented in this module.
- Under-sampling
  - Random majority under-sampling with replacement
  - Extraction of majority-minority Tomek links [1]
  - Under-sampling with Cluster Centroids
  - NearMiss-(1 & 2 & 3) [2]
  - Condensed Nearest Neighbour [3]
  - One-Sided Selection [4]
  - Neighbourhood Cleaning Rule [5]
  - Edited Nearest Neighbours [6]
  - Instance Hardness Threshold [7]
  - Repeated Edited Nearest Neighbours [14]
  - AllKNN [14]
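The simplest entry in the under-sampling list, random majority under-sampling, can be sketched in plain Python as follows. This is an illustration of the idea under stated assumptions, not the library's implementation: the function name `random_under_sample`, its `replacement` and `seed` parameters, and the toy data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_under_sample(X, y, replacement=False, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    target = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        if len(indices) == target:
            kept.extend(indices)  # the smallest class is kept unchanged
        elif replacement:
            kept.extend(rng.choices(indices, k=target))  # may repeat samples
        else:
            kept.extend(rng.sample(indices, target))  # distinct samples only

    kept.sort()
    return [X[i] for i in kept], [y[i] for i in kept]


# Hypothetical toy data: 8 majority samples and 2 minority samples.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_under_sample(X, y)
print(Counter(y_res))  # Counter({0: 2, 1: 2})
```

Random under-sampling discards majority samples blindly. The other methods in the list (Tomek links, Edited Nearest Neighbours, and so on) instead choose which samples to drop based on their neighbourhood.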
- Over-sampling
  - Random minority over-sampling with replacement
  - SMOTE - Synthetic Minority Over-sampling Technique [8]
  - SMOTENC - SMOTE for Nominal and Continuous [8]
  - bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2 [9]
  - SVM SMOTE - Support Vectors SMOTE [10]
  - ADASYN - Adaptive synthetic sampling approach for imbalanced learning [15]
  - KMeans-SMOTE [17]
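Several of the over-sampling methods above build on the same core step as SMOTE: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the line segment between them. The sketch below shows only that interpolation step in plain Python. The function name `smote_sketch`, its parameters, and the sample points are hypothetical, and the library's own implementation differs in many details.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # New point at a random position on the segment base -> neighbour.
        lam = rng.random()
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=3)
for point in new_points:
    print([round(v, 3) for v in point])
```

Because every synthetic point lies between two existing minority samples, the new data stays inside the region the minority class already occupies rather than duplicating samples exactly.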
- Over-sampling followed by under-sampling
- Ensemble classifier using samplers internally
- Mini-batch resampling for Keras and Tensorflow