ML with sklearn: a detailed guide to the code and usage of make_pipeline, RobustScaler, KFold, and cross_val_score


Contents

Code explanation and usage of sklearn's make_pipeline function

    Code explanation of make_pipeline

    How to use make_pipeline

        1. Using the Pipeline class to express a workflow that scales the data with MinMaxScaler and then trains an SVM

        2. Creating a pipeline with make_pipeline

Code explanation and usage of sklearn's RobustScaler function

    Code explanation of RobustScaler

    How to use RobustScaler

Code explanation and usage of sklearn's KFold function

    Code explanation of KFold

    How to use KFold

Code explanation and usage of sklearn's cross_val_score function

    Code explanation of cross_val_score

    Objects accepted by the scoring parameter

    How to use cross_val_score

        1. Regression: the diabetes dataset

        2. Classification: the iris dataset


 

Code explanation and usage of sklearn's make_pipeline function

To simplify the process of building chains of transformations and models, scikit-learn provides the Pipeline class, which merges multiple processing steps into a single scikit-learn estimator. The Pipeline class itself has fit, predict, and score methods and behaves like any other scikit-learn model.

Code explanation of make_pipeline

def make_pipeline(*steps, **kwargs):
    """Construct a Pipeline from the given estimators.

    This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.

    Parameters
    ----------
    *steps : list of estimators

    memory : None, str or object with the joblib.Memory interface, optional
        Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute ``named_steps`` or ``steps`` to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.

    Examples
    --------
    >>> from sklearn.naive_bayes import GaussianNB
    >>> from sklearn.preprocessing import StandardScaler
    >>> make_pipeline(StandardScaler(), GaussianNB(priors=None))
    ...     # doctest: +NORMALIZE_WHITESPACE
    Pipeline(memory=None,
             steps=[('standardscaler',
                     StandardScaler(copy=True, with_mean=True, with_std=True)),
                    ('gaussiannb', GaussianNB(priors=None))])

    Returns
    -------
    p : Pipeline
    """
    memory = kwargs.pop('memory', None)
    if kwargs:
        raise TypeError('Unknown keyword arguments: "{}"'
                        .format(list(kwargs.keys())[0]))
    return Pipeline(_name_estimators(steps), memory=memory)

In short: make_pipeline is a shorthand for the Pipeline constructor. It neither requires nor permits naming the estimators; each step is named automatically after the lowercase of its class. The optional memory argument caches the fitted transformers (pass a caching directory path or a joblib.Memory-like object). Because caching clones the transformers before fitting, the instances handed to the pipeline cannot be inspected directly; use the named_steps or steps attribute instead. Caching pays off when fitting is time consuming.
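For illustration, here is a minimal sketch of the memory argument described above; the throwaway cache directory and the particular steps are my own choices, not part of the original source:

from tempfile import mkdtemp

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cachedir = mkdtemp()  # throwaway directory used as the fitted-transformer cache
pipe = make_pipeline(StandardScaler(), SVC(), memory=cachedir)
# After fitting, inspect the (cloned) transformers via pipe.named_steps or
# pipe.steps rather than via the StandardScaler() instance created above.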


How to use make_pipeline

Examples
--------
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.preprocessing import StandardScaler
>>> make_pipeline(StandardScaler(), GaussianNB(priors=None))
...     # doctest: +NORMALIZE_WHITESPACE
Pipeline(memory=None,
         steps=[('standardscaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('gaussiannb', GaussianNB(priors=None))])

 

1. Using the Pipeline class to express a workflow that scales the data with MinMaxScaler and then trains an SVM

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Each step is a (name, estimator) pair; the last step is the model.
pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])
pipe.fit(X_train, y_train)
pipe.score(X_test, y_test)
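Since X_train and friends are not defined above, here is a self-contained sketch of the same workflow; the breast-cancer dataset and the split parameters are my own illustrative choices:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# End-to-end run: scale, then classify, scored on a held-out test set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))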

 

2. Creating a pipeline with make_pipeline

Building a pipeline with the Pipeline class is a bit verbose, and we usually do not need user-specified names for every step. In that case we can create the pipeline with the make_pipeline function, which builds the pipeline for us and names each step automatically after its class.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

pipe = make_pipeline(MinMaxScaler(), SVC())
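The auto-generated step names can be checked directly; a small sketch continuing the block above:

print(pipe.steps)
# e.g. [('minmaxscaler', MinMaxScaler()), ('svc', SVC())] -- exact repr varies by version
print(pipe.named_steps['minmaxscaler'])  # access a step by its generated name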

 

References
《Python機器學習基礎教程》 (Introduction to Machine Learning with Python), the section on building pipelines (make_pipeline)
Python sklearn.pipeline.make_pipeline() Examples

 

Code explanation and usage of sklearn's RobustScaler function

Code explanation of RobustScaler

class RobustScaler(BaseEstimator, TransformerMixin):
    """Scale features using statistics that are robust to outliers.

    This Scaler removes the median and scales the data according to  the quantile range (defaults to IQR: Interquartile Range).
    The IQR is the range between the 1st quartile (25th quantile)  and the 3rd quartile (75th quantile). Centering and scaling happen independently on each feature (or each sample, depending on the ``axis`` argument) by computing the relevant statistics on the samples in the training set. Median and  interquartile  range are then stored to be used on later data using the ``transform`` method.

    Standardization of a dataset is a common requirement for many  machine learning estimators. Typically this is done by removing the mean and scaling to unit variance. However, outliers can often influence the sample mean / variance in a negative way. In such cases, the median and  the interquartile range often give better results.

    .. versionadded:: 0.17

    Read more in the :ref:`User Guide <preprocessing_scaler>`.

    Parameters
    ----------
    with_centering : boolean, True by default
        If True, center the data before scaling.  This will cause ``transform`` to raise an exception when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in  memory.


    with_scaling : boolean, True by default
        If True, scale the data to interquartile range.

    quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0
        Default: (25.0, 75.0) = (1st quantile, 3rd quantile) = IQR
        Quantile range used to calculate ``scale_``.

        .. versionadded:: 0.18

    copy : boolean, optional, default is True
        If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be
        returned.

    Attributes
    ----------
    center_ : array of floats
        The median value for each feature in the training set.


    scale_ : array of floats
        The (scaled) interquartile range for each feature in the training set.

        .. versionadded:: 0.17
           *scale_* attribute.

    See also
    --------
    robust_scale: Equivalent function without the estimator API.

    :class:`sklearn.decomposition.PCA`
        Further removes the linear correlation across features with   'whiten=True'.

    Notes
    -----
    For a comparison of the different scalers, transformers, and normalizers, see :ref:`examples/preprocessing/plot_all_scaling.py
    <sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.


    https://en.wikipedia.org/wiki/Median_(statistics)
    https://en.wikipedia.org/wiki/Interquartile_range
    """

 

In short: RobustScaler scales features using statistics that are robust to outliers. It removes the median and scales the data according to a quantile range, by default the IQR, i.e. the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile). Centering and scaling are computed independently for each feature on the training set, and the median and quantile range are stored so that transform can apply them to later data.

Standardizing a dataset is a common requirement for many machine-learning estimators and is typically done by removing the mean and scaling to unit variance. Outliers, however, can badly distort the sample mean and variance; in such cases the median and the interquartile range often give better results.

Parameters: with_centering (boolean, default True) centers the data before scaling; on sparse matrices transform raises an exception, because centering would build a dense matrix that in common use cases is too large to fit in memory. with_scaling (boolean, default True) scales the data to the quantile range. quantile_range (tuple (q_min, q_max) with 0.0 < q_min < q_max < 100.0, default (25.0, 75.0), added in 0.18) is the range used to compute scale_. copy (boolean, default True): if False, in-place scaling is attempted, but a copy may still be returned, e.g. when the input is not a NumPy array or scipy.sparse CSR matrix.

Attributes: center_ (array of floats) holds the per-feature median of the training set; scale_ (array of floats, added in 0.17) holds the per-feature (scaled) quantile range. See also robust_scale, the equivalent function without the estimator API, and sklearn.decomposition.PCA with whiten=True, which further removes linear correlation across features.
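Before continuing with the method definitions, a quick numerical check of these definitions (the toy data is my own choice):

import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
scaler = RobustScaler().fit(X)
print(scaler.center_)                        # [3.] -- the median
print(scaler.scale_)                         # [2.] -- IQR: 4.0 - 2.0
print(scaler.transform(np.array([[5.0]])))   # [[1.]] -- (5 - 3) / 2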

 

    def __init__(self, with_centering=True, with_scaling=True,
                 quantile_range=(25.0, 75.0), copy=True):
        self.with_centering = with_centering
        self.with_scaling = with_scaling
        self.quantile_range = quantile_range
        self.copy = copy

    def _check_array(self, X, copy):
        """Makes sure centering is not enabled for sparse matrices."""
        X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
                        estimator=self, dtype=FLOAT_DTYPES)
        if sparse.issparse(X):
            if self.with_centering:
                raise ValueError(
                    "Cannot center sparse matrices: use `with_centering=False`"
                    " instead. See docstring for motivation and alternatives.")
        return X

    def fit(self, X, y=None):
        """Compute the median and quantiles to be used for scaling.

        Parameters
        ----------
        X : array-like, shape [n_samples, n_features]
            The data used to compute the median and quantiles
            used for later scaling along the features axis.
        """
        if sparse.issparse(X):
            raise TypeError("RobustScaler cannot be fitted on sparse inputs")
        X = self._check_array(X, self.copy)
        if self.with_centering:
            self.center_ = np.median(X, axis=0)
        if self.with_scaling:
            q_min, q_max = self.quantile_range
            if not 0 <= q_min <= q_max <= 100:
                raise ValueError("Invalid quantile range: %s" %
                                 str(self.quantile_range))
            q = np.percentile(X, self.quantile_range, axis=0)
            self.scale_ = (q[1] - q[0])
            self.scale_ = _handle_zeros_in_scale(self.scale_, copy=False)
        return self

    def transform(self, X):
        """Center and scale the data.

        Can be called on sparse input, provided that ``RobustScaler`` has been
        fitted to dense input and ``with_centering=False``.

        Parameters
        ----------
        X : {array-like, sparse matrix}
            The data used to scale along the specified axis.
        """
        if self.with_centering:
            check_is_fitted(self, 'center_')
        if self.with_scaling:
            check_is_fitted(self, 'scale_')
        X = self._check_array(X, self.copy)
        if sparse.issparse(X):
            if self.with_scaling:
                inplace_column_scale(X, 1.0 / self.scale_)
        else:
            if self.with_centering:
                X -= self.center_
            if self.with_scaling:
                X /= self.scale_
        return X

    def inverse_transform(self, X):
        """Scale back the data to the original representation

        Parameters
        ----------
        X : array-like
            The data used to scale along the specified axis.
        """
        if self.with_centering:
            check_is_fitted(self, 'center_')
        if self.with_scaling:
            check_is_fitted(self, 'scale_')
        X = self._check_array(X, self.copy)
        if sparse.issparse(X):
            if self.with_scaling:
                inplace_column_scale(X, self.scale_)
        else:
            if self.with_scaling:
                X *= self.scale_
            if self.with_centering:
                X += self.center_
        return X


How to use RobustScaler

from sklearn.linear_model import ElasticNet, Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.5, random_state=1))
ENet = make_pipeline(RobustScaler(), ElasticNet(alpha=0.5, l1_ratio=.9, random_state=3))
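To make the motivation concrete, here is a small sketch (toy data of my own choosing) contrasting RobustScaler with StandardScaler in the presence of an outlier:

import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])   # 100.0 is an outlier
print(RobustScaler().fit_transform(X).ravel())    # [-1. -0.5 0. 0.5 48.5]: the inliers keep a sensible scale
print(StandardScaler().fit_transform(X).ravel())  # roughly [-0.54 -0.51 -0.49 -0.46 2.0]: the outlier inflates the std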


Code explanation and usage of sklearn's KFold function

Code explanation of KFold

class KFold Found at: sklearn.model_selection._split

class KFold(_BaseKFold):
    """K-Folds cross-validator

    Provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default).
    Each fold is then used once as a validation while the k - 1 remaining folds form the training set.

    Read more in the :ref:`User Guide <cross_validation>`.

    Parameters
    ----------
    n_splits : int, default=3
        Number of folds. Must be at least 2.

    shuffle : boolean, optional
        Whether to shuffle the data before splitting into batches.

    random_state : int, RandomState instance or None, optional, default=None
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used by `np.random`. Used when ``shuffle`` == True.

    Examples
    --------
    >>> from sklearn.model_selection import KFold
    >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    >>> y = np.array([1, 2, 3, 4])
    >>> kf = KFold(n_splits=2)
    >>> kf.get_n_splits(X)
    2
    >>> print(kf)  # doctest: +NORMALIZE_WHITESPACE
    KFold(n_splits=2, random_state=None, shuffle=False)
    >>> for train_index, test_index in kf.split(X):
    ...    print("TRAIN:", train_index, "TEST:", test_index)
    ...    X_train, X_test = X[train_index], X[test_index]
    ...    y_train, y_test = y[train_index], y[test_index]
    TRAIN: [2 3] TEST: [0 1]
    TRAIN: [0 1] TEST: [2 3]

    Notes
    -----
    The first ``n_samples % n_splits`` folds have size
    ``n_samples // n_splits + 1``, other folds have size
    ``n_samples // n_splits``, where ``n_samples`` is the number of samples.

    See also
    --------
    StratifiedKFold
        Takes group information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).
    GroupKFold: K-fold iterator variant with non-overlapping groups.
    RepeatedKFold: Repeats K-Fold n times.
    """
    def __init__(self, n_splits=3, shuffle=False,
                 random_state=None):
        super(KFold, self).__init__(n_splits, shuffle, random_state)

    def _iter_test_indices(self, X, y=None, groups=None):
        n_samples = _num_samples(X)
        indices = np.arange(n_samples)
        if self.shuffle:
            check_random_state(self.random_state).shuffle(indices)
        n_splits = self.n_splits
        # Spread the samples as evenly as possible: every fold gets
        # n_samples // n_splits samples, and the first
        # n_samples % n_splits folds get one extra.
        fold_sizes = (n_samples // n_splits) * np.ones(n_splits, dtype=np.int)
        fold_sizes[:n_samples % n_splits] += 1
        current = 0
        for fold_size in fold_sizes:
            start, stop = current, current + fold_size
            yield indices[start:stop]
            current = stop

In short: KFold splits the dataset into k consecutive folds (without shuffling unless shuffle=True) and provides train/test indices; each fold is used once as the validation set while the remaining k - 1 folds form the training set. n_splits (default 3) must be at least 2, and random_state seeds the shuffling (it only takes effect when shuffle=True). Per the Notes above, the first n_samples % n_splits folds get one extra sample; with 10 samples and 3 splits, for example, the fold sizes are 4, 3, and 3. For classification with imbalanced classes, StratifiedKFold is usually preferable; GroupKFold handles non-overlapping groups, and RepeatedKFold repeats K-fold n times.

How to use KFold

Examples
--------
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)  # doctest: +NORMALIZE_WHITESPACE
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
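Because the example above splits the rows in order, ordered data (e.g. samples sorted by class) can produce misleading folds; here is a minimal sketch of shuffled splitting (the toy data and seed are my own choices):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)
kf = KFold(n_splits=3, shuffle=True, random_state=42)  # shuffle indices once before splitting
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)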


Code explanation and usage of sklearn's cross_val_score function

Code explanation of cross_val_score

def cross_val_score Found at: sklearn.model_selection._validation

def cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv=None,
                    n_jobs=1, verbose=0, fit_params=None,
                    pre_dispatch='2*n_jobs'):
    """Evaluate a score by cross-validation

    Read more in the :ref:`User Guide <cross_validation>`.

    Parameters
    ----------
    estimator : estimator object implementing 'fit'
        The object to use to fit the data.

    X : array-like
        The data to fit. Can be for example a list, or an array.

    y : array-like, optional, default: None
        The target variable to try to predict in the case of supervised learning.

    groups : array-like, with shape (n_samples,), optional
        Group labels for the samples used while splitting the dataset into train/test set.

    scoring : string, callable or None, optional, default: None
        A string (see model evaluation documentation) or a scorer callable object / function with signature ``scorer(estimator, X, y)``.

    cv : int, cross-validation generator or an iterable, optional
        Determines the cross-validation splitting strategy.
        Possible inputs for cv are:
        - None, to use the default 3-fold cross validation,
        - integer, to specify the number of folds in a `(Stratified)KFold`,
        - An object to be used as a cross-validation generator.
        - An iterable yielding train, test splits.
        For integer/None inputs, if the estimator is a classifier and ``y`` is either binary or multiclass, :class:`StratifiedKFold` is used. In all other cases, :class:`KFold` is used.
        Refer :ref:`User Guide <cross_validation>` for the various cross-validation strategies that can be used here.

    n_jobs : integer, optional
        The number of CPUs to use to do the computation. -1 means 'all CPUs'.

    verbose : integer, optional
        The verbosity level.

    fit_params : dict, optional
        Parameters to pass to the fit method of the estimator.

    pre_dispatch : int, or string, optional
        Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
        - None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
        - An int, giving the exact number of total jobs that are spawned
        - A string, giving an expression as a function of n_jobs, as in '2*n_jobs'

    Returns
    -------
    scores : array of float, shape=(len(list(cv)),)
        Array of scores of the estimator for each run of the cross validation.

    Examples
    --------
    >>> from sklearn import datasets, linear_model
    >>> from sklearn.model_selection import cross_val_score
    >>> diabetes = datasets.load_diabetes()
    >>> X = diabetes.data[:150]
    >>> y = diabetes.target[:150]
    >>> lasso = linear_model.Lasso()
    >>> print(cross_val_score(lasso, X, y))  # doctest: +ELLIPSIS
    [ 0.33150734  0.08022311  0.03531764]

    See Also
    ---------
    :func:`sklearn.model_selection.cross_validate`:
        To run cross-validation on multiple metrics and also to return train scores, fit times and score times.

    :func:`sklearn.metrics.make_scorer`:
        Make a scorer from a performance metric or loss function.
    """
    # To ensure multimetric format is not supported
    scorer = check_scoring(estimator, scoring=scoring)
    cv_results = cross_validate(estimator=estimator, X=X, y=y, groups=groups,
                                scoring={'score': scorer}, cv=cv,
                                return_train_score=False,
                                n_jobs=n_jobs, verbose=verbose,
                                fit_params=fit_params,
                                pre_dispatch=pre_dispatch)
    return cv_results['test_score']

In short: cross_val_score evaluates an estimator by cross-validation and returns one score per fold. Key parameters: estimator (any object implementing fit), X and y (the data and, for supervised learning, the target), groups (sample labels used when splitting), scoring (a string such as those in the table below, or a callable with signature scorer(estimator, X, y)), and cv (None for the default 3-fold split, an integer number of folds, a splitter object, or an iterable of train/test splits; with integer/None inputs, StratifiedKFold is used for binary or multiclass classifiers and KFold otherwise). n_jobs sets the number of CPUs (-1 means all), verbose the verbosity level, fit_params extra arguments for fit, and pre_dispatch limits how many parallel jobs are dispatched at once to keep memory consumption in check.
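Since cv accepts a splitter object, the KFold class from the previous section can be passed directly; a small sketch combining the two (the iris dataset and the model are my own choices):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # explicit splitting strategy
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean(), scores.std())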


Objects accepted by the scoring parameter

https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

Scoring                          Function                                 Comment

Classification
'accuracy'                       metrics.accuracy_score
'balanced_accuracy'              metrics.balanced_accuracy_score
'average_precision'              metrics.average_precision_score
'neg_brier_score'                metrics.brier_score_loss
'f1'                             metrics.f1_score                        for binary targets
'f1_micro'                       metrics.f1_score                        micro-averaged
'f1_macro'                       metrics.f1_score                        macro-averaged
'f1_weighted'                    metrics.f1_score                        weighted average
'f1_samples'                     metrics.f1_score                        by multilabel sample
'neg_log_loss'                   metrics.log_loss                        requires predict_proba support
'precision' etc.                 metrics.precision_score                 suffixes apply as with 'f1'
'recall' etc.                    metrics.recall_score                    suffixes apply as with 'f1'
'jaccard' etc.                   metrics.jaccard_score                   suffixes apply as with 'f1'
'roc_auc'                        metrics.roc_auc_score
'roc_auc_ovr'                    metrics.roc_auc_score
'roc_auc_ovo'                    metrics.roc_auc_score
'roc_auc_ovr_weighted'           metrics.roc_auc_score
'roc_auc_ovo_weighted'           metrics.roc_auc_score

Clustering
'adjusted_mutual_info_score'     metrics.adjusted_mutual_info_score
'adjusted_rand_score'            metrics.adjusted_rand_score
'completeness_score'             metrics.completeness_score
'fowlkes_mallows_score'          metrics.fowlkes_mallows_score
'homogeneity_score'              metrics.homogeneity_score
'mutual_info_score'              metrics.mutual_info_score
'normalized_mutual_info_score'   metrics.normalized_mutual_info_score
'v_measure_score'                metrics.v_measure_score

Regression
'explained_variance'             metrics.explained_variance_score
'max_error'                      metrics.max_error
'neg_mean_absolute_error'        metrics.mean_absolute_error
'neg_mean_squared_error'         metrics.mean_squared_error
'neg_root_mean_squared_error'    metrics.mean_squared_error
'neg_mean_squared_log_error'     metrics.mean_squared_log_error
'neg_median_absolute_error'      metrics.median_absolute_error
'r2'                             metrics.r2_score
'neg_mean_poisson_deviance'      metrics.mean_poisson_deviance
'neg_mean_gamma_deviance'        metrics.mean_gamma_deviance
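Any name from the table can be passed as the scoring string; for example, a regression sketch (the dataset and model are my own choices) using 'neg_mean_squared_error':

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
scores = cross_val_score(Ridge(), X, y, cv=5, scoring='neg_mean_squared_error')
print(-scores.mean())  # error metrics are negated so that larger is better; flip the sign to report MSE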

 

 

How to use cross_val_score

1. Regression: the diabetes dataset

    >>> from sklearn import datasets, linear_model
    >>> from sklearn.model_selection import cross_val_score
    >>> diabetes = datasets.load_diabetes()
    >>> X = diabetes.data[:150]
    >>> y = diabetes.target[:150]
    >>> lasso = linear_model.Lasso()
    >>> print(cross_val_score(lasso, X, y))  # doctest: +ELLIPSIS
    [ 0.33150734  0.08022311  0.03531764]

 

2. Classification: the iris dataset

from sklearn import datasets                            # built-in datasets
from sklearn.model_selection import train_test_split, cross_val_score   # data splitting and cross-validation
from sklearn.neighbors import KNeighborsClassifier     # a simple model with a single hyperparameter K, similar in spirit to K-means
import matplotlib.pyplot as plt

iris = datasets.load_iris()       # load sklearn's built-in iris dataset
X = iris.data                     # the features
y = iris.target                   # the label of each sample
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=1/3, random_state=3)  # hold out 1/3 of the data as the test set
k_range = range(1, 31)
cv_scores = []                    # holds the cross-validated score of each model
for n in k_range:
    knn = KNeighborsClassifier(n)  # with one hyperparameter this loop suffices; for several, use GridSearchCV instead (see the sketch below)
    scores = cross_val_score(knn, train_X, train_y, cv=10, scoring='accuracy')  # cv: number of folds; accuracy: the metric (may be omitted to use the default)
    cv_scores.append(scores.mean())
plt.plot(k_range, cv_scores)
plt.xlabel('K')
plt.ylabel('Accuracy')            # pick the best K from the plot
plt.show()
best_knn = KNeighborsClassifier(n_neighbors=3)   # pass the best K (here 3) to the model
best_knn.fit(train_X, train_y)                   # train the model
print(best_knn.score(test_X, test_y))            # evaluate on the held-out test set
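The comment above points to GridSearchCV when more than one hyperparameter must be tuned; here is a minimal sketch reusing train_X/train_y from the block above (the parameter grid is my own illustrative choice):

from sklearn.model_selection import GridSearchCV

param_grid = {'n_neighbors': range(1, 31), 'weights': ['uniform', 'distance']}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring='accuracy')
grid.fit(train_X, train_y)                   # cross-validates every parameter combination
print(grid.best_params_, grid.best_score_)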