ML with sklearn: Code explanation and usage of sklearn's make_pipeline, RobustScaler, KFold, and cross_val_score functions (a detailed guide)

Contents

Code explanation and usage of sklearn's make_pipeline function
Code explanation of sklearn's make_pipeline function
Usage of sklearn's make_pipeline function
1. Using the Pipeline class to represent a workflow that scales the data with MinMaxScaler and then trains an SVM
Code explanation and usage of sklearn's RobustScaler function
Code explanation and usage of sklearn's KFold function
Code explanation and usage of sklearn's cross_val_score function

Code explanation and usage of sklearn's make_pipeline function
To simplify the process of building chains of transformations and models, scikit-learn provides the Pipeline class, which merges multiple processing steps into a single scikit-learn estimator. The Pipeline class itself has fit, predict, and score methods and behaves like any other scikit-learn model.
Code explanation of sklearn's make_pipeline function
def make_pipeline(*steps, **kwargs)

Construct a Pipeline from the given estimators.

This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set automatically to the lowercase of their types.

Parameters
----------
*steps : list of estimators
memory : None, str or object with the joblib.Memory interface, optional
    Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instances given to the pipeline cannot be inspected directly; use the attributes ``named_steps`` or ``steps`` to inspect the estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.
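To illustrate the memory parameter described above, here is a minimal sketch (the temporary directory is only a stand-in; any writable path works) that caches the fitted transformer and inspects the automatically named steps:

from tempfile import mkdtemp
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cache_dir = mkdtemp()  # illustrative cache location
pipe = make_pipeline(StandardScaler(), SVC(), memory=cache_dir)

# With caching enabled the transformers are cloned before fitting,
# so inspect them through named_steps rather than the original instances
print(pipe.named_steps['standardscaler'])
print(pipe.named_steps['svc'])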
Usage of sklearn's make_pipeline function
Examples
--------
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.preprocessing import StandardScaler
>>> make_pipeline(StandardScaler(), GaussianNB(priors=None))
...     # doctest: +NORMALIZE_WHITESPACE
Pipeline(memory=None,
         steps=[('standardscaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('gaussiannb', GaussianNB(priors=None))])

Returns
-------
p : Pipeline
1. Using the Pipeline class to represent a workflow that scales the data with MinMaxScaler and then trains an SVM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])
pipe.fit(X_train, y_train)    # fit the scaler, transform the training data, then fit the SVM
pipe.score(X_test, y_test)    # transform the test data, then score the SVM
2. Creating a pipeline with the make_pipeline function
The syntax for building a pipeline with the Pipeline class is somewhat cumbersome, and we usually do not need to give each step a user-specified name. In that case we can create the pipeline with the make_pipeline function, which builds the pipeline for us and names each step automatically after the class it belongs to.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

pipe = make_pipeline(MinMaxScaler(), SVC())
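A minimal end-to-end check (the random toy data is purely illustrative) shows the automatic step names and that the pipeline behaves like a single estimator:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

np.random.seed(0)
X = np.random.rand(20, 3)          # toy features
y = np.random.randint(0, 2, 20)    # toy binary labels
pipe = make_pipeline(MinMaxScaler(), SVC())
pipe.fit(X, y)                     # scales the data, then fits the SVM
print(pipe.steps)                  # [('minmaxscaler', MinMaxScaler(...)), ('svc', SVC(...))]
print(pipe.score(X, y))            # training accuracy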
References

"Introduction to Machine Learning with Python" (《Python機器學習基礎教程》), building pipelines (make_pipeline)
Python sklearn.pipeline.make_pipeline() Examples
Code explanation and usage of sklearn's RobustScaler function

Code explanation of the RobustScaler function
class RobustScaler(BaseEstimator, TransformerMixin)

Scale features using statistics that are robust to outliers.

This scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range).

Standardization of a dataset is a common requirement for many machine learning estimators. Typically this is done by removing the mean and scaling to unit variance. However, outliers can often influence the sample mean/variance in a negative way. In such cases, the median and the interquartile range often give better results.

.. versionadded:: 0.17

Read more in the :ref:`User Guide <preprocessing_scaler>`.

Parameters
----------
with_centering : boolean, True by default
with_scaling : boolean, True by default
quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0
    .. versionadded:: 0.18
copy : boolean, optional, default is True

Attributes
----------
center_ : array of floats
scale_ : array of floats
    .. versionadded:: 0.17

See also
--------
:class:`sklearn.decomposition.PCA`

Notes
-----
https://en.wikipedia.org/wiki/Median_(statistics)
class RobustScaler(BaseEstimator, TransformerMixin):
    def __init__(self, with_centering=True, with_scaling=True, ...):
        ...

    def _check_array(self, X, copy):
        if sparse.issparse(X):
            ...

    def fit(self, X, y=None):
        ...
        if self.with_scaling:
            q = np.percentile(X, self.quantile_range, axis=0)
            ...

    def transform(self, X):
        # Can be called on sparse input, provided that ``RobustScaler`` has been ...
        if sparse.issparse(X):
            ...

    def inverse_transform(self, X):
        if sparse.issparse(X):
            ...
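To make the median/IQR behavior concrete, here is a small sketch (toy single-feature data with one extreme outlier) showing that the fitted center and scale come from the median and the 25-75 quantile range, so the outlier barely affects them:

import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])   # one extreme outlier
scaler = RobustScaler()          # defaults: quantile_range=(25.0, 75.0)
X_scaled = scaler.fit_transform(X)
print(scaler.center_)            # [3.]  the median, unaffected by the outlier
print(scaler.scale_)             # [2.]  IQR = 75th percentile (4.0) - 25th percentile (2.0)
print(X_scaled.ravel())          # [-1.  -0.5  0.   0.5  498.5]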
Usage of the RobustScaler function
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import Lasso, ElasticNet

lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.5, random_state=1))
ENet = make_pipeline(RobustScaler(), ElasticNet(alpha=0.5, l1_ratio=.9, random_state=3))
Code explanation and usage of sklearn's KFold function

Code explanation of the KFold function
class KFold, found at sklearn.model_selection._split

class KFold(_BaseKFold):
Examples
--------
(The full doctest is shown in the usage section below.)

Notes
-----
The first ``n_samples % n_splits`` folds have size ``n_samples // n_splits + 1``, other folds have size ``n_samples // n_splits``, where ``n_samples`` is the number of samples.

See also
--------
StratifiedKFold
    Takes group information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).
GroupKFold : K-fold iterator variant with non-overlapping groups.
RepeatedKFold : Repeats K-Fold n times.
"""
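The difference from StratifiedKFold mentioned in the See also section can be seen in a short sketch (toy imbalanced labels, illustrative only): plain KFold can put all samples of the minority class into one fold, while StratifiedKFold preserves the class ratio:

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.zeros((8, 1))                        # toy features
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])     # imbalanced labels
for name, cv in [("KFold", KFold(n_splits=2)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=2))]:
    test_folds = [test for _, test in cv.split(X, y)]
    print(name, test_folds)
# KFold puts both class-1 samples into the second test fold;
# StratifiedKFold gives each test fold one sample of class 1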
def __init__(self, n_splits=3, shuffle=False, random_state=None):
    super(KFold, self).__init__(n_splits, shuffle, random_state)

def _iter_test_indices(self, X, y=None, groups=None):
    n_samples = _num_samples(X)
    indices = np.arange(n_samples)
    if self.shuffle:
        check_random_state(self.random_state).shuffle(indices)
    n_splits = self.n_splits
    fold_sizes = (n_samples // n_splits) * np.ones(n_splits, dtype=np.int)
    fold_sizes[:n_samples % n_splits] += 1
    current = 0
    for fold_size in fold_sizes:
        start, stop = current, current + fold_size
        yield indices[start:stop]
        current = stop
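The fold-size rule from the Notes above (the first n_samples % n_splits folds get one extra sample) can be verified with a minimal sketch:

import numpy as np
from sklearn.model_selection import KFold

# 5 samples, 3 splits: 5 % 3 = 2 folds of size 5 // 3 + 1 = 2, then one fold of size 1
X = np.arange(10).reshape(5, 2)
for train_index, test_index in KFold(n_splits=3).split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
# TEST folds: [0 1], [2 3], [4]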
Usage of the KFold function
Examples
--------
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)  # doctest: +NORMALIZE_WHITESPACE
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
Code explanation and usage of sklearn's cross_val_score function

Code explanation of the cross_val_score function
def cross_val_score, found at sklearn.model_selection._validation

def cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv=None,
                    n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs'):
Evaluate a score by cross-validation. Read more in the :ref:`User Guide <cross_validation>`.
Parameters
----------
estimator : estimator object implementing 'fit'
    The object to use to fit the data.
X : array-like
    The data to fit. Can be for example a list, or an array.
y : array-like, optional, default: None
    The target variable to try to predict in the case of supervised learning.
groups : array-like, with shape (n_samples,), optional
    Group labels for the samples used while splitting the dataset into train/test set.
scoring : string, callable or None, optional, default: None
    A string (see model evaluation documentation) or a scorer callable object / function with signature ``scorer(estimator, X, y)``.
cv : int, cross-validation generator or an iterable, optional
    Determines the cross-validation splitting strategy. Possible inputs for cv are:
    - None, to use the default 3-fold cross validation,
    - integer, to specify the number of folds in a `(Stratified)KFold`,
    - An object to be used as a cross-validation generator,
    - An iterable yielding train, test splits.
    For integer/None inputs, if the estimator is a classifier and ``y`` is either binary or multiclass, :class:`StratifiedKFold` is used. In all other cases, :class:`KFold` is used. Refer to the :ref:`User Guide <cross_validation>` for the various cross-validation strategies that can be used here.
n_jobs : integer, optional
    The number of CPUs to use to do the computation. -1 means 'all CPUs'.
verbose : integer, optional
    The verbosity level.
fit_params : dict, optional
    Parameters to pass to the fit method of the estimator.
pre_dispatch : int, or string, optional
    Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
    - None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs,
    - An int, giving the exact number of total jobs that are spawned,
    - A string, giving an expression as a function of n_jobs, as in '2*n_jobs'.

Returns
-------
scores : array of float, shape=(len(list(cv)),)
    Array of scores of the estimator for each run of the cross validation.
Examples
--------
(The full doctest is shown in the usage section below.)

See Also
--------
:func:`sklearn.model_selection.cross_validate` : To run cross-validation on multiple metrics and also to return train scores, fit times and score times.
:func:`sklearn.metrics.make_scorer` : Make a scorer from a performance metric or loss function.
"""
# To ensure multimetric format is not supported
scorer = check_scoring(estimator, scoring=scoring)
cv_results = cross_validate(estimator=estimator, X=X, y=y, groups=groups,
                            scoring={'score': scorer}, cv=cv,
                            return_train_score=False, n_jobs=n_jobs,
                            verbose=verbose, fit_params=fit_params,
                            pre_dispatch=pre_dispatch)
return cv_results['test_score']
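Since cross_val_score is a thin wrapper over cross_validate (as the body above shows), calling cross_validate directly returns the extra information the See Also entry mentions. A minimal sketch:

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X, y = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()

res = cross_validate(lasso, X, y, cv=3, return_train_score=True)
print(res['test_score'])     # what cross_val_score would return
print(res['train_score'])    # training scores per fold
print(res['fit_time'])       # seconds spent fitting per fold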
Optional values for the scoring parameter
https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
Scoring | Function | Comment
---|---|---
Classification | |
‘accuracy’ | metrics.accuracy_score |
‘balanced_accuracy’ | metrics.balanced_accuracy_score |
‘average_precision’ | metrics.average_precision_score |
‘neg_brier_score’ | metrics.brier_score_loss |
‘f1’ | metrics.f1_score | for binary targets
‘f1_micro’ | metrics.f1_score | micro-averaged
‘f1_macro’ | metrics.f1_score | macro-averaged
‘f1_weighted’ | metrics.f1_score | weighted average
‘f1_samples’ | metrics.f1_score | by multilabel sample
‘neg_log_loss’ | metrics.log_loss | requires predict_proba support
‘precision’ etc. | metrics.precision_score | suffixes apply as with ‘f1’
‘recall’ etc. | metrics.recall_score | suffixes apply as with ‘f1’
‘jaccard’ etc. | metrics.jaccard_score | suffixes apply as with ‘f1’
‘roc_auc’ | metrics.roc_auc_score |
‘roc_auc_ovr’ | metrics.roc_auc_score |
‘roc_auc_ovo’ | metrics.roc_auc_score |
‘roc_auc_ovr_weighted’ | metrics.roc_auc_score |
‘roc_auc_ovo_weighted’ | metrics.roc_auc_score |
Clustering | |
‘adjusted_mutual_info_score’ | metrics.adjusted_mutual_info_score |
‘adjusted_rand_score’ | metrics.adjusted_rand_score |
‘completeness_score’ | metrics.completeness_score |
‘fowlkes_mallows_score’ | metrics.fowlkes_mallows_score |
‘homogeneity_score’ | metrics.homogeneity_score |
‘mutual_info_score’ | metrics.mutual_info_score |
‘normalized_mutual_info_score’ | metrics.normalized_mutual_info_score |
‘v_measure_score’ | metrics.v_measure_score |
Regression | |
‘explained_variance’ | metrics.explained_variance_score |
‘max_error’ | metrics.max_error |
‘neg_mean_absolute_error’ | metrics.mean_absolute_error |
‘neg_mean_squared_error’ | metrics.mean_squared_error |
‘neg_root_mean_squared_error’ | metrics.mean_squared_error |
‘neg_mean_squared_log_error’ | metrics.mean_squared_log_error |
‘neg_median_absolute_error’ | metrics.median_absolute_error |
‘r2’ | metrics.r2_score |
‘neg_mean_poisson_deviance’ | metrics.mean_poisson_deviance |
‘neg_mean_gamma_deviance’ | metrics.mean_gamma_deviance |
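Any string from the table plugs directly into the scoring parameter. A small sketch on iris (LogisticRegression is chosen only for illustration):

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = LogisticRegression(max_iter=1000)   # max_iter raised just to avoid convergence warnings
for metric in ('accuracy', 'f1_macro'):
    scores = cross_val_score(clf, iris.data, iris.target, cv=5, scoring=metric)
    print(metric, round(scores.mean(), 3))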
Usage of the cross_val_score function

1. Regression prediction: the diabetes dataset
>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_val_score
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
>>> print(cross_val_score(lasso, X, y))  # doctest: +ELLIPSIS
[ 0.33150734  0.08022311  0.03531764]
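A variant of the example above (same diabetes slice) that passes an explicit KFold object as cv instead of an integer and picks a metric from the scoring table:

from sklearn import datasets, linear_model
from sklearn.model_selection import KFold, cross_val_score

diabetes = datasets.load_diabetes()
X, y = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()

cv = KFold(n_splits=5, shuffle=True, random_state=0)   # explicit CV strategy
scores = cross_val_score(lasso, X, y, cv=cv, scoring='neg_mean_squared_error')
print(scores)          # one (negated) MSE per fold
print(scores.mean())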
2. Classification prediction: the iris dataset
from sklearn import datasets                                           # built-in datasets
from sklearn.model_selection import train_test_split, cross_val_score  # data splitting and cross-validation
from sklearn.neighbors import KNeighborsClassifier                     # a simple model with a single hyperparameter K, similar in spirit to K-means
import matplotlib.pyplot as plt

iris = datasets.load_iris()   # load sklearn's built-in iris dataset
X = iris.data                 # the data
y = iris.target               # the label of each sample

# hold out 1/3 of the data: train on the training set, evaluate on the test set
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=1/3, random_state=3)

k_range = range(1, 31)
cv_scores = []                # holds the result of each model
for n in k_range:
    # KNN model; with a single hyperparameter it can be searched directly like this,
    # with several hyperparameters use GridSearchCV instead (see the sketch below)
    knn = KNeighborsClassifier(n)
    # cv: number of folds; scoring='accuracy': evaluation metric (can be omitted to use the default)
    scores = cross_val_score(knn, train_X, train_y, cv=10, scoring='accuracy')
    cv_scores.append(scores.mean())

plt.plot(k_range, cv_scores)
plt.xlabel('K')
plt.ylabel('Accuracy')        # pick the best parameter from the plot
plt.show()

best_knn = KNeighborsClassifier(n_neighbors=3)    # pass the best K=3 into the model
best_knn.fit(train_X, train_y)                    # train the model
print(best_knn.score(test_X, test_y))             # check the score
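As the comment in the loop notes, with more than one hyperparameter GridSearchCV is the usual tool. A hedged sketch of the same search done that way (the parameter grid values are illustrative):

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
param_grid = {
    'n_neighbors': range(1, 31),          # same K range as above
    'weights': ['uniform', 'distance'],   # a second hyperparameter, for illustration
}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring='accuracy')
grid.fit(iris.data, iris.target)
print(grid.best_params_, grid.best_score_)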