PyCon 2018: Two New ML Data Visualization Libraries, Altair and Yellowbrick

Date: 2019-11-29


Original author: David 9

This article was first published on the author's personal blog. Juejin has obtained permission to repost it. Thanks again to the author.

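Two new ML data visualization libraries from PyCon 2018: Altair, a visualization library in a functional (declarative) style, and Yellowbrick, a visualization library that extends scikit-learn.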

Visualization libraries for data science, like deep learning frameworks, keep multiplying, but they fall roughly into two kinds:

The first kind is the general-purpose visualization library, which can plot any static data shaped like a JSON schema. Examples: Pandas, Seaborn, ggplot, Bokeh, pygal, Plotly.

The second kind is the visualization library that is tightly coupled to a framework, such as TensorFlow's TensorBoard, or Yellowbrick, which extends scikit-learn.

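For the first kind, the general-purpose libraries, the trend toward convenience, conciseness, and ease of use has never changed. The PyCon 2018 talk "Exploratory Data Visualization with Vega, Vega-Lite, and Altair" introduced Altair, a new visualization library in a functional style. It is concise enough that once you have a pandas DataFrame, a single additional declarative statement is all it takes to visualize the data: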

    import altair as alt

    # to use with Jupyter notebook (not JupyterLab) run the following
    # alt.renderers.enable('notebook')

    # load a simple dataset as a pandas DataFrame
    from vega_datasets import data
    cars = data.cars()

    # This is the declarative statement. Doesn't it have a functional-programming flavor?
    alt.Chart(cars).mark_point().encode(
        x='Horsepower',
        y='Miles_per_Gallon',
        color='Origin',
    )

Altair example

To draw lines instead of points, just replace the call to mark_point() with mark_line():

    alt.Chart(cars).mark_line().encode(
        x='Horsepower',
        y='Miles_per_Gallon',
        color='Origin',
    )

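Note that no matter how many features the cars dataset has, you only declare the features you need for the visualization in the encode function. The Altair API has many other conveniences as well, and there are plenty of example Jupyter notebooks you can try first.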
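To make the "declare only what you need" idea concrete, here is a minimal sketch of how chained calls can build up a Vega-Lite-style specification. `ChartSketch` is a hypothetical toy written for this illustration, not Altair's real class or implementation, and the two rows are hand-typed sample records rather than the full cars dataset.

```python
import json


class ChartSketch:
    """Hypothetical stand-in for a declarative chart (not Altair's API)."""

    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per record
        self.spec = {"data": {"values": rows}}

    def _with(self, **updates):
        # Return a new object instead of mutating, so calls chain safely.
        new = ChartSketch(self.rows)
        new.spec = {**self.spec, **updates}
        return new

    def mark_point(self):
        return self._with(mark="point")

    def mark_line(self):
        return self._with(mark="line")

    def encode(self, **channels):
        # Only the fields named here end up in the encoding.
        first = self.rows[0]
        encoding = {}
        for channel, field in channels.items():
            is_number = isinstance(first[field], (int, float))
            encoding[channel] = {
                "field": field,
                "type": "quantitative" if is_number else "nominal",
            }
        return self._with(encoding=encoding)


# Hand-typed sample rows for illustration only.
cars = [
    {"Horsepower": 130, "Miles_per_Gallon": 18.0, "Origin": "USA", "Weight_in_lbs": 3504},
    {"Horsepower": 95, "Miles_per_Gallon": 24.0, "Origin": "Japan", "Weight_in_lbs": 2372},
]

points = ChartSketch(cars).mark_point().encode(
    x="Horsepower", y="Miles_per_Gallon", color="Origin"
)
lines = ChartSketch(cars).mark_line().encode(
    x="Horsepower", y="Miles_per_Gallon", color="Origin"
)

print(json.dumps(points.spec["encoding"], indent=2))
```

The unused column (`Weight_in_lbs`) never appears in the encoding, and switching from points to lines changes only the `mark` entry of the specification.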

Yellowbrick, on the other hand, is tightly coupled to scikit-learn, to the point that model training is folded into the visualization process itself:

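    from sklearn.linear_model import LogisticRegression
    from yellowbrick.classifier import ROCAUC

    # X_train, y_train, X_test, y_test are assumed to be an existing train/test split

    # Initialize the classification model and the visualizer
    logistic = LogisticRegression()
    visualizer = ROCAUC(logistic)

    visualizer.fit(X_train, y_train)  # the visualizer inherits from an estimator class, so it can be fit
    visualizer.score(X_test, y_test)  # score on the test set
    g = visualizer.poof()             # draw the ROC/AUC plot (poof() was renamed show() in Yellowbrick 1.0)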

As the code above shows, the analysis plot is produced as soon as the logistic regression model has finished training:

Source: http://www.scikit-yb.org/en/latest/api/classifier/rocauc.html
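The coupling between training and visualization can be illustrated with a small wrapper pattern. The sketch below is a toy written for this article, not Yellowbrick's implementation: `MajorityClassifier` and `AccuracyVisualizerSketch` are hypothetical names, and a text bar stands in for a real plot. It assumes only that the wrapped model exposes `fit` and `predict`.

```python
class MajorityClassifier:
    """Toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        self.majority_ = max(counts, key=counts.get)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


class AccuracyVisualizerSketch:
    """Hypothetical visualizer that behaves like an estimator itself."""

    def __init__(self, model):
        self.model = model
        self.accuracy_ = None

    def fit(self, X, y):
        # Training goes through the visualizer, which delegates to the model.
        self.model.fit(X, y)
        return self

    def score(self, X, y):
        # Scoring stores what the later drawing step needs.
        predictions = self.model.predict(X)
        correct = sum(1 for p, t in zip(predictions, y) if p == t)
        self.accuracy_ = correct / len(y)
        return self.accuracy_

    def show(self):
        if self.accuracy_ is None:
            raise RuntimeError("call score() before show()")
        bar = "#" * round(self.accuracy_ * 20)
        return f"accuracy |{bar:<20}| {self.accuracy_:.2f}"


# Tiny made-up split, for illustration only.
X_train, y_train = [[0], [1], [2], [3]], [1, 1, 1, 0]
X_test, y_test = [[4], [5]], [1, 0]

visualizer = AccuracyVisualizerSketch(MajorityClassifier())
visualizer.fit(X_train, y_train)
test_accuracy = visualizer.score(X_test, y_test)
print(visualizer.show())
```

Because the wrapper exposes the same `fit` and `score` calls as the model it wraps, the same three-step sequence (fit, score, draw) works regardless of which model is passed in.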

The same goes for PCA analysis, where the visualization code and the training code are coupled:

    from yellowbrick.features.pca import PCADecomposition

    visualizer = PCADecomposition(scale=True, center=False, color=y)
    visualizer.fit_transform(X, y)
    visualizer.poof()

The code above directly produces a two-dimensional PCA visualization.
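For intuition about what a two-dimensional PCA projection involves, here is a minimal NumPy sketch. It is an assumption-laden illustration, not Yellowbrick's code: `project_to_2d` is a hypothetical helper, the scaling step is a simple division by each feature's standard deviation, and the data is randomly generated.

```python
import numpy as np


def project_to_2d(X, scale=True):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    if scale:
        # Bring features to unit variance (assumes no constant column).
        X = X / X.std(axis=0)
    centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are the principal directions, ordered by explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (len(X) - 1)
    return centered @ vt[:2].T, explained_variance[:2]


# Synthetic data: 50 samples, 4 features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]

points_2d, variance = project_to_2d(X)
print(points_2d.shape, variance)
```

The resulting `points_2d` array holds the x and y coordinates that a scatter plot of the projection would draw, one row per sample.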