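Python data visualization library seaborn ------ plotting multivariate distributions: stripplot() and swarmplot(); box plots and violin plots; bar plots; point plots; multi-panel categorical plots: the catplot function, FacetG

Date: 2019-12-01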

Official seaborn documentation: http://seaborn.pydata.org/api.html

Plotting multivariate distributions

First, plot the distribution of two variables, where the X variable is categorical and the Y variable is numeric.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
tips = sns.load_dataset("tips")
sns.set(style="whitegrid", color_codes=True)
sns.stripplot(x="day", y="total_bill", data=tips)
plt.show()

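Output: [figure]

Note: the figure shows that stripplot applies jitter by default (jitter=True). Each point is shifted slightly along the categorical axis so that overlapping points stay visible. Next, we use swarmplot, which arranges the points so that they do not overlap and therefore also conveys the shape of the distribution. We also split the points by an additional categorical variable (hue), drawn in different colors.

plt.subplot(121)
sns.swarmplot(x="day", y="total_bill", data=tips)
plt.subplot(122)
sns.swarmplot(x="day", y="total_bill", hue="sex", data=tips)
plt.show()
sns.swarmplot(x="total_bill", y="day", hue="time", data=tips)
plt.show()

Output: [figure]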

The plots above make it easy to observe some relationships between day, time, and sex.
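The default jitter of stripplot can be imitated by hand, which shows what jitter=True does. Each point keeps its exact y value and is shifted along the categorical axis by a small random amount. The sketch below uses synthetic data, so the category labels and values are made up. It also uses a hypothetical jitter half-width of 0.1, and seaborn's internal jitter width may differ.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic data (assumption): three categories with 20 numeric values each
categories = ["Thur", "Fri", "Sat"]
values = {c: rng.normal(loc=20 + 3 * i, scale=5, size=20)
          for i, c in enumerate(categories)}

jitter = 0.1  # hypothetical half-width of the random horizontal offset

fig, ax = plt.subplots()
all_x = []
for pos, cat in enumerate(categories):
    y = values[cat]
    # Each point sits at its category position plus uniform noise in [-jitter, jitter)
    x = pos + rng.uniform(-jitter, jitter, size=y.size)
    all_x.append(x)
    ax.scatter(x, y, s=15)

# Label the integer positions with the category names, as stripplot does
ax.set_xticks(range(len(categories)))
ax.set_xticklabels(categories)
ax.set_ylabel("value")
# Call plt.show() to display the figure
```

Setting jitter to 0 would stack every point of a category on one vertical line, which is the overlap that jitter is meant to avoid.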

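Box plots and violin plots

Next, we draw box plots and violin plots to show the relationships between variables.

Box plot

IQR is the statistical interquartile range: the distance between the first quartile (Q1) and the third quartile (Q3), i.e. IQR = Q3 - Q1.

Let N = 1.5 * IQR. A value greater than Q3 + N or less than Q1 - N is treated as an outlier and drawn as a separate point.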
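The outlier rule above can be checked with a few lines of NumPy. This is a minimal sketch, and the helper name iqr_outliers is hypothetical. It computes Q1 and Q3 with np.percentile's default linear interpolation, so other quartile conventions may give slightly different thresholds.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    n = k * (q3 - q1)          # N = 1.5 * IQR when k = 1.5
    low, high = q1 - n, q3 + n
    return data[(data < low) | (data > high)]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_outliers(sample))  # prints [100.]
```

Here Q1 = 3, Q3 = 7, IQR = 4 and N = 6. Only 100 lies outside [-3, 13], so only 100 is reported.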

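sns.boxplot(x="day", y="total_bill", hue="time", data=tips)
plt.show()

Output: [figure]

A violin plot gives a similar view and also shows the shape of each distribution, drawn as a kernel density estimate:

sns.violinplot(x="total_bill", y="day", hue="time", data=tips)
plt.show()

Output: [figure]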

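Inside each violin, the thick black bar marks the interquartile range and the white dot marks the median. The thin line works like box-plot whiskers and extends to the most extreme data points within 1.5 IQR of the quartiles. It is not a 95% confidence interval. We can also split each violin so that each side shows one category, which makes the comparison clearer: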

sns.violinplot(x="day", y="total_bill", hue="sex", data=tips)
plt.show()
sns.violinplot(x="day", y="total_bill", hue="sex", data=tips, split=True)
plt.show()

Output: [figure]

The second figure clearly separates the two groups better. Combining the two functions produces an even richer plot:

sns.violinplot(x="day", y="total_bill", data=tips, inner=None)  # inner: what to draw inside the violin (None = nothing)
sns.swarmplot(x="day", y="total_bill", data=tips, color="w", alpha=.5)  # alpha: transparency
plt.show()
sns.violinplot(x="day", y="total_bill", data=tips, inner=None)
sns.swarmplot(x="day", y="total_bill", data=tips, color="w")
plt.show()
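The outline of each violin is a kernel density estimate (KDE) of the data, drawn symmetrically around the category position. The sketch below computes a Gaussian KDE by hand with NumPy on synthetic values. The bandwidth follows Scott's rule (standard deviation times n^(-1/5)), which is assumed here to approximate seaborn's default. The maximum half-width of 0.4 is a hypothetical choice.

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kde_1d(data, grid):
    """Evaluate a Gaussian kernel density estimate of `data` on `grid`."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Scott's rule of thumb for the bandwidth (assumed default, see above)
    bw = data.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per observation, averaged over all observations
    z = (grid[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
bills = rng.normal(20, 8, size=200)  # synthetic "total_bill"-like values
grid = np.linspace(bills.min() - 20, bills.max() + 20, 1000)
density = gaussian_kde_1d(bills, grid)

# A violin is this curve mirrored around the center: x in [-half_width, +half_width]
half_width = density / density.max() * 0.4  # hypothetical max half-width of 0.4
fig, ax = plt.subplots()
ax.fill_betweenx(grid, -half_width, half_width)
ax.set_ylabel("value")
```

Because the density integrates to 1, the violin's area reflects the distribution of the data, not the number of observations. This sketch rescales the curve only so that its widest point has a fixed width.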

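Box plots can of course also be drawn horizontally:

iris = sns.load_dataset("iris")  # load the iris dataset used in this example
sns.boxplot(data=iris, orient="h")  # orient: "v" for vertical, "h" for horizontal
plt.show()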

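Bar plots

A bar plot shows the central tendency of a numeric variable in each category. By default the bar height is the mean, and the error bar is a bootstrapped 95% confidence interval.

titanic = sns.load_dataset("titanic")
print(titanic.describe())
titanic.info()  # info() prints its summary itself and returns None
sns.barplot(x="sex", y="survived", hue="class", data=titanic)
plt.show()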
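The bar height and error bar can be reproduced roughly. The height is the estimator (the mean) of each group, and the error bar is a percentile bootstrap interval. This is a minimal sketch on synthetic 0/1 survival values, not the real titanic data. The helper bootstrap_ci is hypothetical, and seaborn's own resampling details (random seed, how percentiles are taken) may differ. Its estimator, n_boot and ci arguments mirror the seaborn parameters of the same names.

```python
import numpy as np

def bootstrap_ci(values, estimator=np.mean, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for estimator(values) (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample with replacement n_boot times and apply the estimator to each resample
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_stats = np.apply_along_axis(estimator, 1, values[idx])
    half = (100 - ci) / 2
    low, high = np.percentile(boot_stats, [half, 100 - half])
    return estimator(values), low, high

# Synthetic survival indicators (1 = survived) for one group; not the real titanic data
survived = np.array([1] * 30 + [0] * 70)
mean, low, high = bootstrap_ci(survived)
print(f"bar height = {mean:.2f}, error bar = [{low:.2f}, {high:.2f}]")
```

With 0/1 data the mean is simply the survival rate of the group, which is why the titanic bar plot reads directly as "share of passengers who survived".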

Point plots: better at showing differences between levels

Plot with the class attribute as the grouping (hue) variable:

sns.pointplot(x="sex", y="survived", hue="class", data=titanic)
plt.show()

Change the line styles and marker shapes:

sns.pointplot(x="class", y="survived", hue="sex", data=titanic,
              palette={"male": "g", "female": "m"},
              markers=["^", "o"], linestyles=["-", "--"])
plt.show()
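A point plot draws the same per-group estimate as a bar plot (the mean by default) as points, and it joins the points of each hue level with a line across the x categories. The sketch below computes those positions with pandas and draws them with matplotlib. The small table is invented, and its columns only mirror the titanic example. The level order here comes from groupby's sorting, which may differ from seaborn's order.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented data in the same long-form layout as the titanic example
df = pd.DataFrame({
    "sex":      ["male", "male", "male", "male", "female", "female", "female", "female"],
    "class":    ["First", "First", "Third", "Third", "First", "First", "Third", "Third"],
    "survived": [1, 0, 0, 0, 1, 1, 1, 0],
})

# One row per x level (sex), one column per hue level (class): the mean of each group
means = df.groupby(["sex", "class"])["survived"].mean().unstack("class")
print(means)

fig, ax = plt.subplots()
for hue_level in means.columns:
    # Points joined by a line across the x categories, one line per hue level
    ax.plot(means.index, means[hue_level], marker="o", label=hue_level)
ax.set_xlabel("sex")
ax.set_ylabel("mean of survived")
ax.legend(title="class")
```

The slope of each line makes the difference between the x categories visible at a glance, which is the advantage point plots have over side-by-side bars.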

Multi-panel categorical plots

The catplot function and its parameters are described below:

# catplot(x=None, y=None, hue=None, data=None, row=None, col=None,
#         col_wrap=None, estimator=np.mean, ci=95, n_boot=1000,
#         units=None, order=None, hue_order=None, row_order=None,
#         col_order=None, kind="strip", height=5, aspect=1,
#         orient=None, color=None, palette=None,
#         legend=True, legend_out=True, sharex=True, sharey=True,
#         margin_titles=False, facet_kws=None, **kwargs)

Parameters:

x, y, hue : names of variables in data

Inputs for plotting long-form data. See examples for interpretation.

data : DataFrame

Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation.

row, col : names of variables in data, optional

Categorical variables that will determine the faceting of the grid.

col_wrap : int, optional

"Wrap" the column variable at this width, so that the column facets span multiple rows. Incompatible with a row facet.

estimator : callable that maps vector -> scalar, optional

Statistical function to estimate within each categorical bin.

ci : float or "sd" or None, optional

Size of confidence intervals to draw around estimated values. If "sd", skip bootstrapping and draw the standard deviation of the observations. If None, no bootstrapping will be performed, and error bars will not be drawn.

n_boot : int, optional

Number of bootstrap iterations to use when computing confidence intervals.

units : name of variable in data or vector data, optional

Identifier of sampling units, which will be used to perform a multilevel bootstrap and account for repeated measures design.

order, hue_order : lists of strings, optional

Order to plot the categorical levels in, otherwise the levels are inferred from the data objects.

row_order, col_order : lists of strings, optional

Order to organize the rows and/or columns of the grid in, otherwise the orders are inferred from the data objects.

kind : string, optional

The kind of plot to draw (corresponds to the name of a categorical plotting function). Options are: "point", "bar", "strip", "swarm", "box", "violin", or "boxen".

height : scalar, optional

Height (in inches) of each facet. See also: aspect.

aspect : scalar, optional

Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches.

orient : "v" | "h", optional

Orientation of the plot (vertical or horizontal). This is usually inferred from the dtype of the input variables, but can be used to specify when the "categorical" variable is a numeric or when plotting wide-form data.

color : matplotlib color, optional

Color for all of the elements, or seed for a gradient palette.

palette : palette name, list, or dict, optional

Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors.

legend : bool, optional

If True and there is a hue variable, draw a legend on the plot.

legend_out : bool, optional

If True, the figure size will be extended, and the legend will be drawn outside the plot on the center right.

share{x,y} : bool, 'col', or 'row', optional

If true, the facets will share y axes across columns and/or x axes across rows.

margin_titles : bool, optional

If True, the titles for the row variable are drawn to the right of the last column. This option is experimental and may not work in all cases.

facet_kws : dict, optional

Dictionary of other keyword arguments to pass to FacetGrid.

kwargs : key, value pairings

Other keyword arguments are passed through to the underlying plotting function.

Returns:

g : FacetGrid

Returns the FacetGrid object with the plot on it for further tweaking.

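Parameter summary:

x, y, hue: names of variables in the dataset

data: the dataset (a DataFrame)

row, col: names of additional categorical variables whose levels are laid out as rows/columns of facets

col_wrap: maximum number of facets per row; integer

estimator: function mapping a vector to a scalar within each category; callable

ci: size of the confidence interval; float or None (deprecated in seaborn 0.12 in favor of errorbar)

n_boot: number of bootstrap iterations used to compute confidence intervals; integer

units: identifier of sampling units, used for a multilevel bootstrap and repeated-measures designs; variable name or vector data

order, hue_order: order of the category levels; list of strings

row_order, col_order: order of the facet rows/columns; list of strings

kind: one of strip (scatter, the default), swarm (non-overlapping scatter), box, violin, point, bar, or count (frequencies)

height: height of each facet in inches; scalar (older versions called this parameter size, which is no longer used)

aspect: aspect ratio of each facet; scalar

orient: orientation, "v" or "h"

color: a matplotlib color

palette: palette name or seaborn color palette

legend_out: bool; if True, the figure is enlarged and the legend is drawn outside the plot, on the center right

share{x,y}: share axes, True/False; if True, facets share y axes across columns and/or x axes across rows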
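Because catplot is a figure-level function, it creates a FacetGrid, draws one panel per level of row/col, and returns the grid for further tweaking. The sketch below runs on made-up data, and the column names group, panel and value are hypothetical. It also passes a non-default estimator, so each bar shows a median, not a mean. It assumes a seaborn version that provides catplot (0.9 or later).

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
# Hypothetical long-form data: one row per observation
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 30),
    "panel": np.tile(["p1", "p2", "p3"], 20),
    "value": rng.normal(10, 2, size=60),
})

# kind="bar" with estimator=np.median: bar height is the median of each group,
# with one subplot per level of `panel`
g = sns.catplot(x="group", y="value", col="panel", data=df,
                kind="bar", estimator=np.median, height=3, aspect=0.8)
g.set_titles("{col_name}")  # the returned FacetGrid can be tweaked further
```

Since the return value is a FacetGrid, methods such as set_titles, set_axis_labels and add_legend used in the FacetGrid examples apply to catplot output as well.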

Below are some commonly used plots:

sns.catplot(x="day", y="total_bill", hue="smoker", data=tips)
plt.show()

sns.catplot(x="day", y="total_bill", hue="smoker", data=tips, kind="bar")
plt.show()

sns.catplot(x="day", y="total_bill", hue="smoker",
            col="time", data=tips, kind="swarm")
plt.show()

sns.catplot(x="time", y="total_bill", hue="smoker",
            col="day", data=tips, kind="box", height=4, aspect=.5)
plt.show()

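Displaying data with the FacetGrid class

For more details, see the documentation linked above. A brief demonstration follows.

g = sns.FacetGrid(tips, col="time")  # set up the grid: one empty panel per level of time
g.map(plt.hist, "tip")  # draw; the first argument is the plotting function
plt.show()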
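Conceptually, FacetGrid(tips, col="time") lays out one empty subplot per level of time. Then .map(plt.hist, "tip") calls plt.hist once per subplot on that level's subset of tip. A rough pandas/matplotlib equivalent is sketched below. It illustrates the idea only and is not how seaborn is implemented. tips_like is an invented stand-in for the tips dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented stand-in for the tips dataset (only the columns used here)
tips_like = pd.DataFrame({
    "time": ["Lunch", "Dinner", "Dinner", "Lunch", "Dinner", "Dinner"],
    "tip":  [1.5, 3.0, 2.5, 2.0, 4.0, 3.5],
})

levels = tips_like["time"].unique()  # one column of panels per level
fig, axes = plt.subplots(1, len(levels), sharex=True, sharey=True, squeeze=False)
for ax, level in zip(axes[0], levels):
    subset = tips_like.loc[tips_like["time"] == level, "tip"]
    ax.hist(subset)                  # the mapped function, applied once per facet
    ax.set_title(f"time = {level}")  # same title style FacetGrid uses by default
```

FacetGrid adds conveniences on top of this loop, such as shared hue colors across panels, legends and facet titles.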

g = sns.FacetGrid(tips, col="sex", hue="smoker")
g.map(plt.scatter, "total_bill", "tip", alpha=.7)
g.add_legend()
plt.show()

sns.set_style("ticks")
g = sns.FacetGrid(tips, row="smoker", col="time", margin_titles=True)  # margin_titles: row titles on the right; experimental, may not always work
g.map(sns.regplot, "size", "total_bill", color=".1", fit_reg=False, x_jitter=.1)  # color=".1": dark gray; fit_reg: draw the regression line or not; x_jitter: random jitter on x
plt.show()

g = sns.FacetGrid(tips, col="day", height=4, aspect=.5)
g.map(sns.barplot, "sex", "total_bill", order=["Male", "Female"])
plt.show()

ordered_days = tips.day.value_counts().index
print(ordered_days)  # value_counts() sorts days by frequency, not by calendar order
ordered_days = ["Thur", "Fri", "Sat", "Sun"]  # explicit calendar order for the rows
g = sns.FacetGrid(tips, row="day", row_order=ordered_days,
                  height=1.7, aspect=4)
g.map(sns.boxplot, "total_bill")  # one horizontal box of total_bill per day
plt.show()

pal = dict(Lunch="seagreen", Dinner="gray")
g = sns.FacetGrid(tips, hue="time", palette=pal, height=5)
g.map(plt.scatter, "total_bill", "tip", s=50, alpha=.7, linewidth=.5, edgecolors="red")  # edgecolors: color of the marker edges
g.add_legend()
plt.show()

g = sns.FacetGrid(tips, hue="sex", palette="Set1", height=5, hue_kws={"marker": ["^", "v"]})
g.map(plt.scatter, "total_bill", "tip", s=100, linewidth=.5, edgecolor="white")
g.add_legend()
plt.show()

with sns.axes_style("white"):
    g = sns.FacetGrid(tips, row="sex", col="smoker", margin_titles=True, height=2.5)
g.map(plt.scatter, "total_bill", "tip", color="#334488", edgecolor="white", lw=.5)
g.set_axis_labels("Total bill (US Dollars)", "Tip")
g.set(xticks=[10, 30, 50], yticks=[2, 6, 10])
g.fig.subplots_adjust(wspace=.02, hspace=.02)  # spacing between subplots
# g.fig.subplots_adjust(left=0.125, right=0.5, bottom=0.1, top=0.9, wspace=.02, hspace=.02)
plt.show()

A brief look at PairGrid

iris = sns.load_dataset("iris")
g = sns.PairGrid(iris)
g.map(plt.scatter)
plt.show()

g = sns.PairGrid(iris)
g.map_diag(plt.hist)  # diagonal
g.map_offdiag(plt.scatter)  # off-diagonal
plt.show()

g = sns.PairGrid(iris, hue="species")
g.map_diag(plt.hist)
g.map_offdiag(plt.scatter)
g.add_legend()
plt.show()

g = sns.PairGrid(iris, vars=["sepal_length", "sepal_width"], hue="species")  # vars: plot only a subset of the variables
g.map(plt.scatter)
plt.show()

g = sns.PairGrid(tips, hue="size", palette="GnBu_d")
g.map(plt.scatter, s=50, edgecolor="white")
g.add_legend()
plt.show()
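PairGrid lays out a square grid with one row and one column per numeric variable. map_diag fills the diagonal, where a variable meets itself, and map_offdiag fills every pair of different variables. The matplotlib sketch below reproduces that layout on random numbers. The column names are borrowed from iris and the values are invented. It only illustrates the idea and ignores PairGrid's hue handling and axis sharing.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(50, 3)),
                  columns=["sepal_length", "sepal_width", "petal_length"])

cols = df.select_dtypes("number").columns  # only numeric columns form the grid
n = len(cols)
fig, axes = plt.subplots(n, n, figsize=(2 * n, 2 * n), squeeze=False)
for i, y_var in enumerate(cols):      # rows: y variable
    for j, x_var in enumerate(cols):  # columns: x variable
        ax = axes[i, j]
        if i == j:
            ax.hist(df[x_var])  # diagonal, like map_diag(plt.hist)
        else:
            ax.scatter(df[x_var], df[y_var], s=5)  # off-diagonal, like map_offdiag(plt.scatter)
        if i == n - 1:
            ax.set_xlabel(x_var)
        if j == 0:
            ax.set_ylabel(y_var)
```

The panels above and below the diagonal show the same pairs with x and y swapped, which is why PairGrid also offers separate map_upper and map_lower methods.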

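Heatmaps

A heatmap uses the shade and brightness of colors to show how the values are distributed.

uniform_data = np.random.rand(3, 3)
print(uniform_data)
heatmap = sns.heatmap(uniform_data)
plt.show()

ax = sns.heatmap(uniform_data, vmin=0.2, vmax=0.5)  # vmin/vmax: data range mapped onto the colormap
plt.show()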

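Note how the colors change compared with the previous figure: values below 0.2 or above 0.5 are drawn with the end colors of the colormap. The random numbers themselves also differ whenever np.random.rand is called again.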
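The effect of vmin and vmax can be sketched as a clipped linear rescaling. Each value is clipped to [vmin, vmax] and mapped to a position between 0 and 1 along the colormap. The helper color_position is hypothetical. In effect, clipping gives the same result as matplotlib's default, where out-of-range values get the colormap's end colors.

```python
import numpy as np

def color_position(data, vmin, vmax):
    """Position (0..1) along the colormap for each value, with clipping (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # Values below vmin act like vmin, values above vmax act like vmax
    return (np.clip(data, vmin, vmax) - vmin) / (vmax - vmin)

values = np.array([[0.1, 0.2, 0.35],
                   [0.5, 0.9, 0.45]])
positions = color_position(values, vmin=0.2, vmax=0.5)
print(positions)
```

matplotlib.colors.Normalize(vmin=0.2, vmax=0.5, clip=True) performs the same scaling on arrays.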

normal_data = np.random.randn(3, 3)
print(normal_data)
ax = sns.heatmap(normal_data, center=0)  # center: value placed at the middle of the colormap
plt.show()

flights = sns.load_dataset("flights")
print(flights.head())
flights = flights.pivot(index="month", columns="year", values="passengers")  # reshape from long to wide by column values
print(flights)
sns.heatmap(flights)
plt.show()
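DataFrame.pivot turns long-form data (one row per year and month) into the wide matrix that heatmap expects. The tiny table below is invented but uses the same columns as flights, so it shows the reshaping on a small scale. Since pandas 2.0, the index, columns and values arguments of pivot are keyword-only, which is why they are named explicitly.

```python
import pandas as pd

# A tiny long-form table with the same columns as flights (values are invented)
long_df = pd.DataFrame({
    "year":       [1949, 1949, 1950, 1950],
    "month":      ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Rows become months, columns become years, cells hold passenger counts
wide = long_df.pivot(index="month", columns="year", values="passengers")
print(wide)
```

Each cell of the wide table becomes one colored square in the heatmap, with months along the y axis and years along the x axis.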

# fmt is needed here: with the default ".2g", counts such as 112 would be shown as 1.1e+02
sns.heatmap(flights, annot=True, fmt="d")
plt.show()
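heatmap formats each annotation with Python's format-specification mini-language using fmt, whose default is ".2g". Comparing the two format strings on a passenger count such as 112 shows why fmt="d" helps. Note that "d" only works for integer data.

```python
count = 112  # a passenger count like those in the flights table

# Default annotation format: 2 significant digits, switches to scientific notation
print(format(count, ".2g"))  # prints 1.1e+02
# Integer format used in the example above
print(format(count, "d"))    # prints 112

# "d" rejects floats, so fmt="d" needs integer data
try:
    format(112.0, "d")
except ValueError as err:
    print("ValueError:", err)
```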

sns.heatmap(flights, linewidths=.4)
plt.show()

sns.heatmap(flights, cmap="YlGnBu")  # cmap: mapping from data values to colors; if omitted, the default depends on whether center is set
plt.show()

sns.heatmap(flights, cbar=False)  # hide the color bar
plt.show()
