Python - Data Visualization with matplotlib

Date: 2020-04-30

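In many practical problems, the data you are given needs to be visualized so that it is easier to inspect.

This post is a systematic summary of matplotlib, Python's data visualization module, intended as a quick reference.

It is compiled from the book Python for Data Analysis (《利用python進行數據分析》) and from a number of blog posts.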

1 Introduction to matplotlib

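matplotlib is the grandfather of Python visualization libraries. After well over a decade, it is still the plotting library Python users reach for most often. Many other libraries are built on top of it or call it directly. The plotting functions in pandas and seaborn, for example, are wrappers around matplotlib that let you use its functionality with less code.

The matplotlib Gallery page has hundreds of thumbnails, each of which opens to show its source code, which makes it a good place to learn matplotlib.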
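To make the "wrapper" relationship concrete, here is a minimal sketch (the sample values and the title text are made up for illustration). It assumes only that a pandas plotting call returns the matplotlib Axes it drew on, so the result can still be customized with ordinary matplotlib methods.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([1, 3, 2, 5, 4])  # made-up sample data

# pandas builds the chart by calling matplotlib and returns the Axes it drew on
ax = s.plot()

# the returned object is an ordinary matplotlib Axes, so matplotlib methods apply
ax.set_title("Drawn by pandas, customized with matplotlib")

is_mpl_axes = isinstance(ax, matplotlib.axes.Axes)
n_lines = len(ax.lines)  # number of line artists pandas added to the Axes
plt.close("all")
```

In an interactive session you would call plt.show() instead of closing the figure.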

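2 Creating figures and subplots

2.1 Imports

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd    # used by the pandas examples later in this post

2.2 Creating a figure and subplots, method 1

plt.plot() draws on the most recently created subplot of the most recent figure (the "current" axes).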

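from numpy.random import randn
fig = plt.figure(figsize=(8, 4))     # set the figure size (width, height in inches)
ax1 = fig.add_subplot(2, 2, 1)       # 2x2 grid, first cell
ax2 = fig.add_subplot(2, 2, 2)       # 2x2 grid, second cell
ax3 = fig.add_subplot(2, 1, 2)       # 2x1 grid, second row (spans the full width)
ax3.plot(randn(50).cumsum(), 'k--')  # plt.plot(randn(50).cumsum(), 'k--') is equivalent here, because ax3 was created last
ax1.hist(randn(100), bins=10, color='b', alpha=0.3)   # bins: number of intervals; alpha: transparency
ax2.scatter(np.arange(30), np.arange(30) + 3*randn(30))
plt.show()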
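The "current axes" behaviour described above can be checked directly. The following is a small sketch, not part of the original example: it uses plt.gca() to ask which subplot is current and plt.sca() to change it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)   # created last, so it becomes the current axes

plt.plot([1, 2, 3])              # pyplot call: lands on the current axes (ax2)
drawn_on_ax2 = plt.gca() is ax2
lines_before = (len(ax1.lines), len(ax2.lines))

plt.sca(ax1)                     # explicitly make ax1 the current axes
plt.plot([3, 2, 1])              # now the pyplot call lands on ax1
lines_after = (len(ax1.lines), len(ax2.lines))
plt.close(fig)
```

Calling methods on the axes objects themselves (ax1.plot, ax2.hist, and so on) avoids depending on which subplot happens to be current.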

2.3 Creating subplots, method 2

from numpy.random import randn
fig, axes = plt.subplots(2, 2)                              # axes is a 2x2 array of subplots, accessed by index
t = np.arange(0., 5., 0.2)
axes[0, 0].plot(t, t, 'r-o', t, t**2, 'bs', t, t**3, 'g^')  # draw several curves in a single call
axes[1, 1].plot(randn(40).cumsum(), 'b--')
plt.show()
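Because plt.subplots returns the subplots as a NumPy array, they can also be filled in a loop. The sketch below is an illustration on random data; the sharex/sharey arguments and the spacing values are choices made for this example, not something the original code uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # seeded generator so the figure is reproducible

# sharex/sharey make all four subplots use the same axis limits
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)

# axes.flat iterates over the 2x2 array one subplot at a time
for ax in axes.flat:
    ax.hist(rng.standard_normal(500), bins=50, color='k', alpha=0.5)

# remove the padding between subplots
fig.subplots_adjust(wspace=0, hspace=0)

grid_shape = axes.shape
same_xlim = axes[0, 0].get_xlim() == axes[1, 1].get_xlim()
plt.close(fig)
```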

2.4 Setting a style

Use the plt.style.use() function.

df_iris = pd.read_csv('../input/iris.csv')
plt.style.use('ggplot')    # other options include 'fivethirtyeight', 'dark_background', 'bmh'
df_iris.hist('sepal length')
plt.show()
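plt.style.use() changes the style for the rest of the session. As a hedged sketch of two related tools: plt.style.available lists the installed style names, and plt.style.context() applies a style only inside a with block. The example checks one rcParams entry (axes.facecolor) to show that the setting is restored afterwards; the plotted numbers are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

has_ggplot = 'ggplot' in plt.style.available   # list of installed style names
facecolor_before = plt.rcParams['axes.facecolor']

# apply the style temporarily; settings are restored when the block exits
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 1, 3])              # placeholder data
    facecolor_inside = plt.rcParams['axes.facecolor']
    plt.close(fig)

facecolor_after = plt.rcParams['axes.facecolor']
```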

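3 Colors, markers, line styles, ticks, labels, and legends

from numpy.random import randn
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(randn(30).cumsum(), color='b', linestyle='--', marker='o', label='$cumsum$')  # color, line style and marker; the format string 'b--o' is an equivalent shorthand
ax1.set_xlim(10, 25)               # limit the visible x range
ax1.set_title('My first plot')
ax1.set_xlabel('Stages')
plt.legend(loc='best')             # place the legend where it gets in the way least; use plt.xticks([...]) to set the tick positions
plt.show()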

This is equivalent to the following code, which uses the pyplot functions instead of the axes methods:

from numpy.random import randn
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(randn(30).cumsum(), color='b', linestyle='--', marker='o', label='$cumsum$')   # labels can embed LaTeX-style math
plt.xlim(10, 25)                  # plt.axis([10, 25, 0, 10]) sets the x and y ranges at the same time
plt.title('My first plot')
plt.xlabel('Stages')
plt.legend(loc='best')
plt.show()
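The examples above mention ticks only in a comment. Here is a minimal sketch of setting tick positions and tick labels through the axes interface. The tick positions and label texts are invented for illustration, and the 'b--o' format string stands in for the color, linestyle and marker keywords.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, ax = plt.subplots()

# 'b--o' is shorthand for color='b', linestyle='--', marker='o'
(line,) = ax.plot(rng.standard_normal(30).cumsum(), 'b--o', label='$cumsum$')

ax.set_xticks([0, 10, 20, 30])                                       # where the ticks go
ax.set_xticklabels(['start', 'ten', 'twenty', 'end'], rotation=30)   # what they say
ax.set_xlabel('Stages')
ax.legend(loc='best')

tick_positions = [int(t) for t in ax.get_xticks()]
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
line_props = (line.get_color(), line.get_linestyle(), line.get_marker())
plt.close(fig)
```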

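4 Plotting functions in pandas

In pandas, the data comes with row labels, column labels, and possibly grouping information. This means that a complete chart, which would otherwise take a lot of matplotlib code, can now be produced with one or two concise statements.

pandas has many high-level plotting methods that take advantage of how DataFrame objects organize their data to create standard charts.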

4.1 Line plots

from numpy.random import randn
fig, axes = plt.subplots(1, 2)
s = pd.Series(randn(10).cumsum(), index=np.arange(0, 100, 10))
s.plot(ax=axes[0])     # the ax argument selects which subplot to draw on

df = pd.DataFrame(randn(10, 3).cumsum(0), columns=['A', 'B', 'C'], index=np.arange(0, 100, 10))
df.plot(ax=axes[1])    # one line per column, with a legend built from the column names
plt.show()

4.2 Bar plots

from numpy.random import rand
fig, axes = plt.subplots(1, 2)
data = pd.Series(rand(16), index=list('abcdefghijklmnop'))
data.plot(kind='bar', ax=axes[0], color='b', alpha=0.7)    # kind selects the chart type; 'bar' is a vertical bar plot
data.plot(kind='barh', ax=axes[1], color='b', alpha=0.7)   # 'barh' is a horizontal bar plot
plt.show()

from numpy.random import rand
fig, axes = plt.subplots(1, 2)
data = pd.DataFrame(rand(6, 4),
                    index=['one', 'two', 'three', 'four', 'five', 'six'],
                    columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
data.plot(kind='bar', ax=axes[0], alpha=0.5)                  # one group of bars per row, one bar per column
data.plot(kind='bar', ax=axes[1], stacked=True, alpha=0.5)    # stacked=True stacks each row's values into a single bar
plt.show()

Bar plots also have a particularly handy use: combined with value_counts(), they show how often each value occurs in a Series, for example s.value_counts().plot(kind='bar').
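A small sketch of the value_counts() idiom on a made-up Series. value_counts() returns counts sorted from most to least frequent, and passing normalize=True returns relative frequencies instead of raw counts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series(list('aabbbc'))              # made-up categorical data

counts = s.value_counts()                  # occurrences per value, most frequent first
shares = s.value_counts(normalize=True)    # the same, as fractions that sum to 1

ax = counts.plot(kind='bar')               # one bar per distinct value
bar_heights = [int(p.get_height()) for p in ax.patches]
plt.close('all')
```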

4.3 Histograms and density plots

from numpy.random import randn
fig, axes = plt.subplots(1, 2)
data = pd.Series(randn(100))
data.hist(ax=axes[0], bins=50)         # histogram
data.plot(kind='kde', ax=axes[1])      # density plot (kernel density estimate; requires SciPy)
plt.show()

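Several histograms can also be produced in one call. The layout argument gives the grid as (rows, columns), so layout=(2,2) below arranges the four histograms in two rows and two columns. If it is omitted, pandas chooses the grid itself.

df_iris = pd.read_csv('../input/iris.csv')
columns = ['sepal length', 'sepal width', 'petal length', 'petal width']
df_iris.hist(column=columns, layout=(2, 2))    # one histogram per listed column
plt.show()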
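Since iris.csv may not be at hand, here is a sketch of the layout argument on a synthetic two-column DataFrame (the column names x and y are placeholders). It compares the grid pandas picks on its own with an explicit two-row, one-column layout.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.standard_normal(200),
                   'y': rng.standard_normal(200)})   # synthetic stand-in for real data

default_axes = df.hist()                 # pandas picks the grid itself
stacked_axes = df.hist(layout=(2, 1))    # force two rows, one column

# DataFrame.hist returns a 2-D array of Axes whose shape is the grid
default_shape = default_axes.shape
stacked_shape = stacked_axes.shape
plt.close('all')
```

Printing default_shape and stacked_shape shows the two grids side by side.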

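4.4 Box plots

A box plot is a graphical summary of data based on the five-number summary (minimum, first quartile, second quartile (the median), third quartile, maximum). It also uses the interquartile range, IQR = third quartile - first quartile.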

df_iris = pd.read_csv('../input/iris.csv')  # columns: ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
sample_size = df_iris[['petal width', 'class']]
sample_size.boxplot(by='class')             # one box per class
plt.xticks(rotation=90)                     # rotate the x-axis tick labels 90 degrees so they read vertically
plt.show()
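The five-number summary and the IQR described in this section can be computed by hand with NumPy. The sketch below uses made-up data and assumes the common 1.5 x IQR rule for flagging outliers, which is also matplotlib's default whisker setting (whis=1.5). Quartiles use NumPy's default linear interpolation, so other tools may report slightly different values.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])   # made-up sample with one extreme value

# five-number summary
minimum, maximum = data.min(), data.max()
q1, median, q3 = np.percentile(data, [25, 50, 75])

# interquartile range and the 1.5 * IQR fences (an assumed, conventional rule)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# points outside the fences would be drawn as individual outlier markers
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(minimum, q1, median, q3, maximum)   # five-number summary
print(iqr, lower_fence, upper_fence)      # IQR and fences
print(outliers)                           # flagged points
```

With this rule the whiskers stop at the most extreme data points that still lie inside the fences, rather than at the raw minimum and maximum.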