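Matplotlib is a 2D plotting library for Python. It produces publication-quality figures in a variety of hardcopy formats and in interactive environments across platforms.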
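To make "hardcopy formats" concrete, here is a minimal sketch that renders the same tiny chart as PNG, PDF and SVG. The helper name `render` and the sample points are made up for illustration, and the `Agg` backend is chosen only so the snippet runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def render(fmt):
    """Draw a tiny line chart and return it encoded in the given format."""
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt)  # the same figure, different output format
    plt.close(fig)
    return buffer.getvalue()


png_bytes = render("png")
pdf_bytes = render("pdf")
svg_bytes = render("svg")
print(len(png_bytes), len(pdf_bytes), len(svg_bytes))
```

In everyday use you would more likely pass a file name such as `fig.savefig('chart.png')` and let Matplotlib infer the format from the extension.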
Suppose we want to plot the following file as a line chart:
Because the dates in the file are not in a standard format, we convert them first:
import pandas as pd
unrate = pd.read_csv('unrate.csv')
unrate['DATE'] = pd.to_datetime(unrate['DATE'])
print(unrate.head(12))
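If `pd.to_datetime` cannot guess the layout on its own, an explicit `format` string may help. The sketch below uses made-up rows in a month/day/year layout; the real `unrate.csv` may well look different, so treat the format string as an assumption.

```python
import pandas as pd

# Hypothetical sample rows; the real unrate.csv may use a different layout.
raw = pd.DataFrame({
    "DATE": ["1/1/1948", "2/1/1948", "3/1/1948"],
    "VALUE": [3.4, 3.8, 4.0],
})

# Tell pandas exactly how the strings are laid out: month/day/year.
raw["DATE"] = pd.to_datetime(raw["DATE"], format="%m/%d/%Y")

# Once the column is a real datetime, parts such as the month are easy to get.
raw["MONTH"] = raw["DATE"].dt.month
print(raw.dtypes)
```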
Import the library:
import matplotlib.pyplot as plt
Start with a simple line chart:
first_twelve = unrate[0:12]
plt.plot(first_twelve['DATE'], first_twelve['VALUE'])
plt.show()
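This chart is frankly ugly, and it only shows one year of data. After drawing a line chart you usually tidy it up to suit the situation:
fig = plt.figure(figsize=(10,6))
# The loop below plots against a MONTH column, so derive it from DATE first (1-12)
unrate['MONTH'] = unrate['DATE'].dt.month
colors = ['red', 'blue', 'green', 'orange', 'black']
# One line per year: rows 0-11 are 1948, rows 12-23 are 1949, and so on
for i in range(5):
    start_index = i*12
    end_index = (i+1)*12
    subset = unrate[start_index:end_index]
    label = str(1948 + i)
    plt.plot(subset['MONTH'], subset['VALUE'], c=colors[i], label=label)
plt.legend(loc='upper left')
plt.xlabel('Month, Integer')
plt.ylabel('Unemployment Rate, Percent')
plt.title('Monthly Unemployment Trends, 1948-1952')
plt.show()
This gives a decent-looking line chart.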
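Sometimes you need to compare several plots side by side, which is where subplots come in.
Lay out a 3-row by 2-column grid and add subplots at positions 1, 2 and 6:
import matplotlib.pyplot as plt
fig = plt.figure()
# add_subplot(rows, cols, index): the index counts left to right, top to bottom
ax1 = fig.add_subplot(3,2,1)
ax2 = fig.add_subplot(3,2,2)
ax6 = fig.add_subplot(3,2,6)
plt.show()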
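As an alternative sketch, `plt.subplots` creates the whole grid at once and returns the axes as a 2D NumPy array. Unlike the `add_subplot` calls above, it creates all six panels, not just the ones you ask for. The titles here are placeholders, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Create every panel of a 3-row, 2-column grid in one call.
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(6, 8))

# axes is indexed [row, column], both starting at 0.
axes[0, 0].set_title("row 1, col 1")
axes[2, 1].set_title("row 3, col 2")

grid_shape = axes.shape
print(grid_shape)
```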
Create two subplots and plot some random values:
import numpy as np
fig = plt.figure()
#fig = plt.figure(figsize=(3, 3))
ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)
ax1.plot(np.random.randint(1,5,5), np.arange(5))
ax2.plot(np.arange(10)*3, np.arange(10))
plt.show()
Suppose we have the following data and want to plot it as a bar chart:
First read the data and keep a few of its columns:
import pandas as pd
reviews = pd.read_csv('fandango_scores.csv')
cols = ['FILM', 'RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
norm_reviews = reviews[cols]
Then create the bar chart with bar:
import matplotlib.pyplot as plt
from numpy import arange
# The Axes.bar() method has 2 required parameters, x and height.
# x gives the x coordinates of the bars (called left in older Matplotlib releases).
# height gives the height of each bar.
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
# .ix has been removed from pandas; .loc does the same label-based lookup here
bar_heights = norm_reviews.loc[0, num_cols].values
# print(bar_heights)
bar_positions = arange(5) + 0.75
# print(bar_positions)
fig, ax = plt.subplots()
ax.bar(bar_positions, bar_heights, 0.5)
plt.show()
Likewise, once the simple bar chart works, tidy it up:
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
bar_heights = norm_reviews.loc[0, num_cols].values
bar_positions = arange(5) + 0.75
tick_positions = range(1,6)
fig, ax = plt.subplots()
# align='edge' treats bar_positions as left edges, so each 0.5-wide bar is centred on its tick
ax.bar(bar_positions, bar_heights, 0.5, align='edge')
ax.set_xticks(tick_positions)
ax.set_xticklabels(num_cols, rotation=45)
ax.set_xlabel('Rating Source')
ax.set_ylabel('Average Rating')
ax.set_title('Average User Rating For Avengers: Age of Ultron (2015)')
plt.show()
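Recent Matplotlib releases centre each bar on its x position by default, so another way to line bars up with their ticks is to use the tick positions directly. A minimal sketch with made-up heights and placeholder labels (the `Agg` backend is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

labels = ["A", "B", "C", "D", "E"]        # placeholder names
heights = [4.3, 3.6, 3.9, 4.5, 5.0]       # made-up values
positions = np.arange(5) + 1              # 1, 2, 3, 4, 5

fig, ax = plt.subplots()
bars = ax.bar(positions, heights, 0.5)    # default align='center'
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=45)

# The first bar spans 0.75 to 1.25, so its centre sits on the tick at 1.
first_left = bars[0].get_x()
first_centre = first_left + bars[0].get_width() / 2
print(first_left, first_centre)
```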
Of course, there are horizontal bar charts too:
import matplotlib.pyplot as plt
from numpy import arange
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
bar_widths = norm_reviews.loc[0, num_cols].values
bar_positions = arange(5) + 0.75
tick_positions = range(1,6)
fig, ax = plt.subplots()
# align='edge' treats bar_positions as bottom edges, so each 0.5-high bar is centred on its tick
ax.barh(bar_positions, bar_widths, 0.5, align='edge')
ax.set_yticks(tick_positions)
ax.set_yticklabels(num_cols)
ax.set_ylabel('Rating Source')
ax.set_xlabel('Average Rating')
ax.set_title('Average User Rating For Avengers: Age of Ultron (2015)')
plt.show()
You can also create scatter plots:
fig, ax = plt.subplots()
ax.scatter(norm_reviews['Fandango_Ratingvalue'], norm_reviews['RT_user_norm'])
ax.set_xlabel('Fandango')
ax.set_ylabel('Rotten Tomatoes')
plt.show()
Two plots for comparison, with the axes swapped in the second one:
fig = plt.figure(figsize=(5,10))
ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)
ax1.scatter(norm_reviews['Fandango_Ratingvalue'], norm_reviews['RT_user_norm'])
ax1.set_xlabel('Fandango')
ax1.set_ylabel('Rotten Tomatoes')
ax2.scatter(norm_reviews['RT_user_norm'], norm_reviews['Fandango_Ratingvalue'])
ax2.set_xlabel('Rotten Tomatoes')
ax2.set_ylabel('Fandango')
plt.show()
I ran into an error here:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 21: invalid start byte
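My guess is that one of the earlier steps changed the encoding of the source file, so it can no longer be read properly. Unfortunately, I don't know how to change it back (sob).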
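One thing that might be worth trying is to find out which encoding the file can actually be decoded with, and then pass that name to `pd.read_csv` through its `encoding` parameter. The sketch below is only a guess at a workaround, not a diagnosis: the helper `first_working_encoding`, the candidate list and the demo file are all made up for illustration.

```python
import os
import tempfile


def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate codec that decodes the whole file, or None."""
    with open(path, "rb") as handle:
        data = handle.read()
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this codec cannot read the file, try the next one
        return name
    return None


# Hypothetical demo file containing the byte 0x80, which is not valid UTF-8.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"FILM,PRICE\nExample,\x805\n")
    demo_path = tmp.name

detected = first_working_encoding(demo_path)
os.remove(demo_path)
print(detected)
```

With the real file, the call would look like `pd.read_csv('fandango_scores.csv', encoding=detected)`. Keep in mind that a successful decode does not prove the encoding is the right one: `latin-1` accepts every byte, so it is a last resort, and the text it returns may still be garbled.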
A histogram looks much like a bar chart, but it groups the values into bins and plots how many fall into each one:
import pandas as pd
import matplotlib.pyplot as plt
reviews = pd.read_csv('fandango_scores.csv')
cols = ['FILM', 'RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue']
norm_reviews = reviews[cols]
# Frequency of each distinct rating, sorted by rating value
fandango_distribution = norm_reviews['Fandango_Ratingvalue'].value_counts()
fandango_distribution = fandango_distribution.sort_index()
imdb_distribution = norm_reviews['IMDB_norm'].value_counts()
imdb_distribution = imdb_distribution.sort_index()
fig, ax = plt.subplots()
ax.hist(norm_reviews['Fandango_Ratingvalue'])
#ax.hist(norm_reviews['Fandango_Ratingvalue'],bins=20)
#ax.hist(norm_reviews['Fandango_Ratingvalue'], range=(4, 5),bins=20)
plt.show()
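To see what the `bins` and `range` arguments do, it can help to compute the bin counts directly with `np.histogram`, which takes the same two arguments. The ratings below are made up for illustration; they are not taken from `fandango_scores.csv`.

```python
import numpy as np

# Made-up ratings, purely for illustration.
ratings = np.array([2.7, 3.1, 3.4, 3.6, 3.6, 4.1, 4.5, 4.8])

# Five equal-width bins covering 0 to 5: [0,1), [1,2), [2,3), [3,4), [4,5]
counts, edges = np.histogram(ratings, bins=5, range=(0, 5))

print(edges.tolist())   # bin boundaries: 0, 1, 2, 3, 4, 5
print(counts.tolist())  # values per bin: 0, 0, 1, 4, 3
```

Each count is the height of one bar in the histogram, so narrowing `range` or raising `bins` simply changes how the same values are grouped.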
Put four histograms together for comparison:
fig = plt.figure(figsize=(5,20))
ax1 = fig.add_subplot(4,1,1)
ax2 = fig.add_subplot(4,1,2)
ax3 = fig.add_subplot(4,1,3)
ax4 = fig.add_subplot(4,1,4)
# Same bins, range and y limits everywhere so the four panels are directly comparable
ax1.hist(norm_reviews['Fandango_Ratingvalue'], bins=20, range=(0, 5))
ax1.set_title('Distribution of Fandango Ratings')
ax1.set_ylim(0, 50)
ax2.hist(norm_reviews['RT_user_norm'], 20, range=(0, 5))
ax2.set_title('Distribution of Rotten Tomatoes Ratings')
ax2.set_ylim(0, 50)
ax3.hist(norm_reviews['Metacritic_user_nom'], 20, range=(0, 5))
ax3.set_title('Distribution of Metacritic Ratings')
ax3.set_ylim(0, 50)
ax4.hist(norm_reviews['IMDB_norm'], 20, range=(0, 5))
ax4.set_title('Distribution of IMDB Ratings')
ax4.set_ylim(0, 50)
plt.show()
There is also the box plot, which shows the median, the first quartile (25%) and the third quartile (75%) of the data:
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue']
fig, ax = plt.subplots()
ax.boxplot(norm_reviews[num_cols].values)
ax.set_xticklabels(num_cols, rotation=90)
ax.set_ylim(0,5)
plt.show()
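As a small sanity check on what the box shows, the sketch below computes the three quartile values with `np.percentile` and compares the median with the line that `boxplot` draws. The scores are made up, and the `Agg` backend is used only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up values

# First quartile, median and third quartile: the bottom, middle line and top of the box.
q1, median, q3 = np.percentile(scores, [25, 50, 75])

fig, ax = plt.subplots()
parts = ax.boxplot(scores)

# boxplot returns the drawn artists; the median line is horizontal at the median value.
drawn_median = parts["medians"][0].get_ydata()[0]
print(q1, median, q3, drawn_median)
```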
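Plotting with Matplotlib is really quite simple: call the appropriate method and hand it your data. Unless you do front-end work, you need not worry much about polish beyond the essentials.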