Data Visualization in Python

Reference for these notes: Python Crash Course (Chinese edition: 《Python編程：從入門到實踐》)

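Generating Data

matplotlib: a mathematical plotting library
Pygal: a package that focuses on generating charts suited to display on digital devices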

Plotting a Simple Line Graph

import matplotlib.pyplot as plt  # import the pyplot module under the alias plt
squares = [1, 4, 9, 16, 25]
# By default the first data point corresponds to x = 0, and so on
plt.plot(squares)  # plots (0,1) (1,4) (2,9) (3,16) (4,25)
plt.show()

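Changing the Label Text and Line Thickness

import matplotlib.pyplot as plt
squares = [1, 4, 9, 16, 25]
# The linewidth parameter controls the thickness of the line that plot() draws
plt.plot(squares, linewidth=5)
# Set the chart title and label the axes
# title() sets the chart title
# fontsize sets the size of the text in the chart
plt.title("Square Numbers", fontsize=24)
# xlabel() and ylabel() set a title for each axis
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set the size of the tick labels
# tick_params() styles the tick marks;
# the arguments here affect the ticks on both the x- and y-axes (axis='both')
# and set the tick label font size to 14 (labelsize=14)
plt.tick_params(axis='both', labelsize=14)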
plt.show()

Passing Both Input and Output Values to plot()

import matplotlib.pyplot as plt
input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
plt.plot(input_values, squares, linewidth=5)
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
plt.tick_params(axis='both', labelsize=14)
plt.show()

Plotting and Styling a Scatter Plot with scatter()

import matplotlib.pyplot as plt
# Pass a single pair of x and y coordinates to scatter()
# and it draws one point at that position
plt.scatter(2, 4)
plt.show()

# Set the formatting
import matplotlib.pyplot as plt
# The s argument sets the size of the points drawn
plt.scatter(2, 4, s=200)
plt.title("Square Numbers",fontsize=24)
plt.xlabel("Value",fontsize=14)
plt.ylabel("Square of Value",fontsize=14)
plt.tick_params(axis='both',which='major',labelsize=14)
plt.show()

Plotting a Series of Points with scatter()

import matplotlib.pyplot as plt
# Pass scatter() two lists holding the x and y values
x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]
plt.scatter(x_values, y_values, s=100)
plt.title("Square Numbers",fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
plt.tick_params(axis='both', which='major', labelsize=14)
plt.show()

Calculating Data Automatically and Plotting 1000 Points

import matplotlib.pyplot as plt
x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, s=40)
plt.title("Square Numbers",fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set the range of each axis
# axis() takes the minimum and maximum values for the x- and y-axes
plt.axis([0, 1100, 0, 1100000])
plt.tick_params(axis='both', which='major', labelsize=14)
plt.show()

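Removing Outlines from Data Points

matplotlib lets you specify a color for each point in a scatter plot. In matplotlib versions before 2.0 the default was blue points with a black outline; newer versions draw no outline by default.

# Pass edgecolor='none' when calling scatter() to remove the data point outlines
plt.scatter(x_values, y_values, edgecolor='none', s=40)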

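Defining Custom Colors

To change the color of the data points, pass the c argument to scatter() and set it to the name of the color to use.

plt.scatter(x_values, y_values, c='red', s=100)

You can also define a custom color using the RGB color model. Pass the c argument a tuple of three decimal values between 0 and 1 that represent the red, green, and blue components. Values closer to 0 give darker colors; values closer to 1 give lighter colors.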

plt.scatter(x_values, y_values, c=(1, 0.1, 1), s=100)
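
As a small illustration of the RGB convention described above, the sketch below converts a tuple of 0 to 1 components into the equivalent hex color string. The helper name rgb_to_hex is an assumption made for this example; it is not part of matplotlib.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a '#rrggbb' string."""
    for component in rgb:
        if not 0 <= component <= 1:
            raise ValueError("each component must be between 0 and 1")
    # Scale each component to the 0-255 range and format it as two hex digits
    return '#' + ''.join('{:02x}'.format(round(c * 255)) for c in rgb)


dark_blue = rgb_to_hex((0, 0, 0.5))    # components near 0 give a dark color
light_pink = rgb_to_hex((1, 0.8, 1))   # components near 1 give a light color
print(dark_blue, light_pink)
```

Seeing the hex form can make it easier to compare a custom tuple against named colors in a color picker.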

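Using a Colormap

A colormap is a series of colors in a gradient that runs from a starting color to an ending color. In visualizations, colormaps emphasize patterns in the data; for example, lighter colors can show smaller values and darker colors larger values.
The pyplot module ships with a set of built-in colormaps. To use one, you tell pyplot how to assign a color to each point in the data set.

# Set c to the list of y values and use cmap to tell pyplot which colormap to use
# Here points with small y values are light blue and points with large y values are dark blue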
plt.scatter(x_values, y_values, c=y_values, cmap=plt.cm.Blues, s=100)

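Saving Plots Automatically

To have the program save the plot to a file automatically, call plt.savefig().

# In scatter_squares.py
# The first argument is the filename to save the plot under;
# the file is saved in the same directory as scatter_squares.py
# The second argument trims extra whitespace from around the plot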
plt.savefig('squares_plot.png',bbox_inches='tight')

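Random Walks

A random walk is a path produced by steps that are each completely random: it has no clear direction, and the result is determined by a series of random decisions.

Creating the RandomWalk Class

The RandomWalk class chooses its direction at random. It needs three attributes: a variable storing the number of points in the walk, and two lists storing the x and y coordinates of each point the walk passes through.

from random import choice
# choice() is used at each step to decide which option to take
class RandomWalk():
    def __init__(self, num_points=5000):
        # Initialize the attributes of a walk
        self.num_points = num_points
        # Every walk starts at (0, 0)
        self.x_values = [0]
        self.y_values = [0]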

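Choosing Directions

Add a fill_walk() method to the RandomWalk class to generate the points in the walk and decide the direction of each step.

    # Method of the RandomWalk class (indent it inside the class body)
    def fill_walk(self):
        # Calculate all the points in the walk:
        # keep taking steps until the lists reach the desired length
        while len(self.x_values) < self.num_points:
            # Decide the direction and how far to go in that direction.
            # x_direction is 1 (move right) or -1 (move left)
            x_direction = choice([1, -1])
            # A random integer from 0 to 4 says how far to move
            x_distance = choice([0, 1, 2, 3, 4])
            # Direction times distance gives the movement along the x-axis
            x_step = x_direction * x_distance
            y_direction = choice([1, -1])
            y_distance = choice([0, 1, 2, 3, 4])
            y_step = y_direction * y_distance
            # Reject moves that go nowhere
            if x_step == 0 and y_step == 0:
                continue
            # Calculate the next x and y values by adding each step
            # to the last value in the corresponding list
            next_x = self.x_values[-1] + x_step
            next_y = self.y_values[-1] + y_step
            # Append the new point to the end of the lists
            self.x_values.append(next_x)
            self.y_values.append(next_y)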
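
To see the step logic in isolation, here is a compact, self-contained sketch of the same idea written as a plain function. The function name random_walk_points and the seed parameter are assumptions added for illustration so that a run can be repeated; the class in these notes has neither.

```python
import random


def random_walk_points(num_points=10, seed=None):
    """Return lists of x and y values for a walk that starts at (0, 0)."""
    rng = random.Random(seed)  # private generator, so seeding stays local
    x_values, y_values = [0], [0]
    while len(x_values) < num_points:
        # Direction (+1 or -1) times distance (0 to 4) for each axis
        x_step = rng.choice([1, -1]) * rng.choice([0, 1, 2, 3, 4])
        y_step = rng.choice([1, -1]) * rng.choice([0, 1, 2, 3, 4])
        if x_step == 0 and y_step == 0:
            continue  # reject steps that go nowhere
        x_values.append(x_values[-1] + x_step)
        y_values.append(y_values[-1] + y_step)
    return x_values, y_values


xs, ys = random_walk_points(num_points=10, seed=1)
print(list(zip(xs, ys)))
```

Because zero-length steps are rejected rather than recorded, the loop may run more than num_points times, but the lists always end with exactly num_points entries.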

Plotting the Random Walk

import matplotlib.pyplot as plt
from random_walk import RandomWalk
# Create a RandomWalk instance and plot all the points it contains
rw = RandomWalk()
rw.fill_walk()
plt.scatter(rw.x_values, rw.y_values, s=15)
plt.show()

Generating Multiple Random Walks

import matplotlib.pyplot as plt
from random_walk import RandomWalk
# Keep generating new walks as long as the program is active
while True:
    rw = RandomWalk()
    rw.fill_walk()
    plt.scatter(rw.x_values, rw.y_values, s=15)
    plt.show()

    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break

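Styling the Walk

Coloring the Points

Use a colormap to show the order of the points in the walk, and remove the black outline from each point.
To color the points by their position in the walk, pass the c argument a list containing the position of each point. Because the points are plotted in order, the list only needs to hold one number per point, which is 0 through 4999 for the default 5000 points.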

import matplotlib.pyplot as plt
from random_walk import RandomWalk
while True:
    rw = RandomWalk()
    rw.fill_walk()
    point_numbers = list(range(rw.num_points))
    plt.scatter(rw.x_values, rw.y_values, c=point_numbers, cmap=plt.cm.Blues, edgecolors='none', s=15)
    plt.show()

    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break

Plotting the Starting and Ending Points

Make the starting and ending points larger and show them in different colors.

import matplotlib.pyplot as plt
from random_walk import RandomWalk
while True:
    rw = RandomWalk()
    rw.fill_walk()
    point_numbers = list(range(rw.num_points))
    plt.scatter(rw.x_values, rw.y_values, c=point_numbers, cmap=plt.cm.Blues, edgecolors='none', s=15)
    plt.scatter(0, 0, c='green', edgecolors='none', s=100)
    plt.scatter(rw.x_values[-1], rw.y_values[-1], c='red', edgecolors='none', s=100)
    plt.show()

    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break

Removing the Axes

Fetch the current axes with plt.gca() and set the visibility of each axis to False. (Older matplotlib code used plt.axes() here, but in recent versions plt.axes() creates a new, empty axes instead of returning the current one.)

import matplotlib.pyplot as plt
from random_walk import RandomWalk
while True:
    rw = RandomWalk()
    rw.fill_walk()
    point_numbers = list(range(rw.num_points))
    plt.scatter(rw.x_values, rw.y_values, c=point_numbers, cmap=plt.cm.Blues, edgecolors='none', s=15)
    plt.scatter(0, 0, c='green', edgecolors='none', s=100)
    plt.scatter(rw.x_values[-1], rw.y_values[-1], c='red', edgecolors='none', s=100)
    # Remove the axes
    plt.gca().get_xaxis().set_visible(False)
    plt.gca().get_yaxis().set_visible(False)
    plt.show()

    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break

Adding Plot Points

Increase the value of num_points when creating the RandomWalk instance, and adjust the size of each point.

import matplotlib.pyplot as plt
from random_walk import RandomWalk
while True:
    rw = RandomWalk(50000)
    rw.fill_walk()
    point_numbers = list(range(rw.num_points))
    plt.scatter(rw.x_values, rw.y_values, c=point_numbers, cmap=plt.cm.Blues, edgecolors='none', s=1)
    plt.scatter(0, 0, c='green', edgecolors='none', s=100)
    plt.scatter(rw.x_values[-1], rw.y_values[-1], c='red', edgecolors='none', s=100)
    # plt.gca() returns the current axes (plt.axes() would create a new one in recent matplotlib)
    plt.gca().get_xaxis().set_visible(False)
    plt.gca().get_yaxis().set_visible(False)
    plt.show()

    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break

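Altering the Size to Fill the Screen

The figure() function controls the width, height, resolution, and background color of the figure. Give the figsize parameter a tuple that tells matplotlib the dimensions of the plotting window in inches.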

import matplotlib.pyplot as plt
from random_walk import RandomWalk
while True:
    rw = RandomWalk(50000)
    rw.fill_walk()
    # Set the size of the plotting window
    plt.figure(figsize=(10, 6))
    point_numbers = list(range(rw.num_points))
    plt.scatter(rw.x_values, rw.y_values, c=point_numbers, cmap=plt.cm.Blues, edgecolors='none', s=1)
    plt.scatter(0, 0, c='green', edgecolors='none', s=100)
    plt.scatter(rw.x_values[-1], rw.y_values[-1], c='red', edgecolors='none', s=100)
    # plt.gca() returns the current axes (plt.axes() would create a new one in recent matplotlib)
    plt.gca().get_xaxis().set_visible(False)
    plt.gca().get_yaxis().set_visible(False)
    plt.show()

    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break

You can also pass the resolution to figure() with the dpi parameter.

plt.figure(dpi=128, figsize=(10, 6))

Rolling Dice with Pygal

Creating the Die Class

The following class simulates rolling one die.

from random import randint

class Die():
    
    def __init__(self, num_sides=6):
        self.num_sides = num_sides
    
    def roll(self):
        # Return a random value between 1 and the number of sides
        return randint(1, self.num_sides)

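The __init__() method takes one optional argument. If no argument is given when an instance is created, the number of sides defaults to 6; if an argument is given, that value sets the number of sides.

Rolling the Die

from die import Die
die = Die()
# Roll the die several times and store the results in a list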
results = []
for roll_num in range(100):
    result = die.roll()
    results.append(result)
    
print(results)

Analyzing the Results

Count how many times each number was rolled.

from die import Die
die = Die()
results = []
for roll_num in range(1000):
    result = die.roll()
    results.append(result)

frequencies = []
for value in range(1,die.num_sides+1):
    frequency = results.count(value)
    frequencies.append(frequency)

print(frequencies)
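
As an alternative sketch, the standard library's collections.Counter can tally all the results in a single pass, whereas results.count(value) scans the whole list once per face. The example below rolls with randint directly so that it is self-contained; that substitution for the Die class is an assumption made for brevity.

```python
from collections import Counter
from random import randint

num_sides = 6
results = [randint(1, num_sides) for _ in range(1000)]

# Counter maps each rolled value to the number of times it appeared
counts = Counter(results)

# Build the list in face order; a face that never came up counts as 0
frequencies = [counts[value] for value in range(1, num_sides + 1)]
print(frequencies)
```

For a six-sided die the difference in speed is negligible, but the Counter version scales better when the number of possible outcomes grows.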

Making a Histogram

import pygal
from die import Die
die = Die()
results = []
for roll_num in range(1000):
    result = die.roll()
    results.append(result)

frequencies = []
for value in range(1, die.num_sides+1):
    frequency = results.count(value)
    frequencies.append(frequency)
# Create a bar chart by making an instance of pygal.Bar()
hist = pygal.Bar()

hist.title = "Results of rolling one D6 1000 times."
hist.x_labels = ['1', '2', '3', '4', '5', '6']
hist.x_title = "Result"
hist.y_title = "Frequency of Result"

# Use add() to add a series of values to the chart (pass it a label for the
# set of values and a list of the values to appear on the chart)
hist.add('D6', frequencies)
# Render the chart to an SVG file
hist.render_to_file('die_visual.svg')

Rolling Two Dice

import pygal
from die import Die
die_1 = Die()
die_2 = Die()
results = []
for roll_num in range(1000):
    result = die_1.roll() + die_2.roll()
    results.append(result)

frequencies = []
max_result = die_1.num_sides + die_2.num_sides
for value in range(2, max_result+1):
    frequency = results.count(value)
    frequencies.append(frequency)

hist = pygal.Bar()

hist.title = "Results of rolling two D6 1000 times."
hist.x_labels = ['2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
hist.x_title = "Result"
hist.y_title = "Frequency of Result"

hist.add('D6 + D6', frequencies)
hist.render_to_file('dice_visual.svg')

Rolling Dice of Different Sizes

import pygal
from die import Die
die_1 = Die()
die_2 = Die(10)
results = []
for roll_num in range(50000):
    result = die_1.roll() + die_2.roll()
    results.append(result)

frequencies = []
max_result = die_1.num_sides + die_2.num_sides
for value in range(2, max_result+1):
    frequency = results.count(value)
    frequencies.append(frequency)

hist = pygal.Bar()

hist.title = "Results of rolling a D6 and a D10 50,000 times."
hist.x_labels = ['2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16']
hist.x_title = "Result"
hist.y_title = "Frequency of Result"

hist.add('D6 + D10', frequencies)
hist.render_to_file('dice_visual.svg')
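
To check whether a simulated chart looks reasonable, it can help to compare it with the exact distribution. The sketch below enumerates every combination of a D6 and a D10 and counts how many ways each sum can occur; the variable names are chosen for this example only.

```python
from collections import Counter
from itertools import product

sides_1, sides_2 = 6, 10

# Every (a, b) pair is equally likely, so counting pairs per sum
# gives the exact probability of each result
combos = Counter(
    a + b for a, b in product(range(1, sides_1 + 1), range(1, sides_2 + 1))
)
total = sides_1 * sides_2

for value in range(2, sides_1 + sides_2 + 1):
    print(value, combos[value], round(combos[value] / total, 3))

# The x-axis labels can be generated the same way instead of typed by hand
x_labels = [str(value) for value in range(2, sides_1 + sides_2 + 1)]
```

The sums 7 through 11 each have six combinations, which is why the simulated chart shows a flat plateau in the middle rather than a single peak.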

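Downloading Data

The CSV File Format

A simple way to store data in a text file is to write it as a series of comma-separated values (CSV); such files are called CSV files.
For example: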

2014-1-5,61,44,26,18,7,-1,56,30,9,30.34,30.27,30.15,,,,10,4,,0.00,0,,195

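Parsing the CSV File Headers

The csv module is part of the Python standard library and can parse the rows of data in a CSV file.

import csv

# Store the name of the file to use in filename
filename = 'sitka_weather_07-2014.csv'
# Open the file and store the resulting file object in f
with open(filename) as f:
    # Call csv.reader() with the file object as its argument to create
    # a reader object associated with that file, and store it in reader
    reader = csv.reader(f)
    # The reader object is an iterator: passing it to the built-in next()
    # returns the next line of the file. next() is called only once here,
    # so it returns the first line, which holds the file headers.
    # reader splits that comma-separated line and stores each item
    # as an element of a list, which is kept in header_row
    header_row = next(reader)
    print(header_row)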

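Printing the Headers and Their Positions

Print each header in the list along with its position.

import csv

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    # enumerate() returns both the index and the value of each item in the list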
    for index, column_header in enumerate(header_row):
        print(index, column_header)
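
The header-parsing steps can be tried without the weather file by reading from an in-memory string. In the sketch below the three column names and the two data rows are made up for illustration; they are not the real contents of the Sitka file.

```python
import csv
import io

# Hypothetical miniature CSV standing in for the real weather file
sample = io.StringIO(
    "Date,Max TemperatureF,Min TemperatureF\n"
    "2014-7-1,64,50\n"
    "2014-7-2,71,55\n"
)

reader = csv.reader(sample)
header_row = next(reader)  # first line: the column headers

# Map each header name to its column index
positions = {name: index for index, name in enumerate(header_row)}

# The reader continues from the second line, so only data rows remain
highs = [int(row[positions['Max TemperatureF']]) for row in reader]
print(positions)
print(highs)
```

Looking columns up by header name instead of a hard-coded index keeps the code working if the column order of a file changes.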

Extracting and Reading Data

import csv

# Get the high temperatures from the file
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    highs = []
    for row in reader:
        # The reader object continues from where it left off in the CSV
        # file, automatically returning each line after its current position
        highs.append(row[1])

    print(highs)

Use int() to convert these strings to numbers so matplotlib can read them:

    for row in reader:
        high = int(row[1])
        highs.append(high)

Plotting Data in a Temperature Chart

Use matplotlib to create a simple graph of the daily high temperatures:

import csv

from matplotlib import pyplot as plt
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    highs = []
    for row in reader:
        high = int(row[1])
        highs.append(high)

    fig = plt.figure(dpi=128, figsize=(10, 6))
    # Pass the list of high temperatures to plot()
    plt.plot(highs, c='red')
    plt.title("Daily high temperatures, July 2014",fontsize=24)
    plt.xlabel('', fontsize=16)
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)
    
    plt.show()

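The datetime Module

Adding dates to the chart
The date strings should be converted to objects representing those dates, which can be done with the strptime() method from the datetime module.

Import the datetime class from the datetime module, then call strptime() with the string containing the date as the first argument. The second argument tells Python how the date is formatted.
In this example, '%Y-' treats the part of the string before the first dash as a four-digit year; '%m-' treats the part before the second dash as a number representing the month; and '%d' treats the last part of the string as the day of the month.

Date and time format arguments from the datetime module:

%A			# Weekday name, such as Monday
%B			# Month name, such as January
%m			# Month as a number (01~12)
%d			# Day of the month as a number (01~31)
%Y			# Four-digit year, such as 2015
%y			# Two-digit year, such as 15
%H			# Hour in 24-hour format (00~23)
%I			# Hour in 12-hour format (01~12)
%p			# am or pm
%M			# Minutes (00~59)
%S			# Seconds (00~61)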
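
A few of the format codes above can be tried directly. The following minimal sketch parses two date strings, formats one back to text, and shows that a string that does not match the format raises ValueError; the sample strings are made up for illustration.

```python
from datetime import datetime

# Parse a date in the same layout used by the weather file
first_date = datetime.strptime('2014-7-1', '%Y-%m-%d')

# Combine date and time codes in one format string
stamp = datetime.strptime('2014-07-01 14:05', '%Y-%m-%d %H:%M')

# strftime() goes the other way, from a datetime to a string
iso_label = first_date.strftime('%Y-%m-%d')

# A string that does not match the format is rejected
try:
    datetime.strptime('07/01/2014', '%Y-%m-%d')
    bad_format_rejected = False
except ValueError:
    bad_format_rejected = True

print(first_date, stamp, iso_label, bad_format_rejected)
```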

Plotting Dates

import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)

    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(dates, highs, c='red')
    plt.title("Daily high temperatures, July 2014",fontsize=24)
    plt.xlabel('', fontsize=16)
    # Draw the date labels diagonally so they do not overlap
    fig.autofmt_xdate()
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)

    plt.show()

Plotting a Longer Timeframe

import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)

    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(dates, highs, c='red')
    xlim1 = datetime.strptime("2014-1", "%Y-%m")
    xlim2 = datetime.strptime("2014-12", "%Y-%m")
    plt.xlim([xlim1, xlim2])
    plt.title("Daily high temperatures - 2014", fontsize=24)
    plt.xlabel('', fontsize=16)
    fig.autofmt_xdate()
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)

    plt.show()

Plotting a Second Data Series

Add the low-temperature data.

import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
        low = int(row[3])
        lows.append(low)

    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(dates, highs, c='red')
    plt.plot(dates, lows, c='blue')
    xlim1 = datetime.strptime("2014-1", "%Y-%m")
    xlim2 = datetime.strptime("2014-12", "%Y-%m")
    plt.xlim([xlim1, xlim2])
    plt.ylim([10, 80])
    plt.title("Daily high and low temperatures - 2014", fontsize=24)
    plt.xlabel('', fontsize=16)
    fig.autofmt_xdate()
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)

    plt.show()

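Shading an Area in the Chart

Use shading to show the range between each day's high and low temperatures.
The fill_between() method takes a series of x values and two series of y values, and fills the space between the two y series.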

import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
        low = int(row[3])
        lows.append(low)

    fig = plt.figure(dpi=128, figsize=(10, 6))
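    # alpha sets the transparency of a color: 0 is completely transparent
    # and 1 (the default) is completely opaque
    plt.plot(dates, highs, c='red', alpha=0.5)
    plt.plot(dates, lows, c='blue', alpha=0.5)
    # The facecolor argument sets the color of the shaded region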
    plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
    xlim1 = datetime.strptime("2014-1", "%Y-%m")
    xlim2 = datetime.strptime("2014-12", "%Y-%m")
    plt.xlim([xlim1, xlim2])
    plt.ylim([10, 80])
    plt.title("Daily high and low temperatures - 2014", fontsize=24)
    plt.xlabel('', fontsize=16)
    fig.autofmt_xdate()
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)

    plt.show()

Error Checking

Run error-checking code while reading values from the CSV file to handle exceptions that can arise when parsing the data set.

import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'death_valley_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
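        # Print an error message giving the date with missing data; after
        # printing it, the loop goes on to process the next row
        except ValueError:
            print(current_date, 'missing data')
        # If all the data for a date is retrieved without error, the else block runs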
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)

    fig = plt.figure(dpi=128, figsize=(11, 7))
    plt.plot(dates, highs, c='red', alpha=0.5)
    plt.plot(dates, lows, c='blue', alpha=0.5)
    plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
    xlim1 = datetime.strptime("2014-1", "%Y-%m")
    xlim2 = datetime.strptime("2014-12", "%Y-%m")
    plt.xlim([xlim1, xlim2])
    plt.ylim([20, 120])
    plt.title("Daily high and low temperatures - 2014\nDeath Valley,CA", fontsize=24)
    plt.xlabel('', fontsize=16)
    fig.autofmt_xdate()
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)

    plt.show()

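When run, the program prints a 'missing data' message for each date whose readings cannot be converted and then plots the remaining days.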
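
The try/except/else pattern can be exercised without the Death Valley file. In the sketch below the three rows are invented for illustration, with the middle row left blank to imitate a day with missing readings; the skipped list is an addition for this example.

```python
from datetime import datetime

# Hypothetical rows: date, high, (unused column), low
rows = [
    ['2014-2-15', '70', '', '45'],
    ['2014-2-16', '', '', ''],      # readings missing for this day
    ['2014-2-17', '72', '', '48'],
]

dates, highs, lows, skipped = [], [], [], []
for row in rows:
    try:
        current_date = datetime.strptime(row[0], '%Y-%m-%d')
        high = int(row[1])          # int('') raises ValueError
        low = int(row[3])
    except ValueError:
        skipped.append(row[0])      # record the raw date string and move on
    else:
        # Runs only when every conversion in the try block succeeded
        dates.append(current_date)
        highs.append(high)
        lows.append(low)

print(highs, lows, skipped)
```

Recording the raw date string in the except branch avoids relying on current_date, which would be stale or undefined if the date itself failed to parse.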

Plotting Closing-Price Trends: the JSON Format

Downloading the Closing-Price Data

1. Download the data with the urlopen function

from __future__ import (absolute_import, division, print_function, unicode_literals)
from urllib.request import urlopen
import json
json_url = 'https://raw.githubusercontent.com/muxuezi/btc/master/btc_close_2017.json'
# Pass the URL json_url to the urlopen function
response = urlopen(json_url)
# Read the data
req = response.read()
# Write the data to a file
with open('btc_close_2017_urllib.json','wb') as f:
    f.write(req)
# Parse the JSON data:
# json.loads() converts the downloaded content into a format Python can work with
file_urllib = json.loads(req)
print(file_urllib)

2. The third-party requests module makes downloading the data simpler

import requests

json_url = 'https://raw.githubusercontent.com/muxuezi/btc/master/btc_close_2017.json'
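# requests sends a request to the GitHub server with its get method;
# after the server responds, the result is stored in the req variable
req = requests.get(json_url)
# Write the data to a file
with open('btc_close_2017_request.json','w') as f:
    # The req.text attribute returns the response body as a string
    f.write(req.text)
# req.json() converts the data in btc_close_2017.json directly into
# the Python list file_requests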
file_requests = req.json()

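Extracting the Relevant Data

import json
# Load the data into a list
filename = 'btc_close_2017_request.json'
with open(filename) as f:
    # Store the data in btc_data
    btc_data = json.load(f)

# Loop over each element of btc_data; each element is a dictionary with
# five key-value pairs. btc_dict holds the current dictionary, from which
# the value of every key is then read

# Print the information for each day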
for btc_dict in btc_data:
    date = btc_dict['date']
    month = btc_dict['month']
    week = btc_dict['week']
    weekday = btc_dict['weekday']
    close = btc_dict['close']
    print("{} is month {} week {},{},the close price is {} RMB".format(date, month, week, weekday, close))

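Converting Strings to Numeric Values

Python cannot convert a string containing a decimal point directly to an integer. Convert the string to a float first, then convert the float to an integer (which truncates the fractional part).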

import json

filename = 'btc_close_2017_request.json'
with open(filename) as f:
    btc_data = json.load(f)
for btc_dict in btc_data:
    date = btc_dict['date']
    month = int(btc_dict['month'])
    week = int(btc_dict['week'])
    weekday = btc_dict['weekday']
    close = int(float(btc_dict['close']))
    print("{} is month {} week {},{},the close price is {} RMB".format(date, month, week, weekday, close))
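
The conversion rule above can be seen in a few lines. This sketch uses a made-up price string; it shows the ValueError raised by a direct int() call, the truncating result of int(float(...)), and round() as an alternative when the nearest integer is wanted.

```python
price_text = '6928.6492'  # hypothetical closing price as a string

# int() refuses a string that contains a decimal point
try:
    int(price_text)
    message = ''
except ValueError as error:
    message = str(error)

close = int(float(price_text))    # truncates toward zero
rounded = round(float(price_text))  # rounds to the nearest integer instead

print(message)
print(close, rounded)
```

Truncation always drops the fractional part, so a price of 6928.6492 becomes 6928 rather than 6929; which behavior is appropriate depends on how the values will be used.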

Plotting a Line Chart of the Closing Price

Use Pygal to draw the line chart.

import json
import pygal
filename = 'btc_close_2017_request.json'
with open(filename) as f:
    btc_data = json.load(f)
# Create five lists to store the dates and closing prices
dates = []
months = []
weeks = []
weekdays = []
close = []
# Information for each day
for btc_dict in btc_data:
    dates.append(btc_dict['date'])
    months.append(int(btc_dict['month']))
    weeks.append(int(btc_dict['week']))
    weekdays.append(btc_dict['weekday'])
    close.append(int(float(btc_dict['close'])))

# When creating the Line instance, x_label_rotation and
# show_minor_x_labels are passed as initialization arguments:
# x_label_rotation=20 rotates the date labels on the x-axis 20 degrees clockwise
# show_minor_x_labels=False tells the chart not to display every x label
line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)
line_chart.title = '收盤價(￥)'
line_chart.x_labels = dates
N = 20
# Set x_labels_major so the x-axis shows a label once every 20 days
line_chart.x_labels_major = dates[::N]
line_chart.add('收盤價', close)
line_chart.render_to_file('收盤價折線圖(￥).svg')

A First Look at Time-Series Features

Apply a semi-logarithmic transformation using the math module from the Python standard library.

import json
import pygal
import math
filename = 'btc_close_2017_request.json'
with open(filename) as f:
    btc_data = json.load(f)
dates = []
months = []
weeks = []
weekdays = []
close = []
for btc_dict in btc_data:
    dates.append(btc_dict['date'])
    months.append(int(btc_dict['month']))
    weeks.append(int(btc_dict['week']))
    weekdays.append(btc_dict['weekday'])
    close.append(int(float(btc_dict['close'])))

line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)
line_chart.title = '收盤價對數變換(￥)'
line_chart.x_labels = dates
N = 20
line_chart.x_labels_major = dates[::N]
close_log = [math.log10(_) for _ in close]
line_chart.add('收盤價', close_log)
line_chart.render_to_file('收盤價對數變換折線圖(￥).svg')
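
One way to see why the logarithmic transformation helps: on a log scale, equal percentage changes become equal vertical distances. The sketch below uses made-up prices that double at each step, not values from the data file.

```python
import math

# Hypothetical prices that double each time
close = [1000, 2000, 4000, 8000]

close_log = [math.log10(price) for price in close]

# Differences between consecutive log values
gaps = [round(b - a, 6) for a, b in zip(close_log, close_log[1:])]
print(close_log)
print(gaps)
```

Each doubling adds the same amount, log10(2), to the transformed value, so steady exponential growth appears as a straight line on the transformed chart.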

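Averaging the Closing Price

Plot the daily mean over a period of time.
Because the data must be grouped by month, week number, and day of the week before computing the mean of each group, import the groupby function from the itertools module.

from itertools import groupby

import pygal


def draw_line(x_date, y_data, title, y_legend):
    xy_map = []
    # Combine and sort the x- and y-axis data, then group it with groupby
    for x, y in groupby(sorted(zip(x_date,y_data)), key=lambda _: _[0]):
        y_list = [v for _, v in y]
        # Compute the mean of each group and store it in xy_map
        xy_map.append([x,sum(y_list) / len(y_list)])
    # Separate the x- and y-axis data stored in xy_map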
    x_unique, y_mean = [*zip(*xy_map)]
    line_chart = pygal.Line()
    line_chart.title = title
    line_chart.x_labels = x_unique
    line_chart.add(y_legend, y_mean)
    line_chart.render_to_file(title+'.svg')
    return line_chart
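
The grouping step inside draw_line() can be tried on its own. The sketch below uses small made-up lists; it sorts the pairs first because groupby only merges consecutive items that share a key.

```python
from itertools import groupby

# Hypothetical month numbers and closing prices, deliberately out of order
months = [1, 1, 2, 2, 2, 1]
close = [100, 110, 200, 220, 240, 120]

# Sorting puts all pairs with the same month next to each other
pairs = sorted(zip(months, close))

means = {}
for month, group in groupby(pairs, key=lambda pair: pair[0]):
    values = [value for _, value in group]
    means[month] = sum(values) / len(values)

print(means)
```

Without the sort, the final (1, 120) pair would form a separate group for month 1 and the means would be wrong.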

Plot the daily mean for each month:

idx_month = dates.index('2017-12-01')
line_chart_month = draw_line(months[:idx_month], close[:idx_month], '收盤價月日均值（￥）', '月日均值')
line_chart_month

Plot the daily mean for each week:

idx_week = dates.index('2017-12-11')
line_chart_week = draw_line(weeks[1:idx_week], close[1:idx_week], '收盤價週日均值（￥）', '週日均值')
line_chart_week

Plot the mean for each day of the week:

idx_week = dates.index('2017-12-11')
wd = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# Replace the contents of weekdays with integers from 1 to 7
weekdays_int = [wd.index(w) + 1 for w in weekdays[1:idx_week]]
line_chart_weekday = draw_line(weekdays_int, close[1:idx_week], '收盤價星期均值（￥）', '星期均值')
line_chart_weekday.x_labels = ['週一', '週二', '週三', '週四', '週五', '週六', '週日']
line_chart_weekday.render_to_file('收盤價星期均值（￥）.svg')

Closing-Price Dashboard

Combine the charts drawn above into a single page.

import json
with open('收盤價Dashboard.html', 'w', encoding='utf8') as html_file:
    html_file.write('<html><head><title>收盤價Dashboard</title><meta charset="utf-8"></head><body>\n')
    for svg in [
            # The first two names use half-width parentheses to match the files rendered earlier
            '收盤價折線圖(￥).svg', '收盤價對數變換折線圖(￥).svg', '收盤價月日均值（￥）.svg', '收盤價週日均值（￥）.svg','收盤價星期均值（￥）.svg'
    ]:
        html_file.write('<object type="image/svg+xml" data="{0}" height=500></object>\n'.format(svg))
    html_file.write('</body></html>')

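Working with APIs

Use a web application programming interface (API) to request specific information from a website automatically, rather than entire pages, and then visualize that information.

Using a Web API

A web API is the part of a website designed to interact with programs that use very specific URLs to request certain information. Such a request is called an API call. The requested data is returned in an easily processed format such as JSON or CSV.

Requesting Data Using an API Call

https://api.github.com/search/repositories?q=language:python&sort=stars

# This call returns the number of Python projects currently hosted on GitHub,
# along with information about the most popular Python repositories

# The first part (https://api.github.com/) directs the request to the part of
# GitHub's website that responds to API calls
# The next part (search/repositories) tells the API to search all repositories on GitHub
# The question mark after repositories signals that an argument follows. q stands
# for query, and the equals sign begins the query. language:python says we want
# information only on repositories whose primary language is Python, and the
# final part (&sort=stars) sorts the projects by the number of stars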
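
The same breakdown can be reproduced with the standard library's urllib.parse module, which splits a URL into its parts without making any network request. This is only a sketch for inspecting the URL; the variable names are chosen for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'

parts = urlsplit(url)
print(parts.netloc)   # the host that answers API calls
print(parts.path)     # the endpoint being called

# Everything after the question mark, as a dictionary of lists
params = parse_qs(parts.query)
print(params)

# Going the other way: build the query string from a dictionary.
# safe=':' keeps the colon in language:python from being percent-encoded
query = urlencode({'q': 'language:python', 'sort': 'stars'}, safe=':')
rebuilt = parts.scheme + '://' + parts.netloc + parts.path + '?' + query
```

Building the query from a dictionary makes it easier to change one parameter, such as the language, without editing the URL string by hand.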

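requests

The requests package lets a program request information from a website and examine the response that comes back.

Processing an API Response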

import requests

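# Make the API call and store the response
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
# The response object has a status_code attribute that tells us whether the
# request succeeded (a status code of 200 indicates success)
print("Status code:", r.status_code)

# Store the API response in a variable
# The json() method converts the information to a Python dictionary
response_dict = r.json()
# Print the keys of the dictionary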
print(response_dict.keys())

Working with the Response Dictionary

import requests

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# The value associated with 'items' is a list of dictionaries, each holding
# information about one Python repository; store this list in repo_dicts
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

repo_dict = repo_dicts[0]
print("\nKeys:", len(repo_dict))
for key in sorted(repo_dict.keys()):
    print(key)

Pull out the values for some of the keys in repo_dict:

import requests

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

repo_dict = repo_dicts[0]

print("\nSelected information about first repository:")
print('Name:', repo_dict['name'])
print('Owner:', repo_dict['owner']['login'])
print('Stars:', repo_dict['stargazers_count'])
print('Repository:', repo_dict['html_url'])
print('Created:', repo_dict['created_at'])
print('Updated:', repo_dict['updated_at'])
print('Description:', repo_dict['description'])

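Printing the Information Returned by the API Call: Summarizing the Top Repositories

Print selected information about each repository returned by the API call.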

import requests

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

print("\nSelected information about each repository:")
for repo_dict in repo_dicts:
    print('\nName:', repo_dict['name'])
    print('Owner:', repo_dict['owner']['login'])
    print('Stars:', repo_dict['stargazers_count'])
    print('Repository:', repo_dict['html_url'])
    print('Description:', repo_dict['description'])

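Visualizing Repositories Using Pygal

Draw an interactive bar chart showing the popularity of Python projects on GitHub. The height of each bar represents the number of stars the project has received, and clicking a bar opens the project's home page on GitHub.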

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

repo_dicts = response_dict['items']

# Create two empty lists to store the information shown in the chart
names, stars=[], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

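# Define a style with the LightenStyle class, using dark blue as its base color,
# and pass base_style so that it builds on the LightColorizedStyle class
my_style = LS('#333366', base_style=LCS)
# Create the bar chart with Bar() and pass it my_style, plus two more style
# arguments: rotate the labels along the x-axis 45 degrees and hide the legend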
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Most-Starred Python Projects on Github'
chart.x_labels = names

# No label is needed for this series, so pass an empty string when adding the data
chart.add('', stars)
chart.render_to_file('python_repos.svg')

Refining Pygal Charts

Create a configuration object that holds all the customizations to pass to Bar().

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

repo_dicts = response_dict['items']

names, stars=[], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])


my_style = LS('#333366', base_style=LCS)
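# Create my_config, an instance of Pygal's Config class; changing its
# attributes customizes the chart's appearance
my_config = pygal.Config()
my_config.x_label_rotation = 45
my_config.show_legend = False
# Set the font sizes of the chart title, the minor labels, and the major labels
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
# truncate_label shortens long project names to 15 characters
# (hovering over a truncated name shows the full project name)
my_config.truncate_label = 15
# Hide the horizontal lines on the chart
my_config.show_y_guides = False
# Set a custom width
my_config.width = 1000

# Pass my_config as the first argument to apply all the configuration settings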
chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-Starred Python Projects on Github'
chart.x_labels = names

chart.add('', stars)
chart.render_to_file('python_repos.svg')

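Adding Custom Tooltips

In Pygal, hovering the mouse over a bar shows the information it represents; this is commonly called a tooltip.
Create a custom tooltip that also shows each project's description: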

import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)

chart.title = 'Python Projects'
chart.x_labels = ['httpie', 'django', 'flask']

# Define a list named plot_dicts containing three dictionaries, each with two keys
# Pygal uses the string associated with 'label' to create the tooltip for each bar
plot_dicts = [
    {'value': 46762, 'label': 'Description of httpie.'},
    {'value': 49445, 'label': 'Description of django.'},
    {'value': 50420, 'label': 'Description of flask.'},
    ]

# add() takes a string and a list
chart.add('', plot_dicts)
chart.render_to_file('bar_descriptions.svg')

Plotting the Data

Generate plot_dicts automatically:

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

repo_dicts = response_dict['items']

names, plot_dicts=[], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])

    plot_dict = {
        'value': repo_dict['stargazers_count'],
        'label': repo_dict['description']
    }
    plot_dicts.append(plot_dict)

my_style = LS('#333366', base_style=LCS)
my_config = pygal.Config()
my_config.x_label_rotation = 45
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15
my_config.show_y_guides = False
my_config.width = 1000

chart = pygal.Bar(my_config, style=my_style)

chart.title = 'Most-Starred Python Projects on Github'
chart.x_labels = names

chart.add('', plot_dicts)
chart.render_to_file('python_repos.svg')

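Adding Clickable Links to the Chart

Pygal lets you use each bar in the chart as a link to a website.
In the dictionary created for each project, add a key-value pair whose key is 'xlink'.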

plot_dict = {
        'value': repo_dict['stargazers_count'],
        'label': repo_dict['description'],
        'xlink': repo_dict['html_url']
    }
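
The dictionary-building loop can be tested without calling the API. The sketch below uses two made-up repository dictionaries; it also assumes that a repository may come back with no description (None in Python) and substitutes a placeholder label in that case. Both the sample data and the fallback text are assumptions for illustration.

```python
# Hypothetical stand-ins for the dictionaries returned under 'items'
repo_dicts = [
    {'name': 'alpha', 'stargazers_count': 10,
     'description': 'First project', 'html_url': 'https://example.com/alpha'},
    {'name': 'beta', 'stargazers_count': 7,
     'description': None, 'html_url': 'https://example.com/beta'},
]

names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dict = {
        'value': repo_dict['stargazers_count'],
        # Fall back to a placeholder when the description is missing
        'label': repo_dict['description'] or 'No description provided.',
        'xlink': repo_dict['html_url'],
    }
    plot_dicts.append(plot_dict)

print(names)
print(plot_dicts)
```

Guarding the label this way keeps every tooltip a string, which avoids surprises when a chart library expects text for each label.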

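The Hacker News API

Make API calls to the Hacker News site.
The following call returns information about one article, identified by its ID (here, a top article):

https://hacker-news.firebaseio.com/v0/item/9884165.json

Get the IDs of the current top articles on Hacker News, then look at each of the top-ranked articles: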

import requests

from operator import itemgetter

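# Make the API call and store the response
# This call returns the IDs of the 500 most popular articles on Hacker News
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print("Status code:", r.status_code)

# Convert the response text to a Python list
submission_ids = r.json()
# submission_dicts stores one dictionary per article
submission_dicts = []
# Loop over the IDs and make a separate API call for each article
for submission_id in submission_ids[:30]:
    # The URL includes the current value of submission_id
    # (note the slash after 'item', which the item endpoint requires)
    url = ('https://hacker-news.firebaseio.com/v0/item/' + str(submission_id) + '.json')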
    submission_r = requests.get(url)
    print(submission_r.status_code)
    response_dict = submission_r.json()
    
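    # Build a dictionary for the article being processed,
    # storing its title, discussion page link, and comment count
    submission_dict = {
        'title': response_dict['title'],
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        # When a key might not be in a dictionary, use dict.get(): it returns
        # the associated value if the key exists and the given default (here 0) if not
        'comments': response_dict.get('descendants', 0)
        }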
    submission_dicts.append(submission_dict)
    
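# Sort the list of dictionaries by comment count using itemgetter() from the
# operator module. Passed the key 'comments', it pulls the value for that key
# from each dictionary in the list, and sorted() orders the list by those
# values (in descending order here)
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'), reverse=True)

# Loop over the sorted list and print the information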
for submission_dict in submission_dicts:
    print("\nTitle:", submission_dict['title'])
    print("Discussion link:", submission_dict['link'])
    print("Comments:", submission_dict['comments'])
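
The two techniques used in the loop above, dict.get() with a default and sorting with itemgetter(), can be tried offline. The three response dictionaries below are invented for illustration and are not real Hacker News data.

```python
from operator import itemgetter

# Hypothetical API responses; the second has no 'descendants' key
responses = [
    {'title': 'Story A', 'descendants': 12},
    {'title': 'Story B'},
    {'title': 'Story C', 'descendants': 40},
]

submissions = [
    # get() returns 0 when 'descendants' is absent instead of raising KeyError
    {'title': r['title'], 'comments': r.get('descendants', 0)}
    for r in responses
]

# Sort by comment count, most-discussed first
submissions = sorted(submissions, key=itemgetter('comments'), reverse=True)
titles = [s['title'] for s in submissions]
print(titles)
```

Supplying the default at the point of extraction means the sort never has to deal with missing values.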