day18

Date: 2019-11-10



numpy module

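numpy is an open-source Python extension library for numerical computing, used to store and process large matrices and arrays.

Compared with a Python list, it provides array operations, array arithmetic, statistical distributions and simple mathematical models, and it computes much faster.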

A matrix in numpy is an ndarray object. To create one, pass a (nested) list to np.array().

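import numpy as np  # by convention, np stands for numpy

# 1-D
arr = np.array([1, 2, 3, 4])
print(arr)
# [1 2 3 4]

# 2-D
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr)
# [[1 2 3 4]
#  [5 6 7 8]]

# 3-D
arr = np.array([[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]],
                [[2, 3, 4, 5], [3, 4, 5, 6], [3, 4, 5, 6]],
                [[5, 6, 7, 8], [5, 6, 7, 8], [5, 6, 7, 8]]])
print(arr.shape)
# (3, 3, 4)

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
# number of rows and columns
print(arr.shape)
# (2, 3)
# number of rows
print(arr.shape[0])
# 2
# number of columns
print(arr.shape[1])
# 3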

Slicing a matrix

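# all elements
print(arr[:, :])
# the whole first row
print(arr[:1, :])
# [[1 2 3]]
print(arr[0, [0, 1, 2]])   # list every column index 0 .. n-1 (here n = 3)
# [1 2 3]
# the whole first column
print(arr[:, :1])
# [[1]
#  [4]]
print(arr[[0, 1], 0])      # list every row index
# [1 4]
# element in the first row, first column
print(arr[0, 0])
# 1
# elements greater than 5, returned as a 1-D array
print(arr[arr > 5])
# [6]
# a Boolean matrix
print(arr > 5)
# [[False False False]
#  [False False  True]]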

Replacing matrix elements

This works like assigning to elements of a list.

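# set every element of the first row to 0
arr1 = arr.copy()
arr1[:1, :] = 0
print(arr1)
# [[0 0 0]
#  [4 5 6]]
# set every element greater than 5 to 0
arr2 = arr.copy()
arr2[arr > 5] = 0
print(arr2)
# [[1 2 3]
#  [4 5 0]]
# set the whole matrix to 0
arr3 = arr.copy()
arr3[:, :] = 0
print(arr3)
# [[0 0 0]
#  [0 0 0]]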
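The examples above call .copy() before modifying anything. A minimal sketch of why, assuming the same 2x3 arr: basic slicing returns a view that shares memory with the original, so writing through the slice also changes arr, whereas .copy() and Boolean indexing produce independent data. The names view, copied and picked are only for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Basic slicing returns a view: it shares memory with arr
view = arr[:1, :]
view[:] = 0
print(arr)        # the first row of arr is now 0 as well
# [[0 0 0]
#  [4 5 6]]

# .copy() allocates new memory, so changes stay local
arr = np.array([[1, 2, 3], [4, 5, 6]])
copied = arr[:1, :].copy()
copied[:] = 0
print(arr)        # unchanged
# [[1 2 3]
#  [4 5 6]]

# Boolean indexing always produces a copy, not a view
picked = arr[arr > 5]
picked[:] = 0
print(arr[1, 2])  # still 6
```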

Concatenating matrices

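arr1 = np.array([[1, 2],
                 [3, 4]])
arr2 = np.array([[5, 6],
                 [7, 8]])
# join side by side (horizontally) with hstack; the inputs must have the same number of rows
# method 1
print(np.hstack((arr1, arr2)))
# [[1 2 5 6]
#  [3 4 7 8]]
# method 2
print(np.concatenate((arr1, arr2), axis=1))
# [[1 2 5 6]
#  [3 4 7 8]]
# stack on top of each other (vertically) with vstack; the inputs must have the same number of columns
# method 1
print(np.vstack((arr1, arr2)))
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]
# method 2
print(np.concatenate((arr1, arr2), axis=0))
# same result as vstack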
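A small sketch of the shape rule for hstack and vstack (the arrays col and row are made up for the example): hstack needs matching row counts, vstack needs matching column counts, and a mismatch raises ValueError.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])   # shape (2, 2)
col = np.array([[5], [6]])          # shape (2, 1)
row = np.array([[7, 8, 9]])         # shape (1, 3)

# hstack: both have 2 rows, so the columns add up: (2, 2) + (2, 1) -> (2, 3)
wide = np.hstack((arr1, col))
print(wide.shape)  # (2, 3)

# vstack: needs the same number of columns, but 2 != 3
try:
    np.vstack((arr1, row))
except ValueError as err:
    print("vstack failed:", err)
```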

Creating matrices with functions

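# arange
print(np.arange(10))        # 0-9
# [0 1 2 3 4 5 6 7 8 9]
print(np.arange(1, 5))      # 1-4
# [1 2 3 4]
print(np.arange(1, 20, 2))  # 1-19 with step 2
# [ 1  3  5  7  9 11 13 15 17 19]

# linspace / logspace
# arithmetic sequence; both the start and the end are included
print(np.linspace(0, 20, 5))
# [ 0.  5. 10. 15. 20.]
# geometric sequence from 10**0 to 10**20, 5 values
print(np.logspace(0, 20, 5))
# [1.e+00 1.e+05 1.e+10 1.e+15 1.e+20]

# zeros / ones / eye / empty
# all-zeros matrix
print(np.zeros((3, 4)))
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]
# all-ones matrix with x rows and y columns, e.g. 2 x 3
print(np.ones((2, 3)))
# n x n identity matrix, e.g. n = 3
print(np.eye(3))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
# uninitialized matrix: memory is allocated but not cleared, so the contents are
# whatever was already there (arbitrary, NOT random numbers); fill it before use
print(np.empty((2, 3)))

# build an array from the character codes (ASCII) of a string
s = 'abcdef'
# np.int8: each element is an 8-bit (1-byte) integer
# np.fromstring(s, dtype=np.int8) is deprecated; frombuffer on the encoded bytes does the same
print(np.frombuffer(s.encode(), dtype=np.int8))
# [ 97  98  99 100 101 102]

def func(i, j):
    """i is the row index and j the column index of the matrix"""
    return i * j

# build a 3x4 matrix whose value at (i, j) is func(i, j); indices start at 0.
# fromfunction calls func once, passing whole arrays of row and column indices
print(np.fromfunction(func, (3, 4)))
# [[0. 0. 0. 0.]
#  [0. 1. 2. 3.]
#  [0. 2. 4. 6.]]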
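Because np.empty does not initialize memory, its output cannot be relied on. If random numbers are actually wanted, NumPy's random Generator is the usual tool. A short sketch; the seed 0 is an arbitrary choice that makes the run repeatable:

```python
import numpy as np

# np.empty: shape and dtype are set, contents are whatever was in memory
buf = np.empty((2, 3))
buf[:] = 1.0                              # fill it before reading it
print(buf)

# For random values use a Generator; seed 0 is arbitrary, it only makes runs repeatable
rng = np.random.default_rng(0)
uniform = rng.random((2, 3))              # floats in [0.0, 1.0)
ints = rng.integers(0, 10, size=(2, 3))   # integers in [0, 10)
print(uniform.shape, ints.shape)          # (2, 3) (2, 3)
```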

Matrix arithmetic

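Matrices of the same shape can be combined element by element with + - * / % and **n; a scalar operand is applied to every element.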
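A minimal sketch of these element-wise operators on two small 2x2 arrays (a and b are made-up example data); the comments show the resulting values as nested lists:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a + b)    # [[6, 8], [10, 12]]
print(a - b)    # [[-4, -4], [-4, -4]]
print(a * b)    # [[5, 12], [21, 32]]   element-wise, NOT matrix multiplication
print(b / a)    # true division, the result is float
print(b % a)    # [[0, 0], [1, 0]]      remainders
print(a ** 2)   # [[1, 4], [9, 16]]     each element squared
print(a + 10)   # [[11, 12], [13, 14]]  a scalar is applied to every element
```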

Matrix multiplication (dot product)

The number of columns of the first matrix must equal the number of rows of the second.

arr1 = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(arr1.shape)
# (2, 3)
arr2 = np.array([[7, 8],
                 [9, 10],
                 [11, 12]])
print(arr2.shape)
# (3, 2)
# columns of arr1 must equal rows of arr2
assert arr1.shape[1] == arr2.shape[0]
# (2x3) . (3x2) = (2x2)
print(arr1.dot(arr2))
# [[ 58  64]
#  [139 154]]
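The same product can also be written with np.dot or with the @ operator (Python 3.5+). A sketch reusing the arr1 and arr2 values from above, which also shows that the order of the factors matters:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
arr2 = np.array([[7, 8], [9, 10], [11, 12]])   # shape (3, 2)

p1 = arr1.dot(arr2)
p2 = np.dot(arr1, arr2)
p3 = arr1 @ arr2          # @ is the matrix-multiplication operator
print(p3)
# [[ 58  64]
#  [139 154]]
# e.g. 58 = 1*7 + 2*9 + 3*11 (row 0 of arr1 times column 0 of arr2)

# Reversing the order gives a different shape: (3x2) @ (2x3) -> (3x3)
print((arr2 @ arr1).shape)  # (3, 3)
```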

Transpose

Transposing swaps the rows and columns of a matrix.

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print(arr)
# [[1 2 3]
#  [4 5 6]]
print(arr.transpose())
# [[1 4]
#  [2 5]
#  [3 6]]
print(arr.T)
# [[1 4]
#  [2 5]
#  [3 6]]

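Matrix inverse

Only a square matrix (same number of rows and columns) can have an inverse, and only if it is non-singular, i.e. its determinant is not 0.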

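arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
print(arr)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
# Careful: this matrix is singular (det = 0), so it has no inverse. Depending on the
# build, numpy either raises numpy.linalg.LinAlgError: Singular matrix or, because of
# rounding error, returns meaningless huge values such as:
print(np.linalg.inv(arr))
# [[ 3.15251974e+15 -6.30503948e+15  3.15251974e+15]
#  [-6.30503948e+15  1.26100790e+16 -6.30503948e+15]
#  [ 3.15251974e+15 -6.30503948e+15  3.15251974e+15]]

# the inverse of the identity matrix is the identity matrix itself
arr = np.eye(3)
print(arr)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
print(np.linalg.inv(arr))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]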
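Since np.linalg.inv may not reliably reject a singular matrix, one option is to check first. A sketch with a hypothetical helper safe_inverse (not part of NumPy) that tests squareness and full rank via np.linalg.matrix_rank, then verifies the result with np.allclose:

```python
import numpy as np

def safe_inverse(m):
    """Hypothetical helper: return the inverse of m, or None if m is not invertible."""
    m = np.asarray(m, dtype=float)
    # must be square ...
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return None
    # ... and full rank (equivalent to det != 0, but more robust than comparing det with 0)
    if np.linalg.matrix_rank(m) < m.shape[0]:
        return None
    return np.linalg.inv(m)

singular = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.linalg.matrix_rank(singular))   # 2, so there is no inverse
print(safe_inverse(singular))            # None

m = np.array([[2, 1], [1, 1]])
m_inv = safe_inverse(m)
print(m_inv)
# [[ 1. -1.]
#  [-1.  2.]]
print(np.allclose(m @ m_inv, np.eye(2)))  # True: m times its inverse is the identity
```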

collections module

Counter
deque (double-ended queue)
defaultdict (default dictionary)
OrderedDict (ordered dictionary)
namedtuple (named tuple)

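1. Counter

Counter is a subclass of dict used for hashtable-based counting: it tallies how many times each element occurs and returns a dict-like object whose keys are the elements and whose values are their counts.

Common methods:

most_common(n)	Returns the n most common elements with their counts, as a list of (element, count) tuples sorted from most to least frequent
elements()	Returns an iterator over the elements, repeating each one as many times as its count
update()	Adds counts from an iterable or mapping (it adds to the counts, unlike set.update, which takes a union)
subtract()	Like update, but subtracts the counts of another iterable or mapping
iteritems	Python 2 only: returns all items of the Counter; in Python 3 use items()
iterkeys	Python 2 only: returns all keys of the Counter; in Python 3 use keys()
itervalues	Python 2 only: returns all values of the Counter; in Python 3 use values()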
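A short sketch of the methods in the table, using the string "abracadabra" as made-up input:

```python
from collections import Counter

c = Counter("abracadabra")
print(c)                       # Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
print(c.most_common(2))        # [('a', 5), ('b', 2)]: a list of (element, count) tuples
print(sorted(c.elements()))    # each element repeated as many times as its count

c.update("aab")                # adds counts: a 5 -> 7, b 2 -> 3
c.subtract("r")                # subtracts counts: r 2 -> 1
print(c["a"], c["b"], c["r"])  # 7 3 1
print(c["z"])                  # 0: a missing key counts as zero instead of raising KeyError

# Python 3 spelling of the old iterkeys()/itervalues()/iteritems()
print(list(c.keys()))          # ['a', 'b', 'r', 'c', 'd']
```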

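2. deque

deque (double-ended queue) is one of the high-performance data structures: adding or removing items at either end is fast. Common methods:

append	Adds an element to the right end
appendleft	Adds an element to the left end
clear	Removes all elements
count(value)	Returns how many elements are equal to value
extend	Extends the right end with an iterable such as a list, tuple or dict (for a dict, its keys are added)
extendleft	Like extend, but on the left end (the added items end up in reverse order)
pop	Removes and returns the rightmost element
popleft	Removes and returns the leftmost element
remove(value)	Removes the first occurrence of value
reverse	Reverses the order of all elements in place
rotate(n)	Rotates the deque n steps to the right (to the left if n is negative)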
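A sketch walking through the deque methods above on a small example (the values are arbitrary):

```python
from collections import deque

d = deque([1, 2, 3])
d.append(4)             # right end: deque([1, 2, 3, 4])
d.appendleft(0)         # left end:  deque([0, 1, 2, 3, 4])
d.extend([5, 6])        # right end: deque([0, 1, 2, 3, 4, 5, 6])
d.extendleft([-1, -2])  # left end, one item at a time, so they come out reversed
print(d)                # deque([-2, -1, 0, 1, 2, 3, 4, 5, 6])

d.rotate(2)             # n > 0 rotates right: the last 2 items move to the front
print(d)                # deque([5, 6, -2, -1, 0, 1, 2, 3, 4])
d.rotate(-2)            # n < 0 rotates left, undoing the step above

print(d.pop(), d.popleft())  # 6 -2
d.remove(3)             # removes the first 3
print(d.count(3))       # 0
print(d)                # deque([-1, 0, 1, 2, 4, 5])
```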

3. defaultdict

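defaultdict is a subclass of dict and inherits all of its methods. When you create one you specify a default type (a factory) for its values; a missing key is then filled in with a new value of that type.
Note: if the dictionary dic is defined with dict as its value type, then even though dic has no key k1 yet, you can still call the dict method update on dic['k1']. This is impossible with an ordinary dict: there you must assign a value to the key before updating it, otherwise a KeyError is raised.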
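The note above can be reproduced like this: a sketch in which dic is a defaultdict(dict), compared with a plain dict, followed by a defaultdict(list) used for grouping (the data is made up):

```python
from collections import defaultdict

# Values default to dict: reading a missing key creates an empty dict for it
dic = defaultdict(dict)
dic["k1"].update({"name": "x"})   # works although "k1" was never assigned
print(dic)                         # defaultdict(<class 'dict'>, {'k1': {'name': 'x'}})

# The same call on a plain dict raises KeyError because "k1" does not exist yet
plain = {}
try:
    plain["k1"].update({"name": "x"})
except KeyError as err:
    print("KeyError:", err)         # KeyError: 'k1'

# Another default type: list, handy for grouping values
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)
print(dict(groups))                 # {'a': ['apple', 'avocado'], 'b': ['banana']}
```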

4. OrderedDict

OrderedDict is also a subclass of dict; it remembers the order in which keys were inserted.
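Since Python 3.7 plain dicts also keep insertion order, so what OrderedDict still adds is mainly its order-aware methods and comparisons. A sketch (the keys and values are arbitrary):

```python
from collections import OrderedDict

od = OrderedDict()
od["b"] = 2
od["a"] = 1
od["c"] = 3
print(list(od))                 # ['b', 'a', 'c']: insertion order is kept

od.move_to_end("b")             # move an existing key to the end
print(list(od))                 # ['a', 'c', 'b']
print(od.popitem(last=False))   # ('a', 1): pop from the front instead of the back

# Equality between two OrderedDicts also compares order; between plain dicts it does not
print(OrderedDict(x=1, y=2) == OrderedDict(y=2, x=1))  # False
print(dict(x=1, y=2) == dict(y=2, x=1))                # True
```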

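5. namedtuple

A namedtuple class is created by its own class factory, namedtuple(), rather than by initializing a tuple from a table. The factory takes the class name and a string containing the field names.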
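A sketch of creating and using a namedtuple; the class name Point and the fields x and y are made up for the example:

```python
from collections import namedtuple

# class name plus field names; "x y", "x, y" and ["x", "y"] are all accepted
Point = namedtuple("Point", "x y")

p = Point(1, 2)
print(p)               # Point(x=1, y=2)
print(p.x, p[1])       # 1 2: access by name or by position
x, y = p               # unpacks like an ordinary tuple
p2 = p._replace(x=10)  # tuples are immutable, so _replace returns a new object
print(p2)              # Point(x=10, y=2)
print(p._asdict())     # {'x': 1, 'y': 2} (a plain dict on Python 3.8+)
```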

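Matplotlib module: plotting and visualization

1. A brief introduction to Matplotlib

(1) Matplotlib is a powerful Python toolkit for plotting and data visualization.

(2) Installation: pip install matplotlib

(3) Import: import matplotlib.pyplot as plt

(4) Plotting function: plt.plot()

(5) Display the figure: plt.show()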

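2. Matplotlib: the plot function

(1) plot draws line charts. Its appearance can be set with:

linestyle: line style ('-', '-.', '--', ':')
marker: point marker ('v', '^', 's', '*', 'H', '+', 'x', 'D', 'o', …)
color: color ('b', 'g', 'r', 'y', 'k', 'w', …)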
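A sketch of the three style options, assuming simple made-up data. The Agg backend line only makes the sketch run without a display (the figure is saved to a file instead); remove it and call plt.show() to open a window:

```python
import matplotlib
matplotlib.use("Agg")   # off-screen backend; remove this line and call plt.show() for a window
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# linestyle, marker and color passed as keyword arguments
plt.plot(x, y, linestyle="--", marker="o", color="r")
# the same settings packed into a format string: color 'b', marker '^', line style '-.'
plt.plot(x, [v * 2 for v in y], "b^-.")
plt.savefig("styles.png")
```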

(2) Drawing several curves with plot
(3) pandas support for plot
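A sketch of both points with made-up data: several plot calls on one Axes produce several curves, and a pandas DataFrame's plot method draws one curve per column. As above, Agg just renders off-screen:

```python
import matplotlib
matplotlib.use("Agg")   # off-screen backend; remove this line and call plt.show() for a window
import matplotlib.pyplot as plt
import pandas as pd

x = [0, 1, 2, 3]
y1 = [1, 3, 2, 5]
y2 = [2, 2, 4, 1]

fig, (ax_plt, ax_pd) = plt.subplots(1, 2, figsize=(8, 3))

# (2) every plot call on the same Axes adds one more curve
ax_plt.plot(x, y1, label="y1")
ax_plt.plot(x, y2, label="y2")
ax_plt.legend()

# (3) DataFrame.plot draws one curve per column, uses the index as the x axis,
#     and adds a legend by default
df = pd.DataFrame({"y1": y1, "y2": y2}, index=x)
df.plot(ax=ax_pd, title="from a DataFrame")

fig.savefig("multi_curves.png")
```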

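3. Matplotlib: annotating figures

Set the figure title: plt.title()
Set the x-axis label: plt.xlabel()
Set the y-axis label: plt.ylabel()
Set the x-axis range: plt.xlim()
Set the y-axis range: plt.ylim()
Set the x-axis ticks: plt.xticks()
Set the y-axis ticks: plt.yticks()
Add a legend for the curves: plt.legend()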
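A sketch that uses each of these functions once; the data, limits and tick labels are arbitrary choices, and the Agg line again only renders off-screen:

```python
import matplotlib
matplotlib.use("Agg")   # off-screen backend; remove this line and call plt.show() for a window
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(11)
plt.plot(x, x, label="y = x")
plt.plot(x, 0.5 * x, label="y = 0.5x")

plt.title("Annotated figure")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim(0, 10)
plt.ylim(0, 12)
plt.xticks([0, 2, 4, 6, 8, 10])                              # tick positions on the x axis
plt.yticks([0, 4, 8, 12], ["0", "four", "eight", "twelve"])  # positions plus custom labels
plt.legend()                                                 # uses the label= of each plot call
plt.savefig("annotated.png")
```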

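4. Matplotlib example: plotting mathematical functions

Use Matplotlib to plot y = x, y = x², and y = 3x³ + 5x² + 2x + 1 in one window, drawing each curve in a different color and adding a legend that says which function each curve represents.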
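One possible solution, as a sketch: the x range [-2, 2] and the colors are arbitrary choices, and the Agg backend again only replaces the window with a saved file:

```python
import matplotlib
matplotlib.use("Agg")   # off-screen backend; remove this line and call plt.show() for a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, 100)   # 100 evenly spaced points in [-2, 2]

plt.plot(x, x, color="b", label="y = x")
plt.plot(x, x ** 2, color="g", label="y = x^2")
plt.plot(x, 3 * x ** 3 + 5 * x ** 2 + 2 * x + 1, color="r", label="y = 3x^3 + 5x^2 + 2x + 1")
plt.legend()                  # one legend entry per curve, taken from label=
plt.savefig("functions.png")
```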

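5. Matplotlib: figures and subplots

Figure (the canvas): figure

fig = plt.figure()

Subplot: subplot (2, 2, 1 means a grid of 2 rows and 2 columns, first cell)

ax1 = fig.add_subplot(2,2,1)

Adjusting the spacing between subplots:

fig.subplots_adjust(left, bottom, right, top, wspace, hspace)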
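A sketch putting the canvas, three subplots and the spacing together (the plotted data is made up). In subplots_adjust, wspace and hspace are the horizontal and vertical gaps between subplots, given as fractions of the average axes width and height:

```python
import matplotlib
matplotlib.use("Agg")   # off-screen backend; remove this line and call plt.show() for a window
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(8, 6))   # the canvas
x = np.arange(10)

# add_subplot(2, 2, k): a 2x2 grid, k-th cell counted left to right, top to bottom
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax1.plot(x, x)
ax2.bar(x, x ** 2)
ax3.scatter(x, np.sqrt(x))

# widen the gaps between subplots
fig.subplots_adjust(wspace=0.3, hspace=0.4)
fig.savefig("subplots.png")
```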

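6. Chart types supported by Matplotlib

7. Matplotlib: drawing candlestick (K-line) charts

The matplotlib.finance subpackage provided many functions for drawing finance-related charts.
Candlestick charts were drawn with matplotlib.finance.candlestick_ochl. Note that matplotlib.finance was deprecated in Matplotlib 2.0 and removed in 2.2; the code moved to the separate mpl_finance package, which has since been superseded by mplfinance.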

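8. Example code

Install it before use: pip install matplotlib

Then import it: import matplotlib.pyplot as plt

Plotting function: plt.plot()

Display function: plt.show()

In IPython or Jupyter, plt.plot? shows its parameters (in a plain Python shell, use help(plt.plot)).

By passing extra arguments we can change how the line looks.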

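pandas module:

pandas is a powerful Python toolkit for data analysis, built on top of NumPy.

Main features of pandas:

1. Data structures with automatic data alignment: DataFrame and Series
2. Built-in time-series functionality
3. A rich set of mathematical operations
4. Flexible handling of missing data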

Installation:

pip install pandas

Import:

import pandas as pd

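Series: a one-dimensional data object

A Series is an object similar to a one-dimensional array. It consists of a set of values and an associated set of data labels (its index).

Ways to create one: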

import pandas as pd
pd.Series([4,7,-5,3])
pd.Series([4,7,-5,3],index=['a','b','c','d'])
pd.Series({'a':1,'b':2})
pd.Series(0,index=['a','b','c','d'])

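Get the array of values and the array of index labels through the values and index attributes.
A Series behaves like a combination of a list (array) and a dict.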

Example code:

# Creating a Series
import pandas as pd
import numpy as np

pd.Series([2,3,4,5])  # create a Series from a list
"""
Output:
0    2
1    3
2    4
3    5
dtype: int64

# the left column is the index, the right column holds the values
"""

pd.Series([2,3,4,5],index=["a","b","c","d"])  # specify the index
"""
Output:
a    2
b    3
c    4
d    5
dtype: int64
"""

# Series supports array-like features (positional subscripts)
pd.Series(np.arange(5))  # create a Series from an array
"""
Output:
0    0
1    1
2    2
3    3
4    4
dtype: int32   # int64 on most platforms; int32 is what NumPy 1.x gives on Windows
"""

sr = pd.Series([2,3,4,5],index=["a","b","c","d"])
sr
"""
a    2
b    3
c    4
d    5
dtype: int64
"""

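# Indexing:
sr[0]
# Output: 2  # although sr has label indexes, a value can still be fetched by position
#   (recent pandas versions emit a FutureWarning here; sr.iloc[0] is the explicit form)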

sr[[1,2,0]]  # sr[[position1, position2, ...]]
"""
b    3
c    4
a    2
dtype: int64
"""

sr['d']
# Output: 5

# a Series can be combined with a scalar
sr+2
"""
a    4
b    5
c    6
d    7
dtype: int64
"""

# two Series of the same size (same length) can also be combined; values are paired by label
sr + sr
"""
a     4
b     6
c     8
d    10
dtype: int64
"""

# Slicing
sr[0:2]  # also half-open: the start is included, the end is not
"""
a    2
b    3
dtype: int64
"""

# Series also supports numpy universal functions (ufuncs)
np.abs(sr)
"""
a    2
b    3
c    4
d    5
dtype: int64
"""

# supports filtering with a Boolean index
sr[sr>3]
"""
c    4
d    5
dtype: int64
"""

sr>3
"""
a    False
b    False
c     True
d     True
dtype: bool
"""

# Series supports dict-like features (labels)
# create a Series from a dict
sr = pd.Series({"a":1,"b":2})
sr 
"""
a    1
b    2
dtype: int64
# the dict keys become the labels
"""
sr["a"]
# Output: 1
sr[0]
# Output: 1  # positional fallback; recent pandas warns here, sr.iloc[0] is the explicit form

# check whether a string is one of the labels of the Series
"a" in sr
# Output: True

for i in sr:
    print(i)
"""
Printed:
1
2

# a for loop iterates over the values of the Series, not its labels; this is where it differs from a dict
"""

# get the index and the values of the Series separately
sr.index  # get the index
# Output: Index(['a', 'b'], dtype='object')  # an Index object, which behaves much like an array but is immutable
sr.index[0]
# Output: 'a'

sr.values  # get the values of the Series
# Output: array([1, 2], dtype=int64)

# label indexing
sr['a']
# Output: 1
sr[['a','b']] # fancy indexing works with labels too
"""
a    1
b    2
dtype: int64
"""

sr = pd.Series([1,2,3,4,5,6],index=['a','b','c','d','e','f'])
sr
"""
a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64
"""
sr[['a','c']]
"""
a    1
c    3
dtype: int64
"""
sr['a':'c']  # slicing by label: both ends are included
"""
a    1
b    2
c    3
dtype: int64
"""

Series integer-index pitfalls:

pandas objects with an integer index are easy to get wrong, for example:

import pandas as pd
import numpy as np

sr = pd.Series(np.arange(10))
sr
"""
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int32
# the integer index above was generated automatically
"""

sr2 = sr[5:].copy()
sr2
"""
5    5
6    6
7    7
8    8
9    9
dtype: int32
# the index is still made of integers, but it does not start at 0
"""
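sr2[5]  # here 5 is interpreted as a label, not as a position
# Output: 5

# sr2[-1]  # raises KeyError: with an integer index, the key in [] is always interpreted as a label

# Solution: loc and iloc
sr2.loc[5]  # loc interprets the key in [] as a label
# Output: 5
sr2.iloc[4] # iloc interprets the key in [] as a position
# Output: 9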
sr2.iloc[0:3]
"""
5    5
6    6
7    7
"""
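# so with an integer index, always use loc or iloc to make the meaning explicit

If the index is of integer type, indexing with an integer in [] is always label-based.
Solution: the loc attribute (interprets the key as a label) and the iloc attribute (interprets the key as a position).

Series: data alignment

When pandas performs an operation on two Series objects, it first aligns them by index and then computes.

Example code:

# Series: data alignment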
import pandas as pd

sr1 = pd.Series([12,23,34],index=["c","a","d"])
sr2 = pd.Series([11,20,10],index=["d","c","a"])
sr1 + sr2
"""
a    33    # 23+10
c    32    # 12+20
d    45    # 34+11
dtype: int64
# the data is aligned by label
"""
# when pandas operates on two Series objects, it aligns them by index before computing

# Note: a pandas index may contain duplicates, but we should avoid duplicate labels
pd.Series([1,2],index=["a","a"])  
"""
a    1
a    2
dtype: int64
"""

# when the two pandas objects have different lengths
sr3 = pd.Series([12,23,34],index=["c","a","d"])
sr4 = pd.Series([11,20,10,21],index=["d","c","a","b"])
sr3+sr4
"""
a    33.0
b     NaN
c    32.0
d    45.0
dtype: float64
# pandas treats NaN as a missing value
"""

sr5 = pd.Series([12,23,34],index=["c","a","d"])
sr6 = pd.Series([11,20,10],index=["b","c","a"])
sr5+sr6
"""
a    33.0
b     NaN
c    32.0
d     NaN
dtype: float64
"""
# to make index "b" come out as 11 and index "d" as 34 (instead of NaN), use the methods add, sub, mul, div (add, subtract, multiply, divide), e.g. sr5.add(sr6,fill_value=0)
sr5.add(sr6)
"""
a    33.0
b     NaN
c    32.0
d     NaN
dtype: float64
# without fill_value, sr5.add(sr6) has the same effect as sr5+sr6
"""

sr5.add(sr6,fill_value=0)  # fill_value: if a label exists in one Series but not in the other, the missing value is taken to be fill_value before the operation
"""
a    33.0
b    11.0
c    32.0
d    34.0
dtype: float64
"""
