Common functions for working with numpy arrays

Date: 2019-11-06

Tags: numpy, arrays, common functions


where

The where function converts an index mask into index positions:

indices = where(mask)
indices

=> (array([11, 12, 13, 14]),)

x[indices] # this indexing is equivalent to the fancy indexing x[mask]

=> array([ 5.5, 6. , 6.5, 7. ])
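The variables x and mask are not defined in this section. The sketch below assumes one possible definition that reproduces the output shown above, and checks that indexing with the positions returned by where selects the same elements as indexing with the mask itself.

```python
import numpy as np

# Assumed data: x and mask are hypothetical stand-ins chosen to match the output above.
x = np.arange(0, 10, 0.5)        # 0.0, 0.5, ..., 9.5
mask = (x > 5) & (x < 7.5)       # boolean index mask

indices = np.where(mask)         # a tuple holding one index array per dimension
selected = x[indices]            # same elements as x[mask]

print(indices[0])                # positions where the mask is True
print(selected)
```

Because where returns a tuple with one array per dimension, the positions for a 1-D array are in indices[0].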

diag

The diag function extracts the diagonal of an array, or a subdiagonal when given an offset:

diag(A)

=> array([ 0, 11, 22, 33, 44])

diag(A, -1)

=> array([10, 21, 32, 43])
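The matrix A is used throughout but is not constructed in this section. A minimal sketch, assuming A is the 5x5 array whose entry in row m and column n is 10*m + n (an assumption that matches every output shown for A here):

```python
import numpy as np

# Assumed construction of A; the section itself never shows how A is built.
A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])

main_diag = np.diag(A)           # main diagonal
sub_diag = np.diag(A, -1)        # first diagonal below the main one
super_diag = np.diag(A, 1)       # first diagonal above the main one

# Given a 1-D array, diag goes the other way and builds a diagonal matrix.
D = np.diag([1, 2, 3])
```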

take

The take function is similar to fancy indexing:

v2 = arange(-3,3)
v2

=> array([-3, -2, -1, 0, 1, 2])

row_indices = [1, 3, 5]
v2[row_indices] # fancy indexing

=> array([-2, 0, 2])

v2.take(row_indices)

=> array([-2, 0, 2])

But take also works on lists and other objects:

take([-3, -2, -1, 0, 1, 2], row_indices)

=> array([-2, 0, 2])

choose

The choose function builds a new array by picking elements from several arrays:

which = [1, 0, 1, 0]
choices = [[-2,-2,-2,-2], [5,5,5,5]]
choose(which, choices)

=> array([ 5, -2, 5, -2])
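The selection rule is easier to see with choice arrays whose elements differ: position i of the result is taken from choices[which[i]][i]. The values below are illustrative and not from the original example.

```python
import numpy as np

which = [1, 0, 1, 0]
choices = [[0, 1, 2, 3],         # array number 0
           [10, 11, 12, 13]]     # array number 1

picked = np.choose(which, choices)

# The same selection written as fancy indexing: row which[i], column i.
same = np.array(choices)[which, np.arange(4)]
```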

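Linear algebra

Vectorizing code is the key to writing efficient numerical calculations with Python/Numpy. That means expressing as much of a program as possible in terms of matrix and vector operations, such as matrix multiplication.

Scalar-array operations

We can use the usual arithmetic operators to add, subtract, multiply, and divide arrays by scalars.

v1 = arange(0, 5)
v1 * 2

=> array([0, 2, 4, 6, 8])

v1 + 2

=> array([2, 3, 4, 5, 6])

A * 2, A + 2

=> (array([[ 0,  2,  4,  6,  8],
           [20, 22, 24, 26, 28],
           [40, 42, 44, 46, 48],
           [60, 62, 64, 66, 68],
           [80, 82, 84, 86, 88]]),
    array([[ 2,  3,  4,  5,  6],
           [12, 13, 14, 15, 16],
           [22, 23, 24, 25, 26],
           [32, 33, 34, 35, 36],
           [42, 43, 44, 45, 46]]))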

Element-wise array-array operations

When we add, subtract, multiply, or divide arrays with each other, the default behaviour is element-wise:

A * A # element-wise multiplication

=> array([[   0,    1,    4,    9,   16],
          [ 100,  121,  144,  169,  196],
          [ 400,  441,  484,  529,  576],
          [ 900,  961, 1024, 1089, 1156],
          [1600, 1681, 1764, 1849, 1936]])

v1 * v1

=> array([ 0, 1, 4, 9, 16])

A.shape, v1.shape

=> ((5, 5), (5,))

A * v1

=> array([[  0,   1,   4,   9,  16],
          [  0,  11,  24,  39,  56],
          [  0,  21,  44,  69,  96],
          [  0,  31,  64,  99, 136],
          [  0,  41,  84, 129, 176]])
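The last result, A * v1, relies on broadcasting: the shape (5,) vector is stretched across every row of the (5, 5) array. A short sketch of that rule, again assuming the construction of A used for the outputs above:

```python
import numpy as np

# Assumed definitions matching the outputs shown above.
A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
v1 = np.arange(0, 5)

by_column = A * v1                  # column j of A is multiplied by v1[j]
by_row = A * v1[:, np.newaxis]      # row i of A is multiplied by v1[i]

print(by_column[1])                 # row 1 of A times [0, 1, 2, 3, 4]
print(by_row[2])                    # row 2 of A times v1[2] == 2
```

Turning v1 into a (5, 1) column changes which axis is stretched, which is why the two results differ.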

Matrix algebra

What about matrix multiplication? There are two ways.

1. Use the dot function, which performs matrix-matrix, matrix-vector, or inner (scalar) product multiplication depending on its arguments:

dot(A, A)

=> array([[ 300,  310,  320,  330,  340],
          [1300, 1360, 1420, 1480, 1540],
          [2300, 2410, 2520, 2630, 2740],
          [3300, 3460, 3620, 3780, 3940],
          [4300, 4510, 4720, 4930, 5140]])

dot(A, v1)

=> array([ 30, 130, 230, 330, 430])

dot(v1, v1)

=> 30

2. Cast the array objects to the matrix type.

M = matrix(A)
v = matrix(v1).T # make it a column vector
v

=> matrix([[0],
           [1],
           [2],
           [3],
           [4]])

M * M

=> matrix([[ 300,  310,  320,  330,  340],
           [1300, 1360, 1420, 1480, 1540],
           [2300, 2410, 2520, 2630, 2740],
           [3300, 3460, 3620, 3780, 3940],
           [4300, 4510, 4720, 4930, 5140]])

M * v

=> matrix([[ 30],
           [130],
           [230],
           [330],
           [430]])

# inner product
v.T * v

=> matrix([[30]])

# with matrix objects, standard matrix algebra applies
v + M*v

=> matrix([[ 30],
           [131],
           [232],
           [333],
           [434]])

If we try to add, subtract, or multiply objects with incompatible shapes, we get an error:

v = matrix([1,2,3,4,5,6]).T
shape(M), shape(v)

=> ((5, 5), (6, 1))

M * v

=> Traceback (most recent call last):
     File "<ipython-input-9-995fb48ad0cc>", line 1, in <module>
       M * v
     File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/numpy/matrixlib/defmatrix.py", line 341, in __mul__
       return N.dot(self, asmatrix(other))
   ValueError: shapes (5,5) and (6,1) not aligned: 5 (dim 1) != 6 (dim 0)

See also the related functions inner, outer, cross, kron, and tensordot. Try, for example, help(kron).
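On Python 3.5 and later there is a third option that the text above does not cover: the @ operator performs matrix multiplication directly on ordinary arrays, and the NumPy documentation now recommends regular arrays over the matrix class. A minimal sketch, assuming the same A and v1 as in the outputs above:

```python
import numpy as np

# Assumed definitions matching the outputs shown above.
A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
v1 = np.arange(0, 5)

mat_mat = A @ A          # same result as np.dot(A, A)
mat_vec = A @ v1         # same result as np.dot(A, v1)
inner = v1 @ v1          # inner product of two 1-D arrays

# Two of the related functions mentioned above.
outer = np.outer(v1, v1)                                  # 5x5 outer product
kron = np.kron(np.eye(2, dtype=int), [[1, 2], [3, 4]])    # 4x4 block-diagonal matrix
```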

Array/matrix transformations

Earlier we used .T to transpose v. The transpose function does the same thing.

Let's look at some other transformation functions:

C = matrix([[1j, 2j], [3j, 4j]])
C

=> matrix([[ 0.+1.j,  0.+2.j],
           [ 0.+3.j,  0.+4.j]])

Complex conjugate:

conjugate(C)

=> matrix([[ 0.-1.j, 0.-2.j],
           [ 0.-3.j, 0.-4.j]])

Hermitian conjugate (conjugate transpose):

C.H

=> matrix([[ 0.-1.j, 0.-3.j],
           [ 0.-2.j, 0.-4.j]])

The real and imag functions extract the real and imaginary parts of complex-valued arrays:

real(C) # same as: C.real

=> matrix([[ 0., 0.],
           [ 0., 0.]])

imag(C) # same as: C.imag

=> matrix([[ 1., 2.],
           [ 3., 4.]])

The angle and abs functions give the complex argument (phase angle) and the absolute value:

angle(C+1) # heads up MATLAB users, angle is used instead of arg

=> array([[ 0.78539816, 1.10714872],
          [ 1.24904577, 1.32581766]])

abs(C)

=> matrix([[ 1., 2.],
           [ 3., 4.]])

Matrix computations

Matrix inverse

from scipy.linalg import *
inv(C) # equivalent to C.I

=> matrix([[ 0.+2.j , 0.-1.j ],
           [ 0.-1.5j, 0.+0.5j]])

C.I * C

=> matrix([[ 1.00000000e+00+0.j, 4.44089210e-16+0.j],
           [ 0.00000000e+00+0.j, 1.00000000e+00+0.j]])

Determinant

linalg.det(C)

=> (2.0000000000000004+0j)

linalg.det(C.I)

=> (0.50000000000000011+0j)
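The same inverse and determinant can be computed on a plain array with numpy.linalg, avoiding both the star import and the matrix class. This is a sketch of an alternative, not the approach used above. Results are compared with a tolerance because, as the outputs above show, floating-point round-off leaves tiny residuals.

```python
import numpy as np

C = np.array([[1j, 2j], [3j, 4j]])

C_inv = np.linalg.inv(C)
det_C = np.linalg.det(C)            # analytically (1j*4j) - (2j*3j) = 2
det_C_inv = np.linalg.det(C_inv)    # analytically 1 / det(C) = 0.5

# Compare with a tolerance instead of testing for exact equality.
print(np.allclose(C_inv @ C, np.eye(2)))
print(np.isclose(det_C * det_C_inv, 1.0))
```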

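Data processing

Storing a dataset in a Numpy array makes it easy to compute statistics. To get a feel for this, let's use numpy to work with the Stockholm temperature data.

# reminder, the temperature dataset is stored in the data variable:
shape(data)

=> (77431, 7)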

Mean

# the temperature data is in column 3
mean(data[:,3])

=> 6.1971096847515925

The daily mean temperature in Stockholm over the last 200 years has been about 6.2 C.

Standard deviation and variance

std(data[:,3]), var(data[:,3])

=> (8.2822716213405663, 68.596023209663286)

Minimum and maximum

# lowest daily average temperature
data[:,3].min()

=> -25.800000000000001

# highest daily average temperature
data[:,3].max()

=> 28.300000000000001

Sum, product, and trace

d = arange(0, 10)
d

=> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# sum up all elements
sum(d)

=> 45

# product of all elements
prod(d+1)

=> 3628800

# cumulative sum
cumsum(d)

=> array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45])

# cumulative product
cumprod(d+1)

=> array([ 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800])

# same as: diag(A).sum()
trace(A)

=> 110

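Computations on subsets of arrays

We can compute with subsets of the data in an array by using indexing, fancy indexing, and the other methods of extracting data from an array.

As an example, let's go back to the temperature dataset:

!head -n 3 stockholm_td_adj.dat

1800 1 1 -6.1 -6.1 -6.1 1
1800 1 2 -15.4 -15.4 -15.4 1
1800 1 3 -15.0 -15.0 -15.0 1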

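The format of the dataset is: year, month, day, daily average temperature, low, high, location.

If we are only interested in the average temperature of one particular month, say February, we can create an index mask and use it to select only the data for that month:

unique(data[:,1]) # the month column takes values from 1 to 12

=> array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.])

mask_feb = data[:,1] == 2
# the temperature data is in column 3
mean(data[mask_feb,3])

=> -3.2121095707366085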

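With these tools we have very powerful data processing capabilities at our disposal. For example, computing the average temperature for each month of the year takes only a few lines of code:

months = arange(1,13)
monthly_mean = [mean(data[data[:,1] == month, 3]) for month in months]

fig, ax = subplots()
ax.bar(months, monthly_mean)
ax.set_xlabel("Month")
ax.set_ylabel("Monthly avg. temp.");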
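The data file itself is not included here, so the masking pattern can be tried on a small synthetic array instead. The sketch below uses hypothetical values laid out as year, month, day, temperature, with two days per month whose temperatures are m and m + 2, so each monthly mean is m + 1.

```python
import numpy as np

# Hypothetical stand-in for the Stockholm data: year, month, day, temperature.
rows = []
for month in range(1, 13):
    rows.append([1800, month, 1, float(month)])
    rows.append([1800, month, 2, float(month) + 2.0])
data = np.array(rows)

# Boolean mask selecting only the February rows, then the mean of column 3.
mask_feb = data[:, 1] == 2
feb_mean = data[mask_feb, 3].mean()

# One mask per month gives the monthly means.
months = np.arange(1, 13)
monthly_mean = np.array([data[data[:, 1] == m, 3].mean() for m in months])
```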

Calculations with higher-dimensional data

When functions such as min and max are applied to a multidimensional array, we sometimes want the result for the whole array and sometimes for each row or column separately. The axis argument specifies which behaviour we get:

m = rand(3,3)
m

=> array([[ 0.09260423, 0.73349712, 0.43306604],
          [ 0.65890098, 0.4972126 , 0.83049668],
          [ 0.80428551, 0.0817173 , 0.57833117]])

# global max
m.max()

=> 0.83049668273782951

# max in each column
m.max(axis=0)

=> array([ 0.80428551, 0.73349712, 0.83049668])

# max in each row
m.max(axis=1)

=> array([ 0.73349712, 0.83049668, 0.80428551])
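Since rand gives different numbers on every run, a fixed integer array makes the effect of axis easier to verify by hand. The values below are illustrative only. The axis named is the one that is collapsed: axis=0 reduces over rows and leaves one value per column.

```python
import numpy as np

m = np.array([[1, 8, 3],
              [6, 5, 9],
              [7, 2, 4]])

global_max = m.max()          # over all elements
col_max = m.max(axis=0)       # collapse the rows: one value per column
row_max = m.max(axis=1)       # collapse the columns: one value per row
row_sum = m.sum(axis=1)       # other reductions take the same argument
```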

Reshaping and resizing

The shape of a Numpy array can be changed without copying the underlying data, which makes reshape a fast operation even for large arrays.

A

=> array([[ 0,  1,  2,  3,  4],
          [10, 11, 12, 13, 14],
          [20, 21, 22, 23, 24],
          [30, 31, 32, 33, 34],
          [40, 41, 42, 43, 44]])

n, m = A.shape
B = A.reshape((1,n*m))
B

=> array([[ 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44]])

B[0,0:5] = 5 # modify the array
B

=> array([[ 5, 5, 5, 5, 5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44]])

A # and the original variable is also changed. B is only a different view of the same data

=> array([[ 5,  5,  5,  5,  5],
          [10, 11, 12, 13, 14],
          [20, 21, 22, 23, 24],
          [30, 31, 32, 33, 34],
          [40, 41, 42, 43, 44]])

We can also use the flatten function to turn a higher-dimensional array into a vector. Unlike reshape, it makes a copy of the data.

B = A.flatten()
B

=> array([ 5, 5, 5, 5, 5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44])

B[0:5] = 10
B

=> array([10, 10, 10, 10, 10, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44])

A # now A has not changed, because B's data is a copy of A's, not referring to the same data

=> array([[ 5,  5,  5,  5,  5],
          [10, 11, 12, 13, 14],
          [20, 21, 22, 23, 24],
          [30, 31, 32, 33, 34],
          [40, 41, 42, 43, 44]])
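Whether two arrays share data can be checked directly with np.shares_memory rather than by modifying one and inspecting the other. A small sketch on an illustrative array follows. Note that reshape returns a view only when the memory layout allows it, as it does for the contiguous array here.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

B = A.reshape(1, 6)      # a view for a contiguous array like this one
F = A.flatten()          # always a copy
R = A.ravel()            # a view when possible, otherwise a copy

B[0, 0] = 100            # visible through A, because B shares A's memory
F[1] = -1                # not visible through A

print(np.shares_memory(A, B), np.shares_memory(A, F), np.shares_memory(A, R))
```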

Adding a new dimension: newaxis

With newaxis we can insert a new dimension into an array, for example to convert a vector into a column or a row matrix:

v = array([1,2,3])
shape(v)

=> (3,)

# make a column matrix of the vector v
v[:, newaxis]

=> array([[1],
          [2],
          [3]])

# column matrix
v[:,newaxis].shape

=> (3, 1)

# row matrix
v[newaxis,:].shape

=> (1, 3)
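A common use of newaxis is to combine a column and a row version of the same vector so that broadcasting produces a full table. The sketch below builds a multiplication table this way and compares it with np.outer.

```python
import numpy as np

v = np.array([1, 2, 3])

column = v[:, np.newaxis]      # shape (3, 1)
row = v[np.newaxis, :]         # shape (1, 3)
table = column * row           # broadcasts to shape (3, 3)

# np.newaxis is an alias for None, so v[:, None] is the same operation.
same_column = v[:, None]
```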

Stacking and repeating arrays

With the functions repeat, tile, vstack, hstack, and concatenate we can build larger vectors and matrices from smaller ones.

`tile` and `repeat`

a = array([[1, 2], [3, 4]])
# repeat each element 3 times
repeat(a, 3)

=> array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])

# tile the matrix 3 times
tile(a, 3)

=> array([[1, 2, 1, 2, 1, 2],
          [3, 4, 3, 4, 3, 4]])

`concatenate`

b = array([[5, 6]])
concatenate((a, b), axis=0)

=> array([[1, 2],
          [3, 4],
          [5, 6]])

concatenate((a, b.T), axis=1)

=> array([[1, 2, 5],
          [3, 4, 6]])

`hstack` and `vstack`

vstack((a,b))

=> array([[1, 2],
          [3, 4],
          [5, 6]])

hstack((a,b.T))

=> array([[1, 2, 5],
          [3, 4, 6]])
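Without an axis argument, repeat flattens its input, as the output above shows. Passing axis to repeat, or a tuple of repetition counts to tile, keeps the result two-dimensional. A brief sketch with the same a and b:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

rows_repeated = np.repeat(a, 2, axis=0)    # each row twice
cols_repeated = np.repeat(a, 2, axis=1)    # each column twice
tiled_down = np.tile(a, (2, 1))            # two copies stacked vertically

# vstack and hstack give the same results as concatenate along axis 0 and 1.
v_equal = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
h_equal = np.array_equal(np.hstack((a, b.T)), np.concatenate((a, b.T), axis=1))
```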

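Shallow copy and deep copy

To achieve high performance, assignments in Python usually do not copy the underlying objects. This is often loosely called a shallow copy, although strictly speaking B = A copies nothing: it only makes B a second name for the same array.

A = array([[1, 2], [3, 4]])
A

=> array([[1, 2],
          [3, 4]])

# now B is referring to the same array data as A
B = A
# changing B affects A
B[0,0] = 10
B

=> array([[10, 2],
          [ 3, 4]])

A

=> array([[10, 2],
          [ 3, 4]])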

If we want to avoid changing the original array in this way, we need to make a deep copy with the copy function:

B = copy(A)
# now, if we modify B, A is not affected
B[0,0] = -5
B

=> array([[-5, 2],
          [ 3, 4]])

A

=> array([[10, 2],
          [ 3, 4]])
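It can help to separate three cases: plain assignment (the same object), slicing (a new array object that shares the data), and copy (independent data). The sketch below checks each case, using illustrative variable names.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

B = A              # no copy at all: B is another name for the same object
V = A[0, :]        # a slice is a view: a new array object sharing A's data
C = A.copy()       # an independent copy of the data

V[0] = 10          # changes A[0, 0]
C[1, 1] = -5       # does not change A

print(B is A, np.shares_memory(A, V), np.shares_memory(A, C))
```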

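Iterating over array elements

Generally, we want to avoid iterating over the elements of arrays whenever we can, because iteration is much slower than vectorized operations.

Sometimes, however, iteration is unavoidable. In that case the Python for loop is the most convenient way to iterate over an array:

v = array([1,2,3,4])

for element in v:
    print(element)

=> 1
   2
   3
   4

M = array([[1,2], [3,4]])

for row in M:
    print("row", row)

    for element in row:
        print(element)

=> row [1 2]
   1
   2
   row [3 4]
   3
   4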

When we need to iterate over an array and modify its elements, we can use the enumerate function to get both the element and its index in the for loop:

for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)

    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)

        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2

=> row_idx 0 row [1 2]
   col_idx 0 element 1
   col_idx 1 element 2
   row_idx 1 row [3 4]
   col_idx 0 element 3
   col_idx 1 element 4

# each element in M is now squared
M

=> array([[ 1,  4],
          [ 9, 16]])
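For multidimensional arrays, np.ndenumerate yields the full index tuple together with each element, which avoids nesting one enumerate loop per dimension. The sketch below writes the squares into a separate array (an illustrative choice) and compares the result with the vectorized expression, which is the preferred form whenever one exists.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
squared = np.empty_like(M)

# ndenumerate yields ((row, col), value) pairs for every element.
for (row_idx, col_idx), element in np.ndenumerate(M):
    squared[row_idx, col_idx] = element ** 2

vectorized = M ** 2      # same result without an explicit loop
```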

Vectorizing functions

As mentioned earlier, for good performance we should avoid looping over the elements of our vectors and matrices wherever possible and use vectorized algorithms instead. The first step is to convert a scalar algorithm into one that accepts vector input:

def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0

Theta(array([-3,-2,-1,0,1,2,3]))

=> Traceback (most recent call last):
     File "<ipython-input-11-1f7d89baf696>", line 1, in <module>
       Theta(array([-3, -2, -1, 0, 1, 2, 3]))
     File "<ipython-input-10-fbb0379ab8cb>", line 2, in Theta
       if x >= 0:
   ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Clearly the Theta function is not vectorized, so it cannot handle vector input.

To get a vectorized version of Theta we can use the vectorize function:

Theta_vec = vectorize(Theta)
Theta_vec(array([-3,-2,-1,0,1,2,3]))

=> array([0, 0, 0, 1, 1, 1, 1])

We can also implement the function so that it accepts vector input from the start:

def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """
    return 1 * (x >= 0)

Theta(array([-3,-2,-1,0,1,2,3]))

=> array([0, 0, 0, 1, 1, 1, 1])

# still works for scalars as well
Theta(-1.2), Theta(2.6)

=> (0, 1)
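The two approaches differ in cost. According to the NumPy documentation, vectorize is provided mainly for convenience and is essentially a Python-level loop, whereas an expression built from array operations runs in compiled code. The sketch below puts both side by side, using np.where as one possible array-based formulation. The function names are illustrative.

```python
import numpy as np

def theta_scalar(x):
    """Scalar Heaviside step function: handles one number at a time."""
    return 1 if x >= 0 else 0

x = np.array([-3, -2, -1, 0, 1, 2, 3])

theta_loop = np.vectorize(theta_scalar)(x)    # convenient, but loops internally
theta_where = np.where(x >= 0, 1, 0)          # evaluated with array operations
```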

Using arrays in conditions

When an array is used in a condition, such as an if statement, we must reduce it to a single truth value with any or all:

M

=> array([[ 1,  4],
          [ 9, 16]])

if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")

=> at least one element in M is larger than 5

if (M > 5).all():
    print("all elements in M are larger than 5")
else:
    print("not all elements in M are larger than 5")

=> not all elements in M are larger than 5

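Type casting

Because Numpy arrays are statically typed, the type of an array cannot change once it has been created. We can, however, explicitly cast an array to another type with the astype function, which produces a new array (see also the similar asarray function):

M.dtype

=> dtype('int64')

M2 = M.astype(float)
M2

=> array([[  1.,   4.],
          [  9.,  16.]])

M2.dtype

=> dtype('float64')

M3 = M.astype(bool)
M3

=> array([[ True, True],
          [ True, True]], dtype=bool)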
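Two details are worth checking when casting. Converting floats to integers truncates toward zero rather than rounding, and asarray hands back the original object when no conversion is needed, while astype copies by default. A small sketch with illustrative values:

```python
import numpy as np

M = np.array([[1, 4], [9, 16]])

M_float = M.astype(float)                 # new float64 array
same = np.asarray(M)                      # no conversion needed: returns M itself
as_float = np.asarray(M, dtype=float)     # conversion needed: returns a new array

f = np.array([1.9, -1.9, 0.0])
f_int = f.astype(int)                     # truncates toward zero
f_bool = f.astype(bool)                   # nonzero values become True
```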
