First Step in Building Models: All the NumPy Basics You Need to Review

Date: 2019-11-17


Selected from NumPy; compiled by 機器之心 (Synced).

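NumPy is a scientific computing library that provides high-performance vectors, matrices and higher-dimensional data structures for Python. Its core runs as compiled code (C, with linear algebra routines that originate in Fortran libraries), so numerical computations expressed in terms of vectors and matrices perform very well. NumPy is essentially the foundation of the frameworks and packages that use Python for numerical computing, and libraries such as TensorFlow and PyTorch interoperate closely with its arrays. The most basic step in building a machine learning model is learning to express the computation with NumPy.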

The Basics

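NumPy's main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.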

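For example, the coordinates of a point in 3D space, [1, 2, 1], form an array with one axis. That axis has 3 elements, so we say it has a length of 3. The array below has 2 axes: the first has a length of 2 and the second a length of 3.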

[[ 1., 0., 0.],
[ 0., 1., 2.]]

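NumPy's array class is called ndarray, and it is also commonly referred to simply as array. Note that numpy.array is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less functionality. The most important attributes of an ndarray are: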

ndarray.ndim: the number of axes (dimensions) of the array.
ndarray.shape: the size of the array in each dimension. For a matrix with n rows and m columns, shape is (n, m).

>>> import numpy as np
>>> b = np.array([[1,2,3],[4,5,6]])
>>> b.shape
(2, 3)

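ndarray.size: the total number of elements in the array, equal to the product of the elements of shape. For a matrix, this is the number of rows times the number of columns.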

>>> b = np.array([[1,2,3],[4,5,6]])
>>> b.size
6

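ndarray.dtype: an object describing the type of the elements in the array. A dtype can be specified using standard Python types, and NumPy also provides types of its own such as numpy.int32, numpy.int16 and numpy.float64, where "int" and "float" indicate whether the data are integers or floating-point numbers, and "32", "16" and "64" give the number of bits used to store each element.
ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (= 64/8).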

>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>

Array Creation

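NumPy offers many ways to create arrays. For example, you can create an array from a regular Python list with the array function; the type of the resulting array is deduced from the type of the elements in the sequence.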

>>> import numpy as np
>>> a = np.array([2,3,4])
>>> a
array([2, 3, 4])
>>> a.dtype
dtype('int64')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')

A frequent error is calling array with multiple numeric arguments instead of passing a single list of numbers, written with "[]", as the argument.

>>> a = np.array(1,2,3,4)    # WRONG
>>> a = np.array([1,2,3,4])  # RIGHT

array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

>>> b = np.array([(1.5,2,3), (4,5,6)])
>>> b
array([[ 1.5,  2. ,  3. ],
       [ 4. ,  5. ,  6. ]])

The type of the array can also be specified explicitly at creation time:

>>> b = np.array([(1.5,2,3), (4,5,6)])
>>> c = np.array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])

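Often the elements of an array are initially unknown, but its size is known. NumPy therefore provides several functions that create arrays with initial placeholder content, which minimizes the need to grow arrays, an expensive operation.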

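The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array whose content is uninitialized: the values are arbitrary and depend on the state of the memory. By default, the dtype of the created array is float64.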

>>> np.zeros( (3,4) )
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> np.ones( (2,3,4), dtype=np.int16 )   # dtype can also be specified
array([[[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]],
       [[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]]], dtype=int16)
>>> np.empty( (2,3) )                    # uninitialized, output may vary
array([[  3.73603959e-262,   6.02658058e-154,   6.55490914e-260],
       [  5.30498948e-313,   3.14673309e-307,   1.00000000e+000]])

To create sequences of numbers, NumPy provides arange, a function analogous to range that returns an array.

>>> np.arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 )                 # it accepts float arguments
array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8])

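When arange is used with floating-point arguments, it is generally not possible to predict the number of elements obtained, because of finite floating-point precision. In that case it is usually better to use linspace, which takes the desired number of elements as an argument instead of the step.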

>>> from numpy import pi
>>> np.linspace( 0, 2, 9 )                 # 9 numbers from 0 to 2
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ,  1.25,  1.5 ,  1.75,  2.  ])
>>> x = np.linspace( 0, 2*pi, 100 )        # useful to evaluate function at lots of points
>>> f = np.sin(x)
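
The floating-point caveat can be sketched as follows. The values 1, 1.3 and 0.1 are chosen here purely for illustration and do not come from the original text.

```python
import numpy as np

# With a float step, the element count depends on how (stop - start) / step
# rounds, so the result may contain one more element than expected.
risky = np.arange(1, 1.3, 0.1)
print(len(risky), risky)

# linspace takes the element count directly, so the length is never in doubt.
safe = np.linspace(1, 1.2, 3)
print(len(safe), safe)
```

When the number of points matters more than the exact step, linspace is usually the safer choice.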

See also: array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, numpy.random.rand, numpy.random.randn, fromfunction, fromfile (these functions also create arrays and are worth exploring when time allows).
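
Several of the functions in this list have no example in the text. The following is a minimal sketch of a few of them; the variable names (template, grid and so on) are made up for illustration.

```python
import numpy as np

template = np.arange(6).reshape(2, 3)

zeros = np.zeros_like(template)   # same shape and dtype as template, filled with 0
ones = np.ones_like(template)     # same shape and dtype as template, filled with 1

# fromfunction calls the function once with index grids i (rows) and j (columns)
grid = np.fromfunction(lambda i, j: i + j, (2, 3), dtype=int)

uniform = np.random.rand(2, 3)    # uniform samples in [0, 1)
normal = np.random.randn(2, 3)    # samples from the standard normal distribution

print(zeros, ones, grid, sep="\n")
```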

Printing Arrays

When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:

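the last axis is printed from left to right,
the second-to-last axis is printed from top to bottom,
the remaining axes are also printed from top to bottom, with each slice separated from the next by an empty line.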

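As shown below, one-dimensional arrays are printed as rows, two-dimensional arrays as matrices and three-dimensional arrays as lists of matrices.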

>>> a = np.arange(6)                         # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4,3)           # 2d array
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2,3,4)         # 3d array
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]
 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

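The reshape function used above sets the number of rows and columns and arranges all elements according to the given dimensions; it is described in detail in a later section. If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners.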

>>> print(np.arange(10000))
[   0    1    2 ..., 9997 9998 9999]
>>>
>>> print(np.arange(10000).reshape(100,100))
[[   0    1    2 ...,   97   98   99]
 [ 100  101  102 ...,  197  198  199]
 [ 200  201  202 ...,  297  298  299]
 ...,
 [9700 9701 9702 ..., 9797 9798 9799]
 [9800 9801 9802 ..., 9897 9898 9899]
 [9900 9901 9902 ..., 9997 9998 9999]]

To force NumPy to print the entire array, change the printing options with set_printoptions.

>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)   # threshold must be a number; np.nan is rejected by current NumPy

Basic Operations

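Arithmetic operators on arrays apply elementwise, and a new array is created and filled with the result. As shown below, subtraction, squaring, scalar multiplication and comparison all act element by element.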

>>> a = np.array( [20,30,40,50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
>>> c = a-b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10*np.sin(a)
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
>>> a<35
array([ True, True, False, False])

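Unlike in many matrix languages, the product operator * (or the multiply function) operates elementwise on NumPy arrays. The matrix product can be performed using the dot function or method.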

>>> A = np.array( [[1,1],
...             [0,1]] )
>>> B = np.array( [[2,0],
...             [3,4]] )
>>> A*B                         # elementwise product
array([[2, 0],
       [0, 4]])
>>> A.dot(B)                    # matrix product
array([[5, 4],
       [3, 4]])
>>> np.dot(A, B)                # another matrix product
array([[5, 4],
       [3, 4]])
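
As a small supplementary sketch, the same two products can also be written with np.multiply and with the @ operator. The @ operator does not appear in the original text; it is assumed to be available (Python 3.5 or later with a reasonably recent NumPy).

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = np.multiply(A, B)   # same result as A * B
matrix_product = A @ B            # same result as A.dot(B) for 2-D arrays

print(elementwise)
print(matrix_product)
```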

Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

>>> a = np.ones((2,3), dtype=int)
>>> b = np.random.random((2,3))
>>> a *= 3
>>> a
array([[3, 3, 3],
       [3, 3, 3]])
>>> b += a
>>> b
array([[ 3.417022  ,  3.72032449,  3.00011437],
       [ 3.30233257,  3.14675589,  3.09233859]])
>>> a += b                  # b is not automatically converted to integer type
Traceback (most recent call last):
  ...
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

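When operating with arrays of different types, the type of the resulting array corresponds to the more general or more precise one (a behavior known as upcasting).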

>>> a = np.ones(3, dtype=np.int32)
>>> b = np.linspace(0,pi,3)
>>> b.dtype.name
'float64'
>>> c = a+b
>>> c
array([ 1.        ,  2.57079633,  4.14159265])
>>> c.dtype.name
'float64'
>>> d = np.exp(c*1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
       -0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

>>> a = np.random.random((2,3))
>>> a
array([[ 0.18626021,  0.34556073,  0.39676747],
       [ 0.53881673,  0.41919451,  0.6852195 ]])
>>> a.sum()
2.5718191614547998
>>> a.min()
0.1862602113776709
>>> a.max()
0.6852195003967595

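By default, these operations apply to the array as though it were a flat list of numbers, regardless of its shape. By specifying the axis parameter, however, you can apply an operation along a chosen axis. With axis=0 the operation runs down each column: for example, b.sum(axis=0) adds up the elements of each column of b and returns one sum per column.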

>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> b.sum(axis=0)                            # sum of each column
array([12, 15, 18, 21])
>>>
>>> b.min(axis=1)                            # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1)                         # cumulative sum along each row
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])

Indexing, Slicing and Iterating

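One-dimensional arrays can be indexed, sliced and iterated over, much like Python lists and tuples. Note that a[0:6:2] selects every second element from index 0 up to, but not including, index 6, that is, the elements at indices 0, 2 and 4.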

>>> a = np.arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000    # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000,     1, -1000,    27, -1000,   125,   216,   343,   512,   729])
>>> a[ : :-1]                                 # reversed a
array([  729,   512,   343,   216,   125, -1000,    27, -1000,     1, -1000])
>>> for i in a:
...     print(i**(1/3.))
...
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

>>> def f(x,y):
...     return 10*x+y
...
>>> b = np.fromfunction(f,(5,4),dtype=int)
>>> b
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1]                       # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[ : ,1]                        # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, : ]                      # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are treated as complete slices.

>>> b[-1]                                  # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])

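Because the second index is omitted above, b[i] returns the i-th row. The omitted axes can also be written out with ":", so b[i] is equivalent to b[i, :]. NumPy also allows you to write dots (...) to stand for as many colons as are needed to produce a complete indexing tuple.

For example, if x is an array with 5 axes, then

x[1,2,...] is equivalent to x[1,2,:,:,:],
x[...,3] is equivalent to x[:,:,:,:,3],
x[4,...,5,:] is equivalent to x[4,:,:,5,:].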

>>> c = np.array( [[[  0,  1,  2],               # a 3D array (two stacked 2D arrays)
...                 [ 10, 12, 13]],
...                [[100,101,102],
...                 [110,112,113]]])
>>> c.shape
(2, 2, 3)
>>> c[1,...]                                   # same as c[1,:,:] or c[1]
array([[100, 101, 102],
       [110, 112, 113]])
>>> c[...,2]                                   # same as c[:,:,2]
array([[  2,  13],
       [102, 113]])
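
The equivalences stated for a 5-axis array can be checked directly. This is a minimal sketch; the shape (2, 3, 4, 5, 6) and the concrete index values are assumptions chosen so that every index is in range.

```python
import numpy as np

# A 5-D array; the shape is arbitrary and only serves the illustration.
x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

same_1 = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
same_2 = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_3 = np.array_equal(x[1, ..., 4, :], x[1, :, :, 4, :])

print(same_1, same_2, same_3)
```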

Iterating over multidimensional arrays is done with respect to the first axis, so each pass of the loop below yields one b[i]:

>>> for row in b:
...     print(row)
...
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]

However, to perform an operation on each element of the array, you can use the flat attribute, which is an iterator over all the elements of the array:

>>> for element in b.flat:
...     print(element)
...
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43

Shape Manipulation

Changing the shape of an array

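An array has a shape given by the number of elements along each axis. It is usually represented as a tuple of integers, each giving the number of elements along the corresponding dimension.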

>>> a = np.floor(10*np.random.random((3,4)))
>>> a
array([[ 2.,  8.,  0.,  6.],
       [ 4.,  5.,  1.,  1.],
       [ 8.,  9.,  3.,  6.]])
>>> a.shape
(3, 4)

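The shape of an array can be changed in several ways. The following three commands all return a modified array and leave the original array unchanged. Of these, reshape is used very often in practice, because different operations require different array dimensions.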

>>> a.ravel()  # returns the array, flattened
array([ 2.,  8.,  0.,  6.,  4.,  5.,  1.,  1.,  8.,  9.,  3.,  6.])
>>> a.reshape(6,2)  # returns the array with a modified shape
array([[ 2.,  8.],
       [ 0.,  6.],
       [ 4.,  5.],
       [ 1.,  1.],
       [ 8.,  9.],
       [ 3.,  6.]])
>>> a.T  # returns the array, transposed
array([[ 2.,  4.,  8.],
       [ 8.,  5.,  9.],
       [ 0.,  1.,  3.],
       [ 6.,  1.,  6.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)

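Both ravel() and flatten() collapse a multidimensional array to one dimension. flatten() returns a new array, so modifying it does not affect the original array, whereas ravel() returns a view whenever possible, so modifying its result can affect the original array.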
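
The difference can be sketched as follows. The array a here is a small contiguous example made up for illustration; for non-contiguous inputs ravel() may have to return a copy instead of a view.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat_copy = a.flatten()   # always a copy
flat_view = a.ravel()     # a view here, because a is contiguous

flat_copy[0] = 100        # does not touch a
flat_view[1] = 200        # writes through to a

print(a)
print(np.shares_memory(a, flat_copy), np.shares_memory(a, flat_view))
```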

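Transposing a matrix swaps its row and column dimensions, reflecting every element across the main diagonal. In addition, reshape returns a new array with a modified shape, whereas the resize method, shown below, modifies the array itself.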

>>> a
array([[ 2.,  8.,  0.,  6.],
       [ 4.,  5.,  1.,  1.],
       [ 8.,  9.,  3.,  6.]])
>>> a.resize((2,6))
>>> a
array([[ 2.,  8.,  0.,  6.,  4.,  5.],
       [ 1.,  1.,  8.,  9.,  3.,  6.]])

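If a dimension is given as -1 in a reshaping operation, its size is calculated automatically. As shown below, a has 12 elements in total; once 3 rows are specified, -1 works out that 4 columns are needed to hold all the elements.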

>>> a.reshape(3,-1)
array([[ 2.,  8.,  0.,  6.],
       [ 4.,  5.,  1.,  1.],
       [ 8.,  9.,  3.,  6.]])

Stacking arrays

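Several arrays can be stacked together along different axes. As shown below, vstack joins two arrays along the first axis (vertically), whereas hstack joins them along the second axis (horizontally).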

>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8.,  8.],
       [ 0.,  0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1.,  8.],
       [ 0.,  4.]])
>>> np.vstack((a,b))
array([[ 8.,  8.],
       [ 0.,  0.],
       [ 1.,  8.],
       [ 0.,  4.]])
>>> np.hstack((a,b))
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])

The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D arrays.

>>> from numpy import newaxis
>>> np.column_stack((a,b))     # with 2D arrays
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b))     # returns a 2D array
array([[ 4., 3.],
       [ 2., 8.]])
>>> np.hstack((a,b))           # the result is different
array([ 4., 2., 3., 8.])
>>> a[:,newaxis]               # this allows to have a 2D columns vector
array([[ 4.],
       [ 2.]])
>>> np.column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4.,  3.],
       [ 2.,  8.]])
>>> np.hstack((a[:,newaxis],b[:,newaxis]))   # the result is the same
array([[ 4.,  3.],
       [ 2.,  8.]])

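Similarly, the function row_stack is equivalent to vstack. In general, for arrays with more than two dimensions, hstack stacks along the second axis and vstack stacks along the first axis, while concatenate goes further and joins arrays along any given axis, provided that the lengths along all other axes match. concatenate is used in many deep learning models, for example to stack weight matrices or to stack DenseNet feature maps.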
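
A minimal sketch of concatenate along a chosen axis, including the case where the other axes do not match. The shapes are made up for illustration.

```python
import numpy as np

a = np.zeros((2, 3, 4))
b = np.ones((2, 5, 4))        # differs from a only along axis 1

joined = np.concatenate((a, b), axis=1)
print(joined.shape)           # (2, 8, 4)

try:
    np.concatenate((a, b), axis=0)   # axis 1 has lengths 3 and 5, so this fails
except ValueError as err:
    print("cannot concatenate:", err)
```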

In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They also allow the use of range literals (":").

>>> np.r_[1:4,0,4]
array([1, 2, 3, 0, 4])

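When used with arrays as arguments, r_ and c_ behave like vstack and hstack by default, but, like concatenate, they accept an optional argument giving the axis along which to stack.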
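
A short sketch of this behavior for 2D inputs. The arrays are made up for illustration, and the comparison with vstack and hstack is only claimed for these 2D inputs.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.r_[a, b]               # joins along the first axis, like vstack here
cols = np.c_[a, b]               # joins along the last axis, like hstack here
along_last = np.r_['-1', a, b]   # a leading string selects the axis to join along

print(rows)
print(cols)
```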

Splitting arrays

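Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return or by specifying the columns after which the division should occur: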

>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9.,  5.,  6.,  3.,  6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 1.,  4.,  9.,  2.,  2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])
>>> np.hsplit(a,3)   # Split a into 3
[array([[ 9.,  5.,  6.,  3.],
       [ 1.,  4.,  9.,  2.]]), array([[ 6.,  8.,  0.,  7.],
       [ 2.,  1.,  0.,  6.]]), array([[ 9.,  7.,  2.,  7.],
       [ 2.,  2.,  4.,  0.]])]
>>> np.hsplit(a,(3,4))   # Split a after the third and the fourth column
[array([[ 9.,  5.,  6.],
       [ 1.,  4.,  9.]]), array([[ 3.],
       [ 2.]]), array([[ 6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])]

vsplit splits along the vertical axis, and array_split allows you to specify the axis along which to split.
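
A minimal sketch of both functions on a made-up 4 x 6 array. Note that array_split, unlike hsplit and vsplit, accepts a section count that does not divide the axis evenly.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

top, bottom = np.vsplit(a, 2)           # two blocks of 2 rows each
parts = np.array_split(a, 4, axis=1)    # 6 columns into 4 parts of uneven width

print(top.shape, bottom.shape)
print([p.shape for p in parts])
```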

Copies and Views

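When operating on and manipulating arrays, beginners often find it hard to tell whether data is copied into a new array or modified in place in the original. This matters a great deal for subsequent computations, so sometimes the contents must be copied into new memory rather than simply binding a new variable to the existing memory. There are three cases: no copy at all, shallow copy and deep copy.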

No copy at all

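Simple assignments make no copy of array objects or of their data. Below, a is assigned to b, and modifying b then also modifies a, because both names refer to the same object.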

>>> a = np.arange(12)
>>> b = a            # no new object is created
>>> b is a           # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4    # changes the shape of a
>>> a.shape
(3, 4)

Python passes mutable objects as references, so a function call does not change the object's identifier and makes no copy of the contents.

>>> def f(x):
...     print(id(x))
...
>>> id(a)                           # id is a unique identifier of an object
148293216
>>> f(a)
148293216

View or shallow copy

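Different array objects can share the same data. The view method creates a new array object that looks at the same data. Below, c and a have different identities, and changing the shape of one does not change the shape of the other. The two arrays share all their elements, however, so changing an element of one array also changes the corresponding element of the other.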

>>> c = a.view()
>>> c is a
False
>>> c.base is a                        # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c.shape = 2,6                      # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234                      # a's data changes
>>> a
array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])

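Slicing an array returns a view of it. Below, the array a is sliced into the sub-array s, so s is a view of a, and modifying the elements of s also modifies the corresponding elements of a.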

>>> s = a[ : , 1:3]     # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10           # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])

Deep copy

The copy method makes a complete copy of the array and its data. The two variables then refer to different array objects that share no data.

>>> d = a.copy()                          # a new array object with new data is created
>>> d is a
False
>>> d.base is a                           # d doesn't share anything with a
False
>>> d[0,0] = 9999
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])

A Deeper Look at NumPy

Broadcasting

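Broadcasting is a very important feature of NumPy: it extends operations to arrays of different shapes. It implicitly stretches the mismatched dimensions of one array to match those of the other operand so that the shapes become compatible. For example, adding a matrix of shape [3,2] to a matrix of shape [3,1] is legal; NumPy automatically expands the second matrix to the same shape.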

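To decide whether two shapes are compatible, NumPy compares their dimension sizes one by one, starting from the last dimension and working backwards. If the two corresponding sizes are equal, or if one of them (or both) is 1, the comparison continues until the leading dimension is reached; a dimension missing from the shorter shape is treated as 1. If neither condition holds, an error is raised.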

A broadcasting operation is shown below:

>>> a = np.array([1.0,2.0,3.0,4.0, 5.0, 6.0]).reshape(3,2)
>>> b = np.array([3.0])
>>> a * b
array([[  3.,   6.],
       [  9.,  12.],
       [ 15.,  18.]])
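
The [3,2] plus [3,1] case mentioned in the text, together with an incompatible pair of shapes, can be sketched as follows. The concrete values are made up for illustration.

```python
import numpy as np

a = np.arange(6.0).reshape(3, 2)          # shape (3, 2)
b = np.array([[10.0], [20.0], [30.0]])    # shape (3, 1)

total = a + b    # b is stretched along its length-1 axis to shape (3, 2)
print(total)

try:
    a + np.ones((2, 3))   # trailing sizes 2 and 3 differ and neither is 1
except ValueError as err:
    print("not broadcastable:", err)
```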

Advanced indexing

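NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as seen before, arrays can be indexed by arrays of integers and arrays of booleans.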

Indexing with arrays of indices

Below, the arrays i and j are used to index elements of the array a; the output has the same shape as the index array.

>>> a = np.arange(12)**2                       # the first 12 square numbers
>>> i = np.array( [ 1,1,3,8,5 ] )              # an array of indices
>>> a[i]                                       # the elements of a at the positions i
array([ 1,  1,  9, 64, 25])

>>> j = np.array( [ [ 3, 4], [ 9, 7 ] ] )      # a bidimensional array of indices
>>> a[j]                                       # the same shape as j
array([[ 9, 16],
       [81, 49]])

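When the indexed array is multidimensional, a single array of indices refers to its first dimension, and the result is arranged according to the shape of the index array. The following code shows this behavior: palette can be seen as a simple color palette, and each element of the array image is an index selecting the palette color for that pixel.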

>>> palette = np.array( [ [0,0,0],                # black
...                       [255,0,0],              # red
...                       [0,255,0],              # green
...                       [0,0,255],              # blue
...                       [255,255,255] ] )       # white
>>> image = np.array( [ [ 0, 1, 2, 0 ],           # each value corresponds to a color in the palette
...                     [ 0, 3, 4, 0 ]  ] )
>>> palette[image]                            # the (2,4,3) color image
array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],
       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])

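We can also give indices for more than one dimension. The index arrays for each dimension must have the same shape. Below, the arrays i and j supply the indices for the first and second dimensions of a respectively: a[i, j] takes one element from i and the corresponding element from j and uses the pair to select an element of a.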

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array( [ [0,1],                        # indices for the first dim of a
...                 [1,2] ] )
>>> j = np.array( [ [2,1],                        # indices for the second dim
...                 [3,3] ] )
>>>
>>> a[i,j]                                     # i and j must have equal shape
array([[ 2,  5],
       [ 7, 11]])
>>>
>>> a[i,2]
array([[ 2,  6],
       [ 6, 10]])
>>>
>>> a[:,j]                                     # i.e., a[ : , j]
array([[[ 2,  1],
        [ 3,  3]],
       [[ 6,  5],
        [ 7,  7]],
       [[10,  9],
        [11, 11]]])

Likewise, we can put i and j in a sequence (a tuple) and then do the indexing with it:

>>> l = (i, j)
>>> a[l]                                       # equivalent to a[i,j]
array([[ 2,  5],
       [ 7, 11]])

However, we cannot do this by putting i and j into an array, because the array would be interpreted as indexing the first dimension of a.

>>> s = np.array( [i,j] )
>>> a[s]                                       # not what we want
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 0 with size 3
>>>
>>> a[tuple(s)]                                # same as a[i,j]
array([[ 2,  5],
       [ 7, 11]])

Another common use of indexing with arrays is the search for the maximum value of time-dependent series:

>>> time = np.linspace(20, 145, 5)                 # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4)      # 4 time-dependent series
>>> time
array([  20.  ,   51.25,   82.5 ,  113.75,  145.  ])
>>> data
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849, -0.54402111, -0.99999021],
       [-0.53657292,  0.42016704,  0.99060736,  0.65028784],
       [-0.28790332, -0.96139749, -0.75098725,  0.14987721]])
>>>
>>> ind = data.argmax(axis=0)                  # index of the maxima for each series
>>> ind
array([2, 0, 3, 1])
>>>
>>> time_max = time[ind]                       # times corresponding to the maxima
>>>
>>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...
>>>
>>> time_max
array([  82.5 ,   20.  ,  113.75,   51.25])
>>> data_max
array([ 0.98935825,  0.84147098,  0.99060736,  0.6569866 ])
>>>
>>> np.all(data_max == data.max(axis=0))
True

You can also use indexing with arrays as a target to assign to:

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])

However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value.

>>> a = np.arange(5)
>>> a[[0,0,2]]=[1,2,3]
>>> a
array([2, 1, 3, 3, 4])

This is reasonable enough, but watch out if you use Python's += construct, as it may not do what you expect:

>>> a = np.arange(5)
>>> a[[0,0,2]]+=1
>>> a
array([1, 1, 3, 3, 4])

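Even though 0 occurs twice in the list of indices, the 0th element is only incremented once. This is because Python requires "a+=1" to be equivalent to "a = a + 1".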
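
If the intent really is to increment once per occurrence of an index, one option is the unbuffered ufunc method np.add.at. This function is not mentioned in the original text; the sketch below only illustrates the contrast with +=.

```python
import numpy as np

a = np.arange(5)
np.add.at(a, [0, 0, 2], 1)   # unbuffered: index 0 is incremented twice
print(a)                     # [2 1 3 3 4]
```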

Indexing with boolean arrays

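When we index arrays with arrays of integer indices, we provide the list of indices to pick. With boolean indices the approach is different: we explicitly choose which items in the array we want and which ones we do not.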

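The most natural way to use boolean indexing is with a boolean array that has the same shape as the original array. Below, the comparison yields True only where the element is greater than 4, and the resulting boolean array can be used as an index.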

>>> a = np.arange(12).reshape(3,4)
>>> b = a > 4
>>> b                                          # b is a boolean with a's shape
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])
>>> a[b]                                       # 1d array with the selected elements
array([ 5,  6,  7,  8,  9, 10, 11])

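This property is very useful in assignments. For example, the ReLU activation function passes a value through only when it is greater than 0, so ReLU can be implemented with this kind of masked assignment. The example below uses the same mechanism to set every element greater than 4 to 0.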

>>> a[b] = 0                                   # All elements of 'a' higher than 4 become 0
>>> a
array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])
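
The ReLU idea mentioned above can be sketched with a boolean mask. The function name relu and the sample input are hypothetical and not part of the original text; np.maximum(x, 0) would be an equivalent one-liner.

```python
import numpy as np

def relu(x):
    """Return a copy of x with every negative entry replaced by 0."""
    out = x.copy()
    out[out < 0] = 0      # the boolean mask selects the negative entries
    return out

x = np.array([[-2.0, 0.5],
              [3.0, -0.1]])
print(relu(x))
```

Copying first keeps the input unchanged; assigning through the mask on x itself would apply the activation in place.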

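The second way of indexing with booleans is more similar to integer indexing: for each dimension of the array, we give a 1D boolean array selecting the slices we want: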

>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True])             # first dim selection
>>> b2 = np.array([True,False,True,False])       # second dim selection
>>>
>>> a[b1,:]                                   # selecting rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[b1]                                     # same thing
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[:,b2]                                   # selecting columns
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
>>>
>>> a[b1,b2]                                  # a weird thing to do
array([ 4, 10])

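Note that the length of the 1D boolean array must match the length of the axis you want to slice. In the example above, b1 has length 3 and b2 has length 4, matching the first and second dimensions of a respectively.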

Linear Algebra

Simple array operations

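Only simple matrix operations are shown here; more specialized routines can be looked up in the API documentation as the need arises. The following demonstrates basic operations such as the transpose, the inverse, the identity matrix, the matrix product, the trace, solving a linear system and computing eigenvalues and eigenvectors: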

>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[ 1.  2.]
 [ 3.  4.]]

>>> a.transpose()
array([[ 1.,  3.],
       [ 2.,  4.]])

>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])

>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[ 1.,  0.],
       [ 0.,  1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])

>>> np.dot (j, j) # matrix product
array([[-1.,  0.],
       [ 0., -1.]])

>>> np.trace(u)  # trace
2.0

>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)
array([[-3.],
       [ 4.]])

>>> np.linalg.eig(j)
(array([ 0.+1.j,  0.-1.j]), array([[ 0.70710678+0.j        ,  0.70710678-0.j        ],
       [ 0.00000000-0.70710678j,  0.00000000+0.70710678j]]))

Parameters:
    square matrix
Returns
    The eigenvalues, each repeated according to its multiplicity.
    The normalized (unit "length") eigenvectors, such that the
    column ``v[:,i]`` is the eigenvector corresponding to the
    eigenvalue ``w[i]`` .
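
The description of the return values of eig can be checked numerically. This is a minimal sketch that reuses the matrix j from the example above; the names w and v follow the docstring excerpt.

```python
import numpy as np

j = np.array([[0.0, -1.0],
              [1.0, 0.0]])
w, v = np.linalg.eig(j)      # w: eigenvalues, v: eigenvectors stored as columns

# Each column v[:, i] should satisfy j @ v[:, i] == w[i] * v[:, i]
for i in range(len(w)):
    print(np.allclose(j @ v[:, i], w[i] * v[:, i]))
```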

Original documentation: docs.scipy.org/doc/numpy/u…