NumPy

See the Wikipedia article on NumPy.

NumPy

Type: module


Provides

  1. An array object of arbitrary homogeneous items
  2. Fast mathematical operations over arrays
  3. Linear Algebra, Fourier Transforms, Random Number Generation

How to use the documentation


Documentation is available in two forms: docstrings provided
with the code, and a loose standing reference guide, available from
the NumPy homepage (http://www.scipy.org).

We recommend exploring the docstrings using
IPython (http://ipython.scipy.org), an advanced Python shell with
TAB-completion and introspection capabilities.

For some objects, np.info(obj) may provide additional help (np.info() retrieves information about functions, classes, and modules). This is
particularly true if you see the line "Help on ufunc object:" at the top
of the help() page. Ufuncs are implemented in C, not Python, for speed.
The native Python help() does not know how to view their help, but our
np.info() function does.

To search for documents containing a keyword, do::

import numpy as np
np.lookfor('keyword')

General-purpose documents like a glossary and help on the basic concepts
of numpy are available under the doc sub-module::

from numpy import doc
help(doc)
Available subpackages
---------------------
doc
    Topical documentation on broadcasting, indexing, etc.
lib
    Basic functions used by several sub-packages.
random
    Core Random Tools
linalg
    Core Linear Algebra Tools
fft
    Core FFT routines
polynomial
    Polynomial tools
testing
    NumPy testing tools
f2py
    Fortran to Python Interface Generator.
distutils
    Enhancements to distutils with support for
    Fortran compilers support and more.

Utilities
---------
test
    Run numpy unittests
show_config
    Show numpy build configuration
dual
    Overwrite certain functions with high-performance Scipy tools
matlib
    Make everything matrices.
__version__
    NumPy version string

A few examples:

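import numpy as np
from numpy import doc   # needed for the help(doc) calls below
help(doc)

help(doc.creation)

doc.basics?   # IPython/Jupyter syntax: displays the docstring of doc.basics

help(np.lib)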

ndarray overview

Translated from the Quickstart tutorial
NumPy's main object is the homogeneous multidimensional array. In NumPy, dimensions are called axes, and the number of axes is called the rank.

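For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of rank 1. It has one axis, and that axis has a length of 3. The array

[[ 1., 0., 0.],
 [ 0., 1., 2.]]

has rank 2 (it is 2-dimensional). Its first dimension (axis) has a length of 2, and its second dimension has a length of 3.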

NumPy's array class is called ndarray. Its most important attributes are:

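  • ndarray.ndim: the number of axes (dimensions) of the array.
  • ndarray.shape: the dimensions of the array, given as a tuple of integers with the length of each dimension.
    For a matrix with n rows and m columns, shape is (n, m).
  • ndarray.size: the total number of elements of the array, equal to the product of the elements of shape.
  • ndarray.dtype: an object describing the type of the elements in the array.
  • ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8).
  • ndarray.data: the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.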

Some examples:

z = np.array([[ 0,  1,  2,  3,  4],
              [ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14]])
t = np.array([z, 2 * z + 1])
t
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[ 1,  3,  5,  7,  9],
        [11, 13, 15, 17, 19],
        [21, 23, 25, 27, 29]]])
print('z.ndim = ', z.ndim)
print('t.ndim = ', t.ndim)
z.ndim =  2
t.ndim =  3
print('z.shape = ',z.shape)
print('t.shape = ',t.shape)
z.shape =  (3, 5)
t.shape =  (2, 3, 5)
print('z.size = ',z.size)
print('t.size = ',t.size)
z.size =  15
t.size =  30
t.dtype.name
'int32'
t.itemsize
4
type(t)
numpy.ndarray
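The relations between these attributes can be checked directly. The `int32` shown above depends on the platform: it is the default integer type on Windows builds before NumPy 2.0, while most Linux and macOS builds default to `int64`. This minimal sketch therefore rebuilds the same `z` with an explicit `int32` dtype, which is an assumption made only so that `itemsize` does not depend on the platform.

```python
import numpy as np

# Same values as z above; the explicit dtype makes itemsize predictable
z = np.arange(15, dtype=np.int32).reshape(3, 5)
t = np.array([z, 2 * z + 1])

# size is the product of the entries of shape
assert z.size == np.prod(z.shape)          # 3 * 5 = 15
assert t.shape == (2, 3, 5) and t.size == 30

# itemsize is the number of bytes per element (bits / 8)
print(np.dtype(np.float64).itemsize)       # 8 (= 64 / 8)
print(np.dtype(np.int32).itemsize)         # 4 (= 32 / 8)
print(np.dtype(np.complex64).itemsize)     # 8 (= 64 / 8)

# nbytes ties the attributes together
assert t.nbytes == t.size * t.itemsize     # 30 * 4 = 120
```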

ndarray indexing

z
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
z[0]  # first row
array([0, 1, 2, 3, 4])
z[0, 2] # third element of the first row
2
t[0]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
t[0][2]
array([10, 11, 12, 13, 14])
t[0, 2]
array([10, 11, 12, 13, 14])
t[0, 2, 3]
13
t[0, :2, 2:4]
array([[2, 3],
       [7, 8]])
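A tuple index such as `t[0, 2]` selects the same data as the chained form `t[0][2]`, but does it in a single step. Inside the tuple, an integer removes an axis from the result and a slice keeps that axis. This small sketch checks both points on arrays that have the same values as `z` and `t` above.

```python
import numpy as np

z = np.arange(15).reshape(3, 5)    # same values as z above
t = np.array([z, 2 * z + 1])       # shape (2, 3, 5)

# Tuple indexing and chained indexing select the same row
assert np.array_equal(t[0, 2], t[0][2])    # [10 11 12 13 14]

# Integers drop an axis, slices keep one
print(t[0, 2, 3])            # 13: every axis indexed by an integer
print(t[0, :2, 2:4].shape)   # (2, 2): rows 0-1, columns 2-3 of t[0]
print(t[:, 1].shape)         # (2, 5): row 1 of both 3x5 blocks

# Negative indices count from the end of an axis
assert t[-1, -1, -1] == 29
```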

With Python lists, by contrast:

e = [1, 2, 3, 4]
p = [e, e]
p[0][0]
1
p[0,0]  # this syntax is invalid for lists
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-300-d527d1725556> in <module>()
----> 1 p[0,0]  # this syntax is invalid for lists


TypeError: list indices must be integers or slices, not tuple
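Converting the nested list with `np.array` produces an ndarray, and an ndarray accepts the tuple index that the list rejects. A minimal sketch:

```python
import numpy as np

e = [1, 2, 3, 4]
p = [e, e]            # nested Python list

# Lists only support chained indexing
assert p[0][0] == 1

# Converting to an ndarray allows a single tuple index
q = np.array(p)       # shape (2, 4)
assert q[0, 0] == 1
print(q[:, 1])        # second element of every row: [2 2]
```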

ndarray supports vectorized operations

Operations applied to every element

z
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
z.sum()  # sum of all elements
105
z.sum(axis = 0)    # sum along axis 0, i.e. column-wise sum: one value per column (a row vector)
array([15, 18, 21, 24, 27])
z.sum(axis = 1)   # sum along axis 1, i.e. row-wise sum: one value per row (a column vector)
array([10, 35, 60])
z.std()  # standard deviation of all elements
4.3204937989385739
z.std(axis = 0)
array([ 4.0824829,  4.0824829,  4.0824829,  4.0824829,  4.0824829])
z.cumsum()  # cumulative sum of all elements (flattened)
array([  0,   1,   3,   6,  10,  15,  21,  28,  36,  45,  55,  66,  78,
        91, 105], dtype=int32)
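The `axis` argument names the axis that is collapsed. `axis=0` runs down the rows and produces one result per column, and `axis=1` produces one result per row. This sketch compares the vectorized calls with explicit Python loops on the same `z`. It also checks that `std()` computes the population standard deviation by default (`ddof=0`).

```python
import numpy as np

z = np.arange(15).reshape(3, 5)    # same values as z above

# axis=0 collapses the rows: one result per column
col_sums = [sum(z[i, j] for i in range(z.shape[0])) for j in range(z.shape[1])]
assert z.sum(axis=0).tolist() == col_sums        # [15, 18, 21, 24, 27]

# axis=1 collapses the columns: one result per row
row_sums = [sum(row) for row in z.tolist()]
assert z.sum(axis=1).tolist() == row_sums        # [10, 35, 60]

# cumsum() without an axis works on the flattened array
assert z.cumsum()[-1] == z.sum() == 105

# std() divides by N (population standard deviation) by default
mean = z.mean()
manual_std = np.sqrt(((z - mean) ** 2).mean())
assert np.isclose(z.std(), manual_std)
```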
z * 2   # like multiplying a matrix by a scalar
array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])
z ** 2
array([[  0,   1,   4,   9,  16],
       [ 25,  36,  49,  64,  81],
       [100, 121, 144, 169, 196]], dtype=int32)
np.sqrt(z)
array([[ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ],
       [ 2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ],
       [ 3.16227766,  3.31662479,  3.46410162,  3.60555128,  3.74165739]])
y = np.arange(10)  # like Python's range, but returns an array
y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a = np.array([1, 2, 3, 6])
b = np.linspace(0, 2, 4)  # 4 evenly spaced values from 0 to 2, both endpoints included
c = a - b
c
array([ 1.        ,  1.33333333,  1.66666667,  4.        ])
# universal functions (ufuncs) operate on whole arrays
a = np.linspace(-np.pi, np.pi, 100) 
b = np.sin(a)
c = np.cos(a)
b = np.array([1,2,3,4])
a = np.array([4,5,6,7])
print('a + b = ', a + b)
print('a - b = ', a - b)
print('a * b = ', a * b)
print('a / b = ', a / b)
print('a // b = ', a // b)
print('a % b = ', a % b)
a + b =  [ 5  7  9 11]
a - b =  [3 3 3 3]
a * b =  [ 4 10 18 28]
a / b =  [ 4.    2.5   2.    1.75]
a // b =  [4 2 2 1]
a % b =  [0 1 0 3]
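For these integer arrays, `//` and `%` satisfy the same identity as Python integers, `a == (a // b) * b + a % b`, element by element. The `/` operator returns floats even when both inputs are integers. Here is a quick check with the same `a` and `b`:

```python
import numpy as np

b = np.array([1, 2, 3, 4])
a = np.array([4, 5, 6, 7])

# Floor division and remainder are consistent element by element
assert np.array_equal((a // b) * b + a % b, a)

# True division returns floats even for integer inputs
print((a / b).dtype)   # float64

# The vectorized result matches an explicit Python loop
loop = [x // y for x, y in zip(a.tolist(), b.tolist())]
assert (a // b).tolist() == loop   # [4, 2, 2, 1]
```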

For non-numeric arrays

a = np.array(list('python'))
a
array(['p', 'y', 't', 'h', 'o', 'n'],
      dtype='<U1')
b = np.array(list('numpy'))
b
array(['n', 'u', 'm', 'p', 'y'],
      dtype='<U1')
a + b
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-153-f96fb8f649b6> in <module>()
----> 1 a + b


TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
list(a) + list(b)
['p', 'y', 't', 'h', 'o', 'n', 'n', 'u', 'm', 'p', 'y']
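There are two ways to combine the character arrays and still get an ndarray. `np.concatenate` joins the two arrays end to end. For element-wise string concatenation of arrays with the same shape, the `np.char` module provides `np.char.add`. A minimal sketch:

```python
import numpy as np

a = np.array(list('python'))
b = np.array(list('numpy'))

# Join the arrays end to end (like list(a) + list(b), but the result is an ndarray)
joined = np.concatenate((a, b))
print(joined)

# Element-wise concatenation needs arrays of the same shape
first5 = a[:5]                 # 'p', 'y', 't', 'h', 'o'
pairs = np.char.add(first5, b)
print(pairs)                   # ['pn' 'yu' 'tm' 'hp' 'oy']
```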

Linear algebra

from numpy.random import rand
from numpy.linalg import solve, inv
a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
a.transpose()
array([[ 1. ,  3. ,  5. ],
       [ 2. ,  4. ,  9. ],
       [ 3. ,  6.7,  5. ]])
inv(a)
array([[-2.27683616,  0.96045198,  0.07909605],
       [ 1.04519774, -0.56497175,  0.1299435 ],
       [ 0.39548023,  0.05649718, -0.11299435]])
b =  np.array([3, 2, 1])
solve(a, b)  # solve the linear system ax = b
array([-4.83050847,  2.13559322,  1.18644068])
c = rand(3, 3)  # create a 3x3 random matrix
c
array([[ 0.98539238,  0.62602057,  0.63592577],
       [ 0.84697864,  0.86223698,  0.20982139],
       [ 0.15532627,  0.53992238,  0.65312854]])
np.dot(a, c)  # matrix multiplication
array([[  3.14532847,   3.97026167,   3.01495417],
       [  7.38477771,   8.94448958,   7.1230241 ],
       [ 13.32640097,  13.58984759,   8.33366406]])
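The results of `solve` and `inv` can be checked with a tolerance rather than by exact equality, because they are floating-point computations. This sketch uses the same `a` and `b`. As a general practice, calling `solve` is preferred to forming `inv(a)` explicitly, although both agree here.

```python
import numpy as np
from numpy.linalg import inv, solve

a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
b = np.array([3, 2, 1])

x = solve(a, b)

# x satisfies a @ x == b up to floating-point error
assert np.allclose(a @ x, b)

# inv(a) @ a is the identity matrix up to rounding
assert np.allclose(inv(a) @ a, np.eye(3))

# Both routes give the same solution here
assert np.allclose(x, inv(a) @ b)
print(x)
```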

Creating arrays

See np.doc.creation?
There are 5 general mechanisms for creating arrays:

  1. Conversion from other Python structures (e.g., lists, tuples)
  2. Intrinsic numpy array creation objects (e.g., arange, ones, zeros,
    etc.)
  3. Reading arrays from disk, either from standard or custom formats
  4. Creating arrays from raw bytes through the use of strings or buffers
  5. Use of special library functions (e.g., random)
import numpy as np
x = np.array([2,3,1,0])
x1 = np.array([[1,2.0],[0,0],(1+1j,3.)]) # note mix of tuple and lists, and types
x2 = np.array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]])

y = np.zeros((2, 3))
y1 = np.ones((2,3))
y2 = np.arange(10)
y3 = np.arange(2, 10, dtype=float)  # np.float was removed in NumPy 1.24; use the builtin float
y4 = np.arange(2, 10, 0.2)
y5 = np.linspace(1., 4., 6)  # 6 evenly spaced values from 1 to 4, both endpoints included

z = np.indices((3,3))

r = [x, x1, x2, y, y1, y2, y3, y4, y5, z]
s = 'x, x1, x2, y, y1, y2, y3, y4, y5, z'.split(', ')

for i in range(len(r)):
    print('%s =  ' % s[i])
    print('')
    print(r[i])
    print(75 * '=')
x =  

[2 3 1 0]
===========================================================================
x1 =  

[[ 1.+0.j  2.+0.j]
 [ 0.+0.j  0.+0.j]
 [ 1.+1.j  3.+0.j]]
===========================================================================
x2 =  

[[ 1.+0.j  2.+0.j]
 [ 0.+0.j  0.+0.j]
 [ 1.+1.j  3.+0.j]]
===========================================================================
y =  

[[ 0.  0.  0.]
 [ 0.  0.  0.]]
===========================================================================
y1 =  

[[ 1.  1.  1.]
 [ 1.  1.  1.]]
===========================================================================
y2 =  

[0 1 2 3 4 5 6 7 8 9]
===========================================================================
y3 =  

[ 2.  3.  4.  5.  6.  7.  8.  9.]
===========================================================================
y4 =  

[ 2.   2.2  2.4  2.6  2.8  3.   3.2  3.4  3.6  3.8  4.   4.2  4.4  4.6  4.8
  5.   5.2  5.4  5.6  5.8  6.   6.2  6.4  6.6  6.8  7.   7.2  7.4  7.6  7.8
  8.   8.2  8.4  8.6  8.8  9.   9.2  9.4  9.6  9.8]
===========================================================================
y5 =  

[ 1.   1.6  2.2  2.8  3.4  4. ]
===========================================================================
z =  

[[[0 0 0]
  [1 1 1]
  [2 2 2]]

 [[0 1 2]
  [0 1 2]
  [0 1 2]]]
===========================================================================
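The outputs of `y4` and `y5` show the difference between the two range functions. `arange` takes a step and excludes the stop value. `linspace` takes a number of points and includes the stop value by default. With a float step, rounding can make the length of an `arange` result hard to predict. A `linspace` call with an explicit count and `endpoint=False` gives the same values with a guaranteed length. A minimal sketch:

```python
import numpy as np

y4 = np.arange(2, 10, 0.2)     # step 0.2, stop value 10 excluded
y5 = np.linspace(1., 4., 6)    # 6 points, both endpoints included

print(len(y4))                 # 40 = (10 - 2) / 0.2
print(y5[1] - y5[0])           # (4 - 1) / (6 - 1) = 0.6, up to rounding

# Same values as y4, with the length fixed explicitly
y4_alt = np.linspace(2, 10, 40, endpoint=False)
assert np.allclose(y4, y4_alt)
```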

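Tip: the order parameter

order specifies the order in which elements are stored in memory: 'C' means C-like order (row-major), and 'F' means Fortran-like order (column-major).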

g = np.ones((2,3,4), dtype = 'i', order = 'C')  # `np.zeros()` takes the same arguments
g
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int32)
# takes another array as argument and returns an array of ones with the same `shape`
h = np.ones_like(g, dtype = 'float16', order = 'C')  # `np.zeros_like()` works the same way
h
array([[[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]],

       [[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]]], dtype=float16)
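The `order` setting is invisible in the printed values, but it does show up in the array's flags and strides. Strides are the number of bytes to step along each axis. This sketch uses `np.int32`, which matches the `dtype = 'i'` used above on common platforms; that equivalence is an assumption on our side.

```python
import numpy as np

shape = (2, 3, 4)
gc = np.ones(shape, dtype=np.int32, order='C')
gf = np.ones(shape, dtype=np.int32, order='F')

# Same values, different memory order
assert np.array_equal(gc, gf)
print(gc.flags['C_CONTIGUOUS'], gc.flags['F_CONTIGUOUS'])   # True False
print(gf.flags['C_CONTIGUOUS'], gf.flags['F_CONTIGUOUS'])   # False True

# Bytes to step to the next element along each axis (4-byte items)
print(gc.strides)   # (48, 16, 4): last axis varies fastest (row-major)
print(gf.strides)   # (4, 8, 24): first axis varies fastest (column-major)
```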

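Notes:

  1. An array is homogeneous in composition, length, and size along every dimension (it is rectangular, not ragged).
  2. An array allows only one data type (numpy.dtype) for all of its elements.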
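Both rules can be observed when creating an array from mixed Python data. NumPy picks one common dtype for all elements, and it rejects ragged nested lists when a numeric dtype is requested. A minimal sketch:

```python
import numpy as np

# Mixed ints and floats: everything is upcast to a common float type
m = np.array([1, 2.5, 3])
print(m.dtype)            # float64

# Mixing numbers and strings: everything becomes a string
u = np.array([1, 'a'])
print(u.dtype.kind)       # 'U' (Unicode string)

# Rows of different lengths cannot form a rectangular float array
try:
    np.array([[1, 2], [3]], dtype=float)
except ValueError as err:
    print('ragged input rejected:', err)
```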

NumPy dtype objects

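dtype   Description              Example
t       bit field                t4 (4 bits)
b       boolean                  b1 (True or False)
i       integer                  i8 (64 bits)
u       unsigned integer         u8 (64 bits)
f       floating point           f8 (64 bits)
c       complex floating point   c16 (128 bits)
O       object                   O (pointer to an object)
S, a    string                   S24 (24 characters)
U       Unicode                  U24 (24 Unicode characters)
V       other                    V12 (12-byte data block)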
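The type codes in the table can be passed to `np.dtype` directly, and the resulting object reports its kind and size. This sketch leaves out the bit-field code `t`, because as far as we know `np.dtype` does not accept it. Note that `U` stores 4 bytes per character, so `U24` occupies 96 bytes.

```python
import numpy as np

# Type codes from the table with their kind and size in bytes
for code in ['b1', 'i8', 'u8', 'f8', 'c16', 'S24', 'U24', 'V12', 'O']:
    dt = np.dtype(code)
    print(f'{code:>4} -> {dt.name:<12} kind={dt.kind!r} itemsize={dt.itemsize}')
```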

Structured arrays

Structured arrays let us use a different NumPy data type for each column (field).

np.info(np.dtype)
dtype()

dtype(obj, align=False, copy=False)

Create a data type object.

A numpy array is homogeneous, and contains elements described by a
dtype object. A dtype object can be constructed from different
combinations of fundamental numeric types.

Parameters
----------
obj
    Object to be converted to a data type object.
align : bool, optional
    Add padding to the fields to match what a C compiler would output
    for a similar C-struct. Can be ``True`` only if `obj` is a dictionary
    or a comma-separated string. If a struct dtype is being created,
    this also sets a sticky alignment flag ``isalignedstruct``.
copy : bool, optional
    Make a new copy of the data-type object. If ``False``, the result
    may just be a reference to a built-in data-type object.

See also
--------
result_type

Examples
--------
Using array-scalar type:

>>> np.dtype(np.int16)
dtype('int16')

Structured type, one field name 'f1', containing int16:

>>> np.dtype([('f1', np.int16)])
dtype([('f1', '<i2')])

Structured type, one field named 'f1', in itself containing a structured
type with one field:

>>> np.dtype([('f1', [('f1', np.int16)])])
dtype([('f1', [('f1', '<i2')])])

Structured type, two fields: the first field contains an unsigned int, the
second an int32:

>>> np.dtype([('f1', np.uint), ('f2', np.int32)])
dtype([('f1', '<u4'), ('f2', '<i4')])

Using array-protocol type strings:

>>> np.dtype([('a','f8'),('b','S10')])
dtype([('a', '<f8'), ('b', '|S10')])

Using comma-separated field formats.  The shape is (2,3):

>>> np.dtype("i4, (2,3)f8")
dtype([('f0', '<i4'), ('f1', '<f8', (2, 3))])

Using tuples.  ``int`` is a fixed type, 3 the field's shape.  ``void``
is a flexible type, here of size 10:

>>> np.dtype([('hello',(np.int32,3)),('world',np.void,10)])
dtype([('hello', '<i4', 3), ('world', '|V10')])

Subdivide ``int16`` into 2 ``int8``'s, called x and y.  0 and 1 are
the offsets in bytes:

>>> np.dtype((np.int16, {'x':(np.int8,0), 'y':(np.int8,1)}))
dtype(('<i2', [('x', '|i1'), ('y', '|i1')]))

Using dictionaries.  Two fields named 'gender' and 'age':

>>> np.dtype({'names':['gender','age'], 'formats':['S1',np.uint8]})
dtype([('gender', '|S1'), ('age', '|u1')])

Offsets in bytes, here 0 and 25:

>>> np.dtype({'surname':('S25',0),'age':(np.uint8,25)})
dtype([('surname', '|S25'), ('age', '|u1')])


Methods:

  newbyteorder  --  newbyteorder(new_order='S')
dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', 2)])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)
s
array([(b'Smith', 45,  1.83000004, [0, 1]),
       (b'Jones', 53,  1.72000003, [2, 2])],
      dtype=[('Name', 'S10'), ('Age', '<i4'), ('Height', '<f4'), ('Children/Pets', '<i4', (2,))])
s['Name']
array([b'Smith', b'Jones'],
      dtype='|S10')
s['Age']
array([45, 53])
s["Height"].mean()
1.7750001
s[1]
(b'Jones', 53,  1.72000003, [2, 2])
s[1]['Age']
53
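Each field of a structured array behaves like a column. Vectorized operations act on a whole field at once, and `np.sort` can order the records by a field name. This sketch reuses the same `dt` and records. The age increment is purely illustrative.

```python
import numpy as np

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', 2)])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)

print(s.dtype.names)              # ('Name', 'Age', 'Height', 'Children/Pets')
print(s['Children/Pets'].shape)   # (2, 2): one length-2 sub-array per record

# Vectorized update of a single field (illustrative)
s['Age'] += 1
print(s['Age'])                   # [46 54]

# Sort the records by one field
by_height = np.sort(s, order='Height')
print(by_height['Name'])          # [b'Jones' b'Smith']
```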

Vectorizing code

r = np.array([[1,2,3],[2,3,4],[3,4,5],[4,5,6]])
s = np.array([[2,3,4],[3,4,5],[4,5,6],[6,7,8]])

Simple arithmetic

r + s
array([[ 3,  5,  7],
       [ 5,  7,  9],
       [ 7,  9, 11],
       [10, 12, 14]])
r * s
array([[ 2,  6, 12],
       [ 6, 12, 20],
       [12, 20, 30],
       [24, 35, 48]])
r % s
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]], dtype=int32)
s // r
array([[2, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]], dtype=int32)
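Each of these operators works element by element. For comparison, this sketch writes out the equivalent double loop for `r * s` in pure Python. The vectorized expression produces the same result without the explicit loop.

```python
import numpy as np

r = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])
s = np.array([[2, 3, 4], [3, 4, 5], [4, 5, 6], [6, 7, 8]])

# Explicit double loop over rows and columns
loop = [[r[i, j] * s[i, j] for j in range(r.shape[1])]
        for i in range(r.shape[0])]

# The vectorized expression computes the same element-wise product
assert (r * s).tolist() == loop
assert np.array_equal(s // r, [[2, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]])
```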

Broadcasting is supported

For more on broadcasting, see http://www.cnblogs.com/lyon2014/p/4696989.html

r
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])
2 * r + 3
array([[ 5,  7,  9],
       [ 7,  9, 11],
       [ 9, 11, 13],
       [11, 13, 15]])
f = np.array([9,8,7])
f
array([9, 8, 7])
r + f
array([[10, 10, 10],
       [11, 11, 11],
       [12, 12, 12],
       [13, 13, 13]])
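Broadcasting compares shapes starting from the trailing axis. Two dimensions are compatible when they are equal or when one of them is 1. `r` has shape (4, 3) and `f` has shape (3,), so `f` is added to each of the four rows. This sketch also shows a case that fails and one way to fix it. The array `g` is a hypothetical example, not part of the original.

```python
import numpy as np

r = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])   # shape (4, 3)
f = np.array([9, 8, 7])                                       # shape (3,)
g = np.array([10, 20, 30, 40])                                # shape (4,), hypothetical

print((r + f).shape)        # (4, 3): f is added to every row

# (4, 3) and (4,) are incompatible: trailing sizes 3 and 4 differ
try:
    r + g
except ValueError as err:
    print('broadcast error:', err)

# Reshaping g into a column (4, 1) adds one value per row
col = g[:, np.newaxis]      # shape (4, 1)
print(r + col)
```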
# r.T is the transpose, the same as r.transpose()
np.shape(r.T)
(3, 4)
def f(x):
    return 3 * x + 5
f(r.T)
array([[ 8, 11, 14, 17],
       [11, 14, 17, 20],
       [14, 17, 20, 23]])
np.sin(r)
array([[ 0.84147098,  0.90929743,  0.14112001],
       [ 0.90929743,  0.14112001, -0.7568025 ],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.7568025 , -0.95892427, -0.2794155 ]])
np.sin(np.pi)
1.2246467991473532e-16

ufunc

http://docs.scipy.org/doc/numpy/reference/ufuncs.html
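Every ufunc, such as `np.add` or `np.sin`, is an object with methods such as `reduce`, `accumulate` and `outer`. The `sum()` and `cumsum()` calls used earlier correspond to `np.add.reduce` and `np.add.accumulate`. Ufunc results are floating-point where the inputs are, so they should be compared with a tolerance. For example, `np.sin(np.pi)` above is about 1.2e-16, not exactly 0. A minimal sketch:

```python
import numpy as np

z = np.arange(15).reshape(3, 5)

print(np.add.nin, np.sin.nin)   # 2 1: number of inputs of each ufunc

# reduce and accumulate generalize sum and cumsum
assert np.add.reduce(z, axis=0).tolist() == z.sum(axis=0).tolist()
assert np.array_equal(np.add.accumulate(z.ravel()), z.cumsum())

# outer applies the ufunc to every pair of elements
table = np.multiply.outer(np.arange(1, 4), np.arange(1, 4))
print(table)                    # 3x3 multiplication table

# Compare floating-point results with a tolerance
assert np.isclose(np.sin(np.pi), 0.0)
assert np.sin(np.pi) != 0.0
```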

Memory Layout

x = np.random.standard_normal((5, 10000000))
y = 2 * x + 3  # linear equation y = a * x + b
C = np.array((x, y), order='C')
F = np.array((x, y), order='F')
x = 0.0; y = 0.0  # memory clean-up
C[:2].round(2)
array([[[ 0.67,  0.29,  1.54, ...,  0.07,  2.64, -0.65],
        [ 0.4 , -0.63,  1.43, ...,  1.11,  0.93, -0.52],
        [-0.41,  2.23, -1.16, ..., -1.66,  0.07,  0.21],
        [ 1.46,  1.22,  0.2 , ..., -0.56,  2.36, -1.65],
        [-0.39,  1.73, -0.24, ..., -1.45,  0.43, -0.41]],

       [[ 4.34,  3.58,  6.08, ...,  3.15,  8.28,  1.69],
        [ 3.79,  1.73,  5.86, ...,  5.22,  4.87,  1.97],
        [ 2.17,  7.46,  0.67, ..., -0.32,  3.15,  3.42],
        [ 5.93,  5.44,  3.4 , ...,  1.89,  7.72, -0.3 ],
        [ 2.22,  6.46,  2.51, ...,  0.1 ,  3.85,  2.18]]])
%timeit C.sum()
135 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit F.sum()
134 ms ± 499 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

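When summing all array elements, the two memory layouts perform about the same. The following cases, however, show a significant difference.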

%timeit C[0].sum(axis=0)
128 ms ± 894 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit C[0].sum(axis=1)
66.5 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit F.sum(axis=0)
1.06 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit F.sum(axis=1)
2.12 s ± 35.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
F = 0.0; C = 0.0  # memory clean-up

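From the above we can see:
Operations on a few large vectors perform better than operations on many small vectors.
The elements of each large vector are stored in adjacent memory locations, which explains the relative performance advantage.
However, all of these operations are much slower on the F-ordered array than on the C-ordered one.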

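Choosing an appropriate memory layout can therefore speed up code considerably. In the measurements above, C[0] holds only half the data of F. Even allowing for that, the sums over the C-ordered data run several times to more than ten times faster.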
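To repeat the comparison at a smaller scale, this sketch builds the two layouts of the same data with `np.ascontiguousarray` and `np.asfortranarray` and times each reduction with the standard `timeit` module. The array size and the seeded `default_rng` generator are assumptions chosen to keep memory use modest. No particular timings are claimed, because they depend on the hardware and the NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 1_000_000))   # smaller than the example above

C = np.ascontiguousarray(x)   # row-major: each row is one contiguous block
F = np.asfortranarray(x)      # column-major: each column is one contiguous block

# Same values, so the reductions agree regardless of layout
assert np.allclose(C.sum(axis=1), F.sum(axis=1))

# Time the same reductions for both layouts (results vary by machine)
for name, arr in [('C', C), ('F', F)]:
    for axis in (0, 1):
        t = timeit.timeit(lambda: arr.sum(axis=axis), number=5)
        print(f'{name}-order, sum(axis={axis}): {t / 5 * 1e3:.1f} ms')
```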

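Conclusion:

  1. Basic data types (integers, floats, strings) provide the primitive data types.
  2. Standard data structures (tuples, lists, dictionaries, sets) provide a range of operations on datasets.
  3. Arrays (the numpy.ndarray class) enable vectorized operations, which make code more concise, more convenient, and faster.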
