See Wikipedia: NumPy
NumPy
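Type: module

Provides
  1. An array object of arbitrary homogeneous items
  2. Fast mathematical operations over arrays
  3. Linear Algebra, Fourier Transforms, Random Number Generation

How to use the documentation
----------------------------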
Documentation is available in two forms: docstrings provided
with the code, and a loose standing reference guide, available from
the NumPy homepage (http://www.scipy.org).
We recommend exploring the docstrings using
IPython (http://ipython.scipy.org), an advanced Python shell with
TAB-completion and introspection capabilities.
For some objects, np.info(obj) may provide additional help (it retrieves
information about functions, classes and modules). This is
particularly true if you see the line "Help on ufunc object:" at the top
of the help() page. Ufuncs are implemented in C, not Python, for speed.
The native Python help() does not know how to view their help, but our
np.info() function does.
To search for documents containing a keyword, do:
import numpy as np
np.lookfor('keyword')
General-purpose documents like a glossary and help on the basic concepts
of numpy are available under the doc sub-module:
from numpy import doc
help(doc)
Available subpackages
---------------------
doc
    Topical documentation on broadcasting, indexing, etc.
lib
    Basic functions used by several sub-packages.
random
    Core Random Tools
linalg
    Core Linear Algebra Tools
fft
    Core FFT routines
polynomial
    Polynomial tools
testing
    NumPy testing tools
f2py
    Fortran to Python Interface Generator.
distutils
    Enhancements to distutils with support for
    Fortran compilers support and more.

Utilities
---------
test
    Run numpy unittests
show_config
    Show numpy build configuration
dual
    Overwrite certain functions with high-performance Scipy tools
matlib
    Make everything matrices.
__version__
    NumPy version string
A few examples:
import numpy as np
from numpy import doc
help(doc)
help(doc.creation)
doc.basics?
help(np.lib)
ndarray
(Translated from the NumPy Quickstart tutorial.)
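NumPy's main object is the homogeneous multidimensional array. In NumPy, dimensions are called axes, and the number of axes is called the rank.

For example, the coordinates of a point in 3-D space, [1, 2, 1], form a one-dimensional array: it has one axis, and that axis has a length of 3. The array
[[ 1., 0., 0.],
 [ 0., 1., 2.]]
has rank 2 (it is 2-dimensional). Its first dimension (axis) has a length of 2, and its second dimension has a length of 3.

NumPy's array class is called ndarray.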
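ndarray.ndim: the number of axes (dimensions) of the array.
ndarray.shape: the dimensions of the array, a tuple of integers giving the length of each dimension. For a matrix with n rows and m columns, shape is (n, m).
ndarray.size: the total number of elements of the array, equal to the product of the elements of shape.
ndarray.dtype: an object describing the type of the elements in the array.
ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8).
ndarray.data: the buffer containing the actual elements of the array. Normally, we won't need to use this attribute because we will access the elements in an array using indexing facilities.

Some examples: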
z = np.array([[ 0,  1,  2,  3,  4],
              [ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14]])
t = np.array([z, 2 * z + 1])
t
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[ 1,  3,  5,  7,  9],
        [11, 13, 15, 17, 19],
        [21, 23, 25, 27, 29]]])
print('z.ndim = ', z.ndim)
print('t.ndim = ', t.ndim)
z.ndim =  2
t.ndim =  3
print('z.shape = ', z.shape)
print('t.shape = ', t.shape)
z.shape =  (3, 5)
t.shape =  (2, 3, 5)
print('z.size = ', z.size)
print('t.size = ', t.size)
z.size =  15
t.size =  30
t.dtype.name
'int32'
t.itemsize
4
type(t)
numpy.ndarray
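As a quick sanity check of the attribute definitions above, the sketch below verifies that size is the product of shape, that ndim is the length of shape, and that itemsize is the element width in bytes. It also uses nbytes (the total buffer size, size × itemsize), a standard ndarray attribute not listed above. The dtype is pinned to np.int32 because the default integer type of an array built from Python ints is platform dependent (int32 on Windows with NumPy 1.x, which matches the outputs above; int64 on most Linux and macOS builds).

```python
import numpy as np

# Same z as above, but with an explicit dtype so the result does not depend
# on the platform's default integer type
z = np.arange(15, dtype=np.int32).reshape(3, 5)
t = np.array([z, 2 * z + 1])

# size is the product of the entries of shape
assert t.size == np.prod(t.shape) == 30
# ndim is the number of entries in shape
assert t.ndim == len(t.shape) == 3
# itemsize is the element width in bytes: 32 bits / 8 = 4
assert t.itemsize == 4
# nbytes is the total buffer size: size * itemsize
print(t.nbytes)  # 120
```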
ndarray indexing

z
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
z[0]  # first row
array([0, 1, 2, 3, 4])
z[0, 2]  # third element of the first row
2
t[0]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
t[0][2]
array([10, 11, 12, 13, 14])
t[0, 2]
array([10, 11, 12, 13, 14])
t[0, 2, 3]
13
t[0, :2, 2:4]
array([[2, 3],
       [7, 8]])
e = [1, 2, 3, 4]
p = [e, e]
p[0][0]
1
p[0, 0]  # invalid: Python lists do not accept tuple indices
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-300-d527d1725556> in <module>()
----> 1 p[0, 0]  # invalid: Python lists do not accept tuple indices

TypeError: list indices must be integers or slices, not tuple
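A minimal sketch of the fix for the p[0, 0] error: converting the nested list to an ndarray with np.array (which copies the data) makes tuple indexing and whole-column selection work. The last lines rely on basic slicing returning a view, so writing through the slice modifies the array it came from.

```python
import numpy as np

e = [1, 2, 3, 4]
p = [e, e]            # nested Python list: only p[0][0] works

pa = np.array(p)      # a 2-D ndarray with shape (2, 4); the list data is copied
print(pa[0, 0])       # 1: tuple indexing works on ndarrays
print(pa[:, 1])       # [2 2]: a whole column in one step

# A basic slice is a view: assigning through it changes pa itself
col = pa[:, 1]
col[:] = 0
print(pa)
```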
ndarray supports vectorized operations

z
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
z.sum()  # sum of all elements
105
z.sum(axis=0)  # sum along axis 0, i.e. column-wise sum; the result is like a row vector
array([15, 18, 21, 24, 27])
z.sum(axis=1)  # sum along axis 1, i.e. row-wise sum; the result is like a column vector
array([10, 35, 60])
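To make the axis convention concrete, the sketch below recomputes both axis sums with plain Python loops. It also shows keepdims, an optional parameter of sum that is not used above and keeps the reduced axis with length 1.

```python
import numpy as np

z = np.arange(15).reshape(3, 5)

# axis=0 collapses the rows: one result per column, shape (5,)
col_sums = z.sum(axis=0)
# axis=1 collapses the columns: one result per row, shape (3,)
row_sums = z.sum(axis=1)

# The same results written as explicit loops over the indices
assert col_sums.tolist() == [sum(z[i, j] for i in range(3)) for j in range(5)]
assert row_sums.tolist() == [sum(z[i, j] for j in range(5)) for i in range(3)]

# keepdims=True keeps the reduced axis with length 1 (handy for broadcasting)
print(z.sum(axis=0, keepdims=True).shape)  # (1, 5)
print(z.sum(axis=1, keepdims=True).shape)  # (3, 1)
```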
z.std()  # standard deviation of all elements
4.3204937989385739
z.std(axis=0)
array([ 4.0824829, 4.0824829, 4.0824829, 4.0824829, 4.0824829])
z.cumsum()  # cumulative sum of all elements (in flattened order)
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105], dtype=int32)
z * 2  # like multiplying a matrix by a scalar
array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])
z ** 2
array([[  0,   1,   4,   9,  16],
       [ 25,  36,  49,  64,  81],
       [100, 121, 144, 169, 196]], dtype=int32)
np.sqrt(z)
array([[ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ],
       [ 2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ],
       [ 3.16227766,  3.31662479,  3.46410162,  3.60555128,  3.74165739]])
y = np.arange(10)  # like Python's range, but returns an array
y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a = np.array([1, 2, 3, 6])
b = np.linspace(0, 2, 4)  # 4 evenly spaced values from 0 to 2, both ends included
c = a - b
c
array([ 1.        ,  1.33333333,  1.66666667,  4.        ])
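A short sketch contrasting np.linspace (fixed number of points) with np.arange (fixed step). The endpoint parameter used here is part of linspace's signature but does not appear in the example above.

```python
import numpy as np

# linspace(start, stop, num) returns num points and, by default, includes stop,
# so the spacing is (stop - start) / (num - 1)
b = np.linspace(0, 2, 4)
step = (2 - 0) / (4 - 1)
assert np.allclose(b, [0, step, 2 * step, 2])

# With endpoint=False, stop is excluded and the spacing is (stop - start) / num
b_open = np.linspace(0, 2, 4, endpoint=False)

# arange fixes the step instead of the number of points. stop is normally
# excluded, but with a float step rounding can change the number of elements,
# so linspace is the safer choice for non-integer spacing
r = np.arange(0, 2, 0.5)
assert np.allclose(b_open, r)
```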
# universal functions (ufuncs) are applied element-wise
a = np.linspace(-np.pi, np.pi, 100)
b = np.sin(a)
c = np.cos(a)
b = np.array([1, 2, 3, 4])
a = np.array([4, 5, 6, 7])
print('a + b = ', a + b)
print('a - b = ', a - b)
print('a * b = ', a * b)
print('a / b = ', a / b)
print('a // b = ', a // b)
print('a % b = ', a % b)
a + b =  [ 5  7  9 11]
a - b =  [3 3 3 3]
a * b =  [ 4 10 18 28]
a / b =  [ 4.    2.5   2.    1.75]
a // b =  [4 2 2 1]
a % b =  [0 1 0 3]
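Floor division and modulo are linked by the identity a == (a // b) * b + a % b, element by element, exactly as for Python integers. The sketch below checks this identity and uses np.divmod, which returns both results at once.

```python
import numpy as np

a = np.array([4, 5, 6, 7])
b = np.array([1, 2, 3, 4])

q, r = a // b, a % b
# Element-wise, a == q * b + r
assert np.array_equal(q * b + r, a)

# np.divmod computes quotient and remainder together
q2, r2 = np.divmod(a, b)
assert np.array_equal(q2, q) and np.array_equal(r2, r)

# As in plain Python, // rounds toward minus infinity and % takes the sign of b
print(np.array([-7]) // 2, np.array([-7]) % 2)  # [-4] [1]
```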
a = np.array(list('python'))
a
array(['p', 'y', 't', 'h', 'o', 'n'], dtype='<U1')
b = np.array(list('numpy'))
b
array(['n', 'u', 'm', 'p', 'y'], dtype='<U1')
a + b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-153-f96fb8f649b6> in <module>()
----> 1 a + b

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
list(a) + list(b)
['p', 'y', 't', 'h', 'o', 'n', 'n', 'u', 'm', 'p', 'y']
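Since + on string arrays fails in the NumPy version used here, the sketch below shows two alternatives that stay within NumPy: np.concatenate joins the arrays end to end, and np.char.add concatenates strings element-wise. Element-wise concatenation needs matching (or broadcastable) shapes, so only the first five characters of a are paired with b.

```python
import numpy as np

a = np.array(list('python'))
b = np.array(list('numpy'))

# Join the two arrays end to end (the ndarray counterpart of list(a) + list(b))
joined = np.concatenate([a, b])

# Element-wise string concatenation of two arrays with the same shape
pairs = np.char.add(a[:5], b)
print(pairs)  # ['pn' 'yu' 'tm' 'hp' 'oy']
```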
from numpy.random import rand
from numpy.linalg import solve, inv
a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
a.transpose()
array([[ 1. ,  3. ,  5. ],
       [ 2. ,  4. ,  9. ],
       [ 3. ,  6.7,  5. ]])
inv(a)
array([[-2.27683616,  0.96045198,  0.07909605],
       [ 1.04519774, -0.56497175,  0.1299435 ],
       [ 0.39548023,  0.05649718, -0.11299435]])
b = np.array([3, 2, 1])
solve(a, b)  # solve the linear system ax = b
array([-4.83050847,  2.13559322,  1.18644068])
c = rand(3, 3)  # create a 3x3 random matrix
c
array([[ 0.98539238,  0.62602057,  0.63592577],
       [ 0.84697864,  0.86223698,  0.20982139],
       [ 0.15532627,  0.53992238,  0.65312854]])
np.dot(a, c)  # matrix multiplication
array([[  3.14532847,   3.97026167,   3.01495417],
       [  7.38477771,   8.94448958,   7.1230241 ],
       [ 13.32640097,  13.58984759,   8.33366406]])
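The results above can be checked numerically. This sketch confirms that solve() returns an x with a @ x ≈ b, that inv(a) @ b gives the same answer, and that the @ operator (Python 3.5+) matches np.dot for 2-D arrays. Comparisons use np.allclose because floating-point results agree only up to rounding.

```python
import numpy as np
from numpy.linalg import inv, solve

a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
b = np.array([3, 2, 1])

x = solve(a, b)
# The solution satisfies a @ x == b up to floating-point rounding
assert np.allclose(a @ x, b)
# inv(a) @ b gives the same x; solve() is usually preferred because it
# does not form the inverse explicitly
assert np.allclose(inv(a) @ b, x)
# a @ inv(a) is the identity matrix, again only up to rounding
assert np.allclose(a @ inv(a), np.eye(3))

# For 2-D arrays, a @ c is the same matrix product as np.dot(a, c)
c = np.random.rand(3, 3)
assert np.allclose(a @ c, np.dot(a, c))
```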
Array creation (see np.doc.creation? for reference)

There are 5 general mechanisms for creating arrays:

1) Conversion from other Python structures (e.g., lists, tuples)
2) Intrinsic numpy array creation objects (e.g., arange, ones, zeros, etc.)
3) Reading arrays from disk, either from standard or custom formats
4) Creating arrays from raw bytes through the use of strings or buffers
5) Use of special library functions (e.g., random)

import numpy as np
x = np.array([2, 3, 1, 0])
x1 = np.array([[1, 2.0], [0, 0], (1 + 1j, 3.)])  # note the mix of tuples and lists, and of types
x2 = np.array([[1.+0.j, 2.+0.j], [0.+0.j, 0.+0.j], [1.+1.j, 3.+0.j]])
y = np.zeros((2, 3))
y1 = np.ones((2, 3))
y2 = np.arange(10)
y3 = np.arange(2, 10, dtype=float)
y4 = np.arange(2, 10, 0.2)
y5 = np.linspace(1., 4., 6)  # 6 evenly spaced values from 1 to 4, both ends included
z = np.indices((3, 3))
r = [x, x1, x2, y, y1, y2, y3, y4, y5, z]
s = 'x, x1, x2, y, y1, y2, y3, y4, y5, z'.split(', ')
for i in range(len(r)):
    print('%s = ' % s[i])
    print('')
    print(r[i])
    print(75 * '=')
x = 

[2 3 1 0]
===========================================================================
x1 = 

[[ 1.+0.j  2.+0.j]
 [ 0.+0.j  0.+0.j]
 [ 1.+1.j  3.+0.j]]
===========================================================================
x2 = 

[[ 1.+0.j  2.+0.j]
 [ 0.+0.j  0.+0.j]
 [ 1.+1.j  3.+0.j]]
===========================================================================
y = 

[[ 0.  0.  0.]
 [ 0.  0.  0.]]
===========================================================================
y1 = 

[[ 1.  1.  1.]
 [ 1.  1.  1.]]
===========================================================================
y2 = 

[0 1 2 3 4 5 6 7 8 9]
===========================================================================
y3 = 

[ 2.  3.  4.  5.  6.  7.  8.  9.]
===========================================================================
y4 = 

[ 2.   2.2  2.4  2.6  2.8  3.   3.2  3.4  3.6  3.8  4.   4.2  4.4  4.6  4.8
  5.   5.2  5.4  5.6  5.8  6.   6.2  6.4  6.6  6.8  7.   7.2  7.4  7.6  7.8
  8.   8.2  8.4  8.6  8.8  9.   9.2  9.4  9.6  9.8]
===========================================================================
y5 = 

[ 1.   1.6  2.2  2.8  3.4  4. ]
===========================================================================
z = 

[[[0 0 0]
  [1 1 1]
  [2 2 2]]

 [[0 1 2]
  [0 1 2]
  [0 1 2]]]
===========================================================================
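The printed z = np.indices((3, 3)) is easier to read as two index grids, one per axis. The sketch below unpacks them and uses them to build an array from its indices; the formula 10 * i + j is an arbitrary illustration.

```python
import numpy as np

z = np.indices((3, 3))  # shape (2, 3, 3): one grid per axis
rows, cols = z
# rows[i, j] == i and cols[i, j] == j at every position
assert all(rows[i, j] == i and cols[i, j] == j for i in range(3) for j in range(3))

# Build an array from its index grids, here m[i, j] = 10 * i + j
m = 10 * rows + cols
print(m)
# [[ 0  1  2]
#  [10 11 12]
#  [20 21 22]]
```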
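Tips: about the order parameter: order specifies the order in which elements are stored in memory. 'C' means C-like order (row-major) and 'F' means Fortran-like order (column-major).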
g = np.ones((2, 3, 4), dtype='i', order='C')  # np.zeros() works the same way
g
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int32)
# ones_like takes another array and returns an array of ones with the same shape
h = np.ones_like(g, dtype='float16', order='C')  # there is also np.zeros_like()
h
array([[[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]],

       [[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]]], dtype=float16)
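One way to see the effect of order is through the strides attribute (the number of bytes to skip in memory to move one step along each axis) and the flags attribute. The sketch assumes that dtype 'i' maps to a 4-byte C int, which holds on common platforms.

```python
import numpy as np

c = np.ones((2, 3, 4), dtype='i', order='C')  # 'i' is a 4-byte C int here
f = np.ones((2, 3, 4), dtype='i', order='F')

# Row-major (C): the last axis is contiguous, so its stride is the itemsize
print(c.strides)  # (48, 16, 4)
# Column-major (F): the first axis is contiguous
print(f.strides)  # (4, 8, 24)

print(c.flags['C_CONTIGUOUS'], c.flags['F_CONTIGUOUS'])  # True False
# The values are identical; only the memory layout differs
assert np.array_equal(c, f)
```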
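NumPy arrays are homogeneous: every element has the same dtype. The basic dtype codes are:

dtype | Description | Example |
---|---|---|
t | Bit field | t4 (4 bits) |
b | Boolean | b1 (True or False) |
i | Integer | i8 (64 bits) |
u | Unsigned integer | u8 (64 bits) |
f | Floating point | f8 (64 bits) |
c | Complex floating point | c16 (128 bits) |
O | Object | O (pointer to an object) |
S, a | Byte string | S24 (24 characters) |
U | Unicode string | U24 (24 Unicode characters) |
V | Other (raw data) | V12 (12-byte data block) |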
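The type codes in the table can be inspected with np.dtype. The first column of the table is the kind character (dtype.kind); as a dtype string, a bare 'b' actually means int8, and the boolean type is written 'b1' or '?'. After S and U the number counts characters, and after V it counts bytes, rather than bits.

```python
import numpy as np

# dtype.kind is the type character from the table; itemsize is in bytes
examples = {
    'b1': 'bool',
    'i8': 'int64',
    'u8': 'uint64',
    'f8': 'float64',
    'c16': 'complex128',
}
for code, name in examples.items():
    dt = np.dtype(code)
    assert dt.name == name
    print(code, dt.kind, dt.itemsize)

# For S and U the number counts characters; U stores each one in 4 bytes
print(np.dtype('S24').itemsize)  # 24
print(np.dtype('U24').itemsize)  # 96
print(np.dtype('V12').itemsize)  # 12
```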
Structured arrays, however, allow us to use a different NumPy data type at least for each column.
np.info(np.dtype)
dtype()

dtype(obj, align=False, copy=False)

Create a data type object.

A numpy array is homogeneous, and contains elements described by a
dtype object. A dtype object can be constructed from different
combinations of fundamental numeric types.

Parameters
----------
obj
    Object to be converted to a data type object.
align : bool, optional
    Add padding to the fields to match what a C compiler would output
    for a similar C-struct. Can be ``True`` only if `obj` is a dictionary
    or a comma-separated string. If a struct dtype is being created,
    this also sets a sticky alignment flag ``isalignedstruct``.
copy : bool, optional
    Make a new copy of the data-type object. If ``False``, the result
    may just be a reference to a built-in data-type object.

See also
--------
result_type

Examples
--------
Using array-scalar type:

>>> np.dtype(np.int16)
dtype('int16')

Structured type, one field name 'f1', containing int16:

>>> np.dtype([('f1', np.int16)])
dtype([('f1', '<i2')])

Structured type, one field named 'f1', in itself containing a structured
type with one field:

>>> np.dtype([('f1', [('f1', np.int16)])])
dtype([('f1', [('f1', '<i2')])])

Structured type, two fields: the first field contains an unsigned int, the
second an int32:

>>> np.dtype([('f1', np.uint), ('f2', np.int32)])
dtype([('f1', '<u4'), ('f2', '<i4')])

Using array-protocol type strings:

>>> np.dtype([('a','f8'),('b','S10')])
dtype([('a', '<f8'), ('b', '|S10')])

Using comma-separated field formats. The shape is (2,3):

>>> np.dtype("i4, (2,3)f8")
dtype([('f0', '<i4'), ('f1', '<f8', (2, 3))])

Using tuples. ``int`` is a fixed type, 3 the field's shape. ``void``
is a flexible type, here of size 10:

>>> np.dtype([('hello',(np.int,3)),('world',np.void,10)])
dtype([('hello', '<i4', 3), ('world', '|V10')])

Subdivide ``int16`` into 2 ``int8``'s, called x and y. 0 and 1 are
the offsets in bytes:

>>> np.dtype((np.int16, {'x':(np.int8,0), 'y':(np.int8,1)}))
dtype(('<i2', [('x', '|i1'), ('y', '|i1')]))

Using dictionaries. Two fields named 'gender' and 'age':

>>> np.dtype({'names':['gender','age'], 'formats':['S1',np.uint8]})
dtype([('gender', '|S1'), ('age', '|u1')])

Offsets in bytes, here 0 and 25:

>>> np.dtype({'surname':('S25',0),'age':(np.uint8,25)})
dtype([('surname', '|S25'), ('age', '|u1')])

Methods:

  newbyteorder  --  newbyteorder(new_order='S')
dt = np.dtype([('Name', 'S10'), ('Age', 'i4'), ('Height', 'f'),
               ('Children/Pets', 'i4', 2)])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)
s
array([(b'Smith', 45,  1.83000004, [0, 1]), (b'Jones', 53,  1.72000003, [2, 2])],
      dtype=[('Name', 'S10'), ('Age', '<i4'), ('Height', '<f4'), ('Children/Pets', '<i4', (2,))])
s['Name']
array([b'Smith', b'Jones'], dtype='|S10')
s['Age']
array([45, 53])
s["Height"].mean()
1.7750001
s[1]
(b'Jones', 53, 1.72000003, [2, 2])
s[1]['Age']
53
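Because each field acts like a column, structured arrays support column-style queries. The sketch rebuilds s from above and shows a boolean filter on one field, sorting by a named field via the order parameter of np.sort, and the extra trailing dimension of a sub-array field.

```python
import numpy as np

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'), ('Height', 'f'),
               ('Children/Pets', 'i4', 2)])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)

# A boolean mask built from one field selects whole records
older = s[s['Age'] > 50]
print(older['Name'])  # [b'Jones']

# np.sort can order the records by a named field
by_height = np.sort(s, order='Height')
print(by_height['Name'])  # [b'Jones' b'Smith']

# A sub-array field adds a trailing dimension: 2 records x 2 values
print(s['Children/Pets'].shape)  # (2, 2)
```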
r = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])
s = np.array([[2, 3, 4], [3, 4, 5], [4, 5, 6], [6, 7, 8]])
r + s
array([[ 3,  5,  7],
       [ 5,  7,  9],
       [ 7,  9, 11],
       [10, 12, 14]])
r * s
array([[ 2,  6, 12],
       [ 6, 12, 20],
       [12, 20, 30],
       [24, 35, 48]])
r % s
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]], dtype=int32)
s // r
array([[2, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]], dtype=int32)
For more details, see http://www.cnblogs.com/lyon2014/p/4696989.html
r
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])
2 * r + 3
array([[ 5,  7,  9],
       [ 7,  9, 11],
       [ 9, 11, 13],
       [11, 13, 15]])
f = np.array([9, 8, 7])
f
array([9, 8, 7])
r + f
array([[10, 10, 10],
       [11, 11, 11],
       [12, 12, 12],
       [13, 13, 13]])
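The r + f example relies on broadcasting: shapes are compared starting from the trailing axis, and two lengths are compatible when they are equal or one of them is 1. The sketch shows a compatible case, an incompatible one, and how adding an axis with np.newaxis fixes it; the array g is a hypothetical example.

```python
import numpy as np

r = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])  # shape (4, 3)

# (4, 3) and (3,) are compatible: f is repeated along axis 0 for every row
f = np.array([9, 8, 7])
assert (r + f).shape == (4, 3)

# (4, 3) and (4,) are not: the trailing lengths 3 and 4 differ
g = np.array([10, 20, 30, 40])
try:
    r + g
except ValueError as err:
    print('ValueError:', err)

# g[:, np.newaxis] has shape (4, 1), so each value is added to one whole row
print(r + g[:, np.newaxis])
```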
# r.T is the transpose of r (same as r.transpose())
np.shape(r.T)
(3, 4)
def f(x):
    return 3 * x + 5
f(r.T)
array([[ 8, 11, 14, 17],
       [11, 14, 17, 20],
       [14, 17, 20, 23]])
np.sin(r)
array([[ 0.84147098,  0.90929743,  0.14112001],
       [ 0.90929743,  0.14112001, -0.7568025 ],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.7568025 , -0.95892427, -0.2794155 ]])
np.sin(np.pi)
1.2246467991473532e-16
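np.sin(np.pi) returns about 1.22e-16 rather than 0 because np.pi is only the double closest to π, so floating-point results are better compared with a tolerance. A minimal sketch using np.isclose and np.allclose:

```python
import numpy as np

val = np.sin(np.pi)
print(val == 0)              # False: exact comparison fails
print(np.isclose(val, 0.0))  # True: equal within the default tolerance

# The same idea applied element-wise to a whole array
angles = np.array([0, np.pi / 2, np.pi])
assert np.allclose(np.sin(angles), [0, 1, 0])
```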
ufunc (universal functions): see http://docs.scipy.org/doc/numpy/reference/ufuncs.html
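Besides being called element-wise, binary ufuncs such as np.add and np.multiply provide methods like reduce, accumulate and outer. A short sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

print(isinstance(np.add, np.ufunc))  # True
print(np.add.reduce(x))              # 10, same as x.sum()
print(np.add.accumulate(x))          # [ 1  3  6 10], same as x.cumsum()
# outer applies the operation to every pair: a 3x3 multiplication table
print(np.multiply.outer(x[:3], x[:3]))
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]
```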
Memory layout: the following compares the same data stored in C order (row-major) and in Fortran order (column-major).
x = np.random.standard_normal((5, 10000000))
y = 2 * x + 3  # linear equation y = a * x + b
C = np.array((x, y), order='C')
F = np.array((x, y), order='F')
x = 0.0; y = 0.0  # memory clean-up
C[:2].round(2)
array([[[ 0.67,  0.29,  1.54, ...,  0.07,  2.64, -0.65],
        [ 0.4 , -0.63,  1.43, ...,  1.11,  0.93, -0.52],
        [-0.41,  2.23, -1.16, ..., -1.66,  0.07,  0.21],
        [ 1.46,  1.22,  0.2 , ..., -0.56,  2.36, -1.65],
        [-0.39,  1.73, -0.24, ..., -1.45,  0.43, -0.41]],

       [[ 4.34,  3.58,  6.08, ...,  3.15,  8.28,  1.69],
        [ 3.79,  1.73,  5.86, ...,  5.22,  4.87,  1.97],
        [ 2.17,  7.46,  0.67, ..., -0.32,  3.15,  3.42],
        [ 5.93,  5.44,  3.4 , ...,  1.89,  7.72, -0.3 ],
        [ 2.22,  6.46,  2.51, ...,  0.1 ,  3.85,  2.18]]])
%timeit C.sum()
135 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit F.sum()
134 ms ± 499 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
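When summing all elements, the two memory layouts show no significant difference. The picture changes, however, when summing along a single axis: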
%timeit C[0].sum(axis=0)
128 ms ± 894 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit C[0].sum(axis=1)
66.5 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit F.sum(axis=0)
1.06 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit F.sum(axis=1)
2.12 s ± 35.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
F = 0.0; C = 0.0 # memory clean-up
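From the results above:
Operations on a few large vectors perform better than operations on many small vectors: C[0].sum(axis=1) (5 sums over 10,000,000 elements each) takes about half the time of C[0].sum(axis=0) (10,000,000 sums over 5 elements each).
The elements of each large vector are stored in adjacent memory locations, which explains this relative performance advantage.
For the Fortran-ordered array F, however, both operations are much slower overall than in the C-ordered variant.
Choosing the appropriate memory layout can therefore speed up code execution considerably; in these timings the gap is roughly an order of magnitude or more (e.g. 2.12 s for F.sum(axis=1) versus 66.5 ms for C[0].sum(axis=1), keeping in mind that C[0] holds only half as many elements as F).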
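A scaled-down sketch of the experiment above, using a hypothetical 5 × 200,000 array so it runs quickly. It checks which slices are contiguous in each layout and times the same row sums on both; absolute timings depend on the machine, so only the two numbers relative to each other are meaningful.

```python
import timeit

import numpy as np

# Smaller stand-in for the arrays above (the size is an arbitrary choice)
a_c = np.random.standard_normal((5, 200_000))  # C order (the default)
a_f = np.asfortranarray(a_c)                   # same values, F order

# In C order each row is one contiguous block; in F order a row is strided
print(a_c[0].flags['C_CONTIGUOUS'], a_f[0].flags['C_CONTIGUOUS'])  # True False

# Rough timing of the same reduction on both layouts
for name, arr in [('C', a_c), ('F', a_f)]:
    seconds = timeit.timeit(lambda: arr.sum(axis=1), number=10)
    print(name, f'{seconds / 10 * 1e3:.2f} ms per call')
```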
Further reading: