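Learning NumPy

Preface

Lately the matrix computations in deep learning (DL) have been giving me a hard time, so I decided to sit down and learn numpy properly.

Main text

1. First, opening a txt file with numpy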

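import numpy
import numpy as np  # the later examples use the np alias

world_alcohol = numpy.genfromtxt("world_alcohol.txt",delimiter=",",dtype=str)
print(type(world_alcohol))
print(world_alcohol)
help(numpy.genfromtxt)  # help() prints the documentation itself; wrapping it in print() would also print None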

#
<class 'numpy.ndarray'>
[['Year' 'WHO region' 'Country' 'Beverage Types' 'Display Value']
 ['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ...
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]

The contents of the txt file are as follows:

Year,WHO region,Country,Beverage Types,Display Value
1986,Western Pacific,Viet Nam,Wine,0
1986,Americas,Uruguay,Other,0.5
1985,Africa,Cte d'Ivoire,Wine,1.62
1986,Americas,Colombia,Beer,4.27
1987,Americas,Saint Kitts and Nevis,Beer,1.98
1987,Americas,Guatemala,Other,0
1987,Africa,Mauritius,Wine,0.13
1985,Africa,Angola,Spirits,0.39
1986,Americas,Antigua and Barbuda,Spirits,1.55
1984,Africa,Nigeria,Other,6.1
1987,Africa,Botswana,Wine,0.2
1989,Americas,Guatemala,Beer,0.62

The delimiter parameter specifies the separator character; dtype=str means the values read from the file are stored as strings.

ndarray is the core data structure in numpy.
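To try the delimiter and dtype parameters without the file on disk, here is a minimal sketch that feeds a few rows through an in-memory buffer. The io.StringIO stand-in and the variable names sample and rows are assumptions for illustration; the rows themselves are copied from the file shown above.

```python
import io
import numpy as np

# A few rows copied from the file above; StringIO stands in for the real file.
sample = io.StringIO(
    "Year,WHO region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
)

# delimiter="," splits each line on commas; dtype=str keeps every field as text.
rows = np.genfromtxt(sample, delimiter=",", dtype=str)

print(type(rows))   # the result is an ndarray
print(rows.shape)   # header row plus two data rows, five fields each
print(rows[2, 4])   # last field of the Uruguay row, still a string
```

Because every field is stored as a string, numeric columns have to be converted (for example with astype) before doing arithmetic on them.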

2. ndarray

vector = numpy.array([5,10,15,20])
matrix = numpy.array([[5,10,15],[20,25,30],[35,40,45]])
print(vector)
print(matrix)

#
[ 5 10 15 20]
[[ 5 10 15]
 [20 25 30]
 [35 40 45]]

Next, look at the shape attribute

vector1 = numpy.array([[1,2,3,4]])
vector2 = numpy.array([1,2,3,4])
print(vector1.shape)
print(vector2.shape)
matrix = numpy.array([[5,10,15],[20,25,30]])
print(matrix.shape)

#
(1, 4)
(4,)
(2, 3)

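vector1 can be viewed as a 1×4 matrix; vector2 is a one-dimensional array of length 4. A 1-D array is neither a row nor a column by itself: functions such as np.dot treat it as whichever one the operation needs.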
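The difference between a 1-D array and an explicit row or column can be checked directly. This is a small sketch; the names v, row and col are chosen here for illustration only.

```python
import numpy as np

v = np.array([1, 2, 3, 4])     # 1-D array, shape (4,)
row = v.reshape(1, -1)         # explicit 2-D row, shape (1, 4)
col = v[:, np.newaxis]         # explicit 2-D column, shape (4, 1)

print(v.shape, row.shape, col.shape)
print(v.T.shape)               # transposing a 1-D array changes nothing
print(col.T.shape)             # a 2-D column transposes into a 2-D row
```

When an actual column vector is needed, adding the axis explicitly is safer than relying on how a 1-D array happens to be interpreted.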

Next, operations on ndarrays

a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([1,2,3])
c = np.array([[1,2,3]])
print(np.dot(a,b))
print('------------------')
print(np.dot(b,a))
print('------------------')
print(np.dot(c,a))
print('------------------')
print(np.dot(b,b))
print('------------------')
print(a*a)

#
[14 32 50]
------------------
[30 36 42]
------------------
[[30 36 42]]
------------------
14
------------------
[[ 1  4  9]
 [16 25 36]
 [49 64 81]]

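a is a 3×3 matrix, b is a one-dimensional array of length 3, and c is a 1×3 matrix.

np.dot(a,b) can be viewed as a 3×3 matrix multiplied by a 3×1 matrix; the result comes back as a 1-D array of length 3.

np.dot(b,a) can be viewed as a 1×3 matrix multiplied by a 3×3 matrix.

np.dot(c,a) is an ordinary matrix-matrix product, so the result keeps its 2-D shape (1, 3).

Note that np.dot(b,b) is the vector inner product: the elements are multiplied pairwise and the products are summed.

a*a is the element-wise product: each element is multiplied by the element at the same position.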
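The same products can also be written with the @ operator, which the original examples do not use. A short sketch with the same a and b, under the assumption of Python 3.5 or later:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([1, 2, 3])

matrix_vector = a @ b     # matrix product, same result as np.dot(a, b)
inner = b @ b             # inner product, same result as np.dot(b, b)
elementwise = a * a       # element-wise product, same as np.multiply(a, a)

print(matrix_vector)
print(inner)
print(elementwise)
```

Keeping @ for matrix products and * for element-wise products makes it harder to mix the two up.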

---------- divider ----------

numbers = numpy.array([1,2,3,'4'])
print(numbers)
numbers.dtype

#

['1' '2' '3' '4']

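NumPy arrays are homogeneous: every element shares a single dtype. If the input mixes types, all elements are converted to one common type that can hold them. Here the integers become strings because one element is the string '4'.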
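A quick way to watch this conversion is to compare dtypes for a few inputs. The sketch below checks dtype.kind rather than the full dtype name, because the exact integer and string widths can differ between platforms and NumPy versions.

```python
import numpy as np

ints = np.array([1, 2, 3])               # all integers
mixed_float = np.array([1, 2, 3.5])      # one float promotes everything to float
mixed_str = np.array([1, 2, 3, '4'])     # one string turns everything into strings

print(ints.dtype.kind)          # 'i' means signed integer
print(mixed_float.dtype)        # float64
print(mixed_str.dtype.kind)     # 'U' means Unicode string
print(mixed_str)
```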

Working with the txt file

Year,WHO region,Country,Beverage Types,Display Value
1986,Western Pacific,Viet Nam,Wine,0
1986,Americas,Uruguay,Other,0.5
1985,Africa,Cte d'Ivoire,Wine,1.62
1986,Americas,Colombia,Beer,4.27
1987,Americas,Saint Kitts and Nevis,Beer,1.98
1987,Americas,Guatemala,Other,0
1987,Africa,Mauritius,Wine,0.13
1985,Africa,Angola,Spirits,0.39
1986,Americas,Antigua and Barbuda,Spirits,1.55
1984,Africa,Nigeria,Other,6.1
1987,Africa,Botswana,Wine,0.2
1989,Americas,Guatemala,Beer,0.62

① Skip the header, then take the value in the second row, fifth column, and the value in the third row, third column

world_alcohol = np.genfromtxt("world_alcohol.txt",delimiter=",",dtype=str,skip_header=1)
uruguay_other_1986 = world_alcohol[1,4]
third_country = world_alcohol[2,2]
print(uruguay_other_1986)
print(third_country)

#
0.5
Cte d'Ivoire

A note on slicing:

vector = numpy.array([5,10,15,20])
print(vector[0:3])

#
[ 5 10 15]

Takes the elements at indices 0 through 2 (the end index 3 is excluded)

---------- divider ----------

matrix = numpy.array([
                    [5,10,15],
                    [20,25,30],
                    [35,40,45]
                     ])
print(matrix[:,1])

#
[10 25 40]

Takes all elements of the second column

---------- divider ----------

matrix = numpy.array([
                    [5,10,15],
                    [20,25,30],
                    [35,40,45]
                     ])
print(matrix[:,0:2])

#
[[ 5 10]
 [20 25]
 [35 40]]

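Takes the first two columns

Comparisons

vector = np.array([5,10,15,20])
vector == 10

#
array([False,  True, False, False])

The comparison is applied element by element and returns a boolean array showing which positions equal 10

A boolean array can also be used as an index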

vector = np.array([5,10,15,20])
equal_to_ten = (vector == 10)
print(equal_to_ten)
print(vector[equal_to_ten])

#
[False  True False False]
[10]

The same works on a matrix:

matrix = np.array([
                    [5,10,15],
                    [20,25,30],
                    [35,40,45]
                     ])
second_column_25 = (matrix[:,1]==25)
print(second_column_25)
print(matrix[second_column_25, :])

#
[False  True False]
[[20 25 30]]

The comparison is made on the second column; the resulting mask then selects the row whose second-column value equals 25
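Masks can also be combined. The original examples only use ==, so this sketch with & (and) and | (or) is an addition. The parentheses are required because & and | bind more tightly than the comparison operators.

```python
import numpy as np

vector = np.array([5, 10, 15, 20])

# Keep values strictly between 5 and 20.
in_range = (vector > 5) & (vector < 20)
print(vector[in_range])

# Keep values equal to 5 or equal to 20.
either_end = (vector == 5) | (vector == 20)
print(vector[either_end])
```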

---------- divider ----------

Next, converting data types

vector = numpy.array(["1","2","3"])
print(vector.dtype)
print(vector)
vector = vector.astype(float)
print(vector.dtype)
print(vector)

#
<U1
['1' '2' '3']
float64
[1. 2. 3.]

Converts from str to float
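Applied to the alcohol data, the same conversion makes the last column usable for arithmetic. A small sketch, with three rows copied from the file above and typed in by hand so it runs without the file; the names rows, values and total are illustrative.

```python
import numpy as np

# Three rows copied from world_alcohol.txt, stored as strings as genfromtxt would.
rows = np.array([
    ["1986", "Americas", "Uruguay", "Other", "0.5"],
    ["1986", "Americas", "Colombia", "Beer", "4.27"],
    ["1987", "Americas", "Guatemala", "Other", "0"],
])

values = rows[:, 4].astype(float)   # fifth column as floats
total = values.sum()

print(values.dtype)
print(values)
print(total)
```

astype raises a ValueError if any entry cannot be parsed as a number, so columns with empty or non-numeric fields need cleaning first.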

The sum operation

matrix = np.array([
                    [5,10,15],
                    [20,25,30],
                    [35,40,45]
                     ])
print(matrix.sum(axis=1))
print(matrix.sum(axis=0))

#
[ 30  75 120]
[60 75 90]

axis=1 sums across each row, giving one total per row; axis=0 sums down each column, giving one total per column
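One way to remember the axis argument: the axis you name is the one that disappears from the shape. A small sketch with a non-square array, so the two cases cannot be confused:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]], shape (2, 3)

col_totals = a.sum(axis=0)       # collapses the 2 rows, leaving shape (3,)
row_totals = a.sum(axis=1)       # collapses the 3 columns, leaving shape (2,)
grand_total = a.sum()            # no axis: sum of every element

print(col_totals, col_totals.shape)
print(row_totals, row_totals.shape)
print(grand_total)
```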

Commonly used NumPy functions

① Matrix attributes

print(np.arange(15))
a = np.arange(15).reshape(3,5)
print(a)
print(a.shape)
print(a.ndim)
print(a.dtype.name)
print(a.size)

#

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
(3, 5)
2
int32
15

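np.arange(15) generates a 1-D array of the integers 0 through 14;

reshape(3,5) turns it into a 3×5 matrix;

ndim gives the number of dimensions;

dtype.name gives the element data type (int32 here; the default integer type depends on the platform and NumPy version, and int64 is also common);

size gives the total number of elements

② Initializing matrices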

a = np.zeros((3,4))
a

#
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
 

Creates an all-zero matrix with 3 rows and 4 columns; the shape (3,4) may also be given as the list [3,4];

The default is dtype=np.float64

---------- divider ----------

np.ones((2,3,4), dtype=np.int32)

#
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

Creates two 3×4 all-ones matrices, i.e. a 3-D array of shape (2, 3, 4)

---------- divider ----------

print(np.arange(10,30,5))
print(np.arange(0, 2, 0.3))
print(np.arange(12).reshape(4,3))

#
[10 15 20 25]
[0.  0.3 0.6 0.9 1.2 1.5 1.8]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

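np.arange(10,30,5) starts at 10 and takes every 5th number; the interval is half-open, so 30 itself is excluded. The step may also be a float, as in np.arange(0, 2, 0.3)

---------- divider ----------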

np.random.random((2,3))

#
array([[0.04286104, 0.02303494, 0.35769307],
       [0.34801234, 0.63580499, 0.99897693]])

Generates a random 2×3 matrix, with values drawn uniformly from [0, 1)

---------- divider ----------

from numpy import pi
np.linspace(0, 2*pi, 100)

#
array([0.        , 0.06346652, 0.12693304, 0.19039955, 0.25386607,
       0.31733259, 0.38079911, 0.44426563, 0.50773215, 0.57119866,
       0.63466518, 0.6981317 , 0.76159822, 0.82506474, 0.88853126,
       0.95199777, 1.01546429, 1.07893081, 1.14239733, 1.20586385,
       1.26933037, 1.33279688, 1.3962634 , 1.45972992, 1.52319644,
       1.58666296, 1.65012947, 1.71359599, 1.77706251, 1.84052903,
       1.90399555, 1.96746207, 2.03092858, 2.0943951 , 2.15786162,
       2.22132814, 2.28479466, 2.34826118, 2.41172769, 2.47519421,
       2.53866073, 2.60212725, 2.66559377, 2.72906028, 2.7925268 ,
       2.85599332, 2.91945984, 2.98292636, 3.04639288, 3.10985939,
       3.17332591, 3.23679243, 3.30025895, 3.36372547, 3.42719199,
       3.4906585 , 3.55412502, 3.61759154, 3.68105806, 3.74452458,
       3.8079911 , 3.87145761, 3.93492413, 3.99839065, 4.06185717,
       4.12532369, 4.1887902 , 4.25225672, 4.31572324, 4.37918976,
       4.44265628, 4.5061228 , 4.56958931, 4.63305583, 4.69652235,
       4.75998887, 4.82345539, 4.88692191, 4.95038842, 5.01385494,
       5.07732146, 5.14078798, 5.2042545 , 5.26772102, 5.33118753,
       5.39465405, 5.45812057, 5.52158709, 5.58505361, 5.64852012,
       5.71198664, 5.77545316, 5.83891968, 5.9023862 , 5.96585272,
       6.02931923, 6.09278575, 6.15625227, 6.21971879, 6.28318531])

Returns 100 evenly spaced values from 0 to 2π, with both endpoints included
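Because both endpoints are included, 100 points span 99 intervals, so the spacing is 2π/99 rather than 2π/100. The sketch below checks this with the retstep option of np.linspace:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples.
values, step = np.linspace(0, 2 * np.pi, 100, retstep=True)

print(len(values))            # 100 samples
print(values[0], values[-1])  # starts at 0, ends at 2*pi
print(step)                   # 2*pi / 99
```

By contrast, np.arange excludes the end value and takes a step size instead of a sample count.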

---------- divider ----------

a = np.array([20,30,40,50])
b = np.arange(4)
print(a)
print(b)
c = a-b
print(c)
c = c-1
print(c)
print(b**2)
print(a<35)

#
[20 30 40 50]
[0 1 2 3]
[20 29 38 47]
[19 28 37 46]
[0 1 4 9]
[ True  True False False]

a-b subtracts the elements at corresponding positions; c-1, b**2 and a<35 are likewise applied to every element

---------- divider ----------

A = np.array([[1,1],
             [0,1]])
B = np.array([[2,0],
             [3,4]])
print(A)
print('------------')
print(B)
print('------------')
print(A*B)
print('------------')
print(A.dot(B))
print('------------')
print(np.dot(A, B))

#
[[1 1]
 [0 1]]
------------
[[2 0]
 [3 4]]
------------
[[2 0]
 [0 4]]
------------
[[5 4]
 [3 4]]
------------
[[5 4]
 [3 4]]

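The differences between numpy's kinds of multiplication were covered in detail earlier: A*B is element-wise, while A.dot(B) and np.dot(A, B) are the matrix product

③ Common matrix operations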

B = np.arange(3)
print(B)
print(np.exp(B))
print(np.sqrt(B))

#
[0 1 2]
[1.         2.71828183 7.3890561 ]
[0.         1.         1.41421356]
 

exp computes e to the power x; sqrt takes the square root

---------- divider ----------

a = np.floor(10*np.random.random((3,4)))
print(a)
print('--------------')
print(a.ravel())
print('--------------')
a.shape = (6,2)
print(a)
print('--------------')
print(a.T)

#
[[8. 3. 6. 2.]
 [0. 8. 7. 3.]
 [5. 2. 0. 7.]]
--------------
[8. 3. 6. 2. 0. 8. 7. 3. 5. 2. 0. 7.]
--------------
[[8. 3.]
 [6. 2.]
 [0. 8.]
 [7. 3.]
 [5. 2.]
 [0. 7.]]
--------------
[[8. 6. 0. 7. 5. 0.]
 [3. 2. 8. 3. 2. 7.]]

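floor rounds down, i.e. toward the smaller integer;

a.ravel() flattens the matrix into a 1-D vector;

a.shape=(6,2) reshapes a in place and has the same effect as a = a.reshape(6,2)

a.shape=(6,-1) or a = a.reshape(6,-1) also works: you specify one dimension and numpy works out the other

a.T gives the transpose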
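The shape operations above can be tried on a fixed array, which is easier to follow than random values. A minimal sketch; np.arange(12) replaces the random matrix purely for readability:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

six_rows = a.reshape(6, -1)      # -1 lets numpy infer the 2 columns
three_cols = a.reshape(-1, 3)    # -1 lets numpy infer the 4 rows
flat = a.ravel()                 # flattened to 1-D
transposed = a.T                 # rows and columns swapped

print(six_rows.shape, three_cols.shape, flat.shape, transposed.shape)
```

Only one dimension may be -1, and the total number of elements must stay the same, otherwise reshape raises a ValueError.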

---------- divider ----------

a = np.floor(10*np.random.random((2,2)))
b = np.floor(10*np.random.random((2,2)))
print(a)
print('-----')
print(b)
print('-----')
print(np.hstack((a,b)))
print('-----')
print(np.vstack((a,b)))

#
[[6. 7.]
 [1. 1.]]
-----
[[0. 3.]
 [9. 9.]]
-----
[[6. 7. 0. 3.]
 [1. 1. 9. 9.]]
-----
[[6. 7.]
 [1. 1.]
 [0. 3.]
 [9. 9.]]

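hstack joins the arrays horizontally, side by side, so the result has more columns; vstack joins them vertically, one on top of the other, so the result has more rows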
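For 2-D arrays, both functions are special cases of np.concatenate with an explicit axis, which may be easier to remember. A sketch using the same values as the output above:

```python
import numpy as np

a = np.array([[6, 7], [1, 1]])
b = np.array([[0, 3], [9, 9]])

side_by_side = np.hstack((a, b))   # shape (2, 4)
stacked = np.vstack((a, b))        # shape (4, 2)

# The same results expressed with concatenate and an explicit axis.
same_h = np.concatenate((a, b), axis=1)
same_v = np.concatenate((a, b), axis=0)

print(side_by_side.shape, stacked.shape)
print(np.array_equal(side_by_side, same_h), np.array_equal(stacked, same_v))
```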

---------- divider ----------

a = np.floor(10*np.random.random((2,12)))
print(a)
print('------')
print(np.hsplit(a,3))
print('------')
print(np.hsplit(a,(3,4)))
a = np.floor(10*np.random.random((12,2)))
print('------')
print(a)
print(np.vsplit(a,3))

#
[[4. 6. 5. 6. 0. 9. 7. 9. 3. 3. 0. 6.]
 [5. 5. 6. 1. 6. 2. 5. 3. 3. 9. 8. 1.]]
------
[array([[4., 6., 5., 6.],
       [5., 5., 6., 1.]]), array([[0., 9., 7., 9.],
       [6., 2., 5., 3.]]), array([[3., 3., 0., 6.],
       [3., 9., 8., 1.]])]
------
[array([[4., 6., 5.],
       [5., 5., 6.]]), array([[6.],
       [1.]]), array([[0., 9., 7., 9., 3., 3., 0., 6.],
       [6., 2., 5., 3., 3., 9., 8., 1.]])]
------
[[2. 0.]
 [0. 2.]
 [1. 4.]
 [3. 7.]
 [0. 0.]
 [7. 5.]
 [0. 7.]
 [3. 5.]
 [7. 9.]
 [1. 4.]
 [2. 4.]
 [8. 1.]]
[array([[2., 0.],
       [0., 2.],
       [1., 4.],
       [3., 7.]]), array([[0., 0.],
       [7., 5.],
       [0., 7.],
       [3., 5.]]), array([[7., 9.],
       [1., 4.],
       [2., 4.],
       [8., 1.]])]
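hsplit(a,3) splits the array horizontally, along the column axis, into 3 equal parts;

hsplit(a,(3,4)) splits before column index 3 and again before column index 4, giving columns 0-2, column 3, and columns 4-11
Similarly, vsplit splits vertically, along the row axis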
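The index form of hsplit is equivalent to taking plain column slices at the same cut points. A small sketch with a fixed array, used instead of random values so the pieces are easy to identify:

```python
import numpy as np

a = np.arange(24).reshape(2, 12)

# Cut before column 3 and before column 4.
left, middle, right = np.hsplit(a, (3, 4))

print(left.shape, middle.shape, right.shape)
print(np.array_equal(left, a[:, :3]))
print(np.array_equal(middle, a[:, 3:4]))
print(np.array_equal(right, a[:, 4:]))
```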
④ Common copy operations
a = np.arange(12)
b = a
print(b is a)
b.shape = 3,4
print(a.shape)
print(id(a))
print(id(b))

#
True
(3, 4)
2570788546080
2570788546080
Plain assignment makes no copy: a and b refer to the very same object, so changing the shape through b changes a as well
---------- divider ----------
c = a.view()
print(c is a)
c.shape =2,6
print(a.shape)
c[0,4] = 1234
print(a)
print(id(a))
print(id(c))

#
False
(3, 4)
[[   0    1    2    3]
 [1234    5    6    7]
 [   8    9   10   11]]
2570788546080
2570788626560

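As the output shows, this is a shallow copy (a view): a and c are different objects but share the same data, so changing an element of c also changes a, while reshaping c leaves the shape of a alone

---------- divider ----------
To get a copy that is a different object and does not share data either, make a deep copy
d = a.copy()
print(d is a)
d[0,0]= 9999
print(d)
print(a)

#
False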
[[9999    1    2    3]
 [   4    5    6    7]
 [   8    9   10   11]]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
 

Changing d's elements has no effect on a, and changing the shape of either a or d has no effect on the other
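The three cases (assignment, view, copy) can be told apart without comparing id values. The sketch below uses np.shares_memory, which the original does not use; the names alias, view and deep are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

alias = a            # plain assignment: same object
view = a.view()      # new object, same underlying data
deep = a.copy()      # new object, separate data

print(alias is a, view is a, deep is a)
print(np.shares_memory(a, view), np.shares_memory(a, deep))

view[0, 0] = 1234    # visible through a
deep[0, 1] = 9999    # not visible through a
print(a[0, 0], a[0, 1])
```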

---------- divider ----------
A supplementary note on indexing
data = np.sin(np.arange(20)).reshape(5,4)
print(data)
ind = data.argmax(axis=0)
print(ind)
data_max = data[ind, range(data.shape[1])]
print(data_max)

#
[[ 0.          0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155   0.6569866 ]
 [ 0.98935825  0.41211849 -0.54402111 -0.99999021]
 [-0.53657292  0.42016704  0.99060736  0.65028784]
 [-0.28790332 -0.96139749 -0.75098725  0.14987721]]
[2 0 3 1]
[0.98935825 0.84147098 0.99060736 0.6569866 ]
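axis=0 means the argmax is taken down each column;
ind holds, for each column, the row index of that column's largest value, collected into an ndarray;
Note the data[ , ] indexing trick: ind supplies the row indices and range(data.shape[1]) supplies the column indices 0 to 3, and the two are paired up element by element (range is basic Python)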
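To confirm that the paired indexing really picks each column's maximum, the result can be compared with data.max(axis=0). A short sketch of that check, using np.arange in place of range for the column indices:

```python
import numpy as np

data = np.sin(np.arange(20)).reshape(5, 4)

ind = data.argmax(axis=0)                 # row index of the max in each column
cols = np.arange(data.shape[1])           # column indices 0, 1, 2, 3
picked = data[ind, cols]                  # pairs (ind[0], 0), (ind[1], 1), ...

print(ind)
print(picked)
print(np.array_equal(picked, data.max(axis=0)))
```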

---------- divider ----------

Next, the tiling (expansion) operation

a = np.arange(0,40,10)
print(a)
b= np.tile(a,(3,5))
print(b)

#
[ 0 10 20 30]
[[ 0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30]
 [ 0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30]
 [ 0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30]]

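This expands a 1-D vector into a matrix of the requested layout: np.tile(a,(3,5)) repeats a 3 times vertically and 5 times horizontally, so every block of the result is the same vector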
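When the tiled array is only needed for arithmetic against another array, NumPy's broadcasting often gives the same result without building the copy. This comparison is an addition to the original; the offsets array is a made-up example.

```python
import numpy as np

a = np.arange(0, 40, 10)                 # [ 0 10 20 30]
offsets = np.array([[0], [1], [2]])      # hypothetical column of per-row offsets, shape (3, 1)

tiled = np.tile(a, (3, 1))               # explicit copy, shape (3, 4)
with_tile = tiled + offsets
with_broadcast = a + offsets             # a is stretched to (3, 4) implicitly

print(np.tile(a, (3, 5)).shape)
print(with_broadcast)
print(np.array_equal(with_tile, with_broadcast))
```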

---------- divider ----------

a = np.array([[4,3,5],[1,2,1]])
print(a)
print('-------')
b = np.sort(a,axis=1)
print(b)
a.sort(axis=1)
print('-------')
print(a)
a = np.array([4,3,1,2])
j= np.argsort(a)
print('-------')
print(j)
print('-------')
print(a[j])

#
[[4 3 5]
 [1 2 1]]
-------
[[3 4 5]
 [1 1 2]]
-------
[[3 4 5]
 [1 1 2]]
-------
[2 3 1 0]
-------
[1 2 3 4]

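axis=1 sorts within each row; np.sort returns a sorted copy, while a.sort sorts in place

argsort returns the indices that would sort the elements from smallest to largest

a[j] then uses those indices to reorder the elements, which amounts to sorting in ascending order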
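The index array from argsort is useful beyond sorting a single vector. Two common uses are sketched below: reversing it for descending order, and reordering whole rows of a table by one column. The table values are made up for illustration.

```python
import numpy as np

a = np.array([4, 3, 1, 2])

# Reverse the ascending index order to sort from largest to smallest.
descending = a[np.argsort(a)[::-1]]
print(descending)

# Hypothetical table: sort the rows by the values in the first column.
table = np.array([[3, 30], [1, 10], [2, 20]])
order = np.argsort(table[:, 0])
by_first_col = table[order]
print(by_first_col)
```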

---------- divider ----------
x_data = np.linspace(-1,1,300)[:, np.newaxis]
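For example, I wrote the line above today: it generates 300 evenly spaced values from -1 to 1, with shape (300,), and then turns them into shape (300,1)
noise = np.random.normal(0,0.05,x_data.shape)

This draws samples from a normal distribution with mean 0 and standard deviation 0.05, with the same shape as x_data,

so the standard normal distribution is

np.random.normal(loc=0,scale=1,size=None)

which is equivalent to

np.random.standard_normal(size=None)

(np.random.randn takes the dimensions as positional arguments, e.g. np.random.randn(300,1), and has no size keyword)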
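The same noise can be generated reproducibly with a seeded Generator. The sketch below uses np.random.default_rng, which is not what the original code uses, and the seed 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)          # seeded, so reruns give the same numbers

x_data = np.linspace(-1, 1, 300)[:, np.newaxis]    # shape (300, 1)
noise = rng.normal(0, 0.05, x_data.shape)          # mean 0, standard deviation 0.05
standard = rng.standard_normal(x_data.shape)       # mean 0, standard deviation 1

print(x_data.shape, noise.shape, standard.shape)
print(noise.mean(), noise.std())         # close to 0 and 0.05, not exact
```

With only 300 samples, the sample mean and standard deviation are close to the requested values but will not match them exactly.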