Lately the matrix computations in deep learning have been beating me up, so I decided to sit down and learn NumPy properly.
import numpy

world_alcohol = numpy.genfromtxt("world_alcohol.txt", delimiter=",", dtype=str)
print(type(world_alcohol))
print(world_alcohol)
help(numpy.genfromtxt)  # help() prints the documentation itself, no print() needed
#
<class 'numpy.ndarray'>
[['Year' 'WHO region' 'Country' 'Beverage Types' 'Display Value']
 ['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ...
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]
The txt file begins as follows:
Year,WHO region,Country,Beverage Types,Display Value
1986,Western Pacific,Viet Nam,Wine,0
1986,Americas,Uruguay,Other,0.5
1985,Africa,Cte d'Ivoire,Wine,1.62
1986,Americas,Colombia,Beer,4.27
1987,Americas,Saint Kitts and Nevis,Beer,1.98
1987,Americas,Guatemala,Other,0
1987,Africa,Mauritius,Wine,0.13
1985,Africa,Angola,Spirits,0.39
1986,Americas,Antigua and Barbuda,Spirits,1.55
1984,Africa,Nigeria,Other,6.1
1987,Africa,Botswana,Wine,0.2
1989,Americas,Guatemala,Beer,0.62
The delimiter parameter is the field separator; dtype=str means the data read from the file is stored as strings;
ndarray is NumPy's core data structure.
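To try genfromtxt without the file on disk, here is a minimal sketch that feeds it a few of the rows above through io.StringIO. The StringIO stand-in and the variable names are my own assumptions, not part of the original notes.

```python
import io
import numpy as np

# Three lines copied from the file above; StringIO stands in for the real file.
sample = io.StringIO(
    "Year,WHO region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
)

# skip_header=1 drops the header line, dtype=str keeps every field as text.
rows = np.genfromtxt(sample, delimiter=",", dtype=str, skip_header=1)
print(rows.shape)
print(rows[1, 4])

# The last column can be converted to numbers once it is isolated.
display_values = rows[:, 4].astype(float)
print(display_values)
```

Because every field is read as a string, numeric columns need an explicit astype(float) before doing arithmetic on them.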
vector = numpy.array([5,10,15,20])
matrix = numpy.array([[5,10,15],[20,25,30],[35,40,45]])
print(vector)
print(matrix)
#
[ 5 10 15 20]
[[ 5 10 15]
 [20 25 30]
 [35 40 45]]
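Now look at the shape attribute
vector1 = numpy.array([[1,2,3,4]])
vector2 = numpy.array([1,2,3,4])
print(vector1.shape)
print(vector2.shape)
matrix = numpy.array([[5,10,15],[20,25,30]])
print(matrix.shape)
#
(1, 4)
(4,)
(2, 3)
vector1 can be seen as a 1×4 matrix; vector2 is a one-dimensional array of length 4 with no row/column orientation of its own, and np.dot treats it as a row or a column vector as needed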
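The difference between shape (4,) and shape (1, 4) is easy to check directly. The sketch below is only an illustration; the names row and col are mine.

```python
import numpy as np

v = np.array([1, 2, 3, 4])    # shape (4,): a single axis
row = v.reshape(1, -1)        # shape (1, 4): an explicit row
col = v[:, np.newaxis]        # shape (4, 1): an explicit column

print(v.shape, row.shape, col.shape)
print(v.T.shape)              # transposing a 1-D array changes nothing
print(row.T.shape)            # transposing the 2-D row gives a column
```

If a real column vector is needed, reshape or np.newaxis has to be used, because .T on a 1-D array is a no-op.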
import numpy as np  # the rest of these notes use the np alias

a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([1,2,3])
c = np.array([[1,2,3]])
print(np.dot(a,b))
print('------------------')
print(np.dot(b,a))
print('------------------')
print(np.dot(c,a))
print('------------------')
print(np.dot(b,b))
print('------------------')
print(a*a)
#
[14 32 50]
------------------
[30 36 42]
------------------
[[30 36 42]]
------------------
14
------------------
[[ 1  4  9]
 [16 25 36]
 [49 64 81]]
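a is a 3×3 matrix, b is a one-dimensional array of length 3, and c is a 1×3 matrix
np.dot(a,b) can be read as a 3×3 matrix times a 3×1 matrix; the result is a 1-D array of length 3
np.dot(b,a) can be read as a 1×3 matrix times a 3×3 matrix; the result is again 1-D
np.dot(c,a) is a true matrix product, 1×3 times 3×3, so the result stays two-dimensional with shape (1,3)
Note that np.dot(b,b) is the vector inner product: multiply the matching elements and add everything up
a*a is the element-wise product, each element multiplied by the element in the same position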
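The same products can also be written with the @ operator. This is a small sketch assuming Python 3.5+ and reusing the a and b from above; the result names are mine.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([1, 2, 3])

mat_vec = a @ b        # same result as np.dot(a, b) for these shapes
vec_mat = b @ a        # same result as np.dot(b, a)
inner = b @ b          # inner product of two 1-D arrays, a scalar
elementwise = a * a    # element-wise product, not a matrix product

print(mat_vec, vec_mat, inner)
print(np.array_equal(elementwise, np.multiply(a, a)))
```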
---------- divider ----------
numbers = numpy.array([1,2,3,'4'])
print(numbers)
numbers.dtype
#
['1' '2' '3' '4']
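All elements of an ndarray share a single dtype. As soon as one element is not an integer, NumPy converts every element to a common type; here the string '4' turns all of them into strings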
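A quick way to see this conversion is to build a few small arrays and inspect their dtype. A minimal sketch; the exact integer and string widths depend on the platform, so only the dtype kind is checked.

```python
import numpy as np

ints = np.array([1, 2, 3])             # all integers: integer dtype
mixed_float = np.array([1, 2, 3.5])    # one float: everything becomes float64
mixed_str = np.array([1, 2, "3"])      # one string: everything becomes a string

print(ints.dtype.kind)        # 'i' means signed integer
print(mixed_float.dtype)
print(mixed_str.dtype.kind)   # 'U' means unicode string
print(mixed_str)
```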
① Skip the header, then take the value in the second row, fifth column and the value in the third row, third column
world_alcohol = np.genfromtxt("world_alcohol.txt",delimiter=",",dtype=str,skip_header=1)
uruguay_other_1986 = world_alcohol[1,4]
third_country = world_alcohol[2,2]
print(uruguay_other_1986)
print(third_country)
#
0.5
Cte d'Ivoire
vector = numpy.array([5,10,15,20])
print(vector[0:3])
#
[ 5 10 15]
Takes the elements at indices 0 to 2; the end index 3 is not included
---------- divider ----------
matrix = numpy.array([
    [5,10,15],
    [20,25,30],
    [35,40,45]
])
print(matrix[:,1])
#
[10 25 40]
Takes every element of the second column
---------- divider ----------
matrix = numpy.array([
    [5,10,15],
    [20,25,30],
    [35,40,45]
])
print(matrix[:,0:2])
#
[[ 5 10]
 [20 25]
 [35 40]]
Takes the first two columns
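vector = np.array([5,10,15,20])
vector == 10
#
array([False,  True, False, False])
Compares every element with 10 and returns a boolean array, which tells you whether (and where) the value occurs in the array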
The boolean values can also be used as an index
vector = np.array([5,10,15,20])
equal_to_ten = (vector == 10)
print(equal_to_ten)
print(vector[equal_to_ten])
#
[False  True False False]
[10]
Of course, with a matrix it works like this:
matrix = np.array([
    [5,10,15],
    [20,25,30],
    [35,40,45]
])
second_column_25 = (matrix[:,1]==25)
print(second_column_25)
print(matrix[second_column_25, :])
#
[False  True False]
[[20 25 30]]
First test the elements of the second column, then take the rows in which that column equals 25
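Boolean masks can also be combined and used for assignment. The sketch below goes a little beyond the notes: the conditions and the names mask, selected and clipped are made up for illustration.

```python
import numpy as np

matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mask = (matrix[:, 1] > 10) & (matrix[:, 2] < 45)
selected = matrix[mask]
print(mask)
print(selected)

# A mask with the same shape as the array selects single elements,
# which also works on the left-hand side of an assignment.
clipped = matrix.copy()
clipped[clipped > 30] = 0
print(clipped)
```

The Python keywords and/or do not work element by element on arrays, which is why & and | are used here.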
---------- divider ----------
vector = numpy.array(["1","2","3"])
print(vector.dtype)
print(vector)
vector = vector.astype(float)
print(vector.dtype)
print(vector)
#
<U1
['1' '2' '3']
float64
[1. 2. 3.]
Converts from the str type to the float type
matrix = np.array([
    [5,10,15],
    [20,25,30],
    [35,40,45]
])
print(matrix.sum(axis=1))
print(matrix.sum(axis=0))
#
[ 30  75 120]
[60 75 90]
axis=1 sums along each row (one total per row); axis=0 sums down each column (one total per column)
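One way to keep the two axes apart is to look at the shape of the result. A small sketch on the same matrix; keepdims is a standard parameter of sum, the variable names are mine.

```python
import numpy as np

matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

row_sums = matrix.sum(axis=1)    # collapses the column axis: one value per row
col_sums = matrix.sum(axis=0)    # collapses the row axis: one value per column
total = matrix.sum()             # no axis: sum of all elements

# keepdims=True keeps the collapsed axis with length 1.
row_sums_2d = matrix.sum(axis=1, keepdims=True)

print(row_sums, col_sums, total)
print(row_sums_2d.shape)
```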
print(np.arange(15))
a = np.arange(15).reshape(3,5)
print(a)
print(a.shape)
print(a.ndim)
print(a.dtype.name)
print(a.size)
#
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
(3, 5)
2
int32
15
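np.arange(15) generates the one-dimensional array 0 to 14 in order;
reshape(3,5) turns it into a 3×5 matrix;
ndim gives the number of dimensions;
dtype.name gives the element data type (int32 on this machine; the default integer type depends on the platform and NumPy version);
size gives the total number of elements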
a = np.zeros((3,4))
a
#
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
Creates an all-zeros matrix with 3 rows and 4 columns; the shape (3,4) can also be written as the list [3,4];
the default dtype is np.float64
---------- divider ----------
np.ones((2,3,4), dtype=np.int32)
#
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])
Creates two 3×4 all-ones matrices, i.e. a single array of shape (2,3,4)
---------- divider ----------
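print(np.arange(10,30,5))
print(np.arange(0, 2, 0.3))
print(np.arange(12).reshape(4,3))
#
[10 15 20 25]
[0.  0.3 0.6 0.9 1.2 1.5 1.8]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
arange(10,30,5) starts at 10 and takes every 5th number; the interval is closed on the left and open on the right, so 30 itself is excluded
---------- divider ----------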
np.random.random((2,3))
#
array([[0.04286104, 0.02303494, 0.35769307],
       [0.34801234, 0.63580499, 0.99897693]])
Generates a 2×3 matrix of random values drawn uniformly from [0, 1)
---------- divider ----------
from numpy import pi
np.linspace(0, 2*pi, 100)
#
array([0.        , 0.06346652, 0.12693304, 0.19039955, 0.25386607,
       0.31733259, 0.38079911, 0.44426563, 0.50773215, 0.57119866,
       0.63466518, 0.6981317 , 0.76159822, 0.82506474, 0.88853126,
       0.95199777, 1.01546429, 1.07893081, 1.14239733, 1.20586385,
       1.26933037, 1.33279688, 1.3962634 , 1.45972992, 1.52319644,
       1.58666296, 1.65012947, 1.71359599, 1.77706251, 1.84052903,
       1.90399555, 1.96746207, 2.03092858, 2.0943951 , 2.15786162,
       2.22132814, 2.28479466, 2.34826118, 2.41172769, 2.47519421,
       2.53866073, 2.60212725, 2.66559377, 2.72906028, 2.7925268 ,
       2.85599332, 2.91945984, 2.98292636, 3.04639288, 3.10985939,
       3.17332591, 3.23679243, 3.30025895, 3.36372547, 3.42719199,
       3.4906585 , 3.55412502, 3.61759154, 3.68105806, 3.74452458,
       3.8079911 , 3.87145761, 3.93492413, 3.99839065, 4.06185717,
       4.12532369, 4.1887902 , 4.25225672, 4.31572324, 4.37918976,
       4.44265628, 4.5061228 , 4.56958931, 4.63305583, 4.69652235,
       4.75998887, 4.82345539, 4.88692191, 4.95038842, 5.01385494,
       5.07732146, 5.14078798, 5.2042545 , 5.26772102, 5.33118753,
       5.39465405, 5.45812057, 5.52158709, 5.58505361, 5.64852012,
       5.71198664, 5.77545316, 5.83891968, 5.9023862 , 5.96585272,
       6.02931923, 6.09278575, 6.15625227, 6.21971879, 6.28318531])
Takes 100 evenly spaced values from 0 to 2π, with both endpoints included
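Since both endpoints are included, the 100 points are separated by 99 gaps, so the spacing is 2π/99 rather than 2π/100. A short check, with variable names of my own choosing:

```python
import numpy as np

xs = np.linspace(0, 2 * np.pi, 100)
step = xs[1] - xs[0]
print(len(xs), step)

# endpoint=False leaves out 2π, which gives 100 equal steps of 2π/100.
half_open = np.linspace(0, 2 * np.pi, 100, endpoint=False)
print(half_open[1] - half_open[0])
```

For non-integer steps linspace is usually the safer choice over arange, because the number of points is stated explicitly instead of following from floating-point rounding.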
---------- divider ----------
a = np.array([20,30,40,50])
b = np.arange(4)
print(a)
print(b)
c = a-b
print(c)
c = c-1
print(c)
print(b**2)
print(a<35)
#
[20 30 40 50]
[0 1 2 3]
[20 29 38 47]
[19 28 37 46]
[0 1 4 9]
[ True  True False False]
a-b subtracts the elements at matching positions; c-1, b**2 and a<35 are likewise applied to every element
---------- divider ----------
A = np.array([[1,1],
              [0,1]])
B = np.array([[2,0],
              [3,4]])
print(A)
print('------------')
print(B)
print('------------')
print(A*B)
print('------------')
print(A.dot(B))
print('------------')
print(np.dot(A, B))
#
[[1 1]
 [0 1]]
------------
[[2 0]
 [3 4]]
------------
[[2 0]
 [0 4]]
------------
[[5 4]
 [3 4]]
------------
[[5 4]
 [3 4]]
The differences between NumPy's kinds of multiplication were covered in detail earlier in these notes
B = np.arange(3)
print(B)
print(np.exp(B))
print(np.sqrt(B))
#
[0 1 2]
[1.         2.71828183 7.3890561 ]
[0.         1.         1.41421356]
exp is e raised to the power x; sqrt is the square root
---------- divider ----------
a = np.floor(10*np.random.random((3,4)))
print(a)
print('--------------')
print(a.ravel())
print('--------------')
a.shape = (6,2)
print(a)
print('--------------')
print(a.T)
#
[[8. 3. 6. 2.]
 [0. 8. 7. 3.]
 [5. 2. 0. 7.]]
--------------
[8. 3. 6. 2. 0. 8. 7. 3. 5. 2. 0. 7.]
--------------
[[8. 3.]
 [6. 2.]
 [0. 8.]
 [7. 3.]
 [5. 2.]
 [0. 7.]]
--------------
[[8. 6. 0. 7. 5. 0.]
 [3. 2. 8. 3. 2. 7.]]
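floor rounds down, i.e. towards the smaller integer;
a.ravel() flattens the matrix into a one-dimensional vector;
a.shape=(6,2) has the same effect as a = a.reshape(6,2);
it can also be written a.shape=(6,-1) or a = a.reshape(6,-1), meaning you fix one dimension and NumPy works out the other;
a.T is the transpose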
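The -1 placeholder only works when the element count divides evenly. A minimal sketch with a fixed array instead of random values, so the results are repeatable; the names are mine.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

b = a.reshape(6, -1)     # 12 elements / 6 rows: NumPy infers 2 columns
flat = a.ravel()         # 1-D array with all 12 elements in row order
t = a.T                  # transpose: shape (4, 3)
print(b.shape, flat.shape, t.shape)

# 12 elements cannot be split into 5 equal rows, so this raises ValueError.
try:
    a.reshape(5, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)
```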
---------- divider ----------
a = np.floor(10*np.random.random((2,2)))
b = np.floor(10*np.random.random((2,2)))
print(a)
print('-----')
print(b)
print('-----')
print(np.hstack((a,b)))
print('-----')
print(np.vstack((a,b)))
#
[[6. 7.]
 [1. 1.]]
-----
[[0. 3.]
 [9. 9.]]
-----
[[6. 7. 0. 3.]
 [1. 1. 9. 9.]]
-----
[[6. 7.]
 [1. 1.]
 [0. 3.]
 [9. 9.]]
hstack joins the arrays horizontally (side by side, so the rows get longer); vstack joins them vertically (one on top of the other, so rows are added)
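For 2-D arrays, hstack and vstack give the same result as np.concatenate along axis 1 and axis 0. A small check using the values from the output above; the comparison itself is my addition.

```python
import numpy as np

a = np.array([[6., 7.], [1., 1.]])
b = np.array([[0., 3.], [9., 9.]])

h = np.hstack((a, b))    # shape (2, 4): columns of b appended to the right of a
v = np.vstack((a, b))    # shape (4, 2): rows of b appended below a

print(h.shape, v.shape)
print(np.array_equal(h, np.concatenate((a, b), axis=1)))
print(np.array_equal(v, np.concatenate((a, b), axis=0)))
```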
---------- divider ----------
a = np.floor(10*np.random.random((2,12)))
print(a)
print('------')
print(np.hsplit(a,3))
print('------')
print(np.hsplit(a,(3,4)))
a = np.floor(10*np.random.random((12,2)))
print('------')
print(a)
np.vsplit(a,3)
#
[[4. 6. 5. 6. 0. 9. 7. 9. 3. 3. 0. 6.]
 [5. 5. 6. 1. 6. 2. 5. 3. 3. 9. 8. 1.]]
------
[array([[4., 6., 5., 6.],
       [5., 5., 6., 1.]]), array([[0., 9., 7., 9.],
       [6., 2., 5., 3.]]), array([[3., 3., 0., 6.],
       [3., 9., 8., 1.]])]
------
[array([[4., 6., 5.],
       [5., 5., 6.]]), array([[6.],
       [1.]]), array([[0., 9., 7., 9., 3., 3., 0., 6.],
       [6., 2., 5., 3., 3., 9., 8., 1.]])]
------
[[2. 0.]
 [0. 2.]
 [1. 4.]
 [3. 7.]
 [0. 0.]
 [7. 5.]
 [0. 7.]
 [3. 5.]
 [7. 9.]
 [1. 4.]
 [2. 4.]
 [8. 1.]]
[array([[2., 0.],
       [0., 2.],
       [1., 4.],
       [3., 7.]]), array([[0., 0.],
       [7., 5.],
       [0., 7.],
       [3., 5.]]), array([[7., 9.],
       [1., 4.],
       [2., 4.],
       [8., 1.]])]
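hsplit(a,3) splits horizontally, cutting the columns into 3 equal parts;
hsplit(a,(3,4)) cuts before column index 3 and again before column index 4, which gives columns 0-2, column 3 on its own, and columns 4-11;
likewise, vsplit splits vertically, cutting the rows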
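Splitting at indices is the same as slicing at those indices. A sketch with a fixed array in place of the random one; the names left, middle and right are mine.

```python
import numpy as np

a = np.arange(24).reshape(2, 12)

# Cut before column 3 and before column 4.
left, middle, right = np.hsplit(a, (3, 4))
print(left.shape, middle.shape, right.shape)

# The same pieces written as plain slices.
print(np.array_equal(left, a[:, :3]))
print(np.array_equal(middle, a[:, 3:4]))
print(np.array_equal(right, a[:, 4:]))

# vsplit does the same along the rows.
top, bottom = np.vsplit(a, 2)
print(top.shape, bottom.shape)
```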
③ Common copy operations
a = np.arange(12)
b = a
print(b is a)
b.shape = 3,4
print(a.shape)
print(id(a))
print(id(b))
#
True
(3, 4)
2570788546080
2570788546080
With plain assignment, a and b point to exactly the same object, so changing b's shape changes a as well
---------- divider ----------
c = a.view()
print(c is a)
c.shape = 2,6
print(a.shape)
c[0,4] = 1234
print(a)
print(id(a))
print(id(c))
#
False
(3, 4)
[[   0    1    2    3]
 [1234    5    6    7]
 [   8    9   10   11]]
2570788546080
2570788626560
This is a shallow copy: a and c are different objects (different ids, and reshaping c leaves a's shape alone), but they share the same data, so changing an element of c also changes a
---------- divider ----------
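If the copy should be a different object and should not share data either, make a deep copy
d = a.copy()
d is a  # evaluates to False: d is a new object
d[0,0]= 9999
print(d)
print(a)
#
[[9999    1    2    3]
 [   4    5    6    7]
 [   8    9   10   11]]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Changing d's elements has no effect on a, and changing the shape of either a or d has no effect on the other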
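The three cases (assignment, view, copy) can be told apart with np.shares_memory instead of comparing ids by eye. A minimal sketch; the names alias, view and deep are my own.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
alias = a            # plain assignment: the same object
view = a.view()      # a new object that shares a's data
deep = a.copy()      # a new object with its own data

print(alias is a, view is a, deep is a)
print(np.shares_memory(a, view), np.shares_memory(a, deep))

view[0, 0] = 1234    # shows up in a, because the data is shared
deep[0, 1] = 9999    # does not show up in a
print(a[0, 0], a[0, 1])
```

Slices such as a[:, 1] are views as well, so writing into a slice also changes the original array.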
---------- divider ----------
A bit more on indexing
data = np.sin(np.arange(20)).reshape(5,4)
print(data)
ind = data.argmax(axis=0)
print(ind)
data_max = data[ind, range(data.shape[1])]
print(data_max)
#
[[ 0.          0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155   0.6569866 ]
 [ 0.98935825  0.41211849 -0.54402111 -0.99999021]
 [-0.53657292  0.42016704  0.99060736  0.65028784]
 [-0.28790332 -0.96139749 -0.75098725  0.14987721]]
[2 0 3 1]
[0.98935825 0.84147098 0.99060736 0.6569866 ]
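axis=0 makes argmax search down each column;
ind collects, for every column, the row index of that column's largest value into an ndarray;
note the data[ind, range(data.shape[1])] lookup, which is quite neat: ind supplies the row indices and range(data.shape[1]) supplies the matching column indices 0 to 3, so each pair picks one element per column (range is plain Python)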
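The paired-index lookup can be checked against data.max(axis=0), which returns the same values directly. A short sketch; using np.arange for the column indices instead of range is my own substitution and behaves the same here.

```python
import numpy as np

data = np.sin(np.arange(20)).reshape(5, 4)

ind = data.argmax(axis=0)            # row index of the maximum in each column
cols = np.arange(data.shape[1])      # column indices 0, 1, 2, 3

# Index arrays are paired up element by element: (ind[0], 0), (ind[1], 1), ...
picked = data[ind, cols]
print(ind)
print(picked)
print(np.array_equal(picked, data.max(axis=0)))
```

argmax is worth the extra step when the position of the maximum is needed, for example to look up the matching row in another array.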
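---------- divider ----------
Next up: expanding (tiling) arrays
a = np.arange(0,40,10)
print(a)
b = np.tile(a,(3,5))
print(b)
#
[ 0 10 20 30]
[[ 0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30]
 [ 0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30]
 [ 0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30  0 10 20 30]]
This expands a one-dimensional vector into a matrix of the requested layout: (3,5) repeats a 3 times vertically and 5 times horizontally, so every block of the result is the same vector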
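The shape of the tiled result follows from the repetition counts: a vector of length 4 tiled with (3, 5) gives 3 rows and 4×5 = 20 columns. A quick check, with names of my own:

```python
import numpy as np

a = np.arange(0, 40, 10)

grid = np.tile(a, (3, 5))    # 3 repeats down, 5 repeats across
print(grid.shape)

twice = np.tile(a, 2)        # a single count repeats along the only axis
print(twice)
```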
---------- divider ----------
a = np.array([[4,3,5],[1,2,1]])
print(a)
print('-------')
b = np.sort(a,axis=1)
print(b)
a.sort(axis=1)
print('-------')
print(a)
a = np.array([4,3,1,2])
j = np.argsort(a)
print('-------')
print(j)
print('-------')
print(a[j])
#
[[4 3 5]
 [1 2 1]]
-------
[[3 4 5]
 [1 1 2]]
-------
[[3 4 5]
 [1 1 2]]
-------
[2 3 1 0]
-------
[1 2 3 4]
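axis=1 sorts within each row; np.sort returns a sorted copy, while a.sort sorts a in place
argsort returns the indices that would put the elements in ascending order
a[j] then uses those indices to reorder the elements, which is simply a sort from smallest to largest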
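argsort becomes more useful than sort when the order has to be applied to something else, for example reordering whole rows by one column. A sketch; the table array is invented for illustration.

```python
import numpy as np

a = np.array([4, 3, 1, 2])
order = np.argsort(a)           # indices of the elements in ascending order
descending = a[order[::-1]]     # reversing the index order gives a descending sort
print(order, descending)

# Hypothetical two-column table: reorder the rows by the first column.
table = np.array([[3, 30], [1, 10], [2, 20]])
by_first = table[np.argsort(table[:, 0])]
print(by_first)
```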
---------- divider ----------
x_data = np.linspace(-1,1,300)[:, np.newaxis]
For example, today I wrote the line above: it generates 300 evenly spaced values from -1 to 1, with shape (300,), and then turns them into shape (300,1)
noise = np.random.normal(0,0.05,x_data.shape)
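This draws samples from a normal distribution with mean 0 and standard deviation 0.05, in the same shape as x_data,
so the standard normal distribution is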
np.random.normal(loc=0,scale=1,size=None)
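which is equivalent to
np.random.randn()
(randn takes the dimensions as positional arguments, e.g. np.random.randn(300,1), and has no size keyword)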
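For comparison, here is a sketch of the same sampling with NumPy's newer Generator interface next to the legacy calls used above. The seed 0 and the variable names are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)    # seed chosen arbitrarily, for repeatable runs

x_data = np.linspace(-1, 1, 300)[:, np.newaxis]    # shape (300, 1)
noise = rng.normal(0, 0.05, x_data.shape)          # mean 0, standard deviation 0.05

# Three ways to draw standard normal samples of shape (300, 1).
standard_a = rng.normal(loc=0, scale=1, size=(300, 1))
standard_b = rng.standard_normal((300, 1))
legacy = np.random.randn(300, 1)    # dimensions are positional arguments here

print(x_data.shape, noise.shape, legacy.shape)
print(noise.std())
```

The sample standard deviation of noise comes out close to 0.05 but not exactly equal to it, since it is estimated from 300 random draws.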