Data Analysis (NumPy Basics)

Date: 2019-12-01

Tags: data analysis, numpy, basics

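1. What is data analysis?

Data analysis is the process of applying appropriate statistical methods to large amounts of collected data in order to extract useful information, draw conclusions, and study and summarize the data in detail.

2. Common Python libraries for data analysis

1. numpy        basic numerical algorithms
2. scipy        scientific computing
3. matplotlib   data visualization
4. pandas       advanced functions for series data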

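I. NumPy Overview

1. What is NumPy?

1. Numerical Python: it supplies the numerical computing capability that the Python language itself lacks.
2. NumPy is the underlying library for many other data analysis and machine learning libraries.
3. NumPy's core is implemented in standard C and is heavily optimized for speed.
4. NumPy is free and open source.

2. History of NumPy

1. 1995: Numeric, a numerical computing extension for Python.
2. 2001: SciPy -> Numarray, multidimensional array operations.
3. 2005: Numeric + Numarray -> NumPy.
4. 2006: NumPy split off from SciPy and became an independent project.

3. NumPy performance

1. Concise code: fewer explicit loops in Python code.
2. Low-level implementation: a thick core (C) plus a thin interface (Python) keeps performance high.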
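
To make the "fewer loops" point concrete, here is a minimal sketch that computes the same sum of squares twice: once with an explicit Python loop and once with a vectorized NumPy expression. The function names and the input size are illustrative assumptions, not part of the original article.

```python
import numpy as np

def sum_of_squares_loop(values):
    """Sum of squares using an explicit Python loop."""
    total = 0
    for v in values:
        total += v * v
    return total

def sum_of_squares_numpy(values):
    """Same computation; the per-element work happens inside NumPy's compiled code."""
    arr = np.asarray(values, dtype=np.int64)
    return int(np.sum(arr * arr))

data = list(range(1000))
loop_result = sum_of_squares_loop(data)
numpy_result = sum_of_squares_numpy(data)
print(loop_result == numpy_result)  # True
```

The NumPy version has no loop in the Python code; how much faster it runs depends on the array size and the machine, so timing it yourself (for example with timeit) is the safest way to judge.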

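II. NumPy Basics

1. ndarray arrays

1. A NumPy array is an object instantiated from the numpy.ndarray class. It consists of:

Metadata: information describing the array, such as shape, dtype, size, and data.

Actual data: the complete contents of the array.

Storing the actual data separately from the metadata makes memory use more efficient. In addition, many array operations (reshaping and transposing, for example) only touch the metadata, which reduces how often the actual data is accessed and improves performance.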
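
The separation of metadata and actual data can be observed directly. The sketch below is only an illustration (the variable names are assumptions); it uses np.shares_memory to check that reshaping and transposing produce new metadata over the same data buffer.

```python
import numpy as np

base = np.arange(6)            # one block of actual data: 0..5
reshaped = base.reshape(2, 3)  # new metadata (shape, strides), same data
transposed = reshaped.T        # again only the metadata changes

# np.shares_memory reports whether two arrays overlap in memory.
print(np.shares_memory(base, reshaped))    # True
print(np.shares_memory(base, transposed))  # True

# The metadata differs even though the data buffer is shared.
print(base.shape, reshaped.shape, transposed.shape)  # (6,) (2, 3) (3, 2)
print(reshaped.strides != transposed.strides)        # True
```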

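2. Characteristics of ndarray arrays:

1. NumPy arrays are homogeneous: all elements must have the same data type.

2. NumPy array indices start at 0; the index of the last element is the array length minus 1.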

    import numpy

    # An object of the numpy.ndarray class represents an array
    ary = numpy.array([1, 2, 3, 4, 5, 6])
    print(ary, type(ary))  # [1 2 3 4 5 6] <class 'numpy.ndarray'>

    # ndarray arithmetic: an operation with a scalar is applied to every element
    ary = ary + 10
    print(ary)  # [11 12 13 14 15 16]

    ary = ary + ary  # element-wise; the element counts must match
    print(ary)  # [22 24 26 28 30 32]

    ary = ary > 30  # comparison, also element-wise
    print(ary)  # [False False False False False  True]
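
The two characteristics listed above (homogeneous elements, zero-based indexing) can be checked with a short sketch. The variable names are illustrative assumptions.

```python
import numpy as np

# Mixing ints and a float: every element is stored as float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Assigning a float into an integer array casts it to the array's type.
ints = np.array([1, 2, 3])
ints[0] = 9.7
print(ints)  # [9 2 3]

# Index 0 is the first element; index len - 1 is the last.
print(ints[0], ints[len(ints) - 1])  # 9 3
```

Because the element type is fixed per array, the fractional part of 9.7 is silently dropped on assignment; converting with ints.astype(float) first avoids that.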

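2. Four ways to create an array

(1) numpy.array(any logical structure that can be interpreted as a NumPy array)

(2) numpy.arange(start (default 0), stop, step (default 1))

(3) numpy.zeros(shape (an element count or a tuple of dimensions), dtype='type')

(4) numpy.ones(shape (an element count or a tuple of dimensions), dtype='type')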

    import numpy

    # 1. numpy.array()
    a1 = numpy.array([[1, 2, 3], [4, 5, 6]])
    print(a1, type(a1))  # [[1 2 3][4 5 6]] <class 'numpy.ndarray'>

    # 2. numpy.arange()
    a2 = numpy.arange(0, 10, 2)
    print(a2, type(a2))  # [0 2 4 6 8] <class 'numpy.ndarray'>

    # 3. numpy.zeros()
    a3 = numpy.zeros((10,), dtype='float32')
    print(a3, type(a3))  # [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] <class 'numpy.ndarray'>

    # 4. numpy.ones()
    a4 = numpy.ones((2, 3), dtype='int32')
    print(a4, type(a4))  # [[1 1 1][1 1 1]] <class 'numpy.ndarray'>

    # numpy.ones_like()  numpy.zeros_like()
    # Build an all-ones array with the same structure as a1
    print(numpy.ones_like(a1))  # [[1 1 1][1 1 1]]

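3. Array attributes

(1) Summary of array attributes

shape - dimensions
dtype - element type
size - number of elements
ndim - number of dimensions, len(shape)
itemsize - bytes per element
nbytes - total bytes = size x itemsize
real - real parts of a complex array
imag - imaginary parts of a complex array
T - transposed view of the array
flat - flat iterator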

    import numpy as np

    a = np.array([[1 + 1j, 2 + 4j, 3 + 7j],
                  [4 + 2j, 5 + 5j, 6 + 8j],
                  [7 + 3j, 8 + 6j, 9 + 9j]])
    print(a.shape)  # (3, 3)  a tuple: 3 rows, 3 columns
    print(a.dtype)  # complex128  a dtype object for complex numbers; each element takes 16 bytes
    print(a.ndim)  # 2
    print(a.size)  # 9
    print(a.itemsize)  # 16  bytes of memory per element
    print(a.nbytes)  # 144  total size of the array in bytes
    print(a.real, a.imag, sep='\n')  # real and imaginary parts
    # [[1. 2. 3.]
    #  [4. 5. 6.]
    #  [7. 8. 9.]]
    # [[1. 4. 7.]
    #  [2. 5. 8.]
    #  [3. 6. 9.]]
    print(a.T)  # matrix transpose
    # [[1.+1.j 4.+2.j 7.+3.j]
    #  [2.+4.j 5.+5.j 8.+6.j]
    #  [3.+7.j 6.+8.j 9.+9.j]]
    print([elem for elem in a.flat])  # .flat iterates over a multidimensional array as if it were 1-D
    # [(1+1j), (2+4j), (3+7j), (4+2j), (5+5j), (6+8j), (7+3j), (8+6j), (9+9j)]
    b = a.tolist()  # convert the array to a nested list (printed with commas)
    print(b)
    # [[(1+1j), (2+4j), (3+7j)], [(4+2j), (5+5j), (6+8j)], [(7+3j), (8+6j), (9+9j)]]

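(2) Reshaping arrays

1. View reshaping (shared data): only the metadata is manipulated, so if the actual data is changed through any one array, every array sharing it sees the change. .reshape() and .ravel() return views where possible.

2. Copy reshaping (independent data): .flatten()

3. In-place reshaping: changes the dimensions of the original array object directly and does not return a new array. .shape, .resize()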

    import numpy as np

    """
    1. Reshaping with shared data: the original array keeps its shape,
       but a change to the actual data shows up in every array
        ndarray.reshape(), ndarray.ravel()
    """
    a = np.arange(1, 7)
    print(a, a.shape)  # [1 2 3 4 5 6] (6,)

    b = a.reshape(2, 3)
    print(b, b.shape)  # [[1 2 3] [4 5 6]] (2, 3)

    # The metadata is independent; there is only one copy of the actual data
    print(a.shape)  # (6,)
    b[0, 1] = 666
    print(b)  # [[1 666 3] [4 5 6]]
    print(a)  # [1 666 3 4 5 6]

    c = a.reshape(-1, 2)  # -1 lets NumPy infer the number of rows; 2 columns are requested
    print(c, c.shape)  # [[1 666] [3 4] [5 6]] (3, 2)

    # d = c.reshape(6,)  # multidimensional to 1-D
    d = c.ravel()  # multidimensional to 1-D
    print(d, d.shape)  # [1 666 3 4 5 6] (6,)

    """
    2. Reshaping with a copy: metadata and actual data are both independent
        ndarray.flatten(), copy.deepcopy()
    """
    e = c.flatten()  # multidimensional to 1-D
    print(e)  # [  1 666   3   4   5   6]
    a += 10
    print(a, e, sep='\n')
    # [ 11 676  13  14  15  16]
    # [  1 666   3   4   5   6]

    """
    3. In-place reshaping: changes the original array directly
        ndarray.shape, ndarray.resize()
    """
    print("----------------------------------------")
    a = np.arange(1, 9)
    print(a, a.shape)  # [1 2 3 4 5 6 7 8] (8,)
    a.shape = (2, 4)
    print(a, a.shape)  # [[1 2 3 4] [5 6 7 8]] (2, 4)
    a.resize(2, 2, 2)
    print(a, a.shape)  # [[[1 2] [3 4]] [[5 6] [7 8]]] (2, 2, 2)

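(3) Data type of array elements: np.ndarray.dtype

1. Basic types

Type               Type name           Short code
Boolean            'bool_'             ?
Integer            'int8/16/32/64'     i1/i2/i4/i8
Unsigned integer   'uint8/16/32/64'    u1/u2/u4/u8
Floating point     'float16/32/64'     f2/f4/f8
Complex            'complex64/128'     c8/c16
String             'str_'              U<number of characters>
Date/time          'datetime64'        M8[Y] M8[M] M8[D] M8[h] M8[m] M8[s]

2. Reading the short codes

Code        Meaning
U7          A Unicode string of 7 characters, 4 bytes per character, default byte order.
M8[D]       A date type with day precision; each value takes 8 bytes, default byte order.
>3i4        Big-endian; a one-dimensional array of 3 elements, each a 4-byte integer.
<(2,3)u8    Little-endian; a two-dimensional array of 6 elements (2 rows, 3 columns), each an 8-byte unsigned integer.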
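
The short codes can be tried out with np.dtype. The following is a minimal sketch with illustrative variable names; the subarray type is spelled in the explicit (base type, shape) tuple form rather than as a single string, which is an assumption made here to keep the example unambiguous.

```python
import numpy as np

# 'U7': Unicode string of up to 7 characters, 4 bytes per character.
names = np.array(['numpy'], dtype='U7')
print(names.dtype.itemsize)  # 28

# 'M8[D]': datetime64 with day precision, 8 bytes per element.
days = np.array(['2019-12-01'], dtype='M8[D]')
print(days.dtype.itemsize)  # 8
print(days[0] + 1)          # 2019-12-02

# '>i4': big-endian 4-byte integer.
big = np.dtype('>i4')
print(big.itemsize)  # 4

# Little-endian 8-byte unsigned integers arranged as a 2 x 3 subarray.
sub = np.dtype(('<u8', (2, 3)))
print(sub.shape, sub.itemsize)  # (2, 3) 48
```

The subarray's item size is 6 elements x 8 bytes = 48 bytes, which matches the reading of the last table row.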

(4) Number of array elements: np.ndarray.size

(5) Array element indexing (subscripts)

1. Slicing multidimensional arrays

    import numpy as np

    a = np.arange(1, 10)
    a.resize(3, 3)
    print(a)
    # [[1 2 3]
    #  [4 5 6]
    #  [7 8 9]]

    print(a[:2, :2])  # a comma separates the axes: first two columns of the first two rows
    # [[1 2]
    #  [4 5]]
    print(a[::2, ::2])  # first and third columns of the first and third rows
    # [[1 3]
    #  [7 9]]
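
One point worth adding about slices: a basic slice is a view onto the original data, not a copy. The sketch below (variable names are illustrative assumptions) shows the effect and how .copy() gives an independent array.

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)

corner = grid[:2, :2]  # basic slice: a view, no data is copied
corner[0, 0] = 100
print(grid[0, 0])      # 100 (the original array changed too)

independent = grid[:2, :2].copy()  # explicit copy when isolation is needed
independent[0, 0] = -1
print(grid[0, 0])      # 100 (unaffected by changes to the copy)
```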

2. Array mask operations

    import numpy as np

    a = np.arange(0, 10)
    # mask = [True, False, True, False, True, False, True, False, True, False]
    # print(a[mask])

    # 1. Boolean mask
    # mask = a % 2 == 0
    # print(mask)
    # print(a[mask])
    print(a[a % 2 == 0])  # [0 2 4 6 8]
    # Print the common multiples of 3 and 7 below 100
    b = np.arange(1, 100)
    print(b[(b % 3 == 0) & (b % 7 == 0)])  # [21 42 63 84]

    # 2. Index mask
    c = np.array([10, 20, 30, 40])
    mask = [0, 1, 2, 3, 3, 2, 1, 0]
    print(c[mask])  # [10 20 30 40 40 30 20 10]
    # Sort products by price
    products = np.array(['Mi', 'Huawei', 'Apple', 'Samsung'])
    price = np.array([2999, 4999, 8888, 3999])
    indices = np.argsort(price)  # sort the array and return the ordered indices
    # print(indices)  # [0 3 1 2]
    print(products[indices])  # ['Mi' 'Samsung' 'Huawei' 'Apple']
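
As a small extension of the product example, the sketch below ranks the products from most to least expensive and filters them with a boolean mask built from the price array. The name prices and the threshold of 4000 are assumptions made for illustration.

```python
import numpy as np

products = np.array(['Mi', 'Huawei', 'Apple', 'Samsung'])
prices = np.array([2999, 4999, 8888, 3999])

# Reverse the ascending argsort result to rank from most to least expensive.
descending = np.argsort(prices)[::-1]
print(products[descending])  # ['Apple' 'Huawei' 'Samsung' 'Mi']

# A boolean mask built from one array can select from another of the same length.
affordable = products[prices < 4000]
print(affordable)  # ['Mi' 'Samsung']

# Unlike basic slices, mask and index selections return copies.
affordable[0] = 'Redmi'
print(products[0])  # Mi
```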