Machine Learning Prerequisites (3): Master Common NumPy Usage in 30 Minutes

Date: 2020-12-06


NumPy is a Python library for array computing. It supports large multi-dimensional arrays and matrix operations.

I. Python Basics

Let's first review some Python basics. Python has six standard data types: Number, String, List, Tuple, Set, and Dictionary.
Of these:
Immutable types: Number, String, Tuple.
Mutable types: List, Dictionary, Set.

1. List [ ]

A list is wrapped in square brackets [ ], and the value at each position can be changed.

list = [1, 2, 3, 4, 5, 6]

Access a value by position, for example the value at the second position (index 1):

list[1]

Result: 2.
Take every value from the third position to the end of the list:

list[2:]

Result: [3, 4, 5, 6].

Change the value at a given position:

list[0] = 9

The list is now [9, 2, 3, 4, 5, 6].

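2. Tuple ( )

A tuple is wrapped in parentheses ( ), and the value at each position cannot be changed. Duplicate values are allowed.

tuple = ('a', 'a', 'c', 1, 2, 3.0)

Output: ('a', 'a', 'c', 1, 2, 3.0).
Get the element at the last position: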

tuple[-1]

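Output: 3.0.

Tuple operations are similar to list operations, but the elements of a tuple cannot be modified. Trying to do so raises an error.

tuple[2] = 'caiyongji'  # raises TypeError: 'tuple' object does not support item assignment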
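The error described above can be sketched as follows. This is a minimal illustration using only the standard library. The names t and t2 are illustrative assumptions, chosen to avoid shadowing the built-in tuple.

```python
# Tuples are immutable: item assignment raises TypeError.
t = ('a', 'a', 'c', 1, 2, 3.0)

try:
    t[2] = 'caiyongji'
    assigned = True
except TypeError as err:
    assigned = False
    print('TypeError:', err)

print(assigned)  # False

# To "change" a tuple, build a new one from slices of the old one.
t2 = t[:2] + ('caiyongji',) + t[3:]
print(t2)  # ('a', 'a', 'caiyongji', 1, 2, 3.0)
```

The original tuple t is left untouched; only the new tuple t2 contains the replacement value.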

3. Set { }

A set is a collection of unique elements, wrapped in curly braces { }.

set1 = {'a','b','c','a'}
set2 = {'b','c','d','e'}

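set1 evaluates to {'a', 'b', 'c'}. Note that a set removes duplicate elements. Because sets are unordered, the display order may differ.
set2 evaluates to {'b', 'c', 'd', 'e'}.

Unlike lists and tuples, sets cannot be indexed (they are not subscriptable). For example:

set1[0]  # raises TypeError: 'set' object is not subscriptable

Next, let's look at set operations.

Difference of set1 and set2: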

set1 - set2
#set1.difference(set2)

Output: {'a'}.

Union of set1 and set2:

set1 | set2
#set1.union(set2)

Output: {'a', 'b', 'c', 'd', 'e'}.

Intersection of set1 and set2:

set1 & set2
#set1.intersection(set2)

Output: {'b', 'c'}.

Symmetric difference of set1 and set2:

set1 ^ set2 
#(set1 - set2) | (set2 - set1)
#set1.symmetric_difference(set2)

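Output: {'a', 'd', 'e'}.

Each of the difference, union, intersection, and symmetric-difference operators above has an equivalent set method, shown in the comments. Uncomment them and try them yourself.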
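As a quick check of the claim above, the following sketch compares each operator with its method form. The names s1 and s2 are illustrative stand-ins for set1 and set2.

```python
s1 = {'a', 'b', 'c', 'a'}
s2 = {'b', 'c', 'd', 'e'}

# Each operator gives the same result as the corresponding method.
same_difference = (s1 - s2) == s1.difference(s2)
same_union = (s1 | s2) == s1.union(s2)
same_intersection = (s1 & s2) == s1.intersection(s2)
same_symmetric = (s1 ^ s2) == s1.symmetric_difference(s2)
print(same_difference, same_union, same_intersection, same_symmetric)

# sorted() gives a stable display order for an unordered set.
print(sorted(s1 ^ s2))  # ['a', 'd', 'e']

# One practical difference: methods accept any iterable, operators need sets.
with_list = s1.union(['x'])
print(sorted(with_list))  # ['a', 'b', 'c', 'x']
```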

4. Dictionary { key: value }

A dictionary is a mapping: a collection of key-value pairs (insertion order is preserved since Python 3.7). Keys must be unique, but values may repeat.

dict = {'gongzhonghao':'caiyongji','website':'caiyongji.com', 'website':'blog.caiyongji.com'}

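The dictionary evaluates to {'gongzhonghao': 'caiyongji', 'website': 'blog.caiyongji.com'}. Note that when a dictionary literal contains a duplicate key, the later entry overwrites the earlier one.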

dict['gongzhonghao']

Output: the string 'caiyongji'. The get method gives the same result.

dict.get('gongzhonghao')

List all keys:

dict.keys()

Output: dict_keys(['gongzhonghao', 'website']).

List all values:

dict.values()

Output: dict_values(['caiyongji', 'blog.caiyongji.com']).
Change the value of an entry:

dict['website'] = 'caiyongji.com'
dict

Output: {'gongzhonghao': 'caiyongji', 'website': 'caiyongji.com'}.
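The difference between indexing and get only shows up for missing keys. A minimal sketch, where the name d and the key 'missing' are illustrative assumptions:

```python
# A duplicate key in the literal: the later value wins.
d = {'gongzhonghao': 'caiyongji',
     'website': 'caiyongji.com',
     'website': 'blog.caiyongji.com'}
print(len(d))        # 2
print(d['website'])  # blog.caiyongji.com

# get returns None (or a supplied default) for a missing key...
print(d.get('missing'))         # None
print(d.get('missing', 'n/a'))  # n/a

# ...while indexing raises KeyError.
try:
    d['missing']
    found = True
except KeyError:
    found = False
print(found)  # False
```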

Now that we've covered Python's data types, we can start using NumPy.

II. Common NumPy Usage

1. Creating arrays

import numpy as np
arr = np.array([1, 2, 3, 4, 5])

arr evaluates to array([1, 2, 3, 4, 5]).

The following code creates a two-dimensional array:

my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
mtrx= np.array(my_matrix)

mtrx evaluates to:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

2. Indexing and slicing

Index a one-dimensional and a two-dimensional array as follows:

print('arr[0]=',arr[0],'mtrx[1,1]=',mtrx[1,1])

Output: arr[0]= 1 mtrx[1,1]= 5.

Slice an array:

arr[:3]

Result: array([1, 2, 3]).

Slice with negative indices:

arr[-3:-1]

Output: array([3, 4]).

Add a step, which sets the interval between the elements taken:

arr[1:4:2]

Output: array([2, 4]).

Slice a two-dimensional array:

mtrx[0:2, 0:2]

This takes rows 1 and 2 and columns 1 and 2 (indices 0 and 1). The output is:

array([[1, 2],
       [4, 5]])
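The same row-and-column slicing syntax also selects whole rows, whole columns, or other sub-blocks. A short sketch, in which the variable names are illustrative assumptions:

```python
import numpy as np

mtrx = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_row = mtrx[0]           # a single index selects a row
second_col = mtrx[:, 1]       # ':' keeps every row, 1 picks the second column
bottom_right = mtrx[1:, 1:]   # rows 2-3 and columns 2-3

print(first_row)     # [1 2 3]
print(second_col)    # [2 5 8]
print(bottom_right)
```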

3. dtype

NumPy dtypes are identified by the following character codes:

i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
V - fixed chunk of memory for other type ( void )

import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array(['apple', 'banana', 'cherry'])
print('arr1.dtype=',arr1.dtype,'arr2.dtype=',arr2.dtype)

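Output: arr1.dtype= int32 arr2.dtype= <U6. Here arr1 has dtype int32 (the default integer type is platform-dependent, and on many systems it is int64), and <U6 for arr2 means a Unicode string of at most 6 characters.

We can also specify the dtype explicitly.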

arr = np.array(['1', '2', '3'], dtype='f')

Result: array([1., 2., 3.], dtype=float32), where 1. means 1.0. The dtype has been set to float32.
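A minimal sketch of inspecting and converting dtypes. The dtype.kind attribute reports the character code from the list above, and astype (not used in the original text, added here as an assumption about what readers may want next) returns a converted copy.

```python
import numpy as np

floats = np.array(['1', '2', '3'], dtype='f')
print(floats.dtype)       # float32
print(floats.dtype.kind)  # f

# astype returns a new array; the original keeps its dtype.
ints = floats.astype(np.int64)
print(ints)               # [1 2 3]
print(ints.dtype.kind)    # i

words = np.array(['apple', 'banana', 'cherry'])
print(words.dtype.kind)   # U
```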

4. Common functions

4.1 arange

np.arange(0,101,2) produces the output below. It generates evenly spaced values in the half-open interval [0, 101) with a step of 2.

array([  0,   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,
        26,  28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,
        52,  54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,
        78,  80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100])

4.2 zeros

np.zeros((2,5)) produces the output below: a matrix (two-dimensional array) with 2 rows and 5 columns, filled with zeros.

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

4.3 ones

np.ones((4,4)) produces the output below: a matrix with 4 rows and 4 columns, filled with ones.

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

4.4 eye

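np.eye(5) produces the output below: a 5-by-5 square matrix with ones on the diagonal and zeros elsewhere (the identity matrix). A square matrix has the same number of rows and columns.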

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

4.5 rand

np.random.rand(5,2) generates a 5-row, 2-column array of random numbers drawn uniformly from [0, 1).

array([[0.67227856, 0.4880784 ],
       [0.82549517, 0.03144639],
       [0.80804996, 0.56561742],
       [0.2976225 , 0.04669572],
       [0.9906274 , 0.00682573]])

To reproduce the same random numbers as in this example, use the same random seed, which is set with np.random.seed.

np.random.seed(99)
np.random.rand(5,2)
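A small sketch of why seeding matters: resetting the same seed before each call yields identical draws. The names first and second are illustrative.

```python
import numpy as np

np.random.seed(99)
first = np.random.rand(5, 2)

np.random.seed(99)          # reset to the same seed
second = np.random.rand(5, 2)

print(np.array_equal(first, second))  # True

# Without reseeding, the next call continues the sequence and differs.
third = np.random.rand(5, 2)
print(np.array_equal(first, third))   # False
```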

4.6 randint

np.random.randint(0,101,(4,5)) produces the output below. It draws random integers from [0, 101) to fill an array with 4 rows and 5 columns.

array([[ 1, 35, 57, 40, 73],
       [82, 68, 69, 52,  1],
       [23, 35, 55, 65, 48],
       [93, 59, 87,  2, 64]])

4.7 max min argmax argmin

First, generate a set of random numbers:

np.random.seed(99)
ranarr = np.random.randint(0,101,10)
ranarr

Output:

array([ 1, 35, 57, 40, 73, 82, 68, 69, 52,  1])

Find the maximum and minimum values:

print('ranarr.max()=',ranarr.max(),'ranarr.min()=',ranarr.min())

Output: ranarr.max()= 82 ranarr.min()= 1.
The index positions of the maximum and minimum are:

print('ranarr.argmax()=',ranarr.argmax(),'ranarr.argmin()=',ranarr.argmin())

Output: ranarr.argmax()= 5 ranarr.argmin()= 0. Note that when the maximum or minimum occurs more than once, the index of the first occurrence is returned.
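The tie-breaking rule can be seen on the array printed above, where the minimum 1 appears at both ends. The sketch below hard-codes those values rather than relying on the seed. np.flatnonzero is one way (an addition, not from the original text) to list every position of the minimum.

```python
import numpy as np

values = np.array([1, 35, 57, 40, 73, 82, 68, 69, 52, 1])

print(values.argmin())  # 0, the first of the two positions holding 1

# All positions where the minimum occurs.
all_min = np.flatnonzero(values == values.min())
print(all_min)          # [0 9]
```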

III. Advanced NumPy Usage

1. reshape

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)

Here arr is a one-dimensional array and newarr is a two-dimensional array with 4 rows and 3 columns.

print('arr.shape=',arr.shape,'newarr.shape=',newarr.shape)

Output: arr.shape= (12,) newarr.shape= (4, 3).

newarr evaluates to:

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
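reshape only works when the new shape holds exactly the same number of elements. A brief sketch: the -1 placeholder, which asks NumPy to infer one dimension, is an addition not covered in the original text.

```python
import numpy as np

arr = np.arange(1, 13)          # 12 elements: 1..12

inferred = arr.reshape(4, -1)   # -1 lets NumPy work out the 3 columns
print(inferred.shape)           # (4, 3)

# 12 elements cannot fill a 5x3 array, so this raises ValueError.
try:
    arr.reshape(5, 3)
    reshaped = True
except ValueError as err:
    reshaped = False
    print('ValueError:', err)
```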

2. Joining and splitting

2.1 concatenate

Join one-dimensional arrays:

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
arr

Output: array([1, 2, 3, 4, 5, 6]).

Join two-dimensional arrays:

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2))
arr

Output:

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

Now add the argument axis=1:

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
arr

Output:

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])

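Place the cursor on concatenate and press Shift+Tab (the Jupyter Notebook shortcut) to view its documentation. You can see that concatenate joins arrays along an existing axis, with axis=0 as the default, which stacks the rows of one array below the other. With axis=1, the arrays are joined side by side, so the result gains columns instead of rows.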
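One way to keep the two axes straight is to compare result shapes. In this sketch the names a, b, stacked, and side_by_side are illustrative. np.vstack and np.hstack are shown as equivalent shortcuts for 2D inputs.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

stacked = np.concatenate((a, b), axis=0)       # more rows
side_by_side = np.concatenate((a, b), axis=1)  # more columns

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)

# For 2D arrays, vstack and hstack give the same results.
print(np.array_equal(stacked, np.vstack((a, b))))       # True
print(np.array_equal(side_by_side, np.hstack((a, b))))  # True
```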

2.2 array_split

Split an array:

arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
newarr

newarr evaluates to:

[array([[1, 2],
        [3, 4]]),
 array([[5, 6],
        [7, 8]]),
 array([[ 9, 10],
        [11, 12]])]
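array_split also copes with lengths that do not divide evenly, which is where it differs from np.split. A small sketch with an illustrative 7-element array:

```python
import numpy as np

arr = np.arange(1, 8)            # 7 elements: 1..7
parts = np.array_split(arr, 3)   # earlier parts get the extra element

print([p.tolist() for p in parts])  # [[1, 2, 3], [4, 5], [6, 7]]

# np.split insists on equal division and raises ValueError here.
try:
    np.split(arr, 3)
    equal_split = True
except ValueError:
    equal_split = False
print(equal_split)  # False
```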

3. Searching and filtering

3.1 Searching

NumPy's where function finds the indices of the elements that satisfy a condition.

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
x

Output (the dtype annotation may differ or be omitted, depending on platform and NumPy version):

(array([1, 3, 5, 7], dtype=int64),)
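The tuple returned by where can be used directly as an index to recover the matching values. The three-argument form shown last is an addition not covered in the original text. It picks between two alternatives element by element.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

idx = np.where(arr % 2 == 0)   # a tuple holding one index array
print(idx[0])                  # [1 3 5 7]
print(arr[idx])                # [2 4 6 8]

# where(condition, x, y): x where the condition holds, y elsewhere.
labels = np.where(arr % 2 == 0, 'even', 'odd')
print(labels[:4])
```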

3.2 Filtering

Consider the following code:

bool_arr = arr > 4
arr[bool_arr]

Output: array([5, 6, 7, 8]). This time we get the values in the array rather than their indices.
Let's see what bool_arr actually contains.
bool_arr evaluates to:

array([False, False, False, False,  True,  True,  True,  True])

So the filtering above can be written more compactly as:

arr[arr > 4]
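Boolean masks can be combined before indexing. This sketch assumes the same arr as above. Note that NumPy uses the element-wise operators &, |, and ~ (with parentheses around each comparison) rather than Python's and, or, and not.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

both = arr[(arr > 2) & (arr % 2 == 0)]   # greater than 2 AND even
either = arr[(arr < 2) | (arr > 7)]      # below 2 OR above 7
negated = arr[~(arr > 4)]                # NOT greater than 4

print(both)     # [4 6 8]
print(either)   # [1 8]
print(negated)  # [1 2 3 4]
```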

4. Sorting

The sort function sorts an ndarray.

arr = np.array(['banana', 'cherry', 'apple'])
np.sort(arr)

Sorted result: array(['apple', 'banana', 'cherry'], dtype='<U6').

For a two-dimensional array, sort orders each row independently (by default it sorts along the last axis).

arr = np.array([[3, 2, 4], [5, 0, 1]])
np.sort(arr)

Result:

array([[2, 3, 4],
       [0, 1, 5]])
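The axis argument controls the direction of the sort. A short sketch comparing the default with axis=0 and axis=None, using illustrative variable names:

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)              # default axis=-1: within each row
by_col = np.sort(arr, axis=0)      # within each column
flat = np.sort(arr, axis=None)     # flatten first, then sort

print(by_row.tolist())  # [[2, 3, 4], [0, 1, 5]]
print(by_col.tolist())  # [[3, 0, 1], [5, 2, 4]]
print(flat.tolist())    # [0, 1, 2, 3, 4, 5]

# np.sort returns a sorted copy; arr itself is unchanged.
print(arr.tolist())     # [[3, 2, 4], [5, 0, 1]]
```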

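5. Random

5.1 Random choice with probabilities

How would we meet the following requirements?

Generate a one-dimensional array of 100 values, where each value must be 3, 5, 7, or 9.
Set the probability of the value 3 to 0.1.
Set the probability of the value 5 to 0.3.
Set the probability of the value 7 to 0.6.
Set the probability of the value 9 to 0.

We can solve it with the following command:

np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(100))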

Result:

array([7, 5, 7, 7, 7, 7, 5, 7, 5, 7, 7, 5, 5, 7, 7, 5, 3, 5, 7, 7, 7, 7,
       7, 7, 7, 7, 7, 7, 5, 3, 7, 5, 7, 5, 7, 3, 7, 7, 3, 7, 7, 7, 7, 3,
       5, 7, 7, 5, 7, 7, 5, 3, 5, 7, 7, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 5,
       7, 7, 7, 7, 7, 5, 7, 7, 7, 7, 3, 7, 7, 5, 7, 5, 7, 5, 7, 7, 5, 7,
       7, 7, 7, 7, 7, 3, 5, 5, 7, 5, 7, 5])
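To check that the draws roughly follow the requested probabilities, one can count the observed frequencies in a larger sample. This is a sketch under assumed settings: the seed 0 and the sample size of 10000 are arbitrary choices, not from the original text.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for repeatability only
sample = np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=10000)

# Count how often each value was drawn.
values, counts = np.unique(sample, return_counts=True)
freq = dict(zip(values.tolist(), (counts / sample.size).tolist()))
print(freq)  # frequencies close to 0.1, 0.3 and 0.6

# A value with probability 0 never appears.
print(9 in sample)  # False
```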

5.2 Random permutations

5.2.1 permutation

Generate a new random permutation from an existing array.

np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
new_arr = np.random.permutation(arr)
new_arr

Output: array([3, 1, 5, 4, 2]). The original array arr is unchanged.

5.2.2 shuffle

Rearrange the original array into a random order in place, as in shuffling a deck of cards.

np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
arr

Output: array([3, 1, 5, 4, 2]). The original array arr has been modified.
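The practical difference between the two functions is what they return and what they modify. A minimal sketch follows; the names perm, before, and result are illustrative, and no specific ordering is asserted.

```python
import numpy as np

np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])

perm = np.random.permutation(arr)   # returns a new, reordered array
print(arr.tolist())                 # [1, 2, 3, 4, 5], untouched

before = arr.copy()
result = np.random.shuffle(arr)     # reorders arr in place
print(result)                       # None: nothing is returned

# Both keep the same elements; only the order changes.
print(sorted(perm.tolist()) == sorted(arr.tolist()))  # True
```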

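5.3 Random distributions

5.3.1 Normal distribution

Use np.random.normal to generate random numbers that follow a normal distribution. Here loc is the mean and scale is the standard deviation.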

x = np.random.normal(loc=1, scale=2, size=(2, 3))
x

Result:

array([[ 0.14998973,  3.22564777,  1.48094109],
       [ 2.252752  , -1.64038195,  2.8590667 ]])
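With only six values it is hard to see the effect of loc and scale. A larger sample makes it visible. This sketch assumes an arbitrary seed of 0 and a sample size of 100000, neither of which comes from the original text.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for repeatability only
big = np.random.normal(loc=1, scale=2, size=100000)

# The sample mean and standard deviation approach loc and scale.
print(round(big.mean(), 1))  # 1.0
print(round(big.std(), 1))   # 2.0
```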

To visualize the distribution of x, install seaborn for plotting. Install it with pip:

pip install -i https://pypi.tuna.tsinghua.ed... seaborn

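import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated since seaborn 0.11; kdeplot draws the same density curve
sns.kdeplot(x.ravel())  # ravel flattens the 2x3 array into a single sample
plt.show()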

5.3.2 Binomial distribution

Use np.random.binomial to generate random numbers that follow a binomial distribution.

x = np.random.binomial(n=10, p=0.5, size=10)
x

Result: array([8, 6, 6, 2, 5, 5, 5, 5, 3, 4]).

Plot it:

import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated since seaborn 0.11; histplot is its histogram replacement
sns.histplot(x)
plt.show()

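5.3.3 Multinomial distribution

The multinomial distribution is a generalization of the binomial distribution. Use np.random.multinomial to generate random numbers that follow a multinomial distribution.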

x = np.random.multinomial(n=6, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
x

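The code above can be read as throwing a die. n=6 is the number of throws, and pvals gives each of the six faces a probability of 1/6. The result has one entry per face, counting how many times that face came up.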
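A quick way to confirm this reading is to check the shape and sum of the result: one count per face, adding up to the number of throws. The seed 0 and the name counts are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for repeatability only
counts = np.random.multinomial(n=6, pvals=[1/6] * 6)

print(len(counts))    # 6, one entry per face of the die
print(counts.sum())   # 6, the total number of throws

# More throws change the total, not the number of entries.
many = np.random.multinomial(n=600, pvals=[1/6] * 6)
print(len(many), many.sum())  # 6 600
```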

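5.3.4 Others

Besides the distributions above, there are also the Poisson, uniform, exponential, chi-square, and Pareto distributions, among others. Readers who are interested can look them up.

This article is part of the Machine Learning Prerequisites tutorial series. Likes, bookmarks, and follows are welcome; more machine learning content is on the way.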
