Learning numpy's random module

Date: 2019-12-14

Tags: numpy random learning


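In machine learning, parameters have to be initialized randomly, and samples also have to be generated randomly or drawn from a given distribution, so knowing how to generate random values is especially important.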

Some functions generate random numbers, some generate random sequences, some pick elements from a sequence at random, and so on.

Simple random data

`rand`(d0, d1, ..., dn)	Random values in a given shape, uniform on [0, 1).
>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],  #random
       [ 0.37601032,  0.25528411],  #random
       [ 0.49313049,  0.94909878]]) #random

`randn`(d0, d1, ..., dn)	Return a sample (or samples) from the standard normal distribution.
Notes: for random samples from $N(\mu, \sigma^2)$, use:
sigma * np.random.randn(...) + mu
Examples:
>>> np.random.randn()
2.1923875335537315 #random
Two-by-four array of samples from N(3, 6.25):
>>> 2.5 * np.random.randn(2, 4) + 3
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],  #random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]]) #random

`randint`(low[, high, size])	Return random integers from the half-open interval [low, high).
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Generate a 2 x 4 array of ints between 0 and 4, inclusive:
>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])

`random_integers`(low[, high, size])	Return random integers from the closed interval [low, high]. Deprecated since NumPy 1.11; use randint(low, high + 1) instead.
Notes: to sample from N evenly spaced floating-point numbers between a and b, use:
a + (b - a) * (np.random.random_integers(N) - 1) / (N - 1.)
Examples:
>>> np.random.random_integers(5)
4
>>> np.random.random_integers(5, size=(3, 2))
array([[5, 4],
       [3, 3],
       [4, 5]])
Choose five random numbers from the set of five evenly-spaced numbers between 0 and 2.5, inclusive (i.e., from the set $\{0, \frac{5}{8}, \frac{10}{8}, \frac{15}{8}, \frac{20}{8}\}$):
>>> 2.5 * (np.random.random_integers(5, size=(5,)) - 1) / 4.
array([ 0.625,  1.25 ,  0.625,  0.625,  2.5  ])
Roll two six sided dice 1000 times and sum the results:
>>> d1 = np.random.random_integers(1, 6, 1000)
>>> d2 = np.random.random_integers(1, 6, 1000)
>>> dsums = d1 + d2
Display results as a histogram:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(dsums, 11, density=True)
>>> plt.show()

`random_sample`([size])	Return random floats in the half-open interval [0.0, 1.0).
To sample $\mathrm{Unif}[a, b)$, $b > a$, multiply the output of `random_sample` by (b - a) and add a:
(b - a) * random_sample() + a
Examples:
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<class 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428])
Three-by-two array of random numbers from [-5, 0):
>>> 5 * np.random.random_sample((3, 2)) - 5
array([[-3.99149989, -0.52338984],
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])

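`random`([size])	Return random floats in the half-open interval [0.0, 1.0). (The official examples are identical to those of random_sample.)
`ranf`([size])	Return random floats in the half-open interval [0.0, 1.0). (The official examples are identical to those of random_sample.)
`sample`([size])	Return random floats in the half-open interval [0.0, 1.0). (The official examples are identical to those of random_sample.)
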
`choice`(a[, size, replace, p])	Generate a random sample from a given 1-D array.
Examples:
Generate a uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3)
array([0, 3, 4])
>>> #This is equivalent to np.random.randint(0,5,3)
Generate a non-uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])
Generate a uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False)
array([3, 1, 0])
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]
Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0])
Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], dtype='<U11')

`bytes`(length)	Return random bytes.
>>> np.random.bytes(10)
b' eh\x85\x022SZ\xbf\xa4' #random
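Because `random_integers` is deprecated, the dice example above can be rewritten with `randint`, whose upper bound is exclusive, so the closed interval [1, 6] becomes `randint(1, 7)`. The sketch below does that and also applies the `(b - a) * random_sample() + a` rescaling; the helper name `uniform_ab` and the seed are illustrative choices, not part of NumPy.

```python
import numpy as np

def uniform_ab(a, b, size=None):
    # Hypothetical helper: rescale [0.0, 1.0) samples to [a, b)
    return (b - a) * np.random.random_sample(size) + a

np.random.seed(42)

# Roll two six-sided dice 1000 times; randint excludes high, so pass 7
d1 = np.random.randint(1, 7, 1000)
d2 = np.random.randint(1, 7, 1000)
dsums = d1 + d2

# Three-by-two array of floats in [-5, 0)
x = uniform_ab(-5, 0, (3, 2))

print(dsums.min(), dsums.max())
print(x)
```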

There are many ways to generate random numbers; the most widely used are:

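1. rand() and random() both produce random floats in [0, 1). They are used the same way except for how the shape is given: rand takes each dimension as a separate argument, e.g. rand(3, 2), while random takes a single size tuple, e.g. random((3, 2)). Both can produce arrays of any number of dimensions.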
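A quick way to see that rand and random differ only in how the shape is passed: with the same seed they return identical arrays. This minimal sketch uses the legacy global generator (`np.random.seed`), where rand is built on random_sample and random is an alias of random_sample; the helper name `draw_three_ways` is hypothetical.

```python
import numpy as np

def draw_three_ways(seed=0):
    # rand takes each dimension as a separate positional argument
    np.random.seed(seed)
    a = np.random.rand(3, 2)
    # random_sample takes a single size tuple
    np.random.seed(seed)
    b = np.random.random_sample((3, 2))
    # random is an alias of random_sample
    np.random.seed(seed)
    c = np.random.random((3, 2))
    return a, b, c

a, b, c = draw_three_ways()
print(a.shape, np.array_equal(a, b), np.array_equal(b, c))  # (3, 2) True True
```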

numpy.random.rand(d0,d1,…,dn)

rand generates data in [0, 1) with the given shape: 0 is included, 1 is not.

dn: the size of each dimension

Returns an array of the given shape

np.random.rand(4,2)

array([[ 0.02173903, 0.44376568], [ 0.25309942, 0.85259262], [ 0.56465709, 0.95135013], [ 0.14145746, 0.55389458]])

np.random.rand(4,3,2) # shape: 4*3*2

array([[[ 0.08256277, 0.11408276], [ 0.11182496, 0.51452019], [ 0.09731856, 0.18279204]], [[ 0.74637005, 0.76065562], [ 0.32060311, 0.69410458], [ 0.28890543, 0.68532579]], [[ 0.72110169, 0.52517524], [ 0.32876607, 0.66632414], [ 0.45762399, 0.49176764]], [[ 0.73886671, 0.81877121], [ 0.03984658, 0.99454548], [ 0.18205926, 0.99637823]]])

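2. randint() generates random integers. Its distinguishing feature is that you can specify the minimum and maximum, and it returns integers. You can of course also specify size; every element of the result lies in [low, high).

numpy.random.randint(low, high=None, size=None, dtype='l')

Returns random integers from [low, high): low is included, high is not.

Parameters: low is the minimum, high the (exclusive) maximum, size the shape of the output array, and dtype the data type of the result; the default is 'l' (a C long integer).

When high is omitted, the range is [0, low).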

np.random.randint(1,size=5) # returns integers from [0, 1), so only 0

array([0, 0, 0, 0, 0])

np.random.randint(1,5) # returns one random integer from [1, 5)

np.random.randint(-5,5,size=(2,2))

array([[ 2, -1], [ 2, 0]])
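The half-open interval and the one-argument form can be checked directly. A minimal sketch; the seed, the sizes, and the `uint8` dtype are arbitrary illustrative choices.

```python
import numpy as np

np.random.seed(0)

# One argument: low acts as the upper bound, so values come from [0, 3)
a = np.random.randint(3, size=1000)

# Two arguments: values come from [-5, 5); 5 itself never appears
b = np.random.randint(-5, 5, size=(100, 100))

# dtype selects the integer type of the result
c = np.random.randint(0, 256, size=8, dtype=np.uint8)

print(sorted(np.unique(a).tolist()), b.min(), b.max(), c.dtype)
```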

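3. randn(), where the n stands for the standard normal distribution. It is often used to generate samples; you specify the shape of the array.

numpy.random.randn(d0,d1,…,dn)

randn returns one sample or an array of samples from the standard normal distribution.

dn: the size of each dimension

Returns an array of the given shape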

np.random.randn() # with no arguments, returns a single value

-1.1241580894939212

np.random.randn(2,4)

array([[ 0.27795239, -2.57882503, 0.3817649 , 1.42367345], [-1.16724625, -0.22408299, 0.63006614, -0.41714538]])

np.random.randn(4,3,2)

array([[[ 1.27820764, 0.92479163], [-0.15151257, 1.3428253 ], [-1.30948998, 0.15493686]], [[-1.49645411, -0.27724089], [ 0.71590275, 0.81377671], [-0.71833341, 1.61637676]], [[ 0.52486563, -1.7345101 ], [ 1.24456943, -0.10902915], [ 1.27292735, -0.00926068]], [[ 0.88303 , 0.46116413], [ 0.13305507, 2.44968809], [-0.73132153, -0.88586716]]])
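To get samples from a normal distribution other than N(0, 1), shift and scale the output of randn, as in the N(3, 6.25) example in the table above. A minimal sketch that checks the sample mean and standard deviation; the seed, sample size, and tolerances are illustrative.

```python
import numpy as np

np.random.seed(1)
mu, sigma = 3.0, 2.5  # N(3, 6.25): the variance is 6.25, so sigma is 2.5

# sigma * randn(...) + mu turns N(0, 1) samples into N(mu, sigma**2) samples
s = sigma * np.random.randn(100_000) + mu

print(round(float(s.mean()), 2), round(float(s.std()), 2))  # close to 3.0 and 2.5
```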

About the standard normal distribution

The standard normal distribution, also called the u distribution, is the normal distribution with mean 0 and standard deviation 1, written N(0, 1).
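For reference, the density of the standard normal distribution is the standard textbook formula below; scaling and shifting a standard normal variable $Z$ gives a general normal variable, which is why `sigma * np.random.randn(...) + mu` works:

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad Z \sim N(0, 1) \;\Rightarrow\; \sigma Z + \mu \sim N(\mu, \sigma^2).$$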

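4. choice() is used a lot: it picks elements from a sequence at random, and you can also specify the shape of the selected output, which is very convenient. Its key feature is that you can give each element its own selection probability through p.

numpy.random.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

Parameters: a is a 1-D array-like or an int; size is the output shape; replace controls whether values may repeat; p gives the probability of each entry of a.

When a is an int, the sample is drawn from np.arange(a).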

np.random.choice(5,3)

array([4, 1, 4])

np.random.choice(5, 3, replace=False) # with replace=False the sampled values cannot repeat

array([0, 3, 1])

np.random.choice(5,size=(3,2))

array([[1, 0], [4, 2], [3, 3]])

demo_list = ['lenovo', 'samsung', 'moto', 'xiaomi', 'iphone']
np.random.choice(demo_list, size=(3,3))

array([['moto', 'iphone', 'xiaomi'], ['lenovo', 'xiaomi', 'xiaomi'], ['xiaomi', 'lenovo', 'iphone']], dtype='<U7')

The length of p must equal the length of a, and the entries of p are probabilities that must sum to 1.

demo_list = ['lenovo', 'samsung', 'moto', 'xiaomi', 'iphone']
np.random.choice(demo_list, size=(3,3), p=[0.1,0.6,0.1,0.1,0.1])

array([['samsung', 'samsung', 'samsung'], ['samsung', 'samsung', 'samsung'], ['samsung', 'xiaomi', 'iphone']], dtype='<U7')
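The two rules about p can be seen in practice: the empirical frequencies approach p, and NumPy raises ValueError when p has the wrong length or does not sum to 1 (and likewise when replace=False asks for more values than a contains). A minimal sketch; the helper name `rejects` and the seed are hypothetical choices.

```python
import numpy as np

np.random.seed(2)
p = [0.1, 0.6, 0.1, 0.1, 0.1]

# Draw many indices from np.arange(5) with the given probabilities
draws = np.random.choice(5, size=100_000, p=p)
freq = np.bincount(draws, minlength=5) / draws.size  # empirical frequency of 0..4

def rejects(size, **kwargs):
    # Hypothetical helper: True if np.random.choice raises ValueError
    try:
        np.random.choice(5, size, **kwargs)
    except ValueError:
        return True
    return False

bad_length = rejects(3, p=[0.5, 0.5])              # len(p) != len(a)
bad_sum = rejects(3, p=[0.2, 0.2, 0.2, 0.2, 0.3])  # probabilities sum to 1.1
too_many = rejects(6, replace=False)               # 6 distinct values from only 5

print(np.round(freq, 2), bad_length, bad_sum, too_many)
```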

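5. sample

This example uses random.sample from Python's standard-library random module, which returns k distinct elements of a sequence. It is unrelated to np.random.sample in the table above, which is an alias of random_sample and returns floats in [0.0, 1.0).

import random

lists = [1, 2, 3, 4, 5, 6, 7, 8, 10]
# Randomly take a slice of the given length (3 distinct elements) from the sequence
a = random.sample(lists, 3)
print(a)
[8, 6, 10]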
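Since the sample above comes from the standard-library random module, here is a sketch contrasting it with a NumPy way of drawing distinct elements, `np.random.choice(..., replace=False)`. The seeds are arbitrary, and the two calls do not return the same elements because they use different generators.

```python
import random

import numpy as np

lists = [1, 2, 3, 4, 5, 6, 7, 8, 10]

random.seed(3)
a = random.sample(lists, 3)  # standard library: a list of 3 distinct elements

np.random.seed(3)
b = np.random.choice(lists, 3, replace=False)  # NumPy: an ndarray of 3 distinct elements

# np.random.sample (in the table above) is an alias of random_sample: floats, not a subset
c = np.random.random_sample(3)

print(a, b, c)
```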

Permutations

shuffle(x)

Modifies a sequence in place by shuffling its contents (like shuffling a deck of cards).

>>> arr = np.arange(10)
>>> np.random.shuffle(arr)
>>> arr
array([1, 7, 5, 2, 9, 4, 3, 6, 0, 8])

This function only shuffles the array along the first index of a multi-dimensional array:

>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.shuffle(arr)
>>> arr
array([[3, 4, 5],
       [6, 7, 8],
       [0, 1, 2]])
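A small check of both properties of shuffle: it works in place and returns None, and on a 2-D array it only reorders the rows. A minimal sketch with an arbitrary seed.

```python
import numpy as np

np.random.seed(4)
arr = np.arange(9).reshape((3, 3))
result = np.random.shuffle(arr)  # shuffles arr in place; the return value is None

# Only the row order changes, so each row is still three consecutive integers
rows_intact = all(row[1] == row[0] + 1 and row[2] == row[0] + 2 for row in arr)
first_column = sorted(arr[:, 0].tolist())

print(result, rows_intact, first_column)  # None True [0, 3, 6]
```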

permutation(x)

Randomly permute a sequence, or return a permuted range.

>>> np.random.permutation(10)
array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6])

>>> np.random.permutation([1, 4, 9, 12, 15])
array([15,  1,  9,  4, 12])

>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.permutation(arr)
array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])

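Both functions shuffle the order of an array and do the same job.

The difference: numpy.random.permutation(x) does not modify x, because it returns a shuffled copy, whereas numpy.random.shuffle(x) shuffles x in place and returns None. permutation also accepts an int n, in which case it permutes np.arange(n).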
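The difference is easy to verify: after permutation the input array is unchanged and a new shuffled array is returned. A minimal sketch with an arbitrary seed.

```python
import numpy as np

np.random.seed(5)
x = np.arange(10)
y = np.random.permutation(x)  # returns a shuffled copy; x keeps its order
z = np.random.permutation(5)  # an int n permutes np.arange(n)

print(x)  # [0 1 2 3 4 5 6 7 8 9]
print(sorted(y.tolist()) == list(range(10)), sorted(z.tolist()))  # True [0, 1, 2, 3, 4]
```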

Distributions

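`beta`(a, b[, size])	Samples from a Beta distribution on `[0, 1]`.
`binomial`(n, p[, size])	Samples from a binomial distribution.
`chisquare`(df[, size])	Samples from a chi-square distribution.
`dirichlet`(alpha[, size])	Samples from a Dirichlet distribution.
`exponential`([scale, size])	Exponential distribution.
`f`(dfnum, dfden[, size])	Samples from an F distribution.
`gamma`(shape[, scale, size])	Gamma distribution.
`geometric`(p[, size])	Geometric distribution.
`gumbel`([loc, scale, size])	Gumbel distribution.
`hypergeometric`(ngood, nbad, nsample[, size])	Samples from a hypergeometric distribution.
`laplace`([loc, scale, size])	Samples from the Laplace (double exponential) distribution.
`logistic`([loc, scale, size])	Samples from a logistic distribution.
`lognormal`([mean, sigma, size])	Log-normal distribution.
`logseries`(p[, size])	Logarithmic series distribution.
`multinomial`(n, pvals[, size])	Multinomial distribution.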
`multivariate_normal`(mean, cov[, size])	Multivariate normal distribution.
>>> mean = [0,0]
>>> cov = [[1,0],[0,100]] # diagonal covariance, points lie on x or y-axis
>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x'); plt.axis('equal'); plt.show()
`negative_binomial`(n, p[, size])	Negative binomial distribution.
`noncentral_chisquare`(df, nonc[, size])	Noncentral chi-square distribution.
`noncentral_f`(dfnum, dfden, nonc[, size])	Noncentral F distribution.
`normal`([loc, scale, size])	Normal (Gaussian) distribution.
Notes: the probability density for the Gaussian distribution is
$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$,
where $\mu$ is the mean and $\sigma$ the standard deviation. The square of the standard deviation, $\sigma^2$, is called the variance. The function has its peak at the mean, and its "spread" increases with the standard deviation (the function reaches 0.607 times its maximum at $\mu + \sigma$ and $\mu - \sigma$ [R217]).
Examples:
Draw samples from the distribution:
>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)
Verify the mean and the variance:
>>> abs(mu - np.mean(s)) < 0.01
True
>>> abs(sigma - np.std(s, ddof=1)) < 0.01
True
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, density=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...          np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
...          linewidth=2, color='r')
>>> plt.show()
`pareto`(a[, size])	Pareto II (Lomax) distribution.
`poisson`([lam, size])	Poisson distribution.
`power`(a[, size])	Draws samples in [0, 1] from a power distribution with positive exponent a - 1.
`rayleigh`([scale, size])	Rayleigh distribution.
`standard_cauchy`([size])	Standard Cauchy distribution.
`standard_exponential`([size])	Standard exponential distribution.
`standard_gamma`(shape[, size])	Standard gamma distribution.
`standard_normal`([size])	Standard normal distribution (mean=0, stdev=1).
`standard_t`(df[, size])	Standard Student’s t distribution with df degrees of freedom.
`triangular`(left, mode, right[, size])	Triangular distribution.
`uniform`([low, high, size])	Uniform distribution.
`vonmises`(mu, kappa[, size])	von Mises distribution.
`wald`(mean, scale[, size])	Wald (inverse Gaussian) distribution.
`weibull`(a[, size])	Weibull distribution.
`zipf`(a[, size])	Zipf distribution.
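Each of these functions takes the distribution's parameters followed by size. As a sanity check, sample means should land near the textbook values (binomial n*p, Poisson lam, exponential scale), and the multivariate_normal example above should reproduce its covariance matrix. A minimal sketch; the parameters, seed, sample size, and tolerances are illustrative.

```python
import numpy as np

np.random.seed(6)
n = 200_000

# binomial(10, 0.3) has mean 10 * 0.3 = 3
b = np.random.binomial(10, 0.3, n)
# poisson(4) has mean and variance both equal to 4
po = np.random.poisson(4, n)
# exponential(scale=2.0) has mean equal to the scale, 2
e = np.random.exponential(2.0, n)
# Diagonal covariance: x has variance 1, y has variance 100, no correlation
xy = np.random.multivariate_normal([0, 0], [[1, 0], [0, 100]], n)
cov = np.cov(xy.T)  # rows of xy.T are the two variables

print(b.mean(), po.mean(), po.var(), e.mean())
print(np.round(cov, 2))
```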

The distribution functions in random are essentially fuller versions of the basic random generators. For example:

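About numpy.random.uniform:
1. Signature: numpy.random.uniform(low=0.0, high=1.0, size=None)
Purpose: draws random samples from a uniform distribution on [low, high). Note that the interval is half-open: low is included, high is not.
Parameters:
    low: lower bound of the samples, float, default 0;
    high: upper bound of the samples, float, default 1;
    size: number of output samples, an int or a tuple; for example, size=(m,n,k) outputs m*n*k samples. When omitted, a single value is returned.

2. Besides uniform, the following functions also generate random numbers:
    a. randint: signature: numpy.random.randint(low, high=None, size=None, dtype='l'); generates random integers;
    b. random_integers: signature: numpy.random.random_integers(low, high=None, size=None); generates random integers on a closed interval;
    c. random_sample: signature: numpy.random.random_sample(size=None); samples uniformly from [0.0, 1.0);
    d. random: signature: numpy.random.random(size=None); identical to random_sample, of which it is an alias;
    e. rand: signature: numpy.random.rand(d0, d1, ..., dn); generates floats of shape (d0, d1, ..., dn) uniformly distributed on [0, 1);
    f. randn: signature: numpy.random.randn(d0, d1, ..., dn); generates floats of shape (d0, d1, ..., dn) from the standard normal distribution.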
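The list above relates uniform to the simpler generators. A sketch of that relationship, assuming the legacy global generator: with the same seed, `uniform(low, high, size)` should match `low + (high - low) * random_sample(size)`, since both consume the same stream of [0, 1) doubles. The seed and bounds are arbitrary.

```python
import numpy as np

np.random.seed(7)
u = np.random.uniform(2.0, 5.0, size=(2, 3))  # samples from [2.0, 5.0)

np.random.seed(7)
v = 2.0 + (5.0 - 2.0) * np.random.random_sample((2, 3))  # rescaled [0, 1) samples

print(np.allclose(u, v), bool(((u >= 2.0) & (u < 5.0)).all()))
```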

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np

s = np.random.uniform(0, 1, 1200)      # draw 1200 samples from [0, 1)
count, bins, ignored = plt.hist(s, 12, density=True)
# plt.hist takes many parameters (see matplotlib.org for the full signature).
# This example uses three of them:
#   s             the data
#   12            bins, i.e. how many bars to draw
#   density=True  normalize so each bar's height is n/(len(x)*dbin) and the
#                 total area of the histogram is 1 (older matplotlib called this normed)
# Returns:
#   count    array of length bins holding the height of each bar
#   bins     array of length bins+1 holding the bar edges on the x axis
#   ignored  the patches (bar artists), a list or list of lists; unused here

# The uniform density on [0, 1) is 1, so draw it as a red reference line
plt.plot(bins, np.ones_like(bins), linewidth=2, color='r')
plt.show()
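The claim that a density-normalized histogram integrates to 1 can be checked without plotting. np.histogram computes the same kind of counts and bin edges that plt.hist draws; a minimal sketch with an arbitrary seed.

```python
import numpy as np

np.random.seed(8)
s = np.random.uniform(0, 1, 1200)

# Same binning as plt.hist(s, 12, density=True), computed without plotting
count, bins = np.histogram(s, bins=12, density=True)
widths = np.diff(bins)  # width of each bar

print(len(count), len(bins))     # 12 13
print(np.sum(count * widths))    # total area: 1.0 up to floating-point rounding
```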

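Among the distributions, the standard normal and the general normal distribution are especially important: they are Gaussian distributions, and they are used all the time to generate data samples.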

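normal and standard_normal are the dedicated functions, but the same samples can also be produced with randn (sigma * randn(...) + mu).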
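A sketch of that equivalence, assuming the legacy global generator, where normal, standard_normal, and randn all draw from the same Gaussian stream: with the same seed, `normal(mu, sigma, size)` should match `sigma * randn(...) + mu`. The seed and parameters are arbitrary.

```python
import numpy as np

mu, sigma = 0.0, 0.1

np.random.seed(9)
a = np.random.normal(mu, sigma, size=(2, 3))

np.random.seed(9)
b = sigma * np.random.randn(2, 3) + mu  # dimensions as separate arguments

np.random.seed(9)
c = sigma * np.random.standard_normal((2, 3)) + mu  # shape as a size tuple

print(np.allclose(a, b), np.allclose(b, c))
```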

Random number generators

RandomState	Container for the Mersenne Twister pseudo-random number generator.
`seed`([seed])	Seed the generator.
`get_state`()	Return a tuple representing the internal state of the generator.
`set_state`(state)	Set the internal state of the generator from a tuple.
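RandomState can also be used as an object instead of through the module-level functions, which keeps its state separate from the global generator. The sketch below saves the state with get_state, rewinds with set_state, and checks that both the replay and a fresh generator with the same seed reproduce the draws; the seed is arbitrary.

```python
import numpy as np

rng = np.random.RandomState(10)  # independent Mersenne Twister generator
state = rng.get_state()          # snapshot of the internal state (a tuple)
first = rng.rand(5)

rng.set_state(state)             # rewind to the snapshot
again = rng.rand(5)

# A second generator built from the same seed reproduces the stream too
other = np.random.RandomState(10).rand(5)

print(np.array_equal(first, again), np.array_equal(first, other))  # True True
```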

numpy.random.seed() is an interesting method: it makes repeated runs produce the same random numbers.

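If the same number is passed to seed(), the sequence of random numbers produced afterwards by random() or rand() is the same every time. The seed fixes the whole sequence that follows, not a single value: successive calls after one seed() return different numbers from one another, but the same sequence on every run. Changing the value passed to seed() changes the sequence, and changing it back reproduces the earlier sequence.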

import numpy as np
np.random.seed(5)  # seed once, before the loop
for i in range(5):
    print(np.random.random())

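With this setup the seed is applied only once: the five numbers differ from one another, although rerunning the script reproduces the same five. If the code is instead: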

import numpy as np

for i in range(5):
    np.random.seed(5)  # reseed before every draw
    print(np.random.random())

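Here the generator is reseeded before every draw, so all five numbers are the same. So if you want to reproduce the same random numbers, you must set the seed before generating them.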
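The two loops above can be compared directly. A minimal sketch wrapping each in a hypothetical helper: seeding once gives five different values that are the same on every run, while reseeding before each draw gives the same value five times.

```python
import numpy as np

def seeded_once(seed=5, n=5):
    # Hypothetical helper: seed once, then draw n values
    np.random.seed(seed)
    return [np.random.random() for _ in range(n)]

def seeded_each_time(seed=5, n=5):
    # Hypothetical helper: reseed before every draw
    out = []
    for _ in range(n):
        np.random.seed(seed)
        out.append(np.random.random())
    return out

run1, run2 = seeded_once(), seeded_once()
each = seeded_each_time()
print(run1 == run2)    # True: same seed, same whole sequence
print(len(set(run1)))  # 5: the values within one run differ
print(len(set(each)))  # 1: every draw restarts from the same state
```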

