Statistical Analysis with Python (scipy.stats documentation), reposted

Reposted from https://www.cnblogs.com/jkmiao/p/5200635.html

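This document covers the topics listed below. Anyone interested in doing statistical analysis in Python may find it worth reading, since Python's libraries are somewhat disorganized: material that looks like it belongs together is scattered across scipy, pandas, sympy, and other libraries. What follows is the general-purpose statistical functionality, which lives in scipy. Things like time series are found elsewhere, and those libraries in turn lack the functionality described here.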

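Drawing samples of random variates
84 continuous distributions (listed to show how many there are, not described individually)
12 discrete distributions
Probability density function, cumulative distribution function, survival function, percent point (quantile) function, and inverse survival function of a distribution
Distribution statistics: mean, variance, kurtosis, skewness, moments
Generating distributions by linear transformation (shifting and scaling)
Fitting distributions to data
Building distributions
Descriptive statistics
t-test, KS test, chi-square test, normality tests, tests for identical distributions
Kernel density estimation (estimating the probability density function from a sample)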


Statistics (scipy.stats)
Introduction
In this tutorial we discuss many, but certainly not all, features of scipy.stats. The intention here is to provide a user with a working knowledge of this package. We refer to the reference manual for further details.
Note: This documentation is work in progress.
Random Variables
There are two general distribution classes that have been implemented for encapsulating continuous random variables and discrete random variables. Over 80 continuous random variables (RVs) and 10 discrete random variables have been implemented using these classes. Besides this, new routines and distributions can easily be added by the end user. (If you create one, please contribute it.)
All of the statistics functions are located in the sub-package scipy.stats, and a fairly complete listing of these functions can be obtained using info(stats). The list of the random variables available can also be obtained from the docstring for the stats sub-package.
In the discussion below we mostly focus on continuous RVs. Nearly all of it applies to discrete variables as well, but we point out some differences in Specific Points for Discrete Distributions.

Getting Help
First of all, all distributions are accompanied by help functions. To obtain just some basic information, we can print the docstring:
>>> from scipy import stats
>>> from scipy.stats import norm
>>> print(norm.__doc__)

To find the support, i.e., the upper and lower bound of the distribution, call:
>>> print('bounds of distribution lower: %s, upper: %s' % (norm.a, norm.b))
bounds of distribution lower: -inf, upper: inf
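The same bounds can also be read through the support() method, which additionally takes loc and scale into account. The sketch below assumes a reasonably recent SciPy release in which distributions expose support(); older installations may only offer the a and b attributes shown above. The variable names are illustrative only.

```python
from scipy.stats import expon, norm, uniform

# support() returns the (lower, upper) endpoints of the distribution.
norm_support = norm.support()    # unbounded on both sides
expon_support = expon.support()  # bounded below at 0

# With loc and scale, the endpoints are shifted and stretched accordingly:
# the standard uniform on [0, 1] becomes uniform on [1, 1 + 4].
uniform_support = uniform.support(loc=1, scale=4)

print(norm_support)
print(expon_support)
print(uniform_support)
```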

We can list all methods and properties of the distribution with dir(norm). As it turns out, some of the methods are private even though they are not named as such (their names do not start with a leading underscore). For example, veccdf is only available for internal calculation; such methods give warnings when one tries to use them and will be removed at some point.
To obtain the real main methods, we list the methods of the frozen distribution. (We explain the meaning of a frozen distribution below.)
>>> rv = norm()
>>> dir(rv)  # reformatted
    ['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__',
    '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__',
    '__repr__', '__setattr__', '__str__', '__weakref__', 'args', 'cdf', 'dist',
    'entropy', 'isf', 'kwds', 'moment', 'pdf', 'pmf', 'ppf', 'rvs', 'sf', 'stats']

Finally, we can obtain the list of available distributions through introspection:
>>> import warnings
>>> warnings.simplefilter('ignore', DeprecationWarning)
>>> dist_continu = [d for d in dir(stats) if
...                 isinstance(getattr(stats, d), stats.rv_continuous)]
>>> dist_discrete = [d for d in dir(stats) if
...                  isinstance(getattr(stats, d), stats.rv_discrete)]
>>> print('number of continuous distributions:', len(dist_continu))
number of continuous distributions: 84
>>> print('number of discrete distributions:  ', len(dist_discrete))
number of discrete distributions:   12

Common Methods
The main public methods for continuous RVs are:
rvs: Random Variates
pdf: Probability Density Function
cdf: Cumulative Distribution Function
sf: Survival Function (1-CDF)
ppf: Percent Point Function (Inverse of CDF)
isf: Inverse Survival Function (Inverse of SF)
stats: Return mean, variance, (Fisher’s) skew, or (Fisher’s) kurtosis
moment: non-central moments of the distribution
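The relationships between the methods listed above can be checked numerically. The following is a minimal sketch on the standard normal distribution; the evaluation point x and the probability q are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

x = 1.5  # arbitrary evaluation point
q = 0.9  # arbitrary probability

# sf is defined as 1 - cdf.
sf_matches = np.isclose(norm.sf(x), 1.0 - norm.cdf(x))

# ppf inverts cdf, and isf inverts sf.
ppf_inverts_cdf = np.isclose(norm.cdf(norm.ppf(q)), q)
isf_inverts_sf = np.isclose(norm.sf(norm.isf(q)), q)

# stats returns the requested moments: mean, variance, skew, kurtosis.
mean, var, skew, kurt = norm.stats(moments="mvsk")

# moment(n) is the n-th non-central moment; for the standard normal
# the second non-central moment equals the variance.
second_moment = norm.moment(2)

print(sf_matches, ppf_inverts_cdf, isf_inverts_sf)
print(mean, var, skew, kurt, second_moment)
```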
Let’s take a normal RV as an example.
>>> norm.cdf(0)
0.5

To compute the cdf at a number of points, we can pass a list or a numpy array.
>>> norm.cdf([-1., 0, 1])
array([ 0.15865525,  0.5       ,  0.84134475])
>>> import numpy as np
>>> norm.cdf(np.array([-1., 0, 1]))
array([ 0.15865525,  0.5       ,  0.84134475])

Thus, the basic methods such as pdf, cdf, and so on are vectorized with np.vectorize.
Other generally useful methods are supported too:
>>> norm.mean(), norm.std(), norm.var()
(0.0, 1.0, 1.0)
>>> norm.stats(moments="mv")
(array(0.0), array(1.0))

To find the median of a distribution we can use the percent point function ppf, which is the inverse of the cdf:
>>> norm.ppf(0.5)
0.0
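Because ppf is the inverse of the cdf, it yields any quantile, not only the median. A short sketch follows; the probability levels and the use of the exponential distribution are illustrative choices, not part of the original tutorial.

```python
import numpy as np
from scipy.stats import expon, norm

# Median and the 2.5% / 97.5% quantiles of the standard normal.
levels = [0.025, 0.5, 0.975]
quantiles = norm.ppf(levels)

# For the standard exponential distribution, F(x) = 1 - exp(-x),
# so solving F(x) = 0.5 gives a median of ln(2).
expon_median = expon.ppf(0.5)

print(quantiles)
print(expon_median, np.log(2))
```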

To generate a set of random variates:
>>> norm.rvs(size=5)
array([-0.35687759,  1.34347647, -0.11710531, -1.00725181, -0.51275702])

Don’t think that norm.rvs(5) generates 5 variates:
>>> norm.rvs(5)
7.131624370075814
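The difference between the two calls shows up in the shape of the result. The sketch below also passes random_state so that the draws are reproducible; the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import norm

# size controls how many variates are drawn.
sample = norm.rvs(size=5, random_state=42)

# A bare positional 5 is not a size: the call returns a single value.
single = norm.rvs(5, random_state=42)

# Re-using the same seed reproduces the same sample.
repeat = norm.rvs(size=5, random_state=42)

print(sample.shape)               # five variates
print(np.ndim(single))            # a scalar, not an array of five
print(np.array_equal(sample, repeat))
```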

This brings us, in fact, to the topic of the next subsection.
Shifting and Scaling
All continuous distributions take loc and scale as keyword parameters to adjust the location and scale of the distribution, e.g. for the standard normal distribution the location is the mean and the scale is the standard deviation.
>>> norm.stats(loc=3, scale=4, moments="mv")
(array(3.0), array(16.0))

In general the standardized distribution for a random variable X is obtained through the transformation (X - loc) / scale. The default values are loc = 0 and scale = 1.
Smart use of loc and scale can help modify the standard distributions in many ways. To illustrate the scaling further, the cdf of an exponentially distributed RV with mean 1/λ is given by
F(x)=1−exp(−λx)
By applying the scaling rule above, it can be seen that by taking scale = 1./lambda we get the proper scale.
>>> from scipy.stats import expon
>>> expon.mean(scale=3.)
3.0
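Both statements above can be verified numerically. This is a minimal sketch: the evaluation points, the loc and scale values, and the rate lam (standing in for λ) are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy.stats import expon, norm

x = np.array([0.5, 1.0, 2.0, 4.0])

# Standardization: evaluating with loc/scale is the same as evaluating
# the standard distribution at (x - loc) / scale.
loc, scale = 3.0, 4.0
shifted = norm.cdf(x, loc=loc, scale=scale)
standardized = norm.cdf((x - loc) / scale)

# Exponential distribution with rate lam: F(x) = 1 - exp(-lam * x),
# obtained from expon by setting scale = 1 / lam.
lam = 0.5
closed_form = 1.0 - np.exp(-lam * x)
via_scale = expon.cdf(x, scale=1.0 / lam)

print(np.allclose(shifted, standardized))
print(np.allclose(closed_form, via_scale))
print(expon.mean(scale=1.0 / lam))  # the mean is 1 / lam
```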

The uniform distribution is also interesting:
>>> from scipy.stats import uniform
>>> uniform.cdf([0, 1, 2, 3, 4, 5], loc=1, scale=4)
array([ 0.  ,  0.  ,  0.25,  0.5 ,  0.75,  1.  ])
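The output above reflects that uniform with loc and scale is the uniform distribution on the interval [loc, loc + scale]. To get a uniform distribution on [a, b], one therefore passes loc=a and scale=b-a. A small sketch, where the endpoints a and b are illustrative names and values:

```python
import numpy as np
from scipy.stats import uniform

a, b = 1.0, 5.0  # desired interval endpoints
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)

# uniform(loc, scale) lives on [loc, loc + scale].
cdf_values = uniform.cdf(x, loc=a, scale=b - a)

# Closed form of the uniform cdf on [a, b], clipped to [0, 1].
expected = np.clip((x - a) / (b - a), 0.0, 1.0)

print(cdf_values)
print(np.allclose(cdf_values, expected))
```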

Finally, recall from the previous paragraph that we are left with the problem of the meaning of norm.rvs(5). As it turns out, when a distribution is called like this, the first argument, i.e., the 5, gets passed to set the loc parameter. Let’s see:
>>> np.mean(norm.rvs(5, size=500))
4.983550784784704

Thus, to explain the output of the example of the last section: norm.rvs(5) generates a normally distributed random variate with mean loc=5.
I prefer to set the loc and scale parameter explicitly, by passing the values as keywords rather than as arguments. This is less of a hassle than it may seem. We clarify this below when we explain the topic of freezing a RV.
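The positional and keyword forms are equivalent, but the keyword form states the intent. The sketch below compares the two; the sample size, the scale of 2, and the seed are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

# Positional form: the first argument is loc, the second is scale.
positional = norm.rvs(5, 2, size=10000, random_state=0)

# Keyword form: the same call with the meaning spelled out.
keyword = norm.rvs(loc=5, scale=2, size=10000, random_state=0)

print(np.array_equal(positional, keyword))
print(keyword.mean(), keyword.std())  # close to loc and scale
```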
