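These days almost everyone doing data analysis works with NumPy, pandas, and scikit-learn. All of these packages run on Python, so anyone working with data first has to set up a Python environment. (Of course, if you use R or Julia, feel free to ignore this post.)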
macOS ships with Python 2.7 by default. Since Python 2 is about to reach end of life, unless you have a large body of Python 2 code to maintain, just install Python 3.
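For data work, the popular route is to download the Anaconda installer. It is roughly 500 MB and bundles everything: dependencies (four to five hundred scientific computing packages) and development tools (Jupyter Notebook, Spyder). Follow the installation steps and it works out of the box, but once installed it takes up several GB of disk space.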
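Because my disk space is limited, I use the Miniconda distribution instead; the latest installer, based on Python 3.7, is under 50 MB. Miniconda also uses conda as its package manager, so you can easily install just the packages you need, such as NumPy, pandas, and matplotlib.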
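Of course, you can also start from the official installer or Homebrew and then use pip to install the packages you need. Overall, though, Python's own versions and executable paths are quite a mess, as the figure in the original post illustrates.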
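To see how many interpreters are competing on a given machine, a small script along these lines can help. It is only a sketch: `find_pythons` is a hypothetical helper name, and the list of command names it probes is an assumption, not something taken from the original post.

```python
import os
import shutil
import sys


def find_pythons(names=("python", "python2", "python3")):
    """Return {command name: resolved path} for each command found on PATH."""
    found = {}
    for name in names:
        path = shutil.which(name)
        if path is not None:
            # realpath follows symlinks, so shims resolve to the real binary
            found[name] = os.path.realpath(path)
    return found


# The interpreter actually running this script, and its version
print("Running interpreter:", sys.executable)
print("Version:", sys.version.split()[0])

# Every candidate command name that resolves on PATH
for name, path in find_pythons().items():
    print(f"{name:8s} -> {path}")
```

Running it before and after installing Miniconda shows which `python` the shell will pick up first.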
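Once the download finishes, first check that its hash matches the value published on the official site (5cf91dde8f6024061c8b9239a1b4c34380238297adbdb9ef2061eb9d1a7f69bc) to make sure the installer has not been tampered with.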
$ shasum -a 256 Miniconda3-latest-MacOSX-x86_64.sh
5cf91dde8f6024061c8b9239a1b4c34380238297adbdb9ef2061eb9d1a7f69bc  Miniconda3-latest-MacOSX-x86_64.sh
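The same check can be done from Python with the standard library, which is handy where `shasum` is unavailable. This is a minimal sketch; `sha256_of_file` and `matches_published_hash` are hypothetical helper names introduced here for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call, assuming the installer sits in the current directory:
# matches_published_hash(
#     "Miniconda3-latest-MacOSX-x86_64.sh",
#     "5cf91dde8f6024061c8b9239a1b4c34380238297adbdb9ef2061eb9d1a7f69bc",
# )
```

Reading in chunks keeps memory use flat even for large installers.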
$ bash ./Miniconda3-latest-MacOSX-x86_64.sh
Welcome to Miniconda3 4.7.12
In order to continue the installation process, please review the license agreement.
Please, press ENTER to continue
Do you accept the license terms? [yes|no]
[no] >>> yes
Miniconda3 will now be installed into this location:
/Users/shenfeng/miniconda3
  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below
[/Users/shenfeng/miniconda3] >>>
>>>
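Follow the prompts and press Enter. Partway through you have to accept the license terms by typing yes; at the install-location prompt, just press Enter to accept the default path.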
Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[yes] >>> yes
==> For changes to take effect, close and re-open your current shell. <==
If you'd prefer that conda's base environment not be activated on startup,
   set the auto_activate_base parameter to false:
conda config --set auto_activate_base false
Thank you for installing Miniconda3!
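The final message says that you can run conda config --set auto_activate_base false to stop the Python 3 (base) environment from being activated automatically when the shell starts.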
(base) my:~ shenfeng$ python
Python 3.7.4 (default, Aug 13 2019, 15:17:50)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
$ conda env list
# conda environments:
#
base                  *  /Users/shenfeng/miniconda3
Use conda deactivate to leave the Python 3 environment, and conda activate base to activate the default Python 3 environment again.
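To confirm from inside Python which environment is active, one option is to look at the environment variables that conda sets on activation. The sketch below assumes `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` are present after `conda activate`; `describe_conda_env` is a hypothetical helper name.

```python
import os
import sys


def describe_conda_env(environ=None):
    """Summarize the conda environment visible to this process."""
    env = os.environ if environ is None else environ
    return {
        # Name of the active environment, e.g. "base"; None if none is active
        "name": env.get("CONDA_DEFAULT_ENV"),
        # Install path of the active environment
        "prefix": env.get("CONDA_PREFIX"),
        # Where the running interpreter itself lives
        "interpreter_prefix": sys.prefix,
    }


for key, value in describe_conda_env().items():
    print(f"{key}: {value}")
```

After `conda deactivate`, the name and prefix entries would be expected to come back as `None`.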
The .condarc file, here pointing conda at the Tsinghua (TUNA) mirrors:
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
Use conda list to check which packages are already installed, and conda install to install the packages you need.
$ conda list numpy
# packages in environment at /Users/shenfeng/miniconda3:
#
# Name                    Version                   Build  Channel
$ conda install numpy
$ conda list numpy
# packages in environment at /Users/shenfeng/miniconda3:
#
# Name                    Version                   Build  Channel
numpy                     1.17.3           py37h4174a10_0    defaults
numpy-base                1.17.3           py37h6575580_0    defaults
scipy, pandas, and other packages can be installed the same way, so I won't repeat the steps here.
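After installing, a quick way to confirm that the packages are importable from the active environment is to try importing each one and read its version. This is a rough sketch; `installed_versions` is a hypothetical helper, and the default package list is just the set mentioned in this post.

```python
import importlib


def installed_versions(packages=("numpy", "scipy", "pandas", "matplotlib")):
    """Map each package name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most scientific packages expose __version__, but not all modules do
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    print(f"{name:12s} {version if version is not None else 'not installed'}")
```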
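The interactive tool everyone knows is Jupyter Notebook, but again because of disk space I only installed IPython on this machine. In fact, Jupyter Notebook is the browser-based notebook that grew out of IPython Notebook.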
$ conda install ipython
$ ipython
Python 3.7.4 (default, Aug 13 2019, 15:17:50)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np
In [2]: dataset= [2,6,8,12,18,24,28,32]
In [3]: sd= np.std(dataset,ddof=1)
In [4]: print(sd)
10.977249200050075
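The `ddof=1` argument asks for the sample standard deviation, which divides the sum of squared deviations by n - 1 rather than n. A minimal standard-library sketch of the same calculation, with `sample_std` as a hypothetical helper name:

```python
import math
import statistics

dataset = [2, 6, 8, 12, 18, 24, 28, 32]


def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


print(sample_std(dataset))         # hand-rolled version
print(statistics.stdev(dataset))   # standard-library cross-check
```

Both values should agree with the `np.std(dataset, ddof=1)` result in the session above, up to floating-point rounding.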
First, download a sample dataset from https://pan.baidu.com/s/1lXAnyvSoti-U44MU2fubgw. It is an Excel file, so save it as CSV before processing.
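Building on the summary measures covered in last week's article, the code below computes the summary statistics for this dataset.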
import numpy as np
from scipy import stats

# Load the single-column CSV, skipping the header row
dataset = np.genfromtxt('/Users/shenfeng/Downloads/test1.csv', delimiter=',', skip_header=1)
print('Shape of numpy array:', dataset.shape)
# Output: Shape of numpy array: (699,)
mode = stats.mode(dataset)
print('Mode of the data:', mode)
# Output: Mode of the data: ModeResult(mode=array([1.]), count=array([145]))
# The mode is 1, which occurs 145 times
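For intuition, the mode and its count can also be found with `collections.Counter`. This is only a sketch: `mode_with_count` is a hypothetical helper, and breaking ties by taking the smallest value is an assumption of this example rather than a statement about SciPy.

```python
from collections import Counter


def mode_with_count(values):
    """Return (most frequent value, its count); ties go to the smallest value."""
    counts = Counter(values)
    top_count = max(counts.values())
    # Among all values that reach the top count, pick the smallest one
    mode_value = min(v for v, c in counts.items() if c == top_count)
    return mode_value, top_count


sample = [1, 1, 1, 2, 2, 5, 10]
print(mode_with_count(sample))
```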
print('Median of the data:', np.median(dataset))
# Output: Median of the data: 4.0
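# No need to sort the data beforehand.
# interpolation='linear' is the default; NumPy 1.22+ renames this keyword to method.
print("First quartile (25%):", np.percentile(dataset, 25, interpolation='linear'))
# Output: First quartile (25%): 2.0
print("Second quartile (50%):", np.percentile(dataset, 50, interpolation='linear'))
# Output: Second quartile (50%): 4.0
print("Third quartile (75%):", np.percentile(dataset, 75, interpolation='linear'))
# Output: Third quartile (75%): 6.0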
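Linear interpolation means that when the requested percentile falls between two sorted values, the result is a weighted blend of the two. The following standard-library sketch illustrates the idea; `percentile_linear` is a hypothetical helper, not NumPy's implementation.

```python
import math


def percentile_linear(values, q):
    """q-th percentile (0..100) using linear interpolation between neighbors."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)                 # sorting happens here, not in the caller
    rank = (len(ordered) - 1) * q / 100.0    # fractional index into the sorted data
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


print(percentile_linear([4, 1, 3, 2], 25))  # 1.75
print(percentile_linear([4, 1, 3, 2], 50))  # 2.5
```

With four values, the 25th percentile sits at fractional index 0.75, three quarters of the way from 1 to 2.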
print('Mean of the data:', np.mean(dataset))
# Output: Mean of the data: 4.417739628040057
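# ddof=0 gives the population standard deviation (divide by n)
print('Population standard deviation of the data:', np.std(dataset, ddof=0))
# Output: Population standard deviation of the data: 2.8137258170785375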
# A standard score is a value's deviation from the mean divided by the standard deviation
print('Standard scores of the data:', stats.zscore(dataset))
# Output: Standard scores of the data: [ 0.20693572  0.20693572 -0.50386559  0.56233637 -0.14846494  1.27313768
#  -1.2146669  -0.85926625 -0.85926625 -0.14846494 -1.2146669  -0.85926625 ... (truncated) ]
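The definition in the comment above translates directly into code. The sketch below uses only the standard library and the population standard deviation, which corresponds to the default `ddof=0` of `stats.zscore`; `z_scores` is a hypothetical helper name, and the small list is the one from the IPython session earlier.

```python
import statistics


def z_scores(values):
    """Standardize values: (x - mean) / population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)   # population std, i.e. ddof=0
    return [(x - mean) / sd for x in values]


scores = z_scores([2, 6, 8, 12, 18, 24, 28, 32])
print([round(z, 4) for z in scores])
```

By construction the standardized values have mean 0 and population standard deviation 1, which makes values from different scales comparable.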
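# The coefficient of variation measures dispersion and is mainly used to compare the spread of different samples
print('Coefficient of variation of the data:', stats.variation(dataset))
# Output: Coefficient of variation of the data: 0.6369152675317026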
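The coefficient of variation is simply the standard deviation divided by the mean, so the figure above can be cross-checked against the two statistics already printed. A short sketch, where `coefficient_of_variation` is a hypothetical helper and the two constants are the values reported earlier in this post:

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Values reported above for the sample dataset
reported_std = 2.8137258170785375
reported_mean = 4.417739628040057
print(reported_std / reported_mean)

# The same measure on a small list
print(coefficient_of_variation([2, 6, 8, 12, 18, 24, 28, 32]))
```

The first printed ratio should agree with the `stats.variation` output up to floating-point rounding.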
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.hist(dataset, bins=30)
plt.show()  # needed to display the figure when not in an inline or interactive plotting mode
This produces the following distribution plot:
print('Skewness of the data:', stats.skew(dataset))
# Output: Skewness of the data: 0.5915855449527385
# A skewness between 0.5 and 1, or between -1 and -0.5, is considered moderately skewed
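print('Kurtosis of the data:', stats.kurtosis(dataset))
# Output: Kurtosis of the data: -0.6278342838815454
# stats.kurtosis returns excess kurtosis by default; K < 0 means a flatter distribution with more dispersed data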
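Both coefficients are ratios of central moments: skewness is m3 / m2^1.5 and excess kurtosis is m4 / m2^2 - 3. The sketch below spells that out with hypothetical helper names, using the biased (population) moments, which is my understanding of SciPy's defaults.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean)^k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Third central moment scaled by the 1.5 power of the variance."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]
print(skewness(symmetric))         # symmetric data has zero skewness
print(excess_kurtosis(symmetric))  # negative: flatter than a normal distribution
```

A long right tail, as in `[1, 1, 1, 10]`, pushes the skewness above zero.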
This post used the Miniconda distribution to set up a local data-computing environment and computed summary statistics for a sample dataset.
Original source: https://www.cnblogs.com/shenfeng/p/install_miniconda_on_mac.html