To install pandas on Windows you also need the packages it depends on. In total, the following packages are required:
pyparsing-2.0.7-py2.py3-none-any.whl
matplotlib-1.5.1-cp27-none-win32.whl
openpyxl-2.3.2-py2.py3-none-any.whl
setuptools-19.2-py2.py3-none-any.whl
numpy-1.10.4+mkl-cp27-none-win32.whl
six-1.10.0-py2.py3-none-any.whl
python_dateutil-2.4.2-py2.py3-none-any.whl
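These packages can be downloaded from:
http://www.lfd.uci.edu/~gohlke/pythonlibs
If you cannot find the exact file listed above, any build of the same module will do. Install it with:
pip install <module name>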
****************************************************************************************************************
numpy
import numpy as np

In [16]: data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
In [17]: arr2 = np.array(data2)
In [18]: arr2
Out[18]:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Inspecting the number of dimensions, the shape, and the data type (dtype):
In [19]: arr2.ndim
Out[19]: 2
In [20]: arr2.shape
Out[20]: (2, 4)
In [21]: arr2.dtype

Changing the dtype:
In [38]: numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.string_)
In [39]: numeric_strings.astype(float)
Out[39]: array([ 1.25, -9.6 , 42. ])

A slice is a view on the original array, so assigning through the slice changes the array itself (arr here is presumably np.arange(10) with arr[5:8] = 12 already applied; that setup is not shown):
In [57]: arr_slice = arr[5:8]
In [58]: arr_slice[1] = 12345
In [59]: arr
Out[59]: array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])

np.where:
In [140]: xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
In [141]: yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
In [142]: cond = np.array([True, False, True, True, False])

Suppose we want to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr:
In [145]: result = np.where(cond, xarr, yarr)
In [146]: result
Out[146]: array([ 1.1, 2.2, 1.3, 1.4, 2.5])

The second and third arguments to np.where do not need to be arrays; one or both of them can be scalars. A typical use of where in data analysis is to produce a new array whose values are based on another array. Suppose you have a matrix of randomly generated data and you want to replace every positive value with 2 and every negative value with -2. This is very easy with np.where:
In [147]: arr = np.random.randn(4, 4)
In [148]: arr
Out[148]:
array([[ 0.6372,  2.2043,  1.7904,  0.0752],
       [-1.5926, -1.1536,  0.4413,  0.3483],
       [-0.1798,  0.3299,  0.7827, -0.7585],
       [ 0.5857,  0.1619,  1.3583, -1.3865]])
In [149]: np.where(arr > 0, 2, -2)
Out[149]:
array([[ 2,  2,  2,  2],
       [-2, -2,  2,  2],
       [-2,  2,  2, -2],
       [ 2,  2,  2, -2]])
In [150]: np.where(arr > 0, 2, arr)  # set only the positive values to 2
Out[150]:
array([[ 2.    ,  2.    ,  2.    ,  2.    ],
       [-1.5926, -1.1536,  2.    ,  2.    ],
       [-0.1798,  2.    ,  2.    , -0.7585],
       [ 2.    ,  2.    ,  2.    , -1.3865]])
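The two ideas above (slices are views, and np.where selects element by element) can be checked with a short script. This is only a sketch: the setup of arr is my assumption about what the session ran before In [57], and values is made-up sample data.

```python
import numpy as np

# Assumed setup (not shown in the session above): ten integers, three set to 12.
arr = np.arange(10)
arr[5:8] = 12

# A basic slice is a view: writing through it changes arr as well.
arr_slice = arr[5:8]
arr_slice[1] = 12345
shares_memory = np.shares_memory(arr, arr_slice)

# An explicit copy is independent of the original array.
arr_copy = arr[5:8].copy()
arr_copy[0] = -1

# np.where(cond, x, y) takes from x where cond is True, otherwise from y.
cond = np.array([True, False, True, True, False])
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
result = np.where(cond, xarr, yarr)

# The second and third arguments may also be scalars.
values = np.array([0.5, -1.5, 2.0])  # hypothetical sample data
signs = np.where(values > 0, 2, -2)

print(arr)
print(result)
print(signs)
```

If you need a slice that does not write back to the original, call .copy() on it as shown.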
pandas---usage
In [1]: from pandas import Series, DataFrame
In [2]: import pandas as pd
Series
Creating from a list (the index must be the same length as the list)
In [8]: obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
Creating from a dict (the index may be a subset of the dict's keys)
In [20]: sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
In [21]: obj3 = Series(sdata, index=['Ohio', 'Texas'])
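As a small sketch of how the index argument interacts with a dict: keys that are listed are kept in the given order, and a label that is not a key of the dict gets NaN. The 'California' label is my own addition for illustration.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

# Only the listed keys are kept, in the order given.
subset = pd.Series(sdata, index=['Ohio', 'Texas'])

# 'California' (hypothetical label) is not in sdata, so its value is NaN
# and the Series becomes float64.
with_missing = pd.Series(sdata, index=['California', 'Ohio'])

print(subset)
print(with_missing)
```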
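Selection
In [11]: obj2['a']
Out[11]: -5
In [12]: obj2['d'] = 6
In [13]: obj2[['c', 'a', 'd']]  # pass a list of labels (note the double brackets)
In [14]: obj2['d':'a']          # a label slice includes its end point; labels run in index order d, b, a, c
In [15]: obj2[obj2 > 0]
In [16]: obj2 * 2
In [17]: np.exp(obj2)
In [18]: 'b' in obj2            # dict-like: membership tests the index
Out[18]: True
In [19]: 'e' in obj2
Out[19]: False
In [69]: for i in obj3:
   ....:     print(i)
   ....:
35000                           # iteration yields the values, not the index labels
71000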
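The difference between label slicing and positional slicing is easy to get wrong, so here is a minimal sketch of it. It uses .iloc for the positional case, which is my choice for illustration and does not appear in the session above.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Label slice: both end points are included (rows b and a).
by_label = obj2['b':'a']

# Positional slice: the end position is excluded (row b only).
by_position = obj2.iloc[1:2]

# "in" tests the index labels; use .values to test the data.
label_found = 'b' in obj2
value_found = 7 in obj2.values

# Iterating over a Series yields its values.
values = [v for v in obj2]

print(by_label)
print(by_position)
print(label_found, value_found, values)
```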
Changing the index
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']  # the index can only be replaced as a whole; a single label cannot be changed with obj.index[1] = 'b'

Changing the length and labels of the index (reindex):
In [81]: obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
In [82]: obj2
Out[82]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
In [83]: obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
Out[83]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
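The following sketch contrasts the three operations touched on above. The definition of obj is an assumption inferred from the printed output, and rename is shown as one way to change a single label; it is not used in the original session.

```python
import pandas as pd

# Assumed definition of obj, reconstructed from the output shown above.
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Index objects are immutable, so item assignment raises TypeError.
try:
    obj.index[1] = 'x'
    index_is_immutable = False
except TypeError:
    index_is_immutable = True

# reindex returns a new Series; labels that did not exist become NaN ...
reindexed = obj.reindex(['a', 'b', 'c', 'd', 'e'])
# ... or fill_value, when one is given.
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)

# To change one label, build a new Series with rename.
renamed = obj.rename({'b': 'x'})

print(reindexed)
print(filled)
print(renamed)
```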
DataFrame
Creating from a dict of lists (the index must be the same length as the lists; columns may name more or fewer columns than the dict has keys)
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
In [40]: frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
   ....:                    index=['one', 'two', 'three', 'four', 'five'])
In [41]: frame2
Out[41]:
       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN
Creating from a dict of dicts (the outer keys become the columns, the inner keys become the row index)
In [57]: pop = {'Nevada': {2001: 2.4, 2002: 2.9},
   ....:        'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
In [58]: frame3 = DataFrame(pop)
In [59]: frame3
Out[59]:
      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6
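A short sketch of the nested-dict constructor follows. The explicit sort_index() call is my addition, since the row order produced from the inner keys can differ between pandas versions; the year 2003 is a made-up label used to show what happens for a missing inner key.

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# Outer keys -> columns, inner keys -> row index. Sort the rows explicitly
# rather than relying on a version-specific default order.
frame3 = pd.DataFrame(pop).sort_index()

# An explicit index picks rows; 2003 (hypothetical) has no data, so it is NaN.
selected = pd.DataFrame(pop, index=[2001, 2002, 2003])

# Transposing swaps rows and columns.
transposed = frame3.T

print(frame3)
print(selected)
print(transposed)
```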
Selection:
Note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0. On newer versions use .loc (by label) or .iloc (by position) instead. The outputs below reflect the state of the author's session, in which some values of frame2 had already been modified.

Selecting columns:
In [43]: frame2['state']
In [44]: frame2.year
frame2[['state', 'year']]

Selecting rows by index:
In [45]: frame2.ix['three']
frame2[:2]
In [106]: frame2.ix[['one', 'two']]

Rows 'one' and 'two', first two columns:
In [105]: frame2.ix[['one', 'two'], :2]
Out[105]:
     year state
one  2000  Ohio
two  2001  Ohio

Rows 'one' and 'two', column 'pop':
In [107]: frame2.ix[['one', 'two'], 'pop']

Rows 'one' and 'two', columns at positions 1, 2, 3:
In [108]: frame2.ix[['one', 'two'], [1, 2, 3]]
In [109]: frame2.ix[['one', 'two'], ['state', 'pop', 'debt']]
Out[108/109]:
    state  pop debt
one  Ohio  1.5    0
two  Ohio  1.7    0
In [110]: frame2.ix['three', ['state', 'pop', 'debt']]

Rows 'one' through 'three' (label slice, end point included), columns at positions 1, 2, 3:
In [111]: frame2.ix[:'three', ['state', 'pop', 'debt']]
Out[111]:
      state  pop debt
one    Ohio  1.5    0
two    Ohio  1.7    0
three     0  0.0    0

Rows selected by a boolean condition, first two columns:
In [104]: frame2.ix[frame2['pop'] > 0, :2]
Out[104]:
      year   state
one   2000    Ohio
two   2001    Ohio
four  2001  Nevada

Chained selection:
frame3['Nevada'][:2]

Dropping entries with drop (returns a new object; frame3 itself is not changed):
In [88]: frame3.drop(2000)
Out[88]:
      Nevada  Ohio
2001     2.4   1.7
2002     2.9   3.6
In [89]: frame3.drop('Ohio', axis=1)
Out[89]:
      Nevada
2000     NaN
2001     2.4
2002     2.9

Selecting rows by condition:
In [94]: frame2[frame2['year'] > 2001]
Out[94]:
       year   state  pop debt
three  2002    Ohio  3.6    0
five   2002  Nevada  2.9    0
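Since .ix is no longer available in pandas 1.0 and later, here is a sketch of how the selections above could be written with .loc and .iloc. It rebuilds frame2 from scratch, so its values are the original ones rather than the modified ones in the session output; the pop > 2 threshold is my own choice for illustration.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])

# Rows by label.
rows = frame2.loc[['one', 'two']]

# Rows by label, first two columns (column labels taken by position).
first_two_cols = frame2.loc[['one', 'two'], frame2.columns[:2]]

# Purely positional: first two rows, columns at positions 1, 2, 3.
by_position = frame2.iloc[:2, [1, 2, 3]]

# Label slices include the end point 'three'.
label_slice = frame2.loc[:'three', ['state', 'pop', 'debt']]

# Boolean mask on rows, labels on columns.
masked = frame2.loc[frame2['pop'] > 2, ['year', 'state']]

# drop returns a new frame; frame2 keeps all five rows.
dropped = frame2.drop('one')

print(first_two_cols)
print(by_position)
print(label_slice)
print(masked)
```

The rule of thumb is that .loc takes labels and boolean masks, while .iloc takes integer positions, so a mixed selection like the .ix calls above has to be expressed in one of the two vocabularies.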
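Changing a column:
In [46]: frame2['debt'] = 16.5
In [48]: frame2['debt'] = range(5)

Changing some of the values in a column (the Series is aligned on the index; rows it does not mention become NaN):
In [50]: val = Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
In [51]: frame2['debt'] = val

Changing values by condition (every column of the matching rows is set to 0):
frame2[frame2['year'] > 2001] = 0

Adding a column:
In [46]: frame2['gyf'] = [1, 2, 3, 4, 5]

Deleting a column:
In [70]: del frame2['gyf']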
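The column assignments above can be sketched end to end as follows. Unlike the session, the conditional step here targets only the debt column through .loc, which is my substitution so that the string column state is not overwritten with 0.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                      'year': [2000, 2001, 2002, 2001, 2002],
                      'pop': [1.5, 1.7, 3.6, 2.4, 2.9]},
                     index=['one', 'two', 'three', 'four', 'five'])

# Assigning a Series aligns on the index: rows 'one' and 'three' get NaN.
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame['debt'] = val
aligned = frame['debt'].copy()

# Conditional assignment restricted to one column.
frame.loc[frame['year'] > 2001, 'debt'] = 0

# Add a column from a list of matching length, then delete it again.
frame['gyf'] = [1, 2, 3, 4, 5]
del frame['gyf']

print(aligned)
print(frame)
```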
Changing the index (reindex)
In [86]: frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'],
   ....:                   columns=['Ohio', 'Texas', 'California'])
In [87]: frame
Out[87]:
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8

Changing the row index:
In [88]: frame2 = frame.reindex(['a', 'b', 'c', 'd'])
In [89]: frame2
Out[89]:
   Ohio  Texas  California
a     0      1           2
b   NaN    NaN         NaN
c     3      4           5
d     6      7           8

Changing the columns:
In [90]: states = ['Texas', 'Utah', 'California']
In [91]: frame.reindex(columns=states)
Out[91]:
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8

Changing both at once, forward-filling the new row:
In [92]: frame.reindex(index=['a', 'b', 'c', 'd'], method='ffill',
   ....:               columns=states)
Out[92]:
   Texas  Utah  California
a      1   NaN           2
b      1   NaN           2
c      4   NaN           5
d      7   NaN           8
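Here is a sketch of the combined row and column reindex done in two steps. The split is my choice: method='ffill' needs a monotonic (sorted) axis to fill along, and the column labels here are not sorted, so passing method= together with columns= in a single call may fail on newer pandas versions.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# Step 1: add row 'b' and forward-fill it from row 'a'.
rows_filled = frame.reindex(['a', 'b', 'c', 'd'], method='ffill')

# Step 2: pick and reorder columns; 'Utah' did not exist, so it is all NaN.
states = ['Texas', 'Utah', 'California']
result = rows_filled.reindex(columns=states)

print(result)
```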
Transpose:
In [60]: frame3.T
Forward and backward filling (ffill, bfill)
In [84]: obj3 = Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
In [85]: obj3.reindex(range(6), method='ffill')
Out[85]:
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
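Only the forward fill is shown above, so here is a minimal sketch that puts ffill and bfill side by side on the same data. Note that with bfill the last new label has no later value to copy from and stays NaN.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# ffill: each new label takes the last value before it.
forward = obj3.reindex(range(6), method='ffill')

# bfill: each new label takes the next value after it; label 5 has none.
backward = obj3.reindex(range(6), method='bfill')

print(forward)
print(backward)
```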
For more, see the DA & ML series of articles by the blogger at http://my.oschina.net/lionets/blog