3-8 Pivot operations

 

Pivot tables

In [1]:
import pandas as pd
excelample=pd.DataFrame({'Month':["January","January","January","January",
                                "February", "February","February","February",
                                 "March","March","March","March"],
                        'Category':["Transportation","Grocery","Household","Entertainment",
                                   "Transportation","Grocery","Household","Entertainment",
                                   "Transportation","Grocery","Household","Entertainment"],
                          'Amount':[74.,235.,175.,100.,115.,240.,225.,125.,390.,260.,200.,120.]})
In [2]:
excelample
Out[2]:
 
       Month        Category  Amount
0    January  Transportation    74.0
1    January         Grocery   235.0
2    January       Household   175.0
3    January   Entertainment   100.0
4   February  Transportation   115.0
5   February         Grocery   240.0
6   February       Household   225.0
7   February   Entertainment   125.0
8      March  Transportation   390.0
9      March         Grocery   260.0
10     March       Household   200.0
11     March   Entertainment   120.0
 

1. Summary statistic: spending per category for each month, using pivot

In [3]:
example_pivot=excelample.pivot(index='Category',columns='Month',values='Amount')
example_pivot
Out[3]:
 
Month           February  January  March
Category
Entertainment      125.0    100.0  120.0
Grocery            240.0    235.0  260.0
Household          225.0    175.0  200.0
Transportation     115.0     74.0  390.0
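pivot only reshapes the data, so it works above because every (Category, Month) pair occurs exactly once. The following is a minimal sketch of what happens otherwise, using a small made-up `spending` frame (not the data above) in which Grocery appears twice in January: pivot raises a ValueError, while pivot_table aggregates the duplicates.

```python
import pandas as pd

# Hypothetical data: two separate Grocery entries in January.
spending = pd.DataFrame({
    "Month": ["January", "January", "January", "February"],
    "Category": ["Grocery", "Grocery", "Household", "Grocery"],
    "Amount": [100.0, 35.0, 175.0, 240.0],
})

# pivot cannot decide which of the duplicate values to keep, so it raises.
try:
    spending.pivot(index="Category", columns="Month", values="Amount")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table combines the duplicates with an aggregation function instead.
totals = spending.pivot_table(index="Category", columns="Month",
                              values="Amount", aggfunc="sum")

print(pivot_failed)
print(totals)
```

Combinations that never occur in the data, such as Household in February here, are left as NaN in the result.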
In [4]:
example_pivot.sum(axis=1)  # total spending for each category
Out[4]:
Category
Entertainment     345.0
Grocery           735.0
Household         600.0
Transportation    579.0
dtype: float64
In [5]:
example_pivot.sum(axis=0)  # total spending for each month
Out[5]:
Month
February    705.0
January     584.0
March       970.0
dtype: float64
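Row and column totals like the two sums above can also be attached to the table itself. As a sketch, assuming a small made-up `spending` frame and a label "Total" chosen only for illustration, pivot_table with `margins=True` appends a totals row and column:

```python
import pandas as pd

# Hypothetical data in the same shape as the example above.
spending = pd.DataFrame({
    "Month": ["January", "January", "February", "February"],
    "Category": ["Grocery", "Household", "Grocery", "Household"],
    "Amount": [235.0, 175.0, 240.0, 225.0],
})

# margins=True adds a totals row and column computed with the same aggfunc.
with_totals = spending.pivot_table(index="Category", columns="Month",
                                   values="Amount", aggfunc="sum",
                                   margins=True, margins_name="Total")
print(with_totals)
```

The margins are computed with the same `aggfunc` as the cells, so with the default mean they would be averages rather than sums.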
In [6]:
df=pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
df.head()  # show the first few rows
Out[6]:
 
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000   C123         S
4            5         0       3                           Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500    NaN         S
 

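2. Index by sex, split the columns by passenger class, and summarize the fare paid by each sex in each class: pivot_table (aggregates with the mean by default)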

In [8]:
df.pivot_table(index='Sex',columns='Pclass',values='Fare')  # aggregates with the mean by default
Out[8]:
 
Pclass           1          2          3
Sex
female  106.125798  21.970121  16.118810
male     67.226127  19.741782  12.661633
In [9]:
df.pivot_table(index='Sex',columns='Pclass',values='Fare',aggfunc='max')  # maximum fare
Out[9]:
 
Pclass         1     2      3
Sex
female  512.3292  65.0  69.55
male    512.3292  73.5  69.55
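Instead of calling pivot_table once per statistic, `aggfunc` also accepts a list. The sketch below uses a small made-up `passengers` frame (not the Titanic file) and computes the mean and the maximum fare in one call; the result has two column levels, the function name and the class.

```python
import pandas as pd

# Hypothetical data with the same column names as the Titanic frame.
passengers = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Pclass": [1, 3, 1, 3, 3],
    "Fare": [80.0, 8.0, 50.0, 7.0, 9.0],
})

# A list of functions yields one block of columns per function.
summary = passengers.pivot_table(index="Sex", columns="Pclass",
                                 values="Fare", aggfunc=["mean", "max"])

print(summary)
print(summary["max"])  # select the block for a single statistic
```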
In [12]:
df.pivot_table(index='Sex',columns='Pclass',values='Fare',aggfunc='count')  # number of non-missing fares
Out[12]:
 
Pclass    1    2    3
Sex
female   94   76  144
male    122  108  347
In [13]:
pd.crosstab(index=df['Sex'],columns=df['Pclass'])  # same counts as pivot_table with aggfunc='count' here, since no Fare value is missing
Out[13]:
 
Pclass    1    2    3
Sex
female   94   76  144
male    122  108  347
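The two results agree only when the counted column has no missing values: pivot_table with `aggfunc='count'` counts non-missing entries of `values`, while pd.crosstab counts rows. A minimal sketch with a made-up `passengers` frame that has one missing fare:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one of the two first-class female fares is missing.
passengers = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Pclass": [1, 1, 3, 3],
    "Fare": [80.0, np.nan, 7.0, 9.0],
})

# Counts non-missing Fare values per group.
fare_counts = passengers.pivot_table(index="Sex", columns="Pclass",
                                     values="Fare", aggfunc="count")

# Counts rows per group, regardless of Fare.
row_counts = pd.crosstab(index=passengers["Sex"], columns=passengers["Pclass"])

print(fare_counts)
print(row_counts)
```

crosstab also fills combinations that never occur with 0, whereas pivot_table leaves them as NaN unless `fill_value` is given.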
 

3. Survival rate by passenger class and sex

In [14]:
df.pivot_table(index='Pclass',columns='Sex',values='Survived',aggfunc='mean')  # the mean of the 0/1 Survived column is the survival rate
Out[14]:
 
Sex       female      male
Pclass
1       0.968085  0.368852
2       0.921053  0.157407
3       0.500000  0.135447
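The same table can be built by grouping and then unstacking, which may make clearer what pivot_table does internally. This is a sketch on a small made-up `passengers` frame, not the Titanic file:

```python
import pandas as pd

# Hypothetical data with the same column names as the Titanic frame.
passengers = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Sex": ["female", "male", "male", "female", "female", "male"],
    "Survived": [1, 0, 1, 1, 0, 0],
})

by_pivot = passengers.pivot_table(index="Pclass", columns="Sex",
                                  values="Survived", aggfunc="mean")

# Group by both keys, average Survived, then move Sex into the columns.
by_groupby = passengers.groupby(["Pclass", "Sex"])["Survived"].mean().unstack()

print(by_pivot)
print(by_groupby)
```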
 

4. Add a new column and compare the survival rate of minors and adults for each sex

In [15]:
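df['Underaged']=df['Age']<=18  # add a new column: True when Age <= 18 (rows with a missing Age compare as False)
df.pivot_table(index='Underaged',columns='Sex',values='Survived',aggfunc='mean')  # survival rate for each group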
Out[15]:
 
Sex          female      male
Underaged
False      0.760163  0.167984
True       0.676471  0.338028
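Because a comparison against NaN evaluates to False, passengers with an unknown age end up in the False row above. One possible way to keep them out, sketched on a small made-up `passengers` frame, is to drop rows with a missing Age before building the flag:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two passengers have no recorded age.
passengers = pd.DataFrame({
    "Age": [10.0, 30.0, np.nan, 17.0, np.nan, 40.0],
    "Sex": ["female", "female", "female", "male", "male", "male"],
    "Survived": [1, 1, 0, 1, 0, 0],
})

# A plain comparison silently treats missing ages as "not underaged".
naive = passengers["Age"] <= 18

# Keep only rows with a known age, then build the flag and the table.
known = passengers.dropna(subset=["Age"]).copy()
known["Underaged"] = known["Age"] <= 18
rates = known.pivot_table(index="Underaged", columns="Sex",
                          values="Survived", aggfunc="mean")

print(naive.tolist())
print(rates)
```

Whether unknown ages should be dropped, imputed, or reported as a separate group depends on the analysis; dropping is only the simplest option.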