pandas Exercises (2): Filtering and Sorting Data

Filtering and Sorting Data: Exploring the Euro 2012 Dataset

The related data is available on GitHub.

Step 1 - Import the pandas library

import pandas as pd

Step 2 - Set the path to the dataset

path2 = "./data/Euro2012.csv"  # relative path to Euro2012.csv

Step 3 - Load the dataset into a DataFrame named euro12

euro12 = pd.read_csv(path2)
euro12.tail()  # show the last five rows

Output:
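As a minimal sketch of what read_csv and tail do, the example below parses a tiny CSV held in memory instead of the real file. The team names and numbers in csv_text are made up for illustration and are not taken from Euro2012.csv.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for Euro2012.csv; all values are made up.
csv_text = """Team,Goals,Yellow Cards,Red Cards
Alpha,4,9,0
Beta,12,11,0
Gamma,5,7,1
"""

# read_csv accepts any file-like object, so StringIO works like a file path.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))  # first two rows
print(sample.tail(2))  # last two rows
```

Calling tail() without an argument returns the last five rows, which is what Step 3 displays for the full dataset.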

Step 4 - Select the Goals column

euro12.Goals  # equivalent to euro12['Goals']

Output:
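The two ways of selecting a column can be compared on a small frame. This is only a sketch with made-up values; it also shows that a column whose name contains a space, such as Yellow Cards, can only be reached with bracket syntax.

```python
import pandas as pd

# Toy data with made-up values, standing in for the real dataset.
sample = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma"],
    "Goals": [4, 12, 5],
    "Yellow Cards": [9, 11, 7],
})

goals_attr = sample.Goals      # attribute access
goals_key = sample["Goals"]    # bracket access
same = goals_attr.equals(goals_key)
print(same)

# Attribute access cannot express a name with a space; use brackets instead.
yellow = sample["Yellow Cards"]
print(yellow)
```

Bracket access is usually the safer default, since attribute access also fails when a column name collides with an existing DataFrame attribute or method.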

Step 5 - How many teams took part in Euro 2012?

euro12.shape[0]  # number of rows, one row per team

Output:

16
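Counting rows only answers the question if every team appears exactly once. A hedged sketch on made-up data shows three equivalent ways to count, including nunique, which would still be correct if a team were listed twice.

```python
import pandas as pd

# Toy data with made-up values.
sample = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma"],
    "Goals": [4, 12, 5],
})

n_rows = sample.shape[0]              # first element of (rows, columns)
n_rows_len = len(sample)              # len() of a DataFrame is its row count
n_teams = sample["Team"].nunique()    # number of distinct team names

print(n_rows, n_rows_len, n_teams)
```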

Step 6 - How many columns are in the dataset?

euro12.info()  # the summary reports the total number of columns

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 35 columns):
Team                          16 non-null object
Goals                         16 non-null int64
Shots on target               16 non-null int64
Shots off target              16 non-null int64
Shooting Accuracy             16 non-null object
% Goals-to-shots              16 non-null object
Total shots (inc. Blocked)    16 non-null int64
Hit Woodwork                  16 non-null int64
Penalty goals                 16 non-null int64
Penalties not scored          16 non-null int64
Headed goals                  16 non-null int64
Passes                        16 non-null int64
Passes completed              16 non-null int64
Passing Accuracy              16 non-null object
Touches                       16 non-null int64
Crosses                       16 non-null int64
Dribbles                      16 non-null int64
Corners Taken                 16 non-null int64
Tackles                       16 non-null int64
Clearances                    16 non-null int64
Interceptions                 16 non-null int64
Clearances off line           15 non-null float64
Clean Sheets                  16 non-null int64
Blocks                        16 non-null int64
Goals conceded                16 non-null int64
Saves made                    16 non-null int64
Saves-to-shots ratio          16 non-null object
Fouls Won                     16 non-null int64
Fouls Conceded                16 non-null int64
Offsides                      16 non-null int64
Yellow Cards                  16 non-null int64
Red Cards                     16 non-null int64
Subs on                       16 non-null int64
Subs off                      16 non-null int64
Players Used                  16 non-null int64
dtypes: float64(1), int64(29), object(5)
memory usage: 4.5+ KB
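The info() summary above reports 35 columns. If only the number is needed, it can be read directly from the shape. The sketch below uses a made-up three-column frame, so its count is 3 rather than 35.

```python
import pandas as pd

# Toy data with made-up values.
sample = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma"],
    "Goals": [4, 12, 5],
    "Shooting Accuracy": ["50.0%", "45.5%", "40.5%"],
})

n_cols = sample.shape[1]           # second element of (rows, columns)
n_cols_len = len(sample.columns)   # same count via the column index

# Number of columns per dtype, similar to the "dtypes:" line printed by info().
dtype_counts = sample.dtypes.value_counts()

print(n_cols, n_cols_len)
print(dtype_counts)
```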

Step 7 - Store the columns Team, Yellow Cards and Red Cards in a separate DataFrame named discipline

discipline = euro12[['Team', 'Yellow Cards', 'Red Cards']]
discipline

Output:
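Passing a list of column names returns a new DataFrame with just those columns. If the subset will be modified later, adding .copy() makes its independence from the original explicit, which can also avoid a SettingWithCopyWarning in some pandas versions. The data and the Total Cards column below are hypothetical.

```python
import pandas as pd

# Toy data with made-up values.
sample = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma"],
    "Goals": [4, 12, 5],
    "Yellow Cards": [9, 11, 7],
    "Red Cards": [0, 0, 1],
})

# Note the double brackets: the inner list names the columns to keep.
discipline = sample[["Team", "Yellow Cards", "Red Cards"]].copy()

# "Total Cards" is a hypothetical extra column added only to the subset.
discipline["Total Cards"] = discipline["Yellow Cards"] + discipline["Red Cards"]

print(discipline)
print(list(sample.columns))  # the original frame is unchanged
```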

 

Step 8 - Sort the discipline DataFrame by Red Cards first, then by Yellow Cards

discipline.sort_values(['Red Cards', 'Yellow Cards'], ascending=False)

Output:
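With several sort keys, rows are ordered by the first key and ties are broken by the next one. The sketch below, on made-up data, also shows that ascending accepts a list when each key needs its own direction.

```python
import pandas as pd

# Toy data with made-up values.
sample = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma", "Delta"],
    "Yellow Cards": [9, 7, 11, 4],
    "Red Cards": [0, 1, 0, 1],
})

# Both keys descending: most red cards first, ties broken by most yellow cards.
both_desc = sample.sort_values(["Red Cards", "Yellow Cards"], ascending=False)

# One direction per key: red cards descending, yellow cards ascending.
mixed = sample.sort_values(["Red Cards", "Yellow Cards"], ascending=[False, True])

print(both_desc)
print(mixed)
```

sort_values returns a new DataFrame and leaves the original order untouched unless the result is assigned back.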

 

Step 9 - Compute the mean number of yellow cards per team

round(discipline['Yellow Cards'].mean())  # mean rounded to the nearest integer

Output:

7.0
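The rounding step hides the exact mean, so it can help to look at both values. This is a sketch on made-up numbers; wrapping the result in int() is an assumption added here to make the result type explicit, since the type returned by round can differ between library versions.

```python
import pandas as pd

# Toy data with made-up values.
yellow = pd.Series([9, 11, 7, 4, 6], name="Yellow Cards")

mean_yellow = yellow.mean()            # exact mean: 37 / 5
rounded = int(round(mean_yellow))      # nearest whole number as a plain int

print(mean_yellow)
print(rounded)
```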

Step 10 - Find the teams that scored more than 6 goals

euro12[euro12.Goals > 6]

Output:
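The filter works in two stages: the comparison builds a boolean Series, and indexing with that Series keeps only the rows marked True. A small sketch with made-up data makes the intermediate mask visible.

```python
import pandas as pd

# Toy data with made-up values.
sample = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma", "Delta"],
    "Goals": [4, 12, 5, 10],
})

mask = sample["Goals"] > 6    # one True/False value per row
top_scorers = sample[mask]    # keep only the rows where the mask is True

print(mask)
print(top_scorers)
```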

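Step 11 - Select the teams whose name starts with the letter G or ends with the letter e

# Combine the two conditions with | (element-wise OR)
euro12[euro12.Team.str.startswith('G') | euro12.Team.str.endswith('e')]

Output: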
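An "or" between two string conditions can be expressed by building each boolean mask separately and joining them with the | operator. The sketch below uses four team names chosen only to cover each case (start match, end match, both, neither).

```python
import pandas as pd

# Four names chosen to cover each case; no statistics are attached.
sample = pd.DataFrame({"Team": ["Germany", "France", "Italy", "Greece"]})

starts_with_g = sample["Team"].str.startswith("G")
ends_with_e = sample["Team"].str.endswith("e")

# | is element-wise OR; Python's `or` keyword does not work on Series.
selected = sample[starts_with_g | ends_with_e]

print(selected)
```

When the conditions are written inline rather than stored in variables, each comparison should be wrapped in parentheses, because | binds more tightly than comparison operators.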

Step 12 - Select the first 7 columns

euro12.iloc[:, 0:7]

Output:

Step 13 - Select all columns except the last 3

euro12.iloc[:, :-3]

Output:
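iloc selects by integer position and follows Python slice rules: the end of a slice is excluded, and a negative end counts from the right. The sketch below uses a made-up frame with six placeholder columns named c0 to c5.

```python
import pandas as pd

# Placeholder frame: six columns c0..c5, two rows of made-up numbers.
sample = pd.DataFrame(
    [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]],
    columns=["c0", "c1", "c2", "c3", "c4", "c5"],
)

first_four = sample.iloc[:, 0:4]        # positions 0, 1, 2, 3 (4 is excluded)
all_but_last_three = sample.iloc[:, :-3]  # everything before the last 3 columns

print(list(first_four.columns))
print(list(all_but_last_three.columns))
```

The leading colon selects all rows, so only the column axis is restricted.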

 

Step 14 - Find the Shooting Accuracy of England, Italy and Russia

euro12.loc[euro12.Team.isin(['England', 'Italy', 'Russia']), ['Team', 'Shooting Accuracy']]

Output:
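loc takes a row selector and a column selector in one call: here isin builds the row mask and a list names the columns. The info() summary shows Shooting Accuracy stored as object, so the sketch below assumes percentage strings and adds an optional conversion to numbers. The team names and values are made up.

```python
import pandas as pd

# Toy data with made-up values; accuracy is stored as text, as in the dataset.
sample = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma"],
    "Goals": [4, 12, 5],
    "Shooting Accuracy": ["50.0%", "45.5%", "40.5%"],
})

wanted = ["Alpha", "Gamma"]

# Rows: teams in the wanted list. Columns: only the two named ones.
subset = sample.loc[sample["Team"].isin(wanted), ["Team", "Shooting Accuracy"]]

# Optional: strip the percent sign to get numeric values for comparisons.
accuracy_pct = subset["Shooting Accuracy"].str.rstrip("%").astype(float)

print(subset)
print(accuracy_pct)
```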

 

References:

1. http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook

2. https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/

3. https://github.com/guipsamora/pandas_exercises
