3-1 Pandas-概述

 Pandas章節應用的數據能夠在如下連接下載:javascript

https://files.cnblogs.com/files/AI-robort/Titanic_Data-master.zipcss

 

           Pandas:數據分析處理庫

In [1]:
import pandas as pd
In [4]:
df=pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
 

.head():能夠讀取前幾條數據,或指定前幾條均可以html

In [5]:
df.head(6)
Out[5]:
 
  PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
5 6 0 3 Moran, Mr. James male NaN 0 0 330877 8.4583 NaN Q
 

.info():返回當前的信息html5

In [6]:
df.info()
 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
 

查看錶格的各項屬性和細節

In [7]:
df.index#索引值的屬性
Out[7]:
RangeIndex(start=0, stop=891, step=1)
In [8]:
df.columns#每一列的名字
Out[8]:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
In [9]:
df.dtypes#每一列的值的類型
Out[9]:
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
In [10]:
df.values#每行的值
Out[10]:
array([[1, 0, 3, ..., 7.25, nan, 'S'],
       [2, 1, 1, ..., 71.2833, 'C85', 'C'],
       [3, 1, 3, ..., 7.925, nan, 'S'],
       ...,
       [889, 0, 3, ..., 23.45, nan, 'S'],
       [890, 1, 1, ..., 30.0, 'C148', 'C'],
       [891, 0, 3, ..., 7.75, nan, 'Q']], dtype=object)
 

本身建立data_frame數據java

In [11]:
data={'country':['aaa','bbb','ccc'],'population':[10,12,14]}
df_data=pd.DataFrame(data)
df_data
Out[11]:
 
  country population
0 aaa 10
1 bbb 12
2 ccc 14
In [12]:
df_data.info()
 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
country       3 non-null object
population    3 non-null int64
dtypes: int64(1), object(1)
memory usage: 128.0+ bytes
In [15]:
age=df['Age']#搜索對應的一列
age[:5]#顯示前5行數據
Out[15]:
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
 

series:dataframe中的一行/列python

In [16]:
age.index
Out[16]:
RangeIndex(start=0, stop=891, step=1)
In [17]:
age.values[:5]
Out[17]:
array([22., 38., 26., 35., 35.])
In [18]:
df.head()
Out[18]:
 
  PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
In [19]:
df['Age'][:5]
Out[19]:
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
 

改變索引對象jquery

In [20]:
df=df.set_index('Name')
df.head()
Out[20]:
 
  PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked
Name                      
Braund, Mr. Owen Harris 1 0 3 male 22.0 1 0 A/5 21171 7.2500 NaN S
Cumings, Mrs. John Bradley (Florence Briggs Thayer) 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C
Heikkinen, Miss. Laina 3 1 3 female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
Futrelle, Mrs. Jacques Heath (Lily May Peel) 4 1 1 female 35.0 1 0 113803 53.1000 C123 S
Allen, Mr. William Henry 5 0 3 male 35.0 0 0 373450 8.0500 NaN S
In [21]:
df['Age'][:5]
Out[21]:
Name
Braund, Mr. Owen Harris                                22.0
Cumings, Mrs. John Bradley (Florence Briggs Thayer)    38.0
Heikkinen, Miss. Laina                                 26.0
Futrelle, Mrs. Jacques Heath (Lily May Peel)           35.0
Allen, Mr. William Henry                               35.0
Name: Age, dtype: float64
In [25]:
age=df['Age']
age[:5]
Out[25]:
Name
Braund, Mr. Owen Harris                                22.0
Cumings, Mrs. John Bradley (Florence Briggs Thayer)    38.0
Heikkinen, Miss. Laina                                 26.0
Futrelle, Mrs. Jacques Heath (Lily May Peel)           35.0
Allen, Mr. William Henry                               35.0
Name: Age, dtype: float64
In [26]:
age['Allen, Mr. William Henry']#索引名字對應的值
Out[26]:
35.0
In [27]:
age=age+10
age[:5]
Out[27]:
Name
Braund, Mr. Owen Harris                                32.0
Cumings, Mrs. John Bradley (Florence Briggs Thayer)    48.0
Heikkinen, Miss. Laina                                 36.0
Futrelle, Mrs. Jacques Heath (Lily May Peel)           45.0
Allen, Mr. William Henry                               45.0
Name: Age, dtype: float64
 

對值統計指標linux

In [28]:
age.mean()
Out[28]:
39.69911764705882
In [29]:
age.max()
Out[29]:
90.0
In [30]:
age.min()
Out[30]:
10.42
In [31]:
df.describe()####總體一次性統計各項的指標基本統計特性
Out[31]:
 
  PassengerId Survived Pclass Age SibSp Parch Fare
count 891.000000 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000
mean 446.000000 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208
std 257.353842 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429
min 1.000000 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000
25% 223.500000 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400
50% 446.000000 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200
75% 668.500000 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000
max 891.000000 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200
相關文章
相關標籤/搜索