Data Analysis: The Road to Big Data (5): pandas Reports

Date: 2019-12-11

Tags: data analysis, pandas, reports

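pandas holds data in memory as objects, and each object comes with a set of methods based on its attributes. pandas can be thought of as a tool for storing one-dimensional, two-dimensional, and three-dimensional tables.

In practice, the two-dimensional table is the one used most.

One-dimensional table: Series

Two-dimensional table: DataFrame, also called a report

Three-dimensional table: Panel (deprecated and removed in later pandas releases)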
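As a minimal sketch of the first two table types listed above: the names `ages` and `table` are illustrative assumptions, not from the original post. It shows that a Series is one-dimensional, a DataFrame is two-dimensional, and that selecting a single column of a DataFrame gives back a Series.

```python
import pandas as pd

# One-dimensional: a Series is a labelled array of values.
ages = pd.Series([2.5, 3.0, 0.5], index=['a', 'b', 'c'], name='age')

# Two-dimensional: a DataFrame is a table whose columns share one row index.
table = pd.DataFrame({'age': ages, 'animal': ['cat', 'cat', 'snake']})

print(ages.ndim)            # number of dimensions of the Series
print(table.ndim)           # number of dimensions of the DataFrame
print(type(table['age']))   # a single column is itself a Series
```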

Text formats:

CSV: stored as plain text. Items are separated by commas and records are separated by line breaks. A CSV file can be opened in Excel.

JSON: stores data as key/value pairs.
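The two text formats can be sketched with a small round trip. The frame `df_small` and the use of `io.StringIO` in place of real files are assumptions made so the example is self-contained. `orient='records'` is one of several JSON layouts pandas supports and is chosen here because it writes one key/value object per row.

```python
import io
import json
import pandas as pd

df_small = pd.DataFrame({'animal': ['cat', 'dog'], 'visits': [1, 3]})

# CSV: one record per line, fields separated by commas.
csv_text = df_small.to_csv(index=False)
print(csv_text)

# Reading the same text back gives an equivalent table.
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: key/value pairs; orient='records' writes one object per row.
json_text = df_small.to_json(orient='records')
records = json.loads(json_text)
print(records)

from_json = pd.read_json(io.StringIO(json_text), orient='records')
```

With real files, the same calls would take a path instead of a `StringIO` buffer.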

import numpy as np
import pandas as pd

# The keys of data act as the column headers.
data = {
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}

# Give each record a label (used as the row index).
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)
# Column order in the printout depends on the pandas/Python version:
# older versions sort the dict keys alphabetically, as in the output below.
print(df)

   age animal priority  visits
a  2.5    cat      yes       1
b  3.0    cat      yes       3
c  0.5  snake       no       2
d  NaN    dog      yes       3
e  5.0    dog       no       2
f  2.0    cat       no       3
g  4.5  snake       no       1
h  NaN    cat      yes       1
i  7.0    dog       no       2
j  3.0    dog       no       1

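df.head(): outputs the first 5 records by default

df[1:5]: rows can also be selected by slicing (row index)

df[['age', 'animal']]: selects columns by name (column index)

df.iloc[0:3, 0:3]: outputs the specified rows and columns by position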

   age animal priority
a  2.5    cat      yes
b  3.0    cat      yes
c  0.5  snake       no
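The selection styles above can be compared side by side in a short sketch. The frame `df_demo` is a shortened stand-in for the post's data, and the `.loc` line is an addition not covered in the original, included only to contrast label-based with position-based selection.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {
        'age': [2.5, 3.0, 0.5, np.nan, 5.0, 2.0],
        'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat'],
        'visits': [1, 3, 2, 3, 2, 3],
    },
    index=list('abcdef'),
)

first_rows = df_demo.head()                # first 5 rows by default
middle_rows = df_demo[1:5]                 # positional row slice: rows b to e
two_columns = df_demo[['age', 'animal']]   # list of column names
corner = df_demo.iloc[0:3, 0:2]            # rows 0-2 and columns 0-1, by position

# Label-based equivalent; unlike iloc, the end label 'c' is included.
by_label = df_demo.loc['a':'c', ['age', 'animal']]

print(corner)
```

Here `corner` and `by_label` select the same cells: `iloc` excludes the end position, while `loc` includes the end label.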

Handling missing and abnormal data
Find missing values
df[df['age'].isnull()]

Fill missing values
df['age'] = df['age'].fillna(0)

Replace string values with booleans
df['priority'] = df['priority'].map({'yes': True, 'no': False})
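The three cleaning steps can be put together in one runnable sketch. The frame `df_demo` and the variable names are assumptions for illustration. The fill value 0 follows the post, although another value such as the column mean may suit real data better.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {
        'age': [2.5, np.nan, 0.5, np.nan],
        'priority': ['yes', 'no', 'no', 'yes'],
    },
    index=list('abcd'),
)

# Boolean mask: True where 'age' is missing.
missing_mask = df_demo['age'].isnull()
rows_missing_age = df_demo[missing_mask]
print(rows_missing_age)

# Work on a copy so the original stays available for comparison.
cleaned = df_demo.copy()
cleaned['age'] = cleaned['age'].fillna(0)
cleaned['priority'] = cleaned['priority'].map({'yes': True, 'no': False})
print(cleaned)
```

Any value not present in the dictionary passed to `map` becomes NaN, so it is worth checking the column for unexpected spellings first.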

