Python Learning ------ Notes on the pandas Module

Date: 2020-05-18


Introduction: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. (For details, see http://pandas.pydata.org/index.html)

The two primary data structures of pandas are Series and DataFrame.

Installation and import

pip3 install pandas    # install with pip3; see the official site for other installation methods

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#------ check the pandas version
print(pd.__version__)

Note: pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other third-party libraries.

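Components:

A set of labeled array data structures, primarily Series and DataFrame.
Index objects enabling both simple axis indexing and multi-level/hierarchical axis indexing.
An integrated group-by engine for aggregating and transforming data sets.
Date range generation (date_range) and custom date offsets, enabling custom frequencies.
Input/output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format.
Memory-efficient "sparse" versions of the standard data structures, for storing data that is mostly missing or mostly constant (some fixed value).
Moving window statistics (rolling mean, rolling standard deviation, etc.).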
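
As a rough illustration of three of the components listed above (date range generation, the group-by engine, and moving window statistics), here is a minimal sketch. The column names `group` and `value` and all the numbers are made up for the example; they do not come from the original notes.

```python
import pandas as pd

# Six consecutive days; date_range uses a daily frequency by default.
dates = pd.date_range("2017-01-01", periods=6)

# Hypothetical data: one label column and one numeric column.
df = pd.DataFrame(
    {
        "group": ["x", "y", "x", "y", "x", "y"],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    },
    index=dates,
)

# Group-by engine: sum 'value' within each group.
group_sums = df.groupby("group")["value"].sum()

# Moving window statistics: mean over a sliding window of 3 rows.
# The first two entries are NaN because the window is not yet full.
rolling_mean = df["value"].rolling(window=3).mean()

print(group_sums)
print(rolling_mean)
```
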
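Details ---------- Series

1. Series: a single column. A DataFrame contains one or more Series and a name for each Series.

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

2. Construction:

pd.Series(data, index=index, name=name)

Here, data can be many different things: a Python dict, an ndarray, or a scalar value (such as 5). index: the list of axis labels to use. name: an optional name for the Series; pandas assigns it automatically in many cases (for example, when a column is selected from a DataFrame). Read it with Series.name and change it with Series.rename.

Note: NaN (not a number) is the standard missing data marker used in pandas.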
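
The three kinds of input described above can be sketched as follows. The labels, values, and names (`demo`, `renamed_demo`) are arbitrary and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From an ndarray: the index must have the same length as the data.
from_array = pd.Series(np.array([0.5, 1.5, 2.5]), index=["x", "y", "z"], name="demo")

# From a scalar: the value is repeated for every index label.
from_scalar = pd.Series(5, index=["a", "b", "c"])

# Requesting a label the dict does not contain yields NaN,
# the missing-data marker.
with_missing = pd.Series({"a": 1, "b": 2}, index=["a", "b", "z"])

# rename returns a new Series; the original keeps its name.
renamed = from_array.rename("renamed_demo")

print(from_array.name, renamed.name)
print(with_missing)
```
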

3. Slicing and dict-style operations:

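# Python-style slicing and dict-style operations both work on a Series.
# s is assumed to be a 5-element Series of random floats labeled a to e, e.g.
# s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e']);
# the values shown below come from one such run.
s.iloc[0]    # positional access; plain s[0] is deprecated for a labeled index in recent pandas versions
0.469112299907186

s[:3]

a    0.4691
b   -0.2829
c   -1.5091

s['a']
0.46911229990718628

s['e'] = 12
print(s)

a     0.4691
b    -0.2829
c    -1.5091
d    -1.1356
e    12.0000

# Other operations
s + s
s * 2
np.exp(s)
s[1:] + s[:-1]    # operands are aligned by label; labels present on only one side give NaN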
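
The last operation is worth a closer look, because arithmetic between two Series matches values by label rather than by position. A small sketch with fixed, made-up values (not the random ones above) makes the result easy to check by hand:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# s.iloc[1:] carries labels b, c, d; s.iloc[:-1] carries labels a, b, c.
# Only b and c appear on both sides, so a and d come out as NaN.
total = s.iloc[1:] + s.iloc[:-1]
print(total)

# Dict-style membership test and lookup with a default value.
has_a = "a" in s
missing_value = s.get("z", np.nan)
print(has_a, missing_value)
```
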

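Details ---------- DataFrame

1. DataFrame: you can imagine it as a relational data table, with rows and named columns.

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet or SQL table, or as a dict of Series objects. It is generally the most commonly used pandas object.

2. Construction:

pd.DataFrame(data, index, columns):
data: the values, for example a two-dimensional array or a dict of Series.
index: the row labels.
columns: the column labels.
values: the underlying data as a two-dimensional array (an attribute of the resulting DataFrame, not a constructor argument).
Unlike Series, the DataFrame constructor takes no name argument.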

date = pd.date_range('20170101', periods=6)
date
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06'],
              dtype='datetime64[ns]', freq='D')

#--------- using the numpy module
df = pd.DataFrame(np.random.randn(6, 4), index=date, columns=['a', 'b', 'c', 'd'])
df
                 a         b         c         d
2017-01-01  -1.993447  1.272175 -1.578337 -1.972526
2017-01-02   0.092701 -0.503654 -0.540655 -0.126386
2017-01-03   0.191769 -0.578872 -1.693449  0.457891
2017-01-04   2.121120  0.521884 -0.419368 -1.916585
2017-01-05   1.642063  0.222134  0.108531 -1.858906
2017-01-06   0.636639  0.487491  0.617841 -1.597920
 
#------ from a dict

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
 
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
 
pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
 
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN

But most of the time, you load an entire file into a DataFrame. The following example loads a file with California housing data. Run the following cell to load the data and create feature definitions:

california_housing_dataframe = pd.read_csv("https://storage.googleapis.com/mledu-datasets/california_housing_train.csv", sep=",")
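
To try read_csv without a network connection, the same call can be pointed at an in-memory text buffer. The sketch below is only an illustration: the three rows are made-up values, and apart from housing_median_age (mentioned later in these notes) the column names are assumptions about what such a file might contain.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the downloaded file.
csv_text = """housing_median_age,total_rooms,population
10.0,100.0,50.0
20.0,200.0,60.0
30.0,300.0,70.0
"""

# read_csv accepts any file-like object, not just a path or URL.
sample_df = pd.read_csv(io.StringIO(csv_text), sep=",")

print(sample_df.shape)
print(sample_df.dtypes)
```
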

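3. Common attributes and methods (think of a DataFrame as a dict whose keys are the column names and whose values are the corresponding Series)

DataFrame.dtypes: the data type of each column

DataFrame.index: the row labels

DataFrame.columns: the column labels

DataFrame.values: the underlying data as a NumPy array

DataFrame.iloc[loc]: select a row by integer location

DataFrame.loc[label]: select a row by label

Slicing and indexing work much as they do for a Series

Adding, deleting, modifying, and looking up columns works much like a Python dict object (insert(), pop(), ...)

DataFrame.head(n): return the first n rows; n defaults to 5

DataFrame.tail(n): return the last n rows

california_housing_dataframe.hist('housing_median_age'): plot a histogram of one column

DataFrame.assign(new_column=expr): return a new DataFrame with the extra column added

DataFrame.T (transpose): transpose the data

DataFrame.describe(): summary statistics of the data

DataFrame.sort_index(axis=0|1, ascending=True|False): sort by labels; axis=0 sorts by the row labels, axis=1 by the column labels

DataFrame.idxmin(axis=0|1): return the index of the minimum value; with axis=0, the row label of the minimum in each column; with axis=1, the column label of the minimum in each row

DataFrame.idxmax(axis=0|1): return the index of the maximum value, in the same way

Series.value_counts(): return the count of each distinct value

DataFrame.mode(): return the most frequent value(s) in a DataFrame or Series

DataFrame.apply(function, axis): apply function to each column (axis=0) or each row (axis=1); the shape of the result depends on what the function returns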
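
Several of the attributes and methods above can be tried together on a tiny DataFrame. This is a minimal sketch with made-up values; the column names `one` and `two` and the new column `total` are chosen only for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [3.0, 1.0, 2.0], "two": [10.0, 30.0, 20.0]},
    index=["c", "a", "b"],
)

print(df.dtypes)          # data type of each column
print(list(df.index))     # row labels
print(list(df.columns))   # column labels

row_by_position = df.iloc[0]   # first row, whatever its label is
row_by_label = df.loc["a"]     # the row labeled 'a'

# assign returns a new DataFrame; df itself is left unchanged.
with_total = df.assign(total=df["one"] + df["two"])

# Sort the rows by their labels in ascending order.
sorted_rows = df.sort_index(axis=0, ascending=True)

# For each column, the row label holding the minimum / maximum.
min_labels = df.idxmin(axis=0)
max_labels = df.idxmax(axis=0)

# With axis=0 the function is called once per column.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

# Count of each distinct value in a Series.
counts = pd.Series(["x", "y", "x"]).value_counts()

print(with_total)
print(sorted_rows)
print(min_labels, max_labels, col_range, counts, sep="\n")
```

Reading idxmin and idxmax as "where is the extreme value" rather than "what is the extreme value" helps keep them apart from min() and max().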

The content above was compiled from reading other authors' work; many thanks to them!

Original: https://blog.csdn.net/qq_38420451/article/details/81357158
