1. Creating a Series
Parameters
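- Series: a one-dimensional labeled array that can hold data of any type (integers, strings, floats, Python objects, and so on). The axis labels are collectively called the index.
- data: the values to store (for example a list, a dict, or a scalar).
- index: the axis labels. Values must be hashable and the same length as data. Duplicate labels are accepted, although unique labels keep lookups unambiguous. If no index is passed, the positions 0 to n-1 are used (equivalent to np.arange(n)).
- dtype: the data type of the output. If omitted, it is inferred from the data.
- copy: whether to copy the input data. Defaults to False.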
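The defaults described in the parameter list can be checked with a short sketch. The variable names (`default_idx`, `inferred`, `forced`) are illustrative only and do not come from the original text.

```python
import pandas as pd

# No index passed: pandas assigns the default positions 0..n-1.
default_idx = pd.Series([10, 20, 30])
print(list(default_idx.index))   # [0, 1, 2]

# No dtype passed: it is inferred from the data (int mixed with float -> float64).
inferred = pd.Series([1, 2.5, 3])
print(inferred.dtype)            # float64

# dtype passed explicitly: the values are converted to it.
forced = pd.Series([1, 2, 3], dtype="float64")
print(forced.dtype)              # float64
```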
Creating from a list
    import pandas as pd

    data = ['a', 'b', 'c', 'd', 'e']
    res = pd.Series(data, index=range(1, 6), dtype=str)
    print(res)
Output:
    1    a
    2    b
    3    c
    4    d
    5    e
    dtype: object
Creating from a dict
    data = {"a": 1., "b": 2, "c": 3, "d": 4}
    res = pd.Series(data, index=["d", "c", "b", "a"])
    print(res)
    # The dict keys are used to build the index
Output:
    d    4.0
    c    3.0
    b    2.0
    a    1.0
    dtype: float64
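When a dict and an explicit index are passed together, values are looked up by label rather than by position. The sketch below illustrates this. The label "z" is a made-up example that is deliberately absent from the dict.

```python
import pandas as pd

data = {"a": 1.0, "b": 2, "c": 3}

# "z" is not a key of the dict, so its value becomes NaN;
# "a" is a key but is not requested in the index, so it is left out.
res = pd.Series(data, index=["c", "b", "z"])
print(res)
```

The missing label is why the result is stored as floating point: NaN is a float value.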
Creating from a scalar
    # If data is a scalar, pass an index: the value is repeated to match the length of the index.
    res = pd.Series(5, index=[1, 2, 3, 4, 5])
    print(res)
Output:
    1    5
    2    5
    3    5
    4    5
    5    5
    dtype: int64
2. Querying data
Slicing
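    data = [1, 2, 3, 4, 5]
    res = pd.Series(data, index=["a", "b", "c", "d", "e"])
    print(res[0:3], "---")    # works like Python list slicing
    print(res.iloc[3], "---") # single position; plain res[3] on a non-integer index is deprecated in recent pandas
    print(res[-3:], "---")
Output:
    a    1
    b    2
    c    3
    dtype: int64 ---
    4 ---
    c    3
    d    4
    e    5
    dtype: int64 ---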
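Position-based and label-based slicing can be made explicit with `.iloc` and `.loc`. This is a minimal sketch, and the names `by_position` and `by_label` are illustrative. Note the difference at the end point: a positional slice excludes the stop, while a label slice includes it.

```python
import pandas as pd

res = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

by_position = res.iloc[0:3]   # positions 0, 1, 2; the stop position is excluded
by_label = res.loc["a":"c"]   # labels "a" through "c"; the stop label is included

print(by_position.tolist())   # [1, 2, 3]
print(by_label.tolist())      # [1, 2, 3]
print(res.iloc[3])            # 4
print(res.loc["d"])           # 4
```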
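Retrieving data by index label
    data = [1, 2, 3, 4, 5]
    res = pd.Series(data, index=["a", "b", "c", "d", "e"])
    print(res["a"])
    # To retrieve several values, wrap the labels in a list
    print(res[["a", "b"]])
    # Looking up a label that does not exist raises an exception, e.g. KeyError: 'f'
Output:
    1
    a    1
    b    2
    dtype: int64

    data = [1, 2, 3, 4, 5]
    res = pd.Series(data)   # default index: the labels are the integers 0 to 4
    res[[2, 4]]
Output:
    2    3
    4    5
    dtype: int64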
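To avoid the KeyError for a missing label, one option is `Series.get`, which returns a default instead of raising. A minimal sketch, reusing the label 'f' from the comment above as the missing key:

```python
import pandas as pd

res = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

print("f" in res)        # False: membership is tested against the index labels
print(res.get("f"))      # None
print(res.get("f", 0))   # 0
print(res.get("a", 0))   # 1

# Direct lookup of a missing label still raises KeyError.
try:
    res["f"]
except KeyError as exc:
    print("missing label:", exc)
```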
Viewing the first or last few elements with head()/tail()
    data = [1, 2, 3, 4, 5]
    res = pd.Series(data, index=["a", "b", "c", "d", "e"])
    res.head(3)   # first three elements
    res.tail(2)   # last two elements
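A short sketch of how `head()` and `tail()` behave when the argument is omitted or negative. The ten-element Series is made up for illustration.

```python
import pandas as pd

res = pd.Series(range(10))

print(res.head().tolist())     # [0, 1, 2, 3, 4]  (n defaults to 5)
print(res.tail().tolist())     # [5, 6, 7, 8, 9]
print(res.head(-2).tolist())   # [0, 1, 2, 3, 4, 5, 6, 7]  (all but the last two)
print(res.tail(-2).tolist())   # [2, 3, 4, 5, 6, 7, 8, 9]  (all but the first two)
```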
3. Other operations
Removing duplicate elements from a Series
unique() returns the distinct values of a Series.
    s = pd.Series(data=[1, 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 6, 6, 7, 8])
    s.unique()
Output:
    array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int64)
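`unique()` returns a NumPy array, so the index labels are lost. If the labels matter, `drop_duplicates()` is one alternative, and `nunique()` gives only the count. A small sketch on the same data (the names `values` and `deduped` are illustrative):

```python
import pandas as pd

s = pd.Series([1, 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 6, 6, 7, 8])

values = s.unique()             # NumPy array, in order of first appearance
deduped = s.drop_duplicates()   # Series that keeps the label of each first occurrence

print(type(values).__name__)    # ndarray
print(deduped.index.tolist())   # [0, 2, 4, 5, 6, 7, 10, 14]
print(s.nunique())              # 8
```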
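Adding two Series
Operations between Series
- Operations automatically align the data on index labels.
- Where a label is present in only one of the Series, the result is NaN.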
    # When a label has no counterpart in the other Series, the result for it is missing and shows as NaN (not a number)
    s1 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
    s2 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "f"])
    s = s1 + s2
    s
Output:
    a    2.0
    b    4.0
    c    6.0
    d    8.0
    e    NaN
    f    NaN
    dtype: float64
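If NaN is not wanted for unmatched labels, the method form `Series.add` accepts a `fill_value` that stands in for the missing side. A sketch on the same two Series (the name `total` is illustrative):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s2 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "f"])

# A label missing from one side is treated as 0 before adding.
total = s1.add(s2, fill_value=0)
print(total.tolist())   # [2.0, 4.0, 6.0, 8.0, 5.0, 5.0]
```

The result is still float64, because alignment goes through a missing-value step even when every gap is filled.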
Detecting missing data
    isnull()    # returns True for missing data
    notnull()   # returns False for missing data
isnull
    s1 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
    s2 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "f"])
    s = s1 + s2
    s.isnull()   # returns True for missing data
Output:
    a    False
    b    False
    c    False
    d    False
    e     True
    f     True
    dtype: bool
notnull
    s1 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
    s2 = pd.Series(data=[1, 2, 3, 4, 5], index=["a", "b", "c", "d", "f"])
    s = s1 + s2
    s.notnull()   # returns False for missing data
Output:
    a     True
    b     True
    c     True
    d     True
    e    False
    f    False
    dtype: bool
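Because the result of `isnull()` and `notnull()` is a boolean Series, it can be summed to count missing values. The sketch below builds a small Series with two gaps directly. The data is illustrative, not the `s1 + s2` result itself.

```python
import pandas as pd

s = pd.Series([2.0, 4.0, None, None], index=["a", "b", "e", "f"])

print(s.isnull().sum())              # 2  (True counts as 1)
print(s.notnull().sum())             # 2
print(s.isnull().any())              # True
print(s.isna().equals(s.isnull()))   # True: isna() is the same check under another name
```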
If a list of booleans is used to index a Series, only the elements at the True positions are kept.
    s[[True, True, False, False, True, True]]
Output:
    a    2.0
    b    4.0
    e    NaN
    f    NaN
    dtype: float64
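In practice the boolean list is usually produced by a comparison rather than typed by hand. A minimal sketch, assuming a Series with the same values as `s` above. The threshold 4 is arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, 6.0, 8.0, np.nan, np.nan], index=list("abcdef"))

mask = s > 4              # comparisons against NaN evaluate to False
print(mask.tolist())      # [False, False, True, True, False, False]
print(s[mask].tolist())   # [6.0, 8.0]
```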
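Using this property, you can select all missing values or all non-missing values.
    s[s.isnull()]    # select all missing values
Output:
    e   NaN
    f   NaN
    dtype: float64

    s[s.notnull()]   # select all non-missing values
Output:
    a    2.0
    b    4.0
    c    6.0
    d    8.0
    dtype: float64

    s.index          # get the index
Output:
    Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')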
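pandas also offers `dropna()` and `fillna()` as shorthand for the common cases of discarding or replacing missing values. A sketch, assuming a Series with the same values as `s` above. The replacement value 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, 6.0, 8.0, np.nan, np.nan], index=list("abcdef"))

kept = s.dropna()      # same rows as s[s.notnull()]
filled = s.fillna(0)   # missing values replaced by 0

print(kept.index.tolist())   # ['a', 'b', 'c', 'd']
print(filled.tolist())       # [2.0, 4.0, 6.0, 8.0, 0.0, 0.0]
```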