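Simply put, a lambda is a function without a name. This section covers how it is called, what its parameters are, and what it returns. The general format of a lambda is:
lambda [arg1 [, arg2, ..., argn]] : expression
lambda x : expression
The names before the colon are the parameters, and the value of the expression is the return value. So how is such a function used? It is rarely used on its own, because a standalone lambda can only do something fairly simple. The usual case is that some function takes another function as a parameter, and that argument can be written as a lambda.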
>>> foo = [2, 18, 9, 22, 17, 24, 8, 12, 27]
>>> list(map(lambda x: x * 2 + 10, foo))  # the first argument of map is a function
[14, 46, 28, 54, 44, 58, 26, 34, 64]
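The same pattern applies to other built-ins that accept a function argument. The following is a minimal sketch, not taken from the original post: the `pairs` data and the name `double_plus_ten` are made up for illustration.

```python
# Hypothetical sample data, not from the original post.
pairs = [("pear", 3), ("apple", 7), ("fig", 1)]

# sorted() accepts a function through its `key` parameter.
by_count = sorted(pairs, key=lambda item: item[1])

# filter() keeps the items for which the function returns a truthy value.
large = list(filter(lambda item: item[1] > 2, pairs))

# A lambda bound to a name behaves like a small function written with def.
double_plus_ten = lambda x: x * 2 + 10

print(by_count)
print(large)
print(double_plus_ten(2))
```

In each call the lambda is only needed at that one spot, which is why giving it a name with def would add little.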
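The apply function, as of the pandas version used when this was written, has the following signature:
DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
(newer pandas releases replaced broadcast and reduce with result_type). The most important argument is func, the function to apply, which can be a lambda as described above. Next comes axis, which selects the dimension: with axis=0 func is called once per column, with axis=1 once per row, and in both cases it receives that column or row as a Series. The returned object can be either a DataFrame or a Series.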
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
>>> df.apply(np.sqrt)  # returns a DataFrame
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
>>> df.apply(np.sum, axis=0)  # returns a Series
A    12
B    27
dtype: int64
>>> df.apply(lambda x: x*2 + 1, axis=1)  # here x is one row of df per call
   A   B
0  9  19
1  9  19
2  9  19
>>> df.apply(lambda x: [1, 2, 5], axis=1)
0    [1, 2, 5]
1    [1, 2, 5]
2    [1, 2, 5]
>>> df.apply(lambda x: [1, 2, 5], axis=0)
   A  B
0  1  1
1  2  2
2  5  5
>>> # The next two outputs were recorded with the author's pandas version;
>>> # how list results are wrapped can differ in other versions.
>>> df.apply(lambda x: [1, 2, 6, 7, 8, 5], axis=0)
A    [1, 2, 6, 7, 8, 5]
B    [1, 2, 6, 7, 8, 5]
>>> type(df.apply(lambda x: [1, 2, 6, 7, 8, 5], axis=0))  # here the DataFrame became a Series
<class 'pandas.core.series.Series'>
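To make the role of axis more concrete, here is a small sketch under the assumption of a reasonably recent pandas (0.23 or later, where result_type exists). The variable names `row_sum`, `col_range`, and `expanded` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# axis=1: the lambda receives one row (a Series) per call and returns a scalar,
# so the overall result is a Series indexed like df.
row_sum = df.apply(lambda row: row["A"] + row["B"], axis=1)

# axis=0: the lambda receives one column (a Series) per call.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

# result_type="expand" asks pandas to spread a list-like result over columns.
expanded = df.apply(
    lambda row: [row["A"], row["B"] * 2], axis=1, result_type="expand"
)

print(row_sum)
print(col_range)
print(expanded)
```

Passing result_type explicitly is one way to avoid depending on how a given pandas version wraps list-valued results.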
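The zip function is called as zip(iterable1, iterable2, ...). zip() is a Python built-in function. It takes any number of iterables as arguments and packs their corresponding elements into tuples. In Python 3 it returns an iterator over these tuples, which is why the example below wraps the result in list(); in Python 2 it returned a list directly.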
>>> name = ["Manjeet", "Nikhil", "Shambhavi", "Astha"]
>>> roll_no = [4, 1, 3, 2]
>>> marks = [40, 50, 60, 70]
>>> mapped = zip(name, roll_no, marks)
>>> list(mapped)
[('Manjeet', 4, 40), ('Nikhil', 1, 50), ('Shambhavi', 3, 60), ('Astha', 2, 70)]
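Two behaviors of zip are worth seeing alongside the example above: it stops at the shortest input, and zip(*...) reverses the packing. The sketch below uses a deliberately shortened `marks` list, which is a made-up variation on the data above.

```python
names = ["Manjeet", "Nikhil", "Shambhavi", "Astha"]
marks = [40, 50, 60]  # deliberately one element shorter (hypothetical data)

# zip stops as soon as the shortest input is exhausted, so "Astha" is dropped.
pairs = list(zip(names, marks))

# zip(*pairs) turns the list of tuples back into one tuple per original input.
unzipped_names, unzipped_marks = zip(*pairs)

# A common idiom: build a lookup table from two parallel sequences.
lookup = dict(zip(names, marks))

print(pairs)
print(unzipped_names, unzipped_marks)
print(lookup)
```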
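The map function discussed here is the map method of a pandas Series, that is, of a single DataFrame column. Its argument can be a function, and it can also be a dict that maps old values to new ones: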
>>> import pandas as pd
>>> from pandas import Series, DataFrame
>>> data = DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'Pastrami',
...                            'corned beef', 'Bacon', 'pastrami', 'honey ham', 'nova lox'],
...                   'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
>>> data
          food  ounces
0        bacon     4.0
1  pulled pork     3.0
2        bacon    12.0
3     Pastrami     6.0
4  corned beef     7.5
5        Bacon     8.0
6     pastrami     3.0
7    honey ham     5.0
8     nova lox     6.0
>>> meat_to_animal = {
...     'bacon': 'pig',
...     'pulled pork': 'pig',
...     'pastrami': 'cow',
...     'corned beef': 'cow',
...     'honey ham': 'pig',
...     'nova lox': 'salmon'
... }
>>> meat_to_animal
{'bacon': 'pig', 'pulled pork': 'pig', 'pastrami': 'cow', 'corned beef': 'cow', 'honey ham': 'pig', 'nova lox': 'salmon'}
>>> data['food'].map(str.lower)
0          bacon
1    pulled pork
2          bacon
3       pastrami
4    corned beef
5          bacon
6       pastrami
7      honey ham
8       nova lox
Name: food, dtype: object
>>> data['animal'] = data['food'].map(str.lower).map(meat_to_animal)
>>> data
          food  ounces  animal
0        bacon     4.0     pig
1  pulled pork     3.0     pig
2        bacon    12.0     pig
3     Pastrami     6.0     cow
4  corned beef     7.5     cow
5        Bacon     8.0     pig
6     pastrami     3.0     cow
7    honey ham     5.0     pig
8     nova lox     6.0  salmon
>>> data['ounces'] = data['ounces'].map(lambda x: x + 2)  # here map works much like apply
>>> data
          food  ounces  animal
0        bacon     6.0     pig
1  pulled pork     5.0     pig
2        bacon    14.0     pig
3     Pastrami     8.0     cow
4  corned beef     9.5     cow
5        Bacon    10.0     pig
6     pastrami     5.0     cow
7    honey ham     7.0     pig
8     nova lox     8.0  salmon
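One detail the example above does not show is what happens when a value has no entry in the dict. The sketch below assumes a shortened dict and adds a made-up "tofu" entry to demonstrate it. Values without a match come back as missing.

```python
import pandas as pd

# "tofu" is a hypothetical value with no entry in the mapping.
food = pd.Series(["bacon", "Pastrami", "tofu"])
meat_to_animal = {"bacon": "pig", "pastrami": "cow"}

# Chained form from the post: normalize the case, then look up in the dict.
animal = food.map(str.lower).map(meat_to_animal)

# Equivalent single pass with a lambda; dict.get returns None when the key is absent.
animal_fn = food.map(lambda name: meat_to_animal.get(name.lower()))

print(animal)
print(animal.isna())
```

Checking isna() after a dict-based map is a cheap way to spot keys the mapping forgot.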
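The prototype of NumPy's stack function is stack(arrays, axis=0), where arrays can be a sequence of arrays or of lists. The meaning of axis is explained further down; let's look at an example first.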
>>> import numpy as np
>>> a = [[[1, 2, 3, 4], [11, 21, 31, 41]],
...      [[5, 6, 7, 8], [51, 61, 71, 81]],
...      [[9, 10, 11, 12], [91, 101, 111, 121]]]
>>> a
[[[1, 2, 3, 4], [11, 21, 31, 41]], [[5, 6, 7, 8], [51, 61, 71, 81]], [[9, 10, 11, 12], [91, 101, 111, 121]]]
>>> # a can be seen as having three levels; from outermost to innermost we call
>>> # them axis = 0, axis = 1, and axis = 2. The list a has three elements, each
>>> # of which is a list (list_1); each list_1 holds two lists (list_2), and each
>>> # list_2 holds four numbers.
>>> np.stack(a, axis = 0)
array([[[  1,   2,   3,   4],
        [ 11,  21,  31,  41]],

       [[  5,   6,   7,   8],
        [ 51,  61,  71,  81]],

       [[  9,  10,  11,  12],
        [ 91, 101, 111, 121]]])
>>> d = np.stack(a, axis = 0)
>>> len(d)
3
>>> d.shape  # shape lists the dimensions from outermost to innermost
(3, 2, 4)
>>> # The result is an ndarray. Stacking along axis = 0 leaves the layout
>>> # unchanged; only the type changes from list to array.
>>> np.stack(a, axis = 1)
array([[[  1,   2,   3,   4],
        [  5,   6,   7,   8],
        [  9,  10,  11,  12]],

       [[ 11,  21,  31,  41],
        [ 51,  61,  71,  81],
        [ 91, 101, 111, 121]]])
>>> c = np.stack(a, axis = 1)
>>> c.shape
(2, 3, 4)
>>> # Here c[i][j] is a[j][i]: the three inputs are now indexed by the second axis.
>>> np.stack(a, axis = 2)
array([[[  1,   5,   9],
        [  2,   6,  10],
        [  3,   7,  11],
        [  4,   8,  12]],

       [[ 11,  51,  91],
        [ 21,  61, 101],
        [ 31,  71, 111],
        [ 41,  81, 121]]])
>>> b = np.stack(a, axis = 2)
>>> b.shape
(2, 4, 3)
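One way to understand this: stack joins the input arrays along a new axis, and axis gives the position of that new axis in the result. Here the three inputs each have shape (2, 4), so the new axis of length 3 lands at position 0, 1, or 2, giving the shapes (3, 2, 4), (2, 3, 4), and (2, 4, 3). With axis = 2 (or axis = -1), the corresponding elements of the three inputs end up side by side in the innermost level.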
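The relationship between axis and the resulting shape can be checked directly. This is a small sketch with made-up input arrays (`arrays`, built with np.arange) rather than the list a from the example.

```python
import numpy as np

# Three hypothetical inputs, each of shape (2, 4).
arrays = [np.arange(8).reshape(2, 4) + 10 * k for k in range(3)]

# The new axis of length 3 is inserted at the position given by `axis`.
shapes = {axis: np.stack(arrays, axis=axis).shape for axis in (0, 1, 2, -1)}
print(shapes)

# With axis=1, element [i, j] of the result is row i of input j.
s1 = np.stack(arrays, axis=1)
same = all(
    np.array_equal(s1[i, j], arrays[j][i])
    for i in range(2)
    for j in range(3)
)
print(same)
```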
Reshaping the example above a little makes the hstack() function easy to understand:
>>> d = np.stack(a, axis = -1)
>>> d
array([[[  1,   5,   9],
        [  2,   6,  10],
        [  3,   7,  11],
        [  4,   8,  12]],

       [[ 11,  51,  91],
        [ 21,  61, 101],
        [ 31,  71, 111],
        [ 41,  81, 121]]])
>>> d = np.hstack(d)
>>> d
array([[  1,   5,   9,  11,  51,  91],
       [  2,   6,  10,  21,  61, 101],
       [  3,   7,  11,  31,  71, 111],
       [  4,   8,  12,  41,  81, 121]])
>>> d = np.hstack(d)
>>> d
array([  1,   5,   9,  11,  51,  91,   2,   6,  10,  21,  61, 101,   3,
         7,  11,  31,  71, 111,   4,   8,  12,  41,  81, 121])
>>> a = [[[[1, 2, 3, 4], [11, 21, 31, 41]], [[5, 6, 7, 8], [51, 61, 71, 81]], [[9, 10, 11, 12], [91, 101, 111, 121]]]]
>>> a
[[[[1, 2, 3, 4], [11, 21, 31, 41]], [[5, 6, 7, 8], [51, 61, 71, 81]], [[9, 10, 11, 12], [91, 101, 111, 121]]]]
hstack() can also be used to join two arrays horizontally:
>>> a = [[1], [2], [3]]
>>> b = [[1], [2], [3]]
>>> np.hstack((a, b))
array([[1, 1],
       [2, 2],
       [3, 3]])
vstack() joins arrays vertically, stacking them row on row:
>>> np.vstack((a, b))
array([[1],
       [2],
       [3],
       [1],
       [2],
       [3]])
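For 2-D inputs, hstack and vstack give the same result as np.concatenate along axis 1 and axis 0 respectively. The sketch below checks this on two small made-up arrays and also shows the 1-D case, where the two functions behave differently.

```python
import numpy as np

a = np.array([[1], [2], [3]])
b = np.array([[10], [20], [30]])  # hypothetical values, chosen to differ from a

h = np.hstack((a, b))  # columns side by side, shape (3, 2)
v = np.vstack((a, b))  # rows on top of each other, shape (6, 1)

# For 2-D inputs these match concatenate along axis 1 and axis 0.
h_same = np.array_equal(h, np.concatenate((a, b), axis=1))
v_same = np.array_equal(v, np.concatenate((a, b), axis=0))

# For 1-D inputs hstack joins end to end, while vstack makes each input a row.
flat_h = np.hstack(([1, 2], [3, 4]))
flat_v = np.vstack(([1, 2], [3, 4]))

print(h.shape, v.shape, h_same, v_same)
print(flat_h, flat_v, sep="\n")
```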
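The groupby function does what its name says: it splits the data into groups. The most common pattern is to group first and then aggregate each group, for example with mean(). groupby is also handy for getting a first look at how the data is distributed, and grouping comes up often in practice.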
>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a', 'c', 'b', 'c'],
...                    'B': [2, 8, 1, 4, 3, 2, 5, 9],
...                    'C': [102, 98, 107, 104, 115, 87, 92, 123]})
>>> df
   A  B    C
0  a  2  102
1  b  8   98
2  a  1  107
3  c  4  104
4  a  3  115
5  c  2   87
6  b  5   92
7  c  9  123
>>> df.groupby('A').mean()
     B           C
A
a  2.0  108.000000
b  6.5   95.000000
c  5.0  104.666667
>>> # Output below recorded with the author's pandas version;
>>> # newer versions may print these means as floats (107.0, ...).
>>> df.groupby(['A','B']).mean()
       C
A B
a 1  107
  2  102
  3  115
b 5   92
  8   98
c 2   87
  4  104
  9  123
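To see what the grouping step actually computes, the following sketch compares a grouped mean with the same value computed by hand, and shows agg for applying several aggregations at once. The four-row DataFrame is a shortened, made-up version of the data above.

```python
import pandas as pd

# Hypothetical shortened data set.
df = pd.DataFrame({"A": ["a", "b", "a", "c"], "B": [2, 8, 1, 4]})

# Mean of B within each group of A.
grouped_mean = df.groupby("A")["B"].mean()

# The same number for group "a", computed by filtering the rows manually.
manual_mean = df.loc[df["A"] == "a", "B"].mean()

# agg accepts a list of aggregation names and returns one column per name.
summary = df.groupby("A")["B"].agg(["mean", "max"])

print(grouped_mean)
print(manual_mean)
print(summary)
```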
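The difference between size and count: size includes NaN values when counting, while count does not. groupby itself only does the grouping; the function applied afterwards is up to you. It can be mean() to look at averages, or count() to count entries, as in the following example: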
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
...                    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
...                    "Val": [4, 3, 3, np.nan, np.nan, 4]})
>>> # Column order below is as printed by the author's pandas version;
>>> # newer versions keep the insertion order Name, City, Val.
>>> df
       City     Name  Val
0   Seattle    Alice  4.0
1   Seattle      Bob  3.0
2  Portland  Mallory  3.0
3   Seattle  Mallory  NaN
4   Seattle      Bob  NaN
5  Portland  Mallory  4.0
>>> df.groupby(["Name", "City"], as_index=False)['Val'].count()
      Name      City  Val
0    Alice   Seattle    1
1      Bob   Seattle    1
2  Mallory  Portland    2
3  Mallory   Seattle    0
>>> df.groupby(["Name"], as_index=False)['City'].count()
      Name  City
0    Alice     1
1      Bob     2
2  Mallory     3
>>> # The selected column (City above) holds the number of non-NaN entries per group,
>>> # whereas size() would count every row in the group, NaN included.
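The example above only calls count(), so here is a short sketch that puts size() and count() side by side. The four-row DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob has one missing value, Mallory has only a missing value.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Mallory"],
    "Val": [4, 3, np.nan, np.nan],
})

grouped = df.groupby("Name")["Val"]

sizes = grouped.size()    # number of rows per group, NaN included
counts = grouped.count()  # number of non-NaN values per group

print(sizes)
print(counts)
```

The difference between the two results is the number of missing values in each group.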