Select a column by its name:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    # Select column A
    print("Column A:")
    print(data["A"])

Output (the dtype depends on the platform's default NumPy integer, so it may read int64 instead of int32):

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Column A:
    2017-01-08     0
    2017-01-09     4
    2017-01-10     8
    2017-01-11    12
    2017-01-12    16
    2017-01-13    20
    Freq: D, Name: A, dtype: int32
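As a small aside, passing a list of names instead of a single name returns a DataFrame rather than a Series. The sketch below rebuilds the same `data` frame so that it runs on its own; the variable names `col_a` and `cols_ac` are only illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

col_a = data["A"]            # single name -> Series
cols_ac = data[["A", "C"]]   # list of names -> DataFrame

print(type(col_a).__name__)    # Series
print(type(cols_ac).__name__)  # DataFrame
print(cols_ac.shape)           # (6, 2)
```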
Dot notation works too:

    print(data.A)

This is equivalent to data["A"].
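Dot notation is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. A minimal sketch, using a hypothetical frame `df` whose column names `count` and `unit price` were chosen to trigger both cases:

```python
import pandas as pd

# Hypothetical frame: "count" clashes with DataFrame.count, "unit price" contains a space
df = pd.DataFrame({"count": [1, 2, 3], "unit price": [9.5, 3.0, 4.25]})

print(callable(df.count))      # True: this is the DataFrame.count method, not the column
print(df["count"].tolist())    # [1, 2, 3]
print(df["unit price"].sum())  # 16.75
```

Bracket notation always refers to the column, so it is the safer choice in reusable code.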
A plain slice selects rows by position. data[0:3] returns the rows at positions 0, 1 and 2; the end of the slice is excluded:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Rows 0 up to (not including) 3:")
    print(data[0:3])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Rows 0 up to (not including) 3:
                A  B   C   D
    2017-01-08  0  1   2   3
    2017-01-09  4  5   6   7
    2017-01-10  8  9  10  11
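For comparison, the same three rows can be obtained with `iloc` or `head`. This is only a sketch of the equivalence on the example frame; the name `sliced` is illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

sliced = data[0:3]

print(len(sliced))                       # 3
print(sliced.equals(data.iloc[0:3]))     # True: same positional slice, written explicitly
print(sliced.equals(data.head(3)))       # True: head(3) returns the first three rows
```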
Rows can also be selected by a range of index labels.
For example, the following selects the rows from 2017-01-10 to 2017-01-12:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Select rows by index label:")
    print(data["2017-01-10":"2017-01-12"])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Select rows by index label:
                 A   B   C   D
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
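One detail worth spelling out: a positional slice excludes its end, while a label slice includes both end labels. The sketch below selects the same three rows both ways on the example frame; `by_position` and `by_label` are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

by_position = data[2:5]                      # positions 2, 3, 4 (position 5 is excluded)
by_label = data["2017-01-10":"2017-01-12"]   # both end labels are included

print(len(by_position), len(by_label))  # 3 3
print(by_position.equals(by_label))     # True
```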
Use loc to select several rows by label:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Select rows by index label:")
    print(data.loc["2017-01-10":"2017-01-12"])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Select rows by index label:
                 A   B   C   D
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
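Besides slices, `loc` also accepts a single label or a list of labels, and the two return different shapes. A brief sketch on the example frame, using explicit `pd.Timestamp` labels to avoid relying on string parsing; `row` and `frame` are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

row = data.loc[pd.Timestamp("2017-01-10")]      # single label -> Series holding one row
frame = data.loc[[pd.Timestamp("2017-01-10")]]  # list of labels -> DataFrame

print(row.tolist())  # [8, 9, 10, 11]
print(frame.shape)   # (1, 4)
```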
loc can also select by column. For example, to select columns B and C:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Select two columns:")
    print(data.loc[:, ["B", "C"]])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Select two columns:
                 B   C
    2017-01-08   1   2
    2017-01-09   5   6
    2017-01-10   9  10
    2017-01-11  13  14
    2017-01-12  17  18
    2017-01-13  21  22
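When the wanted columns are adjacent, a label slice is a possible alternative to the list. As with row labels, both ends of the slice are included. A short sketch on the example frame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

b_to_c = data.loc[:, "B":"C"]   # label slice over columns, "C" is included
b_to_d = data.loc[:, "B":"D"]

print(b_to_c.equals(data.loc[:, ["B", "C"]]))  # True
print(list(b_to_d.columns))                    # ['B', 'C', 'D']
```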
To select only certain columns from certain rows, pass a row selection as the first argument to loc:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Select certain rows and columns:")
    print(data.loc["2017-01-09":"2017-01-12", ["B", "C"]])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Select certain rows and columns:
                 B   C
    2017-01-09   5   6
    2017-01-10   9  10
    2017-01-11  13  14
    2017-01-12  17  18
Positional indexing uses iloc. Positions are zero-based, so data.iloc[3, 1] selects the value in the fourth row (position 3) and the second column (position 1):

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Value at row position 3, column position 1:")
    print(data.iloc[3, 1])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Value at row position 3, column position 1:
    13
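To make the zero-based positions concrete, the sketch below reads the same cell by position and by label, and also shows that negative positions count from the end, as in plain Python. `iat` is the pandas accessor for a single value by position.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

by_position = data.iloc[3, 1]                         # fourth row, second column
by_label = data.loc[pd.Timestamp("2017-01-11"), "B"]  # the same cell, addressed by label

print(by_position == by_label)  # True
print(data.iat[3, 1])           # 13
print(data.iloc[-1, -1])        # 23 (last row, last column)
```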
iloc also accepts slices. For example, to select the second column (position 1) for every row from position 3 onward:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Column position 1 from row position 3 onward:")
    print(data.iloc[3:, 1])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Column position 1 from row position 3 onward:
    2017-01-11    13
    2017-01-12    17
    2017-01-13    21
    Freq: D, Name: B, dtype: int32
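Slices passed to `iloc` follow ordinary Python slice rules, so steps and negative bounds work as well. A small sketch on the example frame; `every_other` and `last_two` are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

every_other = data.iloc[::2, 1]  # rows at positions 0, 2, 4 of column B
last_two = data.iloc[-2:, 1]     # last two rows of column B

print(every_other.tolist())  # [1, 9, 17]
print(last_two.tolist())     # [17, 21]
```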
Individual rows can be picked out by passing a list of positions, for example:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Rows at positions 1, 3, 5 and columns at positions 1 and 2:")
    print(data.iloc[[1, 3, 5], 1:3])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Rows at positions 1, 3, 5 and columns at positions 1 and 2:
                 B   C
    2017-01-09   5   6
    2017-01-11  13  14
    2017-01-13  21  22
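Positions and labels can also be mixed, for example by filtering rows by number and columns by label. The ix indexer that older pandas releases offered for this was deprecated in 0.20 and removed in 1.0, so the example below converts the row positions to labels and passes them to loc:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Rows at positions 1, 3, 5 and columns A and C:")
    # data.index[[1, 3, 5]] turns the row positions into index labels for loc
    print(data.loc[data.index[[1, 3, 5]], ["A", "C"]])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Rows at positions 1, 3, 5 and columns A and C:
                 A   C
    2017-01-09   4   6
    2017-01-11  12  14
    2017-01-13  20  22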
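Since `.ix` is no longer available in pandas 1.0 and later, mixed position/label selection has to be expressed through `loc` or `iloc`. One possible route, sketched here on the example frame, goes the other way round and translates the column labels into positions with `Index.get_indexer`; `col_positions` and `via_iloc` are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

col_positions = data.columns.get_indexer(["A", "C"])  # column labels -> positions 0 and 2
via_iloc = data.iloc[[1, 3, 5], col_positions]        # everything is positional now

print(col_positions.tolist())    # [0, 2]
print(via_iloc.values.tolist())  # [[4, 6], [12, 14], [20, 22]]
```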
Rows can also be filtered by a condition, similar to a where column < xxx clause in SQL.
For example, to select the rows where column A is less than 8:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Filter by the values in a column:")
    print(data[data.A < 8])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Filter by the values in a column:
                A  B  C  D
    2017-01-08  0  1  2  3
    2017-01-09  4  5  6  7
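It may help to look at what the comparison itself produces: a boolean Series with one entry per row, which then acts as a row mask. The sketch below also passes the mask to `loc` to restrict the columns at the same time; `mask` is an illustrative name.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

mask = data["A"] < 8   # boolean Series aligned with the rows of data

print(mask.tolist())                               # [True, True, False, False, False, False]
print(data.loc[mask, ["A", "B"]].values.tolist())  # [[0, 1], [4, 5]]
```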
To combine conditions, as in where A < 8 and B < 5, apply the filters one after the other:

    import pandas as pd
    import numpy as np

    dates = pd.date_range("2017-01-08", periods=6)
    data = pd.DataFrame(
        np.arange(24).reshape(6, 4),
        index=dates,
        columns=["A", "B", "C", "D"],
    )
    print("data:")
    print(data)
    print("Filter by the values in a column:")
    # Keep the intermediate result in its own variable so data stays intact
    filtered = data[data.A < 8]
    print(filtered[filtered.B < 5])

Output:

    data:
                 A   B   C   D
    2017-01-08   0   1   2   3
    2017-01-09   4   5   6   7
    2017-01-10   8   9  10  11
    2017-01-11  12  13  14  15
    2017-01-12  16  17  18  19
    2017-01-13  20  21  22  23
    Filter by the values in a column:
                A  B  C  D
    2017-01-08  0  1  2  3
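Filtering in two steps works, but the conditions can also be combined into a single mask with `&` (and) or `|` (or). Each comparison needs its own parentheses because these operators bind more tightly than `<`. The sketch below also shows `DataFrame.query` as a possible alternative that reads closer to SQL; the names `both`, `same` and `either` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

both = data[(data["A"] < 8) & (data["B"] < 5)]     # where A < 8 and B < 5
same = data.query("A < 8 and B < 5")               # the same filter as a query string
either = data[(data["A"] < 8) | (data["B"] > 20)]  # where A < 8 or B > 20

print(both.values.tolist())  # [[0, 1, 2, 3]]
print(both.equals(same))     # True
print(len(either))           # 3
```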