Selecting Data in pandas - [Lao Yu Learns pandas]

Selecting columns

Select a single column by its name:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
# Select column A
print("Column A:")
print(data["A"])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Column A:
2017-01-08     0
2017-01-09     4
2017-01-10     8
2017-01-11    12
2017-01-12    16
2017-01-13    20
Freq: D, Name: A, dtype: int32

Attribute (dot) access also works:

print(data.A)

This is equivalent to data["A"].
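The two forms agree only when the column name is a plain identifier that does not clash with an existing DataFrame attribute. The sketch below illustrates this with a hypothetical frame (the column names "count" and "my col" are made up for the example, not taken from the tutorial data):

```python
import pandas as pd

# Hypothetical frame: "count" collides with the DataFrame.count method,
# and "my col" contains a space, so neither works reliably as an attribute.
df = pd.DataFrame({"A": [1, 2], "count": [10, 20], "my col": [5, 6]})

same = df.A.equals(df["A"])   # plain name: dot and bracket access agree
count_attr = df.count         # this is the method, not the column
count_col = df["count"]       # bracket access always returns the column
spaced = df["my col"]         # only reachable with brackets

print(same)
print(callable(count_attr))
print(count_col.tolist())
```

For that reason bracket access is usually the safer default, with dot access kept for quick interactive work.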

Selecting rows

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

print("Rows 0 to 2 (the first three rows):")
print(data[0:3])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Rows 0 to 2 (the first three rows):
            A  B   C   D
2017-01-08  0  1   2   3
2017-01-09  4  5   6   7
2017-01-10  8  9  10  11
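An integer slice in square brackets is positional and half-open, like a Python list slice, so 0:3 returns positions 0, 1 and 2. A minimal sketch comparing it with iloc, which states the positional intent explicitly (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

first_three = data[0:3]     # positions 0, 1, 2; position 3 is excluded
via_iloc = data.iloc[0:3]   # same rows, selected explicitly by position

print(first_three.equals(via_iloc))
print(len(first_three))
```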

Rows can also be selected by a range of index labels.
For example, the following selects the rows from 2017-01-10 through 2017-01-12:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

print("Select rows by index label:")
print(data["2017-01-10":"2017-01-12"])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Select rows by index label:
             A   B   C   D
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19

Selecting with loc

Use loc to select a range of rows:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

print("Select rows by index label:")
print(data.loc["2017-01-10":"2017-01-12"])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Select rows by index label:
             A   B   C   D
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
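Note that the label slice above returned 2017-01-12 as well: label-based slices include both endpoints, whereas positional slices exclude the stop position. A small sketch of the difference, using iloc for the positional side (variable names are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

# Label slice: both endpoints are included.
by_label = data.loc["2017-01-10":"2017-01-12"]
# Positional slice: starts at position 2, the stop position 4 is excluded.
by_position = data.iloc[2:4]

print(len(by_label), len(by_position))
print(by_label.index[-1], by_position.index[-1])
```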

loc can also select by column. For example, to select columns B and C:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

print("Columns B and C:")
print(data.loc[:, ["B", "C"]])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Columns B and C:
             B   C
2017-01-08   1   2
2017-01-09   5   6
2017-01-10   9  10
2017-01-11  13  14
2017-01-12  17  18
2017-01-13  21  22
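The column list is not the only way to write this selection. As a sketch, the same two columns can be obtained with a list inside plain brackets, or with a label slice in loc, which (like row label slices) includes both endpoints:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

cols_loc = data.loc[:, ["B", "C"]]   # the form used above
cols_brackets = data[["B", "C"]]     # a list in brackets selects columns
cols_slice = data.loc[:, "B":"C"]    # label slice, "C" is included

print(cols_loc.equals(cols_brackets))
print(cols_loc.equals(cols_slice))
```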

To select only certain columns within certain rows, modify the example above slightly:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

print("Selected rows and columns:")
print(data.loc["2017-01-09":"2017-01-12", ["B", "C"]])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Selected rows and columns:
             B   C
2017-01-09   5   6
2017-01-10   9  10
2017-01-11  13  14
2017-01-12  17  18

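Selecting by position

Positional indexing uses iloc, which counts from 0. For example, to select the value at row position 3 and column position 1 (the fourth row, second column):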

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

print("Value at row position 3, column position 1:")
print(data.iloc[3, 1])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Value at row position 3, column position 1:
13
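To see which cell those positions refer to, the positions can be translated back into labels. The sketch below looks up the same cell by position, by label, and with iat, the scalar-only positional accessor (variable names are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

value = data.iloc[3, 1]                          # by zero-based position
row_label = data.index[3]                        # label of the row at position 3
col_label = data.columns[1]                      # label of the column at position 1
same_by_label = data.loc[row_label, col_label]   # same cell, by label
same_by_iat = data.iat[3, 1]                     # same cell, scalar accessor

print(row_label, col_label, value)
```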

Slices also work in iloc. For example, to select the column at position 1 for all rows from position 3 onward:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

print("Column at position 1, rows from position 3 onward:")
print(data.iloc[3:, 1])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Column at position 1, rows from position 3 onward:
2017-01-11    13
2017-01-12    17
2017-01-13    21
Freq: D, Name: B, dtype: int32
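The result above is a Series because the column was given as a single integer. As a small sketch, wrapping that integer in a list keeps the result two-dimensional, which can matter when later code expects a DataFrame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

as_series = data.iloc[3:, 1]    # scalar column position -> Series
as_frame = data.iloc[3:, [1]]   # list of positions -> one-column DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
print(as_frame.shape)
```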

Specific rows can also be picked out individually, for example:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

print("Rows at positions 1, 3, 5 and columns at positions 1 to 2:")
print(data.iloc[[1, 3, 5], 1:3])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Rows at positions 1, 3, 5 and columns at positions 1 to 2:
             B   C
2017-01-09   5   6
2017-01-11  13  14
2017-01-13  21  22
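Lists and slices are interchangeable on either axis in iloc. As an illustration, the same selection can be written with a stepped slice for the rows and an explicit list for the columns; the column slice 1:3 covers positions 1 and 2 only, since the stop position is excluded:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

picked = data.iloc[[1, 3, 5], 1:3]   # list of rows, slice of columns
stepped = data.iloc[1::2, [1, 2]]    # every second row from position 1, list of columns

print(list(picked.columns))
print(picked.equals(stepped))
```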

Mixing labels and positions

For example, filter the rows by position and the columns by label:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

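# .ix was removed in pandas 1.0; convert the row positions to labels and use loc
print("Rows at positions 1, 3, 5 and columns A and C:")
print(data.loc[data.index[[1, 3, 5]], ["A", "C"]])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Rows at positions 1, 3, 5 and columns A and C: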
             A   C
2017-01-09   4   6
2017-01-11  12  14
2017-01-13  20  22
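The same mixed selection can also be written with iloc alone by converting the column labels to positions, which works on current pandas versions where .ix has been removed. A sketch using Index.get_indexer for the conversion (variable names are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

# Labels -> positions, then select everything by position.
col_positions = data.columns.get_indexer(["A", "C"])
by_position = data.iloc[[1, 3, 5], col_positions]

# Positions -> labels, then select everything by label.
by_label = data.loc[data.index[[1, 3, 5]], ["A", "C"]]

print(col_positions.tolist())
print(by_position.equals(by_label))
```

Which direction to convert is mostly a matter of taste; the point is that each indexer call uses one kind of key only.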

Filtering by values in a column

This is similar to a SQL where column < xxx clause.
For example, select the rows where column A is less than 8:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

print("Filter by values in a column:")
print(data[data.A < 8])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Filter by values in a column:
            A  B  C  D
2017-01-08  0  1  2  3
2017-01-09  4  5  6  7
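The expression data.A < 8 is itself a boolean Series with one True or False per row, and indexing with it keeps the True rows. A minimal sketch that makes the mask visible and also passes it to loc to restrict the columns at the same time (the names mask and subset are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

mask = data.A < 8                     # boolean Series aligned with the rows
print(mask.tolist())

subset = data.loc[mask, ["A", "B"]]   # True rows only, columns A and B only
print(subset)
```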

To combine conditions, as in where A < 8 and B < 5, filter in two steps:

import pandas as pd
import numpy as np
dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)

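print("Filter by values in a column:")
# Keep the intermediate result in its own variable so data is not overwritten
filtered = data[data.A < 8]
print(filtered[filtered.B < 5])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Filter by values in a column: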
            A  B  C  D
2017-01-08  0  1  2  3
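The two conditions can also be combined into a single mask. The sketch below uses the element-wise & operator; each comparison needs its own parentheses because & binds more tightly than <, and the Python keyword and does not work on Series. DataFrame.query is shown as an alternative that reads closer to SQL (variable names are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

# One combined mask instead of two filtering steps.
both = data[(data.A < 8) & (data.B < 5)]

# The same filter written as a query string.
via_query = data.query("A < 8 and B < 5")

print(both)
print(both.equals(via_query))
```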