原文地址python
SQL 是用於訪問和處理數據庫的標準的計算機語言。 經常使用於數據庫管理系統(RDBMS)中, 這類數據庫包括 MySQL、SQL Server、Oracle等。mysql
Pandas 是一個開源的,爲 Python 提供高性能的,數據結構以及數據分析工具。sql
在熟練地使用 SQL 的同時,爲知足一些的業務需求,經常須要咱們將數據提取後,再對數據進行統計分析,那應該如何使用 Pandas 達到和 SQL 同樣的效果呢?數據庫
下面的速查表將會逐一使用 Pandas 對常見的 SQL 語句進行映射。數據結構
本篇將解構下面的 SQL 查詢句式, 使用 Pandas 進行實現工具
SQL 查詢句式post
SELECT DISTINCT [字段]
FROM [表] JOIN [bin] ON [鏈接條件]
WHERE [過濾條件]
GROUP BY [字段]
HAVING [條件]
ORDER BY [字段] DESC
LIMIT [個數] OFFSET [個數]
複製代碼
import pandas as pd
import pymysql
複製代碼
conn = pymysql.connect(host='127.0.0.1',
user='root',
password='12345678',
db='test_db')
df = pd.read_sql(sql="select * from student", con=conn)
複製代碼
數據預覽性能
df
複製代碼
id | name | age | sex | city | money | |
---|---|---|---|---|---|---|
0 | 1 | 趙雷 | 1990-01-01 | 男 | 北京 | 20.0 |
1 | 2 | 錢電 | 1990-12-21 | 男 | 天津 | 30.0 |
2 | 3 | 孫風 | 1990-12-20 | 男 | 成都 | 2.0 |
3 | 4 | 李雲 | 1990-12-06 | 男 | 北京 | 100.0 |
4 | 5 | 周梅 | 1991-12-01 | 女 | 成都 | 50.0 |
5 | 6 | 吳蘭 | 1992-01-01 | 女 | 北京 | 3.0 |
6 | 7 | 鄭竹 | 1989-01-01 | 女 | 成都 | 200.0 |
7 | 8 | 張三 | 2017-12-20 | 女 | 天津 | 20.0 |
8 | 9 | 李四 | 2017-12-25 | 女 | 西安 | 35.0 |
9 | 10 | 李四 | 2012-06-06 | 女 | 北京 | 40.0 |
10 | 11 | 趙六 | 2013-06-13 | 女 | 成都 | 5.0 |
11 | 12 | 孫七 | 2014-06-01 | 女 | 天津 | 210.0 |
SQL測試
SELECT * FROM student
SELECT id, name, sex FROM student
複製代碼
Pandasspa
df
df[['id','name','sex']]
複製代碼
SQL
SELECT DISTINCT name FROM student
複製代碼
Pandas
df['name'].unique()
複製代碼
SQL
SELECT COUNT(*) FROM student
SELECT SUM(money) FROM student
SELECT id, MAX(money) FROM student
SELECT id, MIN(money) FROM student
SELECT AVG(money) FROM student
複製代碼
Pandas
df['id'].count()
df['money'].sum()
df[df['money'] == df['money'].max()]
df[df['money'] == df['money'].min()]
df['money'].mean()
複製代碼
描述性統計數據
In [1]: df['money'].describe()
Out[1]: count 12.000000
mean 59.583333
std 72.963825
min 2.000000
25% 16.250000
50% 32.500000
75% 62.500000
max 210.000000
Name: money, dtype: float64
複製代碼
例子: =
SQL
SELECT * FROM student WHERE sex = '男'
複製代碼
Pandas
df[df['sex'] == ('男')]
複製代碼
例子: in & not in
SQL
SELECT * FROM student WHERE id IN (2,4,6,8,10)
SELECT * FROM student WHERE id NOT IN (2,4,6,8,10)
複製代碼
Pandas
df[df['id'].isin((2,4,6,8))]
df[~df['id'].isin((2,4,6,8))]
複製代碼
多個條件
SQL
SELECT * FROM student WHERE sex = '男' and id IN (2,4,6,8,10)
複製代碼
Pandas
df[(df['sex'] == ('男')) & (df['id'].isin((2,4,6,8)))]
複製代碼
SQL
SELECT * FROM student ORDER BY id DESC LIMIT 3
SELECT * FROM student ORDER BY id DESC LIMIT 3 OFFSET 2
複製代碼
Pandas
df.sort_values('id',ascending=False).head(3)
df.nlargest(2 + 3, 'id').tail(3)
複製代碼
SQL
SELECT * FROM student WHERE sex = '男' LIMIT 3
SELECT id, name, sex FROM student WHERE sex ='男' LIMIT 3
複製代碼
Pandas
df[df['sex'] == ('男')].head(3)
df[df['sex'] == ('男')][['id','name','sex']].head(3)
複製代碼
SQL
SELECT * FROM student ORDER BY age
SELECT * FROM student ORDER BY age DESC
複製代碼
Pandas
df.sort_values('age')
df.sort_values('age', ascending=False)
複製代碼
GROYP BY & COUNT
SQL
SELECT city, COUNT(*) FROM student GROUP BY city
複製代碼
Pandas
df.groupby(['city']).size().to_frame('size').reset_index()
複製代碼
GROYP BY & SUM
SQL
SELECT city, SUM(money) FROM student GROUP BY city
複製代碼
Pandas
df.groupby(['city'])['money'].agg('sum').reset_index()
複製代碼
GROUP BY 單字段
SQL
SELECT city, COUNT(*) FROM student GROUP BY sex ORDER BY city
複製代碼
Pandas
df.groupby(['city']).size().to_frame('size').reset_index().sort_values('city')
複製代碼
GROUP BY 多字段
SQL
SELECT city, sex, COUNT(*) FROM student GROUP BY city, sex ORDER BY city
複製代碼
Pandas
df.groupby(['city','sex']).size().to_frame('size').reset_index().sort_values('city')
複製代碼
SQL
SELECT city, COUNT(*) FROM student GROUP BY city HAVING count(*) > 3
複製代碼
Pandas
df.groupby('city').filter(lambda x:len(x)>3).groupby('city').size().to_frame('size').reset_index()
複製代碼
系列文章