【Python】SQL to Pandas 系列(一)

原文地址python

介紹

SQL 是用於訪問和處理數據庫的標準的計算機語言。 經常使用於數據庫管理系統(RDBMS)中, 這類數據庫包括 MySQLSQL ServerOracle等。mysql

Pandas 是一個開源的,爲 Python 提供高性能的,數據結構以及數據分析工具。sql

在熟練地使用 SQL 的同時,爲知足一些的業務需求,經常須要咱們將數據提取後,再對數據進行統計分析,那應該如何使用 Pandas 達到和 SQL 同樣的效果呢?數據庫

下面的速查表將會逐一使用 Pandas 對常見的 SQL 語句進行映射。數據結構

本篇內容

本篇將解構下面的 SQL 查詢句式, 使用 Pandas 進行實現工具

SQL 查詢句式post

SELECT DISTINCT [字段] 
FROM [表] JOIN [bin] ON [鏈接條件] 
WHERE [過濾條件] 
GROUP BY [字段] 
HAVING [條件] 
ORDER BY [字段] DESC 
LIMIT [個數] OFFSET [個數]
複製代碼

讀取測試數據

import pandas as pd
import pymysql
複製代碼
conn = pymysql.connect(host='127.0.0.1',
                       user='root',
                       password='12345678',
                       db='test_db')
df = pd.read_sql(sql="select * from student", con=conn)
複製代碼

數據預覽性能

df
複製代碼
id name age sex city money
0 1 趙雷 1990-01-01 北京 20.0
1 2 錢電 1990-12-21 天津 30.0
2 3 孫風 1990-12-20 成都 2.0
3 4 李雲 1990-12-06 北京 100.0
4 5 周梅 1991-12-01 成都 50.0
5 6 吳蘭 1992-01-01 北京 3.0
6 7 鄭竹 1989-01-01 成都 200.0
7 8 張三 2017-12-20 天津 20.0
8 9 李四 2017-12-25 西安 35.0
9 10 李四 2012-06-06 北京 40.0
10 11 趙六 2013-06-13 成都 5.0
11 12 孫七 2014-06-01 天津 210.0

SELECT

SQL測試

SELECT * FROM student

SELECT id, name, sex FROM student 
複製代碼

Pandasspa

df

df[['id','name','sex']]
複製代碼

DISTINCT

SQL

SELECT DISTINCT name FROM student 
複製代碼

Pandas

df['name'].unique()
複製代碼

COUNT & SUM & MAX & MIN & AVG

SQL

SELECT COUNT(*) FROM student

SELECT SUM(money) FROM student

SELECT id, MAX(money) FROM student

SELECT id, MIN(money) FROM student

SELECT AVG(money) FROM student
複製代碼

Pandas

df['id'].count()

df['money'].sum()

df[df['money'] == df['money'].max()]

df[df['money'] == df['money'].min()]

df['money'].mean()
複製代碼

描述性統計數據

In [1]: df['money'].describe()
    
Out[1]: count     12.000000
        mean      59.583333
        std       72.963825
        min        2.000000
        25%       16.250000
        50%       32.500000
        75%       62.500000
        max      210.000000
        Name: money, dtype: float64
複製代碼

WHERE

例子: =

SQL

SELECT * FROM student WHERE sex = '男'
複製代碼

Pandas

df[df['sex'] == ('男')]
複製代碼

例子: in & not in

SQL

SELECT * FROM student WHERE id IN (2,4,6,8,10)

SELECT * FROM student WHERE id NOT IN (2,4,6,8,10)
複製代碼

Pandas

df[df['id'].isin((2,4,6,8))]

df[~df['id'].isin((2,4,6,8))]
複製代碼

多個條件

SQL

SELECT * FROM student WHERE sex = '男' and id IN (2,4,6,8,10)
複製代碼

Pandas

df[(df['sex'] == ('男')) & (df['id'].isin((2,4,6,8)))]
複製代碼

LIMIT OFFSET

SQL

SELECT * FROM student ORDER BY id DESC LIMIT 3

SELECT * FROM student ORDER BY id DESC LIMIT 3 OFFSET 2
複製代碼

Pandas

df.sort_values('id',ascending=False).head(3)

df.nlargest(2 + 3, 'id').tail(3)
複製代碼

SELECT & WHERE & LIMIT

SQL

SELECT * FROM student WHERE sex = '男' LIMIT 3

SELECT id, name, sex FROM student WHERE sex ='男' LIMIT 3
複製代碼

Pandas

df[df['sex'] == ('男')].head(3)

df[df['sex'] == ('男')][['id','name','sex']].head(3)
複製代碼

ORDER BY

SQL

SELECT * FROM student ORDER BY age

SELECT * FROM student ORDER BY age DESC
複製代碼

Pandas

df.sort_values('age')

df.sort_values('age', ascending=False)
複製代碼

GROUP BY

GROYP BY & COUNT

SQL

SELECT city, COUNT(*) FROM student GROUP BY city
複製代碼

Pandas

df.groupby(['city']).size().to_frame('size').reset_index()
複製代碼

GROYP BY & SUM

SQL

SELECT city, SUM(money) FROM student GROUP BY city
複製代碼

Pandas

df.groupby(['city'])['money'].agg('sum').reset_index()
複製代碼

GROUP BY & ORDER BY & COUNT

GROUP BY 單字段

SQL

SELECT city, COUNT(*) FROM student GROUP BY sex ORDER BY city
複製代碼

Pandas

df.groupby(['city']).size().to_frame('size').reset_index().sort_values('city')
複製代碼

GROUP BY 多字段

SQL

SELECT city, sex, COUNT(*) FROM student GROUP BY city, sex ORDER BY city
複製代碼

Pandas

df.groupby(['city','sex']).size().to_frame('size').reset_index().sort_values('city')
複製代碼

HAVING

SQL

SELECT city, COUNT(*) FROM student GROUP BY city HAVING count(*) > 3
複製代碼

Pandas

df.groupby('city').filter(lambda x:len(x)>3).groupby('city').size().to_frame('size').reset_index()
複製代碼

系列文章

【Python】當數據離開了 SQL 該怎麼活下去(一)

【Python】當數據離開了 SQL 該怎麼活下去(二)

相關文章
相關標籤/搜索