A 30-Minute SQL Guide

Date: 2019-12-04


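These are my reading notes on SQL必知必會, the Chinese edition of Sams Teach Yourself SQL in 10 Minutes. I am sure I could not learn all the SQL this book covers in 10 minutes, so I called these notes "Learn SQL in 30 Minutes" (in fact, half an hour was not enough either...).

The database I have at hand is MySQL, so all the examples below are run on MySQL. Since most tools now support syntax highlighting, keywords are written in lowercase.

Original article: shanyue.tech/post/sql-gu…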

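Preparation

Tools

mycli is a MySQL terminal client written in Python. It offers syntax highlighting, auto-completion and multi-line mode, and if you know vi, its vi-mode lets you move around and edit quickly. In short, vi + mycli is a fantastic combination!

Likewise, for PostgreSQL you can use pgcli.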

pip install -U mycli    # assumes pip is already installed

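Sample tables

The examples use two tables: student and class. The student table refers to the class table through class_id. The SQL below creates both tables and their data. The three short exercises at the end also use these tables.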

create table class (
  id int(11) not null auto_increment comment 'class id',
  name varchar(50) not null comment 'class name',
  primary key (id)
) comment 'class table';

create table student (
  id int(11) not null auto_increment comment 'student id',
  name varchar(50) not null comment 'student name',
  age tinyint unsigned default 20 comment 'student age',
  sex enum('male', 'female') comment 'sex',
  score tinyint comment 'entrance exam score',
  class_id int(11) comment 'class',
  createTime timestamp default current_timestamp comment 'creation time',
  primary key (id),
  foreign key (class_id) references class (id)
) comment 'student table';

-- Class names: 軟件工程 = Software Engineering, 市場營銷 = Marketing
insert into class (name) values ('軟件工程'), ('市場營銷');

insert into student (name, age, sex, score, class_id) values ('張三', 21, 'male', 100, 1);
insert into student (name, age, sex, score, class_id) values ('李四', 22, 'male', 98, 1);
insert into student (name, age, sex, score, class_id) values ('王五', 22, 'male', 99, 1);
insert into student (name, age, sex, score, class_id) values ('燕七', 21, 'female', 34, 2);
insert into student (name, age, sex, score, class_id) values ('林仙兒', 23, 'female', 78, 2);

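SQL basics

Terminology

Database

A database is a collection of related data. The software that operates on and manages this data is a DBMS, such as MySQL, PostgreSQL, MongoDB, Oracle or SQLite. An RDBMS is a database based on the relational model, and it uses SQL to manage and manipulate data. There are also NoSQL databases, such as MongoDB. Because NoSQL databases are non-relational and usually do not support joins, their data is often denormalized, which can make queries faster.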
Table

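A structured collection of data with specific attributes. In a student table, for example, the attributes are the student id, age, sex and so on. A schema describes this structure. NoSQL databases do not require fixed columns and usually have no schema, which also makes them easier to scale.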
Column

A specific attribute in a table, such as a student's id or age. Every column has a data type.
Data Type

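Every column has a data type, such as char, varchar, int, text, blob, datetime or timestamp. Choose a type that matches the granularity of the data so you do not waste space. Some comparisons between types:
- char, varchar: char suits values whose lengths vary little; otherwise use varchar. varchar spends 1 or 2 extra bytes storing the string length, so for fixed-length data it uses slightly more space than char. The two types also treat trailing spaces differently, and the behavior varies between DBMSs, so keep this difference in mind when designing a schema.
- datetime, timestamp: datetime covers the years 1000 to 9999. timestamp stores the number of seconds since 1970-01-01 (UTC), so its range is much smaller (it ends in 2038), and it also takes less storage. Both can be set to update automatically when the row is updated. Choose between them based on the range and precision you need. Modern DBMSs can store microsecond precision. MySQL, for example, stores whole seconds by default, but you can ask for up to six fractional digits (microseconds).
- enum: for fixed status codes that rarely change, enum is a good choice. It reads better, takes less space and guarantees that the value is valid.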
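The datetime, timestamp and enum notes above can be sketched as a single MySQL table definition. The table event_log and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table, only to illustrate the column types above
create table event_log (
  id int not null auto_increment,
  -- datetime(6): six fractional digits, i.e. microsecond precision
  happened_at datetime(6) not null,
  -- timestamp: filled in on insert and refreshed on every update of the row
  updated_at timestamp not null default current_timestamp on update current_timestamp,
  -- enum: a small, fixed set of states
  status enum('pending', 'done') not null default 'pending',
  primary key (id)
);

insert into event_log (happened_at) values ('2019-12-04 10:30:00.123456');

-- happened_at keeps all six fractional digits; status falls back to 'pending'
select happened_at, updated_at, status from event_log;
```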
A small question to think about: how should you store an IP address?
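One common answer is to store an IPv4 address as a 4-byte int unsigned and convert it with MySQL's inet_aton and inet_ntoa functions. This is offered as an assumption, not as the author's own answer. The table access_log is hypothetical.

```sql
-- Hypothetical table: an IPv4 address fits in 4 bytes as int unsigned
create table access_log (
  id int not null auto_increment,
  ip int unsigned not null,
  primary key (id)
);

-- inet_aton('192.168.0.1') = 192*2^24 + 168*2^16 + 0*2^8 + 1 = 3232235521
insert into access_log (ip) values (inet_aton('192.168.0.1'));

-- Convert back to dotted notation when reading
select ip, inet_ntoa(ip) as ip_text from access_log;

-- Integers also make subnet range queries simple
select count(*) from access_log
where ip between inet_aton('192.168.0.0') and inet_aton('192.168.0.255');

-- For IPv6, use varbinary(16) with inet6_aton / inet6_ntoa instead
select inet6_ntoa(inet6_aton('::1'));
```

Compared with varchar(15), the integer takes less space and sorts addresses numerically.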
Row

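A single record in a table, such as the student 張三 (Zhang San).

Retrieving data

-- Retrieve a single column
select name from student;

-- Retrieve multiple columns
select name, age, class_id from student;

-- Retrieve all columns
select * from student;

-- Remove duplicates from a column
select distinct class_id from student;

-- Retrieve a range of rows
-- offset is zero-based, so `offset 1` starts from the 2nd row
select * from student limit 1, 10;
select * from student limit 10 offset 1;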
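A typical use of limit/offset is pagination. For page p with n rows per page, the offset is (p - 1) * n. Without order by, the row order is not guaranteed, so pages can overlap or skip rows. The sketch below therefore fixes the order by id, and it assumes the sample data above (ids 1 to 5).

```sql
-- Page 2 with 2 rows per page: offset = (2 - 1) * 2 = 2
select id, name from student order by id limit 2 offset 2;

-- The same page with MySQL's shorthand: limit offset, row_count
select id, name from student order by id limit 2, 2;

-- Both return ids 3 and 4 (王五, 燕七) with the sample data
```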

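Sorting

The default order is ASC, so ascending order usually needs no keyword; descending order uses DESC. A B-Tree index can speed up sorting, but only under the leftmost-prefix rule. See the FAQ below for more about indexes.

-- Sort by student id, descending
select * from student order by id desc;

-- An index on (score, name) can speed up this sort, but an index on (name, score) cannot:
-- this is the leftmost-prefix rule, which follows from how a B+Tree orders its keys.
-- Because score and name sort in opposite directions here, the index has to be
-- declared as (score desc, name), which MySQL supports from 8.0.
select * from student order by score desc, name;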
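To check whether an index actually serves a sort, you can look at the query plan. The sketch below assumes MySQL 8.0+, where each index column can have its own direction. The index name idx_score_desc_name is made up. On a table as tiny as the sample one the optimizer may still pick another plan, so treat the expected plan as typical, not guaranteed.

```sql
-- MySQL 8.0+: per-column directions matching "order by score desc, name"
create index idx_score_desc_name on student (score desc, name);

-- Selecting only indexed columns lets the index cover the query.
-- When the index provides the ordering, the Extra column typically
-- shows "Using index" and not "Using filesort".
explain select score, name from student order by score desc, name;
```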

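Filtering data

Filtering is the most frequently used operation in SQL.

-- Find the student whose id is 1
select * from student where id = 1;

-- Find the students whose id is in [1, 10] (inclusive)
select * from student where id between 1 and 10;

-- Find the students who have no class
-- Note: = cannot be used to test for null
select * from student where class_id is null;

-- Find the students in class 1 who are older than 23
select * from student where class_id = 1 and age > 23;

-- Find the students who are in class 1 or older than 23
select * from student where class_id = 1 or age > 23;

-- Find the students in class 1 or class 2
select * from student where class_id in (1, 2);

-- Find the students in neither class 1 nor class 2
select * from student where class_id not in (1, 2);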
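Here is why = cannot test for null. Any comparison with null yields null (unknown), and where treats unknown as false. The small sketch below shows this; <=> is MySQL's null-safe equality operator.

```sql
-- null = null is null (unknown), not true; is null and <=> return 1
select null = null, null is null, 1 <=> 1, null <=> null;
-- → NULL, 1, 1, 1

-- So this matches no rows, even for students whose class_id is null
select * from student where class_id = null;
```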

Calculated fields

CONCAT

select concat(name, '(', age, ')') as nameWithAge from student;

select concat('hello', 'world') as helloworld;

Math

select age - 18 as relativeAge from student;

select 3 * 4 as n;

For more functions, see the function reference in the manual. You can also write your own user-defined functions (UDFs).

You can call a function directly with select:

select now();
select concat('hello', 'world');

Aggregation

Aggregate functions summarize data. The five common ones are COUNT, MIN, MAX, AVG and SUM.

-- Count the students in class 1
select count(*) from student where class_id = 1;
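The query below uses all five aggregate functions at once and assumes the five sample rows above. count(*) counts rows, while count(score) skips rows whose score is null. The sample data has no such rows, so the two counts match.

```sql
select count(*), count(score), min(score), max(score), avg(score), sum(score)
from student;
-- → 5, 5, 34, 100, 81.8000, 409
```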

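Grouping

Use group by to group rows, aggregate functions to summarize each group, and having to filter the groups.

-- Group by class and count the students in each class
select class_id, count(*) from student group by class_id;

-- List the classes with more than three students
select class_id, count(*) as cnt from student group by class_id having cnt > 3;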
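where and having can appear in the same query. where filters rows before grouping, and having filters groups after aggregation. Here is a sketch against the sample data:

```sql
select class_id, count(*) as cnt, avg(score) as avg_score
from student
where age >= 22          -- row filter: 李四, 王五 (class 1) and 林仙兒 (class 2) remain
group by class_id
having avg_score > 80;   -- group filter: class 1 (avg 98.5) passes, class 2 (avg 78) does not
-- → class_id 1, cnt 2, avg_score 98.5000
```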

Subqueries

-- List the students in the Software Engineering (軟件工程) class
select * from student where class_id in (
  select id from class where name = '軟件工程'
);

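Joins

Two tables can be joined whenever they share a common column, but a foreign key does a better job of guaranteeing data integrity. For example, inserting a student into a class that does not exist fails. Generally speaking, joins tend to perform better than subqueries, although this depends on the optimizer.

-- List the students in the Software Engineering (軟件工程) class
select * from student, class
where student.class_id = class.id and class.name = '軟件工程';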

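Inner join

An inner join whose join condition is an equality test is also called an equi-join.

-- List the students in the Software Engineering (軟件工程) class
select * from student
inner join class on student.class_id = class.id
where class.name = '軟件工程';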

Self join

A self join joins a table with itself.

-- List the students in the same class as 張三 (張三 included)
select s2.* from student s1
inner join student s2 on s1.class_id = s2.class_id
where s1.name = '張三';

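Outer join

Outer joins come in two kinds: left join and right join. A left join keeps every row of the left table and fills the right table's columns with null where there is no match. A right join does the same with the roles reversed.

-- List every student's class; class_name is null if the student has no class
select student.name, class.name as class_name from student
left join class on student.class_id = class.id;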
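A left join fills unmatched right-side columns with null, so it can also find rows that have no match (an anti-join). The sketch below lists classes that have no students. With the sample data both classes have students, so the result is empty.

```sql
-- Keep every class (left side); a class with no student gets null student columns
select class.id, class.name
from class
left join student on student.class_id = class.id
where student.id is null;
```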

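Inserting data

Use insert into to add rows to a table; one statement can also insert several rows.

You can omit the column names, but then the values depend entirely on the order of the table's columns. It is better to name the columns, which also lets you insert only some of them.

-- Insert a row without column names: every column needs a value, in table order
-- (class 3 must already exist in class, otherwise the foreign key rejects the row)
insert into student values (8, '陸小鳳', 24, 'male', null, 3, now());

-- Insert with column names: unlisted columns get their default values
insert into student (id, name, age, sex, class_id) values (9, '花無缺', 25, 'male', 3);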

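Modifying data

Before modifying important data, always run a select first to confirm which rows will be affected. Then begin a transaction so you can still rollback if something goes wrong.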
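The select, begin, rollback routine described above might look like this. It assumes an InnoDB table, because MyISAM tables do not support transactions.

```sql
-- 1. Check exactly which rows will be touched
select * from student where name = '張三';

-- 2. Start a transaction so the change can still be undone
begin;
update student set class_id = 2 where name = '張三';

-- 3. Verify the result inside the transaction
select * from student where name = '張三';

-- 4. Something wrong? Undo it. If everything looks right, run commit instead.
rollback;
```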

Update

-- Move 張三 to class 2
update student set class_id = 2 where name = '張三';

Delete

-- Delete 張三's row
delete from student where name = '張三';

-- Delete every row in the table
delete from student;

-- Delete every row faster (this also resets the auto_increment counter)
truncate table student;

Creating and altering tables

-- Create the student table; remember to add meaningful comments
create table student (
  id int(11) not null auto_increment comment 'student id',
  name varchar(50) not null comment 'student name',
  age tinyint unsigned default 20 comment 'student age',
  sex enum('male', 'female') comment 'sex',
  score tinyint comment 'entrance exam score',
  class_id int(11) comment 'class',
  createTime timestamp default current_timestamp comment 'creation time',
  primary key (id),
  foreign key (class_id) references class (id)
) comment 'student table';

-- Create a new table from an existing one
create table student_copy as select * from student;

-- Drop the age column
alter table student drop column age;

-- Add an age column
alter table student add column age smallint;

-- Drop the student table
drop table student;
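create table ... as select copies the columns and rows, but not the indexes, the primary key or the auto_increment attribute. When a copy should keep the structure, create table ... like copies the definition including indexes (but not foreign keys), and the rows are copied in a separate step. This sketch assumes the student table still exists. student_backup is a hypothetical name.

```sql
-- Copies the structure with its indexes (foreign keys are not copied), but no rows
create table student_backup like student;

-- Copy the rows separately
insert into student_backup select * from student;
```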

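Views

A view is a virtual table that makes it easier to retrieve data spread across several tables. Views can also be written to, but they are best treated as read-only. Views are handy when a query needs to join several tables.

create view v_student_with_classname as
select student.name name, class.name class_name
from student left join class
on student.class_id = class.id;

select * from v_student_with_classname;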

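Constraints

primary key

No two rows can share a primary key value, each row has exactly one primary key, and a primary key can never be null. The primary key is indexed automatically, which speeds up lookups by key.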
```
alter table student add constraint primary key (id);
```
foreign key

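Foreign keys guarantee data integrity. Two example cases:
- Inserting 張三丰 into class 5 in student fails, because class 5 does not exist in class.
- Deleting class 3 from class fails, because 陸小鳳 and 楚留香 are still in class 3.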
```
alter table student add constraint foreign key (class_id) references class (id);
```
unique key

A unique index guarantees that the values in the column are unique, but it still allows null.
```
alter table student add constraint unique key (name);
```
check

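A check constraint makes a column satisfy a given condition. For example, every student's age should be greater than 0.

Note that MySQL before 8.0.16 parses check constraints but silently ignores them; on those versions a trigger can be used instead. From 8.0.16 on, they are enforced.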
```
alter table student add constraint check (age > 0);
```
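On MySQL 8.0.16 or later the check constraint is actually enforced. The sketch below uses a named constraint. The name chk_student_score and the test row are hypothetical, and the final insert is expected to fail.

```sql
-- MySQL 8.0.16+: a named check constraint, enforced on insert and update
alter table student add constraint chk_student_score check (score between 0 and 100);

-- Expected to fail: 120 violates chk_student_score
insert into student (name, score, class_id) values ('test_student', 120, 1);
```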

index

Indexes make retrieval faster, but slow down writes (insert, update, delete).

create index index_on_student_name on student (name);

-- Or, equivalently, with alter table
alter table student add index index_on_student_name (name);

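Triggers

I have never used triggers in development; maybe that just reflects my limited experience.

A trigger fires when rows are inserted, updated or deleted.

Use cases:

- Data constraints, e.g. a student's age must be greater than 0
- Hooks: database-level hooks

-- Create a trigger
-- MySQL before 8.0.16 does not enforce check constraints, so a trigger can stand in:
-- when an inserted age is below 0, set it to 0.
-- (In the sample table age is tinyint unsigned, so this is mainly illustrative.)
-- delimiter lets the mysql client pass the ; inside the body through unchanged
delimiter //
create trigger reset_age before insert on student for each row
begin
  if NEW.age < 0 then
    set NEW.age = 0;
  end if;
end //
delimiter ;

-- List triggers
show triggers;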

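Stored procedures

I have never used stored procedures in development either; again, maybe that just reflects my limited experience.

A stored procedure can be thought of as a function that runs a series of SQL statements based on its input. It can also be seen as an encapsulation of a series of database operations, which can improve database security to some extent.

-- Create a stored procedure
-- (the parameter is named p_name so it does not clash with the name column)
delimiter //
create procedure create_student(p_name varchar(50))
begin
  insert into student (name) values (p_name);
end //
delimiter ;

-- Call the stored procedure
call create_student('shanyue');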

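SQL in practice

For more exercises, see LeetCode.

1. Rank students by score, giving equal scores the same rank

-- rank is a reserved word in MySQL 8.0.2+, hence the backticks
select id, name, score, (
  select count(distinct score) from student s2 where s2.score >= s1.score
) as `rank`
from student s1 order by s1.score desc;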

Columns used frequently in where clauses and in sorting should get a B-Tree index, so an index on score would help here.

Result:

id	name	score	rank
1	張三	100	1
3	王五	99	2
2	李四	98	3
5	林仙兒	78	4
4	燕七	34	5
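On MySQL 8.0 or later, the window function dense_rank() produces the same ranking without a correlated subquery. Tied scores share a rank, and no ranks are skipped. Here is a sketch, assuming the sample data; it returns the same five rows as the Result above.

```sql
-- dense_rank(): ties share a rank, and the next rank is not skipped
-- (rank is a reserved word since MySQL 8.0.2, hence the backticks)
select id, name, score,
       dense_rank() over (order by score desc) as `rank`
from student
order by score desc;
```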

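See LeetCode: rank-scores

2. Write a function that returns the Nth highest score

delimiter //
create function getNthHighestScore(N int) returns int
reads sql data
begin
  -- limit is zero-based, so the Nth highest distinct score sits at offset N - 1
  declare M int default N - 1;
  return (
    select distinct score from student order by score desc limit M, 1
  );
end //
delimiter ;

select getNthHighestScore(2);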

Result:

getNthHighestScore(2)
99

See LeetCode: nth highest salary

3. Retrieve the top two students by score in each class, with their rank

select class.id class_id, class.name class_name, s.name student_name, score, `rank`
from (
  select *, (
    select count(distinct score) from student s2 where s2.score >= s1.score and s2.class_id = s1.class_id
  ) as `rank` from student s1
) as s left join class on s.class_id = class.id where `rank` <= 2;

-- Without a select in from, you can also write it as below, but then the rank is not shown
select class.id class_id, class.name class_name, s1.name name, score
from student s1 left join class on s1.class_id = class.id
where (select count(*) from student s2 where s2.class_id = s1.class_id and s1.score <= s2.score) <= 2
order by s1.class_id, score desc;

Result:

class_id	class_name	student_name	score	rank
1	軟件工程	張三	100	1
1	軟件工程	王五	99	2
2	市場營銷	燕七	34	2
2	市場營銷	林仙兒	78	1
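With MySQL 8.0+ window functions, the per-class top two can also be written with dense_rank() partitioned by class. Window functions cannot appear directly in where, so the ranking goes into a derived table first. A sketch under those assumptions:

```sql
select class_id, class_name, student_name, score, `rank`
from (
  select class.id as class_id, class.name as class_name,
         s.name as student_name, s.score,
         -- the rank restarts at 1 for each class
         dense_rank() over (partition by s.class_id order by s.score desc) as `rank`
  from student s
  left join class on s.class_id = class.id
) as ranked
where `rank` <= 2
order by class_id, `rank`;
```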

FAQ

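Most of these are based on the most-viewed questions on Stack Overflow.

What is the difference between `inner join` and `outer join`?

See StackOverflow: what is the difference between inner join and outer join

How do I update one table with data from another?

For example, the student table above stores scores, and another table, score_correct, holds corrected scores for students whose scores were recorded by mistake.

In MySQL you can use the following syntax: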

update student, score_correct set student.score = score_correct.score where student.id = score_correct.uid;
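The same update can be written with an explicit join, which some find easier to read. In this sketch, score_correct (columns uid and score) is the hypothetical table described above. It is created here only so the example runs, and the correction row is made up.

```sql
-- Hypothetical correction table: uid refers to student.id
create table score_correct (
  uid int not null primary key,
  score tinyint not null
);
insert into score_correct (uid, score) values (4, 43);

-- Multi-table update with an explicit inner join (MySQL syntax)
update student
inner join score_correct on student.id = score_correct.uid
set student.score = score_correct.score;
```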

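How do indexes work?

In short, there are two kinds of index: hash and B-Tree. A hash lookup takes O(1) time. The B-Tree index is actually a B+Tree, a self-balancing multi-way search tree. Self-balancing means that inserts and deletes may split or merge nodes so that all leaves stay at the same depth. In a B+Tree only the leaf nodes store the data, and the leaves are linked together in a list. This makes it well suited to range queries and sorting. However, it can only search by leftmost prefix. For example, it can find names that start with a, but not names that end with a. Also, everything is a trade-off: the self-balancing that keeps lookups fast also makes updates slower, so weigh the pros and cons.

See StackOverflow: how does database indexing work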
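You can observe the leftmost-prefix rule with explain. In this sketch the index names are hypothetical. On a table as small as the sample one, the optimizer may prefer a full scan anyway, so check possible_keys as well as key.

```sql
create index idx_name on student (name);
create index idx_class_score on student (class_id, score);

-- Prefix match: the index on name can drive a range scan
explain select * from student where name like '張%';

-- Leading wildcard: the B+Tree cannot seek to names ending in 三
explain select * from student where name like '%三';

-- Uses the leftmost column class_id of (class_id, score), then score within it
explain select * from student where class_id = 1 and score > 90;

-- Skips the leftmost column, so (class_id, score) cannot be used to seek on score
explain select * from student where score > 90;
```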

How do I concatenate fields from multiple rows?

In MySQL you can use group_concat:

select group_concat(name) from student;

See StackOverflow: Concatenate many rows into a single text string
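group_concat also accepts order by and separator, and it is often combined with group by. Here is a sketch against the sample data. Note that the result is truncated at group_concat_max_len, which defaults to 1024 bytes.

```sql
-- One row per class: names ordered by score, joined with ", "
select class_id,
       group_concat(name order by score desc separator ', ') as names
from student
group by class_id;
-- → 1: 張三, 王五, 李四
--   2: 林仙兒, 燕七
```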

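How do I insert multiple rows in one SQL statement?

Separate the value lists after values with commas to insert several rows at once:

-- The ids and names below are placeholders
insert into student (id, name) values (10, 'student_a'), (11, 'student_b'), (12, 'student_c');

See StackOverflow: Inserting multiple rows in a single SQL query

How do I use a conditional expression in `select`?

For example, list every student's score in the student table, showing scores below 60 as 0: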

select id, name, if(score < 60, 0, score) score from student;
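if() is MySQL-specific. The standard SQL case expression does the same job and also works in other DBMSs such as PostgreSQL and SQLite:

```sql
-- Searched case expression: scores below 60 are shown as 0
select id, name,
       case when score < 60 then 0 else score end as score
from student;
-- 燕七's 34 is shown as 0; all other scores are unchanged
```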

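How do I find duplicates?

Suppose each (name, class) pair should be unique. The query below finds the duplicated pairs, with how many times each one repeats and the ids involved: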

select name, class_id, group_concat(id), count(*) times from student
group by name, class_id
having times > 1;

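Why design a 1:1 relation at all?

See stackoverflow.com/questions/5…

How do I delete duplicates and keep only the first one?

Suppose each (name, class) pair should be unique. The statement below deletes the duplicates and keeps only the first row (the smallest id) of each pair:

# MySQL makes this much simpler, thanks to multi-table delete
delete s1 from student s1, student s2
where s1.name = s2.name and s1.class_id = s2.class_id and s1.id > s2.id;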

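See StackOverflow: how can I remove duplicate rows

What is SQL injection?

Suppose a query is built by string concatenation like this:

"select * from (" + table + ");"

Suppose table is set to `student); drop table student; -- ` (MySQL needs the trailing space to treat -- as a comment). The statement then becomes the one below, and if the driver allows several statements in one call, it drops the table.

"select * from (student); drop table student; -- );"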
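The usual defense is to keep user input out of the SQL text and bind values as parameters. The sketch below uses MySQL's server-side prepared statements; client libraries expose the same idea through placeholders. Only values can be parameters. Identifiers such as a table name cannot, so a table name that comes from user input has to be checked against an allowlist in the application. The statement name find_student is hypothetical.

```sql
-- The ? placeholder is bound to a value; it is never parsed as SQL
prepare find_student from 'select * from student where name = ?';

set @student_name = '張三';
execute find_student using @student_name;

-- A malicious string is treated as a plain value: it just matches no rows
set @student_name = 'x''; drop table student; -- ';
execute find_student using @student_name;

deallocate prepare find_student;
```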

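What is the difference between single quotes, double quotes and backticks in MySQL?

Backticks (`) quote identifiers such as table and column names. They are mainly needed when a table or column name is a reserved word. Some other DBMSs, such as SQL Server, use [] for table and column names instead.

Single quotes (') denote strings.

Double quotes (") denote strings by default, but when sql_mode includes ANSI_QUOTES, double quotes denote table or column names.

See StackOverflow: when to use single quotes, double quotes and backticks in mysql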
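Here is a quick illustration of the three kinds of quotes. Setting sql_mode to just 'ANSI_QUOTES' replaces the session's other modes, including strict mode, so use it only for experimenting.

```sql
-- Backticks: an identifier; rank is reserved since MySQL 8.0.2
select name as `rank` from student;

-- Single quotes: a string literal
select 'name';

-- Double quotes: a string literal by default, so this returns the text 'name'
select "name";

-- With ANSI_QUOTES, double quotes quote identifiers: this selects the name column
set session sql_mode = 'ANSI_QUOTES';
select "name" from student;
```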
