Here we meet subqueries. A subquery can appear in the SELECT clause, in the WHERE clause, or in the FROM clause. It produces a result table, and we can give that table a name.
This article uses PostgreSQL's SQL syntax. We focus on read operations of the form select...from...where, that is, on analytical queries.
The dataset can be used directly at https://hyper-db.de/interface... . That web page does not allow write operations, i.e. transactional queries such as insert, update, and delete. create table and drop table are not allowed either.
Schema:
Download: https://db.in.tum.de/teaching...
The schema and most of the SQL statements come from the lecture slides and the book by Prof. Alfons Kemper, Ph.D.
Slides:
Book: https://db.in.tum.de/teaching...
-- exam results whose grade (note) is below the average over all exams
select *
from pruefen
where note < (select avg(note) from pruefen)
-- correlated sub-query: teaching load (sum of sws) per professor
select p.persnr, p.name,
       (select sum(v.sws) as lehrbelastung
        from vorlesungen v
        where v.gelesenvon = p.persnr)
from professoren p;

-- no sub-query: the same result with a left outer join
select p.persnr, p.name, sum(sws)
from professoren p
     left outer join vorlesungen v on p.persnr = v.gelesenvon
group by p.name, p.persnr;
-- sub-query in the from clause: students who attend more than two lectures
select tmp.matrnr, tmp.name, tmp.vorlanzahl
from (select s.matrnr, s.name, count(*) as vorlanzahl
      from studenten s, hoeren h
      where s.matrnr = h.matrnr
      group by s.matrnr, s.name) tmp
where tmp.vorlanzahl > 2
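Here we name the subquery's result table tmp. We can do the same thing with a WITH clause. I personally prefer WITH: it puts the temporary tables clearly at the top, and it is friendlier for debugging. Both forms return the same result, and their run time is normally the same as well (since PostgreSQL 12, a side-effect-free CTE that is referenced only once is inlined into the outer query by default).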
with tmp as (
    select s.matrnr, s.name, count(*) as vorlanzahl
    from studenten s, hoeren h
    where s.matrnr = h.matrnr
    group by s.matrnr, s.name
)
select tmp.matrnr, tmp.name, tmp.vorlanzahl
from tmp
where tmp.vorlanzahl > 2
-- share of all students that attends each lecture
select h.vorlnr, h.anzProVorl, g.gesamtAnz,
       cast(h.anzProVorl as decimal(6, 1)) / g.gesamtAnz as MarkAnteil
from (select vorlnr, count(*) as anzProVorl
      from hoeren
      group by vorlnr) as h,
     (select count(*) as gesamtAnz
      from studenten) g
-- WITH clause version
with h as (
    select vorlnr, count(*) as anzProVorl
    from hoeren
    group by vorlnr
),
g as (
    select count(*) as gesamtAnz
    from studenten
)
select h.vorlnr, h.anzProVorl, g.gesamtAnz,
       cast(h.anzProVorl as decimal(6, 1)) / g.gesamtAnz as MarkAnteil
from h, g
with kenntSich as (
    -- distinct: a student who attends several lectures of a professor counts once
    select distinct v.gelesenvon as profpersnr, h.matrnr as studmatrnr
    from hoeren h
         join vorlesungen v on h.vorlnr = v.vorlnr
),
kenntAnzahl as (
    select profpersnr, count(*) as anzstudenten
    from kenntSich
    group by profpersnr
),
wieviel as (
    select count(*) as gesamtanz
    from studenten
)
select k.profpersnr, p.name, k.anzstudenten, w.gesamtanz,
       1.00 * k.anzstudenten / w.gesamtanz as bekanntheitsgard
from kenntAnzahl k, wieviel w, professoren p
where k.profpersnr = p.persnr
order by bekanntheitsgard desc
select s.*
from studenten s
where not exists (
    select *
    from vorlesungen v
    where v.sws = 4
      and not exists (
          select *
          from hoeren h
          where h.vorlnr = v.vorlnr
            and h.matrnr = s.matrnr
      )
)
SQL-92 does not define a universal quantifier (for all), so we have to rewrite the relational calculus expression:
$$ \{s|s\in studenten \wedge \forall v \in vorlesungen (v.sws = 4 \Rightarrow \\ \exists h \in hoeren (h.vorlnr = v.vorlnr \wedge h.matrnr = s.matrnr)) \} $$
First we rewrite $\forall t \in R (P(t))$ as $\neg (\exists t \in R(\neg P(t)))$:
$$ \{s|s\in studenten \wedge \neg (\exists v \in vorlesungen \; \neg (v.sws = 4 \Rightarrow \\ \exists h \in hoeren (h.vorlnr = v.vorlnr \wedge h.matrnr = s.matrnr))) \} $$
Then we rewrite $R \Rightarrow T$ as $\neg R \vee T$:
$$ \{s|s\in studenten \wedge \neg (\exists v \in vorlesungen \; \neg (\neg (v.sws = 4) \vee \\ \exists h \in hoeren (h.vorlnr = v.vorlnr \wedge h.matrnr = s.matrnr))) \} $$
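Then we simplify with De Morgan's law:
$$ \{s|s\in studenten \wedge \neg (\exists v \in vorlesungen \; (v.sws = 4 \wedge \\ \neg (\exists h \in hoeren (h.vorlnr = v.vorlnr \wedge h.matrnr = s.matrnr)))) \} $$
In words: there is no lecture with sws = 4 that this student does not attend. This is how the relational calculus expression maps to the SQL above.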
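To see the double negation at work, here is a small self-contained sketch. The rows are made up for illustration (they are not the Uni dataset) and are supplied through VALUES lists in a WITH clause, so no write access is needed. Whether the web interface accepts VALUES lists is an assumption.

```sql
-- Toy rows, invented for illustration (not the Uni dataset).
with studenten(matrnr, name) as (
    values (1, 'Anna'), (2, 'Ben')
),
vorlesungen(vorlnr, sws) as (
    values (10, 4), (11, 4), (12, 2)
),
hoeren(matrnr, vorlnr) as (
    values (1, 10), (1, 11),   -- Anna attends both lectures with sws = 4
           (2, 10), (2, 12)    -- Ben misses lecture 11
)
select s.name
from studenten s
where not exists (
    select *
    from vorlesungen v
    where v.sws = 4
      and not exists (
          select *
          from hoeren h
          where h.vorlnr = v.vorlnr
            and h.matrnr = s.matrnr
      )
);
-- returns Anna only: for Ben, lecture 11 has sws = 4 but no matching hoeren row
```

For Ben the inner subquery finds a counterexample (lecture 11), so the outer NOT EXISTS is false and he is filtered out.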
Another solution is a trick that uses COUNT:
-- first restrict hoeren to the lectures with sws = 4: hoerenStudentenWith4SWS
with hoerenStudentenWith4SWS (matrnr, vorlnr) as (
    select h.matrnr, v.vorlnr
    from hoeren h, vorlesungen v
    where h.vorlnr = v.vorlnr
      and v.sws = 4
)
-- then check whether a student attends all lectures with sws = 4
select h.matrnr
from hoerenStudentenWith4SWS h
group by h.matrnr
having count(*) = (select count(*) from vorlesungen v where v.sws = 4)
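The counting trick matches the NOT EXISTS solution only under two conditions that are easy to overlook: hoeren must not contain duplicate (matrnr, vorlnr) rows, and at least one lecture with sws = 4 must exist. The sketch below uses made-up toy rows in which no such lecture exists; hoeren4 is a shortened, hypothetical name for the CTE above.

```sql
-- Toy rows, invented for illustration: no lecture has sws = 4.
with studenten(matrnr, name) as (
    values (1, 'Anna'), (2, 'Ben')
),
vorlesungen(vorlnr, sws) as (
    values (12, 2)
),
hoeren(matrnr, vorlnr) as (
    values (1, 12)
),
hoeren4(matrnr, vorlnr) as (
    select h.matrnr, v.vorlnr
    from hoeren h, vorlesungen v
    where h.vorlnr = v.vorlnr
      and v.sws = 4
)
select 'not exists' as variant, count(*) as num_students
from studenten s
where not exists (
    select *
    from vorlesungen v
    where v.sws = 4
      and not exists (
          select *
          from hoeren h
          where h.vorlnr = v.vorlnr
            and h.matrnr = s.matrnr
      )
)
union all
select 'count trick', count(*)
from (
    select h.matrnr
    from hoeren4 h
    group by h.matrnr
    having count(*) = (select count(*) from vorlesungen v where v.sws = 4)
) t;
-- 'not exists' reports 2 students, 'count trick' reports 0
```

With no lecture to attend, "attends all of them" is trivially true for every student, which the NOT EXISTS form reflects. The grouped form has no rows to group and therefore returns nobody.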
select s.*
from studenten s
where not exists (
    select *
    from pruefen p
    where p.matrnr = s.matrnr
      and not exists (
          select *
          from hoeren h
          where h.vorlnr = p.vorlnr
            and h.matrnr = s.matrnr
      )
)
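In words: there is no examined lecture that is missing from the student's own rows in hoeren.
In addition, this requirement is applied to each student individually: because each student took different exams, the set of lectures each one must have attended differs as well. The trick from the previous exercise therefore no longer applies. That trick only works when the same condition holds for all students rather than a separate one per student, i.e. when everyone is measured against the same list.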
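A counting approach can still be used if both counts are computed per student inside correlated subqueries. The following is a minimal sketch on made-up toy rows, assuming that pruefen and hoeren each hold at most one row per (matrnr, vorlnr):

```sql
-- Toy rows, invented for illustration (not the Uni dataset).
with studenten(matrnr, name) as (
    values (1, 'Anna'), (2, 'Ben'), (3, 'Cara')
),
pruefen(matrnr, vorlnr) as (
    values (1, 10), (2, 11)
),
hoeren(matrnr, vorlnr) as (
    values (1, 10), (2, 10)
)
select s.name
from studenten s
where (select count(*)            -- exams taken by this student
       from pruefen p
       where p.matrnr = s.matrnr)
    = (select count(*)            -- exams whose lecture the student also attended
       from pruefen p, hoeren h
       where p.matrnr = s.matrnr
         and h.matrnr = p.matrnr
         and h.vorlnr = p.vorlnr)
order by s.matrnr;
-- returns Anna and Cara: Ben was examined on lecture 11 without attending it,
-- and Cara has no exams, so both of her counts are 0
```

The difference to the earlier trick is that the right-hand count is no longer one global number but depends on s.matrnr.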
-- average semester of the students who attend a lecture by Sokrates
with vl_von_sokrates as (
    select *
    from vorlesungen v, professoren p
    where v.gelesenvon = p.persnr
      and p.name = 'Sokrates'
),
studenten_von_sokrates as (
    select distinct s.name, s.matrnr, s.semester
    from studenten s, hoeren h, vl_von_sokrates v
    where s.matrnr = h.matrnr
      and h.vorlnr = v.vorlnr
)
select avg(semester)
from studenten_von_sokrates;
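Be careful with this exercise: a student may attend several of Sokrates' lectures, but such a student must not be counted more than once. We can use DISTINCT for that.
There is also a solution that needs no DISTINCT. Instead of a join, it uses a correlated subquery with EXISTS: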
with vl_von_sokrates as (
    select *
    from vorlesungen v, professoren p
    where v.gelesenvon = p.persnr
      and p.name = 'Sokrates'
),
studenten_von_sokrates as (
    select *
    from studenten s
    where exists (
        select *
        from hoeren h, vl_von_sokrates vl
        where h.matrnr = s.matrnr
          and h.vorlnr = vl.vorlnr
    )
)
select avg(semester)
from studenten_von_sokrates;
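The effect of the duplicates is easy to check on a tiny example. The sketch below uses made-up rows and assumes that both lectures 100 and 101 are given by Sokrates, so the professoren and vorlesungen tables are left out:

```sql
-- Toy rows, invented for illustration; lectures 100 and 101 are both
-- assumed to be given by Sokrates.
with studenten(matrnr, semester) as (
    values (1, 2), (2, 10)
),
hoeren(matrnr, vorlnr) as (
    values (1, 100), (1, 101),   -- student 1 attends two of the lectures
           (2, 100)
)
select
    -- plain join: student 1 contributes two rows
    (select avg(s.semester)
     from studenten s, hoeren h
     where s.matrnr = h.matrnr) as avg_join,
    -- join plus distinct: one row per student
    (select avg(d.semester)
     from (select distinct s.matrnr, s.semester
           from studenten s, hoeren h
           where s.matrnr = h.matrnr) d) as avg_distinct,
    -- exists: each student is tested once, so no duplicates arise
    (select avg(s.semester)
     from studenten s
     where exists (select * from hoeren h where h.matrnr = s.matrnr)) as avg_exists;
-- avg_join is (2 + 2 + 10) / 3, about 4.67; avg_distinct and avg_exists are both 6
```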
-- average number of lectures attended per student
with h as (
    select count(*) as hcount
    from hoeren
),
s as (
    select count(*) as scount
    from studenten
)
select hcount / (scount * 1.00) as avg_vl
from h, s
Or
with h as (
    select count(*) as hcount
    from hoeren
),
s as (
    select count(*) as scount
    from studenten
)
select hcount / (cast(scount as decimal(10, 4))) as avg_vl
from h, s
-- pairs of different students who attend a common lecture
select s1.name, s2.name
from studenten s1, hoeren h1, hoeren h2, studenten s2
where h1.vorlnr = h2.vorlnr
  and h1.matrnr = s1.matrnr
  and h2.matrnr = s2.matrnr
  and s1.matrnr != s2.matrnr
with bekannte as (
    -- distinct: two students who share several lectures still count once
    select distinct s1.matrnr as student, s2.matrnr as sein_bekannte
    from studenten s1, hoeren h1, hoeren h2, studenten s2
    where h1.vorlnr = h2.vorlnr
      and h1.matrnr = s1.matrnr
      and h2.matrnr = s2.matrnr
      and s1.matrnr != s2.matrnr
)
select s.matrnr, s.name, count(b.sein_bekannte) as num_friends
from studenten s, bekannte b
where s.matrnr = b.student
group by s.matrnr, s.name
order by num_friends desc
with bekannte as (
    -- distinct: two students who share several lectures still count once
    select distinct s1.matrnr as student, s2.matrnr as sein_bekannte
    from studenten s1, hoeren h1, hoeren h2, studenten s2
    where h1.vorlnr = h2.vorlnr
      and h1.matrnr = s1.matrnr
      and h2.matrnr = s2.matrnr
      and s1.matrnr != s2.matrnr
)
select s.matrnr, s.name, count(b.sein_bekannte) as num_friends
from studenten s
     left outer join bekannte b on s.matrnr = b.student
group by s.matrnr, s.name
order by num_friends desc
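This version uses a LEFT OUTER JOIN. The right table bekannte b contains only students who share at least one lecture with another student (so they all appear in hoeren), whereas the left table studenten s contains all students. Students without any acquaintance therefore still appear in the result, with num_friends = 0, because count(b.sein_bekannte) ignores the NULL of the unmatched row.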
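The choice of count(b.sein_bekannte) rather than count(*) matters here. A small sketch with made-up rows, where bekannte is given directly as a VALUES list instead of being derived from hoeren:

```sql
-- Toy rows, invented for illustration; Cara has no acquaintances.
with studenten(matrnr, name) as (
    values (1, 'Anna'), (2, 'Ben'), (3, 'Cara')
),
bekannte(student, sein_bekannte) as (
    values (1, 2), (2, 1)
)
select s.name,
       count(b.sein_bekannte) as num_friends,  -- ignores the NULL of the unmatched row
       count(*)               as num_rows      -- counts the unmatched row as well
from studenten s
     left outer join bekannte b on s.matrnr = b.student
group by s.matrnr, s.name
order by s.matrnr;
-- Anna: 1, 1   Ben: 1, 1   Cara: 0, 1
```

With count(*) Cara would be reported as having one acquaintance, which is only the padded row produced by the outer join.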
-- students whose total sws is above the average sws per student
with num_stu as (
    select count(*) as count_stu
    from studenten
),
num_sws as (
    select sum(vor.sws) as count_sws
    from hoeren h, vorlesungen vor
    where h.vorlnr = vor.vorlnr
)
select s.*
from studenten s
where s.matrnr in (
    select h.matrnr
    from hoeren h, vorlesungen v
    where h.vorlnr = v.vorlnr
    group by h.matrnr
    having sum(sws) > (select cast(num_sws.count_sws as decimal(5, 2)) / num_stu.count_stu
                       from num_sws, num_stu)
)
Or
with num_stu as (
    select count(*) as count_stu
    from studenten
),
num_sws as (
    select sum(vor.sws) as count_sws
    from hoeren h, vorlesungen vor
    where h.vorlnr = vor.vorlnr
),
avg_sws as (
    select cast(num_sws.count_sws as decimal(5, 2)) / num_stu.count_stu as sws
    from num_stu, num_sws
),
stu_sws as (
    select s.matrnr, s.name, s.semester, sum(v.sws) as sum_sws
    from studenten s, hoeren h, vorlesungen v
    where s.matrnr = h.matrnr
      and h.vorlnr = v.vorlnr
    group by s.matrnr, s.name, s.semester
)
select s.*
from stu_sws s, avg_sws
where s.sum_sws > avg_sws.sws
Or
with swsProStudent as (
    -- the outer joins keep students without lectures; their sum is NULL and becomes 0
    select s.matrnr, s.name,
           cast((case when sum(v.sws) is null then 0 else sum(v.sws) end) as real) as anzSWS
    from studenten s
         left outer join hoeren h on s.matrnr = h.matrnr
         left outer join vorlesungen v on h.vorlnr = v.vorlnr
    group by s.matrnr, s.name
)
select s.*
from studenten s
where s.matrnr in (
    select sws.matrnr
    from swsProStudent sws
    where sws.anzSWS > (select avg(anzSWS) from swsProStudent)
)
-- average grade of exams taken by students with and without any attended lecture
with no_lec as (
    select avg(note) as avg_note
    from pruefen p
    where not exists (
        select * from hoeren h where h.matrnr = p.matrnr
    )
),
with_lec as (
    select avg(note) as avg_note
    from pruefen p
    where exists (
        select * from hoeren h where h.matrnr = p.matrnr
    )
)
select *
from no_lec, with_lec;
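Suppose our schema changes to the one shown in the figure above (the following SQL cannot be run, because the dataset does not match that schema):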
with anz(FakName, AnzStudenten) as (
    select s.FakName, count(*)
    from StudentenGF s
    group by s.FakName
),
anzw(FakName, AnzWeiblich) as (
    select sw.FakName, count(*) as AnzWeiblich
    from StudentenGF sw
    where sw.Geschlecht = 'W'
    group by sw.FakName
)
select anz.FakName, anz.AnzStudenten, anzw.AnzWeiblich,
       (cast(anzw.AnzWeiblich as decimal(5, 2)) / anz.AnzStudenten * 100) as ProzentWeiblich
from anz, anzw
where anz.FakName = anzw.FakName
with anz(FakName, AnzStudenten) as (
    select s.FakName, count(*)
    from StudentenGF s
    group by s.FakName
),
anzm(FakName, AnzMaenner) as (
    select sm.FakName, count(*)
    from StudentenGF sm
    where sm.Geschlecht = 'M'
    group by sm.FakName
)
select anz.FakName, anz.AnzStudenten, anzm.AnzMaenner,
       -- multiply by 100.00 first so that the division is not an integer division
       100.00 * (case when anzm.AnzMaenner is null then 0 else anzm.AnzMaenner end)
           / anz.AnzStudenten as ProzentMaenner
from anz
     left outer join anzm on anz.FakName = anzm.FakName
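This is not just the female version with the gender swapped. The key point is that a faculty may have no male students at all, which is why the query uses a left outer join and maps the resulting NULL to 0. The case expression can also be replaced by: 100.00 * COALESCE(anzm.AnzMaenner, 0) / anz.AnzStudenten as ProzentMaenner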
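Because count(*) returns an integer type in PostgreSQL, the order of the operations in such a percentage matters: dividing one integer by another truncates the result. A quick check with literal values (1 male student out of 4, numbers made up for illustration):

```sql
select 1 / 4 * 100.00                     as divide_first,    -- 1 / 4 is integer division and gives 0
       100.00 * 1 / 4                     as multiply_first,  -- numeric arithmetic, gives 25
       cast(1 as decimal(5, 2)) / 4 * 100 as cast_first;      -- numeric arithmetic, gives 25
```

Either multiplying by a numeric constant first or casting one operand avoids the truncation.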
Or yet another way:
-- share of male students per faculty, as a fraction (multiply by 100 for percent)
select FakName,
       (sum(case when Geschlecht = 'M' then 1.00 else 0.00 end)) / count(*)
from StudentenGF
group by FakName
select s.*
from StudentenGF s
where not exists (
    select *
    from vorlesungen v, professorenF p
    where v.gelesenvon = p.persnr
      and p.fakname = s.fakname
      and not exists (
          select *
          from hoeren h
          where h.vorlnr = v.vorlnr
            and h.matrnr = s.matrnr
      )
)
In words: for this student, there is no lecture given by a professor of his or her own faculty that the student has not attended.
Or
select s.*
from StudentenGF s
where (select count(*)            -- lectures offered by the student's faculty
       from vorlesungen v, professorenF p
       where v.gelesenvon = p.persnr
         and p.fakname = s.fakname)
    = (select count(*)            -- those of them that the student attends
       from hoeren h, vorlesungen v, professorenF p
       where h.matrnr = s.matrnr
         and h.vorlnr = v.vorlnr
         and p.persnr = v.gelesenvon
         and p.fakname = s.fakname)