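Background for the example
Suppose we have a personal-information system that needs to record each person's hometown, place of residence, and the cities they have visited. The database design is as follows.
The contents of models.py: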
```python
from django.db import models


class Province(models.Model):
    name = models.CharField(max_length=10)

    def __unicode__(self):
        return self.name


class City(models.Model):
    name = models.CharField(max_length=5)
    province = models.ForeignKey(Province)

    def __unicode__(self):
        return self.name


class Person(models.Model):
    firstname = models.CharField(max_length=10)
    lastname = models.CharField(max_length=10)
    visitation = models.ManyToManyField(City, related_name="visitor")
    hometown = models.ForeignKey(City, related_name="birth")
    living = models.ForeignKey(City, related_name="citizen")

    def __unicode__(self):
        return self.firstname + self.lastname
```
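Note 1: The app is named "QSOptimize".
Note 2: To keep things simple, the `qsoptimize_province` table contains only two rows, 湖北省 (Hubei) and 廣東省 (Guangdong), and the `qsoptimize_city` table contains only three rows: 武漢市 (Wuhan), 十堰市 (Shiyan), and 廣州市 (Guangzhou).
prefetch_related()
For many-to-many fields (ManyToManyField) and one-to-many fields, prefetch_related() can be used for optimization. You might object that there is no such thing as a OneToManyField. In fact, ForeignKey is a many-to-one field, and seen from the model that the ForeignKey points to, the same relation is one-to-many.
Purpose and approach
prefetch_related() and select_related() have very similar design goals: both exist to reduce the number of SQL queries, but they achieve it differently. The latter solves the problem inside the SQL query by using a JOIN. For many-to-many relationships, however, solving it in SQL is not a good idea, because the table produced by the JOIN can become very long, which increases both the running time of the SQL statement and its memory usage. If there are n objects and the many-to-many field of the i-th object matches $M_i$ rows, the result table has $\sum_{i=1}^{n} M_i$ rows.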
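To make the row-count argument concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Django. The table names (person, city, person_visitation) are simplified stand-ins for the tables in this article, and the sample rows are made up for illustration. It shows that a single JOIN returns one row per (person, visited city) pair, that is, M_1 + ... + M_n rows in total.

```python
import sqlite3

# Simplified stand-ins for the article's tables; the data is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_visitation (person_id INTEGER, city_id INTEGER);
    INSERT INTO person VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO city VALUES (1, 'Wuhan'), (2, 'Guangzhou'), (3, 'Shiyan');
    INSERT INTO person_visitation VALUES (1, 1), (1, 2), (1, 3), (2, 2), (3, 1), (3, 3);
""")

# Number of visited cities per person: M_1, M_2, M_3.
visits_per_person = dict(conn.execute(
    "SELECT person_id, COUNT(*) FROM person_visitation GROUP BY person_id"
).fetchall())

# A single JOIN repeats each person's columns once per visited city.
joined_rows = conn.execute("""
    SELECT person.id, person.name, city.name
    FROM person
    JOIN person_visitation ON person.id = person_visitation.person_id
    JOIN city ON city.id = person_visitation.city_id
""").fetchall()

print(visits_per_person)  # M_i for each person
print(len(joined_rows))   # equals the sum of all M_i
```

With three people who visited 3, 1, and 2 cities, the JOIN yields 6 rows, and every one of them carries a copy of the person's columns.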
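prefetch_related() takes a different approach: it queries each table separately and then resolves the relationships between them in Python. Continuing with the example above, to get all the cities that 張三 (Zhang San) has visited, prefetch_related() would be used like this:

```python
>>> zhangs = Person.objects.prefetch_related('visitation').get(firstname=u"張", lastname=u"三")
>>> for city in zhangs.visitation.all():
...     print city
...
```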
The code above triggers the following SQL queries:

```sql
SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`,
`QSOptimize_person`.`lastname`, `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`
FROM `QSOptimize_person`
WHERE (`QSOptimize_person`.`lastname` = '三' AND `QSOptimize_person`.`firstname` = '張');

SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `QSOptimize_city`.`id`,
`QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE `QSOptimize_person_visitation`.`person_id` IN (1);
```
The first SQL query simply fetches Zhang San's Person object. The second one is the key: it selects the rows of the relation table `QSOptimize_person_visitation` whose `person_id` is Zhang San's, and INNER JOINs them with the `city` table (an equi-join, since the join condition is an equality) to produce the result table.

```
+----+-----------+----------+-------------+-----------+
| id | firstname | lastname | hometown_id | living_id |
+----+-----------+----------+-------------+-----------+
|  1 | 張        | 三       |           3 |         1 |
+----+-----------+----------+-------------+-----------+
1 row in set (0.00 sec)

+-----------------------+----+-----------+-------------+
| _prefetch_related_val | id | name      | province_id |
+-----------------------+----+-----------+-------------+
|                     1 |  1 | 武漢市    |           1 |
|                     1 |  2 | 廣州市    |           2 |
|                     1 |  3 | 十堰市    |           1 |
+-----------------------+----+-----------+-------------+
3 rows in set (0.00 sec)
```

Clearly, Zhang San has been to Wuhan, Guangzhou, and Shiyan.
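The Python-side step can be pictured with the following sketch. It is not Django's actual implementation; it only groups the rows of the second result set by `_prefetch_related_val` (the person id) and attaches each group to the matching person. The tuples are copied from the two result tables above, and all variable names are hypothetical.

```python
from collections import defaultdict

# Rows from the first query: (id, firstname, lastname, hometown_id, living_id).
person_rows = [(1, u"張", u"三", 3, 1)]

# Rows from the second query: (_prefetch_related_val, id, name, province_id).
city_rows = [
    (1, 1, u"武漢市", 1),
    (1, 2, u"廣州市", 2),
    (1, 3, u"十堰市", 1),
]

# Group the cities by the person id they were fetched for.
cities_by_person = defaultdict(list)
for person_id, city_id, name, province_id in city_rows:
    cities_by_person[person_id].append(
        {"id": city_id, "name": name, "province_id": province_id}
    )

# Attach each group to its person; later reads need no further SQL.
people = []
for pid, firstname, lastname, hometown_id, living_id in person_rows:
    people.append({
        "id": pid,
        "name": firstname + lastname,
        "visitation": cities_by_person[pid],
    })

visited = [city["name"] for city in people[0]["visitation"]]
print(visited)
```

The grouping key is the extra `_prefetch_related_val` column, which is why the second query selects it alongside the city columns.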
Alternatively, to get the names of all the cities in Hubei, we can do this:

```python
>>> hb = Province.objects.prefetch_related('city_set').get(name__iexact=u"湖北省")
>>> for city in hb.city_set.all():
...     city.name
...
```
The SQL queries triggered:

```sql
SELECT `QSOptimize_province`.`id`, `QSOptimize_province`.`name`
FROM `QSOptimize_province`
WHERE `QSOptimize_province`.`name` LIKE '湖北省';

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
WHERE `QSOptimize_city`.`province_id` IN (1);
```

The resulting tables:

```
+----+-----------+
| id | name      |
+----+-----------+
|  1 | 湖北省    |
+----+-----------+
1 row in set (0.00 sec)

+----+-----------+-------------+
| id | name      | province_id |
+----+-----------+-------------+
|  1 | 武漢市    |           1 |
|  3 | 十堰市    |           1 |
+----+-----------+-------------+
2 rows in set (0.00 sec)
```

As we can see, prefetch is implemented with an IN clause. As a result, when the QuerySet contains a very large number of objects, this may cause performance problems, depending on the characteristics of the database.
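As a rough illustration of why the size of the IN list matters, the sketch below builds such a query by hand with Python's sqlite3 module. The helper name `fetch_cities_for` and the simplified schema are assumptions made for this example, not Django internals. The point is only that the IN clause receives one value per parent object, so it grows linearly with the size of the QuerySet.

```python
import sqlite3

# Simplified schema with illustrative data; not the article's MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE province (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, province_id INTEGER);
    INSERT INTO province VALUES (1, 'Hubei'), (2, 'Guangdong');
    INSERT INTO city VALUES (1, 'Wuhan', 1), (2, 'Guangzhou', 2), (3, 'Shiyan', 1);
""")


def fetch_cities_for(province_ids):
    """Hypothetical helper: fetch the children of all parents in one query."""
    placeholders = ", ".join("?" for _ in province_ids)
    sql = (
        "SELECT id, name, province_id FROM city "
        "WHERE province_id IN (%s) ORDER BY id" % placeholders
    )
    return sql, conn.execute(sql, list(province_ids)).fetchall()


sql_one, rows_one = fetch_cities_for([1])
sql_two, rows_two = fetch_cities_for([1, 2])

print(sql_one)
print(rows_one)
print(sql_two.count("?"))  # one placeholder per parent object
```

With thousands of parent objects the statement would carry thousands of values, and how well that performs depends on the database in use.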
Usage
The *lookups argument
In Django < 1.7 this is the only way to use prefetch_related(). Like select_related(), prefetch_related() supports deep lookups. For example, to get the provinces visited by everyone surnamed 張 (Zhang):

```python
>>> zhangs = Person.objects.prefetch_related('visitation__province').filter(firstname__iexact=u'張')
>>> for i in zhangs:
...     for city in i.visitation.all():
...         print city.province
...
```
The SQL triggered:

```sql
SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`,
`QSOptimize_person`.`lastname`, `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`
FROM `QSOptimize_person`
WHERE `QSOptimize_person`.`firstname` LIKE '張';

SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `QSOptimize_city`.`id`,
`QSOptimize_city`.`name`, `QSOptimize_city`.`province_id` FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE `QSOptimize_person_visitation`.`person_id` IN (1, 4);

SELECT `QSOptimize_province`.`id`, `QSOptimize_province`.`name`
FROM `QSOptimize_province`
WHERE `QSOptimize_province`.`id` IN (1, 2);
```

The results:

```
+----+-----------+----------+-------------+-----------+
| id | firstname | lastname | hometown_id | living_id |
+----+-----------+----------+-------------+-----------+
|  1 | 張        | 三       |           3 |         1 |
|  4 | 張        | 六       |           2 |         2 |
+----+-----------+----------+-------------+-----------+
2 rows in set (0.00 sec)

+-----------------------+----+-----------+-------------+
| _prefetch_related_val | id | name      | province_id |
+-----------------------+----+-----------+-------------+
|                     1 |  1 | 武漢市    |           1 |
|                     1 |  2 | 廣州市    |           2 |
|                     4 |  2 | 廣州市    |           2 |
|                     1 |  3 | 十堰市    |           1 |
+-----------------------+----+-----------+-------------+
4 rows in set (0.00 sec)

+----+-----------+
| id | name      |
+----+-----------+
|  1 | 湖北省    |
|  2 | 廣東省    |
+----+-----------+
2 rows in set (0.00 sec)
```
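The level-by-level nature of a deep lookup such as 'visitation__province' can be sketched in plain Python. This is an illustration under simplifying assumptions, not Django's code: the data is copied from the three result tables above, and the variable names are hypothetical. Each level takes the ids for its IN clause from the rows fetched at the previous level.

```python
# Result rows copied from the three queries above.
people = {1: u"張三", 4: u"張六"}                 # person id -> full name
visitation = [(1, 1), (1, 2), (4, 2), (1, 3)]     # (person_id, city_id)
cities = {1: (u"武漢市", 1), 2: (u"廣州市", 2), 3: (u"十堰市", 1)}  # id -> (name, province_id)
provinces = {1: u"湖北省", 2: u"廣東省"}          # province id -> name

# Level 1 ("visitation"): the IN list holds the ids of the people fetched.
person_ids = sorted(people)

# Level 2 ("province"): the IN list holds the province ids of the cities found.
visited_city_ids = {city_id for person_id, city_id in visitation if person_id in people}
province_ids = sorted({cities[city_id][1] for city_id in visited_city_ids})

# With everything in memory, the nested loops need no further queries.
provinces_by_person = {}
for person_id, city_id in visitation:
    province_name = provinces[cities[city_id][1]]
    provinces_by_person.setdefault(people[person_id], []).append(province_name)

print(person_ids)    # [1, 4]
print(province_ids)  # [1, 2]
```

The two printed lists match the values in the IN clauses of the second and third queries.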
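It is worth noting that chained prefetch_related() calls accumulate their lookups, just as select_related() does in Django 1.7.
Be aware that when working with a QuerySet, once a chained operation changes the database query, the data previously cached by prefetch_related() is ignored. Django then has to query the database again to obtain the data, which causes performance problems. "Changing the database query" here refers to operations such as filter() and exclude() that ultimately alter the SQL. all() does not change the final database query, so it does not trigger a new query.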
For example, to get, for every person, the visited cities whose names contain the character "市", doing it this way triggers a large number of SQL queries:

```python
plist = Person.objects.prefetch_related('visitation')
[p.visitation.filter(name__icontains=u"市") for p in plist]
```

Because there are 4 people in the database, this results in 2+4 SQL queries:

```sql
SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`, `QSOptimize_person`.`lastname`,
`QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`
FROM `QSOptimize_person`;

SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `QSOptimize_city`.`id`,
`QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE `QSOptimize_person_visitation`.`person_id` IN (1, 2, 3, 4);

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE (`QSOptimize_person_visitation`.`person_id` = 1 AND `QSOptimize_city`.`name` LIKE '%市%');

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE (`QSOptimize_person_visitation`.`person_id` = 2 AND `QSOptimize_city`.`name` LIKE '%市%');

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE (`QSOptimize_person_visitation`.`person_id` = 3 AND `QSOptimize_city`.`name` LIKE '%市%');

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE (`QSOptimize_person_visitation`.`person_id` = 4 AND `QSOptimize_city`.`name` LIKE '%市%');
```
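Let's look at these queries in detail.
As is well known, a QuerySet is lazy: it only hits the database when its results are actually needed. When execution reaches the second line of Python code, the for clause iterates over plist, which triggers the database queries. The first two SQL queries are the ones caused by prefetch_related.
Although the results of those two queries already contain all the city information that is needed, the loop body applies filter() to Person.visitation, which clearly changes the database query. These calls therefore ignore the previously cached data and issue new SQL queries.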
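The behavior described above can be mimicked with a toy class. This is a sketch for intuition only, not Django's related manager: `FakeVisitation`, the query counter, and the visit data for the four people are all made up for illustration. all() is answered from the prefetched cache, while every filter() call counts as a fresh query.

```python
class FakeVisitation(object):
    """Toy stand-in for p.visitation; not Django's real related manager."""

    def __init__(self, stored_cities, stats):
        self._stored = list(stored_cities)  # rows the simulated database holds
        self._prefetched = None             # cache filled by the prefetch step
        self._stats = stats                 # shared counter of simulated SQL queries

    def all(self):
        # all() leaves the query unchanged, so the cache can answer it.
        if self._prefetched is not None:
            return self._prefetched
        self._stats["queries"] += 1
        return list(self._stored)

    def filter(self, name_contains):
        # filter() describes a different query, so the cache is bypassed.
        self._stats["queries"] += 1
        return [c for c in self._stored if name_contains in c]


stats = {"queries": 0}
# Made-up visit data for four hypothetical people.
visits = {
    1: [u"武漢市", u"廣州市", u"十堰市"],
    2: [u"武漢市"],
    3: [u"十堰市"],
    4: [u"廣州市"],
}

# Simulate the two initial queries: one for the people, one prefetch for all visits.
stats["queries"] += 2
managers = []
for person_id in sorted(visits):
    manager = FakeVisitation(visits[person_id], stats)
    manager._prefetched = list(visits[person_id])
    managers.append(manager)

for manager in managers:
    manager.all()
queries_after_all = stats["queries"]

for manager in managers:
    manager.filter(u"市")
queries_after_filter = stats["queries"]

print(queries_after_all)     # 2
print(queries_after_filter)  # 6
```

The counter stays at 2 while only all() is used and climbs to 2+4 once filter() is called for each of the four people.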
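So what should you do if you have this requirement? In Django >= 1.7, it can be handled with the Prefetch object described in the next section. If your environment is Django < 1.7, you can do this part of the work in Python:

```python
plist = Person.objects.prefetch_related('visitation')
[[city for city in p.visitation.all() if u"市" in city.name] for p in plist]
```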
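Prefetch object
In Django >= 1.7, a Prefetch object can be used to control the behavior of prefetch_related().
Note: I do not have a Django 1.7 environment installed, so this section is based on the Django documentation and has not been tested in practice.
Characteristics of the Prefetch object:
Continuing the example above, to get, among the cities each person has visited, those whose names contain "武" and those whose names contain "州":

```python
from django.db.models import Prefetch

wus = City.objects.filter(name__icontains=u"武")
zhous = City.objects.filter(name__icontains=u"州")
plist = Person.objects.prefetch_related(
    Prefetch('visitation', queryset=wus, to_attr="wu_city"),
    Prefetch('visitation', queryset=zhous, to_attr="zhou_city"),
)
[p.wu_city for p in plist]
[p.zhou_city for p in plist]
```

Note: This code has not been tested in a real environment; corrections are welcome.
Incidentally, Prefetch objects and string arguments can be mixed.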
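To picture what the two Prefetch objects in the example above ask for, here is a sketch using Python's sqlite3 module instead of Django. The helper `prefetch_filtered`, the simplified schema, and the SQL text are assumptions for illustration; the SQL that Django actually emits may differ. The idea shown is that the filter is applied once, in a single query covering all people, and the rows are then grouped per person under a separate name (the role played by to_attr).

```python
import sqlite3

# Simplified schema; the visit data mirrors persons 1 and 4 from the tables earlier.
conn = sqlite3.connect(":memory:")
conn.executescript(u"""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_visitation (person_id INTEGER, city_id INTEGER);
    INSERT INTO city VALUES (1, '武漢市'), (2, '廣州市'), (3, '十堰市');
    INSERT INTO person_visitation VALUES (1, 1), (1, 2), (1, 3), (4, 2);
""")


def prefetch_filtered(person_ids, pattern):
    """Hypothetical helper: one filtered query for all people, grouped by person."""
    placeholders = ", ".join("?" for _ in person_ids)
    rows = conn.execute(
        "SELECT pv.person_id, city.name FROM city "
        "JOIN person_visitation AS pv ON city.id = pv.city_id "
        "WHERE pv.person_id IN (%s) AND city.name LIKE ? "
        "ORDER BY pv.person_id, city.id" % placeholders,
        list(person_ids) + [pattern],
    ).fetchall()
    grouped = {pid: [] for pid in person_ids}
    for pid, name in rows:
        grouped[pid].append(name)
    return grouped


# Two filtered prefetches over the same relation, kept under different names.
wu_city = prefetch_filtered([1, 4], u"%武%")
zhou_city = prefetch_filtered([1, 4], u"%州%")
print(wu_city)
print(zhou_city)
```

Each filtered result costs one query regardless of how many people are involved, in contrast to calling filter() once per person.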
None
Passing None clears any prefetch_related lookups set earlier, like this:

```python
>>> prefetch_cleared_qset = qset.prefetch_related(None)
```
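The effect of None can be modeled with a toy class that only records lookups. `FakeQuerySet` is a hypothetical stand-in written for this sketch, not Django's QuerySet; it mirrors the documented behavior that chained calls accumulate lookups and that passing None discards them.

```python
class FakeQuerySet(object):
    """Toy stand-in that only records prefetch lookups (not Django code)."""

    def __init__(self, lookups=()):
        self.lookups = tuple(lookups)

    def prefetch_related(self, *lookups):
        if lookups == (None,):
            # Passing None drops every lookup recorded so far.
            return FakeQuerySet()
        # Otherwise the lookups from chained calls accumulate.
        return FakeQuerySet(self.lookups + lookups)


qset = FakeQuerySet().prefetch_related("visitation").prefetch_related("hometown")
prefetch_cleared_qset = qset.prefetch_related(None)

print(qset.lookups)                   # ('visitation', 'hometown')
print(prefetch_cleared_qset.lookups)  # ()
```

Each call returns a new object, so the original qset keeps its lookups while the cleared copy starts from an empty list.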
Summary