https://blog.csdn.net/cugbabybear/article/details/38342793
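This is the second article in this series. It covers the purpose of the prefetch_related() function, how it is implemented, and how to use it.
The first article of this series is here.
The third article is here.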
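For many-to-many fields (ManyToManyField) and one-to-many fields, you can use prefetch_related() to optimize queries. You may object that there is no such thing as a OneToManyField. In fact, a ForeignKey is a many-to-one field, and the model it points to sees the same relation from the other side as a one-to-many field (accessed through the reverse relation, such as `city_set` below).
prefetch_related() and select_related() are designed for very similar purposes: both reduce the number of SQL queries. They differ in how they do it. select_related() uses a JOIN and solves the problem inside a single SQL query.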
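For many-to-many relations, however, solving this in SQL is not very wise. The JOINed table becomes very long, which increases both the execution time of the SQL statement and memory usage. If there are $n$ objects and the many-to-many field of the $i$-th object has $M_i$ related rows, the result table will have $\sum_{i=1}^{n} M_i$ rows.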
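The row-count estimate above works out to one row per related object, summed over all $n$ objects. The back-of-the-envelope count below puts numbers on it. It uses the two people surnamed 張 from the deep-lookup example later in this article (person 1 visited 3 cities, person 4 visited 1) and the column counts of the example tables. The "cell" figure is only an illustrative proxy for result size. Django does not report it.

```python
# M_i: related cities per person, read off the join-table rows in the
# "people surnamed 張" example (person 1 -> 3 cities, person 4 -> 1 city).
m = {1: 3, 4: 1}

PERSON_COLS = 5  # id, firstname, lastname, hometown_id, living_id
CITY_COLS = 3    # id, name, province_id

# One JOIN returns one row per (person, city) pair, i.e. sum(M_i) rows,
# and every row repeats all of the person's columns.
join_rows = sum(m.values())
join_cells = join_rows * (PERSON_COLS + CITY_COLS)

# Separate queries: each person is returned once, and each city row carries
# one extra key column (the _prefetch_related_val person id).
prefetch_cells = len(m) * PERSON_COLS + sum(m.values()) * (CITY_COLS + 1)

print(join_rows, join_cells, prefetch_cells)  # 4 32 26
```

The gap is small for this tiny dataset. It widens as the base table gets wider and as each object has more related rows, because the JOIN repeats the base columns $M_i$ times.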
prefetch_related() solves this by querying each table separately and then using Python to work out the relationships between the results.
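The strategy can be sketched without a database. The snippet below is a simplified model, not Django's implementation. It stores the relation table as a Python list, with rows mirroring the 張三 (id 1) / 張六 (id 4) example later in this article, and the hypothetical `ToyDB` class counts simulated queries. The naive loop issues one relation query per person. The prefetch-style version issues a single IN-style query and then groups the rows by person id in Python.

```python
from collections import defaultdict

# Toy (person_id, city_name) rows of the M2M relation table; the persons
# themselves are assumed to be loaded already by a first query.
VISITATION = [(1, "武漢市"), (1, "廣州市"), (4, "廣州市"), (1, "十堰市")]
PERSON_IDS = [1, 4]


class ToyDB:
    """Counts simulated queries against an in-memory relation table."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def select_visits(self, person_ids):
        # Simulates: SELECT ... WHERE person_id IN (<person_ids>)
        self.queries += 1
        wanted = set(person_ids)
        return [row for row in self.rows if row[0] in wanted]


def naive(db, person_ids):
    # One relation query per person (the "N" in the N+1 pattern).
    return {pid: [city for _, city in db.select_visits([pid])] for pid in person_ids}


def prefetched(db, person_ids):
    # One relation query for everyone, then group the rows by person id.
    grouped = defaultdict(list)
    for pid, city in db.select_visits(person_ids):
        grouped[pid].append(city)
    return {pid: grouped[pid] for pid in person_ids}


naive_db, prefetch_db = ToyDB(VISITATION), ToyDB(VISITATION)
naive_result = naive(naive_db, PERSON_IDS)
prefetch_result = prefetched(prefetch_db, PERSON_IDS)
naive_queries, prefetch_queries = naive_db.queries, prefetch_db.queries
print(naive_queries, prefetch_queries)  # 2 1
```

With n persons, the naive version makes n relation queries, while the grouped version still makes one. The trade-off is that all related rows are held in memory at once.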
Continuing with the earlier example, suppose we want every city that Zhang San (張三) has visited. With prefetch_related() it looks like this:
```python
>>> zhangs = Person.objects.prefetch_related('visitation').get(firstname=u"張", lastname=u"三")
>>> for city in zhangs.visitation.all():
...     print(city)
...
```
The code above triggers the following SQL queries:
```sql
SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`, `QSOptimize_person`.`lastname`,
       `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`
FROM `QSOptimize_person`
WHERE (`QSOptimize_person`.`lastname` = '三' AND `QSOptimize_person`.`firstname` = '張');

SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`,
       `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation`
        ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE `QSOptimize_person_visitation`.`person_id` IN (1);
```
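The first SQL query only fetches Zhang San's Person object. The second is the important one. It selects the rows of the relation table `QSOptimize_person_visitation` whose `person_id` is Zhang San's id and INNER JOINs them with the `city` table (an equi-join on the city id) to produce the result table: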
```text
+----+-----------+----------+-------------+-----------+
| id | firstname | lastname | hometown_id | living_id |
+----+-----------+----------+-------------+-----------+
|  1 | 張        | 三       |           3 |         1 |
+----+-----------+----------+-------------+-----------+
1 row in set (0.00 sec)

+-----------------------+----+--------+-------------+
| _prefetch_related_val | id | name   | province_id |
+-----------------------+----+--------+-------------+
|                     1 |  1 | 武漢市 |           1 |
|                     1 |  2 | 廣州市 |           2 |
|                     1 |  3 | 十堰市 |           1 |
+-----------------------+----+--------+-------------+
3 rows in set (0.00 sec)
```
So Zhang San has been to Wuhan, Guangzhou, and Shiyan.
Alternatively, to get the names of all cities in Hubei province, we can do this:
```python
>>> hb = Province.objects.prefetch_related('city_set').get(name__iexact=u"湖北省")
>>> for city in hb.city_set.all():
...     city.name
...
```
The SQL queries it triggers:
```sql
SELECT `QSOptimize_province`.`id`, `QSOptimize_province`.`name`
FROM `QSOptimize_province`
WHERE `QSOptimize_province`.`name` LIKE '湖北省';

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
WHERE `QSOptimize_city`.`province_id` IN (1);
```
The resulting tables:
```text
+----+--------+
| id | name   |
+----+--------+
|  1 | 湖北省 |
+----+--------+
1 row in set (0.00 sec)

+----+--------+-------------+
| id | name   | province_id |
+----+--------+-------------+
|  1 | 武漢市 |           1 |
|  3 | 十堰市 |           1 |
+----+--------+-------------+
2 rows in set (0.00 sec)
```
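As you can see, prefetching is implemented with an IN clause. When the QuerySet contains a very large number of objects, the IN list grows with it, which may cause performance problems depending on the characteristics of the database.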
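One common way to handle a very large IN list is to split the ids into fixed-size batches and run one query per batch. Some databases also cap the number of bound parameters per statement (SQLite is an example). The helpers below are a hypothetical application-level sketch: the names `chunked` and `in_clause` and the batch size are illustrative, and the `%s` placeholder style is the one used by MySQL drivers.

```python
def chunked(ids, size):
    """Yield successive batches of at most `size` ids (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def in_clause(column, batch):
    """Build a parameterized IN clause and the list of values to bind."""
    placeholders = ", ".join(["%s"] * len(batch))
    return "%s IN (%s)" % (column, placeholders), list(batch)


ids = list(range(1, 8))
batches = list(chunked(ids, 3))                  # [[1, 2, 3], [4, 5, 6], [7]]
sql, params = in_clause("person_id", batches[0])
print(sql, params)                               # person_id IN (%s, %s, %s) [1, 2, 3]
```

The rows from every batch would then be grouped in Python, exactly as for a single query. Batching trades a single huge statement for a few smaller round trips.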
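In Django < 1.7, this is the only way to use prefetch_related(). Like select_related(), prefetch_related() also supports multi-level lookups. For example, to get the provinces visited by everyone surnamed 張 (stored in the `firstname` field in this schema):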
```python
>>> zhangs = Person.objects.prefetch_related('visitation__province').filter(firstname__iexact=u'張')
>>> for i in zhangs:
...     for city in i.visitation.all():
...         print(city.province)
...
```
The SQL it triggers:
```sql
SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`, `QSOptimize_person`.`lastname`,
       `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`
FROM `QSOptimize_person`
WHERE `QSOptimize_person`.`firstname` LIKE '張';

SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`,
       `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation`
        ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE `QSOptimize_person_visitation`.`person_id` IN (1, 4);

SELECT `QSOptimize_province`.`id`, `QSOptimize_province`.`name`
FROM `QSOptimize_province`
WHERE `QSOptimize_province`.`id` IN (1, 2);
```
The results:
```text
+----+-----------+----------+-------------+-----------+
| id | firstname | lastname | hometown_id | living_id |
+----+-----------+----------+-------------+-----------+
|  1 | 張        | 三       |           3 |         1 |
|  4 | 張        | 六       |           2 |         2 |
+----+-----------+----------+-------------+-----------+
2 rows in set (0.00 sec)

+-----------------------+----+--------+-------------+
| _prefetch_related_val | id | name   | province_id |
+-----------------------+----+--------+-------------+
|                     1 |  1 | 武漢市 |           1 |
|                     1 |  2 | 廣州市 |           2 |
|                     4 |  2 | 廣州市 |           2 |
|                     1 |  3 | 十堰市 |           1 |
+-----------------------+----+--------+-------------+
4 rows in set (0.00 sec)

+----+--------+
| id | name   |
+----+--------+
|  1 | 湖北省 |
|  2 | 廣東省 |
+----+--------+
2 rows in set (0.00 sec)
```
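The third query's `IN (1, 2)` comes straight from the second result set. The province ids are collected from the fetched city rows and de-duplicated before the next level is queried. In the sketch below, each tuple copies a row of the second table above (`_prefetch_related_val, id, name, province_id`). 廣州市 appears twice because two people visited it, yet it contributes only one province id.

```python
# Rows of the second query above: (_prefetch_related_val, id, name, province_id).
city_rows = [
    (1, 1, "武漢市", 1),
    (1, 2, "廣州市", 2),
    (4, 2, "廣州市", 2),
    (1, 3, "十堰市", 1),
]

# Next level: collect the distinct foreign-key values for the IN list.
province_ids = sorted({province_id for _, _, _, province_id in city_rows})

# In this example each level of the lookup costs one query, so
# 'visitation__province' needs 1 (persons) + 2 (one per level) queries.
levels = "visitation__province".split("__")
total_queries = 1 + len(levels)

print(province_ids, total_queries)  # [1, 2] 3
```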
It is worth noting that chained prefetch_related() calls accumulate their lookups, just as chained select_related() calls do as of Django 1.7.
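Note that when working with a QuerySet, once an operation in the chain changes the database query, the data previously cached by prefetch_related() is ignored.
Django then queries the database again for the corresponding data, which can cause performance problems. "Changing the database query" here means operations such as filter() and exclude() that end up changing the SQL. all() does not change the final query, so it does not trigger another database hit.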
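For example, suppose we want, for each person, the visited cities whose names contain 「市」. Doing it this way causes a large number of SQL queries: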
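```python
plist = Person.objects.prefetch_related('visitation')
# filter() builds a new query per person, so the prefetched cache is not used.
# list() forces each filtered QuerySet to run (printing it in the shell does the same).
[list(p.visitation.filter(name__icontains=u"市")) for p in plist]
```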
Since there are 4 people in the database, this results in 2 + 4 SQL queries:
```sql
SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`, `QSOptimize_person`.`lastname`,
       `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`
FROM `QSOptimize_person`;

SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`,
       `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation`
        ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE `QSOptimize_person_visitation`.`person_id` IN (1, 2, 3, 4);

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation`
        ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE (`QSOptimize_person_visitation`.`person_id` = 1 AND `QSOptimize_city`.`name` LIKE '%市%');

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation`
        ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE (`QSOptimize_person_visitation`.`person_id` = 2 AND `QSOptimize_city`.`name` LIKE '%市%');

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation`
        ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE (`QSOptimize_person_visitation`.`person_id` = 3 AND `QSOptimize_city`.`name` LIKE '%市%');

SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
FROM `QSOptimize_city`
INNER JOIN `QSOptimize_person_visitation`
        ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
WHERE (`QSOptimize_person_visitation`.`person_id` = 4 AND `QSOptimize_city`.`name` LIKE '%市%');
```
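Let's analyze these queries in detail.
QuerySets are lazy: they hit the database only when the data is actually needed. When the second line of Python runs, the comprehension iterates over plist, and that triggers the database query. The first two SQL queries come from prefetch_related().
Those results already contain all the city information we need. However, the loop body calls filter() on each person's visitation manager, which clearly changes the database query. The previously cached data is therefore ignored, and the database is queried again for every person.
So what can you do when you have a requirement like this? In Django >= 1.7, you can use the Prefetch object described in the next section. If your environment is Django < 1.7, you can do this part of the work in Python: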
```python
plist = Person.objects.prefetch_related('visitation')
# all() reuses the prefetched cache; the filtering happens in Python.
[[city for city in p.visitation.all() if u"市" in city.name] for p in plist]
```
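In Django >= 1.7, you can use Prefetch objects to control the behavior of prefetch_related().
Note: I do not have a Django 1.7 environment installed, so this section is based on the Django documentation and has not been tested.
A Prefetch object takes the same lookup path as a string argument, plus two optional parameters. `queryset` customizes the query used for the prefetch. `to_attr` stores the prefetched results as a list on the named attribute instead of in the related manager's cache.
Continuing the example above, to get, for each person, the visited cities whose names contain 「武」 and, separately, those containing 「州」: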
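```python
from django.db.models import Prefetch

# Assumes the Person and City models from this series are importable here.
wus = City.objects.filter(name__icontains=u"武")
zhous = City.objects.filter(name__icontains=u"州")
plist = Person.objects.prefetch_related(
    Prefetch('visitation', queryset=wus, to_attr="wu_city"),
    Prefetch('visitation', queryset=zhous, to_attr="zhou_city"),
)
# With to_attr, each result is a plain list stored on the Person instance.
[p.wu_city for p in plist]
[p.zhou_city for p in plist]
```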
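The result of `to_attr` can be pictured in plain Python: each Person ends up with ordinary lists on the named attributes, filled from the rows its Prefetch query returned. The sketch below emulates this with `setattr` on a hypothetical `ToyPerson` class, using the cities visited by persons 1 and 4 in the earlier deep-lookup example. It shows the shape of the result only, not how Django builds it.

```python
class ToyPerson:
    """Hypothetical stand-in for a Person instance."""

    def __init__(self, pk, cities):
        self.pk = pk
        self.cities = cities  # every city this person visited


def attach(people, to_attr, keep):
    """Store the cities matching `keep` as a list on attribute `to_attr`."""
    for person in people:
        setattr(person, to_attr, [c for c in person.cities if keep(c)])


people = [ToyPerson(1, ["武漢市", "廣州市", "十堰市"]), ToyPerson(4, ["廣州市"])]
attach(people, "wu_city", lambda name: "武" in name)
attach(people, "zhou_city", lambda name: "州" in name)

print([p.wu_city for p in people])    # [['武漢市'], []]
print([p.zhou_city for p in people])  # [['廣州市'], ['廣州市']]
```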
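Note: this code has not been tested in a real environment. If anything is wrong, please point it out.
Incidentally, Prefetch objects and string arguments can be mixed in the same call.
You can clear any previously set prefetch_related() lookups by passing None, like this: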
```python
>>> prefetch_cleared_qset = qset.prefetch_related(None)
```
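Both behaviors covered in this article can be modeled with a tiny hypothetical class: chained calls accumulate lookups, and passing None clears them. `ToyQuerySet` is illustrative only and keeps nothing but the list of lookups. Like a real QuerySet, it returns a new object from each call instead of mutating itself.

```python
class ToyQuerySet:
    """Hypothetical model of how prefetch lookups accumulate across calls."""

    def __init__(self, lookups=()):
        self.lookups = tuple(lookups)

    def prefetch_related(self, *lookups):
        if lookups == (None,):
            return ToyQuerySet()                     # None clears everything so far
        return ToyQuerySet(self.lookups + lookups)   # chained calls add up


qs = ToyQuerySet().prefetch_related("visitation").prefetch_related("hometown")
cleared = qs.prefetch_related(None)
print(qs.lookups, cleared.lookups)  # ('visitation', 'hometown') ()
```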