Using Django QuerySets Effectively

Date: 2019-12-11


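Object-relational mapping (ORM) makes it simpler to interact with SQL databases, but it also has a reputation for being inefficient and slower than raw SQL.

Using an ORM effectively means understanding at least a little about how it queries the database. In this article I focus on how to use the Django ORM effectively when accessing medium to large datasets.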

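Django querysets are lazy

A Django queryset represents a number of rows in the database, optionally filtered by a query. For example, the following code represents all people in the database whose first name is 'Dave':

    person_set = Person.objects.filter(first_name="Dave")

The code above does not run any database query. You can take person_set, apply further filters to it, or pass it to a function, and none of this is sent to the database. This is a good thing, because database queries are one of the factors that most affect web application performance.

To actually fetch data from the database, you need to iterate over the queryset:

    for person in person_set:
        print(person.last_name)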
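To make the laziness concrete without a Django project, here is a minimal sketch of the idea in plain Python. The `LazyRows` class, its `fetch_count` attribute, and the sample data are all hypothetical and are not how Django implements querysets; the sketch only shows that building and filtering do no work, while iteration does.

```python
class LazyRows:
    """Toy stand-in for a lazy queryset (hypothetical, not Django's code)."""

    def __init__(self, rows, predicates=()):
        self._rows = rows              # pretend this list is the database table
        self._predicates = predicates  # filters collected so far
        self.fetch_count = 0           # how often the "database" was read

    def filter(self, **conditions):
        # Return a new lazy object; no data is read here.
        def matches(row):
            return all(row.get(key) == value for key, value in conditions.items())
        return LazyRows(self._rows, self._predicates + (matches,))

    def __iter__(self):
        # Only iteration reads the data and applies the filters.
        self.fetch_count += 1
        for row in self._rows:
            if all(predicate(row) for predicate in self._predicates):
                yield row


people = [
    {"first_name": "Dave", "last_name": "Smith"},
    {"first_name": "Anna", "last_name": "Jones"},
    {"first_name": "Dave", "last_name": "Jones"},
]

person_set = LazyRows(people).filter(first_name="Dave")
narrowed = person_set.filter(last_name="Smith")  # still nothing fetched

last_names = [person["last_name"] for person in person_set]
```

After this runs, `person_set.fetch_count` is 1 because it was iterated once, while `narrowed.fetch_count` is still 0 because it was only built, never iterated.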

Django querysets have a cache

When you iterate over a queryset, all matching rows are fetched from the database and converted into Django models. This is called evaluation. The models are stored in the queryset's built-in cache, so iterating over the same queryset again does not run the same query a second time.

For example, the following code executes only one database query:

    pet_set = Pet.objects.filter(species="Dog")

    # The query is executed and cached.
    for pet in pet_set:
        print(pet.first_name)

    # The cache is used for subsequent iteration.
    for pet in pet_set:
        print(pet.last_name)
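The caching behaviour can be imitated with the standard library's `sqlite3` module. This is a rough sketch under stated assumptions: `CachedQuery`, `queries_run`, and the `pet` table are invented for illustration and do not mirror Django's internals.

```python
import sqlite3


class CachedQuery:
    """Toy model of a result cache (hypothetical, not Django's code)."""

    def __init__(self, connection, sql, params=()):
        self._connection = connection
        self._sql = sql
        self._params = params
        self._result_cache = None  # filled by the first iteration
        self.queries_run = 0       # counts real database queries

    def __iter__(self):
        if self._result_cache is None:
            # First iteration: hit the database and remember the rows.
            self.queries_run += 1
            cursor = self._connection.execute(self._sql, self._params)
            self._result_cache = cursor.fetchall()
        # Later iterations are served from memory.
        return iter(self._result_cache)


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE pet (name TEXT, species TEXT)")
connection.executemany(
    "INSERT INTO pet VALUES (?, ?)",
    [("Rex", "Dog"), ("Tom", "Cat"), ("Fido", "Dog")],
)

pet_set = CachedQuery(
    connection,
    "SELECT name FROM pet WHERE species = ? ORDER BY name",
    ("Dog",),
)
first_pass = [name for (name,) in pet_set]   # runs the query
second_pass = [name for (name,) in pet_set]  # reuses the cached rows
```

Both passes see the same rows, yet `pet_set.queries_run` stays at 1. The price is that every matching row is held in memory for as long as `pet_set` is alive.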

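if statements trigger queryset evaluation

The most useful thing about the queryset cache is that it lets you efficiently test whether the queryset contains any rows, and then iterate over them only if it does:

    restaurant_set = Restaurant.objects.filter(cuisine="Indian")

    # The `if` statement evaluates the queryset.
    if restaurant_set:
        # The cache is used for the subsequent iteration.
        for restaurant in restaurant_set:
            print(restaurant.name)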

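The queryset cache is a problem if you don't need all the results

Sometimes you only want to know whether at least one row exists, without iterating over the results. In that case, a plain if statement still evaluates the entire queryset and populates its cache, even though the rows are never used.

    city_set = City.objects.filter(name="Cambridge")

    # The `if` statement evaluates the queryset.
    if city_set:
        # We don't need the results, but the ORM still fetched every row!
        print("At least one city called Cambridge still stands!")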

To avoid this, use the exists() method to check whether at least one matching row was found:

    tree_set = Tree.objects.filter(type="deciduous")

    # The `exists()` check avoids populating the queryset cache.
    if tree_set.exists():
        # No rows were fetched from the database, saving bandwidth and memory.
        print("There are still hardwood trees in the world!")
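The saving comes from asking the database a cheaper question. As an illustration only, the sketch below compares the two approaches with `sqlite3`; the table, the function names, and the `SELECT 1 ... LIMIT 1` statement are assumptions chosen to show the idea, and the exact SQL Django emits for exists() depends on the backend and version.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tree (name TEXT, type TEXT)")
connection.executemany(
    "INSERT INTO tree VALUES (?, ?)",
    [(f"tree-{i}", "deciduous") for i in range(1000)],
)


def found_by_fetching_everything(conn, tree_type):
    # Comparable to `if tree_set:`, where every matching row is transferred.
    rows = conn.execute(
        "SELECT name, type FROM tree WHERE type = ?", (tree_type,)
    ).fetchall()
    return bool(rows), len(rows)


def found_by_limit_one(conn, tree_type):
    # Comparable in spirit to an existence check: at most one tiny row.
    rows = conn.execute(
        "SELECT 1 FROM tree WHERE type = ? LIMIT 1", (tree_type,)
    ).fetchall()
    return bool(rows), len(rows)


found_all, transferred_all = found_by_fetching_everything(connection, "deciduous")
found_one, transferred_one = found_by_limit_one(connection, "deciduous")
found_missing, _ = found_by_limit_one(connection, "evergreen")
```

Both functions give the same yes/no answer, but the first transfers 1000 rows and the second transfers one.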

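The queryset cache is a problem if the queryset is huge

When you are dealing with thousands of rows, loading them all into memory at once is wasteful. Worse, a huge queryset can lock up server processes and bring your application to the brink of crashing.

To avoid populating the queryset cache while still iterating over all the results, use the iterator() method to fetch the data in chunks and throw rows away once they have been processed.

    star_set = Star.objects.all()

    # `iterator()` fetches only a small batch of rows at a time, saving memory.
    for star in star_set.iterator():
        print(star.name)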

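Of course, using iterator() to avoid populating the cache means that iterating over the same queryset again executes the query again. So be careful with iterator(), and make sure your code is not organised in a way that repeatedly evaluates the same huge queryset.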
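The chunked fetching that iterator() relies on can be sketched with a plain database cursor. The helper `iterate_in_chunks` and the `star` table are hypothetical; in recent Django versions iterator() accepts a `chunk_size` argument that plays a similar role, but this is not its implementation.

```python
import sqlite3


def iterate_in_chunks(cursor, chunk_size=100):
    """Yield rows one by one, holding at most `chunk_size` rows in memory."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        yield from chunk


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE star (name TEXT)")
connection.executemany(
    "INSERT INTO star VALUES (?)",
    [(f"star-{i}",) for i in range(250)],
)

cursor = connection.execute("SELECT name FROM star ORDER BY rowid")
star_rows = iterate_in_chunks(cursor, chunk_size=100)

first_name = next(star_rows)[0]            # pulls in the first chunk only
remaining = sum(1 for _ in star_rows)      # streams the rest, chunk by chunk
second_pass = list(star_rows)              # nothing left: no cache was kept
```

Because no rows are retained, the second pass finds nothing to reuse. Seeing the rows again would take another query, which is the trade-off described above.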

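if statements are a problem if the queryset is huge

As shown earlier, the queryset cache is very effective for combining an if statement with a for statement, allowing a conditional loop over a queryset. For huge querysets, however, relying on the queryset cache is not an option.

The simplest solution is to combine exists() with iterator(), avoiding the queryset cache at the cost of two database queries.

    molecule_set = Molecule.objects.all()

    # One database query to test if any rows exist.
    if molecule_set.exists():
        # Another database query to start fetching the rows in batches.
        for molecule in molecule_set.iterator():
            print(molecule.velocity)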

A more complex solution is to use Python's advanced iteration tools to peek at the first item from iterator() before deciding whether to run the loop.

    from itertools import chain

    atom_set = Atom.objects.all()

    # One database query to start fetching the rows in batches.
    atom_iterator = atom_set.iterator()

    # Peek at the first item in the iterator.
    try:
        first_atom = next(atom_iterator)
    except StopIteration:
        # No rows were found, so do nothing.
        pass
    else:
        # At least one row was found, so iterate over all the rows,
        # including the first one. Chain the rest of atom_iterator,
        # not atom_set: chaining atom_set would evaluate the whole
        # queryset again and repeat the first row.
        for atom in chain([first_atom], atom_iterator):
            print(atom.mass)
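The peek-then-chain pattern does not depend on Django, so it can be wrapped in a small reusable helper. In this sketch, `nonempty_or_none` and `fake_rows` are hypothetical names, and `fake_rows` merely stands in for something like `atom_set.iterator()`.

```python
from itertools import chain


def nonempty_or_none(iterable):
    """Return an iterator over all items, or None if there are no items."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return None
    # Put the consumed first item back in front of the remaining ones.
    return chain([first], iterator)


def fake_rows(values, log):
    # Stand-in for a streaming database iterator; logs each row it produces.
    for value in values:
        log.append(value)
        yield value


produced = []
atoms = nonempty_or_none(fake_rows([1.008, 4.003, 6.94], produced))
produced_after_peek = list(produced)  # only the first row has been pulled

masses = list(atoms) if atoms is not None else []
no_atoms = nonempty_or_none(fake_rows([], []))
```

The peek consumes exactly one row, the loop then sees every row once, and an empty source is reported as `None` so the caller can skip the loop entirely.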

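Beware of naive optimisation

The queryset cache exists to reduce the number of database queries your application makes, and under normal usage it ensures that the database is queried only when necessary.

Using exists() and iterator() lets you optimise your application's memory usage. However, because they do not populate the queryset cache, they can lead to extra database queries.

So code carefully, and if things start to slow down, look at where the bottlenecks in your code are and whether a small queryset optimisation might help.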
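One way to look for such bottlenecks is to count the queries a piece of code actually sends. The sketch below does this with the standard library's `sqlite3.Connection.set_trace_callback`; the `city` table and the `fetch_names` helper are made up for the example. In a Django project, `django.db.connection.queries` (populated when DEBUG is on) may serve a similar purpose.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE city (name TEXT)")
connection.executemany(
    "INSERT INTO city VALUES (?)", [("Cambridge",), ("Oxford",)]
)
connection.commit()

executed = []
# Record every SQL statement the connection runs from now on.
connection.set_trace_callback(executed.append)


def fetch_names(conn):
    return [name for (name,) in conn.execute("SELECT name FROM city ORDER BY name")]


def count_selects(statements):
    return sum(1 for sql in statements if sql.startswith("SELECT"))


# Two passes with no cache: the query runs twice.
fetch_names(connection)
fetch_names(connection)
uncached_queries = count_selects(executed)

# Two passes over a saved result: the query runs once.
executed.clear()
names = fetch_names(connection)
for _ in range(2):
    list(names)
cached_queries = count_selects(executed)

connection.set_trace_callback(None)  # stop tracing
```

Comparing the two counts shows whether a cache-avoiding change saved memory at the cost of extra queries.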

English original: Using Django querysets effectively
