How to Use MongoDB and PyMongo

Date: 2019-11-12


Prerequisites

Before we start, make sure that you have the PyMongo distribution installed. In the Python shell, the following should run without raising an exception:

>>> import pymongo

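This tutorial also assumes that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so: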

$ mongod

Making a Connection with MongoClient

The first step when working with PyMongo is to create a MongoClient for the running mongod instance. Doing so is easy:

>>> from pymongo import MongoClient
>>> client = MongoClient()

The above code will connect on the default host and port. We can also specify the host and port explicitly, as follows:

>>> client = MongoClient('localhost', 27017)

Or use the MongoDB URI format:

>>> client = MongoClient('mongodb://localhost:27017/')
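To see how the URI form carries the same host and port as the explicit arguments, the sketch below takes the URI apart with the standard library. This is only an illustration: PyMongo has its own URI parser, and urlsplit is used here purely because it needs no server or extra packages.

```python
from urllib.parse import urlsplit

# The same URI that is passed to MongoClient in this tutorial.
parts = urlsplit("mongodb://localhost:27017/")

# The scheme names the protocol; host and port match the explicit form
# MongoClient('localhost', 27017).
print(parts.scheme, parts.hostname, parts.port)  # mongodb localhost 27017
```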

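Getting a Database

A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute style access on MongoClient instances: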

>>> db = client.test_database

If your database name is such that attribute style access won't work (like test-database), you can use dictionary style access instead:

>>> db = client['test-database']

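Getting a Collection

A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same as getting a database: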

>>> collection = db.test_collection

or (using dictionary style access):

>>> collection = db['test-collection']

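An important note about collections (and databases) in MongoDB is that they are created lazily: none of the above commands have actually performed any operations on the MongoDB server. Collections and databases are created when the first document is inserted into them.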
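A footnote on the two access styles used for databases and collections: attribute style only works when the name is something Python can parse after a dot. A name like test-database would be read as a subtraction, which is why dictionary style is needed. The helper below is a rough sketch of that rule; attribute_access_works is a hypothetical name, not part of PyMongo, and it does not cover names that collide with existing attributes of the client or database object.

```python
import keyword


def attribute_access_works(name):
    """Return True if `name` can be written after a dot in Python source."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ("test_database", "test-database"):
    if attribute_access_works(name):
        print("client.%s" % name)
    else:
        print("client[%r]" % name)
# client.test_database
# client['test-database']
```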

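Documents

Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post: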

>>> import datetime
>>> post = {"author": "Mike",
...         "text": "My first blog post!",
...         "tags": ["mongodb", "python", "pymongo"],
...         "date": datetime.datetime.utcnow()}

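Note that documents can contain native Python types (like datetime.datetime instances), which will be automatically converted to and from the appropriate BSON types.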
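One caveat for readers on recent Python versions: datetime.datetime.utcnow() is deprecated as of Python 3.12. A possible alternative, shown here as a sketch rather than as the tutorial's own code, is to build the same document with a timezone-aware UTC timestamp. BSON stores datetimes as UTC milliseconds, so the stored instant is the same.

```python
import datetime

# Same blog post as in the tutorial, but with a timezone-aware UTC timestamp.
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.now(datetime.timezone.utc),
}

# The timestamp carries its UTC offset explicitly.
print(post["date"].tzinfo)
```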

Inserting a Document

To insert a document into a collection we can use the insert_one() method:

>>> posts = db.posts
>>> post_id = posts.insert_one(post).inserted_id
>>> post_id
ObjectId('...')

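When a document is inserted, a special key, "_id", is automatically added if the document doesn't already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.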

After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database:

>>> db.list_collection_names()
[u'posts']

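Getting a Single Document With `find_one()`

The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match. Here we use find_one() to get the first document from the posts collection: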

>>> import pprint
>>> pprint.pprint(posts.find_one())
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

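The result is a dictionary matching the one that we inserted previously.

Note that the returned document contains an "_id", which was automatically added on insert.

find_one() also supports querying on specific elements that the resulting document must match. To limit our results to a document with author "Mike" we do: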

>>> pprint.pprint(posts.find_one({"author": "Mike"}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

If we try with a different author, like "Eliot", we'll get no result:

>>> posts.find_one({"author": "Eliot"})

Querying By ObjectId

We can also find a post by its _id, which in our example is an ObjectId:

>>> post_id
ObjectId(...)
>>> pprint.pprint(posts.find_one({"_id": post_id}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

Note that an ObjectId is not the same as its string representation:

>>> post_id_as_str = str(post_id)
>>> posts.find_one({"_id": post_id_as_str}) # No result

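A common task in web applications is to get an ObjectId from the request URL and find the matching document. In this case it is necessary to convert the ObjectId from a string before passing it to find_one: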

from bson.objectid import ObjectId

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId:
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
    # Hand the matching document (or None) back to the caller
    return document

See also: When I query for a document by ObjectId in my web application I get no result
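Because the string comes from a URL, it may not be a valid ObjectId at all, in which case ObjectId(post_id) raises an error instead of returning a value. An ObjectId is 12 bytes, so its string form is exactly 24 hexadecimal characters. The check below is a minimal standard-library sketch of that rule. looks_like_object_id is a hypothetical helper name and the sample id is made up. The bson package that ships with PyMongo also provides ObjectId.is_valid() for the same purpose.

```python
import re

# 12 bytes rendered as hexadecimal gives exactly 24 characters.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """Return True if `value` is a 24-character hexadecimal string."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None


print(looks_like_object_id("5dca5c0f1c9d440000a1b2c3"))  # True
print(looks_like_object_id("not-an-id"))                 # False
```

A web handler could run a check like this first and answer with a "not found" response when it fails, rather than letting the conversion error propagate.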

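A Note On Unicode Strings

You probably noticed that the regular Python strings we stored earlier look different when retrieved from the server (e.g. u'Mike' instead of 'Mike'). A short explanation is in order.

MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Regular strings (<type 'str'>) are validated and stored unaltered. Unicode strings (<type 'unicode'>) are encoded to UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.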
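The types named above (<type 'str'> and <type 'unicode'>) belong to Python 2. On Python 3 every str is already a Unicode string, so the u prefix does not appear in the output. The round trip through UTF-8 that the explanation describes can be sketched with plain Python 3, without PyMongo:

```python
text = "Mike"
encoded = text.encode("utf-8")     # the UTF-8 bytes of the string
decoded = encoded.decode("utf-8")  # back to a Python 3 str

print(repr(encoded))  # b'Mike'
print(repr(decoded))  # 'Mike'

# Bytes that are not valid UTF-8 cannot be decoded into a string.
try:
    b"\xff".decode("utf-8")
    invalid_rejected = False
except UnicodeDecodeError:
    invalid_rejected = True
print(invalid_rejected)  # True
```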

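Bulk Inserts

In order to make querying a little more interesting, let's insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations by passing a list as the first argument to insert_many(). This will insert each document in the list, sending only a single command to the server: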

>>> new_posts = [{"author": "Mike",
...               "text": "Another post!",
...               "tags": ["bulk", "insert"],
...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
...              {"author": "Eliot",
...               "title": "MongoDB is fun",
...               "text": "and pretty easy too!",
...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
>>> result = posts.insert_many(new_posts)
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...')]

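There are a couple of interesting things to note about this example:

The result from insert_many() now returns two ObjectId instances, one for each inserted document.

new_posts[1] has a different "shape" than the other posts: there is no "tags" field and we've added a new field, "title". This is what we mean when we say that MongoDB is schema-free.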
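The difference in "shape" can be made concrete by comparing the keys of the two dictionaries. This sketch uses only the documents from the bulk insert example and needs no server:

```python
import datetime

new_posts = [{"author": "Mike",
              "text": "Another post!",
              "tags": ["bulk", "insert"],
              "date": datetime.datetime(2009, 11, 12, 11, 14)},
             {"author": "Eliot",
              "title": "MongoDB is fun",
              "text": "and pretty easy too!",
              "date": datetime.datetime(2009, 11, 10, 10, 45)}]

# Keys present in one document but not in the other.
differing_keys = sorted(set(new_posts[0]) ^ set(new_posts[1]))
print(differing_keys)  # ['tags', 'title']
```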

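Querying for More Than One Document

To get more than a single document as the result of a query we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents. For example, we can iterate over every document in the posts collection: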

>>> for post in posts.find():
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}

Just like we did with find_one(), we can pass a document to find() to limit the returned results. Here, we get only those documents whose author is "Mike":

>>> for post in posts.find({"author": "Mike"}):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

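Counting

If we just want to know how many documents match a query we can perform a count_documents() operation instead of a full query. We can get a count of all of the documents in a collection: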

>>> posts.count_documents({})
3

or just of those documents that match a specific query:

>>> posts.count_documents({"author": "Mike"})
2

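Range Queries

MongoDB supports many different types of advanced queries. As an example, let's perform a query where we limit results to posts older than a certain date, but also sort the results by author: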

>>> d = datetime.datetime(2009, 11, 12, 12)
>>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

Here we use the special "$lt" operator to do a range query, and also call sort() to sort the results by author.
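For readers who want to check their understanding of the filter and the sort, here is a rough in-memory analogue in plain Python. It is not how MongoDB evaluates the query. It only mirrors the result. The date of the first post is an assumption (the tutorial used the current time), and posts_in_memory is a hypothetical name.

```python
import datetime

# The tutorial's three posts. The first date is an assumed stand-in,
# since the original document was stamped with the current time.
posts_in_memory = [
    {"author": "Mike", "text": "My first blog post!",
     "date": datetime.datetime(2019, 11, 12)},  # assumed date
    {"author": "Mike", "text": "Another post!",
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot", "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)},
]

d = datetime.datetime(2009, 11, 12, 12)

# {"date": {"$lt": d}} keeps documents whose date is strictly before d.
older = [post for post in posts_in_memory if post["date"] < d]
# sort("author") orders the matches by author, ascending by default.
older.sort(key=lambda post: post["author"])

print([post["author"] for post in older])  # ['Eliot', 'Mike']
```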

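Indexing

Adding indexes can help accelerate certain queries and can also add additional functionality to querying and storing documents. In this example, we'll demonstrate how to create a unique index on a key that rejects documents whose value for that key already exists in the index.

First, we'll need to create the index: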

>>> result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
...                                   unique=True)
>>> sorted(list(db.profiles.index_information()))
[u'_id_', u'user_id_1']

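Notice that we now have two indexes: one is the index on _id that MongoDB creates automatically, and the other is the index on user_id that we just created.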

Now let's set up some user profiles:

>>> user_profiles = [
...     {'user_id': 211, 'name': 'Luke'},
...     {'user_id': 212, 'name': 'Ziltoid'}]
>>> result = db.profiles.insert_many(user_profiles)

The index prevents us from inserting a document whose user_id is already in the collection:

>>> new_profile = {'user_id': 213, 'name': 'Drew'}
>>> duplicate_profile = {'user_id': 212, 'name': 'Tommy'}
>>> result = db.profiles.insert_one(new_profile)  # This is fine.
>>> result = db.profiles.insert_one(duplicate_profile)
Traceback (most recent call last):
DuplicateKeyError: E11000 duplicate key error index: test_database.profiles.$user_id_1 dup key: { : 212 }
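The behaviour of a unique index can be modelled with a toy in-memory class. This is an analogy only, not PyMongo code: UniqueIndexSketch is a hypothetical name, and it raises ValueError where PyMongo raises pymongo.errors.DuplicateKeyError.

```python
class UniqueIndexSketch:
    """Toy stand-in for a unique index on a single key."""

    def __init__(self, key):
        self.key = key
        self._seen = set()

    def insert(self, document):
        value = document[self.key]
        if value in self._seen:
            # A real unique index rejects the write in the same situation.
            raise ValueError("duplicate key: %s=%r" % (self.key, value))
        self._seen.add(value)


index = UniqueIndexSketch("user_id")
for profile in [{"user_id": 211, "name": "Luke"},
                {"user_id": 212, "name": "Ziltoid"},
                {"user_id": 213, "name": "Drew"}]:
    index.insert(profile)  # all three values are new, so these succeed

try:
    index.insert({"user_id": 212, "name": "Tommy"})
    rejected = None
except ValueError as exc:
    rejected = str(exc)

print(rejected)  # duplicate key: user_id=212
```

In application code the usual pattern is the same try/except shape around insert_one(), catching the duplicate key error and deciding whether to report it or update the existing document.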