A MongoDB Schema Analyser Built with pyMongo and wxPython

Date: 2019-11-18

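As a document-oriented NoSQL database, MongoDB does not give its collections the fixed, uniform structure of tables in a relational database. Documents in the same collection can differ widely in their fields, especially in databases whose data model was never carefully planned or standardized.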

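When you take over a new project that stores and computes on MongoDB and has no ORM or similar mapping layer, understanding the structure of its databases and the Schema of each collection is essential. MongoBooster (called NoSQLBooster for MongoDB since MongoDB 4.0), a GUI database client, is a convenient and efficient tool. It integrates the mongo shell and supports a wide range of database operations, including CRUD and status queries for databases and collections. It is very powerful and naturally includes Schema analysis. Unfortunately, that feature is open only to registered users; unregistered users can only try it on the test database...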

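The examples below analyse the Schema of the test collection in the test database on a local replica set, mongodb://localhost:27017,localhost:27019,localhost:27020. The figure below shows the Schema analysis result from MongoBooster.

[Figure: MongoBooster Schema analysis result]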

Here, in the spirit of freedom and openness, I found two alternative tools that provide MongoDB Schema Analyser functionality.

Variety.js

> https://github.com/variety/variety

A command-line tool for Schema analysis.
Command-line invocation: mongo [mongoURI] --eval "var collection = 'test'" variety.js

It is written in JavaScript and supports many parameters, but do not expect it to run fast, and it often crashes on large collections.

pyMonSchema

> https://github.com/HanseyLee/pyMonSchema

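pyMonSchema is a MongoDB Schema Analyser GUI tool built with pyMongo and wxPython. You connect to and switch between databases and collections in the interface. It supports custom query statements, sort order, and limits; a list of key names to ignore and regular expressions for ignoring key names; and analysis of nested fields. The Schema analysis uses MongoDB's MapReduce and is far faster and more stable than Variety.js.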
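
The MapReduce idea behind this kind of analysis can be sketched in plain Python. This is not pyMonSchema's actual code: it is a minimal in-memory illustration, with the hypothetical function names `map_doc` and `reduce_counts` and made-up sample documents, of emitting one (key, type) pair per field and then summing the emissions.

```python
from collections import Counter


def map_doc(doc, prefix=""):
    """Map step: yield one (dotted_key, type_name) pair per field,
    descending into nested dicts to cover embedded keys."""
    for key, value in doc.items():
        full_key = f"{prefix}{key}"
        yield full_key, type(value).__name__
        if isinstance(value, dict):
            yield from map_doc(value, prefix=f"{full_key}.")


def reduce_counts(docs):
    """Reduce step: sum the emitted pairs into occurrence counts."""
    counts = Counter()
    for doc in docs:
        counts.update(map_doc(doc))
    return counts


# Made-up documents with differing fields, as in a loosely planned collection.
sample_docs = [
    {"_id": 1, "hello": "world"},
    {"_id": 2, "hello": 42, "meta": {"lang": "en"}},
]
schema_counts = reduce_counts(sample_docs)
```

Here the type names are Python's own (`int`, `str`, `dict`); a real analysis would report BSON type names instead.
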
Usage notes for the custom fields:

- Query -> MongoDB query document used to filter the input to analyse, e.g. {"keyName": {"$in": ["key1", "key2"]}} or {"keyName": {"$exists": True}}. (Note: pyMonSchema uses "eval()" to deserialize the query string, so write 'True'/'False' for boolean values.)
- Order -> Positive/Negative, used in the sort document. order=Positive is equivalent to sort("_id", 1); order=Negative is equivalent to sort("_id", -1).
- Limit -> Int, the limit on the query result. Empty defaults to 0, which means no limit.
- Omit_keys -> Field names to omit, separated by commas, such as: keyName1, keyName2.
- Omit_patterns -> Fields matching these regular expression patterns are omitted, separated by commas, such as: ^keyNameHead, keyNameTail$.
- Embed-keys -> Whether or not to analyse embedded keys (e.g. keyNameParent.keyNameChild1.keyNameChild2).
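
How the Omit_keys and Omit_patterns inputs might be applied can be sketched as follows. The helper names `parse_csv_field` and `should_omit` are hypothetical, and the use of `re.search` (rather than a full match) is an assumption; pyMonSchema's own matching rules may differ.

```python
import re


def parse_csv_field(text):
    """Split a comma-separated input string into trimmed, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def should_omit(key, omit_keys, omit_patterns):
    """Return True if key is listed explicitly or matches any regex pattern."""
    if key in omit_keys:
        return True
    # Assumption: a pattern may match anywhere in the key unless anchored.
    return any(re.search(pattern, key) for pattern in omit_patterns)


omit_keys = parse_csv_field("keyName1, keyName2")
omit_patterns = parse_csv_field("^keyNameHead, keyNameTail$")

keys = ["keyName1", "keyNameHeadX", "xkeyNameTail", "hello"]
kept = [k for k in keys if not should_omit(k, omit_keys, omit_patterns)]
```

With the example inputs from the list above, only `hello` survives the filter.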

Note that the Query document is entered as a string; the program converts it into a Python object with Python's eval function, for example: {"keyName": {"$in": ["key1", "key2"]}} or {"keyName": {"$exists": True}}.
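
Because eval executes arbitrary expressions, a stricter parser may be preferable if you adapt this idea in your own tooling. The sketch below swaps in the standard library's `ast.literal_eval`, which accepts only Python literals; it is an alternative technique, not what pyMonSchema does, and `parse_query` is a hypothetical helper name.

```python
import ast


def parse_query(text):
    """Parse a query string written as a Python dict literal.
    Empty input is treated as 'no filter'."""
    text = text.strip()
    if not text:
        return {}
    # literal_eval accepts literals only (dicts, lists, strings, numbers,
    # True/False/None) and raises ValueError on calls or bare names.
    query = ast.literal_eval(text)
    if not isinstance(query, dict):
        raise ValueError("query must be a dict literal")
    return query


query = parse_query('{"keyName": {"$exists": True}}')
```

As with eval, booleans must be written `True`/`False`; JSON-style `true` is rejected.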

For fields of Number type, pyMonSchema further infers whether the value is Int32 or Double (by default, MongoDB also treats integers beyond the Int32 range as Double).
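
The rule described above can be illustrated with a small classifier. This is a sketch under stated assumptions, not pyMonSchema's implementation: `classify_number` is a hypothetical name, and treating an integer-valued float such as 3.0 as Int32 is an assumption.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def classify_number(value):
    """Label a numeric value as Int32 or Double, following the rule in the text:
    whole numbers inside the Int32 range are Int32, everything else is Double."""
    if isinstance(value, bool):
        raise TypeError("booleans are not treated as numbers here")
    if float(value).is_integer() and INT32_MIN <= value <= INT32_MAX:
        return "Int32"
    return "Double"


labels = [classify_number(v) for v in (42, 2**31, 3.5, -2**31)]
```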

The analysis results can also be saved to a JSON file in the following format:

[
    {
        "key": "_id",
        "total_occurrence": 15.0,
        "statics": [
            {
                "type": "ObjectId",
                "occurrence": 15.0,
                "percent": 100.0
            }
        ]
    },
    {
        "key": "hello",
        "total_occurrence": 9.0,
        "statics": [
            {
                "type": "Int32",
                "occurrence": 1.0,
                "percent": 6.666666666666667
            },
            {
                "type": "String",
                "occurrence": 8.0,
                "percent": 53.333333333333336
            }
        ]
    },
    ...
 ]
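
A saved result in this format can be read back with the standard json module. The sketch below embeds a trimmed copy of the sample above instead of reading a file, and treats the `_id` entry's total_occurrence as the document count. That is an inference from the sample (its percent values equal occurrence divided by 15), not documented behaviour.

```python
import json

sample = """
[
  {"key": "_id", "total_occurrence": 15.0,
   "statics": [{"type": "ObjectId", "occurrence": 15.0, "percent": 100.0}]},
  {"key": "hello", "total_occurrence": 9.0,
   "statics": [{"type": "Int32", "occurrence": 1.0, "percent": 6.666666666666667},
               {"type": "String", "occurrence": 8.0, "percent": 53.333333333333336}]}
]
"""

entries = json.loads(sample)

# Every document has an _id, so its total_occurrence gives the document count.
total_docs = next(e["total_occurrence"] for e in entries if e["key"] == "_id")

# For each key: share of documents containing it, plus occurrences per type.
summary = {}
for entry in entries:
    coverage = entry["total_occurrence"] / total_docs * 100
    types = {s["type"]: s["occurrence"] for s in entry["statics"]}
    summary[entry["key"]] = (round(coverage, 1), types)
```

In this sample, `hello` appears in 9 of 15 documents and is a String in 8 of them, which is the kind of inconsistency the analysis is meant to surface.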

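For batch Schema analysis across multiple databases or collections, the mongoDBM.DBManager class in pyMonSchema provides full support, and the analysis can be run with multiple processes or threads. See https://blog.csdn.net/fzlulee/article/details/85944967 or the source code on GitHub at https://github.com/HanseyLee/pyMonSchema.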
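
The general shape of a threaded batch run can be sketched with the standard library. Nothing here uses the real DBManager API, which is not shown in this post: `analyse_collection` is a hypothetical stand-in that returns a dummy result, and the database and collection names other than test are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse_collection(target):
    """Hypothetical stand-in for one Schema analysis.
    Replace the body with a real call into your analysis code."""
    db_name, coll_name = target
    return {"db": db_name, "collection": coll_name, "keys": []}


# (database, collection) pairs to analyse; the names are illustrative only.
targets = [("test", "test"), ("test", "users"), ("logs", "events")]

# Threads are a reasonable fit when each task mostly waits on the database.
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves the order of the input targets.
    results = list(pool.map(analyse_collection, targets))
```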

Note: the content above is synced from the blog of the same name, https://blog.csdn.net/fzlulee/article/details/86651664.
