A Powerful BI Analysis Tool for MongoDB - PostgreSQL FDW (MongoDB Connector for BI)

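Background

MongoDB is a document database that has risen rapidly in recent years. It is widely used in areas that do not need transactions but do need development flexibility and elastic scalability.

As enterprises' demand for data mining grows, users may want to mine the data stored in MongoDB. MongoDB's query syntax is fairly limited, however, and cannot meet those needs.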

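PostgreSQL is an open-source database that originated at the University of California, Berkeley. It has more than 20 years of history, is known for its stability and rich feature set, and is often called "the Oracle of the open-source world".

It has a great many users across industries in China and abroad, such as Ping An Bank, Postal Savings Bank of China, China Mobile, Qunar, AutoNavi (Gaode), Cainiao, NASA, and the Russian State Duma.

PostgreSQL 9.6 added CPU-based parallel computation (parallel query). For mixed OLTP+OLAP workloads under 20 TB, PostgreSQL is a very good choice.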

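PostgreSQL's FDW (foreign data wrapper) feature lets it connect to virtually any data source and use external data as if it were local.

MongoDB Connector for BI is a product derived from PostgreSQL FDW. It gives MongoDB users a rich SQL interface.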

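Besides MongoDB, PostgreSQL FDW can connect to almost every kind of data source; the figure does not list them all.
[Figure: data sources that PostgreSQL FDW can connect to]
For more on FDW, see
http://wiki.postgresql.org/wiki/Fdw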

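This article explains how to use the MongoDB BI Connector from a MongoDB user's point of view.

Deploying MongoDB Connector for BI

Because downloading the mongodb-bi package from within China is very slow, I have not verified the whole procedure myself. This walkthrough takes a document found on the Internet as its blueprint and fleshes out the steps.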

OS environment

[root@mongobihost raj]# lsb_release -a
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.5 (Santiago)
Release:        6.5
[root@mongobihost raj]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)

Python version

[root@mongobihost raj]# which python
/usr/bin/python
[root@mongobihost raj]# python -V
Python 2.6.6

Download mongodb-bi-1.1.3-1-centos6-rpms.tar.bz2 and extract it.
The archive contains PostgreSQL, the FDW interface, and the tools that convert a MongoDB schema to SQL, among other things.

[root@mongobihost bin]# cd /tmp/
[root@mongobihost tmp]# ls -ltr
mongodb-bi-schematools-1.1.3-1.el6.x86_64.rpm
mongodb-bi-libs-1.1.3-1.el6.x86_64.rpm
mongodb-bi-1.1.3-1.el6.x86_64.rpm
mongodb-bi-server-1.1.3-1.el6.x86_64.rpm      -- PostgreSQL server
mongodb-bi-contrib-1.1.3-1.el6.x86_64.rpm     -- PostgreSQL contrib
mongodb-bi-devel-1.1.3-1.el6.x86_64.rpm       -- PostgreSQL include
mongodb-bi-multicorn-1.1.3-1.el6.x86_64.rpm   -- PostgreSQL Python FDW development interface
mongodb-bi-pymongo-1.1.3-1.x86_64.rpm
mongodb-bi-fdw-1.1.3-1.noarch.rpm             -- PostgreSQL mongofdw based on multicorn
mongodb-bi-1.1.3-1-centos6-rpms.tar.bz2

Install these RPMs

[root@mongobihost tmp]# rpm -ivh *.rpm --nodeps
Preparing... ########################################### [100%]
package mongodb-bi-libs-1.1.3-1.el6.x86_64 is already installed
package mongodb-bi-1.1.3-1.el6.x86_64 is already installed
package mongodb-bi-devel-1.1.3-1.el6.x86_64 is already installed
package mongodb-bi-server-1.1.3-1.el6.x86_64 is already installed
package mongodb-bi-contrib-1.1.3-1.el6.x86_64 is already installed
package mongodb-bi-schematools-1.1.3-1.el6.x86_64 is already installed
package mongodb-bi-pymongo-1.1.3-1.x86_64 is already installed
package mongodb-bi-fdw-1.1.3-1.noarch is already installed

Install mongodb-bi-multicorn

[root@mongobihost tmp]# rpm -ivh mongodb-bi-multicorn-1.1.3-1.el6.x86_64 --nodeps
error: open of mongodb-bi-multicorn-1.1.3-1.el6.x86_64 failed: No such file or directory
[root@mongobihost tmp]# rpm -ivh mongodb-bi-multicorn-1.1.3-1.el6.x86_64.rpm --nodeps
Preparing... ########################################### [100%]
1:mongodb-bi-multicorn ########################################### [100%]

After installation, check that Python's collections module works

NOTE: The Python version should be greater than 2.6, so upgrade it first and then install the RPMs.
One way to check is to start a Python shell and confirm that the "collections" module includes "OrderedDict". For example:
python
Python 2.6.6 (r266:84292, Sep 4 2013, 07:46:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import collections
>>> od = collections.OrderedDict()
>>> od
OrderedDict()
Press Ctrl+D to exit.
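The same check can be scripted instead of typed interactively. The following is a minimal sketch that uses only the standard library; the function name has_ordered_dict is illustrative and is not part of the connector. collections.OrderedDict entered the standard library in Python 2.7, so a stock 2.6 interpreter would be expected to fail this check.

```python
import sys


def has_ordered_dict():
    """Return True if collections.OrderedDict exists and keeps insertion order."""
    try:
        from collections import OrderedDict
    except ImportError:
        return False
    od = OrderedDict()
    od["first"] = 1
    od["second"] = 2
    return list(od.keys()) == ["first", "second"]


if __name__ == "__main__":
    # Report the interpreter version next to the result of the check.
    print("Python %d.%d" % sys.version_info[:2])
    print("OrderedDict available: %s" % has_ordered_dict())
```

Run it with the same interpreter that /usr/bin/python points to, since that is the one checked in the session above.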

Check the local MongoDB

mongo ${HOST}:${PORT}/admin -u mongoadmin -p $password
MongoDB shell version: 3.2.4
connecting to: mongobihost:27017/admin
Server has startup warnings:
2016-04-01T16:49:54.454-0700 I CONTROL [initandlisten]
MongoDB Enterprise set01:PRIMARY> show dbs
admin        0.000GB
rajdb        1.210GB
abcdeconfig  0.015GB
abcdb        0.166GB
jiradb       0.026GB
local        1.199GB
exit;

Create the MongoDB BI user
This corresponds to defining a foreign server and a user mapping in PostgreSQL with CREATE SERVER and CREATE USER MAPPING FOR (pointing at the MongoDB URL you supply).
See https://docs.mongodb.com/bi-connector/reference/mongobiuser/#bin.mongobiuser

[root@mongobihost bin]# mongobiuser create biuser mongodb://biuser:test@mongobihost.myhost.com:27017/admin
or
[root@mongobihost bin]# mongobiuser create biuser mongodb://mongobihost.myblog.com:27017/admin
Enter password:
2016-06-17T12:12:15.403-0700 creating user biuser
2016-06-17T12:12:15.408-0700 creating database buses
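To make the mapping to plain PostgreSQL concrete, here is a sketch that prints what a CREATE SERVER / CREATE USER MAPPING pair could look like with Multicorn. Only the general statement shapes are standard. The server name mongo_srv, the wrapper class example_fdw.MongoWrapper, and the url option are hypothetical placeholders, not what mongobiuser actually generates.

```python
def quote_ident(name):
    """Quote a SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def fdw_setup_sql(server, wrapper_class, pg_user, mongo_url):
    """Render the two statements that define a foreign server and a user mapping.

    wrapper_class and the 'url' option are hypothetical: the real class and
    option names used by the BI Connector are not given in this article.
    """
    return [
        "CREATE SERVER %s FOREIGN DATA WRAPPER multicorn OPTIONS (wrapper %s);"
        % (quote_ident(server), quote_literal(wrapper_class)),
        "CREATE USER MAPPING FOR %s SERVER %s OPTIONS (url %s);"
        % (quote_ident(pg_user), quote_ident(server), quote_literal(mongo_url)),
    ]


if __name__ == "__main__":
    for statement in fdw_setup_sql(
        "mongo_srv",                  # hypothetical server name
        "example_fdw.MongoWrapper",   # hypothetical Multicorn wrapper class
        "biuser",
        "mongodb://mongobihost.myhost.com:27017/admin",
    ):
        print(statement)
```

The sketch only builds text and does not connect to anything, so it is safe to run on any machine.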

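Check that PostgreSQL is running
The MongoDB BI Connector changes some PostgreSQL defaults. For example, the port is changed to 27032; you can of course change it yourself.
The output below shows PostgreSQL listening on a Unix socket for port 27032. If you need it to listen on a TCP/IP address, edit postgresql.conf and restart the database.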

[root@mongobihost bin]# netstat -an|grep PG
Active Internet connections (servers and established)
Proto RefCnt Flags       Type       State         I-Node  Path
unix  2      [ ACC ]     STREAM     LISTENING     1262987 /tmp/.s.PGSQL.27032
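The socket file name encodes the port, so the same check can be done without netstat. A minimal sketch, assuming the socket directory is /tmp as in the output above; the helper names are illustrative.

```python
import os
import stat


def pg_socket_path(port, socket_dir="/tmp"):
    """Build the Unix socket path PostgreSQL uses for a given port."""
    return os.path.join(socket_dir, ".s.PGSQL.%d" % port)


def socket_present(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    path = pg_socket_path(27032)
    print("%s present: %s" % (path, socket_present(path)))
```

A present socket file only shows that a server created it; connecting with psql remains the real test.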

Find the location of the PostgreSQL configuration file
(Running rpm -ql mongodb-bi-server is actually a better way.)

[root@mongobihost tmp]# find / -name postgresql.conf
/var/lib/pgsql/9.4/data/postgresql.conf

Change the listen address to all interfaces, so that your BI software can reach PostgreSQL over the network

vi /var/lib/pgsql/9.4/data/postgresql.conf
listen_addresses = '0.0.0.0'

Configure PostgreSQL's pg_hba.conf to allow every source IP to access this PostgreSQL instance

[root@mongobihost bin]# vi /var/lib/pgsql/9.4/data/pg_hba.conf
#** Add below content :
# IPv4 local connections:
host all all 0.0.0.0/0 md5
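The address column of that pg_hba.conf line is a CIDR range, and 0.0.0.0/0 matches every IPv4 client. The sketch below (Python 3, standard ipaddress module) shows the matching rule and how a narrower range would behave. The 192.168.1.0/24 range and the client addresses are made-up examples, not values from this deployment.

```python
import ipaddress


def hba_allows(cidr, client_ip):
    """Return True if client_ip falls inside the pg_hba.conf address range."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    # Compare the wide-open range with a hypothetical office subnet.
    for cidr in ("0.0.0.0/0", "192.168.1.0/24"):
        for client in ("192.168.1.20", "203.0.113.7"):
            print("%-16s %-14s %s" % (cidr, client, hba_allows(cidr, client)))
```

If only the BI hosts need access, replacing 0.0.0.0/0 with their subnet limits who can even attempt md5 authentication.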

Restart PostgreSQL

pg_ctl restart -m fast -D /var/lib/pgsql/9.4/data 

Use mongodrdl to export the collections that will take part in BI analysis as DDL for creating PostgreSQL foreign tables

mongodrdl -d rajdb -o rajdb.drdl -h mongobihost:27017 -u mongoadmin -p $password --authenticationDatabase admin
Note: 27017 is mongo port
2016-06-17T14:20:15.546-0700 Table "employee", column "sfg.sfgsf" has no types: ignoring column.
2016-06-17T14:20:15.546-0700 Table "employee", column "fgfs.gsdfgf" has no types: ignoring column.
2016-06-17T14:20:15.546-0700 Table "employee", column "fgsf.sgfgs" has no types: ignoring column.
2016-06-17T14:20:15.546-0700 Table "employee", column "sgss.srgs" has no types: ignoring column.
2016-06-17T14:20:16.123-0700 Table "emp_Pack_flat", column "rtgs.comments" has no types: ignoring column.
2016-06-17T14:20:16.972-0700 Table "customer_transaction", column "FValues" is an array that has no types: ignoring column.
2016-06-17T14:20:16.973-0700 Table "customer_transaction_Notes", column "Notes.enumValues" is an array that has no types: ignoring column.
2016-06-17T14:20:16.973-0700 Table "customer_transaction_SiteValues", column "F1z_v.fields.SiteAbbr.enumValues" is an array that has no types: ignoring column.
2016-06-17T14:20:16.973-0700 Table "customer_transaction_URL", column "URL.enumValues" is an array that has no types: ignoring column.
2016-06-17T14:20:16.974-0700 Table "customer_transaction_active", column "F1z_v.fields.active.enumValues" is an array that has no types: ignoring column.
2016-06-17T14:20:16.974-0700 Table "customer_transaction_active", column "colCur.enumValues" is an array that has no types: ignoring column.
2016-06-17T14:20:16.974-0700 Table "customer_transaction_active", column "colDiff.enumValues" is an array that has no types: ignoring column.

Use mongobischema to import the DDL into PostgreSQL

# To import data into BI schema
[root@mongobihost bin]# mongobischema import biuser ./rajdb.drdl
Enter password:
2016-06-17T14:55:02.541-0700 creating table employee
2016-06-17T14:55:02.572-0700 creating table emp_Pac_fla
2016-06-17T14:55:02.579-0700 creating table customer_transaction
2016-06-17T14:55:02.588-0700 creating table customer_transaction_Notes
2016-06-17T14:55:02.597-0700 creating table customer_transaction_SiteVa
2016-06-17T14:55:02.606-0700 creating table customer_transaction_URL
2016-06-17T14:55:02.614-0700 creating table customer_transaction_active
# to look at the tables in the BI schema, run below stmt.

Check the imported foreign tables

[root@mongobihost]# mongobischema list biuser
Enter password:
employee
customer_transaction
customer_transaction_Notes
customer_transaction_SiteVa
customer_transaction_URL
customer_transaction_active
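The export and import steps above are the pair to repeat whenever collections are added, so they lend themselves to a small wrapper. The sketch below only assembles the two command lines, using the flags that appear in this walkthrough. The function name refresh_commands is illustrative, and whether a repeated import overwrites existing tables has not been verified here.

```python
def refresh_commands(database, mongo_host, mongo_user, mongo_password, bi_user,
                     auth_db="admin"):
    """Return (export_cmd, import_cmd) as argument lists for subprocess.

    The flags mirror the mongodrdl and mongobischema invocations shown above;
    no other options are assumed.
    """
    drdl_file = "%s.drdl" % database
    export_cmd = [
        "mongodrdl", "-d", database, "-o", drdl_file,
        "-h", mongo_host, "-u", mongo_user, "-p", mongo_password,
        "--authenticationDatabase", auth_db,
    ]
    # mongobischema prompts for the BI user's password interactively.
    import_cmd = ["mongobischema", "import", bi_user, drdl_file]
    return export_cmd, import_cmd


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for cmd in refresh_commands("rajdb", "mongobihost:27017",
                                "mongoadmin", "<password>", "biuser"):
        print(" ".join(cmd))
```

To execute them for real, each list can be passed to subprocess.check_call, which stops at the first failing command.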

How to restart PostgreSQL (you can also use pg_ctl directly)

If you need to restart the BI Connector, then
sudo service postgresql-9.4 stop
sudo service postgresql-9.4 start
or
pg_ctl restart -m fast -D /var/lib/pgsql/9.4/data

List the BI users (you can also look them up directly with SQL or views in PostgreSQL)

# mongobiuser list 

Check that connecting to PostgreSQL works

To check that things are okay on the PostgreSQL side:
psql -h localhost -p 27032 -U biuser
Password for user biuser:
psql (9.4.5 MongoDB BI Connector 1.1.3)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
biuser=> \d
                      List of relations
 Schema |            Name             |     Type      | Owner
--------+-----------------------------+---------------+--------
 public | customer_transaction        | view          | biuser
 public | customer_transaction_Notes  | foreign table | biuser
 public | customer_transaction_SiteVa | view          | biuser
biuser=> select * from "customer_transaction" limit 1;

Now you can connect your BI software to PostgreSQL and analyze the data stored in MongoDB.
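Most BI tools ask for a connection string rather than separate psql flags. The sketch below builds the libpq-style URI and the PostgreSQL JDBC URL for the settings used in this walkthrough (port 27032, user biuser). It assumes the database is named after the user, which matches the biuser=> prompt above; the function names are illustrative.

```python
from urllib.parse import quote


def libpq_uri(user, host="localhost", port=27032, dbname=None):
    """Build a libpq connection URI; the database defaults to the user name."""
    dbname = dbname or user
    return "postgresql://%s@%s:%d/%s" % (
        quote(user, safe=""), host, port, quote(dbname, safe=""))


def jdbc_url(user, host="localhost", port=27032, dbname=None):
    """Build the equivalent PostgreSQL JDBC URL (credentials are passed separately)."""
    dbname = dbname or user
    return "jdbc:postgresql://%s:%d/%s" % (host, port, quote(dbname, safe=""))


if __name__ == "__main__":
    print(libpq_uri("biuser"))
    print(jdbc_url("biuser"))
```

The libpq URI can also be handed straight to psql as its database argument, which makes it easy to test before configuring the BI tool.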

Summary

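  • Creating an FDW is not really this much work; a few SQL statements are enough. MongoDB wrapped the operations in commands simply because MongoDB users may not be familiar with PostgreSQL, which makes the connector easier for them to use.
  • If you later add collections that need to be analyzed, just repeat the export and import steps.
  • In the MongoDB BI Connector, PostgreSQL's role is to extend MongoDB with SQL capabilities. It stores no data; all the data stays in MongoDB. If a more complex computation cannot be pushed down to MongoDB, the data is pulled into PostgreSQL and computed locally (this is automatic and transparent to the user). If the data volume is very large, however (for example, every analysis has to pull more than a hundred GB), the time spent moving data over the network will drag performance down.
  • If the data volume is very large, it is better to export the MongoDB data into PostgreSQL or Greenplum and analyze it there directly. That will be more efficient.
  • Greenplum is an MPP OLAP product based on PostgreSQL. It has a very good reputation in the OLAP field and a large user base in China and abroad.
  • Its users span the Internet, finance, logistics, government, and other major sectors, and the largest clusters exceed 1000 segments.
    For pure OLAP workloads of 20 TB to 1 PB, Greenplum is the better choice.
  • For how to import MongoDB data into PostgreSQL or Greenplum, see https://yq.aliyun.com/articles/31632 , or import it directly with SQL: create table local_table (table structure); insert into local_table select * from foreign_table;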
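The direct SQL import in the last bullet can be generated per table. The sketch below renders the two statements for a given foreign table. As an assumption that differs from the bullet, it copies the column definitions with CREATE TABLE ... (LIKE ...) instead of spelling out the table structure by hand, and the local_ prefix is a made-up naming convention. Identifiers are double-quoted because some of the imported names, such as customer_transaction_Notes, are mixed case.

```python
def quote_ident(name):
    """Quote a SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_to_local_sql(foreign_table, local_table=None):
    """Render statements that materialize a foreign table into a local one."""
    if local_table is None:
        local_table = "local_" + foreign_table  # hypothetical naming convention
    src = quote_ident(foreign_table)
    dst = quote_ident(local_table)
    return [
        # Assumption: LIKE copies the column names and types from the source.
        "CREATE TABLE %s (LIKE %s);" % (dst, src),
        "INSERT INTO %s SELECT * FROM %s;" % (dst, src),
    ]


if __name__ == "__main__":
    for statement in copy_to_local_sql("customer_transaction_Notes"):
        print(statement)
```

The INSERT still pulls every row across the network once, so this pays off when the same data is analyzed repeatedly.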
