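P.S. I originally published this article on iteye. I have since changed jobs and moved to front-end development, more or less saying goodbye to Java and Python, so I am migrating my blog posts from there to here.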
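For performance reasons, my company wanted to migrate its data from MySQL to MongoDB, so I started learning MongoDB's CRUD operations. This was my first contact with NoSQL, and it still feels a little unfamiliar.
First, a complaint: the company runs MongoDB 2.6.4, but the Java driver in use is the ancient 2.4 release. One requirement, "how to filter documents after grouping them", gave me a headache for a long time. In the shell it is easy to solve with Aggregation, but the Java driver only supports that feature from version 2.9.0 onward, and to this day I have not found a way to meet the requirement without Aggregation. All I can do is recommend that the company upgrade the driver and hope no compatibility problems follow.
MongoDB has supported the Aggregation Pipeline since version 2.2, while the Java driver only picked up the 2.2 features in version 2.9.0. The 2.9 driver was released in 2012, yet MongoDB first appeared in 2009, so MongoDB does not seem all that friendly to Java developers ←_←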
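MongoDB currently provides three commands that can perform aggregation: aggregate, mapReduce and group. For how the three compare in performance and usage, see the table "Aggregation Commands Comparison" on the official site; I will not repeat the details here.
From the official documentation I have summarized the prototypes of the three shell functions and the commands they wrap.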
Function: db.collection.group()
Prototype:
db.collection.group( { key, reduce, initial [, keyf] [, cond] [, finalize] } )
Wrapped command:
db.runCommand( {
    group: {
        ns: <namespace>,
        key: <key>,
        $reduce: <reduce function>,
        $keyf: <key function>,
        cond: <query>,
        finalize: <finalize function>
    }
} )
Function: db.collection.mapReduce()
Prototype:
db.collection.mapReduce(
    <map>,
    <reduce>,
    {
        out: <collection>,
        query: <document>,
        sort: <document>,
        limit: <number>,
        finalize: <function>,
        scope: <document>,
        jsMode: <boolean>,
        verbose: <boolean>
    }
)
Wrapped command:
db.runCommand( {
    mapReduce: <collection>,
    map: <function>,
    reduce: <function>,
    finalize: <function>,
    out: <output>,
    query: <document>,
    sort: <document>,
    limit: <number>,
    scope: <document>,
    jsMode: <boolean>,
    verbose: <boolean>
} )
Function: db.collection.aggregate()
Prototype:
db.collection.aggregate( pipeline, options )
Wrapped command:
db.runCommand( {
    aggregate: "<collection>",
    pipeline: [ <stage>, <...> ],
    explain: <boolean>,
    allowDiskUse: <boolean>,
    cursor: <document>
} )
Writing things down beats trusting memory, so let's get to know these functions and commands by actually using them.
First, prepare the SQL test data (used to verify the results and to compare the SQL statements with their NoSQL counterparts).
Create the table:
create table dogroup (
    _id int,
    name varchar(45),
    course varchar(45),
    score int,
    gender int,
    primary key(_id)
);
Insert the data:
insert into dogroup (_id, name, course, score, gender) values
    (1, 'N', 'C', 5, 0),
    (2, 'N', 'O', 4, 0),
    (3, 'A', 'C', 5, 1),
    (4, 'A', 'O', 6, 1),
    (5, 'A', 'U', 8, 1),
    (6, 'A', 'R', 8, 1),
    (7, 'A', 'S', 7, 1),
    (8, 'M', 'C', 4, 0),
    (9, 'M', 'U', 7, 0),
    (10, 'E', 'C', 7, 1);
Next, prepare the MongoDB test data.
Create the collection (the equivalent of a table in SQL; this line is optional, because MongoDB creates the collection automatically when data is inserted):
db.createCollection("dogroup")
Insert the data:
db.dogroup.insert({ _id: 1, name: "N", course: "C", score: 5, gender: 0 })
db.dogroup.insert({ _id: 2, name: "N", course: "O", score: 4, gender: 0 })
db.dogroup.insert({ _id: 3, name: "A", course: "C", score: 5, gender: 1 })
db.dogroup.insert({ _id: 4, name: "A", course: "O", score: 6, gender: 1 })
db.dogroup.insert({ _id: 5, name: "A", course: "U", score: 8, gender: 1 })
db.dogroup.insert({ _id: 6, name: "A", course: "R", score: 8, gender: 1 })
db.dogroup.insert({ _id: 7, name: "A", course: "S", score: 7, gender: 1 })
db.dogroup.insert({ _id: 8, name: "M", course: "C", score: 4, gender: 0 })
db.dogroup.insert({ _id: 9, name: "M", course: "U", score: 7, gender: 0 })
db.dogroup.insert({ _id: 10, name: "E", course: "C", score: 7, gender: 1 })
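The operations below may not make much sense as business logic; they are mainly meant to build familiarity with the commands. The Chinese aliases used in the queries are 課程名 (course name), 數量 (count) and 分數 (score).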
select course as '課程名', count(*) as '數量' from dogroup group by course;
db.dogroup.group({
    key: { course: 1 },
    initial: { count: 0 },
    reduce: function Reduce(curr, result) {
        result.count += 1;
    },
    finalize: function Finalize(out) {
        return { "課程名": out.course, "數量": out.count };
    }
});
The returned format is as follows:
{ "課程名" : "C", "數量" : 4 },
{ "課程名" : "O", "數量" : 2 },
{ "課程名" : "U", "數量" : 2 },
{ "課程名" : "R", "數量" : 1 },
{ "課程名" : "S", "數量" : 1 }
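To make the roles of key, initial, reduce and finalize easier to see, here is a rough plain-JavaScript sketch that runs without MongoDB. The helper groupSketch is hypothetical (it is not a MongoDB API); it only imitates, under my reading of the documentation, how group() folds documents into one result per key.

```javascript
// Test documents copied from the dogroup data set above.
const docs = [
  { _id: 1, name: "N", course: "C", score: 5, gender: 0 },
  { _id: 2, name: "N", course: "O", score: 4, gender: 0 },
  { _id: 3, name: "A", course: "C", score: 5, gender: 1 },
  { _id: 4, name: "A", course: "O", score: 6, gender: 1 },
  { _id: 5, name: "A", course: "U", score: 8, gender: 1 },
  { _id: 6, name: "A", course: "R", score: 8, gender: 1 },
  { _id: 7, name: "A", course: "S", score: 7, gender: 1 },
  { _id: 8, name: "M", course: "C", score: 4, gender: 0 },
  { _id: 9, name: "M", course: "U", score: 7, gender: 0 },
  { _id: 10, name: "E", course: "C", score: 7, gender: 1 }
];

// Hypothetical helper, not a MongoDB API: folds documents in the order
// key -> initial -> reduce -> finalize.
function groupSketch(documents, spec) {
  const buckets = new Map();
  for (const doc of documents) {
    // Build the key document from the fields named in spec.key.
    const keyDoc = {};
    for (const field of Object.keys(spec.key)) {
      keyDoc[field] = doc[field];
    }
    const bucketId = JSON.stringify(keyDoc);
    if (!buckets.has(bucketId)) {
      // Each new group starts from its key fields plus a fresh copy of `initial`.
      const fresh = JSON.parse(JSON.stringify(spec.initial));
      buckets.set(bucketId, Object.assign({}, keyDoc, fresh));
    }
    // reduce mutates the running result; its return value is ignored.
    spec.reduce(doc, buckets.get(bucketId));
  }
  return Array.from(buckets.values()).map(function (result) {
    if (!spec.finalize) return result;
    // finalize may return a replacement document or modify the result in place.
    return spec.finalize(result) || result;
  });
}

const courseCounts = groupSketch(docs, {
  key: { course: 1 },
  initial: { count: 0 },
  reduce: function (curr, result) { result.count += 1; },
  finalize: function (out) { return { "課程名": out.course, "數量": out.count }; }
});
console.log(courseCounts);
```

The spec object passed to groupSketch is the same one given to db.dogroup.group() above, so the sketch can be used to try out a reduce or finalize function before running it against the database.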
db.dogroup.mapReduce(
    function () {
        emit(this.course, { course: this.course, count: 1 });
    },
    function (key, values) {
        var count = 0;
        values.forEach(function (val) {
            count += val.count;
        });
        return { course: key, count: count };
    },
    {
        out: { inline: 1 },
        finalize: function (key, reduced) {
            return { "課程名": reduced.course, "數量": reduced.count };
        }
    }
)
count is initialized to 1 in the emitted value because, once MongoDB has run the map function (the first function), the reduce function (the second function) is not called at all for a key whose values array contains only one element. The emitted value therefore has to be a valid final result on its own.
The returned format is as follows:
{ "_id" : "C", "value" : { "課程名" : "C", "數量" : 4 } },
{ "_id" : "O", "value" : { "課程名" : "O", "數量" : 2 } },
{ "_id" : "R", "value" : { "課程名" : "R", "數量" : 1 } },
{ "_id" : "S", "value" : { "課程名" : "S", "數量" : 1 } },
{ "_id" : "U", "value" : { "課程名" : "U", "數量" : 2 } }
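The point about reduce being skipped for single-value keys can be illustrated with a small in-memory stand-in. This is only a sketch: mapReduceSketch and reduceCalledFor are hypothetical names, emit is passed in as a parameter instead of being a global as in the shell, and the real server may additionally call reduce several times on partial results.

```javascript
// Only the course field matters here; the order matches _id 1..10 above.
const docs = ["C", "O", "C", "O", "U", "R", "S", "C", "U", "C"].map(function (course) {
  return { course: course };
});

// Records every key for which reduce was actually invoked.
const reduceCalledFor = [];

// Hypothetical in-memory stand-in, not the MongoDB implementation.
function mapReduceSketch(documents, map, reduce) {
  const emitted = new Map();
  function emit(key, value) {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  }
  // Run map once per document, with the document bound to `this`.
  documents.forEach(function (doc) { map.call(doc, emit); });

  const results = [];
  emitted.forEach(function (values, key) {
    // A key with a single emitted value skips reduce entirely.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  });
  return results;
}

const results = mapReduceSketch(
  docs,
  function (emit) { emit(this.course, { course: this.course, count: 1 }); },
  function (key, values) {
    reduceCalledFor.push(key);
    let count = 0;
    values.forEach(function (val) { count += val.count; });
    return { course: key, count: count };
  }
);
console.log(results, reduceCalledFor);
```

Courses R and S each appear once, so reduce never sees them, and their count is correct only because map already emitted count: 1.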
db.dogroup.aggregate([
    { $group: { _id: "$course", "數量": { $sum: 1 } } }
])
The returned format is as follows:
{ "_id" : "S", "數量" : 1 }
{ "_id" : "R", "數量" : 1 }
{ "_id" : "U", "數量" : 2 }
{ "_id" : "O", "數量" : 2 }
{ "_id" : "C", "數量" : 4 }
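Of the three approaches, group produced exactly the result we wanted. mapReduce can only return its result nested inside value, and the $group stage of aggregate always returns the grouping key as _id, with no way to give the grouped field an alias in that stage. Even so, the third approach is without doubt the simplest.
These issues do not stop the application from displaying the data on the front end, but they are hard to bear for a developer with a touch of OCD. I am new to MongoDB and do not know whether the latter two offer a workable way to get the desired result; suggestions from readers are welcome.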
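One possible direction, offered as a sketch rather than something verified against the 2.6.4 setup described here: for aggregate, a $project stage placed after $group can copy _id into a field with the desired name and suppress _id itself. For mapReduce with inline output, the nesting can be removed on the client by mapping over the results array. The variable names pipeline, inlineResults and flattened are my own, and inlineResults is hard-coded to the output shown earlier so the snippet runs without a database.

```javascript
// Pipeline as plain data; in the mongo shell it would be passed to
// db.dogroup.aggregate(pipeline).
const pipeline = [
  { $group: { _id: "$course", "數量": { $sum: 1 } } },
  // Copy the grouping key into an aliased field and drop _id.
  { $project: { _id: 0, "課程名": "$_id", "數量": 1 } }
];

// Stand-in for the `results` array of an inline mapReduce call,
// hard-coded from the output shown above.
const inlineResults = [
  { _id: "C", value: { "課程名": "C", "數量": 4 } },
  { _id: "O", value: { "課程名": "O", "數量": 2 } },
  { _id: "R", value: { "課程名": "R", "數量": 1 } },
  { _id: "S", value: { "課程名": "S", "數量": 1 } },
  { _id: "U", value: { "課程名": "U", "數量": 2 } }
];

// Unwrap each value so the documents look like the group() output.
const flattened = inlineResults.map(function (r) { return r.value; });
console.log(JSON.stringify(pipeline), flattened);
```

With the $project stage in place, each output document should carry only the 課程名 and 數量 fields.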
select course, count(*) as count from dogroup group by course having count > 2;
db.dogroup.aggregate([
    { $group: { _id: "$course", count: { $sum: 1 } } },
    { $match: { count: { $gt: 2 } } }
]);
I have not yet found a way to filter the grouped results with group or mapReduce themselves; additions from readers are welcome.
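A possible workaround, assuming it is acceptable to filter on the client: in the shell, group() returns a plain array and an inline mapReduce returns its documents in a results array, so the equivalent of HAVING can be applied afterwards with an ordinary array filter. The sketch below hard-codes the grouped counts for the sample data (the variable names grouped and having are mine) so it runs without MongoDB.

```javascript
// Shape of what db.dogroup.group({ key: { course: 1 }, initial: { count: 0 },
// reduce: ... }) returns for the sample data when no finalize is given.
const grouped = [
  { course: "C", count: 4 },
  { course: "O", count: 2 },
  { course: "U", count: 2 },
  { course: "R", count: 1 },
  { course: "S", count: 1 }
];

// Client-side equivalent of "having count > 2".
const having = grouped.filter(function (g) {
  return g.count > 2;
});
console.log(having);
```

For an inline mapReduce the same idea would apply to the results array, testing r.value.count instead of g.count. The filtering happens after all groups have been transferred, so this only suits result sets small enough to hold in memory.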
select score as '分數', count(distinct(name)) as '數量' from dogroup where score > 5 group by score;
db.dogroup.group({
    key: { score: 1 },
    cond: { score: { $gt: 5 } },
    initial: { name: [] },
    reduce: function Reduce(curr, result) {
        // Add curr.name only if the result.name array does not already contain it
        if (result.name.indexOf(curr.name) === -1) {
            result.name.push(curr.name);
        }
    },
    finalize: function Finalize(out) {
        return { "分數": out.score, "數量": out.name.length };
    }
});
db.dogroup.mapReduce(
    function () {
        if (this.score > 5) {
            // Emit the same shape that reduce returns, so a single-value key
            // and a re-reduce of partial results are both handled correctly
            emit(this.score, { score: this.score, names: [this.name] });
        }
    },
    function (key, values) {
        var reduced = { score: key, names: [] };
        var seen = {}; // use the keys of a plain object to deduplicate names
        values.forEach(function (val) {
            val.names.forEach(function (name) {
                if (!seen[name]) {
                    seen[name] = 1;
                    reduced.names.push(name);
                }
            });
        });
        return reduced;
    },
    {
        out: { inline: 1 },
        finalize: function (key, reduced) {
            return { "分數": reduced.score, "數量": reduced.names.length };
        }
    }
)
db.dogroup.aggregate([
    { $match: { score: { $gt: 5 } } },
    { $group: { _id: { score: "$score", name: "$name" } } },
    { $group: { _id: { "分數": "$_id.score" }, "數量": { $sum: 1 } } }
]);
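The two consecutive $group stages are what implement count(distinct name): the first collapses the documents into distinct (score, name) pairs, and the second counts the pairs per score. A plain-JavaScript walk-through of the same three steps on the sample data may help; the variable names are mine, and this is only an illustration of the idea, not how the server executes the pipeline.

```javascript
// name and score columns of the dogroup test data, in _id order.
const docs = [
  { name: "N", score: 5 }, { name: "N", score: 4 },
  { name: "A", score: 5 }, { name: "A", score: 6 },
  { name: "A", score: 8 }, { name: "A", score: 8 },
  { name: "A", score: 7 }, { name: "M", score: 4 },
  { name: "M", score: 7 }, { name: "E", score: 7 }
];

// Step 1 ($match): keep only documents with score > 5.
const matched = docs.filter(function (d) { return d.score > 5; });

// Step 2 (first $group): collapse to distinct (score, name) pairs.
const pairs = new Set(matched.map(function (d) {
  return JSON.stringify([d.score, d.name]);
}));

// Step 3 (second $group): count the distinct pairs per score.
const distinctNames = {};
pairs.forEach(function (pair) {
  const score = JSON.parse(pair)[0];
  distinctNames[score] = (distinctNames[score] || 0) + 1;
});
console.log(distinctNames);
```

Score 8 appears twice in the data but both rows belong to A, so it counts as one distinct name, while score 7 is shared by A, M and E.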
Once you are comfortable with the methods above, most grouping scenarios should no longer be a big problem.
This diagram gives a more intuitive picture: [figure]