[python] MongoDB

Documentation:

http://api.mongodb.com/python/current/tutorial.html

Installation:

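Download and install it directly from the official website. Installing with brew on macOS downloaded too slowly, so I went with a manual install.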

Usage:

Start the server:

```shell
mongod                      # start the server with the default configuration
mongod --dbpath <db path>   # specify the path to the database files
```
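If you would rather start the server from a Python script, here is a minimal sketch using the standard `subprocess` module. `mongod_command` and `start_mongod` are hypothetical helper names; the only flags used are `--dbpath` (from the command above) and `--port`, and the sketch assumes `mongod` is on your `PATH`.

```python
import os
import subprocess
from typing import List, Optional


def mongod_command(dbpath: Optional[str] = None, port: Optional[int] = None) -> List[str]:
    """Build the mongod argument list (hypothetical helper); no arguments means default settings."""
    cmd = ["mongod"]
    if dbpath is not None:
        cmd += ["--dbpath", dbpath]  # directory that holds the database files
    if port is not None:
        cmd += ["--port", str(port)]  # mongod listens on 27017 by default
    return cmd


def start_mongod(dbpath: Optional[str] = None, port: Optional[int] = None) -> subprocess.Popen:
    """Launch mongod in the background (hypothetical helper); the caller must terminate it."""
    if dbpath is not None:
        # mongod refuses to start if the data directory does not exist yet
        os.makedirs(dbpath, exist_ok=True)
    return subprocess.Popen(mongod_command(dbpath, port))
```

For example, `proc = start_mongod("/tmp/mongo-data")` starts a server on a throwaway directory (the path is only an example), and `proc.terminate()` stops it again.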

Connect to the server:

```shell
mongo    # connect with the default configuration
mongo [options] [db address] [file names (ending in .js)]
```

GUI client:

https://www.robomongo.org/

Shell:

```
> help
    db.help()                    help on db methods
    db.mycoll.help()             help on collection methods
    sh.help()                    sharding helpers
    rs.help()                    replica set helpers
    help admin                   administrative help
    help connect                 connecting to a db help
    help keys                    key shortcuts
    help misc                    misc things to know
    help mr                      mapreduce

    show dbs                     show database names
    show collections             show collections in current database
    show users                   show users in current database
    show profile                 show most recent system.profile entries with time >= 1ms
    show logs                    show the accessible logger names
    show log [name]              prints out the last segment of log in memory, 'global' is default
    use <db_name>                set current database
    db.foo.find()                list objects in collection foo
    db.foo.find( { a : 1 } )     list objects in foo where a == 1
    it                           result of the last line evaluated; use to further iterate
    DBQuery.shellBatchSize = x   set default number of items to display on shell
    exit                         quit the mongo shell
```
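To make `db.foo.find( { a : 1 } )`, `it` and `DBQuery.shellBatchSize` concrete, here is a small pure-Python model over a list of dicts. It is only an illustration under simplifying assumptions, not how the server works: `find_equal` handles plain top-level equality (real MongoDB equality also matches array elements, and `{a: null}` matches missing fields), and `in_batches` mimics the shell printing 20 results at a time, with `it` showing the next batch. Both function names are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List

Document = Dict[str, Any]


def find_equal(docs: Iterable[Document], query: Document) -> Iterator[Document]:
    """Yield docs whose top-level fields equal every value in query, like find({a: 1})."""
    for doc in docs:
        if all(field in doc and doc[field] == value for field, value in query.items()):
            yield doc


def in_batches(results: Iterable[Document], batch_size: int = 20) -> Iterator[List[Document]]:
    """Group results the way the shell pages them; each batch after the first is what 'it' shows."""
    batch: List[Document] = []
    for doc in results:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With PyMongo the filter document is written the same way, e.g. `collection.find({"a": 1})`, and iterating the returned cursor fetches further batches from the server as needed.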

More help output:

DB methods:

```
> db.help()
DB methods:
    db.adminCommand(nameOrDocument) - switches to 'admin' db, and runs command [just calls db.runCommand(...)]
    db.aggregate([pipeline], {options}) - performs a collectionless aggregation on this database; returns a cursor
    db.auth(username, password)
    db.cloneDatabase(fromhost)
    db.commandHelp(name) returns the help for the command
    db.copyDatabase(fromdb, todb, fromhost)
    db.createCollection(name, {size: ..., capped: ..., max: ...})
    db.createView(name, viewOn, [{$operator: {...}}, ...], {viewOptions})
    db.createUser(userDocument)
    db.currentOp() displays currently executing operations in the db
    db.dropDatabase()
    db.eval() - deprecated
    db.fsyncLock() flush data to disk and lock server for backups
    db.fsyncUnlock() unlocks server following a db.fsyncLock()
    db.getCollection(cname) same as db['cname'] or db.cname
    db.getCollectionInfos([filter]) - returns a list that contains the names and options of the db's collections
    db.getCollectionNames()
    db.getLastError() - just returns the err msg string
    db.getLastErrorObj() - return full status object
    db.getLogComponents()
    db.getMongo() get the server connection object
    db.getMongo().setSlaveOk() allow queries on a replication slave server
    db.getName()
    db.getPrevError()
    db.getProfilingLevel() - deprecated
    db.getProfilingStatus() - returns if profiling is on and slow threshold
    db.getReplicationInfo()
    db.getSiblingDB(name) get the db at the same server as this one
    db.getWriteConcern() - returns the write concern used for any operations on this db, inherited from server object if set
    db.hostInfo() get details about the server's host
    db.isMaster() check replica primary status
    db.killOp(opid) kills the current operation in the db
    db.listCommands() lists all the db commands
    db.loadServerScripts() loads all the scripts in db.system.js
    db.logout()
    db.printCollectionStats()
    db.printReplicationInfo()
    db.printShardingStatus()
    db.printSlaveReplicationInfo()
    db.dropUser(username)
    db.repairDatabase()
    db.resetError()
    db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into {cmdObj: 1}
    db.serverStatus()
    db.setLogLevel(level,<component>)
    db.setProfilingLevel(level,slowms) 0=off 1=slow 2=all
    db.setWriteConcern(<write concern doc>) - sets the write concern for writes to the db
    db.unsetWriteConcern(<write concern doc>) - unsets the write concern for writes to the db
    db.setVerboseShell(flag) display extra information in shell output
    db.shutdownServer()
    db.stats()
    db.version() current version of the server
>
```
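The `db.runCommand(cmdObj)` entry notes that a string is turned into `{cmdObj: 1}`. PyMongo's `Database.command()` applies the same rule to a string argument (`db.command("dbstats")` sends `{"dbstats": 1}`). The hypothetical helper below just spells that rule out in Python:

```python
from typing import Any, Dict, Mapping, Union


def to_command_document(cmd_obj: Union[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Normalize a command the way db.runCommand() does (hypothetical helper)."""
    if isinstance(cmd_obj, str):
        # A bare command name such as "serverStatus" becomes {"serverStatus": 1}
        return {cmd_obj: 1}
    # A full command document is passed through; MongoDB reads the command name from
    # its first key, and Python dicts keep insertion order, so that order is preserved.
    return dict(cmd_obj)
```

Against a running server, `client.admin.command("ping")` is the PyMongo counterpart of `db.runCommand("ping")` in the shell.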
Collection methods:

```
> db.mycoll.help()
DBCollection help
    db.mycoll.find().help() - show DBCursor help
    db.mycoll.bulkWrite( operations, <optional params> ) - bulk execute write operations, optional parameters are: w, wtimeout, j
    db.mycoll.count( query = {}, <optional params> ) - count the number of documents that matches the query, optional parameters are: limit, skip, hint, maxTimeMS
    db.mycoll.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.
    db.mycoll.convertToCapped(maxBytes) - calls {convertToCapped:'mycoll', size:maxBytes}} command
    db.mycoll.createIndex(keypattern[,options])
    db.mycoll.createIndexes([keypatterns], <options>)
    db.mycoll.dataSize()
    db.mycoll.deleteOne( filter, <optional params> ) - delete first matching document, optional parameters are: w, wtimeout, j
    db.mycoll.deleteMany( filter, <optional params> ) - delete all matching documents, optional parameters are: w, wtimeout, j
    db.mycoll.distinct( key, query, <optional params> ) - e.g. db.mycoll.distinct( 'x' ), optional parameters are: maxTimeMS
    db.mycoll.drop() drop the collection
    db.mycoll.dropIndex(index) - e.g. db.mycoll.dropIndex( "indexName" ) or db.mycoll.dropIndex( { "indexKey" : 1 } )
    db.mycoll.dropIndexes()
    db.mycoll.ensureIndex(keypattern[,options]) - DEPRECATED, use createIndex() instead
    db.mycoll.explain().help() - show explain help
    db.mycoll.reIndex()
    db.mycoll.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.
                                                  e.g. db.mycoll.find( {x:77} , {name:1, x:1} )
    db.mycoll.find(...).count()
    db.mycoll.find(...).limit(n)
    db.mycoll.find(...).skip(n)
    db.mycoll.find(...).sort(...)
    db.mycoll.findOne([query], [fields], [options], [readConcern])
    db.mycoll.findOneAndDelete( filter, <optional params> ) - delete first matching document, optional parameters are: projection, sort, maxTimeMS
    db.mycoll.findOneAndReplace( filter, replacement, <optional params> ) - replace first matching document, optional parameters are: projection, sort, maxTimeMS, upsert, returnNewDocument
    db.mycoll.findOneAndUpdate( filter, update, <optional params> ) - update first matching document, optional parameters are: projection, sort, maxTimeMS, upsert, returnNewDocument
    db.mycoll.getDB() get DB object associated with collection
    db.mycoll.getPlanCache() get query plan cache associated with collection
    db.mycoll.getIndexes()
    db.mycoll.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )
    db.mycoll.insert(obj)
    db.mycoll.insertOne( obj, <optional params> ) - insert a document, optional parameters are: w, wtimeout, j
    db.mycoll.insertMany( [objects], <optional params> ) - insert multiple documents, optional parameters are: w, wtimeout, j
    db.mycoll.mapReduce( mapFunction , reduceFunction , <optional params> )
    db.mycoll.aggregate( [pipeline], <optional params> ) - performs an aggregation on a collection; returns a cursor
    db.mycoll.remove(query)
    db.mycoll.replaceOne( filter, replacement, <optional params> ) - replace the first matching document, optional parameters are: upsert, w, wtimeout, j
    db.mycoll.renameCollection( newName , <dropTarget> ) renames the collection.
    db.mycoll.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name
    db.mycoll.save(obj)
    db.mycoll.stats({scale: N, indexDetails: true/false, indexDetailsKey: <index key>, indexDetailsName: <index name>})
    db.mycoll.storageSize() - includes free space allocated to this collection
    db.mycoll.totalIndexSize() - size in bytes of all the indexes
    db.mycoll.totalSize() - storage allocated for all data and indexes
    db.mycoll.update( query, object[, upsert_bool, multi_bool] ) - instead of two flags, you can pass an object with fields: upsert, multi
    db.mycoll.updateOne( filter, update, <optional params> ) - update the first matching document, optional parameters are: upsert, w, wtimeout, j
    db.mycoll.updateMany( filter, update, <optional params> ) - update all matching documents, optional parameters are: upsert, w, wtimeout, j
    db.mycoll.validate( <full> ) - SLOW
    db.mycoll.getShardVersion() - only for use with sharding
    db.mycoll.getShardDistribution() - prints statistics about data distribution in the cluster
    db.mycoll.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function
    db.mycoll.getWriteConcern() - returns the write concern used for any operations on this collection, inherited from server/db if set
    db.mycoll.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the collection
    db.mycoll.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the collection
    db.mycoll.latencyStats() - display operation latency histograms for this collection
>
```
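The `find([query],[fields])` entry and the chained `sort`/`skip`/`limit` calls can be modeled on a plain list, which makes two rules easy to see: the server applies sort, then skip, then limit no matter which order you chain them in, and an inclusion projection like `{name:1, x:1}` still returns `_id` unless you pass `_id: 0`. The `apply_cursor` helper is hypothetical and simplified: it assumes every document has the sort fields with mutually comparable values, and it treats `limit=0` as "no limit", as MongoDB does.

```python
from operator import itemgetter
from typing import Any, Dict, Iterable, List, Optional, Tuple

Document = Dict[str, Any]


def apply_cursor(
    docs: Iterable[Document],
    fields: Optional[Dict[str, int]] = None,
    sort: Optional[List[Tuple[str, int]]] = None,
    skip: int = 0,
    limit: int = 0,
) -> List[Document]:
    """Model find(query, fields).sort(...).skip(n).limit(n) on already-matched docs."""
    result = list(docs)
    # Sort on the keys from last to first; Python's sort is stable, so the first key wins.
    for field, direction in reversed(sort or []):
        result.sort(key=itemgetter(field), reverse=direction < 0)
    result = result[skip:]
    if limit:
        result = result[:limit]
    if fields:
        # Inclusion projection: keep the listed fields, plus _id unless it is set to 0.
        keep = {name for name, flag in fields.items() if flag}
        if fields.get("_id", 1):
            keep.add("_id")
        result = [{k: v for k, v in doc.items() if k in keep} for doc in result]
    return result
```

In PyMongo the equivalent chain is `collection.find({"x": 77}, {"name": 1, "x": 1}).sort("name", 1).skip(1).limit(1)`.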
Sharding helpers:

```
> sh.help()
    sh.addShard( host )                       server:port OR setname/server:port
    sh.addShardToZone(shard,zone)             adds the shard to the zone
    sh.updateZoneKeyRange(fullName,min,max,zone)      assigns the specified range of the given collection to a zone
    sh.disableBalancing(coll)                 disable balancing on one collection
    sh.enableBalancing(coll)                  re-enable balancing on one collection
    sh.enableSharding(dbname)                 enables sharding on the database dbname
    sh.getBalancerState()                     returns whether the balancer is enabled
    sh.isBalancerRunning()                    return true if the balancer has work in progress on any mongos
    sh.moveChunk(fullName,find,to)            move the chunk where 'find' is to 'to' (name of shard)
    sh.removeShardFromZone(shard,zone)        removes the shard from zone
    sh.removeRangeFromZone(fullName,min,max)  removes the range of the given collection from any zone
    sh.shardCollection(fullName,key,unique,options)   shards the collection
    sh.splitAt(fullName,middle)               splits the chunk that middle is in at middle
    sh.splitFind(fullName,find)               splits the chunk that find is in at the median
    sh.startBalancer()                        starts the balancer so chunks are balanced automatically
    sh.status()                               prints a general overview of the cluster
    sh.stopBalancer()                         stops the balancer so chunks are not balanced automatically
    sh.disableAutoSplit()                     disable autoSplit on one collection
    sh.enableAutoSplit()                      re-enable autoSplit on one collection
    sh.getShouldAutoSplit()                   returns whether autosplit is enabled
>
```
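`sh.enableSharding(dbname)` and `sh.shardCollection(fullName,key,unique,options)` are shell wrappers around the `enableSharding` and `shardCollection` admin commands. A hedged sketch of building those command documents in Python, so they could be sent with PyMongo's `client.admin.command(...)` while connected to a `mongos`; the helper names, the database `stackoverflow` and the shard key `{"_id": "hashed"}` are illustrative assumptions.

```python
from typing import Any, Dict


def enable_sharding_command(db_name: str) -> Dict[str, Any]:
    """Command document behind sh.enableSharding(dbname) (hypothetical helper)."""
    return {"enableSharding": db_name}


def shard_collection_command(full_name: str, key: Dict[str, Any], unique: bool = False) -> Dict[str, Any]:
    """Command document behind sh.shardCollection(fullName, key, unique) (hypothetical helper)."""
    db_name, sep, coll_name = full_name.partition(".")
    if not (db_name and sep and coll_name):
        # fullName must be the namespace "<database>.<collection>"
        raise ValueError("expected '<database>.<collection>', got {!r}".format(full_name))
    return {"shardCollection": full_name, "key": key, "unique": unique}
```

Usage, against a `mongos`: `client.admin.command(enable_sharding_command("stackoverflow"))`, then `client.admin.command(shard_collection_command("stackoverflow.questions", {"_id": "hashed"}))`.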
Replica set helpers:

```
> rs.help()
    rs.status()                                { replSetGetStatus : 1 } checks repl set status
    rs.initiate()                              { replSetInitiate : null } initiates set with default settings
    rs.initiate(cfg)                           { replSetInitiate : cfg } initiates set with configuration cfg
    rs.conf()                                  get the current configuration object from local.system.replset
    rs.reconfig(cfg)                           updates the configuration of a running replica set with cfg (disconnects)
    rs.add(hostportstr)                        add a new member to the set with default attributes (disconnects)
    rs.add(membercfgobj)                       add a new member to the set with extra attributes (disconnects)
    rs.addArb(hostportstr)                     add a new member which is arbiterOnly:true (disconnects)
    rs.stepDown([stepdownSecs, catchUpSecs])   step down as primary (disconnects)
    rs.syncFrom(hostportstr)                   make a secondary sync from the given member
    rs.freeze(secs)                            make a node ineligible to become primary for the time specified
    rs.remove(hostportstr)                     remove a host from the replica set (disconnects)
    rs.slaveOk()                               allow queries on secondary nodes

    rs.printReplicationInfo()                  check oplog size and time range
    rs.printSlaveReplicationInfo()             check replica set members and replication lag
    db.isMaster()                              check who is primary

    reconfiguration helpers disconnect from the database so the shell will display
    an error, even if the command succeeds.
>
```
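`rs.initiate(cfg)` sends `{ replSetInitiate : cfg }`. A minimal `cfg` document needs the set name as `_id` and a `members` list giving each member a unique `_id` and a `host`. The sketch below builds one in Python; `replica_set_config`, the set name `rs0` and the ports are illustrative assumptions, and each `mongod` must already be running with the matching `--replSet` name.

```python
from typing import Any, Dict, List


def replica_set_config(set_name: str, hosts: List[str]) -> Dict[str, Any]:
    """Build a minimal cfg document for rs.initiate(cfg) (hypothetical helper)."""
    if not hosts:
        raise ValueError("a replica set needs at least one member")
    return {
        "_id": set_name,  # must equal the --replSet name every member was started with
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }
```

With PyMongo connected directly to one of the members, `client.admin.command("replSetInitiate", cfg)` sends the same `{replSetInitiate: cfg}` document as the shell helper.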
Administrative help:

```
> help admin
    ls([path])                      list files
    pwd()                           returns current directory
    listFiles([path])               returns file list
    hostname()                      returns name of this host
    cat(fname)                      returns contents of text file as a string
    removeFile(f)                   delete a file or directory
    load(jsfilename)                load and execute a .js file
    run(program[, args...])         spawn a program and wait for its completion
    runProgram(program[, args...])  same as run(), above
    sleep(m)                        sleep m milliseconds
    getMemInfo()                    diagnostic
>
```
Connecting to a db:

```
> help connect

Normally one specifies the server on the mongo shell command line.  Run mongo --help to see those options.
Additional connections may be opened:

    var x = new Mongo('host[:port]');
    var mydb = x.getDB('mydb');
  or
    var mydb = connect('host[:port]/mydb');

Note: the REPL prompt only auto-reports getLastError() for the shell command line connection.

>
```
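From Python, the `connect('host[:port]/mydb')` form maps onto a PyMongo `MongoClient` plus a database lookup. The sketch below parses that string (defaulting to `localhost` and port 27017) and rewrites it as a `mongodb://` URI; `parse_connect_string` and `to_mongodb_uri` are hypothetical helpers that deliberately ignore credentials, IPv6 literals and multi-host strings.

```python
from typing import Optional, Tuple

DEFAULT_HOST = "localhost"
DEFAULT_PORT = 27017


def parse_connect_string(address: str) -> Tuple[str, int, Optional[str]]:
    """Split 'host[:port]/mydb' into (host, port, db name); db name is None if absent."""
    host_port, _, db_name = address.partition("/")
    host, _, port = host_port.partition(":")
    return (host or DEFAULT_HOST), (int(port) if port else DEFAULT_PORT), (db_name or None)


def to_mongodb_uri(address: str) -> str:
    """Rewrite 'host[:port]/mydb' as a mongodb:// connection string."""
    host, port, db_name = parse_connect_string(address)
    return "mongodb://{}:{}/{}".format(host, port, db_name or "")
```

For example, `host, port, name = parse_connect_string("localhost:27017/mydb")` followed by `mydb = pymongo.MongoClient(host, port)[name]` is roughly the Python counterpart of `var mydb = connect('localhost:27017/mydb')`.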
Shortcut keys:

```
> help keys
Tab completion and command history is available at the command prompt.

Some emacs keystrokes are available too:
  Ctrl-A start of line
  Ctrl-E end of line
  Ctrl-K del to end of line

Multi-line commands
You can enter a multi line javascript expression. If parens, braces, etc. are not closed, you will see a new line
beginning with '...' characters.  Type the rest of your expression.  Press Ctrl-C to abort the data entry if you
get stuck.

>
```
Misc:

```
> help misc
    b = new BinData(subtype,base64str)  create a BSON BinData value
    b.subtype()                         the BinData subtype (0..255)
    b.length()                          length of the BinData data in bytes
    b.hex()                             the data as a hex encoded string
    b.base64()                          the data as a base 64 encoded string
    b.toString()

    b = HexData(subtype,hexstr)         create a BSON BinData value from a hex string
    b = UUID(hexstr)                    create a BSON BinData value of UUID subtype
    b = MD5(hexstr)                     create a BSON BinData value of MD5 subtype
    "hexstr"                            string, sequence of hex characters (no 0x prefix)

    o = new ObjectId()                  create a new ObjectId
    o.getTimestamp()                    return timestamp derived from first 32 bits of the OID
    o.isObjectId
    o.toString()
    o.equals(otherid)

    d = ISODate()                       like Date() but behaves more intuitively when used
    d = ISODate('YYYY-MM-DD hh:mm:ss')    without an explicit "new " prefix on construction
>
```
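Two of the entries above are easy to reproduce with the Python standard library: `o.getTimestamp()` reads the first 32 bits of the 12-byte ObjectId as seconds since the Unix epoch, and `HexData(...)` versus `b.base64()` are two text encodings of the same bytes. The helper names are hypothetical; with PyMongo installed, `bson.ObjectId(...).generation_time` returns the same timestamp.

```python
import base64
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Like o.getTimestamp(): decode the big-endian 4-byte prefix of an ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 32 bits = creation time in seconds
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def hex_to_base64(hexstr: str) -> str:
    """Like HexData(subtype, hexstr).base64(): re-encode the same bytes as base64."""
    return base64.b64encode(bytes.fromhex(hexstr)).decode("ascii")
```

For example, `objectid_timestamp("507f1f77bcf86cd799439011")` gives 2012-10-17 21:13:27 UTC, and `hex_to_base64("48656c6c6f")` gives `"SGVsbG8="`.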
MapReduce:

```
> help mr

See also http://dochub.mongodb.org/core/mapreduce

function mapf() {
  // 'this' holds current document to inspect
  emit(key, value);
}

function reducef(key,value_array) {
  return reduced_value;
}

db.mycollection.mapReduce(mapf, reducef[, options])

options
{[query : <query filter object>]
 [, sort : <sort the query.  useful for optimization>]
 [, limit : <number of objects to return from collection>]
 [, out : <output-collection name>]
 [, keeptemp: <true|false>]
 [, finalize : <finalizefunction>]
 [, scope : <object where fields go into javascript global scope >]
 [, verbose : true]}

>
```
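The map/reduce contract in the `help mr` text (the map function calls `emit(key, value)` for each document, and the reduce function folds all values emitted for one key) can be modeled in a few lines of Python. This is a local illustration with hypothetical names, not the server's execution model; it mirrors one documented detail, that MongoDB does not call reduce for a key with only one emitted value. The `tags` field in the example map function is an assumption. Note that map-reduce is deprecated as of MongoDB 5.0 in favor of the aggregation pipeline (`db.mycoll.aggregate`).

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Document = Dict[str, Any]
Emit = Callable[[Any, Any], None]


def map_reduce(
    docs: Iterable[Document],
    mapf: Callable[[Document, Emit], None],
    reducef: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Group every emitted value by key, then reduce each group."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs:
        # The shell's mapf reads the document from `this`; here it is passed in explicitly.
        mapf(doc, lambda key, value: groups[key].append(value))
    # Keys with a single emitted value are returned as-is, without calling reducef.
    return {
        key: values[0] if len(values) == 1 else reducef(key, values)
        for key, values in groups.items()
    }


def count_tags(doc: Document, emit: Emit) -> None:
    """Example map function (hypothetical): emit (tag, 1) for every tag."""
    for tag in doc.get("tags", []):
        emit(tag, 1)


def sum_counts(key: Any, values: List[Any]) -> int:
    """Example reduce function (hypothetical): add up the counts for one key."""
    return sum(values)
```

`map_reduce(docs, count_tags, sum_counts)` then counts how many documents carry each tag.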

 

Python driver:

```shell
pip install pymongo
```

Using it with Scrapy:

settings.py

```python
# Scrapy expects a dict mapping each pipeline class path to its order (lower runs first)
ITEM_PIPELINES = {
    'stack.pipelines.MongoDBPipeline': 300,
}

MONGODB_SERVER = "localhost"
MONGODB_PORT = 27017
MONGODB_DB = "stackoverflow"
MONGODB_COLLECTION = "questions"
```

pipelines.py

```python
import pymongo
from scrapy.exceptions import DropItem


class MongoDBPipeline(object):

    def __init__(self, server, port, db_name, collection_name):
        connection = pymongo.MongoClient(server, port)
        db = connection[db_name]
        self.collection = db[collection_name]

    @classmethod
    def from_crawler(cls, crawler):
        # Read the MONGODB_* values defined in settings.py
        # (scrapy.conf.settings no longer exists in current Scrapy)
        settings = crawler.settings
        return cls(
            settings.get('MONGODB_SERVER'),
            settings.getint('MONGODB_PORT'),
            settings.get('MONGODB_DB'),
            settings.get('MONGODB_COLLECTION'),
        )

    def process_item(self, item, spider):
        # Drop the item if any field has an empty value
        for field, value in dict(item).items():
            if not value:
                raise DropItem("Missing {0}!".format(field))
        # insert_one() replaces Collection.insert(), which was removed in PyMongo 4
        self.collection.insert_one(dict(item))
        # spider.logger replaces the removed scrapy.log module
        spider.logger.debug("Question added to MongoDB database!")
        return item
```

The example from the official Scrapy documentation (https://doc.scrapy.org/en/latest/topics/item-pipeline.html#write-items-to-mongodb):

pipelines.py

```python
import pymongo

class MongoPipeline(object):

    collection_name = 'scrapy_items'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.db[self.collection_name].insert_one(dict(item))
        return item
```
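This pipeline reads `MONGO_URI` and `MONGO_DATABASE` in `from_crawler()` and stores every item in the `scrapy_items` collection. A matching `settings.py` fragment might look like the following; `myproject` is a placeholder for your own project package, and 300 is an arbitrary order value (Scrapy runs pipelines from lower to higher numbers, conventionally within 0-1000).

```python
# settings.py (fragment) for the MongoPipeline above
ITEM_PIPELINES = {
    "myproject.pipelines.MongoPipeline": 300,  # placeholder module path
}

MONGO_URI = "mongodb://localhost:27017"  # default mongod address and port
MONGO_DATABASE = "stackoverflow"  # from_crawler() falls back to 'items' if unset
```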