BSON Specification: A Brief Translation

Date: 2019-12-07


Original English source: http://bsonspec.org/#/specification

Version 1.0

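BSON is a binary format in which zero or more key/value pairs are stored as a single entity. We call this entity a document.

The following grammar specifies version 1.0 of the BSON standard. It is written in a pseudo-BNF syntax. Valid BSON data is represented by the document non-terminal.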

Basic Types

The following basic types are used as terminals in the rest of the grammar. Each type must be serialized in little-endian format.

byte    1 byte (8 bits)
int32   4 bytes (32-bit signed integer)
int64   8 bytes (64-bit signed integer)
double  8 bytes (64-bit IEEE 754 floating point)
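
As a rough illustration of the little-endian rule above, the sketch below stores and reads an int32 one byte at a time, so the result does not depend on the host's byte order. The helper names put_int32_le and get_int32_le are made up for this example and do not come from any BSON library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from any BSON library): store a 32-bit signed
   integer least significant byte first, whatever the host byte order. */
void put_int32_le(uint8_t out[4], int32_t value)
{
    uint32_t u;

    assert(out != NULL);
    memcpy(&u, &value, sizeof u);          /* reinterpret the bits as unsigned */
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(u >> (8 * i));  /* byte 0 holds the low 8 bits */
}

/* Hypothetical helper: read the value back from four little-endian bytes. */
int32_t get_int32_le(const uint8_t in[4])
{
    uint32_t u = 0;
    int32_t value;

    assert(in != NULL);
    for (int i = 0; i < 4; i++)
        u |= (uint32_t)in[i] << (8 * i);
    memcpy(&value, &u, sizeof value);      /* back to the signed view */
    return value;
}
```

With these helpers, 1986 is stored as the bytes C2 07 00 00, the same sequence that appears in the array example later in this document.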

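Non-terminals

The following specifies the rest of the BSON grammar. Note that quoted strings represent terminals and should be interpreted with C semantics (e.g. "\x01" represents the byte 0000 0001). Also note that we use the * operator as shorthand for repetition (e.g. ("\x01"*2) is "\x01\x01"). When used as a unary operator, * means that the repetition can occur 0 or more times.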

document ::= int32 e_list "\x00"              BSON document
e_list   ::= element e_list | ""              Sequence of elements
element  ::= "\x01" e_name double             Floating point
           | "\x02" e_name string             UTF-8 string
           | "\x03" e_name document           Embedded document
           | "\x04" e_name document           Array
           | "\x05" e_name binary             Binary data
           | "\x06" e_name                    Undefined
           | "\x07" e_name (byte*12)          ObjectId
           | "\x08" e_name "\x00"             Boolean "false"
           | "\x08" e_name "\x01"             Boolean "true"
           | "\x09" e_name int64              UTC datetime
           | "\x0A" e_name                    Null value
           | "\x0B" e_name cstring cstring    Regular expression
           | "\x0C" e_name string (byte*12)   DBPointer (deprecated)
           | "\x0D" e_name string             JavaScript code
           | "\x0E" e_name string             Symbol
           | "\x0F" e_name code_w_s           JavaScript code w/ scope
           | "\x10" e_name int32              32-bit integer
           | "\x11" e_name int64              Timestamp
           | "\x12" e_name int64              64-bit integer
           | "\xFF" e_name                    Min key
           | "\x7F" e_name                    Max key
e_name   ::= cstring                          Key name
string   ::= int32 (byte*) "\x00"             String
cstring  ::= (byte*) "\x00"                   CString
binary   ::= int32 subtype (byte*)            Binary
subtype  ::= "\x00"                           Binary / Generic
           | "\x01"                           Function
           | "\x02"                           Binary (Old)
           | "\x03"                           UUID
           | "\x05"                           MD5
           | "\x80"                           User defined
code_w_s ::= int32 string document            Code w/ scope

BSON Document - The int32 is the total number of bytes comprising the document.

Array - The document for an array is a normal BSON document with integer values for the keys, starting with 0 and continuing sequentially. For example, the array ['red', 'blue'] would be encoded as the document {'0': 'red', '1': 'blue'}. The keys must be in ascending numerical order.
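
The array keys are ordinary key strings holding decimal digits. A minimal sketch of producing the key for a given position follows; bson_array_key is a hypothetical name chosen for this example, not a function of any BSON library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the element name for array position `index`
   into `key` as decimal digits ("0", "1", "2", ...).
   Returns the key length, or -1 if the buffer is too small. */
int bson_array_key(char *key, size_t cap, unsigned index)
{
    int n;

    assert(key != NULL);
    n = snprintf(key, cap, "%u", index);
    if (n < 0 || (size_t)n >= cap)
        return -1;   /* output was truncated or formatting failed */
    return n;
}
```

Because the required order is numerical, the key "10" follows "9", even though a plain string comparison would place "10" before "2".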

UTC datetime - The int64 is UTC milliseconds since the Unix epoch.

Regular expression - The first cstring is the regex pattern, the second is the regex options string. Options are identified by characters, which must be stored in alphabetical order. Valid options are 'i' for case insensitive matching, 'm' for multiline matching, 'x' for verbose mode, 'l' to make \w, \W, etc. locale dependent, 's' for dotall mode ('.' matches everything), and 'u' to make \w, \W, etc. match unicode.
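
One simple way to satisfy the alphabetical-order requirement is to sort the options string before writing it. The sketch below assumes the options are plain ASCII letters; sort_regex_options is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: orders single bytes ascending. */
static int compare_bytes(const void *a, const void *b)
{
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Hypothetical helper: sort a regex options string in place so that it can
   be stored as the second cstring of a regular expression element. */
void sort_regex_options(char *options)
{
    assert(options != NULL);
    qsort(options, strlen(options), 1, compare_bytes);
}
```

For instance, an options string given as "xmi" would be stored as "imx". The sketch does not check that each character is one of the valid options listed above.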

Symbol - Similar to a string but for languages with a distinct symbol type.

Timestamp - Special internal type used by MongoDB replication and sharding. First 4 bytes are an increment, second 4 are a timestamp. Setting the timestamp to 0 has special semantics.
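
The byte layout described above can be sketched as follows. The sketch assumes each 4-byte half is itself little-endian, which is what the general little-endian rule for int64 implies; put_timestamp and timestamp_as_u64 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the 8 bytes of a Timestamp value.
   Bytes 0..3 carry the increment, bytes 4..7 carry the timestamp. */
void put_timestamp(uint8_t out[8], uint32_t increment, uint32_t timestamp)
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i]     = (uint8_t)(increment >> (8 * i));
        out[4 + i] = (uint8_t)(timestamp >> (8 * i));
    }
}

/* The same value viewed as one 64-bit quantity (shown unsigned here):
   the increment is the low half, the timestamp is the high half. */
uint64_t timestamp_as_u64(uint32_t increment, uint32_t timestamp)
{
    return ((uint64_t)timestamp << 32) | increment;
}
```

Read as the grammar's single int64, the increment therefore ends up in the low 32 bits and the timestamp in the high 32 bits.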

Min key - Special type which compares lower than all other possible BSON element values.

Max key - Special type which compares higher than all other possible BSON element values.

String - The int32 is the number of bytes in the (byte*) + 1 (for the trailing '\x00'). The (byte*) is zero or more UTF-8 encoded characters.
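
A minimal sketch of the length rule for the string non-terminal follows. It assumes the input is already valid UTF-8 and, because it relies on strlen, it cannot represent text with embedded '\x00' bytes; put_bson_string is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode `text` as a BSON string
   (int32 length, the bytes, trailing 0x00) into `out`.
   Returns the number of bytes written, or 0 if it does not fit. */
size_t put_bson_string(uint8_t *out, size_t cap, const char *text)
{
    size_t n, total;
    uint32_t len;

    assert(out != NULL && text != NULL);
    n = strlen(text);
    total = 4 + n + 1;
    if (total > cap || n + 1 > (size_t)INT32_MAX)
        return 0;

    len = (uint32_t)(n + 1);                 /* the count includes the 0x00 */
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(len >> (8 * i));  /* little-endian int32 */
    memcpy(out + 4, text, n);
    out[4 + n] = 0x00;
    return total;
}
```

Under these assumptions, "world" becomes 06 00 00 00 followed by the five letters and a terminating 00, ten bytes in total.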

CString - Zero or more modified UTF-8 encoded characters followed by '\x00'. The (byte*) MUST NOT contain '\x00', hence it is not full UTF-8.

Binary - The int32 is the number of bytes in the (byte*).

Generic binary subtype - This is the most commonly used binary subtype and should be the 'default' for drivers and tools.

Old generic subtype - This used to be the default subtype, but was deprecated in favor of \x00. Drivers and tools should be sure to handle \x02 appropriately. The structure of the binary data (the byte* array in the binary non-terminal) must be an int32 followed by a (byte*). The int32 is the number of bytes in the repetition.
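
The nested length of the \x02 subtype is easy to get wrong, so here is a small sketch of the layout: the outer int32 counts the inner int32 plus the payload, and the inner int32 counts only the payload. The helper name put_old_binary is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write an unsigned 32-bit value in little-endian order. */
static void put_u32_le(uint8_t *out, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: encode `n` payload bytes as a binary value with the
   old generic subtype (0x02). Returns bytes written, or 0 if it does not fit. */
size_t put_old_binary(uint8_t *out, size_t cap, const uint8_t *data, uint32_t n)
{
    size_t total = 4 + 1 + 4 + (size_t)n;

    assert(out != NULL);
    if (n > (uint32_t)INT32_MAX - 4u || total > cap)
        return 0;

    put_u32_le(out, n + 4);       /* outer length: inner int32 + payload */
    out[4] = 0x02;                /* subtype: Binary (Old) */
    put_u32_le(out + 5, n);       /* inner length: payload only */
    if (n > 0)
        memcpy(out + 9, data, n);
    return total;
}
```

A two-byte payload AA BB would then be written as 06 00 00 00 02 02 00 00 00 AA BB.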

User defined - The structure of the binary data can be anything.

Code w/ scope - The int32 is the length in bytes of the entire code_w_s value. The string is JavaScript code. The document is a mapping from identifiers to values, representing the scope in which the string should be evaluated.

Examples

The following are some example documents (in JavaScript / Python style syntax) and their corresponding BSON representations.

{"hello": "world"} →
  "\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"

{"BSON": ["awesome", 5.05, 1986]} →
  "1\x00\x00\x00\x04BSON\x00&\x00\x00\x00\x020\x00\x08\x00\x00\x00awesome\x00\x011\x00333333\x14@\x102\x00\xc2\x07\x00\x00\x00\x00"
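
The first example above can be checked by assembling the bytes by hand. The sketch below encodes a document with exactly one string element and is only meant to show how the lengths add up; encode_string_document is a hypothetical helper, not a real driver function, and it assumes keys and values without embedded '\x00' bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write an unsigned 32-bit value in little-endian order. */
static void put_u32_le(uint8_t *out, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: encode the one-element document {key: value},
   where value is a string. Returns the document size, or 0 on overflow. */
size_t encode_string_document(uint8_t *out, size_t cap,
                              const char *key, const char *value)
{
    size_t klen, vlen, total;
    uint8_t *p = out;

    assert(out != NULL && key != NULL && value != NULL);
    klen = strlen(key);
    vlen = strlen(value);
    /* int32 + type byte + e_name + string (int32, bytes, 0x00) + final 0x00 */
    total = 4 + 1 + (klen + 1) + 4 + (vlen + 1) + 1;
    if (total > cap || total > (size_t)INT32_MAX)
        return 0;

    put_u32_le(p, (uint32_t)total);       p += 4;         /* document length */
    *p++ = 0x02;                                          /* element type: string */
    memcpy(p, key, klen + 1);             p += klen + 1;  /* e_name as cstring */
    put_u32_le(p, (uint32_t)(vlen + 1));  p += 4;         /* string length */
    memcpy(p, value, vlen + 1);           p += vlen + 1;  /* bytes and 0x00 */
    *p = 0x00;                                            /* end of document */
    return total;
}
```

For the key "hello" and the value "world" the total is 4 + 1 + 6 + 4 + 6 + 1 = 22 bytes, which is the \x16 at the start of the first example.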

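The last example, {"BSON": ["awesome", 5.05, 1986]}, can be understood together with the following code snippet, which builds the same document and prints its bytes: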

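#include "bson.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

/* Print the buffer contents: graphic ASCII characters as they are,
   every other byte as a \xNN escape. */
void bson_buffer_string(bson_buffer* bb)
{
    int i;

    if (bb == NULL) return;

    for (i = 0; i < bb->cur - bb->buf; i++)
    {
        /* Work on an unsigned copy so bytes >= 0x80 are handled safely. */
        unsigned char c = (unsigned char)bb->buf[i];

        if (isascii(c) && isgraph(c))
        {
            printf("%c", c);
        }
        else
        {
            printf("\\x%02X", c);
        }
    }

    printf("\n");
}

int main(int argc, const char** argv)
{
    bson_buffer bb;
    bson_buffer* arr;
    bson b;

    bson_buffer_init(&bb);

    /* {"BSON": ["awesome", 5.05, 1986]}: the array keys are "0", "1", "2". */
    arr = bson_append_start_array(&bb, "BSON");
    bson_append_string(arr, "0", "awesome");
    bson_append_double(arr, "1", 5.05);
    bson_append_int(arr, "2", 1986);
    bson_append_finish_object(arr);

    /* Finish the document, then dump the raw bytes and the parsed form. */
    bson_from_buffer(&b, &bb);
    bson_buffer_string(&bb);
    bson_print(&b);

    /* Assumption: as in the legacy MongoDB C driver, bson_from_buffer hands
       the buffer's memory over to b, so only b is destroyed here; calling
       bson_buffer_destroy(&bb) as well would free the same memory twice. */
    bson_destroy(&b);
    return 0;
}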
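
The eight bytes 333333\x14@ in the last example are the double 5.05 in little-endian IEEE 754 form ('3' is 0x33 and '@' is 0x40). As a quick check, the sketch below reads such bytes back into a double. It assumes the host's double is a 64-bit IEEE 754 value; get_double_le is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebuild a double from 8 little-endian bytes.
   Assumes the host double is a 64-bit IEEE 754 value. */
double get_double_le(const uint8_t in[8])
{
    uint64_t bits = 0;
    double value;

    assert(in != NULL);
    assert(sizeof value == sizeof bits);
    for (int i = 0; i < 8; i++)
        bits |= (uint64_t)in[i] << (8 * i);  /* byte 0 is least significant */
    memcpy(&value, &bits, sizeof value);     /* reinterpret the bit pattern */
    return value;
}
```

Read least significant byte first, the bytes form the bit pattern 0x4014333333333333, which is the closest double to 5.05.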
