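Version 1.0
BSON is a binary format in which zero or more key/value pairs are stored as a
single entity. We call this entity a document.
The following grammar specifies version 1.0 of the BSON standard. The grammar
is written in a pseudo-BNF syntax. Valid BSON data is represented by the
document non-terminal.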
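Basic Types
The following basic types are used as terminals in the rest of the grammar.
Each type must be serialized in little-endian format.
byte      1 byte (8 bits)
int32     4 bytes (32-bit signed integer)
int64     8 bytes (64-bit signed integer)
double    8 bytes (64-bit IEEE 754 floating point)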
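As a minimal sketch of what "serialized in little-endian format" means for an int32, the helpers below write and read the four bytes explicitly, so the result does not depend on the host's byte order. The names le_put_int32 and le_get_int32 are illustrative and are not part of any BSON library.

```c
#include <stdint.h>

/* Store a signed 32-bit integer as 4 little-endian bytes,
   independent of the host's byte order. */
void le_put_int32(unsigned char out[4], int32_t value)
{
    uint32_t u = (uint32_t)value;               /* same bit pattern, unsigned */
    out[0] = (unsigned char)(u & 0xFFu);        /* least significant byte first */
    out[1] = (unsigned char)((u >> 8) & 0xFFu);
    out[2] = (unsigned char)((u >> 16) & 0xFFu);
    out[3] = (unsigned char)((u >> 24) & 0xFFu); /* most significant byte last */
}

/* Read 4 little-endian bytes back into a signed 32-bit integer. */
int32_t le_get_int32(const unsigned char in[4])
{
    uint32_t u = (uint32_t)in[0]
               | ((uint32_t)in[1] << 8)
               | ((uint32_t)in[2] << 16)
               | ((uint32_t)in[3] << 24);
    /* Convert to signed without relying on implementation-defined narrowing. */
    if (u <= 0x7FFFFFFFu) {
        return (int32_t)u;
    }
    return -(int32_t)(~u & 0x7FFFFFFFu) - 1;
}
```

With these helpers, the value 1986 is stored as the bytes C2 07 00 00, which is the sequence "\xc2\x07\x00\x00" that appears in the array example later in this document.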
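Non-terminals
The following specifies the rest of the BSON grammar. Note that quoted strings
represent terminals and should be interpreted with C semantics (e.g. "\x01"
represents the byte 0000 0001). Also note that the * operator is used as
shorthand for repetition (e.g. ("\x01"*2) is "\x01\x01"). When used as a unary
operator, * means that the repetition can occur 0 or more times.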
document  ::= int32 e_list "\x00"             BSON Document
e_list    ::= element e_list | ""             Sequence of elements
element   ::= "\x01" e_name double            Floating point
            | "\x02" e_name string            UTF-8 string
            | "\x03" e_name document          Embedded document
            | "\x04" e_name document          Array
            | "\x05" e_name binary            Binary data
            | "\x06" e_name                   Undefined
            | "\x07" e_name (byte*12)         ObjectId
            | "\x08" e_name "\x00"            Boolean "false"
            | "\x08" e_name "\x01"            Boolean "true"
            | "\x09" e_name int64             UTC datetime
            | "\x0A" e_name                   Null value
            | "\x0B" e_name cstring cstring   Regular expression
            | "\x0C" e_name string (byte*12)  DBPointer (deprecated)
            | "\x0D" e_name string            JavaScript code
            | "\x0E" e_name string            Symbol
            | "\x0F" e_name code_w_s          JavaScript code w/ scope
            | "\x10" e_name int32             32-bit integer
            | "\x11" e_name int64             Timestamp
            | "\x12" e_name int64             64-bit integer
            | "\xFF" e_name                   Min key
            | "\x7F" e_name                   Max key
e_name    ::= cstring                         Key name
string    ::= int32 (byte*) "\x00"            String
cstring   ::= (byte*) "\x00"                  CString
binary    ::= int32 subtype (byte*)           Binary
subtype   ::= "\x00"                          Binary / Generic
            | "\x01"                          Function
            | "\x02"                          Binary (Old)
            | "\x03"                          UUID
            | "\x05"                          MD5
            | "\x80"                          User defined
code_w_s  ::= int32 string document           Code w/ scope
BSON Document - The int32 is the total number of bytes comprising the document.
Array - The document for an array is a normal BSON document with integer
values for the keys, starting with 0 and continuing sequentially. For example,
the array ['red', 'blue'] would be encoded as the document {'0': 'red', '1':
'blue'}. The keys must be in ascending numerical order.
UTC datetime - The int64 is UTC milliseconds since the Unix epoch.
Regular expression - The first cstring is the regex pattern, the second is the
regex options string. Options are identified by characters, which must be
stored in alphabetical order. Valid options are 'i' for case insensitive
matching, 'm' for multiline matching, 'x' for verbose mode, 'l' to make \w,
\W, etc. locale dependent, 's' for dotall mode ('.' matches everything), and
'u' to make \w, \W, etc. match unicode.
Symbol - Similar to a string but for languages with a distinct symbol type.
Timestamp - Special internal type used by MongoDB replication and sharding.
First 4 bytes are an increment, second 4 are a timestamp. Setting the
timestamp to 0 has special semantics.
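To make the byte layout of the Timestamp note concrete, here is a small sketch that splits the 8 wire bytes into their two halves. The struct and function names are hypothetical, and treating both halves as unsigned 32-bit little-endian values is an assumption that goes beyond what the note above states.

```c
#include <stdint.h>

/* The two 32-bit halves of a BSON timestamp value (names are illustrative). */
typedef struct {
    uint32_t increment;  /* first 4 bytes on the wire */
    uint32_t timestamp;  /* second 4 bytes on the wire */
} ts_fields;

/* Read 4 little-endian bytes as an unsigned 32-bit value (assumed unsigned). */
static uint32_t ts_read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Split the 8 bytes of a "\x11" element value into increment and timestamp. */
ts_fields ts_split(const unsigned char wire[8])
{
    ts_fields f;
    f.increment = ts_read_u32_le(wire);      /* bytes 0..3 */
    f.timestamp = ts_read_u32_le(wire + 4);  /* bytes 4..7 */
    return f;
}
```

Because the whole value is a little-endian int64, the increment occupies the low 32 bits and the timestamp the high 32 bits when the 8 bytes are read as a single integer.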
Min key - Special type which compares lower than all other possible BSON
element values.
Max key - Special type which compares higher than all other possible BSON
element values.
String - The int32 is the number of bytes in the (byte*) + 1 (for the trailing
'\x00'). The (byte*) is zero or more UTF-8 encoded characters.
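The length rule for strings is easy to get off by one, so the following sketch writes a string value by hand. The function name sketch_put_string is hypothetical. The sketch takes a NUL-terminated C string as input, so it cannot express a string that contains an embedded '\x00', and it assumes the caller has reserved strlen(utf8) + 5 bytes of output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a BSON string value (int32 length, bytes, trailing "\x00") into out.
   Returns the number of bytes written. The caller must provide room for
   strlen(utf8) + 5 bytes. Hypothetical helper, not a library function. */
size_t sketch_put_string(unsigned char *out, const char *utf8)
{
    size_t n = strlen(utf8);
    uint32_t len = (uint32_t)(n + 1);      /* length counts the trailing '\x00' */

    out[0] = (unsigned char)(len & 0xFFu); /* int32 length, little-endian */
    out[1] = (unsigned char)((len >> 8) & 0xFFu);
    out[2] = (unsigned char)((len >> 16) & 0xFFu);
    out[3] = (unsigned char)((len >> 24) & 0xFFu);

    memcpy(out + 4, utf8, n);              /* the (byte*) part */
    out[4 + n] = 0x00;                     /* trailing terminator */
    return 4 + n + 1;
}
```

For the input "world" this produces 06 00 00 00 followed by the five letters and a final 00: ten bytes in total, with a length field of 6 rather than 5.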
CString - Zero or more modified UTF-8 encoded characters followed by '\x00'.
The (byte*) MUST NOT contain '\x00', hence it is not full UTF-8.
Binary - The int32 is the number of bytes in the (byte*).
Generic binary subtype - This is the most commonly used binary subtype and
should be the 'default' for drivers and tools.
Old generic subtype - This used to be the default subtype, but was deprecated
in favor of \x00. Drivers and tools should be sure to handle \x02
appropriately. The structure of the binary data (the byte* array in the binary
non-terminal) must be an int32 followed by a (byte*). The int32 is the number
of bytes in the repetition.
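The nested length of the old \x02 subtype can be sketched as follows: the outer int32 counts the inner int32 plus the payload, and the inner int32 counts the payload alone. The function name sketch_put_old_binary and its helper are hypothetical, and the caller is assumed to have reserved n + 9 bytes of output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write an unsigned 32-bit value as 4 little-endian bytes. */
static void bin_put_u32_le(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Write a binary value with the old generic subtype "\x02".
   Layout: int32 (n + 4), "\x02", int32 (n), then n payload bytes.
   Returns the number of bytes written. Hypothetical helper. */
size_t sketch_put_old_binary(unsigned char *out,
                             const unsigned char *data, uint32_t n)
{
    bin_put_u32_le(out, n + 4);   /* outer length: inner int32 + payload */
    out[4] = 0x02;                /* subtype: Binary (Old) */
    bin_put_u32_le(out + 5, n);   /* inner length: payload only */
    if (n > 0) {
        memcpy(out + 9, data, n); /* the payload itself */
    }
    return (size_t)n + 9;
}
```

A two-byte payload AA BB is therefore written as 06 00 00 00, 02, 02 00 00 00, AA BB. With the generic subtype \x00 the same payload would carry only the single length 02 00 00 00.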
User defined - The structure of the binary data can be anything.
Code w/ scope - The int32 is the length in bytes of the entire code_w_s value.
The string is JavaScript code. The document is a mapping from identifiers to
values, representing the scope in which the string should be evaluated.
Examples
The following are some example documents (in JavaScript / Python style syntax)
and their corresponding BSON representations.
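{"hello": "world"} →
    "\x16\x00\x00\x00 \x02 hello\x00 \x06\x00\x00\x00world\x00 \x00"

{"BSON": ["awesome", 5.05, 1986]} →
    "1\x00\x00\x00 \x04 BSON\x00 &\x00\x00\x00 \x02 0\x00 \x08\x00\x00\x00awesome\x00 \x01 1\x00 333333\x14@ \x10 2\x00 \xc2\x07\x00\x00 \x00 \x00"

Spaces inside the quoted byte strings only separate fields for readability;
they are not part of the encoded data. Printable bytes are shown as
characters, so the leading "1" is the byte \x31 (49, the document length) and
"&" is \x26 (38, the length of the embedded array document).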
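The {"hello": "world"} encoding can be reproduced by hand with a few lines of C. This is a sketch only: the function name sketch_encode_string_doc and its helper are hypothetical, the function handles exactly one string element, and the caller is assumed to have reserved strlen(key) + strlen(value) + 12 bytes of output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write an unsigned 32-bit value as 4 little-endian bytes. */
static void doc_put_u32_le(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Encode a document holding exactly one string element: {key: value}.
   Returns the total number of bytes written. Hypothetical helper. */
size_t sketch_encode_string_doc(unsigned char *out,
                                const char *key, const char *value)
{
    size_t klen = strlen(key);
    size_t vlen = strlen(value);
    size_t pos = 4;                         /* skip the document length for now */

    out[pos++] = 0x02;                      /* element type: UTF-8 string */
    memcpy(out + pos, key, klen + 1);       /* e_name: cstring with its '\x00' */
    pos += klen + 1;

    doc_put_u32_le(out + pos, (uint32_t)(vlen + 1)); /* string length incl. '\x00' */
    pos += 4;
    memcpy(out + pos, value, vlen + 1);     /* string bytes with trailing '\x00' */
    pos += vlen + 1;

    out[pos++] = 0x00;                      /* document terminator */
    doc_put_u32_le(out, (uint32_t)pos);     /* total length, including these 4 bytes */
    return pos;
}
```

Called with "hello" and "world", the function returns 22 and fills the buffer with the same 22 bytes as the first example: 4 bytes of length (\x16), 1 type byte, 6 bytes of key, 4 bytes of string length, 6 bytes of string, and the final terminator.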
The {"BSON": ["awesome", 5.05, 1986]} example can be understood together with
the following code snippet:
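#include "bson.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

/* Print the buffer contents: printable ASCII bytes as characters,
   every other byte as a \xNN escape. */
void bson_buffer_string(bson_buffer* bb)
{
    int i;
    if (bb == NULL) return;
    for (i = 0; i < bb->cur - bb->buf; i++)
    {
        /* Work on an unsigned byte so isgraph() never sees a negative value. */
        unsigned char c = (unsigned char)bb->buf[i];
        if (isascii(c) && isgraph(c))
        {
            printf("%c", c);
        }
        else
        {
            printf("\\x%02X", (unsigned)c);
        }
    }
    printf("\n");
}

int main(int argc, const char** argv)
{
    bson_buffer bb;
    bson_buffer* arr;
    bson b;

    bson_buffer_init(&bb);

    /* An array is a document whose keys are "0", "1", "2", ... */
    arr = bson_append_start_array(&bb, "BSON");
    bson_append_string(arr, "0", "awesome");
    bson_append_double(arr, "1", 5.05);
    bson_append_int(arr, "2", 1986);
    bson_append_finish_object(arr);   /* closes the array */

    bson_from_buffer(&b, &bb);        /* finishes the outer document */
    bson_buffer_string(&bb);          /* escaped byte dump */
    bson_print(&b);                   /* structured dump */

    /* bson_from_buffer() is understood to hand the buffer's memory to b in
       this legacy API, so only b is destroyed here; also calling
       bson_buffer_destroy(&bb) would free the same memory twice. */
    bson_destroy(&b);
    return 0;
}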