Go Language Specification (Lexical Elements)

Original: http://golang.org/doc/go_spec.html
Translation: 紅獵人 (zengsai@gmail.com)

Lexical elements

Comments

There are two forms of comments:

  1. Line comments start with the character sequence // and continue through the next newline. A line comment acts like a newline.
  2. General comments start with the character sequence /* and continue through the character sequence */. A general comment that spans multiple lines acts like a newline, otherwise it acts like a space.

Comments do not nest.
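As a rough illustration of the no-nesting rule, the sketch below feeds a "nested" comment to the standard library's go/scanner package. The helper name tokenKinds and the file name example.go are made up for this example. The comment ends at the first */, so the remaining text is scanned as ordinary tokens.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenKinds scans src, keeping comments, and returns the kind of each token.
func tokenKinds(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, scanner.ScanComments)

	var kinds []string
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			break
		}
		kinds = append(kinds, tok.String())
	}
	return kinds
}

func main() {
	// The comment stops after "b */"; "c", "*" and "/" are separate tokens.
	fmt.Println(tokenKinds("/* a /* b */ c */")) // [COMMENT IDENT * /]
}
```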

Tokens

Tokens form the vocabulary of the Go language. There are four classes: identifiers, keywords, operators and delimiters, and literals. White space, formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns (U+000D), and newlines (U+000A), is ignored except as it separates tokens that would otherwise combine into a single token. Also, a newline may trigger the insertion of a semicolon. While breaking the input into tokens, the next token is the longest sequence of characters that form a valid token.
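The longest-match rule can be observed with the standard library's go/scanner package. The following is a minimal sketch, not part of the specification; the helper name tokenize and the file name example.go are invented here, and automatically inserted semicolons are filtered out so that only the tokens written in the input remain.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize returns the text of each token in src. Semicolons that the
// scanner inserts automatically are left out to keep the output focused.
func tokenize(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		switch {
		case tok == token.SEMICOLON && lit == "\n":
			// automatically inserted semicolon: skip it
		case lit != "":
			out = append(out, lit) // identifiers, literals, keywords
		default:
			out = append(out, tok.String()) // operators and delimiters
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", tokenize("a<<=b"))   // ["a" "<<=" "b"]
	fmt.Printf("%q\n", tokenize("x+++y"))   // ["x" "++" "+" "y"]
	fmt.Printf("%q\n", tokenize("a  \t b")) // ["a" "b"]
}
```

In the first input the three characters <<= form one token rather than <, <, =, and in the second the scanner takes ++ before it considers +.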

Semicolons

The formal grammar uses semicolons ";" as terminators in a number of productions. Go programs may omit most of these semicolons using the following two rules:

  1. When the input is broken into tokens, a semicolon is automatically inserted into the token stream at the end of a non-blank line if the line's final token is

    • an identifier
    • an integer, floating-point, character, or string literal
    • one of the keywords break, continue, fallthrough, or return
    • one of the operators and delimiters ++, --, ), ], or }
  2. To allow complex statements to occupy a single line, a semicolon may be omitted before a closing ")" or "}".

To reflect idiomatic use, code examples in this document elide semicolons using these rules.
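The first rule can be made visible with go/scanner, which reports an inserted semicolon with the literal "\n" and a written one with the literal ";". This is only a sketch; the helper name insertedSemicolons and the sample source are assumptions made for the example.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// insertedSemicolons returns the 1-based numbers of the lines at whose end
// the scanner inserted a semicolon that is not present in src.
func insertedSemicolons(src string) []int {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)

	var lines []int
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		// An inserted semicolon carries the literal "\n".
		if tok == token.SEMICOLON && lit == "\n" {
			lines = append(lines, fset.Position(pos).Line)
		}
	}
	return lines
}

func main() {
	src := "x := f(a,\n\tb)\ny++\nif x > 0 {\n\treturn\n}\n"
	fmt.Println(insertedSemicolons(src)) // [2 3 5 6]
}
```

Line 1 ends in a comma and line 4 in an opening brace, so neither receives a semicolon; the lines ending in ), ++, return, and } do.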

Identifiers

Identifiers name program entities such as variables and types. An identifier is a sequence of one or more letters and digits. The first character in an identifier must be a letter.

identifier = letter { letter | unicode_digit } .

a
_x9
ThisVariableIsExported
αβ

Some identifiers are predeclared.
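The production above can be transcribed into a small checker. This sketch assumes, as the full specification states in its section on letters and digits (not reproduced here), that the underscore counts as a letter. The function names are made up for the example, and only the shape is checked; keywords are not excluded.

```go
package main

import (
	"fmt"
	"unicode"
)

// isLetter treats a Unicode letter or the underscore as a letter.
func isLetter(r rune) bool {
	return r == '_' || unicode.IsLetter(r)
}

// isIdentifier reports whether s matches
//
//	identifier = letter { letter | unicode_digit } .
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if isLetter(r) {
			continue
		}
		if i > 0 && unicode.IsDigit(r) {
			continue // digits are allowed after the first character
		}
		return false
	}
	return true
}

func main() {
	for _, s := range []string{"a", "_x9", "ThisVariableIsExported", "αβ", "9lives", "a-b"} {
		fmt.Printf("%-24q %v\n", s, isIdentifier(s))
	}
}
```

The first four inputs are the examples from the text and are accepted; "9lives" starts with a digit and "a-b" contains an operator, so both are rejected.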

Keywords

The following keywords are reserved and may not be used as identifiers.

break        default      func         interface    select
case         defer        go           map          struct
chan         else         goto         package      switch
const        fallthrough  if           range        type
continue     for          import       return       var
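Whether a word is reserved can be checked with the go/token package, as in the sketch below. The wrapper name isKeyword is an assumption of this example. Note that predeclared identifiers such as string and nil are not keywords.

```go
package main

import (
	"fmt"
	"go/token"
)

// isKeyword reports whether word is one of the reserved keywords.
// token.Lookup returns token.IDENT for anything that is not a keyword.
func isKeyword(word string) bool {
	return token.Lookup(word).IsKeyword()
}

func main() {
	for _, word := range []string{"func", "fallthrough", "main", "string", "nil"} {
		fmt.Printf("%-12s %v\n", word, isKeyword(word))
	}
}
```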

Operators and Delimiters

The following character sequences represent operators, delimiters, and other special tokens:

+    &     +=    &=     &&    ==    !=    (    )
-    |     -=    |=     ||    <     <=    [    ]
*    ^     *=    ^=     <-    >     >=    {    }
/    <<    /=    <<=    ++    =     :=    ,    ;
%    >>    %=    >>=    --    !     ...   .    :
     &^          &^=

Integer literals

An integer literal is a sequence of digits representing an integer constant. An optional prefix sets a non-decimal base: 0 for octal, 0x or 0X for hexadecimal. In hexadecimal literals, letters a-f and A-F represent values 10 through 15.

int_lit     = decimal_lit | octal_lit | hex_lit .
decimal_lit = ( "1" ... "9" ) { decimal_digit } .
octal_lit   = "0" { octal_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { hex_digit } .

42
0600
0xBadFace
170141183460469231731687303715884105727
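To see which values the four example literals denote, one option is math/big, whose SetString chooses the base from the prefix when asked for base 0. This is a sketch under that assumption; SetString also accepts forms that the grammar above does not cover (signs, for instance), so it is a convenience rather than an exact model of the grammar. The helper name literalValue is invented here.

```go
package main

import (
	"fmt"
	"math/big"
)

// literalValue returns the value of an integer literal written in Go syntax.
// Base 0 asks big.Int to pick the base from the literal's prefix.
func literalValue(lit string) (*big.Int, bool) {
	return new(big.Int).SetString(lit, 0)
}

func main() {
	lits := []string{"42", "0600", "0xBadFace", "170141183460469231731687303715884105727"}
	for _, lit := range lits {
		v, ok := literalValue(lit)
		fmt.Println(lit, "=", v, ok)
	}
}
```

The octal literal 0600 has the value 384, and the last literal is 2^127 - 1, which is a valid constant even though it does not fit in any 64-bit integer type.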

Floating-point literals

A floating-point literal is a decimal representation of a floating-point constant. It has an integer part, a decimal point, a fractional part, and an exponent part. The integer and fractional part comprise decimal digits; the exponent part is an e or E followed by an optionally signed decimal exponent. One of the integer part or the fractional part may be elided; one of the decimal point or the exponent may be elided.

float_lit = decimals "." [ decimals ] [ exponent ] |
            decimals exponent |
            "." decimals [ exponent ] .
decimals  = decimal_digit { decimal_digit } .
exponent  = ( "e" | "E" ) [ "+" | "-" ] decimals .

0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
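The optional parts can be tried out directly in a program. In the sketch below the constant names are invented for illustration; each constant elides a different part of the literal, and each comparison is between exact constants.

```go
package main

import "fmt"

const (
	leadingZero = 072.40    // the integer part is decimal even with a leading 0
	noFraction  = 1.e+0     // fractional digits elided
	noPoint     = 1E6       // decimal point elided, exponent present
	noInteger   = .12345E+5 // integer part elided
)

func main() {
	fmt.Println(leadingZero == 72.40) // true
	fmt.Println(noFraction == 1)      // true
	fmt.Println(noPoint == 1000000)   // true
	fmt.Println(noInteger == 12345)   // true
}
```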

Imaginary literals

An imaginary literal is a decimal representation of the imaginary part of a complex constant. It consists of a floating-point literal or decimal integer followed by the lower-case letter i.

imaginary_lit = (decimals | float_lit) "i" .

0i
011i  // == 11i
0.i
2.71828i
1.e+0i
6.67428e-11i
1E6i
.25i
.12345E+5i
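A short sketch of imaginary literals in use follows. The helper name parts is an assumption of the example. It shows that the digits before i are read as decimal even with a leading zero, as the 011i example above indicates.

```go
package main

import "fmt"

// parts splits a complex value into its real and imaginary components.
func parts(c complex128) (re, im float64) {
	return real(c), imag(c)
}

func main() {
	fmt.Println(parts(011i))          // 0 11
	fmt.Println(011i == 11i)          // true
	fmt.Println(.25i*.25i == -0.0625) // true, because i*i is -1
}
```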

Character literals

A character literal represents an integer constant, typically a Unicode code point, as one or more characters enclosed in single quotes. Within the quotes, any character may appear except single quote and newline. A single quoted character represents itself, while multi-character sequences beginning with a backslash encode values in various formats.

The simplest form represents the single character within the quotes; since Go source text is Unicode characters encoded in UTF-8, multiple UTF-8-encoded bytes may represent a single integer value. For instance, the literal 'a' holds a single byte representing a literal a, Unicode U+0061, value 0x61, while 'ä' holds two bytes (0xc3 0xa4) representing a literal a-dieresis, U+00E4, value 0xe4.
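The difference between a character's integer value and the number of bytes it occupies in the source can be printed as follows. This is a minimal sketch; the helper name encodedLen is made up for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen returns how many bytes the UTF-8 encoding of r occupies.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	for _, r := range []rune{'a', 'ä', '本'} {
		// %q shows the literal, %U the code point, %#x the integer value.
		fmt.Printf("%q %U value=%#x bytes=%d\n", r, r, r, encodedLen(r))
	}
}
```

For 'a' this reports one byte and the value 0x61; for 'ä' it reports two bytes and the value 0xe4, matching the description above.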

Several backslash escapes allow arbitrary values to be represented as ASCII text. There are four ways to represent the integer value as a numeric constant: \x followed by exactly two hexadecimal digits; \u followed by exactly four hexadecimal digits; \U followed by exactly eight hexadecimal digits, and a plain backslash \ followed by exactly three octal digits. In each case the value of the literal is the value represented by the digits in the corresponding base.

Although these representations all result in an integer, they have different valid ranges. Octal escapes must represent a value between 0 and 255 inclusive. Hexadecimal escapes satisfy this condition by construction. The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
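The restriction on \u and \U values corresponds to what utf8.ValidRune checks in the standard library. The wrapper name validCodePoint below is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validCodePoint reports whether v is a legal code point: it must not
// exceed U+10FFFF and must not be a surrogate half.
func validCodePoint(v rune) bool {
	return utf8.ValidRune(v)
}

func main() {
	fmt.Println(validCodePoint(0x10FFFF)) // true: the largest code point
	fmt.Println(validCodePoint(0x110000)) // false: above U+10FFFF
	fmt.Println(validCodePoint(0xD800))   // false: surrogate half
	fmt.Println('\377' == 255)            // true: the largest octal escape
}
```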

After a backslash, certain single-character escapes represent special values:

\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000b vertical tab
\\   U+005c backslash
\'   U+0027 single quote  (valid escape only within character literals)
\"   U+0022 double quote  (valid escape only within string literals)

All other sequences starting with a backslash are illegal inside character literals.

char_lit         = "'" ( unicode_value | byte_value ) "'" .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
escaped_char     = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .

'a'
'ä'
'本'
'\t'
'\000'
'\007'
'\377'
'\x07'
'\xff'
'\u12e4'
'\U00101234'
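Several of the escape forms above denote the same integer. The sketch below compares a few of them; the variable name pairs and the choice of pairs are made up for this example.

```go
package main

import "fmt"

// pairs lists two spellings of the same integer value per entry.
var pairs = []struct {
	name string
	a, b rune
}{
	{`'\a' and '\007'`, '\a', '\007'},
	{`'\x07' and '\007'`, '\x07', '\007'},
	{`'\377' and '\xff'`, '\377', '\xff'},
	{`'\t' and '\u0009'`, '\t', '\u0009'},
	{`'\u12e4' and 0x12e4`, '\u12e4', 0x12e4},
	{`'\U00101234' and 0x101234`, '\U00101234', 0x101234},
}

func main() {
	for _, p := range pairs {
		fmt.Printf("%-28s equal=%v\n", p.name, p.a == p.b)
	}
}
```

Every line reports equal=true, since each escape's value is simply the number its digits spell in the corresponding base.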

String literals

A string literal represents a string constant obtained from concatenating a sequence of characters. There are two forms: raw string literals and interpreted string literals.

Raw string literals are character sequences between back quotes ``. Within the quotes, any character is legal except back quote. The value of a raw string literal is the string composed of the uninterpreted characters between the quotes; in particular, backslashes have no special meaning and the string may span multiple lines.

Interpreted string literals are character sequences between double quotes "". The text between the quotes, which may not span multiple lines, forms the value of the literal, with backslash escapes interpreted as they are in character literals (except that \' is illegal and \" is legal). The three-digit octal (\nnn) and two-digit hexadecimal (\xnn) escapes represent individual bytes of the resulting string; all other escapes represent the (possibly multi-byte) UTF-8 encoding of individual characters. Thus inside a string literal \377 and \xFF represent a single byte of value 0xFF=255, while ÿ, \u00FF, \U000000FF and \xc3\xbf represent the two bytes 0xc3 0xbf of the UTF-8 encoding of character U+00FF.
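The byte-level difference between the escape forms can be inspected by printing each string's bytes in hexadecimal. This is a small sketch; the helper name hexBytes is invented here.

```go
package main

import "fmt"

// hexBytes renders the bytes of s in hexadecimal, separated by spaces.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	fmt.Println(hexBytes("\377"))     // ff
	fmt.Println(hexBytes("\xFF"))     // ff
	fmt.Println(hexBytes("\u00FF"))   // c3 bf
	fmt.Println(hexBytes("\xc3\xbf")) // c3 bf
	fmt.Println(hexBytes(`\n`))       // 5c 6e: a raw string keeps the backslash
	fmt.Println(hexBytes("\n"))       // 0a
}
```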

string_lit             = raw_string_lit | interpreted_string_lit .
raw_string_lit         = "`" { unicode_char } "`" .
interpreted_string_lit = `"` { unicode_value | byte_value } `"` .

`abc`  // same as "abc"
`\n
\n`    // same as "\\n\n\\n"
"\n"
""
"Hello, world!\n"
"日本語"
"\u65e5本\U00008a9e"
"\xff\u00FF"

These examples all represent the same string:

"日本語"                                 // UTF-8 input text
`日本語`                                 // UTF-8 input text as a raw literal
"\u65e5\u672c\u8a9e"                    // The explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // The explicit Unicode code points
"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // The explicit UTF-8 bytes

If the source code represents a character as two code points, such as a combining form involving an accent and a letter, the result will be an error if placed in a character literal (it is not a single code point), and will appear as two code points if placed in a string literal.
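Both points can be checked in a short program: the five spellings of the string compare equal, and a letter followed by a combining accent counts as two code points. This is a sketch; the names spellings, allEqual, and codePoints, as well as the choice of é as the sample character, are assumptions of the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// spellings lists five ways of writing the same three-character string.
var spellings = []string{
	"日本語",
	`日本語`,
	"\u65e5\u672c\u8a9e",
	"\U000065e5\U0000672c\U00008a9e",
	"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e",
}

// allEqual reports whether every string in ss equals the first one.
func allEqual(ss []string) bool {
	for _, s := range ss {
		if s != ss[0] {
			return false
		}
	}
	return true
}

// codePoints counts the Unicode code points in s.
func codePoints(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(allEqual(spellings)) // true

	precomposed := "\u00e9" // é as one code point
	combining := "e\u0301"  // e followed by U+0301, a combining acute accent
	fmt.Println(codePoints(precomposed))  // 1
	fmt.Println(codePoints(combining))    // 2
	fmt.Println(precomposed == combining) // false
}
```

The two forms of é look alike when rendered but are different strings, which is why only the precomposed form fits in a character literal.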
