Analyzing C++ Command-line Arguments

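Prefatory note: the command line (commandline) is split by the command-line parser (for Microsoft C/C++ programs, the startup code described below; hereafter cmdlineparser) into the arguments argv[0]…argv[n]. Here, commandline is the input and argv is the output.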

Microsoft C/C++ program startup code uses the following rules to parse the arguments given on the operating system command line:

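  1. cmdlineparser splits commandline into argv at whitespace, where whitespace is either a space (0x20) or a tab (0x09). Note that whitespace does not always separate arguments, because it can itself be part of an argument (see rule 3).
  2. Unlike 0x20 and 0x09, the caret ^ (0x5E) is recognized neither as an escape character nor as a delimiter. The caret is handled entirely by the operating system's command-line processing before the arguments are placed in argv.
  3. In commandline, a string enclosed in double quotes ("string") is interpreted as a single argument even if it contains spaces (0x20); for example, "a string" is parsed as a string. A quoted string can also be embedded inside an argument: d"e f"g is parsed by cmdlineparser as de fg.
  4. In commandline, a double quote preceded by a backslash (0x5C), i.e. \", is interpreted as a literal double-quote character (") in argv.
  5. Following rule 4, backslashes are copied to argv literally unless they immediately precede a double quote.
  6. In commandline, if an even number of backslashes is followed by a double quote, cmdlineparser places one backslash in argv for every pair of backslashes, and the double quote that follows is treated as a string delimiter: it begins or ends a quoted part (as in rule 3) and is not copied to argv.
  7. In commandline, if an odd number of backslashes is followed by a double quote, cmdlineparser places one backslash in argv for every pair of backslashes; the remaining backslash and the double quote form an escaped \", which by rule 4 yields a literal double quote.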

The text above is translated from http://msdn.microsoft.com/en-us/library/17w5ykft.aspx and mainly reflects my own understanding of its meaning. The original reads:

Microsoft C/C++ startup code uses the following rules when interpreting arguments given on the operating system command line:

  • Arguments are delimited by white space, which is either a space or a tab.
  • The caret character (^) is not recognized as an escape character or delimiter. The character is handled completely by the command-line parser in the operating system before being passed to the argv array in the program.
  • A string surrounded by double quotation marks ("string") is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
  • A double quotation mark preceded by a backslash (\") is interpreted as a literal double quotation mark character (").
  • Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
  • If an even number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is interpreted as a string delimiter.
  • If an odd number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is "escaped" by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.
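As a rough illustration of the documented rules (not Microsoft's actual runtime code), the C++ sketch below splits only the argument part of a command line, i.e. everything after the program name, so element 0 of its result corresponds to argv[1]. The names splitArgsDocumented and checkExampleTable are hypothetical. It deliberately ignores the undocumented handling of consecutive double quotes discussed later.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits the argument part of a command line (the text
// after the program name) using only the seven documented rules.
std::vector<std::string> splitArgsDocumented(const std::string& cmd) {
    std::vector<std::string> args;
    const std::size_t n = cmd.size();
    std::size_t i = 0;
    for (;;) {
        // Rule 1: spaces and tabs between arguments are skipped.
        while (i < n && (cmd[i] == ' ' || cmd[i] == '\t')) ++i;
        if (i >= n) break;

        std::string arg;
        bool inQuotes = false;  // inside a quoted part (rule 3)
        while (i < n) {
            // Count a run of backslashes.
            std::size_t slashes = 0;
            while (i < n && cmd[i] == '\\') { ++slashes; ++i; }

            if (i < n && cmd[i] == '"') {
                // Rules 6 and 7: one backslash per pair of backslashes.
                arg.append(slashes / 2, '\\');
                if (slashes % 2 == 1)
                    arg += '"';            // odd: \" yields a literal quote (rule 4)
                else
                    inQuotes = !inQuotes;  // even: quote opens/closes a quoted part
                ++i;
                continue;
            }

            // Rule 5: backslashes not followed by a quote are literal.
            arg.append(slashes, '\\');
            if (i >= n) break;
            // Rule 1: whitespace outside a quoted part ends the argument.
            if (!inQuotes && (cmd[i] == ' ' || cmd[i] == '\t')) break;
            arg += cmd[i++];
        }
        args.push_back(arg);
    }
    return args;
}

// The four rows of the example table (element 0 corresponds to argv[1]).
void checkExampleTable() {
    using V = std::vector<std::string>;
    assert((splitArgsDocumented("\"abc\" d e") == V{"abc", "d", "e"}));
    assert((splitArgsDocumented("a\\\\b d\"e f\"g h") == V{"a\\\\b", "de fg", "h"}));
    assert((splitArgsDocumented("a\\\\\\\"b c d") == V{"a\\\"b", "c", "d"}));
    assert((splitArgsDocumented("a\\\\\\\\\"b c\" d e") == V{"a\\\\b c", "d", "e"}));
}
```

For example, the input a\\\\"b c" d e (four backslashes) gives the three strings a\\b c, d and e: the four backslashes become two, and the quote that follows opens a quoted part instead of being copied.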

Example

The following program demonstrates how command-line arguments are passed:

  
    // command_line_arguments.cpp
    // compile with: /EHsc
    #include <iostream>

    using namespace std;

    int main( int argc,      // Number of strings in array argv
              char *argv[],  // Array of command-line argument strings
              char *envp[] ) // Array of environment variable strings
    {
        int count;

        // Display each command-line argument.
        cout << "\nCommand-line arguments:\n";
        for( count = 0; count < argc; count++ )
            cout << "  argv[" << count << "]   "
                 << argv[count] << "\n";
    }

The following table shows sample input and the expected output, illustrating the rules listed above.

Command-line input |  argv[1]   |  argv[2]   |  argv[3]
-------------------|------------|------------|-----------
"abc" d e          |  abc       |  d         |  e
a\\b d"e f"g h     |  a\\b      |  de fg     |  h
a\\\"b c d         |  a\"b      |  c         |  d
a\\\\"b c" d e     |  a\\b c    |  d         |  e

/////////////////////////////////////////////////

Addendum:

The parsing of several consecutive double quotes is downright absurd; please refer to the discussion,

in particular this supplementary note from http://www.daviddeley.com/autohotkey/parameters/parameters.htm:

  • And here's the missing undocumented rule:
    If a closing " is followed immediately by another ", the 2nd " is accepted literally and added to the parameter.

along with the algorithm given on the same page:

5.10  The Microsoft C/C++ Command Line Parameter Parsing Algorithm
The following algorithm was reverse engineered by disassembling a small C program compiled using Microsoft Visual C++ and examining the disassembled code:

1. Parse off parameter 0 (the program filename)
    * The entire parameter may be enclosed in double quotes (it handles double quoted parts)
      (Double quotes are necessary if there are any spaces or tabs in the parameter)
    * There is no special processing of backslashes (\)

2. Parse off next parameter:
    a. Skip over multiple spaces/tabs between parameters
      LOOP
    b. Count the backslashes (\). Let m = number of backslashes. (m may be zero.)
    c. IF next character following m backslashes is a double quote:
           If m is even (or zero)
                if currently in a double quoted part
                   IF next character is also a "
                        move to next character (the 2nd ". This character will be added to the parameter.)
                   ELSE
                        set flag to not add this " character to the parameter
                   ENDIF
                    toggle double quoted part flag
               else
                    set flag to not add this " character to the parameter
               endif
           Endif
            m = m/2 (floor divide e.g. 0/2=0, 1/2=0, 2/2=1, 3/2=1, 4/2=2, 5/2=2, etc.)
       ENDIF
    d. add m backslashes
    e. add this character to our parameter
      ENDLOOP
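To make the steps above concrete, here is a minimal C++ sketch of the quoted algorithm; parseCommandLine and checkConsecutiveQuotes are hypothetical names, and this is an illustration rather than Microsoft's runtime code. Two assumptions go beyond the pseudocode as transcribed: the quoted-part flag is toggled for every double quote preceded by an even number of backslashes (the indentation above puts the toggle under the in-quote branch only, which would make it impossible ever to enter a quoted part), and a parameter ends at end of input or at a space or tab outside a quoted part. Other versions of the Microsoft runtime may treat consecutive quotes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper following the reverse-engineered algorithm above.
// Returns argv[0..] for a complete command line, program name included.
std::vector<std::string> parseCommandLine(const std::string& cmd) {
    std::vector<std::string> argv;
    const std::size_t n = cmd.size();
    std::size_t i = 0;
    auto isBlank = [](char c) { return c == ' ' || c == '\t'; };

    // Step 1: parameter 0. Double quotes group text and are dropped;
    // backslashes get no special treatment.
    std::string prog;
    bool inQuotes = false;
    for (; i < n && (inQuotes || !isBlank(cmd[i])); ++i) {
        if (cmd[i] == '"') inQuotes = !inQuotes;
        else prog += cmd[i];
    }
    argv.push_back(prog);

    // Step 2: the remaining parameters.
    for (;;) {
        while (i < n && isBlank(cmd[i])) ++i;      // 2a: skip blanks
        if (i >= n) break;

        std::string arg;
        inQuotes = false;
        for (;;) {
            bool copyChar = true;
            std::size_t m = 0;                     // 2b: count backslashes
            while (i < n && cmd[i] == '\\') { ++m; ++i; }

            if (i < n && cmd[i] == '"') {          // 2c
                if (m % 2 == 0) {
                    if (inQuotes && i + 1 < n && cmd[i + 1] == '"')
                        ++i;                       // "" inside quotes: keep the 2nd quote
                    else
                        copyChar = false;          // drop this quote
                    inQuotes = !inQuotes;          // assumption: toggle in both cases
                }
                m /= 2;                            // floor division
            }
            arg.append(m, '\\');                   // 2d: add m backslashes

            // Assumption: a parameter ends at end of input or at a blank
            // outside a quoted part (the pseudocode leaves this implicit).
            if (i >= n || (!inQuotes && isBlank(cmd[i]))) break;
            if (copyChar) arg += cmd[i];           // 2e: add this character
            ++i;
        }
        argv.push_back(arg);
    }
    return argv;
}

// The undocumented rule: a quote right after a closing quote is kept literally.
void checkConsecutiveQuotes() {
    using V = std::vector<std::string>;
    assert((parseCommandLine("prog \"a\"\"b\" c") == V{"prog", "a\"b c"}));
}
```

With the command line prog "a""b" c, the sketch returns prog and a"b c: the second quote closes the quoted part and the third is kept literally, so the fourth quote opens a new quoted part and the space and c stay in the same argument.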
