SHELL (bash) Script Programming, Part 7: A Brief Look at the Source Code

This article briefly analyzes the bash source code (version 4.2.46(1)-release).

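Data Structures

bash is written in C, and its source code relies on only a few data structures: arrays, singly linked lists, doubly linked lists, and hash tables. Almost every structure in bash is built from these basic ones.

The most important structures are defined in the header file command.h in the root directory of the source tree.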

Words

The data structure that bash uses to pass information between stages and to process units of data is WORD_DESC:

typedef struct word_desc {
  char *word;       /* Zero terminated string. */
  int flags;        /* Flags associated with this word. */
} WORD_DESC;

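WORD_DESC represents a word. The character pointer word points to a \0-terminated string, and the integer member flags defines the type of the word.
The current source defines more than twenty word types. For example, W_HASDOLLAR means the word contains the expansion character $, W_ASSIGNMENT means the word is an assignment statement, and W_GLOBEXP means the word is the result of pathname expansion (globbing).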
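
The way flags works as a bitmask can be sketched as follows. This is only an illustration: the DEMO_W_* values and the helper demo_set_flags() are hypothetical names invented here, and the real W_* constants and the code that sets them are in the bash source.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Same layout as the WORD_DESC shown above. */
typedef struct word_desc {
  char *word;   /* Zero terminated string. */
  int flags;    /* Flags associated with this word. */
} WORD_DESC;

/* Illustrative bit values only; the real W_* constants are in command.h. */
#define DEMO_W_HASDOLLAR  0x01  /* word contains '$' */
#define DEMO_W_ASSIGNMENT 0x02  /* word has the shape name=value */

/* Hypothetical helper: set w->flags by inspecting the text of the word. */
static void demo_set_flags(WORD_DESC *w)
{
  const char *text, *eq, *p;

  assert(w != NULL && w->word != NULL);
  text = w->word;
  w->flags = 0;

  if (strchr(text, '$') != NULL)
    w->flags |= DEMO_W_HASDOLLAR;

  /* name=value: the name starts with a letter or '_' and continues with
     letters, digits or '_' up to the first '='. */
  eq = strchr(text, '=');
  if (eq != NULL && (isalpha((unsigned char)text[0]) || text[0] == '_')) {
    for (p = text + 1; p < eq; p++)
      if (!isalnum((unsigned char)*p) && *p != '_')
        break;
    if (p == eq)
      w->flags |= DEMO_W_ASSIGNMENT;
  }
}
```

A caller can then test a single property with a bitwise AND, for example `w.flags & DEMO_W_HASDOLLAR`, which is why one integer is enough to carry many independent word types.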

Words are combined into a simple linked list, WORD_LIST:

typedef struct word_list {
  struct word_list *next;
  WORD_DESC *word;
} WORD_LIST;

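WORD_LIST is everywhere in the shell. A simple command is a word list, the result of an expansion is also a word list, and the arguments to a builtin command are a word list as well.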
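
As a rough sketch of how such a list is built and walked, the code below turns an array of strings into a WORD_LIST and counts its elements. The demo_* helpers are hypothetical and written for this example; they are not the functions bash itself uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct word_desc {
  char *word;
  int flags;
} WORD_DESC;

typedef struct word_list {
  struct word_list *next;
  WORD_DESC *word;
} WORD_LIST;

/* Hypothetical helper: allocate a WORD_DESC holding a copy of text. */
static WORD_DESC *demo_make_word(const char *text)
{
  WORD_DESC *w;

  assert(text != NULL);
  w = malloc(sizeof *w);
  if (w == NULL)
    abort();
  w->word = malloc(strlen(text) + 1);
  if (w->word == NULL)
    abort();
  strcpy(w->word, text);
  w->flags = 0;
  return w;
}

/* Hypothetical helper: link word in front of rest and return the new head. */
static WORD_LIST *demo_push_word(WORD_DESC *word, WORD_LIST *rest)
{
  WORD_LIST *node = malloc(sizeof *node);
  if (node == NULL)
    abort();
  node->word = word;
  node->next = rest;
  return node;
}

/* Build a list in argument order by pushing from the last text to the first. */
static WORD_LIST *demo_list_from_array(const char **texts, size_t n)
{
  WORD_LIST *list = NULL;
  while (n > 0)
    list = demo_push_word(demo_make_word(texts[--n]), list);
  return list;
}

/* Walk the next pointers and count the words. */
static size_t demo_list_length(const WORD_LIST *list)
{
  size_t n = 0;
  for (; list != NULL; list = list->next)
    n++;
  return n;
}

/* Free every node together with the word it owns. */
static void demo_free_list(WORD_LIST *list)
{
  while (list != NULL) {
    WORD_LIST *next = list->next;
    free(list->word->word);
    free(list->word);
    free(list);
    list = next;
  }
}
```

For the command line `ls -l /tmp`, calling demo_list_from_array() on the three strings yields a list whose head word is `ls` and whose last node has a NULL next pointer.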

Redirections

The REDIRECT structure describes the redirection list of a command. It contains a next pointer to the following REDIRECT object:

typedef struct redirect {
  struct redirect *next;    /* Next element, or NULL. */
  REDIRECTEE redirector;    /* Descriptor or varname to be redirected. */
  int rflags;           /* Private flags for this redirection */
  int flags;            /* Flag value for `open'. */
  enum r_instruction  instruction; /* What to do with the information. */
  REDIRECTEE redirectee;    /* File descriptor or filename */
  char *here_doc_eof;       /* The word that appeared in <<foo. */
} REDIRECT;

The integer member flags defines how the target file is opened.
The redirection descriptor redirector is of type REDIRECTEE, a union:

typedef union {
  int dest;         /* Place to redirect REDIRECTOR to, or ... */
  WORD_DESC *filename;      /* filename to redirect to. */
} REDIRECTEE;

instruction is a variable of the enumeration type r_instruction, which defines the kind of redirection:

enum r_instruction {
  r_output_direction, r_input_direction, r_inputa_direction,
  r_appending_to, r_reading_until, r_reading_string,
  r_duplicating_input, r_duplicating_output, r_deblank_reading_until,
  r_close_this, r_err_and_out, r_input_output, r_output_force,
  r_duplicating_input_word, r_duplicating_output_word,
  r_move_input, r_move_output, r_move_input_word, r_move_output_word,
  r_append_err_and_out
};

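In REDIRECTEE, if the redirection type is r_duplicating_input or r_duplicating_output, the integer member dest is used (a negative value indicates an invalid redirection); otherwise the structure pointer member filename is used.
The character pointer member here_doc_eof of REDIRECT holds the delimiter word when the redirection is a here document.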
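
The rule for choosing between the two union members can be sketched like this. The enumeration is a reduced, renamed subset used only for illustration, and demo_uses_dest() and demo_describe_target() are hypothetical helpers, not bash functions.

```c
#include <assert.h>
#include <stdio.h>

typedef struct word_desc {
  char *word;
  int flags;
} WORD_DESC;

typedef union {
  int dest;             /* Place to redirect REDIRECTOR to, or ... */
  WORD_DESC *filename;  /* filename to redirect to. */
} REDIRECTEE;

/* Reduced stand-in for r_instruction, for illustration only. */
enum demo_instruction {
  demo_output_direction,    /* > file */
  demo_duplicating_input,   /* <&N */
  demo_duplicating_output   /* >&N */
};

/* Returns 1 when the target is a descriptor number stored in dest,
   0 when it is a filename. */
static int demo_uses_dest(enum demo_instruction ins)
{
  return ins == demo_duplicating_input || ins == demo_duplicating_output;
}

/* Format the target into buf, reading only the union member that is
   valid for ins. Returns the snprintf() result. */
static int demo_describe_target(enum demo_instruction ins, REDIRECTEE target,
                                char *buf, size_t size)
{
  assert(buf != NULL && size > 0);
  if (demo_uses_dest(ins))
    return snprintf(buf, size, "fd %d", target.dest);
  assert(target.filename != NULL);
  return snprintf(buf, size, "file %s", target.filename->word);
}
```

Because a union stores only one member at a time, the instruction value is the only thing that tells a reader of the structure which member is meaningful.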

Commands

The COMMAND structure describes a bash command. For compound commands, it may contain other commands inside it:

typedef struct command {
  enum command_type type;   /* FOR CASE WHILE IF CONNECTION or SIMPLE. */
  int flags;            /* Flags controlling execution environment. */
  int line;         /* line number the command starts on */
  REDIRECT *redirects;      /* Special redirects for FOR CASE, etc. */
  union {
    struct for_com *For;
    struct case_com *Case;
    struct while_com *While;
    struct if_com *If;
    struct connection *Connection;
    struct simple_com *Simple;
    struct function_def *Function_def;
    struct group_com *Group;
#if defined (SELECT_COMMAND)
    struct select_com *Select;
#endif
#if defined (DPAREN_ARITHMETIC)
    struct arith_com *Arith;
#endif
#if defined (COND_COMMAND)
    struct cond_com *Cond;
#endif
#if defined (ARITH_FOR_COMMAND)
    struct arith_for_com *ArithFor;
#endif
    struct subshell_com *Subshell;
    struct coproc_com *Coproc;
  } value;
} COMMAND;

The enumeration member type defines the command type:

/* Command Types: */
enum command_type { cm_for, cm_case, cm_while, cm_if, cm_simple, cm_select,
            cm_connection, cm_function_def, cm_until, cm_group,
            cm_arith, cm_cond, cm_arith_for, cm_subshell, cm_coproc };

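The integer member flags defines the execution environment of the command, such as whether it runs in a subshell or in the background.
The union member value holds a pointer to the structure for the command's contents; each kind of command has its own structure.
The if command structure: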

/* IF command. */
typedef struct if_com {
  int flags;            /* See description of CMD flags. */
  COMMAND *test;        /* Thing to test. */
  COMMAND *true_case;       /* What to do if the test returned non-zero. */
  COMMAND *false_case;      /* What to do if the test returned zero. */
} IF_COM;

The simple command structure:

typedef struct simple_com {
  int flags;            /* See description of CMD flags. */
  int line;         /* line number the command starts on */
  WORD_LIST *words;     /* The program name, the arguments,
                   variable assignments, etc. */
  REDIRECT *redirects;      /* Redirections to perform. */
} SIMPLE_COM;

The while command structure:

/* WHILE command. */
typedef struct while_com {
  int flags;            /* See description of CMD flags. */
  COMMAND *test;        /* Thing to test. */
  COMMAND *action;      /* Thing to do while test is non-zero. */
} WHILE_COM;

And so on.
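
The combination of a type tag and a union of pointers forms a tree of commands. The sketch below uses a cut-down, renamed version of the structures (all demo_* names are hypothetical) and walks the tree to count the simple commands it contains.

```c
#include <assert.h>
#include <stddef.h>

enum demo_type { demo_if, demo_while, demo_simple };

struct demo_command;

struct demo_if_com     { struct demo_command *test, *true_case, *false_case; };
struct demo_while_com  { struct demo_command *test, *action; };
struct demo_simple_com { const char *name; };

typedef struct demo_command {
  enum demo_type type;              /* selects the valid member of value */
  union {
    struct demo_if_com *If;
    struct demo_while_com *While;
    struct demo_simple_com *Simple;
  } value;
} DEMO_COMMAND;

/* Count the simple commands reachable from c; NULL branches count as 0. */
static int demo_count_simple(const DEMO_COMMAND *c)
{
  if (c == NULL)
    return 0;

  switch (c->type) {
  case demo_simple:
    return 1;
  case demo_if:
    assert(c->value.If != NULL);
    return demo_count_simple(c->value.If->test)
         + demo_count_simple(c->value.If->true_case)
         + demo_count_simple(c->value.If->false_case);
  case demo_while:
    assert(c->value.While != NULL);
    return demo_count_simple(c->value.While->test)
         + demo_count_simple(c->value.While->action);
  }
  return 0;
}
```

A command such as `if a; then b; fi` becomes one demo_if node whose test and true_case point to two demo_simple nodes, with false_case left NULL.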

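Main Flow

Unless stated otherwise, all files mentioned below are in the root directory of the bash source tree.
Executing a line of bash commands takes two major steps: parsing and execution (note that these differ from the parsing and execution described in the previous article).
Parsing produces the command structure used for execution: COMMAND *global_command.
Execution runs the command according to its specific type and handles the result.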

Parsing

The bash entry function main() is in the file shell.c:

int
main (argc, argv, env)
     int argc;
     char **argv, **env;
{
    ....
    shell_initialize ();
    ....
    run_startup_files ();
    ....
    shell_initialized = 1;

    /* Read commands until exit condition. */
    reader_loop ();
    exit_shell (last_command_exit_value);
}

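main() defines some state variables used during shell startup and operation, and initializes the shell according to its arguments: shell_initialize () initializes shell variables and parameters, and run_startup_files () executes the required configuration files (/etc/profile, ~/.bashrc, and so on).

After initialization, control enters the interactive loop function reader_loop() in eval.c. This function keeps reading and executing commands until it reaches EOF.
At this point the call chain is: main()-->reader_loop()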

/* Read and execute commands until EOF is reached.  This assumes that
   the input source has already been initialized. */
int
reader_loop ()
{
    ....
    if (read_command () == 0)
    {
      if (interactive_shell == 0 && read_but_dont_execute)
      {
        ....
      }
      else if (current_command = global_command)
      {
        ....
        execute_command (current_command);
      }
    }
    ....
    return (last_command_exit_value);
}

reader_loop() calls read_command() to obtain the command structure global_command, assigns it to current_command, and passes it to execute_command () for execution.
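
The read-then-execute cycle can be illustrated with a toy loop. Everything here is a simplification under stated assumptions: a "command" is just one line of text, the "executor" only knows true and false, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for global_command and last_command_exit_value. */
static char demo_global_command[256];
static int demo_last_exit_value = 0;

/* Read one line into demo_global_command. Returns 0 on success and
   nonzero at end of input. Lines longer than the buffer are split. */
static int demo_read_command(FILE *in)
{
  if (fgets(demo_global_command, sizeof demo_global_command, in) == NULL)
    return 1;
  demo_global_command[strcspn(demo_global_command, "\n")] = '\0';
  return 0;
}

/* Toy executor: "true" gives 0, "false" gives 1, anything else 127. */
static int demo_execute_command(const char *cmd)
{
  if (strcmp(cmd, "true") == 0)
    return 0;
  if (strcmp(cmd, "false") == 0)
    return 1;
  return 127;
}

/* Read and execute commands until end of input, then return the status
   of the last command that ran. */
static int demo_reader_loop(FILE *in)
{
  assert(in != NULL);
  while (demo_read_command(in) == 0) {
    if (demo_global_command[0] == '\0')
      continue;  /* empty line: nothing was parsed, so nothing to run */
    demo_last_exit_value = demo_execute_command(demo_global_command);
  }
  return demo_last_exit_value;
}
```

Feeding the loop the lines `true`, an empty line, and `false` leaves 1 as the final status, since the empty line is skipped and the last executed command was `false`.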

read_command () calls parse_command (). The call chain is now: main()-->reader_loop()-->read_command()-->parse_command()

/* Read and parse a command, returning the status of the parse.  The command
   is left in the globval variable GLOBAL_COMMAND for use by reader_loop.
   This is where the shell timeout code is executed. */
int
read_command ()
{
    ....
    result = parse_command ();
    ....
    return (result);
}
....
/* Call the YACC-generated parser and return the status of the parse.
   Input is read from the current input stream (bash_input).  yyparse
   leaves the parsed command in the global variable GLOBAL_COMMAND.
   This is where PROMPT_COMMAND is executed. */
int
parse_command ()
{
    ....
    r = yyparse ();

    if (need_here_doc)
      gather_here_documents ();

    return (r);
}

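parse_command() calls the yyparse () function in y.tab.c, and uses gather_here_documents () to handle here-document input redirections.

yyparse () is generated by YACC from parse.y. The function uses a large number of goto statements, so the file is hard to read: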

int
yyparse ()
{
    ....
    yychar = YYLEX;
    ....
    yytoken = YYTRANSLATE (yychar);
    ....
    yyn += yytoken;
    ....
    switch (yyn)
    {
      case 2:
        {
        global_command = (yyvsp[(1) - (2)].command);
        ....
        }
        break;
      case 3:
        {
        global_command = (COMMAND *)NULL;
        ....
        }
        break;
      ....
      case 6:
        { (yyval.word_list) = make_word_list ((yyvsp[(1) - (1)].word), (WORD_LIST *)NULL); }
        break;
      ....
      case 8:
        {
        ....
        redir.filename = (yyvsp[(2) - (2)].word);
        (yyval.redirect) = make_redirection (source, r_output_direction, redir, 0);
        }
        break;
      ....
      case 57:
        { (yyval.command) = make_simple_command ((yyvsp[(1) - (1)].element), (COMMAND *)NULL); }
        break;
      ....
      case 107:
        { (yyval.command) = make_if_command ((yyvsp[(2) - (7)].command), (yyvsp[(4) - (7)].command), (yyvsp[(6) - (7)].command)); }
        break;
      ....
      default: break;
    }
    ....
    return YYID (yyresult);
}

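Inside the function, yylex() is called (via the macro #define YYLEX yylex ()) to obtain a token, from which the integer variable yyn is computed; the value of yyn then determines which command structure is built.

Inside yylex(), read_token() is called to obtain the various types of token, and it in turn calls read_token_word() to obtain the specific word structures, WORD_DESC.

Then, in yyparse(), functions from make_cmd.c are called to assemble the tokens and words obtained by yylex() into a concrete command.

Among them, make_word_list() builds the word list WORD_LIST; make_redirection() builds the redirection list REDIRECT; command_connect() builds the connection between multiple commands on one line according to their logical order; make_simple_command() builds a simple command; and a series of other functions build the other kinds of commands.

The call chain is now: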

main()-->reader_loop()-->read_command()-->parse_command()-->yyparse()-->yylex()-->read_token()-->read_token_word()
                              |                                 |                      |                |
                        current_command  <-------------- global_command <------------token------------word

Execution

In reader_loop(), after read_command() has produced current_command, execute_command() in execute_cmd.c is called to execute the command:

int
execute_command (command)
     COMMAND *command;
{
    ....
    result = execute_command_internal (command, 0, NO_PIPE, NO_PIPE, bitmap);
    ....
    return (result);
}

execute_command() calls execute_command_internal():

int
execute_command_internal (command, asynchronous, pipe_in, pipe_out,fds_to_close)
    ....
{
    ....
    switch (command->type)
    {
        case cm_simple:
        {
          ....
          exec_result = execute_simple_command (command->value.Simple, pipe_in, pipe_out, asynchronous, fds_to_close);
          ....
        }
        break;
        case cm_for:
        ....
        exec_result = execute_for_command (command->value.For);
        break;
        ....
        case cm_cond:
        ....
        exec_result = execute_cond_command (command->value.Cond);
        ....
        break;
        ....
        default: command_error ("execute_command", CMDERR_BADTYPE, command->type, 0);
    }
    ....
    last_command_exit_value = exec_result;
    ....
    return (last_command_exit_value);
}

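In execute_command_internal(), a different execution function is called depending on the type of the command argument, command->type, and the exit status of the command is returned.

The call chain is now: main()-->reader_loop()-->execute_command()-->execute_command_internal()-->execute_xxxx_command()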
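
The dispatch on the command type, with compound commands recursing until a simple command is reached, can be sketched with a toy executor. The structure below is deliberately flattened (no union), a simple command just reports a preset status, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum demo_type { demo_simple, demo_if, demo_and };

typedef struct demo_cmd {
  enum demo_type type;
  int status;                  /* demo_simple: exit status to report */
  int runs;                    /* demo_simple: how often it was executed */
  struct demo_cmd *a, *b, *c;  /* sub-commands; meaning depends on type */
} DEMO_CMD;

/* Execute cmd and return its exit status (0 means success).
   demo_if : a is the test, b the true branch, c the false branch.
   demo_and: a && b, so b runs only when a succeeds. */
static int demo_execute(DEMO_CMD *cmd)
{
  int status;

  if (cmd == NULL)
    return 0;  /* a missing branch does nothing and succeeds */

  switch (cmd->type) {
  case demo_simple:
    cmd->runs++;
    return cmd->status;
  case demo_if:
    assert(cmd->a != NULL);
    if (demo_execute(cmd->a) == 0)
      return demo_execute(cmd->b);
    return demo_execute(cmd->c);
  case demo_and:
    status = demo_execute(cmd->a);
    return status == 0 ? demo_execute(cmd->b) : status;
  }
  return 2;  /* unknown type */
}
```

Every path through the compound cases ends in the demo_simple case, which mirrors how the compound execution functions eventually hand the real work to simple commands.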

Except for execute_arith_command() and execute_cond_command(), these execution functions call execute_command_internal() recursively and eventually reach execute_simple_command():

static int
execute_simple_command (simple_command, pipe_in, pipe_out, async, fds_to_close)
    ....
{
    ....
    if (dofork)
    {
      ....
      if (make_child (savestring (the_printed_command_except_trap), async) == 0)
      {
        ....
      }
      else
      {
        ....
        return (result);
      }
    }
    ....
    words = expand_words (simple_command->words);
    ....
    builtin = find_special_builtin (words->word->word);
    ....
    func = find_function (words->word->word);
    ....
run_builtin:
    ....
    if (func == 0 && builtin == 0)
      builtin = find_shell_builtin (this_command_name);
    ....
    if (builtin || func)
    {
      ....
      result = execute_builtin_or_function(words, builtin, func, simple_command->redirects, fds_to_close, simple_command->flags);
      ....
      goto return_result;
    }
    ....
    result = execute_disk_command (words, simple_command->redirects, command_line, pipe_in, pipe_out, async, fds_to_close, simple_command->flags);

return_result:
    ....
    return (result);
}

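First, for a command that must run in a subshell (such as a command in a pipeline), make_child() in jobs.c is called, which leads to the system calls fork() and execve().

If the command does not need to run in a subshell, the words of the simple command are expanded by functions in subst.c, including expand_words(), expand_word_list_internal(), and others.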

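Next comes the command search, which calls the following functions in order: find_special_builtin() searches the special builtins (in this version of bash they are: break continue : eval exec exit return set unset export readonly shift source . times trap), find_function() searches shell functions, and find_shell_builtin() searches the builtins.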
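
The search order described above can be sketched with plain string tables. This is a minimal illustration that assumes the three lookups always run in that order; the real function has additional conditions that the excerpt elides, and the demo_* names and the function and builtin tables passed in are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_kind {
  demo_special_builtin, demo_function, demo_builtin, demo_disk
};

/* The special builtins listed in the text, NULL terminated. */
static const char *demo_special[] = {
  "break", "continue", ":", "eval", "exec", "exit", "return", "set",
  "unset", "export", "readonly", "shift", "source", ".", "times", "trap",
  NULL
};

/* Return 1 if name appears in the NULL-terminated table. */
static int demo_in_table(const char *name, const char **table)
{
  size_t i;
  for (i = 0; table[i] != NULL; i++)
    if (strcmp(name, table[i]) == 0)
      return 1;
  return 0;
}

/* Classify name: special builtin first, then function, then builtin.
   Anything else is left for the disk-command path. */
static enum demo_kind demo_lookup(const char *name, const char **functions,
                                  const char **builtins)
{
  assert(name != NULL && functions != NULL && builtins != NULL);
  if (demo_in_table(name, demo_special))
    return demo_special_builtin;
  if (demo_in_table(name, functions))
    return demo_function;
  if (demo_in_table(name, builtins))
    return demo_builtin;
  return demo_disk;
}
```

With this ordering, a function named cd shadows the builtin cd, while a function named exit would never be chosen over the special builtin of the same name.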

If the search finds a match, execute_builtin_or_function() is executed; if not, execute_disk_command() is executed:

static int
execute_disk_command (words, redirects, command_line, pipe_in, pipe_out, async, fds_to_close, cmdflags)
    ....
{
    ....
    result = EXECUTION_SUCCESS;
    ....
    command = search_for_command (pathname);
    ....
    pid = make_child (savestring (command_line), async);
    if (pid == 0)
    {
      ....
      if (command == 0)
      {
        ....
        internal_error (_("%s: command not found"), pathname);
        exit (EX_NOTFOUND);
        ....
      }
      ....
      exit (shell_execve (command, args, export_env));
    }
    else
    {
parent_return:
      close_pipes (pipe_in, pipe_out);
      ....
      FREE (command);
      return (result);
    }
}

execute_disk_command() first calls search_for_command() in findcmd.c to search for the command (note that this differs from the command search in execute_simple_command()):

char *
search_for_command (pathname)
    const char *pathname;
{
    ....
    hashed_file = phash_search (pathname);
    ....
    if (hashed_file)
      command = hashed_file;
    else if (absolute_program (pathname))
      command = savestring (pathname);
    else
    {
      ....
      command = find_user_command (pathname);
      ....
    }
    return (command);
}

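The command search starts in the hash cache. If the command name contains a slash /, it is neither searched for in PATH nor cached in the hash table; the name is returned directly.

If the name is not found in the hash cache and contains no slash, find_user_command(), find_user_command_internal(), and related functions continue the search in PATH.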
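
The slash check and the PATH walk can be sketched with POSIX access(). This is a simplified illustration: the hash cache is left out, empty PATH components are skipped, directories are not rejected, and the demo_* functions are hypothetical rather than the ones in findcmd.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* access(), X_OK (POSIX) */

/* Try each colon-separated directory of path in order. On success write
   "dir/name" to out and return 1; otherwise return 0. */
static int demo_find_in_path(const char *name, const char *path,
                             char *out, size_t size)
{
  const char *p = path;

  assert(name != NULL && path != NULL && out != NULL && size > 0);
  while (*p != '\0') {
    size_t len = strcspn(p, ":");
    if (len > 0) {
      int n = snprintf(out, size, "%.*s/%s", (int)len, p, name);
      if (n > 0 && (size_t)n < size && access(out, X_OK) == 0)
        return 1;
    }
    p += len;
    if (*p == ':')
      p++;
  }
  return 0;
}

/* A name containing a slash is used as is; any other name is searched
   for in path. Returns 1 when out holds a usable result. */
static int demo_search_for_command(const char *name, const char *path,
                                   char *out, size_t size)
{
  assert(name != NULL && out != NULL && size > 0);
  if (strchr(name, '/') != NULL) {
    int n = snprintf(out, size, "%s", name);
    return n >= 0 && (size_t)n < size;
  }
  return demo_find_in_path(name, path, out, size);
}
```

On a typical system, searching for sh with a path of `/nonexistent-demo-dir:/bin` skips the first directory and settles on `/bin/sh`, while `./foo` is returned unchanged without touching the path at all.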

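Then execute_disk_command() calls make_child() in jobs.c. make_child() performs the fork() system call and returns the pid. In the child process, execute_disk_command() checks the returned command: if the command was not found, it reports an error and exits; if it was found, it calls shell_execve(), which in turn performs the execve() system call: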

int
shell_execve (command, args, env)
    ....
{
    ....
    execve (command, args, env);
    ....
    i = errno;          /* error from execve() */
    ....
    if (i != ENOEXEC)
    {
      if (file_isdir (command))
        ....
      else if (executable_file (command) == 0)
        ....
      else
        ....
    }
    ....
    return (execute_shell_script (sample, sample_len, command, args, env));
    ....
}

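If execve() fails, the file is examined. When the error is something other than ENOEXEC, the checks on whether the file is a directory or lacks execute permission determine the error to report; otherwise the file is treated as a script and run with execute_shell_script().

At this point the child process exits. The parent process closes the pipes, frees the command path string, and returns to execute_command_internal(), which assigns the result to the global variable last_command_exit_value and returns it.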
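
The fork, execve, and wait pattern behind a disk command can be sketched with plain POSIX calls. This is not how bash structures it (there is no job control, pipe handling, or script fallback here), and demo_run_disk_command() is a hypothetical helper. The value 127 follows the usual shell convention for a command that could not be executed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run the program at path with argv in a child process and wait for it.
   Returns the exit status of the child, 127 if execve() failed,
   128 + signal number if the child was killed by a signal,
   or -1 if fork() or waitpid() failed. */
static int demo_run_disk_command(const char *path, char *const argv[])
{
  pid_t pid;
  int status;

  assert(path != NULL && argv != NULL);

  pid = fork();
  if (pid < 0)
    return -1;

  if (pid == 0) {
    /* Child: replace the process image. execve() returns only on error. */
    execve(path, argv, environ);
    _exit(127);
  }

  /* Parent: wait for the child and translate its wait status. */
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  if (WIFEXITED(status))
    return WEXITSTATUS(status);
  if (WIFSIGNALED(status))
    return 128 + WTERMSIG(status);
  return -1;
}
```

Running `/bin/sh -c "exit 3"` through this helper returns 3 in the parent, which is the same kind of value that ends up in last_command_exit_value.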

The call relationships for the whole flow are:

main()
        |
   reader_loop()       parsing
        |--------------------------->read_command()-->parse_command()-->yyparse()-->yylex()-->read_token()-->read_token_word()
        |                                 |                               |                       |                 |
 execute_command() <-------------- current_command <--------------- global_command <------------token------------word
        |
execute_command_internal()
        |
 execute_xxxx_command()
        |
execute_simple_command()
        |
        |--->expand_words()-->expand_word_list_internal()
        |                                                                  child process
        |------------------------------------->execute_disk_command()------------->shell_execve()-->execve()                
        |                  disk command                  |                |                       |
        |functions and builtins                      make_child()         |                       |FAILED
        |                                                |                |                       |
execute_builtin_or_function()                          fork()----------->pid                      ->execute_shell_script()
                                                                          |
                                                                          --------->return(result)
                                                                            parent process