winafl source code analysis

Date: 2019-11-18

Tags: winafl, source code analysis

Preface

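winafl is the Windows port of afl. It uses dynamorio to measure code coverage and shared memory to let the fuzzer see the coverage produced by each test case. This article covers the parts of winafl that differ from afl; afl's mutation strategies are not discussed. For an analysis of afl itself, see: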

https://paper.seebug.org/496/#arithmetic

Source code analysis

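winafl consists mainly of two parts, afl-fuzz.c and winafl.c. The former is the fuzzer's main program; the latter is the source of the dynamorio client (plugin) that collects runtime information from the target.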

afl-fuzz

main

The entry point of winafl is afl-fuzz.c. The core of its main function is as follows:

int main(int argc, char** argv) {

  // Load the post-processing module that fixes up mutated data
  setup_post();
  if (!in_bitmap) memset(virgin_bits, 255, MAP_SIZE); // MAP_SIZE --> 0x00010000
  setup_shm();  // set up shared memory
  init_count_class16();
  
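  setup_dirs_fds(); // set up the directories and files used during fuzzing
  read_testcases();  // read the test cases into the queue

  // First run every test case once and record the results in the queue
  perform_dry_run(use_argv);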

  // Main fuzzing loop
  while (1) {
    u8 skipped_fuzz;
    // Each round, re-evaluate the queue (mark favored entries); the entry to fuzz is queue_cur
    cull_queue();

    // Fuzz the current test case
    skipped_fuzz = fuzz_one(use_argv);

    queue_cur = queue_cur->next;
    current_entry++;
  }
}

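First, main sets up the state the fuzzing run needs, such as shared memory and the input and output locations.
Next, perform_dry_run runs the target on every supplied test case and records the coverage of each run.
Then the fuzzing loop begins: each round takes the current entry from the queue and hands it to fuzz_one for fuzzing.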

post_handler

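The most important call in main is fuzz_one, which fuzzes a single queue entry. It implements afl's mutation strategies; once a strategy has produced a test case, fuzz_one calls common_fuzz_stuff to have the target execute it. The core of common_fuzz_stuff is as follows: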

static u8 common_fuzz_stuff(char** argv, u8* out_buf, u32 len) {

  u8 fault;

  // If a data fix-up function was supplied, call it
  if (post_handler) {

    out_buf = post_handler(out_buf, &len);
    if (!out_buf || !len) return 0;

  }

  write_to_testcase(out_buf, len);

  // Have the target execute the test case and return the result
  fault = run_target(argv, exec_tmout);

  // ... (rest of the function omitted in this excerpt)
}

The function first checks whether a post_handler was supplied. If so, the mutated test data is passed through it. The post_handler function pointer is set in setup_post.

static void setup_post(void) {
    HMODULE dh;
    u8* fn = getenv("AFL_POST_LIBRARY"); // path of the dll containing post_handler, taken from the environment
    u32 tlen = 6;

    if (!fn) return;
    ACTF("Loading postprocessor from '%s'...", fn);
    dh = LoadLibraryA(fn);
    if (!dh) FATAL("%s", dlerror());
    post_handler = (u8* (*)(u8*,u32*))GetProcAddress(dh, "afl_postprocess"); // look up the function's address in the loaded dll
    if (!post_handler) FATAL("Symbol 'afl_postprocess' not found.");

    /* Do a quick test. It's better to segfault now than later =) */
    post_handler("hello", &tlen);
    OKF("Postprocessor installed successfully.");
}

setup_post reads the path of the dll from the AFL_POST_LIBRARY environment variable and sets post_handler to the address of the dll's afl_postprocess function. It is called once at the start of the fuzzer run. post_handler is declared as follows:

static u8* (*post_handler)(u8* buf, u32* len);
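Parameters: buf is the address of the input buffer; len points to the length of the input buffer
Return value: the address of the fixed-up buffer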

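So afl_postprocess takes two arguments and returns a pointer to the fixed-up data. The post_handler mechanism is meant for simple format fix-ups of test data, such as recomputing a checksum or a file-length field.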
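As a rough illustration of such a fix-up, here is a minimal sketch of an afl_postprocess for a made-up format whose last byte is the sum of all preceding bytes modulo 256. The format, the POST_API macro, MAX_CASE_SIZE and fixed_buf are assumptions for this example only; the function name and the (buf, len) signature are the ones shown above, with u8 and u32 written as uint8_t and uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef _WIN32
#define POST_API __declspec(dllexport)   /* export the symbol from the dll */
#else
#define POST_API
#endif

/* Hypothetical format: payload bytes followed by one checksum byte holding
   the sum of all payload bytes modulo 256. */
#define MAX_CASE_SIZE (1u << 20)

static uint8_t fixed_buf[MAX_CASE_SIZE];

POST_API uint8_t *afl_postprocess(uint8_t *buf, uint32_t *len) {
    assert(buf != NULL && len != NULL);

    /* Too short to hold a checksum, or too large for the scratch buffer:
       returning NULL makes common_fuzz_stuff skip this test case. */
    if (*len < 2 || *len > MAX_CASE_SIZE) return NULL;

    /* Work on a copy. setup_post probes the handler with a string literal,
       so writing into buf itself would not be safe. */
    memcpy(fixed_buf, buf, *len);

    uint8_t sum = 0;
    for (uint32_t i = 0; i + 1 < *len; i++) sum = (uint8_t)(sum + fixed_buf[i]);
    fixed_buf[*len - 1] = sum;

    return fixed_buf;
}
```

Built as a dll and pointed to by AFL_POST_LIBRARY, a function of this shape would be picked up by setup_post. The length is left unchanged here, but since len is passed by pointer a handler could also update it.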

run_target

After the post_handler step, write_to_testcase writes the test case to a file. By default this is .cur_input (the user can choose another file with -f).

out_file = alloc_printf("%s\\.cur_input", out_dir);

Then run_target has the target process the test case. Its core is as follows:

static u8 run_target(char** argv, u32 timeout) {

  // Do not create a new process if the existing one is still alive
  if(!is_child_running()) {
    destroy_target_process(0);
    create_target_process(argv);  // create the process and monitor it with dynamorio
    fuzz_iterations_current = 0;
  }

  if (custom_dll_defined)
      process_test_case_into_dll(fuzz_iterations_current);

  child_timed_out = 0;
  memset(trace_bits, 0, MAP_SIZE);

  result = ReadCommandFromPipe(timeout);
  if (result == 'K')
  {
      //a workaround for first cycle in app persistent mode
      result = ReadCommandFromPipe(timeout);
  }

  // Once the winafl.dll instrumentation is ready, it sends P over the named pipe
  if (result != 'P')
  {
      FATAL("Unexpected result from pipe! expected 'P', instead received '%c'\n", result);
  }

  // Let the winafl.dll side continue executing
  WriteCommandToPipe('F');

  result = ReadCommandFromPipe(timeout); 
  // Receiving K means the test case ran normally
  if (result == 'K') return FAULT_NONE;

  if (result == 'C') {
      destroy_target_process(2000);
      return FAULT_CRASH;
  }

  destroy_target_process(0);
  return FAULT_TMOUT;
}

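run_target first checks whether the target process is still running and creates a new one only if it is not. To improve throughput, winafl uses dynamorio to make the target run the chosen function over and over, so a new process is not needed for every execution.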

Then, if test cases are to be delivered in a user-defined way, for example when the target is a network application, process_test_case_into_dll sends the test case.

static int process_test_case_into_dll(int fuzz_iterations)
{

  char *buf = get_test_case(&fsize);

  result = dll_run_ptr(buf, fsize, fuzz_iterations); /* caller should copy the buffer */

  free(buf);

  return 1;
}

When the user supplies a dll path with -l, winafl sets dll_run_ptr and the related function pointers in load_custom_library:

void load_custom_library(const char *libname)
{
  int result = 0;
  HMODULE hLib = LoadLibraryA(libname);
  dll_init_ptr = (dll_init)GetProcAddress(hLib, "_dll_init@0");
  
  dll_run_ptr = (dll_run)GetProcAddress(hLib, "_dll_run@12");
}

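winafl itself provides two examples, one for a tcp server and one for a tcp client. dll_run can also implement a protocol's encryption and decryption, which makes it possible to fuzz protocols whose data is encrypted.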
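To make the idea concrete, the following is a minimal sketch of a custom delivery dll. It assumes the (char *, long, int) argument shape implied by the call dll_run_ptr(buf, fsize, fuzz_iterations) above. The XOR "encryption", XOR_KEY, MAX_PACKET, the CUSTOM_API and CUSTOM_CALL macros and the send_to_target stand-in are all hypothetical; a real dll would replace send_to_target with an actual transport such as a TCP send.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#define CUSTOM_API __declspec(dllexport)
#define CUSTOM_CALL APIENTRY             /* stdcall, matching the decorated names looked up above */
#else
#define CUSTOM_API
#define CUSTOM_CALL
#endif

#define XOR_KEY 0x5A                     /* hypothetical protocol key */
#define MAX_PACKET 4096

static unsigned char last_packet[MAX_PACKET];
static size_t last_packet_len;

/* Stand-in for the real transport; it only remembers the last packet. */
static int send_to_target(const unsigned char *data, size_t len) {
    memcpy(last_packet, data, len);
    last_packet_len = len;
    return 1;
}

CUSTOM_API int CUSTOM_CALL dll_init(void) {
    last_packet_len = 0;
    return 1;                            /* non-zero means success (assumed convention) */
}

CUSTOM_API int CUSTOM_CALL dll_run(char *data, long size, int fuzz_iterations) {
    unsigned char packet[MAX_PACKET];
    (void)fuzz_iterations;
    assert(data != NULL);
    if (size <= 0 || size > MAX_PACKET) return 0;

    /* afl-fuzz frees its buffer after dll_run returns, so work on a copy. */
    for (long i = 0; i < size; i++)
        packet[i] = (unsigned char)data[i] ^ XOR_KEY;

    return send_to_target(packet, (size_t)size);
}
```

The mutation still happens on the plaintext test case; only the bytes put on the wire are transformed, which is what lets coverage-guided mutation work against an encrypted protocol.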

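Once everything is ready, winafl writes F to the named pipe, telling winafl.dll (the dynamorio client in winafl that collects code coverage) to run the test case and record coverage. After the target function finishes, winafl.dll reports back over the named pipe: K means the test case did not trigger an exception, and C means it did.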
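The handshake can be modelled without any Windows API. The sketch below replaces the named pipe with a scripted string of bytes and maps the exchange to a result code in the same order as run_target. The fake_pipe type, read_cmd and run_once are hypothetical names, the FAULT_ values are local to this example, and treating an empty read as a timeout is an assumption.

```c
#include <assert.h>
#include <stddef.h>

enum { FAULT_NONE, FAULT_TMOUT, FAULT_CRASH, FAULT_ERROR };

/* Scripted stand-in for the named pipe: the bytes the instrumented target
   would send, consumed one at a time. Running out of bytes models a
   read that timed out. */
typedef struct {
    const char *script;
    size_t pos;
    char sent;                          /* last command written by the fuzzer */
} fake_pipe;

static char read_cmd(fake_pipe *p) {
    char c = p->script[p->pos];
    if (c != '\0') p->pos++;
    return c;
}

static int run_once(fake_pipe *p) {
    assert(p != NULL && p->script != NULL);

    char r = read_cmd(p);
    if (r == 'K') r = read_cmd(p);      /* extra K seen on the first in_app cycle */
    if (r != 'P') return FAULT_ERROR;   /* run_target calls FATAL at this point */

    p->sent = 'F';                      /* tell the client to run the test case */

    r = read_cmd(p);
    if (r == 'K') return FAULT_NONE;    /* target function returned normally */
    if (r == 'C') return FAULT_CRASH;   /* client reported an exception */
    return FAULT_TMOUT;                 /* anything else, including no reply */
}
```

A script of "PK" is a clean run, "KPC" is a crash preceded by the leftover K, and "P" with no further reply counts as a timeout.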

After run_target returns, winafl evaluates the test case's coverage and updates the queue accordingly.
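The evaluation step can be pictured with a simplified sketch. It relies only on what main shows, namely that virgin_bits starts as all 0xFF bytes and that the trace map has MAP_SIZE entries. The function name has_new_coverage is hypothetical, and the real fuzzer also buckets hit counts before comparing, which is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536

/* Returns 1 when trace contains at least one bit that is still set in
   virgin, i.e. coverage not seen before. Those bits are then cleared from
   virgin so the same coverage is not reported twice. */
static int has_new_coverage(uint8_t *virgin, const uint8_t *trace) {
    int found = 0;
    assert(virgin != NULL && trace != NULL);

    for (size_t i = 0; i < MAP_SIZE; i++) {
        if (trace[i] & virgin[i]) {
            found = 1;
            virgin[i] &= (uint8_t)~trace[i];
        }
    }
    return found;
}
```

A test case for which such a check succeeds is the kind that gets added to the queue; one that only repeats known coverage is discarded.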

winafl.c

This file contains winafl's dynamorio client. It implements coverage collection and several mechanisms that speed up fuzzing.

dr_client_main

The entry point of this file is dr_client_main:

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{

    drmgr_init();
    drx_init();
    drreg_init(&ops);
    drwrap_init();
    
    options_init(id, argc, argv);

    dr_register_exit_event(event_exit);

    drmgr_register_exception_event(onexception);

    if(options.coverage_kind == COVERAGE_BB) {
        drmgr_register_bb_instrumentation_event(NULL, instrument_bb_coverage, NULL);
    } else if(options.coverage_kind == COVERAGE_EDGE) {
        drmgr_register_bb_instrumentation_event(NULL, instrument_edge_coverage, NULL);
    }

    drmgr_register_module_load_event(event_module_load);
    drmgr_register_module_unload_event(event_module_unload);
    dr_register_nudge_event(event_nudge, id);

    client_id = id;
    if (options.nudge_kills)
        drx_register_soft_kills(event_soft_kill);

    if(options.thread_coverage) {
        winafl_data.fake_afl_area = (unsigned char *)dr_global_alloc(MAP_SIZE);
    }

    if(!options.debug_mode) {
        setup_pipe();
        setup_shmem();
    } else {
        winafl_data.afl_area = (unsigned char *)dr_global_alloc(MAP_SIZE);
    }

    if(options.coverage_kind == COVERAGE_EDGE || options.thread_coverage || options.dr_persist_cache) {
        winafl_tls_field = drmgr_register_tls_field();
        if(winafl_tls_field == -1) {
            DR_ASSERT_MSG(false, "error reserving TLS field");
        }
        drmgr_register_thread_init_event(event_thread_init);
        drmgr_register_thread_exit_event(event_thread_exit);
    }

    event_init();
}

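The main logic of the function is:

It first initializes the dynamorio extensions, then uses the user's options to choose between basic-block coverage (instrument_bb_coverage) and edge coverage (instrument_edge_coverage).
It then registers callbacks for several events.
Finally it sets up the named pipe and shared memory used to communicate with afl-fuzz.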

Coverage recording

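drmgr_register_bb_instrumentation_event registers a callback that dynamorio invokes when it builds each basic block. The callback inserts instrumentation that runs every time the block executes, and that inserted code is where coverage is recorded. The details are as follows: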

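How instrument_bb_coverage records coverage

// Compute the basic block's offset within its module and reduce it modulo MAP_SIZE so it indexes the coverage map
offset = (uint)(start_pc - mod_entry->data->start);
offset &= MAP_SIZE - 1; // map the address into the coverage map
afl_map[offset]++;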

How instrument_edge_coverage records coverage

offset = (uint)(start_pc - mod_entry->data->start);
offset &= MAP_SIZE - 1; // map the address into the coverage map
afl_map[pre_offset ^ offset]++;
pre_offset = offset >> 1;

afl_map is the memory region shared with afl-fuzz; afl-fuzz and winafl.dll use it to pass coverage information.
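The two schemes can be compared side by side with a small simulation that follows the pseudocode above. The coverage_state type and the cov_reset, record_bb and record_edge names are hypothetical; in the real client the equivalent work is done by instrumentation inserted into each basic block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536

typedef struct {
    uint8_t map[MAP_SIZE];
    uint32_t prev_offset;               /* used by edge mode only */
} coverage_state;

static void cov_reset(coverage_state *s) {
    assert(s != NULL);
    memset(s, 0, sizeof *s);
}

/* Basic-block mode: one counter per (masked) block offset. */
static void record_bb(coverage_state *s, uintptr_t block_pc, uintptr_t module_start) {
    uint32_t offset = (uint32_t)(block_pc - module_start) & (MAP_SIZE - 1);
    s->map[offset]++;
}

/* Edge mode: the counter index mixes the previous block with the current one. */
static void record_edge(coverage_state *s, uintptr_t block_pc, uintptr_t module_start) {
    uint32_t offset = (uint32_t)(block_pc - module_start) & (MAP_SIZE - 1);
    s->map[s->prev_offset ^ offset]++;
    s->prev_offset = offset >> 1;       /* the shift keeps A->B distinct from B->A */
}
```

With blocks A at offset 0x10 and B at offset 0x20, the transition A->B lands on index 0x28 while B->A lands on index 0x00, so edge mode distinguishes the direction of a jump, which basic-block mode cannot.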

Efficiency improvements

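event_module_load is called whenever a module is loaded. Based on the user's options, it wraps the chosen target function with callbacks that speed up fuzzing. The core is as follows: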

static void
event_module_load(void *drcontext, const module_data_t *info, bool loaded)
{

    if(options.fuzz_module[0]) {
        if(strcmp(module_name, options.fuzz_module) == 0) {
            if(options.fuzz_offset) {
                to_wrap = info->start + options.fuzz_offset;
            } else {
                //first try exported symbols
                to_wrap = (app_pc)dr_get_proc_address(info->handle, options.fuzz_method);
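                if(!to_wrap) {
                    // (this excerpt omits a fallback symbol lookup here; the assert below fires only if that also fails)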
                    DR_ASSERT_MSG(to_wrap, "Can't find specified method in fuzz_module");                
                    to_wrap += (size_t)info->start;
                }
            }
            if (options.persistence_mode == native_mode)
            {
                drwrap_wrap_ex(to_wrap, pre_fuzz_handler, post_fuzz_handler, NULL, options.callconv);
            }
            if (options.persistence_mode == in_app)
            {
                drwrap_wrap_ex(to_wrap, pre_loop_start_handler, NULL, NULL, options.callconv);
            }
        }
    }

    module_table_load(module_table, info);
}

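Once target_method in target_module has been found, callbacks are attached to it in one of two ways, depending on whether persistence mode is enabled. It is disabled by default. Persistence mode requires the target to contain a loop that keeps receiving data, such as a TCP server that repeatedly accepts client requests and data. The two cases are analyzed below.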

Without persistence

In this case the following call is made:

drwrap_wrap_ex(to_wrap, pre_fuzz_handler, post_fuzz_handler, NULL, options.callconv);

This makes pre_fuzz_handler run before the target function to_wrap executes and post_fuzz_handler run after it returns.

Each is analyzed below.

static void
pre_fuzz_handler(void *wrapcxt, INOUT void **user_data)
{
    char command = 0;
    int i;
    void *drcontext;

    app_pc target_to_fuzz = drwrap_get_func(wrapcxt);
    dr_mcontext_t *mc = drwrap_get_mcontext_ex(wrapcxt, DR_MC_ALL);
    drcontext = drwrap_get_drcontext(wrapcxt);

    // Save the target function's stack pointer and pc so execution can return to this state after the function has run
    fuzz_target.xsp = mc->xsp;
    fuzz_target.func_pc = target_to_fuzz;

    if(!options.debug_mode) {
        WriteCommandToPipe('P');
        command = ReadCommandFromPipe();

        // Wait for afl-fuzz to send F; on F, start fuzzing
        if(command != 'F') {
            if(command == 'Q') {
                dr_exit_process(0);
            } else {
                DR_ASSERT_MSG(false, "unrecognized command received over pipe");
            }
        }
    } else {
        debug_data.pre_hanlder_called++;
        dr_fprintf(winafl_data.log, "In pre_fuzz_handler\n");
    }

    //save or restore arguments: saved on the first entry, written back on every later one
    if (!options.no_loop) {
        if (fuzz_target.iteration == 0) {
            for (i = 0; i < options.num_fuz_args; i++)
                options.func_args[i] = drwrap_get_arg(wrapcxt, i);
        } else {
            for (i = 0; i < options.num_fuz_args; i++)
                drwrap_set_arg(wrapcxt, i, options.func_args[i]);
        }
    }

    memset(winafl_data.afl_area, 0, MAP_SIZE);

    // Reset the coverage state kept in tls; it is used when recording edge coverage
    if(options.coverage_kind == COVERAGE_EDGE || options.thread_coverage) {
        void **thread_data = (void **)drmgr_get_tls_field(drcontext, winafl_tls_field);
        thread_data[0] = 0;
        thread_data[1] = winafl_data.afl_area;
    }
}

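pre_fuzz_handler first saves some context, such as register state, then sends P to afl-fuzz over the named pipe to signal that it is ready to run a test case. It waits until afl-fuzz sends F and then continues.
On the first call it saves the function's arguments; on later calls it restores the saved arguments.
It then clears the shared memory region that holds the coverage data.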

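post_fuzz_handler then reports the outcome to afl-fuzz and, depending on the options, restores the saved context so the target function runs again. This avoids creating a new process for every execution and makes fuzzing faster.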

static void
post_fuzz_handler(void *wrapcxt, void *user_data)
{
    dr_mcontext_t *mc;
    mc = drwrap_get_mcontext(wrapcxt);

    if(!options.debug_mode) {
        WriteCommandToPipe('K');  // after a normal run, send K to the fuzzer
    } else {
        debug_data.post_handler_called++;
        dr_fprintf(winafl_data.log, "In post_fuzz_handler\n");
    }

    /* 
        We don't need to reload context in case of network-based fuzzing. 
        For network fuzzing one execution is enough, so return here.
    */
    if (options.no_loop)
        return;

    fuzz_target.iteration++;
    if(fuzz_target.iteration == options.fuzz_iterations) {
        dr_exit_process(0);
    }

    // Restore the stack pointer and pc to the start of the function, ready for the next run
    mc->xsp = fuzz_target.xsp;
    mc->pc = fuzz_target.func_pc;
    drwrap_redirect_execution(wrapcxt);
}
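The save-and-restore loop formed by the two handlers can be imitated in plain C. The sketch below is a loose model: an ordinary do/while loop stands in for drwrap_redirect_execution, and loop_state, pre_call, post_call, run_in_memory and clobbering_target are all hypothetical names. In the real client, reaching the iteration limit ends the process with dr_exit_process rather than returning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAP_SIZE 65536
#define MAX_ARGS 4

typedef int (*target_fn)(void **args);

typedef struct {
    void *saved_args[MAX_ARGS];
    int num_args;
    int iteration;
    int max_iterations;
    unsigned char afl_area[MAP_SIZE];
} loop_state;

/* Role of pre_fuzz_handler: save the arguments on the first entry, restore
   them on every later entry, then clear the coverage map. */
static void pre_call(loop_state *st, void **args) {
    size_t bytes = sizeof(void *) * (size_t)st->num_args;
    if (st->iteration == 0) memcpy(st->saved_args, args, bytes);
    else                    memcpy(args, st->saved_args, bytes);
    memset(st->afl_area, 0, MAP_SIZE);
}

/* Role of post_fuzz_handler: count the iteration and decide whether to loop. */
static int post_call(loop_state *st) {
    st->iteration++;
    return st->iteration < st->max_iterations;
}

static int run_in_memory(loop_state *st, target_fn target, void **args) {
    int calls = 0;
    assert(st != NULL && target != NULL);
    assert(st->num_args >= 0 && st->num_args <= MAX_ARGS);
    do {
        pre_call(st, args);
        target(args);
        calls++;
    } while (post_call(st));
    return calls;
}

/* Example target that destroys its own argument, as real code may do. */
static int times_arg_intact;
static int clobbering_target(void **args) {
    if (args[0] != NULL) times_arg_intact++;
    args[0] = NULL;
    return 0;
}
```

Even though clobbering_target wipes its argument on every call, each iteration still sees the original value, which is the point of restoring the saved arguments before re-entering the target function.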

With persistence

This mode should be used when fuzzing network applications:

-persistence_mode in_app

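In this mode the target function is no longer wrapped with pre_fuzz_handler and post_fuzz_handler. Instead, coverage is simply cleared each time execution reaches the target function, because the program itself keeps calling it.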

/* Simply reset the afl map on each entry; this mode suits programs that already loop on their own */
static void
pre_loop_start_handler(void *wrapcxt, INOUT void **user_data)
{
    void *drcontext = drwrap_get_drcontext(wrapcxt);

    if (!options.debug_mode) {
        //let server know we finished a cycle, redundunt on first cycle.
        WriteCommandToPipe('K');

        if (fuzz_target.iteration == options.fuzz_iterations) {
            dr_exit_process(0);
        }
        fuzz_target.iteration++;

        //let server know we are starting a new cycle
        WriteCommandToPipe('P'); 

        //wait for server acknowledgement for cycle start
        char command = ReadCommandFromPipe(); 

        if (command != 'F') {
            if (command == 'Q') {
                dr_exit_process(0);
            }
            else {
                char errorMessage[] = "unrecognized command received over pipe: ";
                errorMessage[sizeof(errorMessage)-2] = command;
                DR_ASSERT_MSG(false, errorMessage);
            }
        }
    }
    else {
        debug_data.pre_hanlder_called++;
        dr_fprintf(winafl_data.log, "In pre_loop_start_handler\n");
    }

    memset(winafl_data.afl_area, 0, MAP_SIZE);

    if (options.coverage_kind == COVERAGE_EDGE || options.thread_coverage) {
        void **thread_data = (void **)drmgr_get_tls_field(drcontext, winafl_tls_field);
        thread_data[0] = 0;
        thread_data[1] = winafl_data.afl_area;
    }
}

Summary

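From afl-fuzz.c we learned that winafl offers two interesting features, data fix-up and custom data delivery. Both help when fuzzing unconventional targets such as network protocols and applications that encrypt their data. From winafl.c we saw how dynamorio can be used to record a program's coverage and how winafl speeds things up by running the target function many times in memory. We also saw that when a program already calls the target function in a loop, as some network services do, persistence mode can be used to fuzz it.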