Exploring the CoreCLR Source (8): How the JIT Works (In-Depth)

Date: 2019-11-29


In the previous article we got a basic understanding of the JIT in CoreCLR.
In this article we analyze the JIT implementation in more detail.

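The JIT implementation lives mainly under https://github.com/dotnet/coreclr/tree/master/src/jit.
The best way to analyze the JIT process of a single function in detail is to read its JitDump.
Viewing a JitDump requires building a Debug version of CoreCLR yourself (the original post links to build instructions for Windows and for Linux).
After building, set the environment variable COMPlus_JitDump=Main, where Main can be replaced with the name of another function, and then run the program with that Debug build of CoreCLR.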

A sample JitDump, with output for both Debug and Release mode, is linked from the original post.

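Next, we walk through each stage of the JIT step by step alongside the code.
The code below is based on CoreCLR 1.1.0 and on x86/x64; newer versions may differ.
(Why 1.1.0? Because I spent half a year reading the JIT code, and 2.0 had not been released when I started.)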

Triggering the JIT

As mentioned in the previous article, JIT compilation is triggered the first time a function is called, and it is triggered from a stub:

This is what the JIT stub actually looks like. Below is the state of the Fixup Precode before the function is called for the first time:

Fixup Precode:

(lldb) di --frame --bytes
-> 0x7fff7c21f5a8: e8 2b 6c fe ff     callq  0x7fff7c2061d8
   0x7fff7c21f5ad: 5e                 popq   %rsi
   0x7fff7c21f5ae: 19 05 e8 23 6c fe  sbbl   %eax, -0x193dc18(%rip)
   0x7fff7c21f5b4: ff 5e a8           lcalll *-0x58(%rsi)
   0x7fff7c21f5b7: 04 e8              addb   $-0x18, %al
   0x7fff7c21f5b9: 1b 6c fe ff        sbbl   -0x1(%rsi,%rdi,8), %ebp
   0x7fff7c21f5bd: 5e                 popq   %rsi
   0x7fff7c21f5be: 00 03              addb   %al, (%rbx)
   0x7fff7c21f5c0: e8 13 6c fe ff     callq  0x7fff7c2061d8
   0x7fff7c21f5c5: 5e                 popq   %rsi
   0x7fff7c21f5c6: b0 02              movb   $0x2, %al
(lldb) di --frame --bytes 
-> 0x7fff7c2061d8: e9 13 3f 9d 79                 jmp    0x7ffff5bda0f0            ; PrecodeFixupThunk
   0x7fff7c2061dd: cc                             int3   
   0x7fff7c2061de: cc                             int3   
   0x7fff7c2061df: cc                             int3   
   0x7fff7c2061e0: 49 ba 00 da d0 7b ff 7f 00 00  movabsq $0x7fff7bd0da00, %r10
   0x7fff7c2061ea: 40 e9 e0 ff ff ff              jmp    0x7fff7c2061d0

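In both listings only the first instruction is relevant. Note the bytes 5e 19 05 after callq: they are not assembly instructions but information about the function, as explained below.
Execution then jumps to the Fixup Precode Chunk; the code from this point on is shared by all functions: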

Fixup Precode Chunk:

(lldb) di --frame --bytes
-> 0x7ffff5bda0f0 <PrecodeFixupThunk>: 58              popq   %rax                         ; rax = 0x7fff7c21f5ad
   0x7ffff5bda0f1 <PrecodeFixupThunk+1>: 4c 0f b6 50 02  movzbq 0x2(%rax), %r10            ; r10 = 0x05 (precode chunk index)
   0x7ffff5bda0f6 <PrecodeFixupThunk+6>: 4c 0f b6 58 01  movzbq 0x1(%rax), %r11            ; r11 = 0x19 (methoddesc chunk index)
   0x7ffff5bda0fb <PrecodeFixupThunk+11>: 4a 8b 44 d0 03  movq   0x3(%rax,%r10,8), %rax    ; rax = 0x7fff7bdd5040 (methoddesc chunk)
   0x7ffff5bda100 <PrecodeFixupThunk+16>: 4e 8d 14 d8     leaq   (%rax,%r11,8), %r10       ; r10 = 0x7fff7bdd5108 (methoddesc)
   0x7ffff5bda104 <PrecodeFixupThunk+20>: e9 37 ff ff ff  jmp    0x7ffff5bda040            ; ThePreStub

The source for this code is in vm\amd64\unixasmhelpers.S:

LEAF_ENTRY PrecodeFixupThunk, _TEXT

        pop     rax         // Pop the return address. It points right after the call instruction in the precode.

        // Inline computation done by FixupPrecode::GetMethodDesc()
        movzx   r10,byte ptr [rax+2]    // m_PrecodeChunkIndex
        movzx   r11,byte ptr [rax+1]    // m_MethodDescChunkIndex
        mov     rax,qword ptr [rax+r10*8+3]
        lea     METHODDESC_REGISTER,[rax+r11*8]

        // Tail call to prestub
        jmp C_FUNC(ThePreStub)

LEAF_END PrecodeFixupThunk, _TEXT

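After popq %rax, rax points to the address right after the earlier callq. From the index values stored there, the code computes the MethodDesc of the function to compile, and then jumps to ThePreStub: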
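The index arithmetic can be replayed on plain bytes. The following is a minimal sketch rather than CoreCLR code: MethodDescFromReturnAddress and BuildChunk are hypothetical names, and the buffer layout (8-byte precodes followed by an 8-byte MethodDesc chunk base) is inferred from the dump above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Replays the arithmetic of PrecodeFixupThunk on a byte buffer.
// `ret` plays the role of rax after `pop rax`: it points at the byte that
// follows the 5-byte call instruction inside an 8-byte precode.
std::uint64_t MethodDescFromReturnAddress(const std::uint8_t* ret) {
    std::uint64_t precodeChunkIndex = ret[2];     // movzx r10, byte ptr [rax+2]
    std::uint64_t methodDescChunkIndex = ret[1];  // movzx r11, byte ptr [rax+1]
    std::uint64_t chunkBase = 0;
    // mov rax, qword ptr [rax + r10*8 + 3]
    std::memcpy(&chunkBase, ret + precodeChunkIndex * 8 + 3, sizeof(chunkBase));
    // lea r10, [rax + r11*8]
    return chunkBase + methodDescChunkIndex * 8;
}

// Hypothetical helper: builds (precodeIndex + 1) precodes followed by the
// 8-byte MethodDesc chunk base, similar to the memory shown in the dump.
std::vector<std::uint8_t> BuildChunk(std::uint8_t mdIndex,
                                     std::uint8_t precodeIndex,
                                     std::uint64_t chunkBase) {
    const std::size_t precodeBytes = (static_cast<std::size_t>(precodeIndex) + 1) * 8;
    std::vector<std::uint8_t> bytes(precodeBytes + sizeof(chunkBase));
    bytes[0] = 0xe8;          // call rel32 (displacement left as zero here)
    bytes[5] = 0x5e;          // type byte
    bytes[6] = mdIndex;       // MethodDesc chunk index
    bytes[7] = precodeIndex;  // precode chunk index
    std::memcpy(bytes.data() + precodeBytes, &chunkBase, sizeof(chunkBase));
    return bytes;
}
```

With the values from the dump (indices 0x19 and 0x05, chunk base 0x7fff7bdd5040), calling MethodDescFromReturnAddress on the byte at offset 5 of the built chunk yields 0x7fff7bdd5108, the same MethodDesc address as in the annotated disassembly.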

ThePreStub:

(lldb) di --frame --bytes
-> 0x7ffff5bda040 <ThePreStub>: 55                       pushq  %rbp
   0x7ffff5bda041 <ThePreStub+1>: 48 89 e5                 movq   %rsp, %rbp
   0x7ffff5bda044 <ThePreStub+4>: 53                       pushq  %rbx
   0x7ffff5bda045 <ThePreStub+5>: 41 57                    pushq  %r15
   0x7ffff5bda047 <ThePreStub+7>: 41 56                    pushq  %r14
   0x7ffff5bda049 <ThePreStub+9>: 41 55                    pushq  %r13
   0x7ffff5bda04b <ThePreStub+11>: 41 54                    pushq  %r12
   0x7ffff5bda04d <ThePreStub+13>: 41 51                    pushq  %r9
   0x7ffff5bda04f <ThePreStub+15>: 41 50                    pushq  %r8
   0x7ffff5bda051 <ThePreStub+17>: 51                       pushq  %rcx
   0x7ffff5bda052 <ThePreStub+18>: 52                       pushq  %rdx
   0x7ffff5bda053 <ThePreStub+19>: 56                       pushq  %rsi
   0x7ffff5bda054 <ThePreStub+20>: 57                       pushq  %rdi
   0x7ffff5bda055 <ThePreStub+21>: 48 8d a4 24 78 ff ff ff  leaq   -0x88(%rsp), %rsp         ; allocate transition block
   0x7ffff5bda05d <ThePreStub+29>: 66 0f 7f 04 24           movdqa %xmm0, (%rsp)             ; fill transition block
   0x7ffff5bda062 <ThePreStub+34>: 66 0f 7f 4c 24 10        movdqa %xmm1, 0x10(%rsp)         ; fill transition block
   0x7ffff5bda068 <ThePreStub+40>: 66 0f 7f 54 24 20        movdqa %xmm2, 0x20(%rsp)         ; fill transition block
   0x7ffff5bda06e <ThePreStub+46>: 66 0f 7f 5c 24 30        movdqa %xmm3, 0x30(%rsp)         ; fill transition block
   0x7ffff5bda074 <ThePreStub+52>: 66 0f 7f 64 24 40        movdqa %xmm4, 0x40(%rsp)         ; fill transition block
   0x7ffff5bda07a <ThePreStub+58>: 66 0f 7f 6c 24 50        movdqa %xmm5, 0x50(%rsp)         ; fill transition block
   0x7ffff5bda080 <ThePreStub+64>: 66 0f 7f 74 24 60        movdqa %xmm6, 0x60(%rsp)         ; fill transition block
   0x7ffff5bda086 <ThePreStub+70>: 66 0f 7f 7c 24 70        movdqa %xmm7, 0x70(%rsp)         ; fill transition block
   0x7ffff5bda08c <ThePreStub+76>: 48 8d bc 24 88 00 00 00  leaq   0x88(%rsp), %rdi          ; arg 1 = transition block*
   0x7ffff5bda094 <ThePreStub+84>: 4c 89 d6                 movq   %r10, %rsi                ; arg 2 = methoddesc
   0x7ffff5bda097 <ThePreStub+87>: e8 44 7e 11 00           callq  0x7ffff5cf1ee0            ; PreStubWorker at prestub.cpp:958
   0x7ffff5bda09c <ThePreStub+92>: 66 0f 6f 04 24           movdqa (%rsp), %xmm0
   0x7ffff5bda0a1 <ThePreStub+97>: 66 0f 6f 4c 24 10        movdqa 0x10(%rsp), %xmm1
   0x7ffff5bda0a7 <ThePreStub+103>: 66 0f 6f 54 24 20        movdqa 0x20(%rsp), %xmm2
   0x7ffff5bda0ad <ThePreStub+109>: 66 0f 6f 5c 24 30        movdqa 0x30(%rsp), %xmm3
   0x7ffff5bda0b3 <ThePreStub+115>: 66 0f 6f 64 24 40        movdqa 0x40(%rsp), %xmm4
   0x7ffff5bda0b9 <ThePreStub+121>: 66 0f 6f 6c 24 50        movdqa 0x50(%rsp), %xmm5
   0x7ffff5bda0bf <ThePreStub+127>: 66 0f 6f 74 24 60        movdqa 0x60(%rsp), %xmm6
   0x7ffff5bda0c5 <ThePreStub+133>: 66 0f 6f 7c 24 70        movdqa 0x70(%rsp), %xmm7
   0x7ffff5bda0cb <ThePreStub+139>: 48 8d a4 24 88 00 00 00  leaq   0x88(%rsp), %rsp
   0x7ffff5bda0d3 <ThePreStub+147>: 5f                       popq   %rdi
   0x7ffff5bda0d4 <ThePreStub+148>: 5e                       popq   %rsi
   0x7ffff5bda0d5 <ThePreStub+149>: 5a                       popq   %rdx
   0x7ffff5bda0d6 <ThePreStub+150>: 59                       popq   %rcx
   0x7ffff5bda0d7 <ThePreStub+151>: 41 58                    popq   %r8
   0x7ffff5bda0d9 <ThePreStub+153>: 41 59                    popq   %r9
   0x7ffff5bda0db <ThePreStub+155>: 41 5c                    popq   %r12
   0x7ffff5bda0dd <ThePreStub+157>: 41 5d                    popq   %r13
   0x7ffff5bda0df <ThePreStub+159>: 41 5e                    popq   %r14
   0x7ffff5bda0e1 <ThePreStub+161>: 41 5f                    popq   %r15
   0x7ffff5bda0e3 <ThePreStub+163>: 5b                       popq   %rbx
   0x7ffff5bda0e4 <ThePreStub+164>: 5d                       popq   %rbp
   0x7ffff5bda0e5 <ThePreStub+165>: 48 ff e0                 jmpq   *%rax
   %rax should be patched fixup precode = 0x7fff7c21f5a8
   (%rsp) should be the return address before calling "Fixup Precode"

It looks rather long, but what it does is simple. Its source is in vm\amd64\theprestubamd64.S:

NESTED_ENTRY ThePreStub, _TEXT, NoHandler
        PROLOG_WITH_TRANSITION_BLOCK 0, 0, 0, 0, 0

        //
        // call PreStubWorker
        //
        lea             rdi, [rsp + __PWTB_TransitionBlock]     // pTransitionBlock*
        mov             rsi, METHODDESC_REGISTER
        call            C_FUNC(PreStubWorker)

        EPILOG_WITH_TRANSITION_BLOCK_TAILCALL
        TAILJMP_RAX

NESTED_END ThePreStub, _TEXT

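It backs up the registers on the stack, calls PreStubWorker, and restores the registers from the stack once the call returns.
It then jumps to the value returned by PreStubWorker, which is the address of the patched Fixup Precode (0x7fff7c21f5a8).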

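PreStubWorker is a C++ function with C linkage (extern "C"). It calls the JIT's compile function and then patches the Fixup Precode.
Patching reads the 5e byte seen earlier: 5e indicates that the precode type is PRECODE_FIXUP, and the function that applies the patch is FixupPrecode::SetTargetInterlocked.
After patching, the Fixup Precode looks like this: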

Fixup Precode:

(lldb) di --bytes -s 0x7fff7c21f5a8
   0x7fff7c21f5a8: e9 a3 87 3a 00     jmp    0x7fff7c5c7d50
   0x7fff7c21f5ad: 5f                 popq   %rdi
   0x7fff7c21f5ae: 19 05 e8 23 6c fe  sbbl   %eax, -0x193dc18(%rip)
   0x7fff7c21f5b4: ff 5e a8           lcalll *-0x58(%rsi)
   0x7fff7c21f5b7: 04 e8              addb   $-0x18, %al
   0x7fff7c21f5b9: 1b 6c fe ff        sbbl   -0x1(%rsi,%rdi,8), %ebp
   0x7fff7c21f5bd: 5e                 popq   %rsi
   0x7fff7c21f5be: 00 03              addb   %al, (%rbx)
   0x7fff7c21f5c0: e8 13 6c fe ff     callq  0x7fff7c2061d8
   0x7fff7c21f5c5: 5e                 popq   %rsi
   0x7fff7c21f5c6: b0 02              movb   $0x2, %al

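The next call to the function can then jmp straight to the compiled code.
The JIT stub lets the runtime compile only the functions that actually run, which can greatly reduce program startup time, and the cost on later calls (a single jmp) is very small.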
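The call and jmp instructions in these dumps both use a 32-bit displacement relative to the end of the 5-byte instruction, so the targets can be checked by hand. Here is a minimal sketch of that arithmetic, assuming a little-endian host; DecodeRel32Target and EncodeJmpRel32 are hypothetical helper names, not functions from CoreCLR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Target of a 5-byte `call rel32` (e8) or `jmp rel32` (e9) located at `address`.
// The displacement is relative to the address of the next instruction.
std::uint64_t DecodeRel32Target(std::uint64_t address,
                                const std::array<std::uint8_t, 5>& insn) {
    std::int32_t rel = 0;
    std::memcpy(&rel, insn.data() + 1, sizeof(rel));  // little-endian host assumed
    return address + 5 + static_cast<std::uint64_t>(static_cast<std::int64_t>(rel));
}

// Writes the bytes of a `jmp rel32` placed at `address` that jumps to `target`.
// Returns false when the displacement does not fit in 32 bits.
bool EncodeJmpRel32(std::uint64_t address, std::uint64_t target,
                    std::array<std::uint8_t, 5>& out) {
    const std::int64_t delta = static_cast<std::int64_t>(target - (address + 5));
    if (delta < INT32_MIN || delta > INT32_MAX) {
        return false;
    }
    const std::int32_t rel = static_cast<std::int32_t>(delta);
    out[0] = 0xe9;  // jmp rel32 opcode
    std::memcpy(out.data() + 1, &rel, sizeof(rel));
    return true;
}
```

Decoding e8 2b 6c fe ff at 0x7fff7c21f5a8 gives 0x7fff7c2061d8, and encoding a jump from the same address to 0x7fff7c5c7d50 gives e9 a3 87 3a 00, matching the dumps before and after patching. The dumps also show the byte that follows the instruction changing from 5e to 5f.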

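Note that the flow for calling a virtual method differs slightly from the one above: the address of a virtual method is stored in the function table,
so patching updates the function table rather than the Precode, and on the next call the function table entry points to the compiled code. If you are interested, try analyzing it yourself.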
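The idea of a slot that first points at a stub and is later redirected to the compiled code can be imitated in ordinary C++. This is only an analogy under stated assumptions, not how CoreCLR implements it: g_slot, PrestubSketch, CompiledBody and CallThroughSlot are all hypothetical names, and "compiling" is reduced to bumping a counter.

```cpp
#include <atomic>
#include <cassert>

using EntryPoint = int (*)(int);

// Stands in for the machine code produced by the JIT.
int CompiledBody(int x) { return x * 2; }

int PrestubSketch(int x);  // forward declaration so the slot can refer to it

// One "function table" slot. It starts out pointing at the stub.
std::atomic<EntryPoint> g_slot{&PrestubSketch};
std::atomic<int> g_compileCount{0};

// Runs on the first call only: "compiles" the method, patches the slot,
// then continues into the real code so the caller sees a normal result.
int PrestubSketch(int x) {
    ++g_compileCount;
    EntryPoint expected = &PrestubSketch;
    // Publish the result. Losing this race is harmless because it means
    // another thread has already installed an entry point.
    g_slot.compare_exchange_strong(expected, &CompiledBody);
    return g_slot.load()(x);
}

// Every caller goes through the slot, like an indirect call through a table.
int CallThroughSlot(int x) { return g_slot.load()(x); }
```

The first CallThroughSlot invocation pays for the compilation; later ones cost a single indirect call, which mirrors why only functions that actually run are ever compiled.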

Next, let's look at what PreStubWorker does internally.

The JIT Entry Point

The source of PreStubWorker is as follows:

extern "C" PCODE STDCALL PreStubWorker(TransitionBlock * pTransitionBlock, MethodDesc * pMD)
{
    PCODE pbRetVal = NULL;

    BEGIN_PRESERVE_LAST_ERROR;

    STATIC_CONTRACT_THROWS;
    STATIC_CONTRACT_GC_TRIGGERS;
    STATIC_CONTRACT_MODE_COOPERATIVE;
    STATIC_CONTRACT_ENTRY_POINT;

    MAKE_CURRENT_THREAD_AVAILABLE();

#ifdef _DEBUG
    Thread::ObjectRefFlush(CURRENT_THREAD);
#endif

    FrameWithCookie<PrestubMethodFrame> frame(pTransitionBlock, pMD);
    PrestubMethodFrame * pPFrame = &frame;

    pPFrame->Push(CURRENT_THREAD);

    INSTALL_MANAGED_EXCEPTION_DISPATCHER;
    INSTALL_UNWIND_AND_CONTINUE_HANDLER;

    ETWOnStartup (PrestubWorker_V1,PrestubWorkerEnd_V1);

    _ASSERTE(!NingenEnabled() && "You cannot invoke managed code inside the ngen compilation process.");

    // Running the PreStubWorker on a method causes us to access its MethodTable
    g_IBCLogger.LogMethodDescAccess(pMD);

    // Make sure the method table is restored, and method instantiation if present
    pMD->CheckRestore();

    CONSISTENCY_CHECK(GetAppDomain()->CheckCanExecuteManagedCode(pMD));

    // Note this is redundant with the above check but we do it anyway for safety
    //
    // This has been disabled so we have a better chance of catching these.  Note that this check is
    // NOT sufficient for domain neutral and ngen cases.
    //
    // pMD->EnsureActive();

    MethodTable *pDispatchingMT = NULL;

    if (pMD->IsVtableMethod())
    {
        OBJECTREF curobj = pPFrame->GetThis();

        if (curobj != NULL) // Check for virtual function called non-virtually on a NULL object
        {
            pDispatchingMT = curobj->GetTrueMethodTable();

#ifdef FEATURE_ICASTABLE
            if (pDispatchingMT->IsICastable())
            {
                MethodTable *pMDMT = pMD->GetMethodTable();
                TypeHandle objectType(pDispatchingMT);
                TypeHandle methodType(pMDMT);

                GCStress<cfg_any>::MaybeTrigger();
                INDEBUG(curobj = NULL); // curobj is unprotected and CanCastTo() can trigger GC
                if (!objectType.CanCastTo(methodType)) 
                {
                    // Apperantly ICastable magic was involved when we chose this method to be called
                    // that's why we better stick to the MethodTable it belongs to, otherwise 
                    // DoPrestub() will fail not being able to find implementation for pMD in pDispatchingMT.

                    pDispatchingMT = pMDMT;
                }
            }
#endif // FEATURE_ICASTABLE

            // For value types, the only virtual methods are interface implementations.
            // Thus pDispatching == pMT because there
            // is no inheritance in value types.  Note the BoxedEntryPointStubs are shared
            // between all sharable generic instantiations, so the == test is on
            // canonical method tables.
#ifdef _DEBUG 
            MethodTable *pMDMT = pMD->GetMethodTable(); // put this here to see what the MT is in debug mode
            _ASSERTE(!pMD->GetMethodTable()->IsValueType() ||
                     (pMD->IsUnboxingStub() && (pDispatchingMT->GetCanonicalMethodTable() == pMDMT->GetCanonicalMethodTable())));
#endif // _DEBUG
        }
    }

    GCX_PREEMP_THREAD_EXISTS(CURRENT_THREAD);
    pbRetVal = pMD->DoPrestub(pDispatchingMT);

    UNINSTALL_UNWIND_AND_CONTINUE_HANDLER;
    UNINSTALL_MANAGED_EXCEPTION_DISPATCHER;

    {
        HardwareExceptionHolder

        // Give debugger opportunity to stop here
        ThePreStubPatch();
    }

    pPFrame->Pop(CURRENT_THREAD);

    POSTCONDITION(pbRetVal != NULL);

    END_PRESERVE_LAST_ERROR;

    return pbRetVal;
}

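This function takes two parameters.
The first is the TransitionBlock, which is really just a pointer into the stack where the backed-up registers are stored.
The second is the MethodDesc, which describes the function being compiled; in lldb, dumpmd pMD shows its details.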
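The push sequence in the ThePreStub disassembly earlier determines what sits at and above the address passed as the first parameter. The struct below is a sketch of that layout inferred only from the order of the pushq instructions (rdi is pushed last, so it has the lowest address); the type and field names are hypothetical and may not match the runtime's own definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Integer argument registers, pushed last and therefore at the lowest addresses.
struct ArgumentRegistersSketch {
    std::uint64_t rdi, rsi, rdx, rcx, r8, r9;
};

// Callee-saved registers, pushed first and therefore at higher addresses.
struct CalleeSavedRegistersSketch {
    std::uint64_t r12, r13, r14, r15, rbx, rbp;
};

// What `leaq 0x88(%rsp), %rdi` points at after the xmm save area is skipped.
struct TransitionBlockSketch {
    ArgumentRegistersSketch argumentRegisters;
    CalleeSavedRegistersSketch calleeSavedRegisters;
    std::uint64_t returnAddress;  // return address of the original caller
};

static_assert(offsetof(TransitionBlockSketch, calleeSavedRegisters) == 48,
              "six argument registers of 8 bytes each come first");
static_assert(offsetof(TransitionBlockSketch, returnAddress) == 96,
              "twelve saved registers precede the return address");

// Reads the saved value of the first integer argument register (rdi under
// the System V AMD64 calling convention).
std::uint64_t FirstIntegerArgument(const TransitionBlockSketch& block) {
    return block.argumentRegisters.rdi;
}
```

The eight xmm registers are saved below this block, in the 0x88 bytes that ThePreStub reserves before the call, so they would sit at negative offsets from the pointer.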

It then calls MethodDesc::DoPrestub; if the function is a virtual method, it passes in the MethodTable of the this object's type.
The source of MethodDesc::DoPrestub is as follows:

PCODE MethodDesc::DoPrestub(MethodTable *pDispatchingMT)
{
    CONTRACT(PCODE)
    {
        STANDARD_VM_CHECK;
        POSTCONDITION(RETVAL != NULL);
    }
    CONTRACT_END;

    Stub *pStub = NULL;
    PCODE pCode = NULL;

    Thread *pThread = GetThread();

    MethodTable *pMT = GetMethodTable();

    // Running a prestub on a method causes us to access its MethodTable
    g_IBCLogger.LogMethodDescAccess(this);

    // A secondary layer of defense against executing code in inspection-only assembly.
    // This should already have been taken care of by not allowing inspection assemblies
    // to be activated. However, this is a very inexpensive piece of insurance in the name
    // of security.
    if (IsIntrospectionOnly())
    {
        _ASSERTE(!"A ReflectionOnly assembly reached the prestub. This should not have happened.");
        COMPlusThrow(kInvalidOperationException, IDS_EE_CODEEXECUTION_IN_INTROSPECTIVE_ASSEMBLY);
    }

    if (ContainsGenericVariables())
    {
        COMPlusThrow(kInvalidOperationException, IDS_EE_CODEEXECUTION_CONTAINSGENERICVAR);
    }

    /**************************   DEBUG CHECKS  *************************/
    /*-----------------------------------------------------------------
    // Halt if needed, GC stress, check the sharing count etc.
    */

#ifdef _DEBUG 
    static unsigned ctr = 0;
    ctr++;

    if (g_pConfig->ShouldPrestubHalt(this))
    {
        _ASSERTE(!"PreStubHalt");
    }

    LOG((LF_CLASSLOADER, LL_INFO10000, "In PreStubWorker for %s::%s\n",
                m_pszDebugClassName, m_pszDebugMethodName));

    // This is a nice place to test out having some fatal EE errors. We do this only in a checked build, and only
    // under the InjectFatalError key.
    if (g_pConfig->InjectFatalError() == 1)
    {
        EEPOLICY_HANDLE_FATAL_ERROR(COR_E_EXECUTIONENGINE);
    }
    else if (g_pConfig->InjectFatalError() == 2)
    {
        EEPOLICY_HANDLE_FATAL_ERROR(COR_E_STACKOVERFLOW);
    }
    else if (g_pConfig->InjectFatalError() == 3)
    {
        TestSEHGuardPageRestore();
    }

    // Useful to test GC with the prestub on the call stack
    if (g_pConfig->ShouldPrestubGC(this))
    {
        GCX_COOP();
        GCHeap::GetGCHeap()->GarbageCollect(-1);
    }
#endif // _DEBUG

    STRESS_LOG1(LF_CLASSLOADER, LL_INFO10000, "Prestubworker: method %pM\n", this);


    GCStress<cfg_any, EeconfigFastGcSPolicy, CoopGcModePolicy>::MaybeTrigger();

    // Are we in the prestub because of a rejit request?  If so, let the ReJitManager
    // take it from here.
    pCode = ReJitManager::DoReJitIfNecessary(this);
    if (pCode != NULL)
    {
        // A ReJIT was performed, so nothing left for DoPrestub() to do. Return now.
        // 
        // The stable entrypoint will either be a pointer to the original JITted code
        // (with a jmp at the top to jump to the newly-rejitted code) OR a pointer to any
        // stub code that must be executed first (e.g., a remoting stub), which in turn
        // will call the original JITted code (which then jmps to the newly-rejitted
        // code).
        RETURN GetStableEntryPoint();
    }

#ifdef FEATURE_PREJIT 
    // If this method is the root of a CER call graph and we've recorded this fact in the ngen image then we're in the prestub in
    // order to trip any runtime level preparation needed for this graph (P/Invoke stub generation/library binding, generic
    // dictionary prepopulation etc.).
    GetModule()->RestoreCer(this);
#endif // FEATURE_PREJIT

#ifdef FEATURE_COMINTEROP 
    /**************************   INTEROP   *************************/
    /*-----------------------------------------------------------------
    // Some method descriptors are COMPLUS-to-COM call descriptors
    // they are not your every day method descriptors, for example
    // they don't have an IL or code.
    */
    if (IsComPlusCall() || IsGenericComPlusCall())
    {
        pCode = GetStubForInteropMethod(this);
        
        GetPrecode()->SetTargetInterlocked(pCode);

        RETURN GetStableEntryPoint();
    }
#endif // FEATURE_COMINTEROP

    // workaround: This is to handle a punted work item dealing with a skipped module constructor
    //       due to appdomain unload. Basically shared code was JITted in domain A, and then
    //       this caused a link to another shared module with a module CCTOR, which was skipped
    //       or aborted in another appdomain we were trying to propagate the activation to.
    //
    //       Note that this is not a fix, but that it just minimizes the window in which the
    //       issue can occur.
    if (pThread->IsAbortRequested())
    {
        pThread->HandleThreadAbort();
    }

    /**************************   CLASS CONSTRUCTOR   ********************/
    // Make sure .cctor has been run

    if (IsClassConstructorTriggeredViaPrestub())
    {
        pMT->CheckRunClassInitThrowing();
    }

    /**************************   BACKPATCHING   *************************/
    // See if the addr of code has changed from the pre-stub
#ifdef FEATURE_INTERPRETER
    if (!IsReallyPointingToPrestub())
#else
    if (!IsPointingToPrestub())
#endif
    {
        LOG((LF_CLASSLOADER, LL_INFO10000,
                "    In PreStubWorker, method already jitted, backpatching call point\n"));

        RETURN DoBackpatch(pMT, pDispatchingMT, TRUE);
    }

    // record if remoting needs to intercept this call
    BOOL  fRemotingIntercepted = IsRemotingInterceptedViaPrestub();

    BOOL  fReportCompilationFinished = FALSE;
    
    /**************************   CODE CREATION  *************************/
    if (IsUnboxingStub())
    {
        pStub = MakeUnboxingStubWorker(this);
    }
#ifdef FEATURE_REMOTING
    else if (pMT->IsInterface() && !IsStatic() && !IsFCall())
    {
        pCode = CRemotingServices::GetDispatchInterfaceHelper(this);
        GetOrCreatePrecode();
    }
#endif // FEATURE_REMOTING
#if defined(FEATURE_SHARE_GENERIC_CODE) 
    else if (IsInstantiatingStub())
    {
        pStub = MakeInstantiatingStubWorker(this);
    }
#endif // defined(FEATURE_SHARE_GENERIC_CODE)
    else if (IsIL() || IsNoMetadata())
    {
        // remember if we need to backpatch the MethodTable slot
        BOOL  fBackpatch           = !fRemotingIntercepted
                                    && !IsEnCMethod();

#ifdef FEATURE_PREJIT 
        //
        // See if we have any prejitted code to use.
        //

        pCode = GetPreImplementedCode();

#ifdef PROFILING_SUPPORTED
        if (pCode != NULL)
        {
            BOOL fShouldSearchCache = TRUE;

            {
                BEGIN_PIN_PROFILER(CORProfilerTrackCacheSearches());
                g_profControlBlock.pProfInterface->
                    JITCachedFunctionSearchStarted((FunctionID) this,
                                                   &fShouldSearchCache);
                END_PIN_PROFILER();
            }

            if (!fShouldSearchCache)
            {
#ifdef FEATURE_INTERPRETER
                SetNativeCodeInterlocked(NULL, pCode, FALSE);
#else
                SetNativeCodeInterlocked(NULL, pCode);
#endif
                _ASSERTE(!IsPreImplemented());
                pCode = NULL;
            }
        }
#endif // PROFILING_SUPPORTED

        if (pCode != NULL)
        {
            LOG((LF_ZAP, LL_INFO10000,
                "ZAP: Using code" FMT_ADDR "for %s.%s sig=\"%s\" (token %x).\n",
                    DBG_ADDR(pCode),
                    m_pszDebugClassName,
                    m_pszDebugMethodName,
                    m_pszDebugMethodSignature,
                    GetMemberDef()));

            TADDR pFixupList = GetFixupList();
            if (pFixupList != NULL)
            {
                Module *pZapModule = GetZapModule();
                _ASSERTE(pZapModule != NULL);
                if (!pZapModule->FixupDelayList(pFixupList))
                {
                    _ASSERTE(!"FixupDelayList failed");
                    ThrowHR(COR_E_BADIMAGEFORMAT);
                }
            }

#ifdef HAVE_GCCOVER
            if (GCStress<cfg_instr_ngen>::IsEnabled())
                SetupGcCoverage(this, (BYTE*) pCode);
#endif // HAVE_GCCOVER

#ifdef PROFILING_SUPPORTED 
            /*
                * This notifies the profiler that a search to find a
                * cached jitted function has been made.
                */
            {
                BEGIN_PIN_PROFILER(CORProfilerTrackCacheSearches());
                g_profControlBlock.pProfInterface->
                    JITCachedFunctionSearchFinished((FunctionID) this, COR_PRF_CACHED_FUNCTION_FOUND);
                END_PIN_PROFILER();
            }
#endif // PROFILING_SUPPORTED
        }

        //
        // If not, try to jit it
        //

#endif // FEATURE_PREJIT

#ifdef FEATURE_READYTORUN
        if (pCode == NULL)
        {
            Module * pModule = GetModule();
            if (pModule->IsReadyToRun())
            {
                pCode = pModule->GetReadyToRunInfo()->GetEntryPoint(this);
                if (pCode != NULL)
                    fReportCompilationFinished = TRUE;
            }
        }
#endif // FEATURE_READYTORUN

        if (pCode == NULL)
        {
            NewHolder<COR_ILMETHOD_DECODER> pHeader(NULL);
            // Get the information on the method
            if (!IsNoMetadata())
            {
                COR_ILMETHOD* ilHeader = GetILHeader(TRUE);
                if(ilHeader == NULL)
                {
#ifdef FEATURE_COMINTEROP
                    // Abstract methods can be called through WinRT derivation if the deriving type
                    // is not implemented in managed code, and calls through the CCW to the abstract
                    // method. Throw a sensible exception in that case.
                    if (pMT->IsExportedToWinRT() && IsAbstract())
                    {
                        COMPlusThrowHR(E_NOTIMPL);
                    }
#endif // FEATURE_COMINTEROP

                    COMPlusThrowHR(COR_E_BADIMAGEFORMAT, BFA_BAD_IL);
                }

                COR_ILMETHOD_DECODER::DecoderStatus status = COR_ILMETHOD_DECODER::FORMAT_ERROR;

                {
                    // Decoder ctor can AV on a malformed method header
                    AVInRuntimeImplOkayHolder AVOkay;
                    pHeader = new COR_ILMETHOD_DECODER(ilHeader, GetMDImport(), &status);
                    if(pHeader == NULL)
                        status = COR_ILMETHOD_DECODER::FORMAT_ERROR;
                }

                if (status == COR_ILMETHOD_DECODER::VERIFICATION_ERROR &&
                    Security::CanSkipVerification(GetModule()->GetDomainAssembly()))
                {
                    status = COR_ILMETHOD_DECODER::SUCCESS;
                }

                if (status != COR_ILMETHOD_DECODER::SUCCESS)
                {
                    if (status == COR_ILMETHOD_DECODER::VERIFICATION_ERROR)
                    {
                        // Throw a verification HR
                        COMPlusThrowHR(COR_E_VERIFICATION);
                    }
                    else
                    {
                        COMPlusThrowHR(COR_E_BADIMAGEFORMAT, BFA_BAD_IL);
                    }
                }

#ifdef _VER_EE_VERIFICATION_ENABLED 
                static ConfigDWORD peVerify;

                if (peVerify.val(CLRConfig::EXTERNAL_PEVerify))
                    Verify(pHeader, TRUE, FALSE);   // Throws a VerifierException if verification fails
#endif // _VER_EE_VERIFICATION_ENABLED
            } // end if (!IsNoMetadata())

            // JIT it
            LOG((LF_CLASSLOADER, LL_INFO1000000,
                    "    In PreStubWorker, calling MakeJitWorker\n"));

            // Create the precode eagerly if it is going to be needed later.
            if (!fBackpatch)
            {
                GetOrCreatePrecode();
            }

            // Mark the code as hot in case the method ends up in the native image
            g_IBCLogger.LogMethodCodeAccess(this);

            pCode = MakeJitWorker(pHeader, 0, 0);

#ifdef FEATURE_INTERPRETER
            if ((pCode != NULL) && !HasStableEntryPoint())
            {
                // We don't yet have a stable entry point, so don't do backpatching yet.
                // But we do have to handle some extra cases that occur in backpatching.
                // (Perhaps I *should* get to the backpatching code, but in a mode where we know
                // we're not dealing with the stable entry point...)
                if (HasNativeCodeSlot())
                {
                    // We called "SetNativeCodeInterlocked" in MakeJitWorker, which updated the native
                    // code slot, but I think we also want to update the regular slot...
                    PCODE tmpEntry = GetTemporaryEntryPoint();
                    PCODE pFound = FastInterlockCompareExchangePointer(GetAddrOfSlot(), pCode, tmpEntry);
                    // Doesn't matter if we failed -- if we did, it's because somebody else made progress.
                    if (pFound != tmpEntry) pCode = pFound;
                }

                // Now we handle the case of a FuncPtrPrecode.  
                FuncPtrStubs * pFuncPtrStubs = GetLoaderAllocator()->GetFuncPtrStubsNoCreate();
                if (pFuncPtrStubs != NULL)
                {
                    Precode* pFuncPtrPrecode = pFuncPtrStubs->Lookup(this);
                    if (pFuncPtrPrecode != NULL)
                    {
                        // If there is a funcptr precode to patch, attempt to patch it.  If we lose, that's OK,
                        // somebody else made progress.
                        pFuncPtrPrecode->SetTargetInterlocked(pCode);
                    }
                }
            }
#endif // FEATURE_INTERPRETER
        } // end if (pCode == NULL)
    } // end else if (IsIL() || IsNoMetadata())
    else if (IsNDirect())
    {
        if (!GetModule()->GetSecurityDescriptor()->CanCallUnmanagedCode())
            Security::ThrowSecurityException(g_SecurityPermissionClassName, SPFLAGSUNMANAGEDCODE);

        pCode = GetStubForInteropMethod(this);
        GetOrCreatePrecode();
    }
    else if (IsFCall())
    {
        // Get the fcall implementation
        BOOL fSharedOrDynamicFCallImpl;
        pCode = ECall::GetFCallImpl(this, &fSharedOrDynamicFCallImpl);

        if (fSharedOrDynamicFCallImpl)
        {
            // Fake ctors share one implementation that has to be wrapped by prestub
            GetOrCreatePrecode();
        }
    }
    else if (IsArray())
    {
        pStub = GenerateArrayOpStub((ArrayMethodDesc*)this);
    }
    else if (IsEEImpl())
    {
        _ASSERTE(GetMethodTable()->IsDelegate());
        pCode = COMDelegate::GetInvokeMethodStub((EEImplMethodDesc*)this);
        GetOrCreatePrecode();
    }
    else
    {
        // This is a method type we don't handle yet
        _ASSERTE(!"Unknown Method Type");
    }

    /**************************   POSTJIT *************************/
#ifndef FEATURE_INTERPRETER
    _ASSERTE(pCode == NULL || GetNativeCode() == NULL || pCode == GetNativeCode());
#else // FEATURE_INTERPRETER
    // Interpreter adds a new possiblity == someone else beat us to installing an intepreter stub.
    _ASSERTE(pCode == NULL || GetNativeCode() == NULL || pCode == GetNativeCode()
             || Interpreter::InterpretationStubToMethodInfo(pCode) == this);
#endif // FEATURE_INTERPRETER

    // At this point we must have either a pointer to managed code or to a stub. All of the above code
    // should have thrown an exception if it couldn't make a stub.
    _ASSERTE((pStub != NULL) ^ (pCode != NULL));

    /**************************   SECURITY   *************************/

    // Lets check to see if we need declarative security on this stub, If we have
    // security checks on this method or class then we need to add an intermediate
    // stub that performs declarative checks prior to calling the real stub.
    // record if security needs to intercept this call (also depends on whether we plan to use stubs for declarative security)

#if !defined( HAS_REMOTING_PRECODE) && defined (FEATURE_REMOTING)
    /**************************   REMOTING   *************************/

    // check for MarshalByRef scenarios ... we need to intercept
    // Non-virtual calls on MarshalByRef types
    if (fRemotingIntercepted)
    {
        // let us setup a remoting stub to intercept all the calls
        Stub *pRemotingStub = CRemotingServices::GetStubForNonVirtualMethod(this, 
            (pStub != NULL) ? (LPVOID)pStub->GetEntryPoint() : (LPVOID)pCode, pStub);
        
        if (pRemotingStub != NULL)
        {
            pStub = pRemotingStub;
            pCode = NULL;
        }
    }
#endif // HAS_REMOTING_PRECODE

    _ASSERTE((pStub != NULL) ^ (pCode != NULL));

#if defined(_TARGET_X86_) || defined(_TARGET_AMD64_)
    //
    // We are seeing memory reordering race around fixups (see DDB 193514 and related bugs). We get into
    // situation where the patched precode is visible by other threads, but the resolved fixups 
    // are not. IT SHOULD NEVER HAPPEN according to our current understanding of x86/x64 memory model.
    // (see email thread attached to the bug for details).
    //
    // We suspect that there may be bug in the hardware or that hardware may have shortcuts that may be 
    // causing grief. We will try to avoid the race by executing an extra memory barrier.
    //
    MemoryBarrier();
#endif

    if (pCode != NULL)
    {
        if (HasPrecode())
            GetPrecode()->SetTargetInterlocked(pCode);
        else
        if (!HasStableEntryPoint())
        {
            // Is the result an interpreter stub?
#ifdef FEATURE_INTERPRETER
            if (Interpreter::InterpretationStubToMethodInfo(pCode) == this)
            {
                SetEntryPointInterlocked(pCode);
            }
            else
#endif // FEATURE_INTERPRETER
            {
                SetStableEntryPointInterlocked(pCode);
            }
        }
    }
    else
    {
        if (!GetOrCreatePrecode()->SetTargetInterlocked(pStub->GetEntryPoint()))
        {
            pStub->DecRef();
        }
        else
        if (pStub->HasExternalEntryPoint())
        {
            // If the Stub wraps code that is outside of the Stub allocation, then we
            // need to free the Stub allocation now.
            pStub->DecRef();
        }
    }

#ifdef FEATURE_INTERPRETER
    _ASSERTE(!IsReallyPointingToPrestub());
#else // FEATURE_INTERPRETER
    _ASSERTE(!IsPointingToPrestub());
    _ASSERTE(HasStableEntryPoint());
#endif // FEATURE_INTERPRETER

    if (fReportCompilationFinished)
        DACNotifyCompilationFinished(this);

    RETURN DoBackpatch(pMT, pDispatchingMT, FALSE);
}

This function is rather long; we only need to focus on two places:

pCode = MakeJitWorker(pHeader, 0, 0);

MakeJitWorker invokes the JIT to compile the function; pCode is the address of the compiled machine code.

if (HasPrecode())
    GetPrecode()->SetTargetInterlocked(pCode);

SetTargetInterlocked patches the Precode, so that the second call to the function jumps directly to the compiled code.
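The patching idea can be illustrated with a small model. This is only a sketch under stated assumptions: the real Precode is a piece of machine code whose jump target is rewritten, whereas here it is modelled as an atomic function pointer. The names target, prestub, compiledBody, jitCount and callThroughPrecode are hypothetical and do not exist in CoreCLR.

```cpp
#include <atomic>

// Hypothetical stand-in for a Precode: a slot that holds the current call target.
using Fn = int (*)(int);

// Stands in for the machine code produced by the JIT.
int compiledBody(int x) { return x * 2; }

// Counts how many times the "compile" step ran.
int jitCount = 0;

int prestub(int x);  // forward declaration

// The slot initially points at the prestub.
std::atomic<Fn> target{&prestub};

// The first call lands here: "compile", patch the slot, then run the result.
int prestub(int x) {
    ++jitCount;
    Fn expected = &prestub;
    // Patch only if the slot still points at the prestub;
    // if another thread already patched it, its result is kept.
    target.compare_exchange_strong(expected, &compiledBody);
    return target.load()(x);
}

// Every caller goes through the slot instead of calling the body directly.
int callThroughPrecode(int x) { return target.load()(x); }
```

Calling callThroughPrecode twice runs the "compile" step only once; the second call goes straight to compiledBody. The compare-and-swap mirrors the "Interlocked" part of the name: only one writer wins the patch.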

The source code of MakeJitWorker is as follows:

PCODE MethodDesc::MakeJitWorker(COR_ILMETHOD_DECODER* ILHeader, DWORD flags, DWORD flags2)
{
    STANDARD_VM_CONTRACT;

    BOOL fIsILStub = IsILStub();        // @TODO: understand the need for this special case

    LOG((LF_JIT, LL_INFO1000000,
         "MakeJitWorker(" FMT_ADDR ", %s) for %s:%s\n",
         DBG_ADDR(this),
         fIsILStub               ? " TRUE" : "FALSE",
         GetMethodTable()->GetDebugClassName(),
         m_pszDebugMethodName));

    PCODE pCode = NULL;
    ULONG sizeOfCode = 0;
#ifdef FEATURE_INTERPRETER
    PCODE pPreviousInterpStub = NULL;
    BOOL fInterpreted = FALSE;
    BOOL fStable = TRUE;  // True iff the new code address (to be stored in pCode), is a stable entry point.
#endif

#ifdef FEATURE_MULTICOREJIT
    MulticoreJitManager & mcJitManager = GetAppDomain()->GetMulticoreJitManager();

    bool fBackgroundThread = (flags & CORJIT_FLG_MCJIT_BACKGROUND) != 0;
#endif

    {
        // Enter the global lock which protects the list of all functions being JITd
        ListLockHolder pJitLock (GetDomain()->GetJitLock());

        // It is possible that another thread stepped in before we entered the global lock for the first time.
        pCode = GetNativeCode();
        if (pCode != NULL)
        {
#ifdef FEATURE_INTERPRETER
            if (Interpreter::InterpretationStubToMethodInfo(pCode) == this)
            {
                pPreviousInterpStub = pCode;
            }
            else
#endif // FEATURE_INTERPRETER
            goto Done;
        }

        const char *description = "jit lock";
        INDEBUG(description = m_pszDebugMethodName;)
        ListLockEntryHolder pEntry(ListLockEntry::Find(pJitLock, this, description));

        // We have an entry now, we can release the global lock
        pJitLock.Release();

        // Take the entry lock
        {
            ListLockEntryLockHolder pEntryLock(pEntry, FALSE);

            if (pEntryLock.DeadlockAwareAcquire())
            {
                if (pEntry->m_hrResultCode == S_FALSE)
                {
                    // Nobody has jitted the method yet
                }
                else
                {
                    // We came in to jit but someone beat us so return the
                    // jitted method!

                    // We can just fall through because we will notice below that
                    // the method has code.

                    // @todo: Note that we may have a failed HRESULT here -
                    // we might want to return an early error rather than
                    // repeatedly failing the jit.
                }
            }
            else
            {
                // Taking this lock would cause a deadlock (presumably because we
                // are involved in a class constructor circular dependency.)  For
                // instance, another thread may be waiting to run the class constructor
                // that we are jitting, but is currently jitting this function.
                //
                // To remedy this, we want to go ahead and do the jitting anyway.
                // The other threads contending for the lock will then notice that
                // the jit finished while they were running class constructors, and abort their
                // current jit effort.
                //
                // We don't have to do anything special right here since we
                // can check HasNativeCode() to detect this case later.
                //
                // Note that at this point we don't have the lock, but that's OK because the
                // thread which does have the lock is blocked waiting for us.
            }

            // It is possible that another thread stepped in before we entered the lock.
            pCode = GetNativeCode();
#ifdef FEATURE_INTERPRETER
            if (pCode != NULL && (pCode != pPreviousInterpStub))
#else
            if (pCode != NULL)
#endif // FEATURE_INTERPRETER
            {
                goto Done;
            }

            SString namespaceOrClassName, methodName, methodSignature;

            PCODE pOtherCode = NULL; // Need to move here due to 'goto GotNewCode'
            
#ifdef FEATURE_MULTICOREJIT

            bool fCompiledInBackground = false;

            // If not called from multi-core JIT thread, 
            if (! fBackgroundThread)
            {
                // Quick check before calling expensive out of line function on this method's domain has code JITted by background thread
                if (mcJitManager.GetMulticoreJitCodeStorage().GetRemainingMethodCount() > 0)
                {
                    if (MulticoreJitManager::IsMethodSupported(this))
                    {
                        pCode = mcJitManager.RequestMethodCode(this); // Query multi-core JIT manager for compiled code

                        // Multicore JIT manager starts background thread to pre-compile methods, but it does not back-patch it/notify profiler/notify DAC,
                        // Jumtp to GotNewCode to do so
                        if (pCode != NULL)
                        {
                            fCompiledInBackground = true;
                    
#ifdef DEBUGGING_SUPPORTED
                            // Notify the debugger of the jitted function
                            if (g_pDebugInterface != NULL)
                            {
                                g_pDebugInterface->JITComplete(this, pCode);
                            }
#endif

                            goto GotNewCode;
                        }
                    }
                }
            }
#endif

            if (fIsILStub)
            {
                // we race with other threads to JIT the code for an IL stub and the
                // IL header is released once one of the threads completes.  As a result
                // we must be inside the lock to reliably get the IL header for the
                // stub.

                ILStubResolver* pResolver = AsDynamicMethodDesc()->GetILStubResolver();
                ILHeader = pResolver->GetILHeader();
            }

#ifdef MDA_SUPPORTED 
            MdaJitCompilationStart* pProbe = MDA_GET_ASSISTANT(JitCompilationStart);
            if (pProbe)
                pProbe->NowCompiling(this);
#endif // MDA_SUPPORTED

#ifdef PROFILING_SUPPORTED 
            // If profiling, need to give a chance for a tool to examine and modify
            // the IL before it gets to the JIT.  This allows one to add probe calls for
            // things like code coverage, performance, or whatever.
            {
                BEGIN_PIN_PROFILER(CORProfilerTrackJITInfo());

                // Multicore JIT should be disabled when CORProfilerTrackJITInfo is on
                // But there could be corner case in which profiler is attached when multicore background thread is calling MakeJitWorker
                // Disable this block when calling from multicore JIT background thread
                if (!IsNoMetadata()
#ifdef FEATURE_MULTICOREJIT

                    && (! fBackgroundThread)
#endif
                    )
                {
                    g_profControlBlock.pProfInterface->JITCompilationStarted((FunctionID) this, TRUE);
                    // The profiler may have changed the code on the callback.  Need to
                    // pick up the new code.  Note that you have to be fully trusted in
                    // this mode and the code will not be verified.
                    COR_ILMETHOD *pilHeader = GetILHeader(TRUE);
                    new (ILHeader) COR_ILMETHOD_DECODER(pilHeader, GetMDImport(), NULL);
                }
                END_PIN_PROFILER();
            }
#endif // PROFILING_SUPPORTED
#ifdef FEATURE_INTERPRETER
            // We move the ETW event for start of JITting inward, after we make the decision
            // to JIT rather than interpret.
#else  // FEATURE_INTERPRETER
            // Fire an ETW event to mark the beginning of JIT'ing
            ETW::MethodLog::MethodJitting(this, &namespaceOrClassName, &methodName, &methodSignature);
#endif  // FEATURE_INTERPRETER

#ifdef FEATURE_STACK_SAMPLING
#ifdef FEATURE_MULTICOREJIT
            if (!fBackgroundThread)
#endif // FEATURE_MULTICOREJIT
            {
                StackSampler::RecordJittingInfo(this, flags, flags2);
            }
#endif // FEATURE_STACK_SAMPLING

            EX_TRY
            {
                pCode = UnsafeJitFunction(this, ILHeader, flags, flags2, &sizeOfCode);
            }
            EX_CATCH
            {
                // If the current thread threw an exception, but a competing thread
                // somehow succeeded at JITting the same function (e.g., out of memory
                // encountered on current thread but not competing thread), then go ahead
                // and swallow this current thread's exception, since we somehow managed
                // to successfully JIT the code on the other thread.
                // 
                // Note that if a deadlock cycle is broken, that does not result in an
                // exception--the thread would just pass through the lock and JIT the
                // function in competition with the other thread (with the winner of the
                // race decided later on when we do SetNativeCodeInterlocked). This
                // try/catch is purely to deal with the (unusual) case where a competing
                // thread succeeded where we aborted.
                
                pOtherCode = GetNativeCode();
                
                if (pOtherCode == NULL)
                {
                    pEntry->m_hrResultCode = E_FAIL;
                    EX_RETHROW;
                }
            }
            EX_END_CATCH(RethrowTerminalExceptions)

            if (pOtherCode != NULL)
            {
                // Somebody finished jitting recursively while we were jitting the method.
                // Just use their method & leak the one we finished. (Normally we hope
                // not to finish our JIT in this case, as we will abort early if we notice
                // a reentrant jit has occurred.  But we may not catch every place so we
                // do a definitive final check here.
                pCode = pOtherCode;
                goto Done;
            }

            _ASSERTE(pCode != NULL);

#ifdef HAVE_GCCOVER
            if (GCStress<cfg_instr_jit>::IsEnabled())
            {
                SetupGcCoverage(this, (BYTE*) pCode);
            }
#endif // HAVE_GCCOVER

#ifdef FEATURE_INTERPRETER
            // Determine whether the new code address is "stable"...= is not an interpreter stub.
            fInterpreted = (Interpreter::InterpretationStubToMethodInfo(pCode) == this);
            fStable = !fInterpreted;
#endif // FEATURE_INTERPRETER

#ifdef FEATURE_MULTICOREJIT
            
            // If called from multi-core JIT background thread, store code under lock, delay patching until code is queried from application threads
            if (fBackgroundThread)
            {
                // Fire an ETW event to mark the end of JIT'ing
                ETW::MethodLog::MethodJitted(this, &namespaceOrClassName, &methodName, &methodSignature, pCode, 0 /* ReJITID */);

#ifdef FEATURE_PERFMAP
                // Save the JIT'd method information so that perf can resolve JIT'd call frames.
                PerfMap::LogJITCompiledMethod(this, pCode, sizeOfCode);
#endif
                
                mcJitManager.GetMulticoreJitCodeStorage().StoreMethodCode(this, pCode);
                
                goto Done;
            }

GotNewCode:
#endif
            // If this function had already been requested for rejit (before its original
            // code was jitted), then give the rejit manager a chance to jump-stamp the
            // code we just compiled so the first thread entering the function will jump
            // to the prestub and trigger the rejit. Note that the PublishMethodHolder takes
            // a lock to avoid a particular kind of rejit race. See
            // code:ReJitManager::PublishMethodHolder::PublishMethodHolder#PublishCode for
            // details on the rejit race.
            // 
            // Aside from rejit, performing a SetNativeCodeInterlocked at this point
            // generally ensures that there is only one winning version of the native
            // code. This also avoid races with profiler overriding ngened code (see
            // matching SetNativeCodeInterlocked done after
            // JITCachedFunctionSearchStarted)
#ifdef FEATURE_INTERPRETER
            PCODE pExpected = pPreviousInterpStub;
            if (pExpected == NULL) pExpected = GetTemporaryEntryPoint();
#endif
            {
                ReJitPublishMethodHolder publishWorker(this, pCode);
                if (!SetNativeCodeInterlocked(pCode
#ifdef FEATURE_INTERPRETER
                    , pExpected, fStable
#endif
                    ))
                {
                    // Another thread beat us to publishing its copy of the JITted code.
                    pCode = GetNativeCode();
                    goto Done;
                }
            }

#ifdef FEATURE_INTERPRETER
            // State for dynamic methods cannot be freed if the method was ever interpreted,
            // since there is no way to ensure that it is not in use at the moment.
            if (IsDynamicMethod() && !fInterpreted && (pPreviousInterpStub == NULL))
            {
                AsDynamicMethodDesc()->GetResolver()->FreeCompileTimeState();
            }
#endif // FEATURE_INTERPRETER

            // We succeeded in jitting the code, and our jitted code is the one that's going to run now.
            pEntry->m_hrResultCode = S_OK;

 #ifdef PROFILING_SUPPORTED 
            // Notify the profiler that JIT completed.
            // Must do this after the address has been set.
            // @ToDo: Why must we set the address before notifying the profiler ??
            //        Note that if IsInterceptedForDeclSecurity is set no one should access the jitted code address anyway.
            {
                BEGIN_PIN_PROFILER(CORProfilerTrackJITInfo());
                if (!IsNoMetadata())
                {
                    g_profControlBlock.pProfInterface->
                        JITCompilationFinished((FunctionID) this,
                                                pEntry->m_hrResultCode, 
                                                TRUE);
                }
                END_PIN_PROFILER();
            }
#endif // PROFILING_SUPPORTED

#ifdef FEATURE_MULTICOREJIT
            if (! fCompiledInBackground)
#endif
#ifdef FEATURE_INTERPRETER
            // If we didn't JIT, but rather, created an interpreter stub (i.e., fStable is false), don't tell ETW that we did.
            if (fStable)
#endif // FEATURE_INTERPRETER
            {
                // Fire an ETW event to mark the end of JIT'ing
                ETW::MethodLog::MethodJitted(this, &namespaceOrClassName, &methodName, &methodSignature, pCode, 0 /* ReJITID */);

#ifdef FEATURE_PERFMAP
                // Save the JIT'd method information so that perf can resolve JIT'd call frames.
                PerfMap::LogJITCompiledMethod(this, pCode, sizeOfCode);
#endif
            }
 

#ifdef FEATURE_MULTICOREJIT

            // If not called from multi-core JIT thread, not got code from storage, quick check before calling out of line function
            if (! fBackgroundThread && ! fCompiledInBackground && mcJitManager.IsRecorderActive())
            {
                if (MulticoreJitManager::IsMethodSupported(this))
                {
                    mcJitManager.RecordMethodJit(this); // Tell multi-core JIT manager to record method on successful JITting
                }
            }
#endif

            if (!fIsILStub)
            {
                // The notification will only occur if someone has registered for this method.
                DACNotifyCompilationFinished(this);
            }
        }
    }

Done:

    // We must have a code by now.
    _ASSERTE(pCode != NULL);

    LOG((LF_CORDB, LL_EVERYTHING, "MethodDesc::MakeJitWorker finished. Stub is" FMT_ADDR "\n",
         DBG_ADDR(pCode)));

    return pCode;
}

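This function is the thread-safe JIT entry point.
If multiple threads try to compile the same function, one thread performs the compilation and the others wait for it to finish.
Each AppDomain holds a collection of locks, and every function that is currently being compiled owns one ListLockEntry object.
The function first locks the collection, gets or creates the ListLockEntry for the function, and then releases the lock on the collection.
At this point all threads working on the same function have obtained the same ListLockEntry, and each of them then locks that ListLockEntry.
Once the entry is locked, the non-thread-safe JIT function is called:

pCode = UnsafeJitFunction(this, ILHeader, flags, flags2, &sizeOfCode);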
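The two-level locking described above can be sketched roughly as follows. This is a simplified model, not the CoreCLR implementation: JitLockList, Entry, unsafeCompile and getOrCompile are hypothetical names, functions are keyed by a string, and the deadlock-aware acquisition seen in the real code is left out.

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical per-function entry, loosely modelled on ListLockEntry.
struct Entry {
    std::mutex lock;  // held while one thread compiles this function
    int code = 0;     // 0 means "not compiled yet"
};

struct JitLockList {
    std::mutex listLock;  // protects the map only, never held while compiling
    std::map<std::string, std::shared_ptr<Entry>> entries;
    std::atomic<int> compileCount{0};

    // Stand-in for the non-thread-safe compile step.
    int unsafeCompile(const std::string& name) {
        ++compileCount;
        return static_cast<int>(name.size()) + 1000;
    }

    int getOrCompile(const std::string& name) {
        std::shared_ptr<Entry> entry;
        {
            // Step 1: lock the collection, find or create the entry, unlock.
            std::lock_guard<std::mutex> guard(listLock);
            std::shared_ptr<Entry>& slot = entries[name];
            if (!slot) slot = std::make_shared<Entry>();
            entry = slot;
        }
        // Step 2: lock only the entry that belongs to this function.
        std::lock_guard<std::mutex> guard(entry->lock);
        // Step 3: check again, since another thread may have finished first.
        if (entry->code == 0) entry->code = unsafeCompile(name);
        return entry->code;
    }
};
```

Because the collection lock is released before compiling, threads that compile different functions do not block each other; only threads asking for the same function queue up on the same entry.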

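There are several more layers of calls before we reach the JIT's main function; I will only briefly describe what each of them does: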

UnsafeJitFunction

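This function creates an instance of CEEJitInfo (the class the JIT layer uses to call back into the EE layer), obtains the compilation flags from the method information (for example, whether to compile in Debug mode),
and calls CallCompileMethodWithSEHWrapper. If a relative address overflows, it disables relative addressing (fAllowRel32) and retries the compilation.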
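The "retry without relative addresses" behaviour can be pictured with the following sketch. It is an assumption-laden illustration: tryCompile, compileWithRetry and the Encoding enum are hypothetical, and the only thing taken from the text above is the idea that compilation is retried with fAllowRel32 turned off after a rel32 overflow.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical result: which addressing form the "compiled" code ended up using.
enum class Encoding { Rel32, Absolute };

// Hypothetical compile step. With rel32 allowed it fails when the
// displacement does not fit into a signed 32-bit value.
bool tryCompile(std::int64_t displacement, bool allowRel32, Encoding* out) {
    if (allowRel32) {
        bool fits = displacement >= std::numeric_limits<std::int32_t>::min() &&
                    displacement <= std::numeric_limits<std::int32_t>::max();
        if (!fits) return false;  // relative address overflowed
        *out = Encoding::Rel32;
        return true;
    }
    *out = Encoding::Absolute;    // absolute addresses always work
    return true;
}

// Try with rel32 first; on overflow, disable it and compile again.
Encoding compileWithRetry(std::int64_t displacement, int* attempts) {
    bool allowRel32 = true;
    Encoding result = Encoding::Absolute;
    for (;;) {
        ++*attempts;
        if (tryCompile(displacement, allowRel32, &result)) return result;
        allowRel32 = false;  // second attempt cannot overflow
    }
}
```

The loop runs at most twice, because the second attempt no longer depends on the displacement fitting into 32 bits.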

CallCompileMethodWithSEHWrapper

This function calls invokeCompileMethod inside a try block.

invokeCompileMethod

This function switches the current thread to Preemptive mode (so the GC does not need to suspend it) and then calls invokeCompileMethodHelper.

invokeCompileMethodHelper

In the normal case, this function calls jitMgr->m_jit->compileMethod.

CILJit::compileMethod

In the normal case, this function calls jitNativeCode.

jitNativeCode

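This function creates and initializes an instance of Compiler and calls pParam->pComp->compCompile (the 7-parameter overload).
Inlining also starts from this function; when inlining, the Compiler instance is created the first time and reused afterwards.
A Compiler is responsible for the entire JIT process of a single function.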

Compiler::compCompile (7-parameter overload)

This function performs some initialization on the Compiler instance and then calls Compiler::compCompileHelper.

compCompileHelper

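This function first creates the local variable table lvaTable and the linked list of BasicBlocks.
When necessary it adds a block for internal use (BB01), and then it parses the IL code to add more blocks; the details are explained below.
It then calls compCompile (the 3-parameter overload).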

compCompile (3-parameter overload)

This is the main function of the JIT. It invokes each phase of the JIT in turn; the details are explained below.

Creating the local variable table

compCompileHelper calls lvaInitTypeRef,
which creates the local variable table. Its source code is as follows:

void Compiler::lvaInitTypeRef()
{

    /* x86 args look something like this:
        [this ptr] [hidden return buffer] [declared arguments]* [generic context] [var arg cookie]

       x64 is closer to the native ABI:
        [this ptr] [hidden return buffer] [generic context] [var arg cookie] [declared arguments]*
        (Note: prior to .NET Framework 4.5.1 for Windows 8.1 (but not .NET Framework 4.5.1 "downlevel"),
        the "hidden return buffer" came before the "this ptr". Now, the "this ptr" comes first. This
        is different from the C++ order, where the "hidden return buffer" always comes first.)

       ARM and ARM64 are the same as the current x64 convention:
        [this ptr] [hidden return buffer] [generic context] [var arg cookie] [declared arguments]*

       Key difference:
           The var arg cookie and generic context are swapped with respect to the user arguments
    */

    /* Set compArgsCount and compLocalsCount */

    info.compArgsCount = info.compMethodInfo->args.numArgs;

    // Is there a 'this' pointer

    if (!info.compIsStatic)
    {
        info.compArgsCount++;
    }
    else
    {
        info.compThisArg = BAD_VAR_NUM;
    }

    info.compILargsCount = info.compArgsCount;

#ifdef FEATURE_SIMD
    if (featureSIMD && (info.compRetNativeType == TYP_STRUCT))
    {
        var_types structType = impNormStructType(info.compMethodInfo->args.retTypeClass);
        info.compRetType     = structType;
    }
#endif // FEATURE_SIMD

    // Are we returning a struct using a return buffer argument?
    //
    const bool hasRetBuffArg = impMethodInfo_hasRetBuffArg(info.compMethodInfo);

    // Possibly change the compRetNativeType from TYP_STRUCT to a "primitive" type
    // when we are returning a struct by value and it fits in one register
    //
    if (!hasRetBuffArg && varTypeIsStruct(info.compRetNativeType))
    {
        CORINFO_CLASS_HANDLE retClsHnd = info.compMethodInfo->args.retTypeClass;

        Compiler::structPassingKind howToReturnStruct;
        var_types                   returnType = getReturnTypeForStruct(retClsHnd, &howToReturnStruct);

        if (howToReturnStruct == SPK_PrimitiveType)
        {
            assert(returnType != TYP_UNKNOWN);
            assert(returnType != TYP_STRUCT);

            info.compRetNativeType = returnType;

            // ToDo: Refactor this common code sequence into its own method as it is used 4+ times
            if ((returnType == TYP_LONG) && (compLongUsed == false))
            {
                compLongUsed = true;
            }
            else if (((returnType == TYP_FLOAT) || (returnType == TYP_DOUBLE)) && (compFloatingPointUsed == false))
            {
                compFloatingPointUsed = true;
            }
        }
    }

    // Do we have a RetBuffArg?

    if (hasRetBuffArg)
    {
        info.compArgsCount++;
    }
    else
    {
        info.compRetBuffArg = BAD_VAR_NUM;
    }

    /* There is a 'hidden' cookie pushed last when the
       calling convention is varargs */

    if (info.compIsVarArgs)
    {
        info.compArgsCount++;
    }

    // Is there an extra parameter used to pass instantiation info to
    // shared generic methods and shared generic struct instance methods?
    if (info.compMethodInfo->args.callConv & CORINFO_CALLCONV_PARAMTYPE)
    {
        info.compArgsCount++;
    }
    else
    {
        info.compTypeCtxtArg = BAD_VAR_NUM;
    }

    lvaCount = info.compLocalsCount = info.compArgsCount + info.compMethodInfo->locals.numArgs;

    info.compILlocalsCount = info.compILargsCount + info.compMethodInfo->locals.numArgs;

    /* Now allocate the variable descriptor table */

    if (compIsForInlining())
    {
        lvaTable    = impInlineInfo->InlinerCompiler->lvaTable;
        lvaCount    = impInlineInfo->InlinerCompiler->lvaCount;
        lvaTableCnt = impInlineInfo->InlinerCompiler->lvaTableCnt;

        // No more stuff needs to be done.
        return;
    }

    lvaTableCnt = lvaCount * 2;

    if (lvaTableCnt < 16)
    {
        lvaTableCnt = 16;
    }

    lvaTable         = (LclVarDsc*)compGetMemArray(lvaTableCnt, sizeof(*lvaTable), CMK_LvaTable);
    size_t tableSize = lvaTableCnt * sizeof(*lvaTable);
    memset(lvaTable, 0, tableSize);
    for (unsigned i = 0; i < lvaTableCnt; i++)
    {
        new (&lvaTable[i], jitstd::placement_t()) LclVarDsc(this); // call the constructor.
    }

    //-------------------------------------------------------------------------
    // Count the arguments and initialize the respective lvaTable[] entries
    //
    // First the implicit arguments
    //-------------------------------------------------------------------------

    InitVarDscInfo varDscInfo;
    varDscInfo.Init(lvaTable, hasRetBuffArg);

    lvaInitArgs(&varDscInfo);

    //-------------------------------------------------------------------------
    // Finally the local variables
    //-------------------------------------------------------------------------

    unsigned                varNum    = varDscInfo.varNum;
    LclVarDsc*              varDsc    = varDscInfo.varDsc;
    CORINFO_ARG_LIST_HANDLE localsSig = info.compMethodInfo->locals.args;

    for (unsigned i = 0; i < info.compMethodInfo->locals.numArgs;
         i++, varNum++, varDsc++, localsSig = info.compCompHnd->getArgNext(localsSig))
    {
        CORINFO_CLASS_HANDLE typeHnd;
        CorInfoTypeWithMod   corInfoType =
            info.compCompHnd->getArgType(&info.compMethodInfo->locals, localsSig, &typeHnd);
        lvaInitVarDsc(varDsc, varNum, strip(corInfoType), typeHnd, localsSig, &info.compMethodInfo->locals);

        varDsc->lvPinned  = ((corInfoType & CORINFO_TYPE_MOD_PINNED) != 0);
        varDsc->lvOnFrame = true; // The final home for this local variable might be our local stack frame
    }

    if ( // If there already exist unsafe buffers, don't mark more structs as unsafe
        // as that will cause them to be placed along with the real unsafe buffers,
        // unnecessarily exposing them to overruns. This can affect GS tests which
        // intentionally do buffer-overruns.
        !getNeedsGSSecurityCookie() &&
        // GS checks require the stack to be re-ordered, which can't be done with EnC
        !opts.compDbgEnC && compStressCompile(STRESS_UNSAFE_BUFFER_CHECKS, 25))
    {
        setNeedsGSSecurityCookie();
        compGSReorderStackLayout = true;

        for (unsigned i = 0; i < lvaCount; i++)
        {
            if ((lvaTable[i].lvType == TYP_STRUCT) && compStressCompile(STRESS_GENERIC_VARN, 60))
            {
                lvaTable[i].lvIsUnsafeBuffer = true;
            }
        }
    }

    if (getNeedsGSSecurityCookie())
    {
        // Ensure that there will be at least one stack variable since
        // we require that the GSCookie does not have a 0 stack offset.
        unsigned dummy         = lvaGrabTempWithImplicitUse(false DEBUGARG("GSCookie dummy"));
        lvaTable[dummy].lvType = TYP_INT;
    }

#ifdef DEBUG
    if (verbose)
    {
        lvaTableDump(INITIAL_FRAME_LAYOUT);
    }
#endif
}

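The initial number of local variables is info.compArgsCount + info.compMethodInfo->locals.numArgs, that is, the number of arguments in the IL (plus implicit ones such as the this pointer and the return buffer) plus the number of local variables in the IL.
Because more temporary variables may be added later, the local variable table is stored in a length + capacity fashion:
the pointer to the table is lvaTable, the current length is lvaCount, and the capacity is lvaTableCnt.
The beginning of the table holds the IL arguments, followed by the IL local variables.
For example, with 3 arguments and 2 local variables, the table is [arg 0, arg 1, arg 2, local 0, local 1, empty, empty, empty, ...].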
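The layout can be reproduced with a small model. This is a sketch rather than the JIT's data structure: LocalTable, slots and grabTemp are hypothetical names, and entries are plain strings instead of LclVarDsc. The capacity rule (twice the count, at least 16) follows lvaInitTypeRef above; the growth policy in grabTemp is an assumption, since this section does not show how the real table grows.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical miniature of lvaTable / lvaCount / lvaTableCnt.
struct LocalTable {
    std::vector<std::string> slots;  // slots.size() plays the role of lvaTableCnt
    unsigned count = 0;              // plays the role of lvaCount

    LocalTable(unsigned numArgs, unsigned numLocals) {
        count = numArgs + numLocals;
        // Same rule as lvaInitTypeRef: capacity is count * 2, but at least 16.
        unsigned capacity = std::max(count * 2, 16u);
        slots.assign(capacity, "empty");
        // Arguments come first, locals follow.
        for (unsigned i = 0; i < numArgs; i++)
            slots[i] = "arg" + std::to_string(i);
        for (unsigned i = 0; i < numLocals; i++)
            slots[numArgs + i] = "local" + std::to_string(i);
    }

    // Append a temporary and return its index.
    // Doubling the capacity when full is an assumption made for this sketch.
    unsigned grabTemp() {
        if (count == slots.size()) slots.resize(slots.size() * 2, "empty");
        slots[count] = "temp";
        return count++;
    }
};
```

With LocalTable t(3, 2), the first five slots are arg0, arg1, arg2, local0, local1, the remaining eleven are empty, and the first temporary receives index 5.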

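In addition, if the current function is being compiled for inlining, the local variable table of the calling side (the inliner's Compiler instance) is reused.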

Creating BasicBlocks from IL

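Before entering the JIT's main function, compCompileHelper parses the IL and creates BasicBlocks based on its instructions.
As mentioned in the previous article,
a BasicBlock is a logical block that contains no internal jumps: in principle, a jump instruction appears only at the end of a block, and a jump target can only be the beginning of a block.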
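To make the definition concrete, here is a minimal sketch of how block boundaries can be derived from jump targets. It uses a toy instruction list instead of real IL; Op, Instr and findBlockStarts are hypothetical names. The marker array has one extra element past the end of the code, similar in spirit to the jumpTarget array in the CoreCLR source.

```cpp
#include <cstddef>
#include <vector>

// Toy instruction kinds; real IL has many more opcodes.
enum class Op { Normal, Jump, CondJump, Return };

struct Instr {
    Op op;
    std::size_t target;  // index of the jump target, used by Jump and CondJump only
};

// Returns the start index of every basic block, in ascending order.
std::vector<std::size_t> findBlockStarts(const std::vector<Instr>& code) {
    // One extra slot so that "the instruction after the last one" can be marked.
    std::vector<bool> leader(code.size() + 1, false);
    if (!code.empty()) leader[0] = true;  // the first instruction starts a block

    for (std::size_t i = 0; i < code.size(); i++) {
        if (code[i].op == Op::Normal) continue;
        // A jump or return ends its block, so the next instruction starts one.
        leader[i + 1] = true;
        // A jump target must be the beginning of a block.
        if (code[i].op != Op::Return && code[i].target <= code.size())
            leader[code[i].target] = true;
    }

    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i < code.size(); i++)
        if (leader[i]) starts.push_back(i);
    return starts;
}
```

For a six-instruction sequence with a conditional jump at index 1 targeting 4, a jump at index 3 targeting 5 and a return at index 5, the blocks start at indices 0, 2, 4 and 5.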

The logic for creating BasicBlocks lives in the function fgFindBasicBlocks. Let's look at its source code:

/*****************************************************************************
 *
 *  Main entry point to discover the basic blocks for the current function.
 */

void Compiler::fgFindBasicBlocks()
{
#ifdef DEBUG
    if (verbose)
    {
        printf("*************** In fgFindBasicBlocks() for %s\n", info.compFullName);
    }
#endif

    /* Allocate the 'jump target' vector
     *
     *  We need one extra byte as we mark
     *  jumpTarget[info.compILCodeSize] with JT_ADDR
     *  when we need to add a dummy block
     *  to record the end of a try or handler region.
     */
    BYTE* jumpTarget = new (this, CMK_Unknown) BYTE[info.compILCodeSize + 1];
    memset(jumpTarget, JT_NONE, info.compILCodeSize + 1);
    noway_assert(JT_NONE == 0);

    /* Walk the instrs to find all jump targets */

    fgFindJumpTargets(info.compCode, info.compILCodeSize, jumpTarget);
    if (compDonotInline())
    {
        return;
    }

    unsigned XTnum;

    /* Are there any exception handlers? */

    if (info.compXcptnsCount > 0)
    {
        noway_assert(!compIsForInlining());

        /* Check and mark all the exception handlers */

        for (XTnum = 0; XTnum < info.compXcptnsCount; XTnum++)
        {
            DWORD             tmpOffset;
            CORINFO_EH_CLAUSE clause;
            info.compCompHnd->getEHinfo(info.compMethodHnd, XTnum, &clause);
            noway_assert(clause.HandlerLength != (unsigned)-1);

            if (clause.TryLength <= 0)
            {
                BADCODE("try block length <=0");
            }

            /* Mark the 'try' block extent and the handler itself */

            if (clause.TryOffset > info.compILCodeSize)
            {
                BADCODE("try offset is > codesize");
            }
            if (jumpTarget[clause.TryOffset] == JT_NONE)
            {
                jumpTarget[clause.TryOffset] = JT_ADDR;
            }

            tmpOffset = clause.TryOffset + clause.TryLength;
            if (tmpOffset > info.compILCodeSize)
            {
                BADCODE("try end is > codesize");
            }
            if (jumpTarget[tmpOffset] == JT_NONE)
            {
                jumpTarget[tmpOffset] = JT_ADDR;
            }

            if (clause.HandlerOffset > info.compILCodeSize)
            {
                BADCODE("handler offset > codesize");
            }
            if (jumpTarget[clause.HandlerOffset] == JT_NONE)
            {
                jumpTarget[clause.HandlerOffset] = JT_ADDR;
            }

            tmpOffset = clause.HandlerOffset + clause.HandlerLength;
            if (tmpOffset > info.compILCodeSize)
            {
                BADCODE("handler end > codesize");
            }
            if (jumpTarget[tmpOffset] == JT_NONE)
            {
                jumpTarget[tmpOffset] = JT_ADDR;
            }

            if (clause.Flags & CORINFO_EH_CLAUSE_FILTER)
            {
                if (clause.FilterOffset > info.compILCodeSize)
                {
                    BADCODE("filter offset > codesize");
                }
                if (jumpTarget[clause.FilterOffset] == JT_NONE)
                {
                    jumpTarget[clause.FilterOffset] = JT_ADDR;
                }
            }
        }
    }

#ifdef DEBUG
    if (verbose)
    {
        bool anyJumpTargets = false;
        printf("Jump targets:\n");
        for (unsigned i = 0; i < info.compILCodeSize + 1; i++)
        {
            if (jumpTarget[i] == JT_NONE)
            {
                continue;
            }

            anyJumpTargets = true;
            printf("  IL_%04x", i);

            if (jumpTarget[i] & JT_ADDR)
            {
                printf(" addr");
            }
            if (jumpTarget[i] & JT_MULTI)
            {
                printf(" multi");
            }
            printf("\n");
        }
        if (!anyJumpTargets)
        {
            printf("  none\n");
        }
    }
#endif // DEBUG

    /* Now create the basic blocks */

    fgMakeBasicBlocks(info.compCode, info.compILCodeSize, jumpTarget);

    if (compIsForInlining())
    {
        if (compInlineResult->IsFailure())
        {
            return;
        }

        bool hasReturnBlocks           = false;
        bool hasMoreThanOneReturnBlock = false;

        for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
        {
            if (block->bbJumpKind == BBJ_RETURN)
            {
                if (hasReturnBlocks)
                {
                    hasMoreThanOneReturnBlock = true;
                    break;
                }

                hasReturnBlocks = true;
            }
        }

        if (!hasReturnBlocks && !compInlineResult->UsesLegacyPolicy())
        {
            //
            // Mark the call node as "no return". The inliner might ignore CALLEE_DOES_NOT_RETURN and
            // fail inline for a different reasons. In that case we still want to make the "no return"
            // information available to the caller as it can impact caller's code quality.
            //

            impInlineInfo->iciCall->gtCallMoreFlags |= GTF_CALL_M_DOES_NOT_RETURN;
        }

        compInlineResult->NoteBool(InlineObservation::CALLEE_DOES_NOT_RETURN, !hasReturnBlocks);

        if (compInlineResult->IsFailure())
        {
            return;
        }

        noway_assert(info.compXcptnsCount == 0);
        compHndBBtab = impInlineInfo->InlinerCompiler->compHndBBtab;
        compHndBBtabAllocCount =
            impInlineInfo->InlinerCompiler->compHndBBtabAllocCount; // we probably only use the table, not add to it.
        compHndBBtabCount    = impInlineInfo->InlinerCompiler->compHndBBtabCount;
        info.compXcptnsCount = impInlineInfo->InlinerCompiler->info.compXcptnsCount;

        if (info.compRetNativeType != TYP_VOID && hasMoreThanOneReturnBlock)
        {
            // The lifetime of this var might expand multiple BBs. So it is a long lifetime compiler temp.
            lvaInlineeReturnSpillTemp = lvaGrabTemp(false DEBUGARG("Inline candidate multiple BBJ_RETURN spill temp"));
            lvaTable[lvaInlineeReturnSpillTemp].lvType = info.compRetNativeType;
        }
        return;
    }

    /* Mark all blocks within 'try' blocks as such */

    if (info.compXcptnsCount == 0)
    {
        return;
    }

    if (info.compXcptnsCount > MAX_XCPTN_INDEX)
    {
        IMPL_LIMITATION("too many exception clauses");
    }

    /* Allocate the exception handler table */

    fgAllocEHTable();

    /* Assume we don't need to sort the EH table (such that nested try/catch
     * appear before their try or handler parent). The EH verifier will notice
     * when we do need to sort it.
     */

    fgNeedToSortEHTable = false;

    verInitEHTree(info.compXcptnsCount);
    EHNodeDsc* initRoot = ehnNext; // remember the original root since
                                   // it may get modified during insertion

    // Annotate BBs with exception handling information required for generating correct eh code
    // as well as checking for correct IL

    EHblkDsc* HBtab;

    for (XTnum = 0, HBtab = compHndBBtab; XTnum < compHndBBtabCount; XTnum++, HBtab++)
    {
        CORINFO_EH_CLAUSE clause;
        info.compCompHnd->getEHinfo(info.compMethodHnd, XTnum, &clause);
        noway_assert(clause.HandlerLength != (unsigned)-1); // @DEPRECATED

#ifdef DEBUG
        if (verbose)
        {
            dispIncomingEHClause(XTnum, clause);
        }
#endif // DEBUG

        IL_OFFSET tryBegOff    = clause.TryOffset;
        IL_OFFSET tryEndOff    = tryBegOff + clause.TryLength;
        IL_OFFSET filterBegOff = 0;
        IL_OFFSET hndBegOff    = clause.HandlerOffset;
        IL_OFFSET hndEndOff    = hndBegOff + clause.HandlerLength;

        if (clause.Flags & CORINFO_EH_CLAUSE_FILTER)
        {
            filterBegOff = clause.FilterOffset;
        }

        if (tryEndOff > info.compILCodeSize)
        {
            BADCODE3("end of try block beyond end of method for try", " at offset %04X", tryBegOff);
        }
        if (hndEndOff > info.compILCodeSize)
        {
            BADCODE3("end of hnd block beyond end of method for try", " at offset %04X", tryBegOff);
        }

        HBtab->ebdTryBegOffset    = tryBegOff;
        HBtab->ebdTryEndOffset    = tryEndOff;
        HBtab->ebdFilterBegOffset = filterBegOff;
        HBtab->ebdHndBegOffset    = hndBegOff;
        HBtab->ebdHndEndOffset    = hndEndOff;

        /* Convert the various addresses to basic blocks */

        BasicBlock* tryBegBB = fgLookupBB(tryBegOff);
        BasicBlock* tryEndBB =
            fgLookupBB(tryEndOff); // note: this can be NULL if the try region is at the end of the function
        BasicBlock* hndBegBB = fgLookupBB(hndBegOff);
        BasicBlock* hndEndBB = nullptr;
        BasicBlock* filtBB   = nullptr;
        BasicBlock* block;

        //
        // Assert that the try/hnd beginning blocks are set up correctly
        //
        if (tryBegBB == nullptr)
        {
            BADCODE("Try Clause is invalid");
        }

        if (hndBegBB == nullptr)
        {
            BADCODE("Handler Clause is invalid");
        }

        tryBegBB->bbFlags |= BBF_HAS_LABEL;
        hndBegBB->bbFlags |= BBF_HAS_LABEL | BBF_JMP_TARGET;

#if HANDLER_ENTRY_MUST_BE_IN_HOT_SECTION
        // This will change the block weight from 0 to 1
        // and clear the rarely run flag
        hndBegBB->makeBlockHot();
#else
        hndBegBB->bbSetRunRarely();   // handler entry points are rarely executed
#endif

        if (hndEndOff < info.compILCodeSize)
        {
            hndEndBB = fgLookupBB(hndEndOff);
        }

        if (clause.Flags & CORINFO_EH_CLAUSE_FILTER)
        {
            filtBB = HBtab->ebdFilter = fgLookupBB(clause.FilterOffset);

            filtBB->bbCatchTyp = BBCT_FILTER;
            filtBB->bbFlags |= BBF_HAS_LABEL | BBF_JMP_TARGET;

            hndBegBB->bbCatchTyp = BBCT_FILTER_HANDLER;

#if HANDLER_ENTRY_MUST_BE_IN_HOT_SECTION
            // This will change the block weight from 0 to 1
            // and clear the rarely run flag
            filtBB->makeBlockHot();
#else
            filtBB->bbSetRunRarely(); // filter entry points are rarely executed
#endif

            // Mark all BBs that belong to the filter with the XTnum of the corresponding handler
            for (block = filtBB; /**/; block = block->bbNext)
            {
                if (block == nullptr)
                {
                    BADCODE3("Missing endfilter for filter", " at offset %04X", filtBB->bbCodeOffs);
                    return;
                }

                // Still inside the filter
                block->setHndIndex(XTnum);

                if (block->bbJumpKind == BBJ_EHFILTERRET)
                {
                    // Mark catch handler as successor.
                    block->bbJumpDest = hndBegBB;
                    assert(block->bbJumpDest->bbCatchTyp == BBCT_FILTER_HANDLER);
                    break;
                }
            }

            if (!block->bbNext || block->bbNext != hndBegBB)
            {
                BADCODE3("Filter does not immediately precede handler for filter", " at offset %04X",
                         filtBB->bbCodeOffs);
            }
        }
        else
        {
            HBtab->ebdTyp = clause.ClassToken;

            /* Set bbCatchTyp as appropriate */

            if (clause.Flags & CORINFO_EH_CLAUSE_FINALLY)
            {
                hndBegBB->bbCatchTyp = BBCT_FINALLY;
            }
            else
            {
                if (clause.Flags & CORINFO_EH_CLAUSE_FAULT)
                {
                    hndBegBB->bbCatchTyp = BBCT_FAULT;
                }
                else
                {
                    hndBegBB->bbCatchTyp = clause.ClassToken;

                    // These values should be non-zero value that will
                    // not collide with real tokens for bbCatchTyp
                    if (clause.ClassToken == 0)
                    {
                        BADCODE("Exception catch type is Null");
                    }

                    noway_assert(clause.ClassToken != BBCT_FAULT);
                    noway_assert(clause.ClassToken != BBCT_FINALLY);
                    noway_assert(clause.ClassToken != BBCT_FILTER);
                    noway_assert(clause.ClassToken != BBCT_FILTER_HANDLER);
                }
            }
        }

        /* Mark the initial block and last blocks in the 'try' region */

        tryBegBB->bbFlags |= BBF_TRY_BEG | BBF_HAS_LABEL;

        /*  Prevent future optimizations of removing the first block   */
        /*  of a TRY block and the first block of an exception handler */

        tryBegBB->bbFlags |= BBF_DONT_REMOVE;
        hndBegBB->bbFlags |= BBF_DONT_REMOVE;
        hndBegBB->bbRefs++; // The first block of a handler gets an extra, "artificial" reference count.

        if (clause.Flags & CORINFO_EH_CLAUSE_FILTER)
        {
            filtBB->bbFlags |= BBF_DONT_REMOVE;
            filtBB->bbRefs++; // The first block of a filter gets an extra, "artificial" reference count.
        }

        tryBegBB->bbFlags |= BBF_DONT_REMOVE;
        hndBegBB->bbFlags |= BBF_DONT_REMOVE;

        //
        // Store the info to the table of EH block handlers
        //

        HBtab->ebdHandlerType = ToEHHandlerType(clause.Flags);

        HBtab->ebdTryBeg  = tryBegBB;
        HBtab->ebdTryLast = (tryEndBB == nullptr) ? fgLastBB : tryEndBB->bbPrev;

        HBtab->ebdHndBeg  = hndBegBB;
        HBtab->ebdHndLast = (hndEndBB == nullptr) ? fgLastBB : hndEndBB->bbPrev;

        //
        // Assert that all of our try/hnd blocks are setup correctly.
        //
        if (HBtab->ebdTryLast == nullptr)
        {
            BADCODE("Try Clause is invalid");
        }

        if (HBtab->ebdHndLast == nullptr)
        {
            BADCODE("Handler Clause is invalid");
        }

        //
        // Verify that it's legal
        //

        verInsertEhNode(&clause, HBtab);

    } // end foreach handler table entry

    fgSortEHTable();

    // Next, set things related to nesting that depend on the sorting being complete.

    for (XTnum = 0, HBtab = compHndBBtab; XTnum < compHndBBtabCount; XTnum++, HBtab++)
    {
        /* Mark all blocks in the finally/fault or catch clause */

        BasicBlock* tryBegBB = HBtab->ebdTryBeg;
        BasicBlock* hndBegBB = HBtab->ebdHndBeg;

        IL_OFFSET tryBegOff = HBtab->ebdTryBegOffset;
        IL_OFFSET tryEndOff = HBtab->ebdTryEndOffset;

        IL_OFFSET hndBegOff = HBtab->ebdHndBegOffset;
        IL_OFFSET hndEndOff = HBtab->ebdHndEndOffset;

        BasicBlock* block;

        for (block = hndBegBB; block && (block->bbCodeOffs < hndEndOff); block = block->bbNext)
        {
            if (!block->hasHndIndex())
            {
                block->setHndIndex(XTnum);
            }

            // All blocks in a catch handler or filter are rarely run, except the entry
            if ((block != hndBegBB) && (hndBegBB->bbCatchTyp != BBCT_FINALLY))
            {
                block->bbSetRunRarely();
            }
        }

        /* Mark all blocks within the covered range of the try */

        for (block = tryBegBB; block && (block->bbCodeOffs < tryEndOff); block = block->bbNext)
        {
            /* Mark this BB as belonging to a 'try' block */

            if (!block->hasTryIndex())
            {
                block->setTryIndex(XTnum);
            }

#ifdef DEBUG
            /* Note: the BB can't span the 'try' block */

            if (!(block->bbFlags & BBF_INTERNAL))
            {
                noway_assert(tryBegOff <= block->bbCodeOffs);
                noway_assert(tryEndOff >= block->bbCodeOffsEnd || tryEndOff == tryBegOff);
            }
#endif
        }

/*  Init ebdHandlerNestingLevel of current clause, and bump up value for all
 *  enclosed clauses (which have to be before it in the table).
 *  Innermost try-finally blocks must precede outermost
 *  try-finally blocks.
 */

#if !FEATURE_EH_FUNCLETS
        HBtab->ebdHandlerNestingLevel = 0;
#endif // !FEATURE_EH_FUNCLETS

        HBtab->ebdEnclosingTryIndex = EHblkDsc::NO_ENCLOSING_INDEX;
        HBtab->ebdEnclosingHndIndex = EHblkDsc::NO_ENCLOSING_INDEX;

        noway_assert(XTnum < compHndBBtabCount);
        noway_assert(XTnum == ehGetIndex(HBtab));

        for (EHblkDsc* xtab = compHndBBtab; xtab < HBtab; xtab++)
        {
#if !FEATURE_EH_FUNCLETS
            if (jitIsBetween(xtab->ebdHndBegOffs(), hndBegOff, hndEndOff))
            {
                xtab->ebdHandlerNestingLevel++;
            }
#endif // !FEATURE_EH_FUNCLETS

            /* If we haven't recorded an enclosing try index for xtab then see
             *  if this EH region should be recorded.  We check if the
             *  first offset in the xtab lies within our region.  If so,
             *  the last offset also must lie within the region, due to
             *  nesting rules. verInsertEhNode(), below, will check for proper nesting.
             */
            if (xtab->ebdEnclosingTryIndex == EHblkDsc::NO_ENCLOSING_INDEX)
            {
                bool begBetween = jitIsBetween(xtab->ebdTryBegOffs(), tryBegOff, tryEndOff);
                if (begBetween)
                {
                    // Record the enclosing scope link
                    xtab->ebdEnclosingTryIndex = (unsigned short)XTnum;
                }
            }

            /* Do the same for the enclosing handler index.
             */
            if (xtab->ebdEnclosingHndIndex == EHblkDsc::NO_ENCLOSING_INDEX)
            {
                bool begBetween = jitIsBetween(xtab->ebdTryBegOffs(), hndBegOff, hndEndOff);
                if (begBetween)
                {
                    // Record the enclosing scope link
                    xtab->ebdEnclosingHndIndex = (unsigned short)XTnum;
                }
            }
        }

    } // end foreach handler table entry

#if !FEATURE_EH_FUNCLETS

    EHblkDsc* HBtabEnd;
    for (HBtab = compHndBBtab, HBtabEnd = compHndBBtab + compHndBBtabCount; HBtab < HBtabEnd; HBtab++)
    {
        if (ehMaxHndNestingCount <= HBtab->ebdHandlerNestingLevel)
            ehMaxHndNestingCount = HBtab->ebdHandlerNestingLevel + 1;
    }

#endif // !FEATURE_EH_FUNCLETS

#ifndef DEBUG
    if (tiVerificationNeeded)
#endif
    {
        // always run these checks for a debug build
        verCheckNestingLevel(initRoot);
    }

#ifndef DEBUG
    // fgNormalizeEH assumes that this test has been passed.  And Ssa assumes that fgNormalizeEHTable
    // has been run.  So do this unless we're in minOpts mode (and always in debug).
    if (tiVerificationNeeded || !opts.MinOpts())
#endif
    {
        fgCheckBasicBlockControlFlow();
    }

#ifdef DEBUG
    if (verbose)
    {
        JITDUMP("*************** After fgFindBasicBlocks() has created the EH table\n");
        fgDispHandlerTab();
    }

    // We can't verify the handler table until all the IL legality checks have been done (above), since bad IL
    // (such as illegal nesting of regions) will trigger asserts here.
    fgVerifyHandlerTab();
#endif

    fgNormalizeEH();
}

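fgFindBasicBlocks first creates a byte array whose length equals the length of the IL (so every IL offset maps to one byte),
then calls fgFindJumpTargets to look for jump targets. Take the following IL as an example: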

IL_0000  00                nop         
IL_0001  16                ldc.i4.0    
IL_0002  0a                stloc.0     
IL_0003  2b 0d             br.s         13 (IL_0012)
IL_0005  00                nop         
IL_0006  06                ldloc.0     
IL_0007  28 0c 00 00 0a    call         0xA00000C
IL_000c  00                nop         
IL_000d  00                nop         
IL_000e  06                ldloc.0     
IL_000f  17                ldc.i4.1    
IL_0010  58                add         
IL_0011  0a                stloc.0     
IL_0012  06                ldloc.0     
IL_0013  19                ldc.i4.3    
IL_0014  fe 04             clt         
IL_0016  0b                stloc.1     
IL_0017  07                ldloc.1     
IL_0018  2d eb             brtrue.s     -21 (IL_0005)
IL_001a  2a                ret

Two jump targets can be found in this IL:

Jump targets:
  IL_0005
  IL_0012

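fgFindBasicBlocks then finds more jump targets from the method's exception information; for example, the start of a try and the start of a catch are both treated as jump targets.
Note that fgFindJumpTargets also judges whether the method is worth inlining once it has parsed the IL; the inlining-related handling is explained later.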
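The marking step can be sketched roughly as follows. This is a minimal illustration and not the real fgFindJumpTargets: it only knows the few opcodes that appear in the sample IL above, and the names `opcodeSize`, `markJumpTargets` and `sampleIL` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: size in bytes of the few opcodes used in the sample IL.
// A real decoder uses a complete opcode table; unknown opcodes count as 1 byte here.
static std::size_t opcodeSize(const std::vector<std::uint8_t>& il, std::size_t offs)
{
    switch (il[offs])
    {
        case 0x2b: // br.s <int8>
        case 0x2d: // brtrue.s <int8>
            return 2;
        case 0x28: // call <4-byte token>
            return 5;
        case 0xfe: // prefix of a two-byte opcode (clt = fe 04)
            return 2;
        default:
            return 1;
    }
}

// Returns one byte per IL offset; a non-zero byte marks a jump target.
std::vector<std::uint8_t> markJumpTargets(const std::vector<std::uint8_t>& il)
{
    std::vector<std::uint8_t> isTarget(il.size(), 0);
    for (std::size_t offs = 0; offs < il.size();)
    {
        const std::size_t size     = opcodeSize(il, offs);
        const bool        isBranch = (il[offs] == 0x2b) || (il[offs] == 0x2d);
        if (isBranch && (offs + 1 < il.size()))
        {
            // A short branch is relative to the offset of the next instruction.
            const auto           delta  = static_cast<std::int8_t>(il[offs + 1]);
            const std::ptrdiff_t target = static_cast<std::ptrdiff_t>(offs + size) + delta;
            if (target >= 0 && static_cast<std::size_t>(target) < il.size())
            {
                isTarget[static_cast<std::size_t>(target)] = 1;
            }
        }
        offs += size;
    }
    return isTarget;
}

// The raw bytes of the sample IL shown above (IL_0000 .. IL_001a).
std::vector<std::uint8_t> sampleIL()
{
    return {0x00, 0x16, 0x0a, 0x2b, 0x0d, 0x00, 0x06, 0x28, 0x0c, 0x00, 0x00, 0x0a, 0x00, 0x00,
            0x06, 0x17, 0x58, 0x0a, 0x06, 0x19, 0xfe, 0x04, 0x0b, 0x07, 0x2d, 0xeb, 0x2a};
}
```

Calling `markJumpTargets(sampleIL())` sets exactly the bytes at offsets 0x05 and 0x12, matching the two jump targets listed above.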

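After that, fgMakeBasicBlocks is called to create the BasicBlocks. fgMakeBasicBlocks starts a new block whenever it meets a jump instruction or a jump target.
Once fgMakeBasicBlocks has run, the compiler holds a linked list of BasicBlocks (starting at fgFirstBB), and each node covers one range of the IL.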
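The splitting rule can be sketched like this. It is a simplified model rather than the real fgMakeBasicBlocks: the `Instr` and `Block` types and the `makeBlocks` and `sampleInstrs` functions are hypothetical, and the instructions are assumed to be already decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, already-decoded instruction: where it starts, how long it is,
// and whether it ends a block (branch or ret).
struct Instr
{
    std::size_t offset;
    std::size_t size;
    bool        endsBlock;
};

// Hypothetical block: covers the IL range [begOffs, endOffs).
struct Block
{
    std::size_t begOffs;
    std::size_t endOffs;
};

std::vector<Block> makeBlocks(const std::vector<Instr>& instrs, const std::set<std::size_t>& jumpTargets)
{
    std::vector<Block> blocks;
    bool               open = false;
    std::size_t        beg  = 0;
    for (const Instr& ins : instrs)
    {
        // A jump target closes the block in progress and starts a new one.
        if (open && jumpTargets.count(ins.offset) != 0)
        {
            blocks.push_back({beg, ins.offset});
            open = false;
        }
        if (!open)
        {
            beg  = ins.offset;
            open = true;
        }
        // A jump instruction (or ret) is the last instruction of its block.
        if (ins.endsBlock)
        {
            blocks.push_back({beg, ins.offset + ins.size});
            open = false;
        }
    }
    if (open)
    {
        blocks.push_back({beg, instrs.back().offset + instrs.back().size});
    }
    return blocks;
}

// The sample IL from above, decoded by hand.
std::vector<Instr> sampleInstrs()
{
    return {{0x00, 1, false}, {0x01, 1, false}, {0x02, 1, false}, {0x03, 2, true},  {0x05, 1, false},
            {0x06, 1, false}, {0x07, 5, false}, {0x0c, 1, false}, {0x0d, 1, false}, {0x0e, 1, false},
            {0x0f, 1, false}, {0x10, 1, false}, {0x11, 1, false}, {0x12, 1, false}, {0x13, 1, false},
            {0x14, 2, false}, {0x16, 1, false}, {0x17, 1, false}, {0x18, 2, true},  {0x1a, 1, true}};
}
```

With the jump targets {0x05, 0x12}, `makeBlocks(sampleInstrs(), {0x05, 0x12})` yields four ranges: [0x00, 0x05), [0x05, 0x12), [0x12, 0x1a) and [0x1a, 0x1b).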

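After the BasicBlocks are created, an exception table compHndBBtab (also called the EH table) is built from the exception information; its length is compHndBBtabCount.
Each entry records the block where the try begins, the block where the handler (catch, finally, fault) begins, and the index of the enclosing try (if the try is nested).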
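How the enclosing try index gets filled in can be sketched as below. This mirrors the second loop of fgFindBasicBlocks only loosely: `EHClause`, `isBetween` and `computeEnclosingTry` are hypothetical names, the half-open range check is an assumption about what jitIsBetween does, and the table is assumed to be sorted so that inner clauses come before the clauses that enclose them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int NO_ENCLOSING_INDEX = -1;

// Hypothetical, stripped-down EH table entry holding IL offsets only.
struct EHClause
{
    unsigned tryBeg;
    unsigned tryEnd;
    unsigned hndBeg;
    unsigned hndEnd;
};

// Assumed semantics: start <= value < end.
static bool isBetween(unsigned value, unsigned start, unsigned end)
{
    return (start <= value) && (value < end);
}

// For every clause, find the index of the nearest enclosing try region.
// Inner clauses must precede outer ones in the table.
std::vector<int> computeEnclosingTry(const std::vector<EHClause>& table)
{
    std::vector<int> enclosing(table.size(), NO_ENCLOSING_INDEX);
    for (std::size_t outer = 0; outer < table.size(); outer++)
    {
        for (std::size_t inner = 0; inner < outer; inner++)
        {
            // Only the first (innermost) match is recorded; later, wider regions are ignored.
            if (enclosing[inner] == NO_ENCLOSING_INDEX &&
                isBetween(table[inner].tryBeg, table[outer].tryBeg, table[outer].tryEnd))
            {
                enclosing[inner] = static_cast<int>(outer);
            }
        }
    }
    return enclosing;
}
```

For three nested try regions listed innermost first, for example try ranges [10, 20), [5, 40) and [0, 60), the result is {1, 2, NO_ENCLOSING_INDEX}.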

As shown in the following figure:

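The JIT main function

Once compCompileHelper has divided the method into BasicBlocks, it calls the three-parameter overload of Compiler::compCompile. This function is the main function of the JIT.

The source code of Compiler::compCompile is as follows: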

//*********************************************************************************************
// #Phases
//
// This is the most interesting 'toplevel' function in the JIT.  It goes through the operations of
// importing, morphing, optimizations and code generation.  This is called from the EE through the
// code:CILJit::compileMethod function.
//
// For an overview of the structure of the JIT, see:
//   https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md
//
void Compiler::compCompile(void** methodCodePtr, ULONG* methodCodeSize, CORJIT_FLAGS* compileFlags)
{
    if (compIsForInlining())
    {
        // Notify root instance that an inline attempt is about to import IL
        impInlineRoot()->m_inlineStrategy->NoteImport();
    }

    hashBv::Init(this);

    VarSetOps::AssignAllowUninitRhs(this, compCurLife, VarSetOps::UninitVal());

    /* The temp holding the secret stub argument is used by fgImport() when importing the intrinsic. */

    if (info.compPublishStubParam)
    {
        assert(lvaStubArgumentVar == BAD_VAR_NUM);
        lvaStubArgumentVar                  = lvaGrabTempWithImplicitUse(false DEBUGARG("stub argument"));
        lvaTable[lvaStubArgumentVar].lvType = TYP_I_IMPL;
    }

    EndPhase(PHASE_PRE_IMPORT);

    compFunctionTraceStart();

    /* Convert the instrs in each basic block to a tree based intermediate representation */

    fgImport();

    assert(!fgComputePredsDone);
    if (fgCheapPredsValid)
    {
        // Remove cheap predecessors before inlining; allowing the cheap predecessor lists to be inserted
        // with inlined blocks causes problems.
        fgRemovePreds();
    }

    if (compIsForInlining())
    {
        /* Quit inlining if fgImport() failed for any reason. */

        if (compDonotInline())
        {
            return;
        }

        /* Filter out unimported BBs */

        fgRemoveEmptyBlocks();

        return;
    }

    assert(!compDonotInline());

    EndPhase(PHASE_IMPORTATION);

    // Maybe the caller was not interested in generating code
    if (compIsForImportOnly())
    {
        compFunctionTraceEnd(nullptr, 0, false);
        return;
    }

#if !FEATURE_EH
    // If we aren't yet supporting EH in a compiler bring-up, remove as many EH handlers as possible, so
    // we can pass tests that contain try/catch EH, but don't actually throw any exceptions.
    fgRemoveEH();
#endif // !FEATURE_EH

    if (compileFlags->corJitFlags & CORJIT_FLG_BBINSTR)
    {
        fgInstrumentMethod();
    }

    // We could allow ESP frames. Just need to reserve space for
    // pushing EBP if the method becomes an EBP-frame after an edit.
    // Note that requiring a EBP Frame disallows double alignment.  Thus if we change this
    // we either have to disallow double alignment for E&C some other way or handle it in EETwain.

    if (opts.compDbgEnC)
    {
        codeGen->setFramePointerRequired(true);

        // Since we need a slots for security near ebp, its not possible
        // to do this after an Edit without shifting all the locals.
        // So we just always reserve space for these slots in case an Edit adds them
        opts.compNeedSecurityCheck = true;

        // We don't care about localloc right now. If we do support it,
        // EECodeManager::FixContextForEnC() needs to handle it smartly
        // in case the localloc was actually executed.
        //
        // compLocallocUsed            = true;
    }

    EndPhase(PHASE_POST_IMPORT);

    /* Initialize the BlockSet epoch */

    NewBasicBlockEpoch();

    /* Massage the trees so that we can generate code out of them */

    fgMorph();
    EndPhase(PHASE_MORPH);

    /* GS security checks for unsafe buffers */
    if (getNeedsGSSecurityCookie())
    {
#ifdef DEBUG
        if (verbose)
        {
            printf("\n*************** -GS checks for unsafe buffers \n");
        }
#endif

        gsGSChecksInitCookie();

        if (compGSReorderStackLayout)
        {
            gsCopyShadowParams();
        }

#ifdef DEBUG
        if (verbose)
        {
            fgDispBasicBlocks(true);
            printf("\n");
        }
#endif
    }
    EndPhase(PHASE_GS_COOKIE);

    /* Compute bbNum, bbRefs and bbPreds */

    JITDUMP("\nRenumbering the basic blocks for fgComputePred\n");
    fgRenumberBlocks();

    noway_assert(!fgComputePredsDone); // This is the first time full (not cheap) preds will be computed.
    fgComputePreds();
    EndPhase(PHASE_COMPUTE_PREDS);

    /* If we need to emit GC Poll calls, mark the blocks that need them now.  This is conservative and can
     * be optimized later. */
    fgMarkGCPollBlocks();
    EndPhase(PHASE_MARK_GC_POLL_BLOCKS);

    /* From this point on the flowgraph information such as bbNum,
     * bbRefs or bbPreds has to be kept updated */

    // Compute the edge weights (if we have profile data)
    fgComputeEdgeWeights();
    EndPhase(PHASE_COMPUTE_EDGE_WEIGHTS);

#if FEATURE_EH_FUNCLETS

    /* Create funclets from the EH handlers. */

    fgCreateFunclets();
    EndPhase(PHASE_CREATE_FUNCLETS);

#endif // FEATURE_EH_FUNCLETS

    if (!opts.MinOpts() && !opts.compDbgCode)
    {
        optOptimizeLayout();
        EndPhase(PHASE_OPTIMIZE_LAYOUT);

        // Compute reachability sets and dominators.
        fgComputeReachability();
    }

    // Transform each GT_ALLOCOBJ node into either an allocation helper call or
    // local variable allocation on the stack.
    ObjectAllocator objectAllocator(this);
    objectAllocator.Run();

    if (!opts.MinOpts() && !opts.compDbgCode)
    {
        /*  Perform loop inversion (i.e. transform "while" loops into
            "repeat" loops) and discover and classify natural loops
            (e.g. mark iterative loops as such). Also marks loop blocks
            and sets bbWeight to the loop nesting levels
        */

        optOptimizeLoops();
        EndPhase(PHASE_OPTIMIZE_LOOPS);

        // Clone loops with optimization opportunities, and
        // choose the one based on dynamic condition evaluation.
        optCloneLoops();
        EndPhase(PHASE_CLONE_LOOPS);

        /* Unroll loops */
        optUnrollLoops();
        EndPhase(PHASE_UNROLL_LOOPS);
    }

#ifdef DEBUG
    fgDebugCheckLinks();
#endif

    /* Create the variable table (and compute variable ref counts) */

    lvaMarkLocalVars();
    EndPhase(PHASE_MARK_LOCAL_VARS);

    // IMPORTANT, after this point, every place where trees are modified or cloned
    // the local variable reference counts must be updated
    // You can test the value of the following variable to see if
    // the local variable ref counts must be updated
    //
    assert(lvaLocalVarRefCounted == true);

    if (!opts.MinOpts() && !opts.compDbgCode)
    {
        /* Optimize boolean conditions */

        optOptimizeBools();
        EndPhase(PHASE_OPTIMIZE_BOOLS);

        // optOptimizeBools() might have changed the number of blocks; the dominators/reachability might be bad.
    }

    /* Figure out the order in which operators are to be evaluated */
    fgFindOperOrder();
    EndPhase(PHASE_FIND_OPER_ORDER);

    // Weave the tree lists. Anyone who modifies the tree shapes after
    // this point is responsible for calling fgSetStmtSeq() to keep the
    // nodes properly linked.
    // This can create GC poll calls, and create new BasicBlocks (without updating dominators/reachability).
    fgSetBlockOrder();
    EndPhase(PHASE_SET_BLOCK_ORDER);

    // IMPORTANT, after this point, every place where tree topology changes must redo evaluation
    // order (gtSetStmtInfo) and relink nodes (fgSetStmtSeq) if required.
    CLANG_FORMAT_COMMENT_ANCHOR;

#ifdef DEBUG
    // Now  we have determined the order of evaluation and the gtCosts for every node.
    // If verbose, dump the full set of trees here before the optimization phases mutate them
    //
    if (verbose)
    {
        fgDispBasicBlocks(true); // 'true' will call fgDumpTrees() after dumping the BasicBlocks
        printf("\n");
    }
#endif

    // At this point we know if we are fully interruptible or not
    if (!opts.MinOpts() && !opts.compDbgCode)
    {
        bool doSsa           = true;
        bool doEarlyProp     = true;
        bool doValueNum      = true;
        bool doLoopHoisting  = true;
        bool doCopyProp      = true;
        bool doAssertionProp = true;
        bool doRangeAnalysis = true;

#ifdef DEBUG
        doSsa           = (JitConfig.JitDoSsa() != 0);
        doEarlyProp     = doSsa && (JitConfig.JitDoEarlyProp() != 0);
        doValueNum      = doSsa && (JitConfig.JitDoValueNumber() != 0);
        doLoopHoisting  = doValueNum && (JitConfig.JitDoLoopHoisting() != 0);
        doCopyProp      = doValueNum && (JitConfig.JitDoCopyProp() != 0);
        doAssertionProp = doValueNum && (JitConfig.JitDoAssertionProp() != 0);
        doRangeAnalysis = doAssertionProp && (JitConfig.JitDoRangeAnalysis() != 0);
#endif

        if (doSsa)
        {
            fgSsaBuild();
            EndPhase(PHASE_BUILD_SSA);
        }

        if (doEarlyProp)
        {
            /* Propagate array length and rewrite getType() method call */
            optEarlyProp();
            EndPhase(PHASE_EARLY_PROP);
        }

        if (doValueNum)
        {
            fgValueNumber();
            EndPhase(PHASE_VALUE_NUMBER);
        }

        if (doLoopHoisting)
        {
            /* Hoist invariant code out of loops */
            optHoistLoopCode();
            EndPhase(PHASE_HOIST_LOOP_CODE);
        }

        if (doCopyProp)
        {
            /* Perform VN based copy propagation */
            optVnCopyProp();
            EndPhase(PHASE_VN_COPY_PROP);
        }

#if FEATURE_ANYCSE
        /* Remove common sub-expressions */
        optOptimizeCSEs();
#endif // FEATURE_ANYCSE

#if ASSERTION_PROP
        if (doAssertionProp)
        {
            /* Assertion propagation */
            optAssertionPropMain();
            EndPhase(PHASE_ASSERTION_PROP_MAIN);
        }

        if (doRangeAnalysis)
        {
            /* Optimize array index range checks */
            RangeCheck rc(this);
            rc.OptimizeRangeChecks();
            EndPhase(PHASE_OPTIMIZE_INDEX_CHECKS);
        }
#endif // ASSERTION_PROP

        /* update the flowgraph if we modified it during the optimization phase*/
        if (fgModified)
        {
            fgUpdateFlowGraph();
            EndPhase(PHASE_UPDATE_FLOW_GRAPH);

            // Recompute the edge weight if we have modified the flow graph
            fgComputeEdgeWeights();
            EndPhase(PHASE_COMPUTE_EDGE_WEIGHTS2);
        }
    }

#ifdef _TARGET_AMD64_
    //  Check if we need to add the Quirk for the PPP backward compat issue
    compQuirkForPPPflag = compQuirkForPPP();
#endif

    fgDetermineFirstColdBlock();
    EndPhase(PHASE_DETERMINE_FIRST_COLD_BLOCK);

#ifdef DEBUG
    fgDebugCheckLinks(compStressCompile(STRESS_REMORPH_TREES, 50));

    // Stash the current estimate of the function's size if necessary.
    if (verbose)
    {
        compSizeEstimate  = 0;
        compCycleEstimate = 0;
        for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
        {
            for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->getNextStmt())
            {
                compSizeEstimate += stmt->GetCostSz();
                compCycleEstimate += stmt->GetCostEx();
            }
        }
    }
#endif

#ifndef LEGACY_BACKEND
    // rationalize trees
    Rationalizer rat(this); // PHASE_RATIONALIZE
    rat.Run();
#endif // !LEGACY_BACKEND

    // Here we do "simple lowering".  When the RyuJIT backend works for all
    // platforms, this will be part of the more general lowering phase.  For now, though, we do a separate
    // pass of "final lowering."  We must do this before (final) liveness analysis, because this creates
    // range check throw blocks, in which the liveness must be correct.
    fgSimpleLowering();
    EndPhase(PHASE_SIMPLE_LOWERING);

#ifdef LEGACY_BACKEND
    /* Local variable liveness */
    fgLocalVarLiveness();
    EndPhase(PHASE_LCLVARLIVENESS);
#endif // LEGACY_BACKEND

#ifdef DEBUG
    fgDebugCheckBBlist();
    fgDebugCheckLinks();
#endif

    /* Enable this to gather statistical data such as
     * call and register argument info, flowgraph and loop info, etc. */

    compJitStats();

#ifdef _TARGET_ARM_
    if (compLocallocUsed)
    {
        // We reserve REG_SAVED_LOCALLOC_SP to store SP on entry for stack unwinding
        codeGen->regSet.rsMaskResvd |= RBM_SAVED_LOCALLOC_SP;
    }
#endif // _TARGET_ARM_
#ifdef _TARGET_ARMARCH_
    if (compRsvdRegCheck(PRE_REGALLOC_FRAME_LAYOUT))
    {
        // We reserve R10/IP1 in this case to hold the offsets in load/store instructions
        codeGen->regSet.rsMaskResvd |= RBM_OPT_RSVD;
        assert(REG_OPT_RSVD != REG_FP);
    }

#ifdef DEBUG
    //
    // Display the pre-regalloc frame offsets that we have tentatively decided upon
    //
    if (verbose)
        lvaTableDump();
#endif
#endif // _TARGET_ARMARCH_

    /* Assign registers to variables, etc. */
    CLANG_FORMAT_COMMENT_ANCHOR;

#ifndef LEGACY_BACKEND
    ///////////////////////////////////////////////////////////////////////////////
    // Dominator and reachability sets are no longer valid. They haven't been
    // maintained up to here, and shouldn't be used (unless recomputed).
    ///////////////////////////////////////////////////////////////////////////////
    fgDomsComputed = false;

    /* Create LSRA before Lowering, this way Lowering can initialize the TreeNode Map */
    m_pLinearScan = getLinearScanAllocator(this);

    /* Lower */
    Lowering lower(this, m_pLinearScan); // PHASE_LOWERING
    lower.Run();

    assert(lvaSortAgain == false); // We should have re-run fgLocalVarLiveness() in lower.Run()
    lvaTrackedFixed = true;        // We can not add any new tracked variables after this point.

    /* Now that lowering is completed we can proceed to perform register allocation */
    m_pLinearScan->doLinearScan();
    EndPhase(PHASE_LINEAR_SCAN);

    // Copied from rpPredictRegUse()
    genFullPtrRegMap = (codeGen->genInterruptible || !codeGen->isFramePointerUsed());
#else  // LEGACY_BACKEND

    lvaTrackedFixed = true; // We cannot add any new tracked variables after this point.
    // For the classic JIT32 at this point lvaSortAgain can be set and raAssignVars() will call lvaSortOnly()

    // Now do "classic" register allocation.
    raAssignVars();
    EndPhase(PHASE_RA_ASSIGN_VARS);
#endif // LEGACY_BACKEND

#ifdef DEBUG
    fgDebugCheckLinks();
#endif

    /* Generate code */

    codeGen->genGenerateCode(methodCodePtr, methodCodeSize);

#ifdef FEATURE_JIT_METHOD_PERF
    if (pCompJitTimer)
        pCompJitTimer->Terminate(this, CompTimeSummaryInfo::s_compTimeSummary);
#endif

    RecordStateAtEndOfCompilation();

#ifdef FEATURE_TRACELOGGING
    compJitTelemetry.NotifyEndOfCompilation();
#endif

#if defined(DEBUG)
    ++Compiler::jitTotalMethodCompiled;
#endif // defined(DEBUG)

    compFunctionTraceEnd(*methodCodePtr, *methodCodeSize, false);

#if FUNC_INFO_LOGGING
    if (compJitFuncInfoFile != nullptr)
    {
        assert(!compIsForInlining());
#ifdef DEBUG // We only have access to info.compFullName in DEBUG builds.
        fprintf(compJitFuncInfoFile, "%s\n", info.compFullName);
#elif FEATURE_SIMD
        fprintf(compJitFuncInfoFile, " %s\n", eeGetMethodFullName(info.compMethodHnd));
#endif
        fprintf(compJitFuncInfoFile, ""); // in our logic this causes a flush
    }
#endif // FUNC_INFO_LOGGING
}

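The JIT main function calls each of the phases in turn; for example, EndPhase(PHASE_PRE_IMPORT) marks the end of that phase.

There are somewhat more phases here than the ones Microsoft lists:

Next we analyze these phases one by one.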

PHASE_PRE_IMPORT

This phase handles some work that must be done before the IL is imported into HIR (GenTree). It consists of the following code:

if (compIsForInlining())
{
    // Notify root instance that an inline attempt is about to import IL
    impInlineRoot()->m_inlineStrategy->NoteImport();
}

hashBv::Init(this);

VarSetOps::AssignAllowUninitRhs(this, compCurLife, VarSetOps::UninitVal());

/* The temp holding the secret stub argument is used by fgImport() when importing the intrinsic. */

if (info.compPublishStubParam)
{
    assert(lvaStubArgumentVar == BAD_VAR_NUM);
    lvaStubArgumentVar                  = lvaGrabTempWithImplicitUse(false DEBUGARG("stub argument"));
    lvaTable[lvaStubArgumentVar].lvType = TYP_I_IMPL;
}

EndPhase(PHASE_PRE_IMPORT);

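This performs some initialization before the import:
hashBv::Init creates a bitvector allocator for the Compiler,
VarSetOps::AssignAllowUninitRhs sets compCurLife to the uninitialized value (this variable holds the set of currently live local variables),
and when the compPublishStubParam option is enabled an extra local variable is added (this variable holds the value of rax on entry to the function).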

PHASE_IMPORTATION

This phase imports the IL into HIR (GenTree). It consists of the following code:

compFunctionTraceStart();

/* Convert the instrs in each basic block to a tree based intermediate representation */

fgImport();

assert(!fgComputePredsDone);
if (fgCheapPredsValid)
{
    // Remove cheap predecessors before inlining; allowing the cheap predecessor lists to be inserted
    // with inlined blocks causes problems.
    fgRemovePreds();
}

if (compIsForInlining())
{
    /* Quit inlining if fgImport() failed for any reason. */

    if (compDonotInline())
    {
        return;
    }

    /* Filter out unimported BBs */

    fgRemoveEmptyBlocks();

    return;
}

assert(!compDonotInline());

EndPhase(PHASE_IMPORTATION);

compFunctionTraceStart prints some debugging information.

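fgImport parses the IL and adds GenTree nodes. Because the BasicBlocks were created earlier, each GenTree built from the IL is added to the BasicBlock it belongs to.
BasicBlock + GenTree is what we usually call the IR. The IR has two forms: the tree form is called HIR (used by the JIT front end) and the list form is called LIR (used by the JIT back end). What is built here is HIR.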
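The idea of turning stack-based IL into trees can be sketched with a toy importer. This is not impImport: the `Node` type, the operator names and the handful of opcodes handled are illustrative assumptions, chosen to resemble the `ldloc.0; ldc.i4.1; add; stloc.0` sequence from the sample IL.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node standing in for GenTree.
struct Node
{
    std::string                        oper;
    std::vector<std::unique_ptr<Node>> operands;
};
using NodePtr = std::unique_ptr<Node>;

NodePtr makeNode(std::string oper, NodePtr op1 = nullptr, NodePtr op2 = nullptr)
{
    NodePtr node(new Node);
    node->oper = std::move(oper);
    if (op1) node->operands.push_back(std::move(op1));
    if (op2) node->operands.push_back(std::move(op2));
    return node;
}

// Prints a tree as OPER(operand, operand).
std::string dump(const Node& node)
{
    if (node.operands.empty()) return node.oper;
    std::string text = node.oper + "(";
    for (std::size_t i = 0; i < node.operands.size(); i++)
    {
        if (i != 0) text += ", ";
        text += dump(*node.operands[i]);
    }
    return text + ")";
}

// Walks the IL of one block, keeping an evaluation stack of trees.
// Loads push leaf nodes, operators pop their operands and push a new tree,
// and a store pops the value and appends a finished statement to the block.
std::vector<NodePtr> importBlock(const std::vector<std::string>& il)
{
    std::vector<NodePtr> evalStack;
    std::vector<NodePtr> stmts;
    for (const std::string& op : il)
    {
        if (op == "ldloc.0")
        {
            evalStack.push_back(makeNode("LCL_VAR V00"));
        }
        else if (op == "ldc.i4.1")
        {
            evalStack.push_back(makeNode("CNS_INT 1"));
        }
        else if (op == "add")
        {
            NodePtr op2 = std::move(evalStack.back());
            evalStack.pop_back();
            NodePtr op1 = std::move(evalStack.back());
            evalStack.pop_back();
            evalStack.push_back(makeNode("ADD", std::move(op1), std::move(op2)));
        }
        else if (op == "stloc.0")
        {
            NodePtr value = std::move(evalStack.back());
            evalStack.pop_back();
            stmts.push_back(makeNode("ASG", makeNode("LCL_VAR V00"), std::move(value)));
        }
    }
    return stmts;
}
```

Importing `{"ldloc.0", "ldc.i4.1", "add", "stloc.0"}` produces a single statement whose dump is `ASG(LCL_VAR V00, ADD(LCL_VAR V00, CNS_INT 1))`, which is the tree shape that the later front-end phases operate on.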

The source code of fgImport is as follows:

void Compiler::fgImport()
{
    fgHasPostfix = false;

    impImport(fgFirstBB);

    if (!(opts.eeFlags & CORJIT_FLG_SKIP_VERIFICATION))
    {
        CorInfoMethodRuntimeFlags verFlag;
        verFlag = tiIsVerifiableCode ? CORINFO_FLG_VERIFIABLE : CORINFO_FLG_UNVERIFIABLE;
        info.compCompHnd->setMethodAttribs(info.compMethodHnd, verFlag);
    }
}

It calls impImport on the first BasicBlock.

The source code of impImport is as follows:

/*****************************************************************************
 *
 *  Convert the instrs ("import") into our internal format (trees). The
 *  basic flowgraph has already been constructed and is passed in.
 */

void Compiler::impImport(BasicBlock* method)
{
#ifdef DEBUG
    if (verbose)
    {
        printf("*************** In impImport() for %s\n", info.compFullName);
    }
#endif

    /* Allocate the stack contents */

    if (info.compMaxStack <= sizeof(impSmallStack) / sizeof(impSmallStack[0]))
    {
        /* Use local variable, don't waste time allocating on the heap */

        impStkSize              = sizeof(impSmallStack) / sizeof(impSmallStack[0]);
        verCurrentState.esStack = impSmallStack;
    }
    else
    {
        impStkSize              = info.compMaxStack;
        verCurrentState.esStack = new (this, CMK_ImpStack) StackEntry[impStkSize];
    }

    // initialize the entry state at start of method
    verInitCurrentState();

    // Initialize stuff related to figuring "spill cliques" (see spec comment for impGetSpillTmpBase).
    Compiler* inlineRoot = impInlineRoot();
    if (this == inlineRoot) // These are only used on the root of the inlining tree.
    {
        // We have initialized these previously, but to size 0.  Make them larger.
        impPendingBlockMembers.Init(getAllocator(), fgBBNumMax * 2);
        impSpillCliquePredMembers.Init(getAllocator(), fgBBNumMax * 2);
        impSpillCliqueSuccMembers.Init(getAllocator(), fgBBNumMax * 2);
    }
    inlineRoot->impPendingBlockMembers.Reset(fgBBNumMax * 2);
    inlineRoot->impSpillCliquePredMembers.Reset(fgBBNumMax * 2);
    inlineRoot->impSpillCliqueSuccMembers.Reset(fgBBNumMax * 2);
    impBlockListNodeFreeList = nullptr;

#ifdef DEBUG
    impLastILoffsStmt   = nullptr;
    impNestedStackSpill = false;
#endif
    impBoxTemp = BAD_VAR_NUM;

    impPendingList = impPendingFree = nullptr;

    /* Add the entry-point to the worker-list */

    // Skip leading internal blocks. There can be one as a leading scratch BB, and more
    // from EH normalization.
    // NOTE: It might be possible to always just put fgFirstBB on the pending list, and let everything else just fall
    // out.
    for (; method->bbFlags & BBF_INTERNAL; method = method->bbNext)
    {
        // Treat these as imported.
        assert(method->bbJumpKind == BBJ_NONE); // We assume all the leading ones are fallthrough.
        JITDUMP("Marking leading BBF_INTERNAL block BB%02u as BBF_IMPORTED\n", method->bbNum);
        method->bbFlags |= BBF_IMPORTED;
    }

    impImportBlockPending(method);

    /* Import blocks in the worker-list until there are no more */

    while (impPendingList)
    {
        /* Remove the entry at the front of the list */

        PendingDsc* dsc = impPendingList;
        impPendingList  = impPendingList->pdNext;
        impSetPendingBlockMember(dsc->pdBB, 0);

        /* Restore the stack state */

        verCurrentState.thisInitialized = dsc->pdThisPtrInit;
        verCurrentState.esStackDepth    = dsc->pdSavedStack.ssDepth;
        if (verCurrentState.esStackDepth)
        {
            impRestoreStackState(&dsc->pdSavedStack);
        }

        /* Add the entry to the free list for reuse */

        dsc->pdNext    = impPendingFree;
        impPendingFree = dsc;

        /* Now import the block */

        if (dsc->pdBB->bbFlags & BBF_FAILED_VERIFICATION)
        {

#ifdef _TARGET_64BIT_
            // On AMD64, during verification we have to match JIT64 behavior since the VM is very tighly
            // coupled with the JIT64 IL Verification logic.  Look inside verHandleVerificationFailure
            // method for further explanation on why we raise this exception instead of making the jitted
            // code throw the verification exception during execution.
            if (tiVerificationNeeded && (opts.eeFlags & CORJIT_FLG_IMPORT_ONLY) != 0)
            {
                BADCODE("Basic block marked as not verifiable");
            }
            else
#endif // _TARGET_64BIT_
            {
                verConvertBBToThrowVerificationException(dsc->pdBB DEBUGARG(true));
                impEndTreeList(dsc->pdBB);
            }
        }
        else
        {
            impImportBlock(dsc->pdBB);

            if (compDonotInline())
            {
                return;
            }
            if (compIsForImportOnly() && !tiVerificationNeeded)
            {
                return;
            }
        }
    }

#ifdef DEBUG
    if (verbose && info.compXcptnsCount)
    {
        printf("\nAfter impImport() added block for try,catch,finally");
        fgDispBasicBlocks();
        printf("\n");
    }

    // Used in impImportBlockPending() for STRESS_CHK_REIMPORT
    for (BasicBlock* block = fgFirstBB; block; block = block->bbNext)
    {
        block->bbFlags &= ~BBF_VISITED;
    }
#endif

    assert(!compIsForInlining() || !tiVerificationNeeded);
}

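First it initializes the execution stack verCurrentState.esStack: when maxstack does not exceed the capacity of impSmallStack (16 entries) it uses impSmallStack, otherwise it allocates the stack with new.
Then it initializes the members needed to record "spill cliques" (groups of spill temps, that is, temporaries that hold values spilled from the execution stack).
After that it marks the leading internally added (BBF_INTERNAL) BasicBlocks as imported (BBF_IMPORTED), because these blocks have no corresponding IL range.
Next it adds the first non-internal BasicBlock to the pending list impPendingList and keeps processing this list until it is empty.
Each BasicBlock taken from the list is processed by calling impImportBlock(dsc->pdBB).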
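The worklist pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the JIT's implementation: Block, importBlockPending, importAll and demoImportOrder are hypothetical names, blocks are identified by vector index, the pending list is modelled as a FIFO std::deque (the processing order in the real JIT may differ), and "importing" a block is reduced to setting a flag. Re-queuing a block because its entry state changed is left out.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical, simplified stand-in for the JIT's BasicBlock.
struct Block
{
    bool             internal = false; // analogous to BBF_INTERNAL: no IL range
    bool             imported = false; // analogous to BBF_IMPORTED
    bool             pending  = false; // already on the pending list
    std::vector<int> succs;            // indices of successor blocks
};

// Queue a block unless it is already imported or already pending.
void importBlockPending(std::vector<Block>& blocks, std::deque<int>& pending, int index)
{
    Block& b = blocks[index];
    if (!b.imported && !b.pending)
    {
        b.pending = true;
        pending.push_back(index);
    }
}

// Import every block reachable from 'entry' and return the import order.
std::vector<int> importAll(std::vector<Block>& blocks, int entry)
{
    // Leading internal blocks have no IL: mark them imported and skip them.
    while (blocks[entry].internal)
    {
        assert(blocks[entry].succs.size() == 1); // assumed to fall through
        blocks[entry].imported = true;
        entry                  = blocks[entry].succs[0];
    }

    std::vector<int> order;
    std::deque<int>  pending;
    importBlockPending(blocks, pending, entry);

    // Process the pending list until it is empty.
    while (!pending.empty())
    {
        int index = pending.front();
        pending.pop_front();

        Block& b   = blocks[index];
        b.pending  = false;
        b.imported = true; // stands in for the real work of importing the block
        order.push_back(index);

        for (int succ : b.succs)
        {
            importBlockPending(blocks, pending, succ);
        }
    }
    return order;
}

// Block 0 is internal, blocks 1..3 form a small flow graph with a back edge,
// and block 4 is unreachable.
std::vector<int> demoImportOrder()
{
    std::vector<Block> blocks(5);
    blocks[0].internal = true;
    blocks[0].succs    = {1};
    blocks[1].succs    = {2, 3};
    blocks[2].succs    = {3};
    blocks[3].succs    = {1};
    return importAll(blocks, 0);
}
```

In this sketch the unreachable block is never visited, which is consistent with the quoted code filtering out unimported blocks after fgImport when compiling for inlining.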

The source code of impImportBlock is as follows:

//***************************************************************
// Import the instructions for the given basic block.  Perform
// verification, throwing an exception on failure.  Push any successor blocks that are enabled for the first
// time, or whose verification pre-state is changed.

#ifdef _PREFAST_
#pragma warning(push)
#pragma warning(disable : 21000) // Suppress PREFast warning about overly large function
#endif
void Compiler::impImportBlock(BasicBlock* block)
{
    // BBF_INTERNAL blocks only exist during importation due to EH canonicalization. We need to
    // handle them specially. In particular, there is no IL to import for them, but we do need
    // to mark them as imported and put their successors on the pending import list.
    if (block->bbFlags & BBF_INTERNAL)
    {
        JITDUMP("Marking BBF_INTERNAL block BB%02u as BBF_IMPORTED\n", block->bbNum);
        block->bbFlags |= BBF_IMPORTED;

        for (unsigned i = 0; i < block->NumSucc(); i++)
        {
            impImportBlockPending(block->GetSucc(i));
        }

        return;
    }

    bool markImport;

    assert(block);

    /* Make the block globaly available */

    compCurBB = block;

#ifdef DEBUG
    /* Initialize the debug variables */
    impCurOpcName = "unknown";
    impCurOpcOffs = block->bbCodeOffs;
#endif

    /* Set the current stack state to the merged result */
    verResetCurrentState(block, &verCurrentState);

    /* Now walk the code and import the IL into GenTrees */

    struct FilterVerificationExceptionsParam
    {
        Compiler*   pThis;
        BasicBlock* block;
    };
    FilterVerificationExceptionsParam param;

    param.pThis = this;
    param.block = block;

    PAL_TRY(FilterVerificationExceptionsParam*, pParam, &param)
    {
        /* @VERIFICATION : For now, the only state propagation from try
           to it's handler is "thisInit" state (stack is empty at start of try).
           In general, for state that we track in verification, we need to
           model the possibility that an exception might happen at any IL
           instruction, so we really need to merge all states that obtain
           between IL instructions in a try block into the start states of
           all handlers.

           However we do not allow the 'this' pointer to be uninitialized when
           entering most kinds try regions (only try/fault are allowed to have
           an uninitialized this pointer on entry to the try)

           Fortunately, the stack is thrown away when an exception
           leads to a handler, so we don't have to worry about that.
           We DO, however, have to worry about the "thisInit" state.
           But only for the try/fault case.

           The only allowed transition is from TIS_Uninit to TIS_Init.

           So for a try/fault region for the fault handler block
           we will merge the start state of the try begin
           and the post-state of each block that is part of this try region
        */

        // merge the start state of the try begin
        //
        if (pParam->block->bbFlags & BBF_TRY_BEG)
        {
            pParam->pThis->impVerifyEHBlock(pParam->block, true);
        }

        pParam->pThis->impImportBlockCode(pParam->block);

        // As discussed above:
        // merge the post-state of each block that is part of this try region
        //
        if (pParam->block->hasTryIndex())
        {
            pParam->pThis->impVerifyEHBlock(pParam->block, false);
        }
    }
    PAL_EXCEPT_FILTER(FilterVerificationExceptions)
    {
        verHandleVerificationFailure(block DEBUGARG(false));
    }
    PAL_ENDTRY

    if (compDonotInline())
    {
        return;
    }

    assert(!compDonotInline());

    markImport = false;

SPILLSTACK:

    unsigned    baseTmp             = NO_BASE_TMP; // input temps assigned to successor blocks
    bool        reimportSpillClique = false;
    BasicBlock* tgtBlock            = nullptr;

    /* If the stack is non-empty, we might have to spill its contents */

    if (verCurrentState.esStackDepth != 0)
    {
        impBoxTemp = BAD_VAR_NUM; // if a box temp is used in a block that leaves something
                                  // on the stack, its lifetime is hard to determine, simply
                                  // don't reuse such temps.

        GenTreePtr addStmt = nullptr;

        /* Do the successors of 'block' have any other predecessors ?
           We do not want to do some of the optimizations related to multiRef
           if we can reimport blocks */

        unsigned multRef = impCanReimport ? unsigned(~0) : 0;

        switch (block->bbJumpKind)
        {
            case BBJ_COND:

                /* Temporarily remove the 'jtrue' from the end of the tree list */

                assert(impTreeLast);
                assert(impTreeLast->gtOper == GT_STMT);
                assert(impTreeLast->gtStmt.gtStmtExpr->gtOper == GT_JTRUE);

                addStmt     = impTreeLast;
                impTreeLast = impTreeLast->gtPrev;

                /* Note if the next block has more than one ancestor */

                multRef |= block->bbNext->bbRefs;

                /* Does the next block have temps assigned? */

                baseTmp  = block->bbNext->bbStkTempsIn;
                tgtBlock = block->bbNext;

                if (baseTmp != NO_BASE_TMP)
                {
                    break;
                }

                /* Try the target of the jump then */

                multRef |= block->bbJumpDest->bbRefs;
                baseTmp  = block->bbJumpDest->bbStkTempsIn;
                tgtBlock = block->bbJumpDest;
                break;

            case BBJ_ALWAYS:
                multRef |= block->bbJumpDest->bbRefs;
                baseTmp  = block->bbJumpDest->bbStkTempsIn;
                tgtBlock = block->bbJumpDest;
                break;

            case BBJ_NONE:
                multRef |= block->bbNext->bbRefs;
                baseTmp  = block->bbNext->bbStkTempsIn;
                tgtBlock = block->bbNext;
                break;

            case BBJ_SWITCH:

                BasicBlock** jmpTab;
                unsigned     jmpCnt;

                /* Temporarily remove the GT_SWITCH from the end of the tree list */

                assert(impTreeLast);
                assert(impTreeLast->gtOper == GT_STMT);
                assert(impTreeLast->gtStmt.gtStmtExpr->gtOper == GT_SWITCH);

                addStmt     = impTreeLast;
                impTreeLast = impTreeLast->gtPrev;

                jmpCnt = block->bbJumpSwt->bbsCount;
                jmpTab = block->bbJumpSwt->bbsDstTab;

                do
                {
                    tgtBlock = (*jmpTab);

                    multRef |= tgtBlock->bbRefs;

                    // Thanks to spill cliques, we should have assigned all or none
                    assert((baseTmp == NO_BASE_TMP) || (baseTmp == tgtBlock->bbStkTempsIn));
                    baseTmp = tgtBlock->bbStkTempsIn;
                    if (multRef > 1)
                    {
                        break;
                    }
                } while (++jmpTab, --jmpCnt);

                break;

            case BBJ_CALLFINALLY:
            case BBJ_EHCATCHRET:
            case BBJ_RETURN:
            case BBJ_EHFINALLYRET:
            case BBJ_EHFILTERRET:
            case BBJ_THROW:
                NO_WAY("can't have 'unreached' end of BB with non-empty stack");
                break;

            default:
                noway_assert(!"Unexpected bbJumpKind");
                break;
        }

        assert(multRef >= 1);

        /* Do we have a base temp number? */

        bool newTemps = (baseTmp == NO_BASE_TMP);

        if (newTemps)
        {
            /* Grab enough temps for the whole stack */
            baseTmp = impGetSpillTmpBase(block);
        }

        /* Spill all stack entries into temps */
        unsigned level, tempNum;

        JITDUMP("\nSpilling stack entries into temps\n");
        for (level = 0, tempNum = baseTmp; level < verCurrentState.esStackDepth; level++, tempNum++)
        {
            GenTreePtr tree = verCurrentState.esStack[level].val;

            /* VC generates code where it pushes a byref from one branch, and an int (ldc.i4 0) from
               the other. This should merge to a byref in unverifiable code.
               However, if the branch which leaves the TYP_I_IMPL on the stack is imported first, the
               successor would be imported assuming there was a TYP_I_IMPL on
               the stack. Thus the value would not get GC-tracked. Hence,
               change the temp to TYP_BYREF and reimport the successors.
               Note: We should only allow this in unverifiable code.
            */
            if (tree->gtType == TYP_BYREF && lvaTable[tempNum].lvType == TYP_I_IMPL && !verNeedsVerification())
            {
                lvaTable[tempNum].lvType = TYP_BYREF;
                impReimportMarkSuccessors(block);
                markImport = true;
            }

#ifdef _TARGET_64BIT_
            if (genActualType(tree->gtType) == TYP_I_IMPL && lvaTable[tempNum].lvType == TYP_INT)
            {
                if (tiVerificationNeeded && tgtBlock->bbEntryState != nullptr &&
                    (tgtBlock->bbFlags & BBF_FAILED_VERIFICATION) == 0)
                {
                    // Merge the current state into the entry state of block;
                    // the call to verMergeEntryStates must have changed
                    // the entry state of the block by merging the int local var
                    // and the native-int stack entry.
                    bool changed = false;
                    if (verMergeEntryStates(tgtBlock, &changed))
                    {
                        impRetypeEntryStateTemps(tgtBlock);
                        impReimportBlockPending(tgtBlock);
                        assert(changed);
                    }
                    else
                    {
                        tgtBlock->bbFlags |= BBF_FAILED_VERIFICATION;
                        break;
                    }
                }

                // Some other block in the spill clique set this to "int", but now we have "native int".
                // Change the type and go back to re-import any blocks that used the wrong type.
                lvaTable[tempNum].lvType = TYP_I_IMPL;
                reimportSpillClique      = true;
            }
            else if (genActualType(tree->gtType) == TYP_INT && lvaTable[tempNum].lvType == TYP_I_IMPL)
            {
                // Spill clique has decided this should be "native int", but this block only pushes an "int".
                // Insert a sign-extension to "native int" so we match the clique.
                verCurrentState.esStack[level].val = gtNewCastNode(TYP_I_IMPL, tree, TYP_I_IMPL);
            }

            // Consider the case where one branch left a 'byref' on the stack and the other leaves
            // an 'int'. On 32-bit, this is allowed (in non-verifiable code) since they are the same
            // size. JIT64 managed to make this work on 64-bit. For compatibility, we support JIT64
            // behavior instead of asserting and then generating bad code (where we save/restore the
            // low 32 bits of a byref pointer to an 'int' sized local). If the 'int' side has been
            // imported already, we need to change the type of the local and reimport the spill clique.
            // If the 'byref' side has imported, we insert a cast from int to 'native int' to match
            // the 'byref' size.
            if (!tiVerificationNeeded)
            {
                if (genActualType(tree->gtType) == TYP_BYREF && lvaTable[tempNum].lvType == TYP_INT)
                {
                    // Some other block in the spill clique set this to "int", but now we have "byref".
                    // Change the type and go back to re-import any blocks that used the wrong type.
                    lvaTable[tempNum].lvType = TYP_BYREF;
                    reimportSpillClique      = true;
                }
                else if (genActualType(tree->gtType) == TYP_INT && lvaTable[tempNum].lvType == TYP_BYREF)
                {
                    // Spill clique has decided this should be "byref", but this block only pushes an "int".
                    // Insert a sign-extension to "native int" so we match the clique size.
                    verCurrentState.esStack[level].val = gtNewCastNode(TYP_I_IMPL, tree, TYP_I_IMPL);
                }
            }
#endif // _TARGET_64BIT_

#if FEATURE_X87_DOUBLES
            // X87 stack doesn't differentiate between float/double
            // so promoting is no big deal.
            // For everybody else keep it as float until we have a collision and then promote
            // Just like for x64's TYP_INT<->TYP_I_IMPL

            if (multRef > 1 && tree->gtType == TYP_FLOAT)
            {
                verCurrentState.esStack[level].val = gtNewCastNode(TYP_DOUBLE, tree, TYP_DOUBLE);
            }

#else // !FEATURE_X87_DOUBLES

            if (tree->gtType == TYP_DOUBLE && lvaTable[tempNum].lvType == TYP_FLOAT)
            {
                // Some other block in the spill clique set this to "float", but now we have "double".
                // Change the type and go back to re-import any blocks that used the wrong type.
                lvaTable[tempNum].lvType = TYP_DOUBLE;
                reimportSpillClique      = true;
            }
            else if (tree->gtType == TYP_FLOAT && lvaTable[tempNum].lvType == TYP_DOUBLE)
            {
                // Spill clique has decided this should be "double", but this block only pushes a "float".
                // Insert a cast to "double" so we match the clique.
                verCurrentState.esStack[level].val = gtNewCastNode(TYP_DOUBLE, tree, TYP_DOUBLE);
            }

#endif // FEATURE_X87_DOUBLES

            /* If addStmt has a reference to tempNum (can only happen if we
               are spilling to the temps already used by a previous block),
               we need to spill addStmt */

            if (addStmt && !newTemps && gtHasRef(addStmt->gtStmt.gtStmtExpr, tempNum, false))
            {
                GenTreePtr addTree = addStmt->gtStmt.gtStmtExpr;

                if (addTree->gtOper == GT_JTRUE)
                {
                    GenTreePtr relOp = addTree->gtOp.gtOp1;
                    assert(relOp->OperIsCompare());

                    var_types type = genActualType(relOp->gtOp.gtOp1->TypeGet());

                    if (gtHasRef(relOp->gtOp.gtOp1, tempNum, false))
                    {
                        unsigned temp = lvaGrabTemp(true DEBUGARG("spill addStmt JTRUE ref Op1"));
                        impAssignTempGen(temp, relOp->gtOp.gtOp1, level);
                        type              = genActualType(lvaTable[temp].TypeGet());
                        relOp->gtOp.gtOp1 = gtNewLclvNode(temp, type);
                    }

                    if (gtHasRef(relOp->gtOp.gtOp2, tempNum, false))
                    {
                        unsigned temp = lvaGrabTemp(true DEBUGARG("spill addStmt JTRUE ref Op2"));
                        impAssignTempGen(temp, relOp->gtOp.gtOp2, level);
                        type              = genActualType(lvaTable[temp].TypeGet());
                        relOp->gtOp.gtOp2 = gtNewLclvNode(temp, type);
                    }
                }
                else
                {
                    assert(addTree->gtOper == GT_SWITCH && genActualType(addTree->gtOp.gtOp1->gtType) == TYP_I_IMPL);

                    unsigned temp = lvaGrabTemp(true DEBUGARG("spill addStmt SWITCH"));
                    impAssignTempGen(temp, addTree->gtOp.gtOp1, level);
                    addTree->gtOp.gtOp1 = gtNewLclvNode(temp, TYP_I_IMPL);
                }
            }

            /* Spill the stack entry, and replace with the temp */

            if (!impSpillStackEntry(level, tempNum
#ifdef DEBUG
                                    ,
                                    true, "Spill Stack Entry"
#endif
                                    ))
            {
                if (markImport)
                {
                    BADCODE("bad stack state");
                }

                // Oops. Something went wrong when spilling. Bad code.
                verHandleVerificationFailure(block DEBUGARG(true));

                goto SPILLSTACK;
            }
        }

        /* Put back the 'jtrue'/'switch' if we removed it earlier */

        if (addStmt)
        {
            impAppendStmt(addStmt, (unsigned)CHECK_SPILL_NONE);
        }
    }

    // Some of the append/spill logic works on compCurBB

    assert(compCurBB == block);

    /* Save the tree list in the block */
    impEndTreeList(block);

    // impEndTreeList sets BBF_IMPORTED on the block
    // We do *NOT* want to set it later than this because
    // impReimportSpillClique might clear it if this block is both a
    // predecessor and successor in the current spill clique
    assert(block->bbFlags & BBF_IMPORTED);

    // If we had a int/native int, or float/double collision, we need to re-import
    if (reimportSpillClique)
    {
        // This will re-import all the successors of block (as well as each of their predecessors)
        impReimportSpillClique(block);

        // For blocks that haven't been imported yet, we still need to mark them as pending import.
        for (unsigned i = 0; i < block->NumSucc(); i++)
        {
            BasicBlock* succ = block->GetSucc(i);
            if ((succ->bbFlags & BBF_IMPORTED) == 0)
            {
                impImportBlockPending(succ);
            }
        }
    }
    else // the normal case
    {
        // otherwise just import the successors of block

        /* Does this block jump to any other blocks? */
        for (unsigned i = 0; i < block->NumSucc(); i++)
        {
            impImportBlockPending(block->GetSucc(i));
        }
    }
}
#ifdef _PREFAST_
#pragma warning(pop)
#endif

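This function first calls impImportBlockCode, which does the main work of generating GenTrees from the IL.
After a block has been imported, if the execution stack is not empty (the instructions after the jump need values that were pushed before the jump), the values on the execution stack have to be spilled to temporaries.
The starting index of the temporaries spilled at the end of a block is stored in bbStkTempsOut, and the starting index of the temporaries a block has to read at its start is stored in bbStkTempsIn.
Because values on the execution stack rarely cross BasicBlock boundaries (in IL compiled from C#), I will not analyze this logic in detail.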
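For readers who want a rough picture of the spill step anyway, here is a heavily simplified sketch. It assumes a hypothetical Importer that tracks stack entries as strings, and it only propagates the base temp to the direct successors of a block; the real JIT computes whole spill cliques and also reconciles types (int vs. native int, float vs. double), which is omitted here. Block, Importer, spillAtBlockEnd and demoSpill are illustrative names, and NO_BASE_TMP is merely named after the constant in the quoted source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr int NO_BASE_TMP = -1; // hypothetical sentinel value

// Hypothetical, simplified block: only the spill-related fields.
struct Block
{
    int                 stkTempsIn  = NO_BASE_TMP; // first temp read on entry
    int                 stkTempsOut = NO_BASE_TMP; // first temp written on exit
    std::vector<Block*> succs;
};

struct Importer
{
    int                      nextTemp = 0; // next unused temp number
    std::vector<std::string> stack;        // evaluation stack, as expression text
    std::vector<std::string> stmts;        // emitted statements

    // If values are left on the stack when a block ends, store each one
    // into a temp so that successor blocks can read it from there.
    void spillAtBlockEnd(Block& block)
    {
        if (stack.empty())
        {
            return;
        }

        // Reuse the temps a successor already expects, if any.
        int baseTmp = NO_BASE_TMP;
        for (Block* succ : block.succs)
        {
            if (succ->stkTempsIn != NO_BASE_TMP)
            {
                baseTmp = succ->stkTempsIn;
                break;
            }
        }

        // Otherwise grab enough new temps for the whole stack.
        if (baseTmp == NO_BASE_TMP)
        {
            baseTmp = nextTemp;
            nextTemp += static_cast<int>(stack.size());
        }

        block.stkTempsOut = baseTmp;
        for (Block* succ : block.succs)
        {
            succ->stkTempsIn = baseTmp;
        }

        // Spill every stack entry and replace it with a reference to its temp.
        for (std::size_t level = 0; level < stack.size(); level++)
        {
            std::string temp = "tmp" + std::to_string(baseTmp + static_cast<int>(level));
            stmts.push_back(temp + " = " + stack[level]);
            stack[level] = temp;
        }
    }
};

// Two branches each leave one value on the stack and then reach the same
// join block, as a conditional expression might.
std::vector<std::string> demoSpill()
{
    Block join, left, right;
    left.succs  = {&join};
    right.succs = {&join};

    Importer imp;
    imp.stack = {"a + 1"};
    imp.spillAtBlockEnd(left);
    imp.stack = {"b * 2"};
    imp.spillAtBlockEnd(right);

    assert(left.stkTempsOut == right.stkTempsOut);
    assert(join.stkTempsIn == 0);
    return imp.stmts;
}
```

Both branches end up writing the same temp, so the join block can read the value from one place regardless of which predecessor ran.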
Next, let's look at impImportBlockCode.

The source code of impImportBlockCode is as follows.
The function is more than 5000 lines long, so only part of it is excerpted here.

#ifdef _PREFAST_
#pragma warning(push)
#pragma warning(disable : 21000) // Suppress PREFast warning about overly large function
#endif
/*****************************************************************************
 *  Import the instr for the given basic block
 */
void Compiler::impImportBlockCode(BasicBlock* block)
{
#define _impResolveToken(kind) impResolveToken(codeAddr, &resolvedToken, kind)

#ifdef DEBUG

    if (verbose)
    {
        printf("\nImporting BB%02u (PC=%03u) of '%s'", block->bbNum, block->bbCodeOffs, info.compFullName);
    }
#endif

    unsigned  nxtStmtIndex = impInitBlockLineInfo();
    IL_OFFSET nxtStmtOffs;

    GenTreePtr                   arrayNodeFrom, arrayNodeTo, arrayNodeToIndex;
    bool                         expandInline;
    CorInfoHelpFunc              helper;
    CorInfoIsAccessAllowedResult accessAllowedResult;
    CORINFO_HELPER_DESC          calloutHelper;
    const BYTE*                  lastLoadToken = nullptr;

    // reject cyclic constraints
    if (tiVerificationNeeded)
    {
        Verify(!info.hasCircularClassConstraints, "Method parent has circular class type parameter constraints.");
        Verify(!info.hasCircularMethodConstraints, "Method has circular method type parameter constraints.");
    }

    /* Get the tree list started */

    impBeginTreeList();

    /* Walk the opcodes that comprise the basic block */

    const BYTE* codeAddr = info.compCode + block->bbCodeOffs;
    const BYTE* codeEndp = info.compCode + block->bbCodeOffsEnd;

    IL_OFFSET opcodeOffs    = block->bbCodeOffs;
    IL_OFFSET lastSpillOffs = opcodeOffs;

    signed jmpDist;

    /* remember the start of the delegate creation sequence (used for verification) */
    const BYTE* delegateCreateStart = nullptr;

    int  prefixFlags = 0;
    bool explicitTailCall, constraintCall, readonlyCall;

    bool     insertLdloc = false; // set by CEE_DUP and cleared by following store
    typeInfo tiRetVal;

    unsigned numArgs = info.compArgsCount;

    /* Now process all the opcodes in the block */

    var_types callTyp    = TYP_COUNT;
    OPCODE    prevOpcode = CEE_ILLEGAL;

    if (block->bbCatchTyp)
    {
        if (info.compStmtOffsetsImplicit & ICorDebugInfo::CALL_SITE_BOUNDARIES)
        {
            impCurStmtOffsSet(block->bbCodeOffs);
        }

        // We will spill the GT_CATCH_ARG and the input of the BB_QMARK block
        // to a temp. This is a trade off for code simplicity
        impSpillSpecialSideEff();
    }

    while (codeAddr < codeEndp)
    {
        bool                   usingReadyToRunHelper = false;
        CORINFO_RESOLVED_TOKEN resolvedToken;
        CORINFO_RESOLVED_TOKEN constrainedResolvedToken;
        CORINFO_CALL_INFO      callInfo;
        CORINFO_FIELD_INFO     fieldInfo;

        tiRetVal = typeInfo(); // Default type info

        //---------------------------------------------------------------------

        /* We need to restrict the max tree depth as many of the Compiler
           functions are recursive. We do this by spilling the stack */

        if (verCurrentState.esStackDepth)
        {
            /* Has it been a while since we last saw a non-empty stack (which
               guarantees that the tree depth isnt accumulating. */

            if ((opcodeOffs - lastSpillOffs) > 200)
            {
                impSpillStackEnsure();
                lastSpillOffs = opcodeOffs;
            }
        }
        else
        {
            lastSpillOffs   = opcodeOffs;
            impBoxTempInUse = false; // nothing on the stack, box temp OK to use again
        }

        /* Compute the current instr offset */

        opcodeOffs = (IL_OFFSET)(codeAddr - info.compCode);

#if defined(DEBUGGING_SUPPORT) || defined(DEBUG)

#ifndef DEBUG
        if (opts.compDbgInfo)
#endif
        {
            if (!compIsForInlining())
            {
                nxtStmtOffs =
                    (nxtStmtIndex < info.compStmtOffsetsCount) ? info.compStmtOffsets[nxtStmtIndex] : BAD_IL_OFFSET;

                /* Have we reached the next stmt boundary ? */

                if (nxtStmtOffs != BAD_IL_OFFSET && opcodeOffs >= nxtStmtOffs)
                {
                    assert(nxtStmtOffs == info.compStmtOffsets[nxtStmtIndex]);

                    if (verCurrentState.esStackDepth != 0 && opts.compDbgCode)
                    {
                        /* We need to provide accurate IP-mapping at this point.
                           So spill anything on the stack so that it will form
                           gtStmts with the correct stmt offset noted */

                        impSpillStackEnsure(true);
                    }

                    // Has impCurStmtOffs been reported in any tree?

                    if (impCurStmtOffs != BAD_IL_OFFSET && opts.compDbgCode)
                    {
                        GenTreePtr placeHolder = new (this, GT_NO_OP) GenTree(GT_NO_OP, TYP_VOID);
                        impAppendTree(placeHolder, (unsigned)CHECK_SPILL_NONE, impCurStmtOffs);

                        assert(impCurStmtOffs == BAD_IL_OFFSET);
                    }

                    if (impCurStmtOffs == BAD_IL_OFFSET)
                    {
                        /* Make sure that nxtStmtIndex is in sync with opcodeOffs.
                           If opcodeOffs has gone past nxtStmtIndex, catch up */

                        while ((nxtStmtIndex + 1) < info.compStmtOffsetsCount &&
                               info.compStmtOffsets[nxtStmtIndex + 1] <= opcodeOffs)
                        {
                            nxtStmtIndex++;
                        }

                        /* Go to the new stmt */

                        impCurStmtOffsSet(info.compStmtOffsets[nxtStmtIndex]);

                        /* Update the stmt boundary index */

                        nxtStmtIndex++;
                        assert(nxtStmtIndex <= info.compStmtOffsetsCount);

                        /* Are there any more line# entries after this one? */

                        if (nxtStmtIndex < info.compStmtOffsetsCount)
                        {
                            /* Remember where the next line# starts */

                            nxtStmtOffs = info.compStmtOffsets[nxtStmtIndex];
                        }
                        else
                        {
                            /* No more line# entries */

                            nxtStmtOffs = BAD_IL_OFFSET;
                        }
                    }
                }
                else if ((info.compStmtOffsetsImplicit & ICorDebugInfo::STACK_EMPTY_BOUNDARIES) &&
                         (verCurrentState.esStackDepth == 0))
                {
                    /* At stack-empty locations, we have already added the tree to
                       the stmt list with the last offset. We just need to update
                       impCurStmtOffs
                     */

                    impCurStmtOffsSet(opcodeOffs);
                }
                else if ((info.compStmtOffsetsImplicit & ICorDebugInfo::CALL_SITE_BOUNDARIES) &&
                         impOpcodeIsCallSiteBoundary(prevOpcode))
                {
                    /* Make sure we have a type cached */
                    assert(callTyp != TYP_COUNT);

                    if (callTyp == TYP_VOID)
                    {
                        impCurStmtOffsSet(opcodeOffs);
                    }
                    else if (opts.compDbgCode)
                    {
                        impSpillStackEnsure(true);
                        impCurStmtOffsSet(opcodeOffs);
                    }
                }
                else if ((info.compStmtOffsetsImplicit & ICorDebugInfo::NOP_BOUNDARIES) && (prevOpcode == CEE_NOP))
                {
                    if (opts.compDbgCode)
                    {
                        impSpillStackEnsure(true);
                    }

                    impCurStmtOffsSet(opcodeOffs);
                }

                assert(impCurStmtOffs == BAD_IL_OFFSET || nxtStmtOffs == BAD_IL_OFFSET ||
                       jitGetILoffs(impCurStmtOffs) <= nxtStmtOffs);
            }
        }

#endif // defined(DEBUGGING_SUPPORT) || defined(DEBUG)

        CORINFO_CLASS_HANDLE clsHnd       = DUMMY_INIT(NULL);
        CORINFO_CLASS_HANDLE ldelemClsHnd = DUMMY_INIT(NULL);
        CORINFO_CLASS_HANDLE stelemClsHnd = DUMMY_INIT(NULL);

        var_types       lclTyp, ovflType = TYP_UNKNOWN;
        GenTreePtr      op1           = DUMMY_INIT(NULL);
        GenTreePtr      op2           = DUMMY_INIT(NULL);
        GenTreeArgList* args          = nullptr; // What good do these "DUMMY_INIT"s do?
        GenTreePtr      newObjThisPtr = DUMMY_INIT(NULL);
        bool            uns           = DUMMY_INIT(false);

        /* Get the next opcode and the size of its parameters */

        OPCODE opcode = (OPCODE)getU1LittleEndian(codeAddr);
        codeAddr += sizeof(__int8);

#ifdef DEBUG
        impCurOpcOffs = (IL_OFFSET)(codeAddr - info.compCode - 1);
        JITDUMP("\n    [%2u] %3u (0x%03x) ", verCurrentState.esStackDepth, impCurOpcOffs, impCurOpcOffs);
#endif

    DECODE_OPCODE:

        // Return if any previous code has caused inline to fail.
        if (compDonotInline())
        {
            return;
        }

        /* Get the size of additional parameters */

        signed int sz = opcodeSizes[opcode];

#ifdef DEBUG
        clsHnd  = NO_CLASS_HANDLE;
        lclTyp  = TYP_COUNT;
        callTyp = TYP_COUNT;

        impCurOpcOffs = (IL_OFFSET)(codeAddr - info.compCode - 1);
        impCurOpcName = opcodeNames[opcode];

        if (verbose && (opcode != CEE_PREFIX1))
        {
            printf("%s", impCurOpcName);
        }

        /* Use assertImp() to display the opcode */

        op1 = op2 = nullptr;
#endif

        /* See what kind of an opcode we have, then */

        unsigned mflags   = 0;
        unsigned clsFlags = 0;

        switch (opcode)
        {
            unsigned  lclNum;
            var_types type;

            GenTreePtr op3;
            genTreeOps oper;
            unsigned   size;

            int val;

            CORINFO_SIG_INFO     sig;
            unsigned             flags;
            IL_OFFSET            jmpAddr;
            bool                 ovfl, unordered, callNode;
            bool                 ldstruct;
            CORINFO_CLASS_HANDLE tokenType;

            union {
                int     intVal;
                float   fltVal;
                __int64 lngVal;
                double  dblVal;
            } cval;

            case CEE_PREFIX1:
                opcode = (OPCODE)(getU1LittleEndian(codeAddr) + 256);
                codeAddr += sizeof(__int8);
                opcodeOffs = (IL_OFFSET)(codeAddr - info.compCode);
                goto DECODE_OPCODE;

            SPILL_APPEND:

                /* Append 'op1' to the list of statements */
                impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);
                goto DONE_APPEND;

            APPEND:

                /* Append 'op1' to the list of statements */

                impAppendTree(op1, (unsigned)CHECK_SPILL_NONE, impCurStmtOffs);
                goto DONE_APPEND;

            DONE_APPEND:

#ifdef DEBUG
                // Remember at which BC offset the tree was finished
                impNoteLastILoffs();
#endif
                break;

            case CEE_LDNULL:
                impPushNullObjRefOnStack();
                break;

            case CEE_LDC_I4_M1:
            case CEE_LDC_I4_0:
            case CEE_LDC_I4_1:
            case CEE_LDC_I4_2:
            case CEE_LDC_I4_3:
            case CEE_LDC_I4_4:
            case CEE_LDC_I4_5:
            case CEE_LDC_I4_6:
            case CEE_LDC_I4_7:
            case CEE_LDC_I4_8:
                cval.intVal = (opcode - CEE_LDC_I4_0);
                assert(-1 <= cval.intVal && cval.intVal <= 8);
                goto PUSH_I4CON;

            case CEE_LDC_I4_S:
                cval.intVal = getI1LittleEndian(codeAddr);
                goto PUSH_I4CON;
            case CEE_LDC_I4:
                cval.intVal = getI4LittleEndian(codeAddr);
                goto PUSH_I4CON;
            PUSH_I4CON:
                JITDUMP(" %d", cval.intVal);
                impPushOnStack(gtNewIconNode(cval.intVal), typeInfo(TI_INT));
                break;

            case CEE_LDC_I8:
                cval.lngVal = getI8LittleEndian(codeAddr);
                JITDUMP(" 0x%016llx", cval.lngVal);
                impPushOnStack(gtNewLconNode(cval.lngVal), typeInfo(TI_LONG));
                break;

            case CEE_LDC_R8:
                cval.dblVal = getR8LittleEndian(codeAddr);
                JITDUMP(" %#.17g", cval.dblVal);
                impPushOnStack(gtNewDconNode(cval.dblVal), typeInfo(TI_DOUBLE));
                break;

            case CEE_LDC_R4:
                cval.dblVal = getR4LittleEndian(codeAddr);
                JITDUMP(" %#.17g", cval.dblVal);
                {
                    GenTreePtr cnsOp = gtNewDconNode(cval.dblVal);
#if !FEATURE_X87_DOUBLES
                    // X87 stack doesn't differentiate between float/double
                    // so R4 is treated as R8, but everybody else does
                    cnsOp->gtType = TYP_FLOAT;
#endif // FEATURE_X87_DOUBLES
                    impPushOnStack(cnsOp, typeInfo(TI_DOUBLE));
                }
                break;

            case CEE_LDSTR:

                if (compIsForInlining())
                {
                    if (impInlineInfo->inlineCandidateInfo->dwRestrictions & INLINE_NO_CALLEE_LDSTR)
                    {
                        compInlineResult->NoteFatal(InlineObservation::CALLSITE_HAS_LDSTR_RESTRICTION);
                        return;
                    }
                }

                val = getU4LittleEndian(codeAddr);
                JITDUMP(" %08X", val);
                if (tiVerificationNeeded)
                {
                    Verify(info.compCompHnd->isValidStringRef(info.compScopeHnd, val), "bad string");
                    tiRetVal = typeInfo(TI_REF, impGetStringClass());
                }
                impPushOnStack(gtNewSconNode(val, info.compScopeHnd), tiRetVal);

                break;

            case CEE_LDARG:
                lclNum = getU2LittleEndian(codeAddr);
                JITDUMP(" %u", lclNum);
                impLoadArg(lclNum, opcodeOffs + sz + 1);
                break;

            case CEE_LDARG_S:
                lclNum = getU1LittleEndian(codeAddr);
                JITDUMP(" %u", lclNum);
                impLoadArg(lclNum, opcodeOffs + sz + 1);
                break;

            case CEE_LDARG_0:
            case CEE_LDARG_1:
            case CEE_LDARG_2:
            case CEE_LDARG_3:
                lclNum = (opcode - CEE_LDARG_0);
                assert(lclNum >= 0 && lclNum < 4);
                impLoadArg(lclNum, opcodeOffs + sz + 1);
                break;

            case CEE_LDLOC:
                lclNum = getU2LittleEndian(codeAddr);
                JITDUMP(" %u", lclNum);
                impLoadLoc(lclNum, opcodeOffs + sz + 1);
                break;

            case CEE_LDLOC_S:
                lclNum = getU1LittleEndian(codeAddr);
                JITDUMP(" %u", lclNum);
                impLoadLoc(lclNum, opcodeOffs + sz + 1);
                break;

            case CEE_LDLOC_0:
            case CEE_LDLOC_1:
            case CEE_LDLOC_2:
            case CEE_LDLOC_3:
                lclNum = (opcode - CEE_LDLOC_0);
                assert(lclNum >= 0 && lclNum < 4);
                impLoadLoc(lclNum, opcodeOffs + sz + 1);
                break;

            case CEE_STARG:
                lclNum = getU2LittleEndian(codeAddr);
                goto STARG;

            case CEE_STARG_S:
                lclNum = getU1LittleEndian(codeAddr);
            STARG:
                JITDUMP(" %u", lclNum);

                if (tiVerificationNeeded)
                {
                    Verify(lclNum < info.compILargsCount, "bad arg num");
                }

                if (compIsForInlining())
                {
                    op1 = impInlineFetchArg(lclNum, impInlineInfo->inlArgInfo, impInlineInfo->lclVarInfo);
                    noway_assert(op1->gtOper == GT_LCL_VAR);
                    lclNum = op1->AsLclVar()->gtLclNum;

                    goto VAR_ST_VALID;
                }

                lclNum = compMapILargNum(lclNum); // account for possible hidden param
                assertImp(lclNum < numArgs);

                if (lclNum == info.compThisArg)
                {
                    lclNum = lvaArg0Var;
                }
                lvaTable[lclNum].lvArgWrite = 1;

                if (tiVerificationNeeded)
                {
                    typeInfo& tiLclVar = lvaTable[lclNum].lvVerTypeInfo;
                    Verify(tiCompatibleWith(impStackTop().seTypeInfo, NormaliseForStack(tiLclVar), true),
                           "type mismatch");

                    if (verTrackObjCtorInitState && (verCurrentState.thisInitialized != TIS_Init))
                    {
                        Verify(!tiLclVar.IsThisPtr(), "storing to uninit this ptr");
                    }
                }

                goto VAR_ST;

            case CEE_STLOC:
                lclNum = getU2LittleEndian(codeAddr);
                JITDUMP(" %u", lclNum);
                goto LOC_ST;

            case CEE_STLOC_S:
                lclNum = getU1LittleEndian(codeAddr);
                JITDUMP(" %u", lclNum);
                goto LOC_ST;

            case CEE_STLOC_0:
            case CEE_STLOC_1:
            case CEE_STLOC_2:
            case CEE_STLOC_3:
                lclNum = (opcode - CEE_STLOC_0);
                assert(lclNum >= 0 && lclNum < 4);

            LOC_ST:
                if (tiVerificationNeeded)
                {
                    Verify(lclNum < info.compMethodInfo->locals.numArgs, "bad local num");
                    Verify(tiCompatibleWith(impStackTop().seTypeInfo,
                                            NormaliseForStack(lvaTable[lclNum + numArgs].lvVerTypeInfo), true),
                           "type mismatch");
                }

                if (compIsForInlining())
                {
                    lclTyp = impInlineInfo->lclVarInfo[lclNum + impInlineInfo->argCnt].lclTypeInfo;

                    /* Have we allocated a temp for this local? */

                    lclNum = impInlineFetchLocal(lclNum DEBUGARG("Inline stloc first use temp"));

                    goto _PopValue;
                }

                lclNum += numArgs;

            VAR_ST:

                if (lclNum >= info.compLocalsCount && lclNum != lvaArg0Var)
                {
                    assert(!tiVerificationNeeded); // We should have thrown the VerificationException before.
                    BADCODE("Bad IL");
                }

            VAR_ST_VALID:

                /* if it is a struct assignment, make certain we don't overflow the buffer */
                assert(lclTyp != TYP_STRUCT || lvaLclSize(lclNum) >= info.compCompHnd->getClassSize(clsHnd));

                if (lvaTable[lclNum].lvNormalizeOnLoad())
                {
                    lclTyp = lvaGetRealType(lclNum);
                }
                else
                {
                    lclTyp = lvaGetActualType(lclNum);
                }

            _PopValue:
                /* Pop the value being assigned */

                {
                    StackEntry se = impPopStack(clsHnd);
                    op1           = se.val;
                    tiRetVal      = se.seTypeInfo;
                }

#ifdef FEATURE_SIMD
                if (varTypeIsSIMD(lclTyp) && (lclTyp != op1->TypeGet()))
                {
                    assert(op1->TypeGet() == TYP_STRUCT);
                    op1->gtType = lclTyp;
                }
#endif // FEATURE_SIMD

                op1 = impImplicitIorI4Cast(op1, lclTyp);

#ifdef _TARGET_64BIT_
                // Downcast the TYP_I_IMPL into a 32-bit Int for x86 JIT compatiblity
                if (varTypeIsI(op1->TypeGet()) && (genActualType(lclTyp) == TYP_INT))
                {
                    assert(!tiVerificationNeeded); // We should have thrown the VerificationException before.
                    op1 = gtNewCastNode(TYP_INT, op1, TYP_INT);
                }
#endif // _TARGET_64BIT_

                // We had better assign it a value of the correct type
                assertImp(
                    genActualType(lclTyp) == genActualType(op1->gtType) ||
                    genActualType(lclTyp) == TYP_I_IMPL && op1->IsVarAddr() ||
                    (genActualType(lclTyp) == TYP_I_IMPL && (op1->gtType == TYP_BYREF || op1->gtType == TYP_REF)) ||
                    (genActualType(op1->gtType) == TYP_I_IMPL && lclTyp == TYP_BYREF) ||
                    (varTypeIsFloating(lclTyp) && varTypeIsFloating(op1->TypeGet())) ||
                    ((genActualType(lclTyp) == TYP_BYREF) && genActualType(op1->TypeGet()) == TYP_REF));

                /* If op1 is "&var" then its type is the transient "*" and it can
                   be used either as TYP_BYREF or TYP_I_IMPL */

                if (op1->IsVarAddr())
                {
                    assertImp(genActualType(lclTyp) == TYP_I_IMPL || lclTyp == TYP_BYREF);

                    /* When "&var" is created, we assume it is a byref. If it is
                       being assigned to a TYP_I_IMPL var, change the type to
                       prevent unnecessary GC info */

                    if (genActualType(lclTyp) == TYP_I_IMPL)
                    {
                        op1->gtType = TYP_I_IMPL;
                    }
                }

                /* Filter out simple assignments to itself */

                if (op1->gtOper == GT_LCL_VAR && lclNum == op1->gtLclVarCommon.gtLclNum)
                {
                    if (insertLdloc)
                    {
                        // This is a sequence of (ldloc, dup, stloc).  Can simplify
                        // to (ldloc, stloc).  Goto LDVAR to reconstruct the ldloc node.
                        CLANG_FORMAT_COMMENT_ANCHOR;

#ifdef DEBUG
                        if (tiVerificationNeeded)
                        {
                            assert(
                                typeInfo::AreEquivalent(tiRetVal, NormaliseForStack(lvaTable[lclNum].lvVerTypeInfo)));
                        }
#endif

                        op1         = nullptr;
                        insertLdloc = false;

                        impLoadVar(lclNum, opcodeOffs + sz + 1);
                        break;
                    }
                    else if (opts.compDbgCode)
                    {
                        op1 = gtNewNothingNode();
                        goto SPILL_APPEND;
                    }
                    else
                    {
                        break;
                    }
                }

                /* Create the assignment node */

                op2 = gtNewLclvNode(lclNum, lclTyp, opcodeOffs + sz + 1);

                /* If the local is aliased, we need to spill calls and
                   indirections from the stack. */

                if ((lvaTable[lclNum].lvAddrExposed || lvaTable[lclNum].lvHasLdAddrOp) &&
                    verCurrentState.esStackDepth > 0)
                {
                    impSpillSideEffects(false, (unsigned)CHECK_SPILL_ALL DEBUGARG("Local could be aliased"));
                }

                /* Spill any refs to the local from the stack */

                impSpillLclRefs(lclNum);

#if !FEATURE_X87_DOUBLES
                // We can generate an assignment to a TYP_FLOAT from a TYP_DOUBLE
                // We insert a cast to the dest 'op2' type
                //
                if ((op1->TypeGet() != op2->TypeGet()) && varTypeIsFloating(op1->gtType) &&
                    varTypeIsFloating(op2->gtType))
                {
                    op1 = gtNewCastNode(op2->TypeGet(), op1, op2->TypeGet());
                }
#endif // !FEATURE_X87_DOUBLES

                if (varTypeIsStruct(lclTyp))
                {
                    op1 = impAssignStruct(op2, op1, clsHnd, (unsigned)CHECK_SPILL_ALL);
                }
                else
                {
                    // The code generator generates GC tracking information
                    // based on the RHS of the assignment.  Later the LHS (which is
                    // is a BYREF) gets used and the emitter checks that that variable
                    // is being tracked.  It is not (since the RHS was an int and did
                    // not need tracking).  To keep this assert happy, we change the RHS
                    if (lclTyp == TYP_BYREF && !varTypeIsGC(op1->gtType))
                    {
                        op1->gtType = TYP_BYREF;
                    }
                    op1 = gtNewAssignNode(op2, op1);
                }

                /* If insertLdloc is true, then we need to insert a ldloc following the
                   stloc.  This is done when converting a (dup, stloc) sequence into
                   a (stloc, ldloc) sequence. */

                if (insertLdloc)
                {
                    // From SPILL_APPEND
                    impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);

#ifdef DEBUG
                    // From DONE_APPEND
                    impNoteLastILoffs();
#endif
                    op1         = nullptr;
                    insertLdloc = false;

                    impLoadVar(lclNum, opcodeOffs + sz + 1, tiRetVal);
                    break;
                }

                goto SPILL_APPEND;

            // many cases omitted...

            case CEE_NOP:
                if (opts.compDbgCode)
                {
                    op1 = new (this, GT_NO_OP) GenTree(GT_NO_OP, TYP_VOID);
                    goto SPILL_APPEND;
                }
                break;

            /******************************** NYI *******************************/

            case 0xCC:
                OutputDebugStringA("CLR: Invalid x86 breakpoint in IL stream\n");

            case CEE_ILLEGAL:
            case CEE_MACRO_END:

            default:
                BADCODE3("unknown opcode", ": %02X", (int)opcode);
        }

        codeAddr += sz;
        prevOpcode = opcode;

        prefixFlags = 0;
        assert(!insertLdloc || opcode == CEE_DUP);
    }

    assert(!insertLdloc);

    return;
#undef _impResolveToken
}
#ifdef _PREFAST_
#pragma warning(pop)
#endif

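First, codeAddr and codeEndp are the start and end addresses of the IL that belongs to the block, and opcode is the byte at the current address.
Take ldloc.0 as an example: its encoding is 06, and 06 is the opcode CEE_LDLOC_0.
Take ldc.i4.s 100 as an example: its encoding is 1f 64, where 1f is the opcode CEE_LDC_I4_S and 64 is the operand, i.e. 100 in hexadecimal.
The function uses a loop to parse every IL instruction within the current block's IL range. Since there are a great many IL instructions, I can only pick a few typical ones to explain.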
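
To make the byte-level decoding concrete, here is a minimal sketch of such a loop. It is not the importer's code: decodeIL and the OP_* names are hypothetical, and only the three opcodes mentioned in this article (06, 0a, 1f) are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily simplified decoder. Only the opcode bytes quoted in
// the article are known here: ldloc.0 = 06, stloc.0 = 0a, ldc.i4.s = 1f.
enum : std::uint8_t { OP_LDLOC_0 = 0x06, OP_STLOC_0 = 0x0A, OP_LDC_I4_S = 0x1F };

std::vector<std::string> decodeIL(const std::vector<std::uint8_t>& il)
{
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < il.size())
    {
        // Read the 1-byte opcode, then advance past it.
        std::uint8_t opcode = il[pos++];
        switch (opcode)
        {
            case OP_LDLOC_0:
                out.push_back("ldloc.0");
                break;
            case OP_STLOC_0:
                out.push_back("stloc.0");
                break;
            case OP_LDC_I4_S:
            {
                // The operand is one signed byte that follows the opcode.
                assert(pos < il.size());
                int value = static_cast<std::int8_t>(il[pos]);
                pos += 1;
                out.push_back("ldc.i4.s " + std::to_string(value));
                break;
            }
            default:
                out.push_back("unknown");
                break;
        }
    }
    return out;
}
```

Feeding it the bytes 1f 64 0a 06 yields three entries: ldc.i4.s 100, stloc.0 and ldloc.0. The real importer does the same kind of walk, but looks up the operand size in the opcodeSizes table and builds GenTree nodes instead of strings.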

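The IL instruction ldc.i4.s pushes an int constant onto the evaluation stack; the constant must fit in a signed 1-byte value. The code that parses it is as follows: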

case CEE_LDC_I4_S:
    cval.intVal = getI1LittleEndian(codeAddr);
    goto PUSH_I4CON;
case CEE_LDC_I4:
    cval.intVal = getI4LittleEndian(codeAddr);
    goto PUSH_I4CON;
PUSH_I4CON:
    JITDUMP(" %d", cval.intVal);
    impPushOnStack(gtNewIconNode(cval.intVal), typeInfo(TI_INT));
    break;

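As we can see, it reads the 1 byte that follows the opcode (the variant without s reads 4 bytes) and then calls impPushOnStack(gtNewIconNode(cval.intVal), typeInfo(TI_INT)).
The gtNewIconNode function (Icon is short for int constant) creates a GenTree node whose operator is GT_CNS_INT, i.e. a node representing an int constant.
Once the node has been created it is pushed onto the evaluation stack. The source code of impPushOnStack is as follows: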

/*****************************************************************************
 *
 *  Pushes the given tree on the stack.
 */

void Compiler::impPushOnStack(GenTreePtr tree, typeInfo ti)
{
    /* Check for overflow. If inlining, we may be using a bigger stack */

    if ((verCurrentState.esStackDepth >= info.compMaxStack) &&
        (verCurrentState.esStackDepth >= impStkSize || ((compCurBB->bbFlags & BBF_IMPORTED) == 0)))
    {
        BADCODE("stack overflow");
    }

#ifdef DEBUG
    // If we are pushing a struct, make certain we know the precise type!
    if (tree->TypeGet() == TYP_STRUCT)
    {
        assert(ti.IsType(TI_STRUCT));
        CORINFO_CLASS_HANDLE clsHnd = ti.GetClassHandle();
        assert(clsHnd != NO_CLASS_HANDLE);
    }

    if (tiVerificationNeeded && !ti.IsDead())
    {
        assert(typeInfo::AreEquivalent(NormaliseForStack(ti), ti)); // types are normalized

        // The ti type is consistent with the tree type.
        //

        // On 64-bit systems, nodes whose "proper" type is "native int" get labeled TYP_LONG.
        // In the verification type system, we always transform "native int" to "TI_INT".
        // Ideally, we would keep track of which nodes labeled "TYP_LONG" are really "native int", but
        // attempts to do that have proved too difficult.  Instead, we'll assume that in checks like this,
        // when there's a mismatch, it's because of this reason -- the typeInfo::AreEquivalentModuloNativeInt
        // method used in the last disjunct allows exactly this mismatch.
        assert(ti.IsDead() || ti.IsByRef() && (tree->TypeGet() == TYP_I_IMPL || tree->TypeGet() == TYP_BYREF) ||
               ti.IsUnboxedGenericTypeVar() && tree->TypeGet() == TYP_REF ||
               ti.IsObjRef() && tree->TypeGet() == TYP_REF || ti.IsMethod() && tree->TypeGet() == TYP_I_IMPL ||
               ti.IsType(TI_STRUCT) && tree->TypeGet() != TYP_REF ||
               typeInfo::AreEquivalentModuloNativeInt(NormaliseForStack(ti),
                                                      NormaliseForStack(typeInfo(tree->TypeGet()))));

        // If it is a struct type, make certain we normalized the primitive types
        assert(!ti.IsType(TI_STRUCT) ||
               info.compCompHnd->getTypeForPrimitiveValueClass(ti.GetClassHandle()) == CORINFO_TYPE_UNDEF);
    }

#if VERBOSE_VERIFY
    if (VERBOSE && tiVerificationNeeded)
    {
        printf("\n");
        printf(TI_DUMP_PADDING);
        printf("About to push to stack: ");
        ti.Dump();
    }
#endif // VERBOSE_VERIFY

#endif // DEBUG

    verCurrentState.esStack[verCurrentState.esStackDepth].seTypeInfo = ti;
    verCurrentState.esStack[verCurrentState.esStackDepth++].val      = tree;

    if ((tree->gtType == TYP_LONG) && (compLongUsed == false))
    {
        compLongUsed = true;
    }
    else if (((tree->gtType == TYP_FLOAT) || (tree->gtType == TYP_DOUBLE)) && (compFloatingPointUsed == false))
    {
        compFloatingPointUsed = true;
    }
}

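impPushOnStack adds the GenTree node to the evaluation stack verCurrentState.esStack; each entry holds the type information together with the node, here the GT_CNS_INT node that was just created.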
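
The idea of an evaluation stack that stores nodes rather than runtime values can be sketched as follows. This is a simplified model under my own assumptions: EvalStack, Node, Entry and the exception-based error handling are hypothetical stand-ins, not CoreCLR types.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for GenTree and StackEntry; the real types carry
// far more state (flags, class handles, verification info, ...).
enum class NodeKind { ConstInt };
enum class StackType { Int };

struct Node
{
    NodeKind kind;
    int      value;
};

struct Entry
{
    Node      node; // plays the role of StackEntry::val
    StackType type; // plays the role of StackEntry::seTypeInfo
};

class EvalStack
{
public:
    explicit EvalStack(std::size_t maxDepth) : m_maxDepth(maxDepth) {}

    // Push a node with its stack type; reject pushes beyond the declared
    // maximum depth (the real code reports BADCODE("stack overflow")).
    void push(Node node, StackType type)
    {
        if (m_entries.size() >= m_maxDepth)
        {
            throw std::runtime_error("stack overflow");
        }
        m_entries.push_back(Entry{node, type});
    }

    // Pop the most recently pushed entry.
    Entry pop()
    {
        if (m_entries.empty())
        {
            throw std::runtime_error("stack underflow");
        }
        Entry top = m_entries.back();
        m_entries.pop_back();
        return top;
    }

    std::size_t depth() const { return m_entries.size(); }

private:
    std::size_t        m_maxDepth;
    std::vector<Entry> m_entries;
};
```

With a stack of maximum depth 1, pushing the constant 100 succeeds, a second push throws, and pop returns the entry whose node value is 100. Nothing is evaluated at this point: the stack only records which tree will produce each value.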

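Suppose the instruction following ldc.i4.s 100 is stloc.0, which assigns 100 to local variable 0. The stloc.0 instruction then needs to consume the value that was pushed earlier.
Let's look at how CEE_STLOC_0 is handled: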

case CEE_STLOC_0:
case CEE_STLOC_1:
case CEE_STLOC_2:
case CEE_STLOC_3:
    lclNum = (opcode - CEE_STLOC_0);
    assert(lclNum >= 0 && lclNum < 4);

LOC_ST:
    if (tiVerificationNeeded)
    {
        Verify(lclNum < info.compMethodInfo->locals.numArgs, "bad local num");
        Verify(tiCompatibleWith(impStackTop().seTypeInfo,
                                NormaliseForStack(lvaTable[lclNum + numArgs].lvVerTypeInfo), true),
               "type mismatch");
    }

    if (compIsForInlining())
    {
        lclTyp = impInlineInfo->lclVarInfo[lclNum + impInlineInfo->argCnt].lclTypeInfo;

        /* Have we allocated a temp for this local? */

        lclNum = impInlineFetchLocal(lclNum DEBUGARG("Inline stloc first use temp"));

        goto _PopValue;
    }

    lclNum += numArgs;

VAR_ST:

    if (lclNum >= info.compLocalsCount && lclNum != lvaArg0Var)
    {
        assert(!tiVerificationNeeded); // We should have thrown the VerificationException before.
        BADCODE("Bad IL");
    }

VAR_ST_VALID:

    /* if it is a struct assignment, make certain we don't overflow the buffer */
    assert(lclTyp != TYP_STRUCT || lvaLclSize(lclNum) >= info.compCompHnd->getClassSize(clsHnd));

    if (lvaTable[lclNum].lvNormalizeOnLoad())
    {
        lclTyp = lvaGetRealType(lclNum);
    }
    else
    {
        lclTyp = lvaGetActualType(lclNum);
    }

_PopValue:
    /* Pop the value being assigned */

    {
        StackEntry se = impPopStack(clsHnd);
        op1           = se.val;
        tiRetVal      = se.seTypeInfo;
    }

#ifdef FEATURE_SIMD
    if (varTypeIsSIMD(lclTyp) && (lclTyp != op1->TypeGet()))
    {
        assert(op1->TypeGet() == TYP_STRUCT);
        op1->gtType = lclTyp;
    }
#endif // FEATURE_SIMD

    op1 = impImplicitIorI4Cast(op1, lclTyp);

#ifdef _TARGET_64BIT_
    // Downcast the TYP_I_IMPL into a 32-bit Int for x86 JIT compatiblity
    if (varTypeIsI(op1->TypeGet()) && (genActualType(lclTyp) == TYP_INT))
    {
        assert(!tiVerificationNeeded); // We should have thrown the VerificationException before.
        op1 = gtNewCastNode(TYP_INT, op1, TYP_INT);
    }
#endif // _TARGET_64BIT_

    // We had better assign it a value of the correct type
    assertImp(
        genActualType(lclTyp) == genActualType(op1->gtType) ||
        genActualType(lclTyp) == TYP_I_IMPL && op1->IsVarAddr() ||
        (genActualType(lclTyp) == TYP_I_IMPL && (op1->gtType == TYP_BYREF || op1->gtType == TYP_REF)) ||
        (genActualType(op1->gtType) == TYP_I_IMPL && lclTyp == TYP_BYREF) ||
        (varTypeIsFloating(lclTyp) && varTypeIsFloating(op1->TypeGet())) ||
        ((genActualType(lclTyp) == TYP_BYREF) && genActualType(op1->TypeGet()) == TYP_REF));

    /* If op1 is "&var" then its type is the transient "*" and it can
       be used either as TYP_BYREF or TYP_I_IMPL */

    if (op1->IsVarAddr())
    {
        assertImp(genActualType(lclTyp) == TYP_I_IMPL || lclTyp == TYP_BYREF);

        /* When "&var" is created, we assume it is a byref. If it is
           being assigned to a TYP_I_IMPL var, change the type to
           prevent unnecessary GC info */

        if (genActualType(lclTyp) == TYP_I_IMPL)
        {
            op1->gtType = TYP_I_IMPL;
        }
    }

    /* Filter out simple assignments to itself */

    if (op1->gtOper == GT_LCL_VAR && lclNum == op1->gtLclVarCommon.gtLclNum)
    {
        if (insertLdloc)
        {
            // This is a sequence of (ldloc, dup, stloc).  Can simplify
            // to (ldloc, stloc).  Goto LDVAR to reconstruct the ldloc node.
            CLANG_FORMAT_COMMENT_ANCHOR;

#ifdef DEBUG
            if (tiVerificationNeeded)
            {
                assert(
                    typeInfo::AreEquivalent(tiRetVal, NormaliseForStack(lvaTable[lclNum].lvVerTypeInfo)));
            }
#endif

            op1         = nullptr;
            insertLdloc = false;

            impLoadVar(lclNum, opcodeOffs + sz + 1);
            break;
        }
        else if (opts.compDbgCode)
        {
            op1 = gtNewNothingNode();
            goto SPILL_APPEND;
        }
        else
        {
            break;
        }
    }

    /* Create the assignment node */

    op2 = gtNewLclvNode(lclNum, lclTyp, opcodeOffs + sz + 1);

    /* If the local is aliased, we need to spill calls and
       indirections from the stack. */

    if ((lvaTable[lclNum].lvAddrExposed || lvaTable[lclNum].lvHasLdAddrOp) &&
        verCurrentState.esStackDepth > 0)
    {
        impSpillSideEffects(false, (unsigned)CHECK_SPILL_ALL DEBUGARG("Local could be aliased"));
    }

    /* Spill any refs to the local from the stack */

    impSpillLclRefs(lclNum);

#if !FEATURE_X87_DOUBLES
    // We can generate an assignment to a TYP_FLOAT from a TYP_DOUBLE
    // We insert a cast to the dest 'op2' type
    //
    if ((op1->TypeGet() != op2->TypeGet()) && varTypeIsFloating(op1->gtType) &&
        varTypeIsFloating(op2->gtType))
    {
        op1 = gtNewCastNode(op2->TypeGet(), op1, op2->TypeGet());
    }
#endif // !FEATURE_X87_DOUBLES

    if (varTypeIsStruct(lclTyp))
    {
        op1 = impAssignStruct(op2, op1, clsHnd, (unsigned)CHECK_SPILL_ALL);
    }
    else
    {
        // The code generator generates GC tracking information
        // based on the RHS of the assignment.  Later the LHS (which is
        // is a BYREF) gets used and the emitter checks that that variable
        // is being tracked.  It is not (since the RHS was an int and did
        // not need tracking).  To keep this assert happy, we change the RHS
        if (lclTyp == TYP_BYREF && !varTypeIsGC(op1->gtType))
        {
            op1->gtType = TYP_BYREF;
        }
        op1 = gtNewAssignNode(op2, op1);
    }

    /* If insertLdloc is true, then we need to insert a ldloc following the
       stloc.  This is done when converting a (dup, stloc) sequence into
       a (stloc, ldloc) sequence. */

    if (insertLdloc)
    {
        // From SPILL_APPEND
        impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);

#ifdef DEBUG
        // From DONE_APPEND
        impNoteLastILoffs();
#endif
        op1         = nullptr;
        insertLdloc = false;

        impLoadVar(lclNum, opcodeOffs + sz + 1, tiRetVal);
        break;
    }

    goto SPILL_APPEND;

SPILL_APPEND:

    /* Append 'op1' to the list of statements */
    impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);
    goto DONE_APPEND;

DONE_APPEND:

#ifdef DEBUG
    // Remember at which BC offset the tree was finished
    impNoteLastILoffs();
#endif
    break;

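The code that handles CEE_STLOC_0 is rather long, so please read it patiently:
First, the variants 0 through 3 share one handler: stloc.0 is 0a, stloc.1 is 0b, stloc.2 is 0c, and stloc.3 is 0d.
Once the number of the local variable to store to is known, we still need its index in the local variable table lvaTable. Because the arguments are stored at the beginning of the table, the index is computed with lclNum += numArgs.
Next, the value to assign is popped from the evaluation stack with impPopStack.
Then an assignment (GT_ASG) node is created. The assignment node has two operands: the first is lclVar 0 and the second is const 100 (the types match, so no cast is needed), as follows: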

/--*  const     int    100
\--*  =         int
   \--*  lclVar    int    V01
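
The index mapping and the shape of the assignment tree can be sketched as below. Tree, localTableIndex, makeAssign and dump are hypothetical helpers written for this illustration only. The sketch assumes a method with one argument, which is why IL local 0 shows up as V01.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature of a GenTree: just enough to model "V01 = 100".
enum class Oper { ConstInt, LclVar, Asg };

struct Tree
{
    Oper                  oper  = Oper::ConstInt;
    int                   value = 0; // constant value, or variable-table index for LclVar
    std::unique_ptr<Tree> op1;       // destination of an assignment
    std::unique_ptr<Tree> op2;       // source of an assignment
};

// Arguments occupy the first slots of the variable table, so an IL local
// number is shifted by the argument count (lclNum += numArgs in the importer).
unsigned localTableIndex(unsigned ilLocalNum, unsigned numArgs)
{
    return ilLocalNum + numArgs;
}

std::unique_ptr<Tree> makeConst(int v)
{
    auto t   = std::make_unique<Tree>();
    t->oper  = Oper::ConstInt;
    t->value = v;
    return t;
}

std::unique_ptr<Tree> makeLclVar(unsigned tableIndex)
{
    auto t   = std::make_unique<Tree>();
    t->oper  = Oper::LclVar;
    t->value = static_cast<int>(tableIndex);
    return t;
}

std::unique_ptr<Tree> makeAssign(std::unique_ptr<Tree> dst, std::unique_ptr<Tree> src)
{
    auto t  = std::make_unique<Tree>();
    t->oper = Oper::Asg;
    t->op1  = std::move(dst);
    t->op2  = std::move(src);
    return t;
}

// Render the tree as text, using the Vnn naming seen in the JitDump output.
std::string dump(const Tree& t)
{
    switch (t.oper)
    {
        case Oper::ConstInt:
            return std::to_string(t.value);
        case Oper::LclVar:
        {
            std::string num = std::to_string(t.value);
            return "V" + (num.size() < 2 ? "0" + num : num);
        }
        case Oper::Asg:
            return dump(*t.op1) + " = " + dump(*t.op2);
    }
    return "";
}
```

Calling makeAssign(makeLclVar(localTableIndex(0, 1)), makeConst(100)) and passing the result to dump gives the text V01 = 100, which matches the tree printed above.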

We have now built a GenTree tree. This tree is a standalone statement, so we can add it to the BasicBlock.
The code that adds it to the BasicBlock is impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs):

/*****************************************************************************
 *
 *  Append the given expression tree to the current block's tree list.
 *  Return the newly created statement.
 */

GenTreePtr Compiler::impAppendTree(GenTreePtr tree, unsigned chkLevel, IL_OFFSETX offset)
{
    assert(tree);

    /* Allocate an 'expression statement' node */

    GenTreePtr expr = gtNewStmt(tree, offset);

    /* Append the statement to the current block's stmt list */

    impAppendStmt(expr, chkLevel);

    return expr;
}

/*****************************************************************************
 *
 *  Append the given GT_STMT node to the current block's tree list.
 *  [0..chkLevel) is the portion of the stack which we will check for
 *    interference with stmt and spill if needed.
 */

inline void Compiler::impAppendStmt(GenTreePtr stmt, unsigned chkLevel)
{
    assert(stmt->gtOper == GT_STMT);
    noway_assert(impTreeLast != nullptr);

    /* If the statement being appended has any side-effects, check the stack
       to see if anything needs to be spilled to preserve correct ordering. */

    GenTreePtr expr  = stmt->gtStmt.gtStmtExpr;
    unsigned   flags = expr->gtFlags & GTF_GLOB_EFFECT;

    // Assignment to (unaliased) locals don't count as a side-effect as
    // we handle them specially using impSpillLclRefs(). Temp locals should
    // be fine too.
    // TODO-1stClassStructs: The check below should apply equally to struct assignments,
    // but previously the block ops were always being marked GTF_GLOB_REF, even if
    // the operands could not be global refs.

    if ((expr->gtOper == GT_ASG) && (expr->gtOp.gtOp1->gtOper == GT_LCL_VAR) &&
        !(expr->gtOp.gtOp1->gtFlags & GTF_GLOB_REF) && !gtHasLocalsWithAddrOp(expr->gtOp.gtOp2) &&
        !varTypeIsStruct(expr->gtOp.gtOp1))
    {
        unsigned op2Flags = expr->gtOp.gtOp2->gtFlags & GTF_GLOB_EFFECT;
        assert(flags == (op2Flags | GTF_ASG));
        flags = op2Flags;
    }

    if (chkLevel == (unsigned)CHECK_SPILL_ALL)
    {
        chkLevel = verCurrentState.esStackDepth;
    }

    if (chkLevel && chkLevel != (unsigned)CHECK_SPILL_NONE)
    {
        assert(chkLevel <= verCurrentState.esStackDepth);

        if (flags)
        {
            // If there is a call, we have to spill global refs
            bool spillGlobEffects = (flags & GTF_CALL) ? true : false;

            if (expr->gtOper == GT_ASG)
            {
                GenTree* lhs = expr->gtGetOp1();
                // If we are assigning to a global ref, we have to spill global refs on stack.
                // TODO-1stClassStructs: Previously, spillGlobEffects was set to true for
                // GT_INITBLK and GT_COPYBLK, but this is overly conservative, and should be
                // revisited. (Note that it was NOT set to true for GT_COPYOBJ.)
                if (!expr->OperIsBlkOp())
                {
                    // If we are assigning to a global ref, we have to spill global refs on stack
                    if ((lhs->gtFlags & GTF_GLOB_REF) != 0)
                    {
                        spillGlobEffects = true;
                    }
                }
                else if ((lhs->OperIsBlk() && !lhs->AsBlk()->HasGCPtr()) ||
                         ((lhs->OperGet() == GT_LCL_VAR) &&
                          (lvaTable[lhs->AsLclVarCommon()->gtLclNum].lvStructGcCount == 0)))
                {
                    spillGlobEffects = true;
                }
            }

            impSpillSideEffects(spillGlobEffects, chkLevel DEBUGARG("impAppendStmt"));
        }
        else
        {
            impSpillSpecialSideEff();
        }
    }

    impAppendStmtCheck(stmt, chkLevel);

    /* Point 'prev' at the previous node, so that we can walk backwards */

    stmt->gtPrev = impTreeLast;

    /* Append the expression statement to the list */

    impTreeLast->gtNext = stmt;
    impTreeLast         = stmt;

#ifdef FEATURE_SIMD
    impMarkContiguousSIMDFieldAssignments(stmt);
#endif

#ifdef DEBUGGING_SUPPORT

    /* Once we set impCurStmtOffs in an appended tree, we are ready to
       report the following offsets. So reset impCurStmtOffs */

    if (impTreeLast->gtStmt.gtStmtILoffsx == impCurStmtOffs)
    {
        impCurStmtOffsSet(BAD_IL_OFFSET);
    }

#endif

#ifdef DEBUG
    if (impLastILoffsStmt == nullptr)
    {
        impLastILoffsStmt = stmt;
    }

    if (verbose)
    {
        printf("\n\n");
        gtDispTree(stmt);
    }
#endif
}

This code appends a GT_STMT node to the current list that ends at impTreeLast; that list is later assigned to block->bbTreeList in impEndTreeList.
The GT_STMT node looks like this:

*  stmtExpr  void
|  /--*  const     int    100
\--*  =         int
   \--*  lclVar    int    V01

As you can see, the original assignment node GT_ASG has been placed under GT_STMT.
Microsoft provides a structure diagram of Compiler, BasicBlock and GenTree (HIR version):

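These are parsing examples for the two simplest instructions, ldc.i4.s and stloc.0; if you are interested, you can analyze more kinds of instructions yourself.
We now know that in the JIT the evaluation stack is used to link instructions together so that they form a GenTree tree; the code that is actually generated has no notion of an evaluation stack.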
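The role of the evaluation stack can be illustrated with a small sketch. Everything in it (Node, ToyImporter, dump) is a hypothetical stand-in invented for illustration, not the JIT's GenTree or importer API, and the local numbering is simplified (the dump above shows V01, the toy simply uses the IL index).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for GenTree: an operator plus up to two operands.
struct Node {
    std::string op;                 // "const", "lclVar", "=" or "stmtExpr"
    int value = 0;                  // constant value or local variable index
    std::unique_ptr<Node> op1, op2;
};

struct ToyImporter {
    std::vector<std::unique_ptr<Node>> evalStack; // exists only while importing
    std::vector<std::unique_ptr<Node>> stmts;     // statement list of the block

    // ldc.i4.s <value>: push a constant node on the evaluation stack.
    void ldc(int value) {
        auto n = std::make_unique<Node>();
        n->op = "const";
        n->value = value;
        evalStack.push_back(std::move(n));
    }

    // stloc.<index>: pop the value, build "lclVar = value", wrap it in a statement.
    void stloc(int index) {
        assert(!evalStack.empty());
        auto value = std::move(evalStack.back());
        evalStack.pop_back();

        auto dst = std::make_unique<Node>();
        dst->op = "lclVar";
        dst->value = index;

        auto asg = std::make_unique<Node>();
        asg->op = "=";
        asg->op1 = std::move(dst);
        asg->op2 = std::move(value);

        auto stmt = std::make_unique<Node>();
        stmt->op = "stmtExpr";
        stmt->op1 = std::move(asg);
        stmts.push_back(std::move(stmt));
    }
};

// Renders a tree as a compact string such as "=(lclVar 0, const 100)".
std::string dump(const Node& n) {
    if (n.op == "const" || n.op == "lclVar") return n.op + " " + std::to_string(n.value);
    std::string s = n.op + "(" + dump(*n.op1);
    if (n.op2) s += ", " + dump(*n.op2);
    return s + ")";
}
```

After ldc(100) followed by stloc(0) the toy stack is empty again and the block holds one statement whose tree has the same shape as the dump shown earlier: the stack only existed to connect the two instructions.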

After the current block has been processed, its successor blocks (successors) are added to the impPendingList queue:

for (unsigned i = 0; i < block->NumSucc(); i++)
{
    impImportBlockPending(block->GetSucc(i));
}

Once all blocks have been processed, every BasicBlock holds a linked list of statements (GT_STMT), and each statement has a GenTree tree beneath it.
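The pending-list loop is an ordinary worklist traversal. A minimal sketch, assuming a hypothetical ToyBlock that keeps only successor indices and an "imported" flag (the real importer also tracks the stack state on entry to each block):

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical, simplified basic block: successor indices and an "imported" flag.
struct ToyBlock {
    std::vector<std::size_t> succs;
    bool imported = false;
};

// Imports every block reachable from block 0 and returns the visiting order.
std::vector<std::size_t> importAll(std::vector<ToyBlock>& blocks) {
    std::vector<std::size_t> order;
    if (blocks.empty()) return order;

    std::deque<std::size_t> pending;   // plays the role of impPendingList
    pending.push_back(0);
    while (!pending.empty()) {
        std::size_t cur = pending.front();
        pending.pop_front();
        if (blocks[cur].imported) continue;   // already handled
        blocks[cur].imported = true;          // "import" the block
        order.push_back(cur);
        for (std::size_t s : blocks[cur].succs) pending.push_back(s);
    }
    return order;
}
```

A block that is never queued keeps imported == false, which is how unreachable code ends up as an unimported block.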

An example of fgImport is shown below:

PHASE_POST_IMPORT

This phase handles some work that follows the import from IL to HIR (GenTree), and contains the following code:

// Maybe the caller was not interested in generating code
if (compIsForImportOnly())
{
    compFunctionTraceEnd(nullptr, 0, false);
    return;
}

#if !FEATURE_EH
// If we aren't yet supporting EH in a compiler bring-up, remove as many EH handlers as possible, so
// we can pass tests that contain try/catch EH, but don't actually throw any exceptions.
fgRemoveEH();
#endif // !FEATURE_EH

if (compileFlags->corJitFlags & CORJIT_FLG_BBINSTR)
{
    fgInstrumentMethod();
}

// We could allow ESP frames. Just need to reserve space for
// pushing EBP if the method becomes an EBP-frame after an edit.
// Note that requiring a EBP Frame disallows double alignment.  Thus if we change this
// we either have to disallow double alignment for E&C some other way or handle it in EETwain.

if (opts.compDbgEnC)
{
    codeGen->setFramePointerRequired(true);

    // Since we need a slots for security near ebp, its not possible
    // to do this after an Edit without shifting all the locals.
    // So we just always reserve space for these slots in case an Edit adds them
    opts.compNeedSecurityCheck = true;

    // We don't care about localloc right now. If we do support it,
    // EECodeManager::FixContextForEnC() needs to handle it smartly
    // in case the localloc was actually executed.
    //
    // compLocallocUsed            = true;
}

EndPhase(PHASE_POST_IMPORT);

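This phase does some miscellaneous processing after import.
If the JIT only needs to check whether the function's IL is valid, compilation runs with CORJIT_FLG_IMPORT_ONLY and there is no need to continue past the import phase.
fgInstrumentMethod inserts the statements needed by the profiler; it is not analyzed in detail here.
opts.compDbgEnC is enabled when Edit and Continue (EnC) support is requested, which normally means the IL assembly was built with the Debug configuration; here the function is marked as requiring a frame pointer and a security check.
(x64 allows a function not to use the rbp register to hold the stack address at function entry, which frees up one more register and yields more efficient code, but makes debugging harder)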

PHASE_MORPH

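Because the import phase only performs a straightforward conversion from IL to HIR, the resulting HIR still needs further processing.
This phase is responsible for that processing, and contains the following code: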

/* Initialize the BlockSet epoch */

NewBasicBlockEpoch();

/* Massage the trees so that we can generate code out of them */

fgMorph();
EndPhase(PHASE_MORPH);

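NewBasicBlockEpoch updates the epoch (fgCurBBEpoch) of the current BasicBlock set; this value identifies the version of the current BasicBlock set.

fgMorph contains the main processing of this phase. Its source code is as follows: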

/*****************************************************************************
 *
 *  Transform all basic blocks for codegen.
 */

void Compiler::fgMorph()
{
    noway_assert(!compIsForInlining()); // Inlinee's compiler should never reach here.

    fgOutgoingArgTemps = nullptr;

#ifdef DEBUG
    if (verbose)
    {
        printf("*************** In fgMorph()\n");
    }
    if (verboseTrees)
    {
        fgDispBasicBlocks(true);
    }
#endif // DEBUG

    // Insert call to class constructor as the first basic block if
    // we were asked to do so.
    if (info.compCompHnd->initClass(nullptr /* field */, info.compMethodHnd /* method */,
                                    impTokenLookupContextHandle /* context */) &
        CORINFO_INITCLASS_USE_HELPER)
    {
        fgEnsureFirstBBisScratch();
        fgInsertStmtAtBeg(fgFirstBB, fgInitThisClass());
    }

#ifdef DEBUG
    if (opts.compGcChecks)
    {
        for (unsigned i = 0; i < info.compArgsCount; i++)
        {
            if (lvaTable[i].TypeGet() == TYP_REF)
            {
                // confirm that the argument is a GC pointer (for debugging (GC stress))
                GenTreePtr      op   = gtNewLclvNode(i, TYP_REF);
                GenTreeArgList* args = gtNewArgList(op);
                op                   = gtNewHelperCallNode(CORINFO_HELP_CHECK_OBJ, TYP_VOID, 0, args);

                fgEnsureFirstBBisScratch();
                fgInsertStmtAtEnd(fgFirstBB, op);
            }
        }
    }

    if (opts.compStackCheckOnRet)
    {
        lvaReturnEspCheck                  = lvaGrabTempWithImplicitUse(false DEBUGARG("ReturnEspCheck"));
        lvaTable[lvaReturnEspCheck].lvType = TYP_INT;
    }

    if (opts.compStackCheckOnCall)
    {
        lvaCallEspCheck                  = lvaGrabTempWithImplicitUse(false DEBUGARG("CallEspCheck"));
        lvaTable[lvaCallEspCheck].lvType = TYP_INT;
    }
#endif // DEBUG

    /* Filter out unimported BBs */

    fgRemoveEmptyBlocks();

    /* Add any internal blocks/trees we may need */

    fgAddInternal();

#if OPT_BOOL_OPS
    fgMultipleNots = false;
#endif

#ifdef DEBUG
    /* Inliner could add basic blocks. Check that the flowgraph data is up-to-date */
    fgDebugCheckBBlist(false, false);
#endif // DEBUG

    /* Inline */
    fgInline();
#if 0
    JITDUMP("trees after inlining\n");
    DBEXEC(VERBOSE, fgDispBasicBlocks(true));
#endif

    RecordStateAtEndOfInlining(); // Record "start" values for post-inlining cycles and elapsed time.

#ifdef DEBUG
    /* Inliner could add basic blocks. Check that the flowgraph data is up-to-date */
    fgDebugCheckBBlist(false, false);
#endif // DEBUG

    /* For x64 and ARM64 we need to mark irregular parameters early so that they don't get promoted */
    fgMarkImplicitByRefArgs();

    /* Promote struct locals if necessary */
    fgPromoteStructs();

    /* Now it is the time to figure out what locals have address-taken. */
    fgMarkAddressExposedLocals();

#ifdef DEBUG
    /* Now that locals have address-taken marked, we can safely apply stress. */
    lvaStressLclFld();
    fgStress64RsltMul();
#endif // DEBUG

    /* Morph the trees in all the blocks of the method */

    fgMorphBlocks();

#if 0
    JITDUMP("trees after fgMorphBlocks\n");
    DBEXEC(VERBOSE, fgDispBasicBlocks(true));
#endif

    /* Decide the kind of code we want to generate */

    fgSetOptions();

    fgExpandQmarkNodes();

#ifdef DEBUG
    compCurBB = nullptr;
#endif // DEBUG
}

The processing in this function is as follows:

fgInsertStmtAtBeg(fgFirstBB, fgInitThisClass());
If the type needs dynamic initialization (it is generic and has a static constructor), code that calls JIT_ClassInitDynamicClass is inserted into the first block.

fgRemoveEmptyBlocks
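Enumerates and removes all blocks that were not imported (that is, the code in the block is unreachable),
and if there were any, updates the block numbers and the epoch.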

fgAddInternal:
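Adds internal BasicBlocks and GenTrees.
First, if the function is not static and the this variable has its address taken (ref) or is modified, an internal local variable (lvaArg0Var) is needed to hold the value of this.
If the function needs a security check (compNeedSecurityCheck), a temporary variable (lvaSecurityObject) is added.
If the current platform is not x86 (32-bit), code is generated for synchronized methods: JIT_MonEnterWorker is called on entry and JIT_MonExitWorker on exit.
It then decides whether only a single return block should be generated (for example for functions containing a pinvoke, functions that call unmanaged code, or synchronized functions);
if so, a BasicBlock used for merging and a local variable that holds the return value are added. The other return blocks are not redirected to the new block at this point.
If the function calls unmanaged functions, a temporary variable (lvaInlinedPInvokeFrameVar) is added.
If JustMyCode is enabled, if (*pFlag != 0) { JIT_DbgIsJustMyCode() } is added to the first block; note that the node here is a QMARK (?:).
If tiRuntimeCalloutNeeded holds, verificationRuntimeCheck(MethodHnd) is added to the first block.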

fgInline
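This function is responsible for inlining the calls in the function.
Although both Microsoft's documentation and my previous article treat inlining as a separate phase, inside coreclr inlining belongs to PHASE_MORPH.
First a root inline context (rootContext) is created and assigned to every statement (stmt) node; the inline context records where a statement came from and organizes the contexts into a tree.
Then all statements (stmt) are enumerated; if a statement is a call and an inline candidate (GTF_CALL_INLINE_CANDIDATE), inlining is attempted (fgMorphCallInline).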

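The earlier PHASE_IMPORTATION decides whether a call is an inline candidate when importing it (impMarkInlineCandidate). The conditions include the following.
Note that these conditions are not guaranteed to be accurate; they may differ depending on the clr version or the runtime environment (the configured inlining policy).

Not inlined when optimization is disabled
Not inlined when the call is a tail call
Not inlined when the call's gtFlags & GTF_CALL_VIRT_KIND_MASK is not equal to GTF_CALL_NONVIRT
Not inlined when the call is a helper call
Not inlined when the call is an indirect call
When COMPlus_AggressiveInlining is set in the environment, CORINFO_FLG_FORCEINLINE is set
Not inlined when CORINFO_FLG_FORCEINLINE is not set and the call is inside a catch or filter
Not inlined when a previous inline attempt failed and the function was marked CORINFO_FLG_DONT_INLINE
Synchronized functions (CORINFO_FLG_SYNCH) are not inlined
Not inlined when the function needs a security check (CORINFO_FLG_SECURITYCHECK)
Not inlined when the function has exception handlers
Not inlined when the function has no body (size = 0)
Not inlined when the function takes varargs
Not inlined when the number of local variables in methodInfo is greater than MAX_INL_LCLS (32)
Not inlined when the number of arguments in methodInfo is greater than MAX_INL_LCLS
Check the IL code size
- If codesize <= CALLEE_IL_CODE_SIZE (16), mark CALLEE_BELOW_ALWAYS_INLINE_SIZE
- If force inline, mark CALLEE_IS_FORCE_INLINE (for example when the method is marked with the MethodImpl attribute)
- If codesize <= DEFAULT_MAX_INLINE_SIZE (100), mark CALLEE_IS_DISCRETIONARY_INLINE; the decision is made later based on profitability
- Otherwise mark CALLEE_TOO_MUCH_IL, meaning the code is too long to inline
Try to initialize the class that contains the function
- If the function belongs to a generic definition, it cannot be inlined
- If the type must be initialized before any field is accessed (IsBeforeFieldInit), it cannot be inlined
- If no other early out condition was met, class initialization was attempted, and it failed, it cannot be inlined
Other checks
- Definition of a boundary method:
  - A function that creates a StackCrawlMark to look up its caller
  - A function that calls a function meeting the condition above (marked as IsMdRequireSecObject)
  - A function that calls a virtual method (the virtual method might meet the conditions above)
- A function that calls a boundary method is not inlined
- Not inlined if the grant set or refuse set of the caller and the callee differ
- Check whether the call crosses assemblies
  - Within the same assembly, inlining is allowed
  - Across assemblies, at least one of the following must hold
    - The caller is full trust and its refused set is empty
    - The appdomain's IsHomogenous holds, and the refused sets of both caller and callee are empty
  - If the callee and the caller are in different modules and the callee's string pool is per module
    - mark dwRestrictions |= INLINE_NO_CALLEE_LDSTR (the callee must not contain ldstr)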

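Once all the conditions above are satisfied, the call is marked as an inline candidate and inlining is actually attempted (fgMorphCallInline). The steps of the attempt are as follows:

Check for mutual inlining (a inlines b, b inlines a); if found, mark the inline as failed
Use the inline context to check whether the inline depth is too great; if it exceeds DEFAULT_MAX_INLINE_DEPTH (20), mark the inline as failed
Call jitNativeCode for the callee; the imported BasicBlocks and GenTrees live in the InlineeCompiler
- The profitability analysis for the inlined function (DetermineProfitability) is performed here (fgFindJumpTargets); if inlining is judged not worthwhile, failure is returned
- The DetermineProfitability algorithm:
  - m_CalleeNativeSizeEstimate = DetermineNativeSizeEstimate() // machine code size estimated with a state machine
  - m_CallsiteNativeSizeEstimate = DetermineCallsiteNativeSizeEstimate(methodInfo) // estimated machine code size of the instructions that call this function
  - m_Multiplier = DetermineMultiplier() // multiplier; the larger the value, the more likely the inline. See DetermineMultiplier
  - threshold = (int)(m_CallsiteNativeSizeEstimate * m_Multiplier) // threshold
  - If m_CalleeNativeSizeEstimate > threshold, the call is not inlined; that is, the larger the callee's machine code, the less likely the inline, and the larger the multiplier, the more likely the inline
- Inlining processes the callee only up to PHASE_IMPORTATION; see the code of the compCompile function above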
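The size comparison at the heart of DetermineProfitability can be written as a few lines. This is only a sketch of the arithmetic listed above: the function name shouldInline is made up, and the three inputs are plain parameters here, whereas in the JIT they come from DetermineNativeSizeEstimate, DetermineCallsiteNativeSizeEstimate and DetermineMultiplier.

```cpp
// Toy version of the size-based decision: inline only when the callee's
// estimated native size does not exceed callsite estimate * multiplier.
bool shouldInline(int calleeNativeSizeEstimate, int callsiteNativeSizeEstimate, double multiplier) {
    int threshold = static_cast<int>(callsiteNativeSizeEstimate * multiplier);
    return calleeNativeSizeEstimate <= threshold;
}
```

With made-up numbers, a callee estimated at 300 against a call site estimated at 85 is rejected with a multiplier of 3.0 (threshold 255) but accepted with 4.0 (threshold 340), which shows why a larger multiplier makes inlining more likely.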

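If compiling the callee succeeds and the inlining decision passes, the callee's HIR can be embedded into the caller's HIR:

If the InlineeCompiler has only one BasicBlock, insert all stmts of that BasicBlock after the original stmt and mark the original stmt as empty
If the InlineeCompiler has multiple BasicBlocks
- Split the BasicBlock containing the original stmt at the stmt's position into topBlock and bottomBlock
- Insert the callee's BasicBlocks between topBlock and bottomBlock
- Mark the original stmt as empty; the original stmt remains in topBlock
The call under the original stmt is replaced with the return expression produced by inlining

If compiling the callee fails, or the inlining decision does not pass, the modified state must be restored:

Clean up the newly created local variables and restore the original local variable count (lvaCount)
If the call result is not void
- Set the expr in the stmt to empty; the original call is still referenced by retExpr and is substituted back later
Clear the inline candidate flag (GTF_CALL_INLINE_CANDIDATE) from the original expr (call)

Finally, the trees in the function that reference the return result (retExpr) are traversed once more; if inlining succeeded, the node is replaced with a lclVar or lclFld.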

fgMarkImplicitByRefArgs

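Marks local variables (parameters) that are structs of a non-standard size as BY_REF; once marked, the struct can no longer be promoted.
Put simply, promoting a struct means treating each field of the struct as a separate local variable.
For example, given struct X { int a; int b; int c },
a local variable X x can be replaced by three local variables int a; int b; int c;.
On x64 the non-standard sizes are 3, 5, 6, 7 and >8; on arm64 they are >16.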
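The two size rules can be expressed as small predicates. This is a sketch under assumptions: the function names are invented, sizes are in bytes, and the real check may look at more than the size (for example how the struct is passed in registers), which the toy ignores.

```cpp
// First rule in the text: sizes 3, 5, 6, 7 and everything above 8 bytes are
// irregular, i.e. anything that is larger than 8 or not a power of two.
bool isIrregularStructSize(unsigned size) {
    return size > 8 || (size & (size - 1)) != 0;
}

// arm64 rule in the text: everything above 16 bytes is irregular.
bool isIrregularStructSizeArm64(unsigned size) {
    return size > 16;
}
```

Under the first rule struct X { int a; int b; int c } (12 bytes) is irregular as a parameter, so it would be passed by reference and left unpromoted, while an 8-byte struct stays a promotion candidate.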

fgPromoteStructs

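Promotes the fields of a struct to local variables.
First the struct variables among the locals are traversed to decide whether each should be promoted. The criteria include (they may vary with the environment):

Not promoted if there are 512 or more local variables in total
Not promoted if the variable is used in SIMD instructions
Not promoted if the variable is an HFA (homogeneous floating-point aggregate) type
Not promoted if the struct is larger than sizeof(double) * 4
Not promoted if the struct has more than 4 fields
Not promoted if the struct has fields with overlapping addresses (for example a union)
Not promoted if the struct has a custom layout and is an HFA type
Not promoted if the struct contains fields of non-primitive types
Not promoted if the struct contains specially aligned fields ((fldOffset % fldSize) != 0)

If promotion is chosen, all members of the struct are added to the local variable table (lvaTable).
The original struct variable is kept, and the lvParentLcl of each newly added local points to the original struct variable.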

fgMarkAddressExposedLocals

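Marks all local variables whose address is exposed (passed to another function by ref, or stored into a global variable); these locals cannot be optimized into registers.
It also traverses the GenTree, and if a node is a GT_FIELD whose struct variable has been promoted, changes the node to a lclVar.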

fgMorphBlocks

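This function is another big one. It contains all kinds of transformations of the GenTree; because there are so many, I only list a few here.
For more of them, see my JIT notes.

Assertion generation (optAssertionGen)

Assertions can be created from certain GenTree patterns. For example, after a = 1 we can assert that the value of a is 1, and after b.abc() we can assert that b is not null (the null check has already been done once).
Assertions can be used to optimize code, for example to fold nodes, remove null checks and remove bounds checks.

Assertion propagation (optAssertionProp)

Optimizations can be made based on the assertions created. For example, a local variable known to equal a constant is replaced with that constant, and an object known to be non-null is marked as not needing a null check.
In the PHASE_MORPH phase optAssertionProp can only do some simple optimizations;
the later PHASE_ASSERTION_PROP_MAIN phase, which runs after SSA and VN have been built, calls this function again to do more.

Converting operations the cpu does not support into function calls

For example, dividing a long (64-bit) on a 32-bit platform is not supported by the cpu, so it has to be converted into a jit helper call.

Adding BasicBlocks that implicitly throw exceptions

If the code needs to check for numeric overflow or out-of-bounds array access, a BasicBlock that throws the exception has to be added.
Only one BasicBlock is added for each kind of exception.
Note that null checks do not add a BasicBlock; null checks are implemented with hardware exceptions. See the earlier articles for details.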

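Converting to more efficient equivalent patterns

Some patterns, such as x + const1 == const2, can be converted to x == const2 - const1 => x == const3, which reduces the number of computation steps.
Other patterns that get converted include:

x >= y == 0 => x < y
x >= 1 => x > 0 (x is an int)
x < 1 => x <= 0 (x is an int)
(x+const1)+(y+const2) => (x+y)+const3
x + 0 => x
and so on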
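The first rewrite amounts to folding the two constants at compile time. A minimal sketch, where foldAddEq is a hypothetical helper (not a JIT function) and the choice to refuse folding when the result does not fit in 32 bits is an assumption of this sketch, not a statement about the JIT's rule:

```cpp
#include <cstdint>

// Rewrites "x + c1 == c2" into "x == c3" by computing c3 = c2 - c1.
// Returns false (and leaves c3 untouched) when c3 does not fit in 32 bits;
// declining in that case is a conservative choice made for this sketch.
bool foldAddEq(std::int32_t c1, std::int32_t c2, std::int32_t& c3) {
    std::int64_t wide = static_cast<std::int64_t>(c2) - c1;
    if (wide < INT32_MIN || wide > INT32_MAX) return false;
    c3 = static_cast<std::int32_t>(wide);
    return true;
}
```

For x + 3 == 10 the helper yields c3 = 7, so the comparison becomes x == 7 and the addition disappears from the generated code.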

fgSetOptions

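This function sets the options used by CodeGen (the module that generates machine code), including:

genInterruptible: whether to generate fully interruptible code, used by the debugger
setFramePointerRequired: whether saving the frame pointer (rbp) is required
setFramePointerRequiredEH: requires a frame pointer when the EH table is non-empty; it sets the same variable as above
setFramePointerRequiredGCInfo: requires a frame pointer when there are too many arguments, a security check is needed, or there are dynamically sized arguments; same as above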

fgExpandQmarkNodes

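This function expands QMark nodes. A QMark is simply a ternary expression, such as x?123:321.
A conditional like this would normally be split into three BasicBlocks, but for convenience the earlier phases use a QMark node instead of modifying the BasicBlocks.
This function finds the QMark nodes in the trees, converts them to jTrue and adds BasicBlocks.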

PHASE_GS_COOKIE

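If the function contains an unsafe buffer, an internal variable (the GS Cookie) is added to detect stack buffer overruns.
This phase adds the internal variable and the statement that sets its value, and contains the following code: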

/* GS security checks for unsafe buffers */
if (getNeedsGSSecurityCookie())
{
#ifdef DEBUG
    if (verbose)
    {
        printf("\n*************** -GS checks for unsafe buffers \n");
    }
#endif

    gsGSChecksInitCookie();

    if (compGSReorderStackLayout)
    {
        gsCopyShadowParams();
    }

#ifdef DEBUG
    if (verbose)
    {
        fgDispBasicBlocks(true);
        printf("\n");
    }
#endif
}
EndPhase(PHASE_GS_COOKIE);

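gsGSChecksInitCookie adds a new local variable (the GS Cookie) whose value is a magic number; on linux the value is GetTickCount() at program startup.
Later, CodeGen checks the value of the GS Cookie before the function returns and calls CORINFO_HELP_FAIL_FAST if it differs from the preset magic number.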

PHASE_COMPUTE_PREDS

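Because the morph phase may add new BasicBlocks (through inlining or QMark expansion),
this phase reassigns the BasicBlock numbers and computes preds (predecessor blocks). It contains the following code: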

/* Compute bbNum, bbRefs and bbPreds */

JITDUMP("\nRenumbering the basic blocks for fgComputePred\n");
fgRenumberBlocks();

noway_assert(!fgComputePredsDone); // This is the first time full (not cheap) preds will be computed.
fgComputePreds();
EndPhase(PHASE_COMPUTE_PREDS);

fgRenumberBlocks renumbers the blocks; the numbers increase starting from 1.
fgComputePreds recomputes the preds (predecessor blocks) of every block. For preds, see the explanation of Flowgraph Analysis in the previous article.

The algorithm of fgComputePreds is as follows:

Enumerate the BasicBlocks
- block->bbRefs = 0
Call fgRemovePreds to delete the bbPreds of every BasicBlock
Set fgFirstBB->bbRefs = 1 for the first BasicBlock
Enumerate the BasicBlocks
- If the kind is BBJ_LEAVE, BBJ_COND, BBJ_ALWAYS or BBJ_EHCATCHRET
  - Call fgAddRefPred(block->bbJumpDest, block, nullptr, true)
- If the kind is BBJ_NONE
  - Call fgAddRefPred(block->bbNext, block, nullptr, true)
- If the kind is BBJ_EHFILTERRET
  - Call fgAddRefPred(block->bbJumpDest, block, nullptr, true)
- If the kind is BBJ_EHFINALLYRET
  - Find the blocks that call the finally funclet; for each one found (execution returns to bcall->bbNext after the call)
  - fgAddRefPred(bcall->bbNext, block, nullptr, true)
- If the kind is BBJ_THROW or BBJ_RETURN
  - Do nothing
- If the kind is BBJ_SWITCH
  - Call fgAddRefPred(*jumpTab, block, nullptr, true) for every jump target
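The steps above can be sketched on a toy flow graph. ToyBlock, JumpKind and computePreds are hypothetical names for illustration; only four jump kinds are modelled, blocks are identified by their vector index, and the sketch assumes that a conditional block also adds its fall-through edge to the next block.

```cpp
#include <cstddef>
#include <vector>

enum class JumpKind { None, Always, Cond, Return, Throw };

// Hypothetical, simplified basic block; "next" is simply index + 1.
struct ToyBlock {
    JumpKind kind = JumpKind::None;
    std::size_t jumpDest = 0;          // used by Always and Cond
    unsigned refs = 0;                 // plays the role of bbRefs
    std::vector<std::size_t> preds;    // plays the role of bbPreds
};

void computePreds(std::vector<ToyBlock>& blocks) {
    // Reset reference counts and predecessor lists.
    for (ToyBlock& b : blocks) {
        b.refs = 0;
        b.preds.clear();
    }
    if (blocks.empty()) return;
    blocks[0].refs = 1;   // the method entry is always referenced

    // Toy counterpart of fgAddRefPred: count the reference and record the edge.
    auto addRefPred = [&blocks](std::size_t target, std::size_t from) {
        if (target >= blocks.size()) return;
        blocks[target].refs++;
        blocks[target].preds.push_back(from);
    };

    for (std::size_t i = 0; i < blocks.size(); i++) {
        switch (blocks[i].kind) {
            case JumpKind::Always:
                addRefPred(blocks[i].jumpDest, i);
                break;
            case JumpKind::Cond:
                addRefPred(blocks[i].jumpDest, i);   // taken branch
                addRefPred(i + 1, i);                // fall-through (assumption of the sketch)
                break;
            case JumpKind::None:
                addRefPred(i + 1, i);
                break;
            case JumpKind::Return:
            case JumpKind::Throw:
                break;                               // no successors
        }
    }
}
```

For a loop such as block 1 jumping conditionally to block 3 and block 2 jumping back to block 1, block 1 ends up with two preds (blocks 0 and 2) and a reference count of 2.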

PHASE_MARK_GC_POLL_BLOCKS

This phase decides which blocks need to check whether a gc is in progress, and contains the following code:

/* If we need to emit GC Poll calls, mark the blocks that need them now.  This is conservative and can
 * be optimized later. */
fgMarkGCPollBlocks();
EndPhase(PHASE_MARK_GC_POLL_BLOCKS);

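fgMarkGCPollBlocks enumerates the BasicBlocks;
if a block jumps backward (to an earlier block, as in a loop) or is a return block, it is marked with block->bbFlags |= BBF_NEEDS_GCPOLL.
Blocks marked BBF_NEEDS_GCPOLL later get code inserted that calls CORINFO_HELP_POLL_GC, which suspends the current thread while a gc runs.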
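A minimal sketch of the marking rule, assuming that the jump in question is one whose target is the same or an earlier block (a loop back edge). ToyBlock and markGcPollBlocks are hypothetical names, and blocks are identified by their vector index:

```cpp
#include <cstddef>
#include <vector>

struct ToyBlock {
    bool isReturn = false;
    bool hasJump = false;
    std::size_t jumpDest = 0;    // index of the jump target, valid when hasJump is true
    bool needsGcPoll = false;    // plays the role of BBF_NEEDS_GCPOLL
};

// Marks return blocks and blocks that jump to themselves or to an earlier block.
void markGcPollBlocks(std::vector<ToyBlock>& blocks) {
    for (std::size_t i = 0; i < blocks.size(); i++) {
        ToyBlock& b = blocks[i];
        bool backwardJump = b.hasJump && b.jumpDest <= i;
        b.needsGcPoll = b.isReturn || backwardJump;
    }
}
```

The intuition is that any long-running code must either loop or return, so polling on back edges and returns bounds the time a thread can run without noticing a pending gc.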

PHASE_COMPUTE_EDGE_WEIGHTS

This phase computes the weights of blocks and block edges, and contains the following code:

/* From this point on the flowgraph information such as bbNum,
 * bbRefs or bbPreds has to be kept updated */

// Compute the edge weights (if we have profile data)
fgComputeEdgeWeights();
EndPhase(PHASE_COMPUTE_EDGE_WEIGHTS);

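A block's weight (BasicBlock::bbWeight) indicates how likely the code in the block is to run. The default weight is 1, and a higher value means the block is more likely to run.
An edge is the term for a jump between blocks; the larger the edge weight, the more likely the jump between the two blocks.
Edge weights are stored in the elements (of type flowList) of the BasicBlock::bbPreds linked list, as two values, flEdgeWeightMin and flEdgeWeightMax.
The edge weight computation is very complex; see the fgAddRefPred and fgComputeEdgeWeights functions for details.
Blocks that run rarely are marked BBF_RUN_RARELY;
this flag is used later to determine which blocks are hot and which are cold. Cold blocks may be moved to the end and placed in a different heap chunk.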

PHASE_CREATE_FUNCLETS

This phase creates small functions (funclets) for exception handlers (such as catch and finally), and contains the following code:

#if FEATURE_EH_FUNCLETS

/* Create funclets from the EH handlers. */

fgCreateFunclets();
EndPhase(PHASE_CREATE_FUNCLETS);

#endif // FEATURE_EH_FUNCLETS

Funclets are how exception handlers are invoked on x64 (64-bit); x86 (32-bit) does not use this approach.
For example, given this code:

int x = GetX();
try {
    Console.WriteLine(x);
    throw new Exception("abc");
} catch (Exception ex) {
    Console.WriteLine(ex);
    Console.WriteLine(x);
}

On x64 the following assembly is generated:

Generated main function
00007FFF0FEC0480 55                   push        rbp // save the original rbp
00007FFF0FEC0481 56                   push        rsi // save the original rsi
00007FFF0FEC0482 48 83 EC 38          sub         rsp,38h // reserve space for local variables, size 0x38
00007FFF0FEC0486 48 8D 6C 24 40       lea         rbp,[rsp+40h] // rbp = the value rsp had right after push rbp (0x38+0x8)
00007FFF0FEC048B 48 89 65 E0          mov         qword ptr [rbp-20h],rsp // store rsp (after reserving locals) into local [rbp-0x20], i.e. PSPSym
00007FFF0FEC048F E8 24 FC FF FF       call        00007FFF0FEC00B8 // call GetX()
00007FFF0FEC0494 89 45 F4             mov         dword ptr [rbp-0Ch],eax // store the result into local [rbp-0x0c], i.e. x
   185:             try {
   186:                 Console.WriteLine(x);
00007FFF0FEC0497 8B 4D F4             mov         ecx,dword ptr [rbp-0Ch] // x => first argument
00007FFF0FEC049A E8 B9 FE FF FF       call        00007FFF0FEC0358 // call Console.WriteLine
   187:                 throw new Exception("abc");
00007FFF0FEC049F 48 B9 B8 58 6C 6E FF 7F 00 00 mov         rcx,7FFF6E6C58B8h // MethodTable of Exception => first argument
00007FFF0FEC04A9 E8 A2 35 B1 5F       call        00007FFF6F9D3A50 // call CORINFO_HELP_NEWFAST (JIT_New, or the assembly version)
00007FFF0FEC04AE 48 8B F0             mov         rsi,rax // keep the exception object in rsi
00007FFF0FEC04B1 B9 12 02 00 00       mov         ecx,212h // rid => first argument
00007FFF0FEC04B6 48 BA 78 4D D6 0F FF 7F 00 00 mov         rdx,7FFF0FD64D78h // module handle => second argument
00007FFF0FEC04C0 E8 6B 20 AF 5F       call        00007FFF6F9B2530 // call CORINFO_HELP_STRCNS (JIT_StrCns), which lazily loads the string literal object
00007FFF0FEC04C5 48 8B D0             mov         rdx,rax // string literal object => second argument
00007FFF0FEC04C8 48 8B CE             mov         rcx,rsi // exception object => first argument
00007FFF0FEC04CB E8 20 07 43 5E       call        00007FFF6E2F0BF0 // call System.Exception:.ctor
00007FFF0FEC04D0 48 8B CE             mov         rcx,rsi // exception object => first argument
00007FFF0FEC04D3 E8 48 FC A0 5F       call        00007FFF6F8D0120 // call CORINFO_HELP_THROW (IL_Throw)
00007FFF0FEC04D8 CC                   int         3 // unreachable
00007FFF0FEC04D9 48 8D 65 F8          lea         rsp,[rbp-8] // restore rsp to its value after rbp and rsi were saved
00007FFF0FEC04DD 5E                   pop         rsi // restore rsi
00007FFF0FEC04DE 5D                   pop         rbp // restore rbp
00007FFF0FEC04DF C3                   ret
Generated funclet
00007FFF0FEC04E0 55                   push        rbp // save rbp
00007FFF0FEC04E1 56                   push        rsi // save rsi
00007FFF0FEC04E2 48 83 EC 28          sub         rsp,28h // reserve 0x28 on the funclet's own stack (PSP slot 0x8 + outgoing arg space 0x20, needed if the funclet calls other functions)
00007FFF0FEC04E6 48 8B 69 20          mov         rbp,qword ptr [rcx+20h] // rcx is InitialSP (rsp after the locals were reserved)
                                                                        // rbp and rsp of the main function differ by 0x40, so [InitialSP+20h] equals [rbp-20h], i.e. PSPSym
                                                                        // there is only one level in this example, so the value stored in PSPSym is the same as the rcx passed in (InitialSP)
00007FFF0FEC04EA 48 89 6C 24 20       mov         qword ptr [rsp+20h],rbp // copy PSPSym into the funclet's own frame
00007FFF0FEC04EF 48 8D 6D 40          lea         rbp,[rbp+40h] // rbp and rsp of the main function differ by 0x40; this computes the main function's rbp
   188:             } catch (Exception ex) {
   189:                 Console.WriteLine(ex);
00007FFF0FEC04F3 48 8B CA             mov         rcx,rdx // rdx holds the exception object; move it to the first argument
00007FFF0FEC04F6 E8 7D FE FF FF       call        00007FFF0FEC0378 // call Console.WriteLine
   190:                 Console.WriteLine(x);
00007FFF0FEC04FB 8B 4D F4             mov         ecx,dword ptr [rbp-0Ch] // [rbp-0xc] is the variable x; move it to the first argument
00007FFF0FEC04FE E8 55 FE FF FF       call        00007FFF0FEC0358 // call Console.WriteLine
00007FFF0FEC0503 48 8D 05 CF FF FF FF lea         rax,[7FFF0FEC04D9h] // address at which execution resumes
00007FFF0FEC050A 48 83 C4 28          add         rsp,28h // release the space reserved on the funclet's stack
00007FFF0FEC050E 5E                   pop         rsi // restore rsi
00007FFF0FEC050F 5D                   pop         rbp // restore rbp
00007FFF0FEC0510 C3                   ret

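We can see that on x64 a separate small function (00007FFF0FEC04E0~00007FFF0FEC0510) is in fact generated for the exception handler.
When an exception occurs this small function is called to handle it, and control returns to the main function afterwards.

fgCreateFunclets is responsible for creating funclets. Its source code is as follows: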

/*****************************************************************************
 *
 *  Function to create funclets out of all EH catch/finally/fault blocks.
 *  We only move filter and handler blocks, not try blocks.
 */

void Compiler::fgCreateFunclets()
{
    assert(!fgFuncletsCreated);

#ifdef DEBUG
    if (verbose)
    {
        printf("*************** In fgCreateFunclets()\n");
    }
#endif

    fgCreateFuncletPrologBlocks();

    unsigned           XTnum;
    EHblkDsc*          HBtab;
    const unsigned int funcCnt = ehFuncletCount() + 1;

    if (!FitsIn<unsigned short>(funcCnt))
    {
        IMPL_LIMITATION("Too many funclets");
    }

    FuncInfoDsc* funcInfo = new (this, CMK_BasicBlock) FuncInfoDsc[funcCnt];

    unsigned short funcIdx;

    // Setup the root FuncInfoDsc and prepare to start associating
    // FuncInfoDsc's with their corresponding EH region
    memset((void*)funcInfo, 0, funcCnt * sizeof(FuncInfoDsc));
    assert(funcInfo[0].funKind == FUNC_ROOT);
    funcIdx = 1;

    // Because we iterate from the top to the bottom of the compHndBBtab array, we are iterating
    // from most nested (innermost) to least nested (outermost) EH region. It would be reasonable
    // to iterate in the opposite order, but the order of funclets shouldn't matter.
    //
    // We move every handler region to the end of the function: each handler will become a funclet.
    //
    // Note that fgRelocateEHRange() can add new entries to the EH table. However, they will always
    // be added *after* the current index, so our iteration here is not invalidated.
    // It *can* invalidate the compHndBBtab pointer itself, though, if it gets reallocated!

    for (XTnum = 0; XTnum < compHndBBtabCount; XTnum++)
    {
        HBtab = ehGetDsc(XTnum); // must re-compute this every loop, since fgRelocateEHRange changes the table
        if (HBtab->HasFilter())
        {
            assert(funcIdx < funcCnt);
            funcInfo[funcIdx].funKind    = FUNC_FILTER;
            funcInfo[funcIdx].funEHIndex = (unsigned short)XTnum;
            funcIdx++;
        }
        assert(funcIdx < funcCnt);
        funcInfo[funcIdx].funKind    = FUNC_HANDLER;
        funcInfo[funcIdx].funEHIndex = (unsigned short)XTnum;
        HBtab->ebdFuncIndex          = funcIdx;
        funcIdx++;
        fgRelocateEHRange(XTnum, FG_RELOCATE_HANDLER);
    }

    // We better have populated all of them by now
    assert(funcIdx == funcCnt);

    // Publish
    compCurrFuncIdx   = 0;
    compFuncInfos     = funcInfo;
    compFuncInfoCount = (unsigned short)funcCnt;

    fgFuncletsCreated = true;

#if DEBUG
    if (verbose)
    {
        JITDUMP("\nAfter fgCreateFunclets()");
        fgDispBasicBlocks();
        fgDispHandlerTab();
    }

    fgVerifyHandlerTab();
    fgDebugCheckBBlist();
#endif // DEBUG
}

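First, fgCreateFuncletPrologBlocks enumerates the EH table.
If the first block of a handler can be jumped to from other blocks inside the handler (the first block is inside a loop),
that block may run multiple times, so the funclet's prolog code cannot be inserted into it; in that case a new block has to be inserted before the handler's first block.
Then an array holding the function information is allocated and stored in compFuncInfos; element 0 is the main function, and the remaining elements are funclets.
Finally the EH table is enumerated, the elements of compFuncInfos are filled in, and fgRelocateEHRange is called.

fgRelocateEHRange moves the blocks within the handler's range to the very end of the BasicBlock list. CodeGen follows this layout when generating code, so funclets are emitted after the main function.
For example, before the move the blocks look like this: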

-------------------------------------------------------------------------------------------------------------------------------------
BBnum         descAddr ref try hnd preds           weight   [IL range]      [jump]      [EH region]         [flags]
-------------------------------------------------------------------------------------------------------------------------------------
BB01 [000000000137BC60]  1                              1   [000..006)                                     i label target gcsafe 
BB02 [000000000137BD70]  1  0    BB01                   0   [006..017)        (throw ) T0      try { }     keep i try rare label gcsafe newobj 
BB03 [000000000137BE80]  0     0                        1   [017..024)-> BB04 ( cret )    H0   catch { }   keep i label target gcsafe 
BB04 [000000000137BF90]  1       BB03                   1   [024..025)        (return)                     i label target 
-------------------------------------------------------------------------------------------------------------------------------------

After the move they look like this:

-------------------------------------------------------------------------------------------------------------------------------------
BBnum         descAddr ref try hnd preds           weight   [IL range]      [jump]      [EH region]         [flags]
-------------------------------------------------------------------------------------------------------------------------------------
BB01 [000000000137BC60]  1                              1   [000..006)                                     i label target gcsafe 
BB02 [000000000137BD70]  1  0    BB01                   0   [006..017)        (throw ) T0      try { }     keep i try rare label gcsafe newobj 
BB04 [000000000137BF90]  1       BB03                   1   [024..025)        (return)                     i label target 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
BB03 [000000000137BE80]  0     0                        1   [017..024)-> BB04 ( cret )    H0 F catch { }   keep i label target gcsafe flet 
-------------------------------------------------------------------------------------------------------------------------------------

PHASE_OPTIMIZE_LAYOUT

This phase optimizes the layout (order) of the BasicBlocks, and contains the following code:

if (!opts.MinOpts() && !opts.compDbgCode)
{
    optOptimizeLayout();
    EndPhase(PHASE_OPTIMIZE_LAYOUT);

    // Compute reachability sets and dominators.
    fgComputeReachability();
}

The source code of optOptimizeLayout is as follows:

/*****************************************************************************
 *
 *  Optimize the BasicBlock layout of the method
 */

void Compiler::optOptimizeLayout()
{
    noway_assert(!opts.MinOpts() && !opts.compDbgCode);

#ifdef DEBUG
    if (verbose)
    {
        printf("*************** In optOptimizeLayout()\n");
        fgDispHandlerTab();
    }

    /* Check that the flowgraph data (bbNum, bbRefs, bbPreds) is up-to-date */
    fgDebugCheckBBlist();
#endif

    noway_assert(fgModified == false);

    for (BasicBlock* block = fgFirstBB; block; block = block->bbNext)
    {
        /* Make sure the appropriate fields are initialized */

        if (block->bbWeight == BB_ZERO_WEIGHT)
        {
            /* Zero weighted block can't have a LOOP_HEAD flag */
            noway_assert(block->isLoopHead() == false);
            continue;
        }

        assert(block->bbLoopNum == 0);

        if (compCodeOpt() != SMALL_CODE)
        {
            /* Optimize "while(cond){}" loops to "cond; do{}while(cond);" */

            fgOptWhileLoop(block);
        }
    }

    if (fgModified)
    {
        // Recompute the edge weight if we have modified the flow graph in fgOptWhileLoop
        fgComputeEdgeWeights();
    }

    fgUpdateFlowGraph(true);
    fgReorderBlocks();
    fgUpdateFlowGraph();
}

fgOptWhileLoop optimizes while structures, as the comment inside it says.
The structure before optimization is as follows:

jmp test
loop:
        ...
        ...
test:
        cond
        jtrue   loop

The structure after optimization is as follows, with an up-front check added:

cond
        jfalse done
        // else fall-through
loop:
        ...
        ...
test:
        cond
        jtrue   loop
done:

If fgOptWhileLoop made changes, fgComputeEdgeWeights is called to recompute the weights.
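At the source level this transformation is the classic rewrite of while (cond) { body } into if (cond) { do { body } while (cond); }. The two functions below are an illustrative pair written for this article, not JIT code; they compute the same sum and differ only in loop shape.

```cpp
// Original shape: the condition is tested at the top, so each iteration
// needs a jump back to the test in addition to the conditional branch.
int sumWhile(int n) {
    int sum = 0;
    int i = 0;
    while (i < n) {
        sum += i;
        i++;
    }
    return sum;
}

// Inverted shape: one up-front test, then a bottom-tested loop, so each
// iteration executes a single conditional branch.
int sumInverted(int n) {
    int sum = 0;
    int i = 0;
    if (i < n) {
        do {
            sum += i;
            i++;
        } while (i < n);
    }
    return sum;
}
```

The price is that the condition is duplicated, which is presumably why the transformation is skipped when the JIT is asked for small code (compCodeOpt() == SMALL_CODE in the source above).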

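fgUpdateFlowGraph removes empty blocks, unreachable blocks and redundant jumps.
If the doTailDuplication argument passed to fgUpdateFlowGraph is true, the following optimization is also performed.
Code before optimization: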

block:
    jmp target
target:
    cond
    jtrue succ
fallthrough:
    ...
succ:
    ...

Code after optimization:
After the optimization, target may become redundant, so fgUpdateFlowGraph is run once more below with the argument set to false to remove it.

block:
    cond
    jtrue succ
new:
    jmp fallthrough
target:
    cond
    jtrue succ
fallthrough:
    ...
succ:
    ...

fgReorderBlocks uses the previously computed weights (bbWeight) to move rarely run blocks to the end; later these blocks may become cold code and be written separately from the hot code.
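The basic idea can be sketched as a stable partition. This is a deliberate simplification with hypothetical names (ToyBlock, moveRareBlocksToEnd): the real function also has to respect EH regions and to insert or flip jumps wherever moving a block breaks a fall-through, none of which is modelled here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ToyBlock {
    std::string name;
    bool runRarely;   // plays the role of BBF_RUN_RARELY
};

// Keeps the frequently run blocks in their original order and moves the
// rarely run ones behind them, also in their original order.
void moveRareBlocksToEnd(std::vector<ToyBlock>& blocks) {
    std::stable_partition(blocks.begin(), blocks.end(),
                          [](const ToyBlock& b) { return !b.runRarely; });
}
```

With BB02 marked rare in the list BB01, BB02, BB03, BB04, the resulting order is BB01, BB03, BB04, BB02, so the hot blocks stay contiguous.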

fgComputeReachability

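This function computes, for each block, the set of blocks that can reach it, as well as the DOM tree. It is not tagged with a phase, and contains the following code: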

// Compute reachability sets and dominators.
fgComputeReachability();

The source code of fgComputeReachability is as follows:

/*****************************************************************************
 *
 *  Function called to compute the dominator and reachable sets.
 *
 *  Assumes the predecessor lists are computed and correct.
 */

void Compiler::fgComputeReachability()
{
#ifdef DEBUG
    if (verbose)
    {
        printf("*************** In fgComputeReachability\n");
    }

    fgVerifyHandlerTab();

    // Make sure that the predecessor lists are accurate
    assert(fgComputePredsDone);
    fgDebugCheckBBlist();
#endif // DEBUG

    /* Create a list of all BBJ_RETURN blocks. The head of the list is 'fgReturnBlocks'. */
    fgReturnBlocks = nullptr;

    for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
    {
        // If this is a BBJ_RETURN block, add it to our list of all BBJ_RETURN blocks. This list is only
        // used to find return blocks.
        if (block->bbJumpKind == BBJ_RETURN)
        {
            fgReturnBlocks = new (this, CMK_Reachability) BasicBlockList(block, fgReturnBlocks);
        }
    }

    // Compute reachability and then delete blocks determined to be unreachable. If we delete blocks, we
    // need to loop, as that might have caused more blocks to become unreachable. This can happen in the
    // case where a call to a finally is unreachable and deleted (maybe the call to the finally is
    // preceded by a throw or an infinite loop), making the blocks following the finally unreachable.
    // However, all EH entry blocks are considered global entry blocks, causing the blocks following the
    // call to the finally to stay rooted, until a second round of reachability is done.
    // The dominator algorithm expects that all blocks can be reached from the fgEnterBlks set.
    unsigned passNum = 1;
    bool     changed;
    do
    {
        // Just to be paranoid, avoid infinite loops; fall back to minopts.
        if (passNum > 10)
        {
            noway_assert(!"Too many unreachable block removal loops");
        }

        /* Walk the flow graph, reassign block numbers to keep them in ascending order */
        JITDUMP("\nRenumbering the basic blocks for fgComputeReachability pass #%u\n", passNum);
        passNum++;
        fgRenumberBlocks();

        //
        // Compute fgEnterBlks
        //

        fgComputeEnterBlocksSet();

        //
        // Compute bbReach
        //

        fgComputeReachabilitySets();

        //
        // Use reachability information to delete unreachable blocks.
        // Also, determine if the flow graph has loops and set 'fgHasLoops' accordingly.
        // Set the BBF_LOOP_HEAD flag on the block target of backwards branches.
        //

        changed = fgRemoveUnreachableBlocks();

    } while (changed);

#ifdef DEBUG
    if (verbose)
    {
        printf("\nAfter computing reachability:\n");
        fgDispBasicBlocks(verboseTrees);
        printf("\n");
    }

    fgVerifyHandlerTab();
    fgDebugCheckBBlist(true);
#endif // DEBUG

    //
    // Now, compute the dominators
    //

    fgComputeDoms();
}

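This function first adds every returning block to the fgReturnBlocks list,
then calls fgRenumberBlocks to renumber the blocks (the steps below require the block numbers to be in order),
then calls fgComputeEnterBlocksSet to add the blocks that enter the function or a funclet (fgFirstBB and the first block of each exception handler) to the fgEnterBlks set.
Next it calls fgComputeReachabilitySets to compute which blocks can reach each block (the union of the block itself and the bbReach of all its preds) and stores the result in BasicBlock::bbReach.
Then it calls fgRemoveUnreachableBlocks to delete the blocks that cannot be reached from any function entry in fgEnterBlks (blocks for which fgEnterBlks & bbReach is empty); if anything was deleted, the renumbering and reachability steps are repeated.
Finally it calls fgComputeDoms to compute the dominator tree.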
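
The reachability step described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: Block, computeReach, and findUnreachable are hypothetical names, std::set stands in for the JIT's bit-vector block sets, and the real code also handles EH regions and renumbering.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for BasicBlock: only the predecessor list is kept.
struct Block
{
    std::vector<std::size_t> preds; // indices of predecessor blocks
};

// reach[i] = the set of blocks that can reach block i (including i itself),
// computed as a fixed point of "self + union of the reach sets of all preds".
std::vector<std::set<std::size_t>> computeReach(const std::vector<Block>& blocks)
{
    std::vector<std::set<std::size_t>> reach(blocks.size());
    for (std::size_t i = 0; i < blocks.size(); ++i)
    {
        reach[i].insert(i);
    }

    bool changed = true;
    while (changed)
    {
        changed = false;
        for (std::size_t i = 0; i < blocks.size(); ++i)
        {
            for (std::size_t p : blocks[i].preds)
            {
                for (std::size_t r : reach[p])
                {
                    if (reach[i].insert(r).second)
                    {
                        changed = true;
                    }
                }
            }
        }
    }
    return reach;
}

// A block is unreachable when no entry block appears in its reach set,
// i.e. the intersection of the entry set and the reach set is empty.
std::vector<bool> findUnreachable(const std::vector<Block>& blocks, const std::set<std::size_t>& enterBlocks)
{
    for (std::size_t e : enterBlocks)
    {
        assert(e < blocks.size());
    }

    std::vector<std::set<std::size_t>> reach = computeReach(blocks);
    std::vector<bool> dead(blocks.size(), true);
    for (std::size_t i = 0; i < blocks.size(); ++i)
    {
        for (std::size_t e : enterBlocks)
        {
            if (reach[i].count(e) != 0)
            {
                dead[i] = false;
            }
        }
    }
    return dead;
}
```

With blocks 0 -> 1 -> 2 plus a block 3 that also jumps to 2, and block 0 as the only entry, block 3 is the one reported as unreachable.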

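For the dominator (DOM) tree, see the introduction to Flowgraph Analysis in the previous article.
In one sentence: if every path into Block B must pass through Block A, then A is a dominator of B. The nearest (immediate) dominator is stored in BasicBlock::bbIDom.
CoreCLR computes the dominator tree with the same algorithm as the referenced paper.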

PHASE_ALLOCATE_OBJECTS

This phase converts GT_ALLOCOBJ nodes into GT_CALL nodes. It consists of the following code:

// Transform each GT_ALLOCOBJ node into either an allocation helper call or
// local variable allocation on the stack.
ObjectAllocator objectAllocator(this);
objectAllocator.Run();

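As mentioned in the earlier article that analyzed new,
ObjectAllocator::Run converts each allocobj node into a concrete JIT helper call.
Once converted, the call behaves like an ordinary function call: it takes a pointer to the MethodTable as its argument and returns the newly created object (the constructor has not run yet, and all fields are zero).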

PHASE_OPTIMIZE_LOOPS

This phase identifies and marks the loops in the function. It consists of the following code:

optOptimizeLoops();
EndPhase(PHASE_OPTIMIZE_LOOPS);

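optOptimizeLoops works as follows:

First it calls optSetBlockWeights, which uses the dominator tree to halve the weight (bbWeight /= 2) of blocks that cannot reach a return block.
Then it calls optFindNaturalLoops, which uses the dominator tree to identify loops and saves the loop information in optLoopTable.

A loop consists of the following parts (taken from the comment in optFindNaturalLoops):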

/* We will use the following terminology:
 * HEAD    - the basic block that flows into the loop ENTRY block (Currently MUST be lexically before entry).
             Not part of the looping of the loop.
 * FIRST   - the lexically first basic block (in bbNext order) within this loop.  (May be part of a nested loop,
 *           but not the outer loop. ???)
 * TOP     - the target of the backward edge from BOTTOM. In most cases FIRST and TOP are the same.
 * BOTTOM  - the lexically last block in the loop (i.e. the block from which we jump to the top)
 * EXIT    - the loop exit or the block right after the bottom
 * ENTRY   - the entry in the loop (not necessarly the TOP), but there must be only one entry
 *
 * We (currently) require the body of a loop to be a contiguous (in bbNext order) sequence of basic blocks.
        |
        v
      head
        |
        |    top/beg <--+
        |       |       |
        |      ...      |
        |       |       |
        |       v       |
        +---> entry     |
                |       |
               ...      |
                |       |
                v       |
         +-- exit/tail  |
         |      |       |
         |     ...      |
         |      |       |
         |      v       |
         |    bottom ---+
         |
         +------+
                |
                v
 */

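Finally it enumerates the blocks in each loop (top through bottom) and calls optMarkLoopBlocks.
optMarkLoopBlocks increases the weight (bbWeight) of the blocks in the loop.
If a block is a dominator of the backedge block (a block whose preds include a later block, for example the first block of the loop), its weight is multiplied by BB_LOOP_WEIGHT (8); otherwise it is multiplied by BB_LOOP_WEIGHT/2 (4).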

PHASE_CLONE_LOOPS

This phase performs the loop cloning optimization. It consists of the following code:

// Clone loops with optimization opportunities, and
// choose the one based on dynamic condition evaluation.
optCloneLoops();
EndPhase(PHASE_CLONE_LOOPS);

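The previous phase, PHASE_OPTIMIZE_LOOPS, collected the loop information for the function.
optCloneLoops decides which loops are eligible for loop cloning and performs the optimization.
Loop cloning does the following (taken from the comment in optCloneLoop):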

// We're going to make

// Head --> Entry
// First
// Top
// Entry
// Bottom  ?-> Top
// X
//
//   become
//
// Head ?-> Entry2
// Head2--> Entry    (Optional; if Entry == Top == First, let Head fall through to F/T/E)
// First
// Top
// Entry
// Bottom  ?-> Top
// X2--> X
// First2
// Top2
// Entry2
// Bottom2 ?-> Top2
// X

A more concrete example:

for (var x = 0; x < a.Length; ++x) {
    b[x] = a[x];
}

Before optCloneLoop (https://github.com/dotnet/coreclr/blob/v1.1.0/src/jit/optimizer.cpp#L4420):

if (x < a.Length) {
    do {
        var tmp = a[x];
        b[x] = tmp;
        x = x + 1;
    } while (x < a.Length);
}

After optCloneLoop (https://github.com/dotnet/coreclr/blob/v1.1.0/src/jit/optimizer.cpp#L4420):

if (x < a.Length) {
    if ((a != null && b != null) && (a.Length <= b.Length)) {
        do {
            var tmp = a[x]; // no bounds check
            b[x] = tmp; // no bounds check
            x = x + 1;
        } while (x < a.Length);
    } else {
        do {
            var tmp = a[x];
            b[x] = tmp;
            x = x + 1;
        } while (x < a.Length);
    }
}

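The goal of this optimization is to omit the bounds checks inside the loop whenever a check at run time confirms that the accesses cannot go out of bounds.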

PHASE_UNROLL_LOOPS

This phase performs the loop unrolling optimization. It consists of the following code:

/* Unroll loops */
optUnrollLoops();
EndPhase(PHASE_UNROLL_LOOPS);

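optUnrollLoops tries to unroll loops. A loop is unrolled only if all of the following hold:

The iteration count can be determined at compile time
The current compilation mode is not debug, and small-code optimization is not requested
The code size of the loop does not exceed UNROLL_LIMIT_SZ (see the code for the value)
The iteration count does not exceed ITER_LIMIT (see the code for the value)

When these conditions hold, the loop body is copied once per iteration. For example, for (var x = 0; x < 3; ++x) { abc(); } is optimized into abc(); abc(); abc();.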

PHASE_MARK_LOCAL_VARS

This phase updates the information in the local variable table lvaTable. It consists of the following code:

/* Create the variable table (and compute variable ref counts) */

lvaMarkLocalVars();
EndPhase(PHASE_MARK_LOCAL_VARS);

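lvaMarkLocalVars works as follows:

Call lvaAllocOutgoingArgSpace
- Adds the local variable lvaOutgoingArgSpaceVar
- On x86, stack arguments are passed with push; on other platforms the values can be copied directly into this variable to pass stack arguments
- See the description of FEATURE_FIXED_OUT_ARGS in the referenced document
If the platform is x86 (ShadowSP slots are needed)
- Adds the local variable lvaShadowSPslotsVar
- Because x86 does not generate funclets, the exception handling mechanism needs an extra variable
- See the description of ShadowSP slots in the referenced document
If the platform is not x86 (funclets are used)
- Adds the local variable lvaPSPSym
- PSPSym stands for Previous Stack Pointer Symbol; it is a pointer-sized value that stores the stack address of the previous function
- When an EH funclet is called, rsp is restored to the rsp value of the main function so that the funclet can access the original local variables
- See the description of PSPSym in the referenced document
- See also the assembly code in the funclet example above
If localloc (stackalloc) is used
- Adds the local variable lvaLocAllocSPvar
- It stores the modified rsp address (genLclHeap)
If compiling in debug mode, assign a slot number to each local variable
- varDsc->lvSlotNum = lclNum (incrementing from 0)
Enumerate the BasicBlocks and call lvaMarkLocalVars
- Enumerates the trees in the block and updates the reference counts of the local variables
If a local variable stores a reference argument that arrived in a register, add two to its reference count
If lvaKeepAliveAndReportThis holds (for example, a synchronized method needs to unlock this)
- and nothing else in the function uses this, set the reference count of this to 1
If lvaReportParamTypeArg holds
- and nothing else in the function uses this variable, set its reference count to 1
- paramTypeArg (the Generic Context) is used to pass in a MethodDesc at the call
- For example, new A<string>().Generic<int>(123) passes in the MethodDesc that corresponds to Generic<int>
Call lvaSortByRefCount
- Decides for each local variable whether it can be tracked (lvTracked) and whether it can be kept in a register (lvDoNotEnregister)
- Sorts the local variables in descending order, by lvRefCnt when generating small code and by lvRefCntWtd otherwise
- Generates a new lvaCurEpoch after sorting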
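
The sorting rule in the last step can be sketched as below. This is a simplified illustration, not the JIT's code: Local and sortByRefCount are hypothetical names, and the real lvaSortByRefCount applies further tie-breaking rules and tracking decisions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, reduced view of a local variable descriptor.
struct Local
{
    unsigned lclNum;    // index of the local
    unsigned refCnt;    // unweighted reference count
    unsigned refCntWtd; // reference count weighted by block weight
};

// Sort locals from most to least referenced. When optimizing for small code
// the unweighted count is used, otherwise the weighted count is used.
std::vector<Local> sortByRefCount(std::vector<Local> locals, bool smallCode)
{
    std::stable_sort(locals.begin(), locals.end(), [smallCode](const Local& a, const Local& b) {
        return smallCode ? a.refCnt > b.refCnt : a.refCntWtd > b.refCntWtd;
    });
    return locals;
}
```

A local that is referenced only once but inside a hot loop can have a large weighted count, so it sorts first in the normal case and last when compiling for size.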

PHASE_OPTIMIZE_BOOLS

This phase merges two adjacent BasicBlocks that both end in a conditional jump. It consists of the following code:

/* Optimize boolean conditions */

optOptimizeBools();
EndPhase(PHASE_OPTIMIZE_BOOLS);

// optOptimizeBools() might have changed the number of blocks; the dominators/reachability might be bad.

optOptimizeBools performs the following optimizations:

If the blocks have the following structure and B2 contains only a single instruction:

B1: brtrue(t1, BX)
B2: brtrue(t2, BX)
B3

they are transformed into the following structure:

B1: brtrue(t1|t2, BX)
B3:

If the blocks have the following structure and B2 contains only a single instruction:

B1: brtrue(t1, B3)
B2: brtrue(t2, BX)
B3:
...
BX:

they are transformed into the following structure:

B1: brtrue((!t1)&&t2, BX)
B3:
...
BX:

PHASE_FIND_OPER_ORDER

This phase determines the evaluation order of each node (GenTree) and sets its execution and size costs. It consists of the following code:

/* Figure out the order in which operators are to be evaluated */
fgFindOperOrder();
EndPhase(PHASE_FIND_OPER_ORDER);

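fgFindOperOrder calls gtSetStmtInfo on every statement in each BasicBlock.
gtSetStmtInfo recursively calls gtSetEvalOrder on the GenTree.
gtSetEvalOrder sets the execution cost (gtCostEx) and size cost (gtCostSz) of each GenTree node,
and if the second operand of a commutative binary operator costs more than the first, it marks the operation so that the second operand is evaluated first.
The execution cost (gtCostEx) and size cost (gtCostSz) are used later to decide whether a CSE optimization is worthwhile.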
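
A minimal sketch of this cost propagation, following the description above rather than the actual gtSetEvalOrder implementation. The Node type, the cost numbers, and the function names are hypothetical, and only a single cost is tracked.

```cpp
#include <cassert>

// Hypothetical expression node: 'L' marks a leaf, otherwise op is the operator.
struct Node
{
    char  op;         // '+', '*', '-', or 'L' for a leaf
    int   ownCost;    // assumed cost of this node alone
    Node* op1;        // first operand (may be nullptr)
    Node* op2;        // second operand (may be nullptr)
    int   costEx;     // computed: own cost plus the cost of both operands
    bool  reverseOps; // computed: evaluate op2 before op1
};

bool isCommutative(char op)
{
    return op == '+' || op == '*';
}

// Recursively compute the cost of a tree and flag commutative binary
// operators whose second operand is more expensive than the first.
int setEvalOrder(Node* node)
{
    if (node == nullptr)
    {
        return 0;
    }
    int cost1 = setEvalOrder(node->op1);
    int cost2 = setEvalOrder(node->op2);
    node->costEx     = node->ownCost + cost1 + cost2;
    node->reverseOps = node->op1 != nullptr && node->op2 != nullptr && isCommutative(node->op) && cost2 > cost1;
    return node->costEx;
}
```

For a + (b * c) the multiplication subtree is the more expensive operand of the addition, so the addition is flagged to evaluate it first.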

PHASE_SET_BLOCK_ORDER

This phase links the nodes (GenTree) into a list in evaluation order (the LIR format). It consists of the following code:

// Weave the tree lists. Anyone who modifies the tree shapes after
// this point is responsible for calling fgSetStmtSeq() to keep the
// nodes properly linked.
// This can create GC poll calls, and create new BasicBlocks (without updating dominators/reachability).
fgSetBlockOrder();
EndPhase(PHASE_SET_BLOCK_ORDER);

// IMPORTANT, after this point, every place where tree topology changes must redo evaluation
// order (gtSetStmtInfo) and relink nodes (fgSetStmtSeq) if required.
CLANG_FORMAT_COMMENT_ANCHOR;

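fgSetBlockOrder does the following:

Decide whether interruptible code must be generated (for example, when the function contains loops); if so, set genInterruptible = true
Call fgCreateGCPolls
- Enumerates the BasicBlocks; if a block is marked BBF_NEEDS_GCPOLL, inserts code that calls CORINFO_HELP_POLL_GC (JIT_PollGC)
- JIT_PollGC suspends the current thread while a GC is running
Enumerate the BasicBlocks and call fgSetBlockOrder
- Enumerates the statements in the block and calls fgSetStmtSeq
  - Recursively calls fgSetTreeSeqHelper on the nodes (GenTree) in the statement
    - For example, a + b calls fgSetTreeSeqFinish on the three nodes a, b, and + in turn
    - Each call to fgSetTreeSeqFinish increments fgTreeSeqNum and appends the node to the list fgTreeSeqLst
    - When everything is done, the list fgTreeSeqLst holds all GenTree nodes; this is the structure of LIR, although several more phases run before LIR is formally used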
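
The threading of a tree into an execution-order list can be sketched as a post-order walk. This is an illustration only: Node, Sequencer, and linearOrder are hypothetical names, and the real fgSetTreeSeqHelper handles many more node shapes.

```cpp
#include <cassert>
#include <string>

// Hypothetical tree node with the list links that sequencing fills in.
struct Node
{
    std::string name;
    Node*       op1;
    Node*       op2;
    bool        reverseOps; // evaluate op2 before op1
    Node*       next;       // next node in execution order
    Node*       prev;       // previous node in execution order
    unsigned    seqNum;     // 1-based position in execution order
};

struct Sequencer
{
    Node*    first;
    Node*    last;
    unsigned count;

    Sequencer() : first(nullptr), last(nullptr), count(0) {}

    // Append a node to the list once all of its operands have been appended.
    void finish(Node* node)
    {
        node->seqNum = ++count;
        node->prev   = last;
        node->next   = nullptr;
        if (last != nullptr)
        {
            last->next = node;
        }
        else
        {
            first = node;
        }
        last = node;
    }

    // Post-order walk: operands first (honoring reverseOps), then the node.
    void visit(Node* node)
    {
        if (node == nullptr)
        {
            return;
        }
        visit(node->reverseOps ? node->op2 : node->op1);
        visit(node->reverseOps ? node->op1 : node->op2);
        finish(node);
    }
};

// Returns the node names in execution order, separated by spaces.
std::string linearOrder(Node* root)
{
    Sequencer seq;
    seq.visit(root);
    std::string out;
    for (Node* n = seq.first; n != nullptr; n = n->next)
    {
        if (!out.empty())
        {
            out += ' ';
        }
        out += n->name;
    }
    return out;
}
```

For a + b the resulting order is a, b, +; if the node is marked to evaluate its second operand first, the order becomes b, a, +.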

PHASE_BUILD_SSA

This phase marks the GenTree nodes that access local variables with SSA versions. It consists of the following code:

if (doSsa)
{
    fgSsaBuild();
    EndPhase(PHASE_BUILD_SSA);
}

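fgSsaBuild assigns SSA versions to nodes that access local variables (for example lclvar).
An access can be a USE (the variable is read), a DEF (the variable is written), or a USEASG (the variable is read and then written, for example +=).
Each write to a variable increments its SSA version, and each reading node records which version of the value it reads. The SSA version is stored in the node's GenTreeLclVarCommon::_gtSsaNum member.
If the value being read may come from different blocks and can only be determined at run time, a phi node is added at the start of the block.
The previous article includes an example of SSA marking.

The details of the fgSsaBuild algorithm are fairly involved; see my JIT notes or the source code.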

PHASE_EARLY_PROP

This phase traces local variables through SSA and performs some simple optimizations. It consists of the following code:

if (doEarlyProp)
{
    /* Propagate array length and rewrite getType() method call */
    optEarlyProp();
    EndPhase(PHASE_EARLY_PROP);
}

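optEarlyProp works as follows:

Enumerate the BasicBlocks and the statements in each BasicBlock
- Enumerates the trees in each statement in execution order and calls optEarlyPropRewriteTree
  - For a GT_ARR_LENGTH node (which gets the length of an array), traces the origin of the array through SSA; if it traces back to new array[constant], replaces the node with the constant
  - For a node that uses GT_INDIR to get the MethodTable (vtable), traces the origin of the object through SSA and, as above, replaces the node with a constant if the origin is found
  - For a node that reads an object member and requires a null check, removes the null check if the offset of the member does not exceed a certain value (because a page fault is guaranteed to occur); an earlier article covered this mechanism
  - If a node was modified, calls gtSetStmtInfo to recompute the execution and size costs
  - If a node was modified, calls fgSetStmtSeq to update the GenTree list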

PHASE_VALUE_NUMBER

This phase assigns a VN (Value Number) to each GenTree. It consists of the following code:

if (doValueNum)
{
    fgValueNumber();
    EndPhase(PHASE_VALUE_NUMBER);
}

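SSA, described above, assigns a unique version number to nodes (GenTree) that access local variables; equal version numbers mean equal values.
VN assigns a unique identifier to every node (GenTree); equal identifiers mean equal values.

fgValueNumber calls fgValueNumberBlock and fgValueNumberTree to mark each node with its VN.
There are two kinds of VN: Liberal assumes that other threads modify heap contents only at synchronization points, while Conservative assumes that other threads may modify heap contents between any two accesses.
VNs are allocated from ValueNumStore, which contains the following collections of VNs: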

m_intCnsMap: VNs of int constants
m_longCnsMap: VNs of long constants
m_handleMap: VNs of field or class handles
m_floatCnsMap: VNs of float constants
m_doubleCnsMap: VNs of double constants
m_byrefCnsMap: VNs of byref constants
m_VNFunc0Map: VNs of operators with 0 operands
m_VNFunc1Map: VNs of operators with 1 operand (unary), for example -x
m_VNFunc2Map: VNs of operators with 2 operands (binary), for example a + b
m_VNFunc3Map: VNs of operators with 3 operands

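For example, given a = 1; b = GetNum(); c = a + b; d = a + b;,
the VN of a is the constant 1, stored in m_intCnsMap;
the value of b cannot be determined, so VNForExpr is called to allocate a new VN for it;
the VN of c is the combination a+b, stored in m_VNFunc2Map;
the VN of d is also the combination a+b, and since it has already been generated, the existing VN is fetched from m_VNFunc2Map.
At this point we know that c and d have the same value.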
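
The example above can be reproduced with a toy value number store. This is a sketch under heavy simplification: TinyVNStore and its methods are hypothetical and only mirror the roles of the int constant map, the binary function map, and the "fresh opaque value" path mentioned above.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Toy value number store: equal value numbers imply equal values.
class TinyVNStore
{
public:
    TinyVNStore() : m_next(1) {}

    // Constants with the same value share one value number.
    unsigned vnForIntCon(int value)
    {
        std::map<int, unsigned>::iterator it = m_intCnsMap.find(value);
        if (it != m_intCnsMap.end())
        {
            return it->second;
        }
        return m_intCnsMap[value] = m_next++;
    }

    // An unknown value (for example a call result) always gets a fresh number.
    unsigned vnForExpr()
    {
        return m_next++;
    }

    // A binary operator applied to the same operand numbers gets the same number.
    unsigned vnForFunc(char op, unsigned arg0, unsigned arg1)
    {
        std::tuple<char, unsigned, unsigned> key(op, arg0, arg1);
        std::map<std::tuple<char, unsigned, unsigned>, unsigned>::iterator it = m_func2Map.find(key);
        if (it != m_func2Map.end())
        {
            return it->second;
        }
        return m_func2Map[key] = m_next++;
    }

private:
    unsigned                                                  m_next;
    std::map<int, unsigned>                                   m_intCnsMap;
    std::map<std::tuple<char, unsigned, unsigned>, unsigned> m_func2Map;
};
```

Numbering a = 1, b = GetNum(), c = a + b, d = a + b with this store gives c and d the same number, which is exactly the fact later phases such as CSE rely on.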

For the details of how VNs are generated, see my JIT notes or the source code.

PHASE_HOIST_LOOP_CODE

This phase hoists loop-invariant expressions out of loops. It consists of the following code:

if (doLoopHoisting)
{
    /* Hoist invariant code out of loops */
    optHoistLoopCode();
    EndPhase(PHASE_HOIST_LOOP_CODE);
}

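optHoistLoopCode enumerates the expressions in each loop,
gets the VN of each expression, and calls optVNIsLoopInvariant to decide whether the value of the expression is independent of the loop.
If it is loop-invariant, has no side effects, and its node is expensive enough (gtCostEx), the expression is hoisted out of the loop.

For example, the code before optimization: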

var a = SomeFunction();
for (var x = 0; x < 3; ++x) {
    Console.WriteLine(a * 3);
}

After optimization, a * 3 can be hoisted out of the loop:

var a = SomeFunction();
var tmp = a * 3;
for (var x = 0; x < 3; ++x) {
    Console.WriteLine(tmp);
}

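An expression is considered loop-invariant based on the following criteria:

If the VN is a phi, the sources of the phi must be outside the loop (for example, x * 3 in the code above has a source inside the loop)
If the expression accesses a variable on the heap (a class member), it cannot be judged loop-invariant
The definitions of the SSA versions of the local variables accessed by the expression must be outside the loop (for example, a above is defined outside the loop)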

PHASE_VN_COPY_PROP

This phase replaces local variables that have the same VN. It consists of the following code:

if (doCopyProp)
{
    /* Perform VN based copy propagation */
    optVnCopyProp();
    EndPhase(PHASE_VN_COPY_PROP);
}

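optVnCopyProp enumerates all nodes that read (USE) a local variable
and calls optCopyProp to look for another live variable with the same VN; if one exists, the read is redirected to that variable.

For example, the code before optimization: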

var a = GetNum();
var b = a;
var c = b + 123;

After optimization, b can be replaced with a:

var a = GetNum();
var b = a;
var c = a + 123;

If the reference count of b later drops to 0, the variable b can safely be removed.
This optimization reduces redundant variable copies.

PHASE_OPTIMIZE_VALNUM_CSES

This phase replaces expressions that have the same VN, commonly known as the CSE optimization. It consists of the following code:

#if FEATURE_ANYCSE
/* Remove common sub-expressions */
optOptimizeCSEs();
#endif // FEATURE_ANYCSE

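optOptimizeCSEs enumerates all nodes
and calls optIsCSEcandidate to decide whether a node should be considered for CSE; the criteria include the cost of the expression (gtCostSz for small code, otherwise gtCostEx).
If the check passes, optValnumCSE_Index is called. For nodes that share the same VN:
the first occurrence only adds the node to the optCSEhash index;
the second occurrence finds the node already in optCSEhash, so the entry in the index is given a new csdIndex (an auto-incremented value) and the gtCSEnum of the node is set to that csdIndex;
from the third occurrence on, the entry is already in optCSEhash and already has a csdIndex, so the gtCSEnum of every later node points to the same csdIndex.
Afterwards, if any entry in optCSEhash has a csdIndex, the functions that actually perform the CSE optimization are called.

For example, the code before optimization: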

var a = SomeFunction();
var b = (a + 5) * a;
var c = (a + 5) + a;

After optimization, a + 5 can be extracted:

var a = SomeFunction();
var tmp = a + 5;
var b = tmp * a;
var c = tmp + a;

This optimization reduces repeated computation, but it increases the number of local variables.

PHASE_ASSERTION_PROP_MAIN

This phase propagates assertions again, this time based on SSA and VN. It consists of the following code:

if (doAssertionProp)
{
    /* Assertion propagation */
    optAssertionPropMain();
    EndPhase(PHASE_ASSERTION_PROP_MAIN);
}

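optAssertionPropMain includes the following steps:

Traverse the nodes and call optVNAssertionPropCurStmtVisitor
- Calls optVnNonNullPropCurStmt
  - For call nodes, if the VN proves that this is not null, marks the null check as omittable
  - For indir (deref) nodes, if the VN proves that the variable is not null, marks the null check as omittable
- Calls optVNConstantPropCurStmt
  - If the VN of a node is a constant, replaces the node with that constant
Call optAssertionGen again to create assertions from the current state
Call optComputeAssertionGen to create assertions from jump conditions
- For example, for if (a > 3) { /* block a */ } else { /* block b */ }, it can assert a > 3 in block a and a <= 3 in block b
Call optAssertionProp again to optimize nodes using the propagated assertions
- optAssertionProp_LclVar
  - If a local variable is known to equal a constant, replaces it with that constant
  - If a local variable is known to equal another local variable, replaces it with that other local variable
- optAssertionProp_Ind
  - If the node to the left of the indir (deref) is a lclVar and is known to be non-null, marks the null check as omittable
- optAssertionProp_BndsChk
  - If the array index is a constant and is known to be in range, marks the bounds check as unnecessary
- optAssertionProp_Comma
  - If the bounds check was marked unnecessary above, removes it: (comma bound_check, expr) => (expr)
- optAssertionProp_Cast
  - For a cast from a narrower type to a wider type, marks that it cannot overflow
  - For a cast from a wider type to a narrower type that is known not to overflow, removes the cast
- optAssertionProp_Call
  - If this is known to be non-null, marks the null check as omittable
- optAssertionProp_RelOp
  - Replaces equality and inequality expressions; for example, in x == const, once the value of x is known the expression can be replaced with true or false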

PHASE_OPTIMIZE_INDEX_CHECKS

This phase removes redundant array bounds checks based on VN and assertions. It consists of the following code:

if (doRangeAnalysis)
{
    /* Optimize array index range checks */
    RangeCheck rc(this);
    rc.OptimizeRangeChecks();
    EndPhase(PHASE_OPTIMIZE_INDEX_CHECKS);
}

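OptimizeRangeChecks enumerates the bounds-checking nodes (a COMMA whose left operand is ARR_BOUNDS_CHECK) and calls OptimizeRangeCheck.
If the VN proves that the accessed index is smaller than the array length, the bounds check can be removed (only the side effects are kept on the left side of the COMMA).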

PHASE_UPDATE_FLOW_GRAPH

If the optimizations modified the flow graph, this phase calls fgUpdateFlowGraph again to remove empty blocks, unreachable blocks, and redundant jumps:

/* update the flowgraph if we modified it during the optimization phase*/
if (fgModified)
{
    fgUpdateFlowGraph();
    EndPhase(PHASE_UPDATE_FLOW_GRAPH);
    
    ...
}

PHASE_COMPUTE_EDGE_WEIGHTS2

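As the name suggests, this phase does the same work as the earlier PHASE_COMPUTE_EDGE_WEIGHTS phase.
If the optimizations modified the flow graph, this phase calls fgComputeEdgeWeights again to compute the weights of blocks and block edges: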

/* update the flowgraph if we modified it during the optimization phase*/
if (fgModified)
{
    ...
    
    // Recompute the edge weight if we have modified the flow graph
    fgComputeEdgeWeights();
    EndPhase(PHASE_COMPUTE_EDGE_WEIGHTS2);
}

PHASE_DETERMINE_FIRST_COLD_BLOCK

This phase marks the first cold BasicBlock. It consists of the following code:

fgDetermineFirstColdBlock();
EndPhase(PHASE_DETERMINE_FIRST_COLD_BLOCK);

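Because fgReorderBlocks has already moved the blocks with lower weights to the end of the list,
fgDetermineFirstColdBlock looks for the trailing run of consecutive blocks marked BBF_RUN_RARELY at the end of the BasicBlock list,
stores the first of them in fgFirstColdBlock, and marks all of them as BBF_COLD. If no such block is found, fgFirstColdBlock is null.

CodeGen uses fgFirstColdBlock to split the code into two parts; the hot part and the cold part are written to different locations.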
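
The search for the trailing run of rarely run blocks can be sketched as follows. This is a simplification: Block and determineFirstColdBlock are hypothetical names, a vector index replaces the bbNext list, and the extra restrictions applied by the real fgDetermineFirstColdBlock are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced view of a basic block.
struct Block
{
    bool runRarely; // stands in for BBF_RUN_RARELY
    bool cold;      // stands in for BBF_COLD
};

// Finds the start of the trailing run of rarely run blocks, marks that run
// as cold, and returns its first index, or -1 if the last block is not rare.
int determineFirstColdBlock(std::vector<Block>& blocks)
{
    int firstCold = -1;
    for (int i = static_cast<int>(blocks.size()) - 1; i >= 0 && blocks[static_cast<std::size_t>(i)].runRarely; --i)
    {
        firstCold = i;
    }
    if (firstCold >= 0)
    {
        for (std::size_t i = static_cast<std::size_t>(firstCold); i < blocks.size(); ++i)
        {
            blocks[i].cold = true;
        }
    }
    return firstCold;
}
```

A rarely run block in the middle of the list is not marked cold, because only the contiguous run at the end can be split off into the cold section.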

PHASE_RATIONALIZE

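This is the first phase of the JIT backend. It resolves the nodes in LIR that require context to interpret and formally switches to LIR. It consists of the following code: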

#ifndef LEGACY_BACKEND
// rationalize trees
Rationalizer rat(this); // PHASE_RATIONALIZE
rat.Run();
#endif // !LEGACY_BACKEND

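Rationalizer::Run includes the following steps:

Enumerate the statements (stmt) in each BasicBlock
- If the current platform does not support the operation of a GT_INTRINSIC node (for example abs, round, sqrt), replaces it with a helper call
- Sets the next node of the last node of the previous statement to the first node of the next statement
- Sets the previous node of the first node of the next statement to the last node of the previous statement
Mark the first and last node of each BasicBlock
Mark the BasicBlock as being in LIR format (BBF_IS_LIR)
Enumerate the statements (stmt) in each BasicBlock
- Converts the statement node (GT_STMT) into an IL offset node (GT_IL_OFFSET), which marks which nodes belong to which IL statement
- Calls Rationalizer::RewriteNode on the nodes in the statement
  - Converts GT_LCL_VAR, GT_LCL_FLD, GT_REG_VAR, and GT_PHI_ARG nodes that modify a variable into GT_STORE_LCL_VAR or GT_STORE_LCL_FLD
  - Converts GT_IND nodes that modify the value at an address into GT_STOREIND
  - Converts GT_CLS_VAR nodes that modify a class field into GT_CLS_VAR_ADDR+GT_STOREIND
  - Converts GT_BLK, GT_OBJ, and GT_DYN_BLK nodes that modify a block value into GT_STORE_BLK, GT_STORE_OBJ, and GT_STORE_DYN_BLK
  - Removes GT_BOX nodes (they have already been converted into calls)
  - For GT_ADDR nodes
    - If the target is a local variable, changes the node to GT_LCL_VAR_ADDR or GT_LCL_FLD_ADDR
    - If the target is a class field, changes the node to GT_CLS_VAR_ADDR
    - If the target is an indir, removes both the indir and the addr (&*someVar => someVar)
  - For GT_NOP nodes, if the node has an operand, replaces the node with its operand and removes the node
  - For GT_COMMA nodes
    - If the first operand has no side effects, removes all nodes of the first operand
    - If the second operand has no side effects and its value is unused, removes all nodes of the second operand
    - Removes the GT_COMMA node (the first and second operands are already linked in order)
  - Removes GT_ARGPLACE nodes (GT_PUTARG_REG or GT_PUTARG_STK nodes are added later)
  - Converts GT_CLS_VAR nodes that read a class field into GT_CLS_VAR_ADDR+GT_IND
  - Ensures that the current CPU supports the operation of each GT_INTRINSIC node (for example abs, round, sqrt)
Set Compiler::compRationalIRForm = true to formally start using LIR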

PHASE_SIMPLE_LOWERING

This phase does some simple lowering work (bringing LIR closer to machine code). It consists of the following code:

// Here we do "simple lowering".  When the RyuJIT backend works for all
// platforms, this will be part of the more general lowering phase.  For now, though, we do a separate
// pass of "final lowering."  We must do this before (final) liveness analysis, because this creates
// range check throw blocks, in which the liveness must be correct.
fgSimpleLowering();
EndPhase(PHASE_SIMPLE_LOWERING);

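fgSimpleLowering includes the following steps:

Enumerate the nodes in LIR order
- If the node is GT_ARR_LENGTH, converts it to GT_IND(arr + ArrLenOffset)
  - For example, on x64 bytes 0~8 of an array object hold the pointer to the MethodTable and bytes 8~12 hold the array length, so the node becomes indir(lclVar +(ref) const 8)
- If the node is GT_ARR_BOUNDS_CHECK
  - Ensures that the BasicBlock that throws IndexOutOfRangeException exists, adding it if it does not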

PHASE_LCLVARLIVENESS

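This phase is enabled only when the old JIT backend (JIT32) is used, so a typical CoreCLR build does not run it.
It computes the sets of variables that are live on entry to and on exit from each BasicBlock. It consists of the following code: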

#ifdef LEGACY_BACKEND
/* Local variable liveness */
fgLocalVarLiveness();
EndPhase(PHASE_LCLVARLIVENESS);
#endif // !LEGACY_BACKEND

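fgLocalVarLiveness sets the following members of BasicBlock:

bbVarUse: the set of local variables that are used
bbVarDef: the set of local variables that are modified
bbVarTmp: temporary variables
bbLiveIn: the set of variables live on entry to the block
bbLiveOut: the set of variables live on exit from the block
bbHeapUse: whether the global heap is used
bbHeapDef: whether the global heap is modified
bbHeapLiveIn: whether the global heap is live on entry to the block
bbHeapLiveOut: whether the global heap is live on exit from the block
bbHeapHavoc: whether the block puts the global heap into an unknown state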

PHASE_LOWERING

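This phase does the main lowering work (bringing LIR closer to machine code) and determines how many registers each node needs. It consists of the following code: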

///////////////////////////////////////////////////////////////////////////////
// Dominator and reachability sets are no longer valid. They haven't been
// maintained up to here, and shouldn't be used (unless recomputed).
///////////////////////////////////////////////////////////////////////////////
fgDomsComputed = false;

/* Create LSRA before Lowering, this way Lowering can initialize the TreeNode Map */
m_pLinearScan = getLinearScanAllocator(this);

/* Lower */
Lowering lower(this, m_pLinearScan); // PHASE_LOWERING
lower.Run();

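Lowering::Run includes the following steps:

Enumerate the nodes in LIR order
- On x86 (32-bit), decomposes each long node into two int nodes (loResult => hiResult => long)
- GT_IND: checks whether it can be replaced with an LEA node (which can use the LEA instruction of the CPU)
  - For example, *(((v07 << 2) + v01) + 16) can be replaced with *(lea(v01 + v07*4 + 16))
- GT_STOREIND: checks whether it can be replaced with an LEA node, as above
- GT_ADD: checks whether it can be replaced with an LEA node, as above
- GT_UDIV: checks whether it can be replaced with an RSZ node
  - For example, 16/2 can be replaced with 16>>1
- GT_UMOD: checks whether it can be replaced with an AND node
  - For example, 17%2 can be replaced with 17&(2-1)
- GT_DIV, GT_MOD:
  - If the divisor is int.MinValue or long.MinValue, converts to EQ (only the value divided by itself yields 1)
  - If the divisor is a power of 2
    - Converts DIV to RSH, for example 16/-2 becomes -(16>>1)
    - Converts MOD, for example 31%8 becomes 31-8*(31/8), then 31-((31>>3)<<3), then 31-(31& ~(8-1))
- GT_SWITCH
  - Replaces the node under the switch with a local variable
    - For example, switch v01 - 100 becomes tmp = v01 - 100; switch tmp
  - Adds a node that checks for and jumps to the default case
    - For example, if (tmp > jumpTableLength - 2) { goto jumpTable[jumpTableLength - 1]; }
  - Creates a new BasicBlock and moves the original BBJ_SWITCH into it
    - The structure after the move:
      - Original block (BBJ_COND, jumps to the default case when the condition holds)
      - New block (contains the moved switch)
      - Remaining blocks
  - If all remaining jump targets are the same block, the switch can be omitted and replaced with a direct jump
  - Otherwise, if the number of jump targets is less than minSwitchTabJumpCnt, converts the switch into several jtrue nodes (if ... else if ... else)
  - Otherwise converts the switch into a GT_SWITCH_TABLE node (an index table of offsets is generated later, and the jump goes through the index)
- GT_CALL
  - Adds GT_PUTARG_REG or GT_PUTARG_STK nodes for the arguments
  - If the call invokes a delegate, converts it into the concrete loads plus the call
    - For example, call originalThis becomes call indir(lea(originalThis+24)) with indir(lea(originalThis+8))
    - indir(lea(originalThis+24)) is the address of the function
    - indir(lea(originalThis+8)) is the real this, which replaces the original this expression
  - Otherwise, if it is GTF_CALL_VIRT_STUB, replaces it with call ind(address of the function address)
  - Otherwise, if it is GTF_CALL_VIRT_VTABLE, replaces it with call ind(address of the function in the vtable)
    - For example, ind(lea(ind(lea(ind(lea(this+0))+72))+32))
  - Otherwise, if it is GTF_CALL_NONVIRT
    - If it is a helper call, gets the concrete function address (for example the address of JIT_New)
    - If the function address is known, generates call addr
    - If the address of the function address is known, generates call ind(addr)
    - If the address of the address of the function address is known, generates call ind(ind(addr))
- GT_JMP, GT_RETURN
  - If an unmanaged function was called, inserts the PME (pinvoke method epilog) in front
- GT_CAST
  - Converts GT_CAST(small, float/double) to GT_CAST(GT_CAST(small, int), float/double)
  - Converts GT_CAST(float/double, small) to GT_CAST(GT_CAST(float/double, int), small)
- GT_ARR_ELEM: converts to nodes that compute the element address and apply IND (for example IND(LEA))
- GT_STORE_BLK, GT_STORE_OBJ, GT_STORE_DYN_BLK: checks whether the node that computes the address can be replaced with an LEA node, as above
Enumerate the nodes in LIR order
- Computes the number of registers each node needs
- Marks which nodes are contained (a contained node becomes part of the instruction of another node)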
See also the Lowering example in the previous article.

PHASE_LINEAR_SCAN

This phase assigns registers to the nodes using the LSRA algorithm. It consists of the following code:

assert(lvaSortAgain == false); // We should have re-run fgLocalVarLiveness() in lower.Run()
lvaTrackedFixed = true;        // We can not add any new tracked variables after this point.

/* Now that lowering is completed we can proceed to perform register allocation */
m_pLinearScan->doLinearScan();
EndPhase(PHASE_LINEAR_SCAN);

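The LSRA algorithm is described in the referenced paper, although the algorithm used in CoreCLR is not exactly the same as the one in the paper.
LSRA requires the following data to be built from LIR:

Interval

An Interval represents the period during which one variable (local L, internal T, other I) is in use, and it contains multiple RefPositions.
The Intervals of local variables are created up front; other (temporary) Intervals are introduced when a register is needed (for example for the return value of a call).
An Interval is either active or inactive; inactive means the variable is not in use at the current location (it does not occupy a register).

LocationInfo

LocationInfo represents a code location. During construction each GenTree in LIR is assigned a location, and the location always advances by 2.

RefPosition

A RefPosition has one of the following types:

Def: records a location where a variable is written; it has a corresponding Interval
Use: records a location where a variable is read; it has a corresponding Interval
Kill: records a location where a register value is overwritten, typically used at a call to mark the caller-saved registers as overwritten
BB: records the location of a BasicBlock
FixedReg: records that a fixed register is used at the current location
ExpUse: records a variable that is live on exit from the current block and also live on entry to a successor block (exposed use)
ParamDef: records a parameter variable passed in (defined) at the start of the function
DummyDef: records a parameter variable that is undefined at the start of the function
ZeroInit: records a variable that must be zero-initialized at the start of the function
KillGCRefs: records a location where the registers must be guaranteed to hold no GC references (pointers to objects or structs)

See also the figure illustrating LSRA in the previous article.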

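LinearScan::doLinearScan includes the following steps:

Call setFrameType to decide whether a Frame Pointer should be used
- Using a Frame Pointer means that rbp holds the value rsp had on entry to the function, so rbp must be removed from the register candidates of all nodes
Call initMaxSpill to initialize the array that records the spill depth
- The maxSpill array has two elements: one records the maximum spill depth for int, the other for float
Call buildIntervals to build the data structures LSRA needs
- Builds the Intervals, RefPositions, and LocationInfo
Call initVarRegMaps to set up the registers that variables occupy on entry to and exit from each block
- Enumerates the BasicBlocks
  - Sets inVarToRegMaps[blockIndex] = new regNumber[number of tracked variables]
  - Sets outVarToRegMaps[blockIndex] = new regNumber[number of tracked variables]
  - Enumerates the tracked variables
    - Sets inVarToRegMaps[blockIndex][regMapIndex] = REG_STK (passed on the stack by default)
    - Sets outVarToRegMap[blockIndex][regMapIndex] = REG_STK (passed on the stack by default)
- This is needed because the JIT must ensure that, if a variable lives in a register, the register it occupies on exit from a block matches the register it occupies on entry to the successor block
Call allocateRegisters to allocate registers
- This function contains the core of the LSRA algorithm; the flow below is simplified, see my JIT notes for the full version
- Builds a register index physRegs[number of registers], mapping register => (RefPosition that last used the register, whether it is in use)
- Enumerates the Intervals; if an Interval is an incoming function parameter, sets isActive = true
- Enumerates the RefPositions
  - If the RefPosition is a read (Use)
    - If no register is currently assigned, marks it as reload (reload the value from the stack into a register)
  - If the RefPosition requires a fixed register (for example Kill)
    - Makes the Interval that owns the register give it up and become inactive
  - If the RefPosition is the last read (Use)
    - Marks the Interval to give up its register and become inactive in the next round
  - If the RefPosition is a read (Use) or write (Def) and no register has been assigned
    - Calls tryAllocateFreeReg to allocate a register (the First Pass in the paper)
    - If that fails, calls allocateBusyReg to try again (the Second Pass in the paper)
      - If necessary, the Interval that owns the register is made to give it up (the value is spilled from the register to the stack, and the Interval becomes inactive)
    - On success the Interval becomes active
- (If the Interval of a variable never gives up its register (never spills), the variable can live in the register the whole time without touching the stack)
- (Conversely, if an Interval has given up its register (spilled) and is not a local variable, an extra internal temporary variable is needed)
Call resolveRegisters to resolve register differences between blocks
- The allocation above is linear and does not take the flowgraph into account; this function ensures that the register a variable occupies on exit from a block matches the register it occupies on entry to the successor block
- Sets the registers used by each node (GenTree) according to the allocation result
- Inserts a GT_RELOAD node where a value must be read back from the stack
- Sets inVarToRegMaps, the register index of the variables on entry to each block
- Sets outVarToRegMap, the register index of the variables on exit from each block
- Calls resolveEdges
  - If the successor of a block has multiple predecessors, for example (A => B, C => B), the difference must be resolved in A
    - If the register of the variable at the end of the block matches the register in the successor, no resolution is needed
    - If the register of the variable at the end of the block differs from the successor, but all successors use the same register
      - Inserts a GT_COPY node before the end of the block, copying the source register to the target register (or the source register to the stack and then to the target register)
    - If the register of the variable at the end of the block differs from the successor, and the successors do not all use the same register
      - Inserts a new block between the block and its successor, containing a GT_COPY node that copies to the target register
  - If a block has only one predecessor, for example (A => B), the difference can be resolved in B
    - Inserts GT_COPY nodes at the start of the block for the mismatched registers
- For local variables that were never spilled, marks that they need not live on the stack (lvRegister = true, lvOnFrame = false)
- For spills of non-local variables, calls tmpPreAllocateTemps with maxSpill[int] and maxSpill[float] to create the required number of internal temporary variables

After this phase, every node in LIR that needs a register has a definite register, and every node that reads or writes a local variable knows whether its target is the stack or a particular register.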

PHASE_RA_ASSIGN_VARS

Because the old JIT backend does not support LSRA, this phase allocates registers for the old JIT backend (JIT32). It consists of the following code:

lvaTrackedFixed = true; // We cannot add any new tracked variables after this point.
// For the classic JIT32 at this point lvaSortAgain can be set and raAssignVars() will call lvaSortOnly()

// Now do "classic" register allocation.
raAssignVars();
EndPhase(PHASE_RA_ASSIGN_VARS);

A typical CoreCLR build does not run this phase, so it is not analyzed in detail here.

PHASE_GENERATE_CODE

From this phase on, everything belongs to CodeGen. The entry point of CodeGen is:

/* Generate code */

codeGen->genGenerateCode(methodCodePtr, methodCodeSize);

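genGenerateCode covers three phases:

PHASE_GENERATE_CODE: generates assembly instructions from LIR
PHASE_EMIT_CODE: writes executable machine code from the assembly instructions
PHASE_EMIT_GCEH: writes the additional information for the function (function header, GC info, exception info, and so on)

CodeGen uses the following data types:

instrDesc: the data of an assembly instruction; one instrDesc instance corresponds to one assembly instruction
insGroup: a group of assembly instructions; one insGroup contains one or more instrDescs, and a jump instruction can only target the first instruction of an IG

The previous article illustrates these with a figure.

The PHASE_GENERATE_CODE phase includes the following steps:

Call lvaAssignFrameOffsets to assign a stack offset to each local variable
- The computation has two steps
  - The first step sets a virtual initial offset of 0 and assigns each variable an offset relative to it; parameters get positive offsets and local variables get negative offsets
  - The second step adjusts the offsets depending on whether a frame pointer is used
- Afterwards compLclFrameSize is set; it is the size to allocate on entry to the function (for example sub rsp, 0x80)
Call emitBegFN to reserve the IG used by the function prolog
- LIR contains only the function body, so the prolog needs a separate IG
Call genCodeForBBlist to process the BasicBlocks
- If the block is the first block of a funclet, reserves the IG used by the funclet prolog
- Enumerates the nodes of the block in LIR order and calls genCodeForTreeNode to add assembly instructions for each node
  - GT_CNS_INT: if the constant is 0, generates xor targetReg, targetReg; otherwise generates mov targetReg, imm
  - GT_NEG: if the source register differs from the target register, generates mov targetReg, sourceReg, and then generates neg targetReg
  - GT_LCL_VAR: if the local variable is already in a register nothing is needed; otherwise generates an instruction that loads it from the stack into a register, for example mov targetReg, [rbp-offset]
  - GT_STORE_LCL_VAR: if the local variable is already in the same register nothing is needed; if it is in a different register, adds an instruction that copies the register; otherwise generates an instruction that stores the register to the stack
  - See my JIT notes for more node types
- Checks the jump kind of the block
  - BBJ_ALWAYS: adds a jmp instruction
  - BBJ_RETURN: reserves the IG used by the function epilog
  - BBJ_THROW: adds an int 3 instruction (which is never executed)
  - BBJ_CALLFINALLY: adds mov rcx, pspsym; call finally-funclet; jmp finally-return;
  - BBJ_EHCATCHRET: moves the target address of the block (the return address) into rax, and then reserves the IG used by the funclet epilog
  - BBJ_EHFINALLYRET, BBJ_EHFILTERRET: reserves the IG used by the funclet epilog
Call genGeneratePrologsAndEpilogs to add the instructions of the prologs and epilogs
- Calls genFnProlog to generate the prolog of the main function
  - If a Frame Pointer is needed, adds push rbp; mov rbp, rsp
  - Pushes the Callee Saved Registers that are modified
  - Adds the instruction that allocates stack space, for example sub rsp, size, plus instructions that confirm the virtual memory of the stack space (all pages) is accessible
  - Adds instructions that zero the stack space (local variables start at 0)
  - If funclets are used, adds mov [lvaPSPSym], rsp
  - If the Generic Context parameter is used, adds an instruction that saves it to a local variable
  - If a GS Cookie is used, adds an instruction that sets the GS Cookie value
- Calls emitGeneratePrologEpilog to generate the epilog of the main function and the prologs and epilogs of the funclets
  - Enumerates the list of IGs reserved earlier
    - IGPT_PROLOG: already generated above, so it is skipped here
    - IGPT_EPILOG: calls genFnEpilog to generate the epilog of the main function
      - Pops the Callee Saved Registers pushed in the prolog
      - If a Frame Pointer is used on x86, adds mov esp, ebp; pop ebp;
      - If a Frame Pointer is used on x64, adds add rsp, size; pop rbp or lea rsp, [rsp+size]; pop rbp;
      - If no Frame Pointer is used, adds add rsp, size or lea rsp, [rsp+size]
      - If it is a tail call, adds call addr; if it is a fast tail call, adds jmp rax; otherwise adds ret
    - IGPT_FUNCLET_PROLOG:
      - Adds push rbp
      - Pushes the Callee Saved Registers that are modified
      - Adds the instruction that allocates stack space, for example sub rsp, size
      - Adds instructions that inherit the PSPSym and restore the rbp of the main function, for example mov rbp, [rcx+20h]; mov [rsp+20h], rbp; lea rbp,[rbp+40h];
    - IGPT_FUNCLET_EPILOG:
      - Adds the instruction that releases the stack space, for example add rsp, size
      - Pops the Callee Saved Registers pushed in the prolog
      - Adds pop rbp
      - Adds ret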

PHASE_EMIT_CODE

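The previous phase generated assembly instructions, but they are stored as instrDescs inside insGroups, which is a data structure and not executable machine code.
This phase writes the actual executable machine code from the list of instrDescs.

The previous article illustrates this with a figure.

The generated layout consists of the function code, the function header, and the real function header.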

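The main work of this phase is done in emitEndCodeGen, which includes the following steps:

Call CEEJitInfo::allocMem to allocate the memory that holds the executable machine code
- Calls EEJitManager::allocCode
  - Calls EEJitManager::allocCodeRaw
    - Gets the list of CodeHeaps (chunks); if there is not enough space, calls EEJitManager::NewCodeHeap to allocate a new chunk
      - Calls ClrVirtualAllocExecutable
    - For a dynamic function, this allocates the size of "function header + function code + real function header" and returns a pointer to the "function code"
    - For a non-dynamic function, this allocates the size of "function header + function code" and returns a pointer to the function code
  - For a non-dynamic function, calls pMD->GetLoaderAllocator()->GetLowFrequencyHeap()->AllocMem to allocate the real function header
    - The region allocated here is only PAGE_READWRITE and cannot be executed
  - Sets the pointer in the "function header" to point to the "real function header"
  - Calls NibbleMapSet to set the Nibble Map, which is used to locate the start address of a function
    - The Nibble Map is stored in the pHdrMap member of the chunk (HeapList) that contains the function; it is an array of DWORDs, and each DWORD contains 8 Nibbles in the following format
    - [ [ NIBBLE(4bit), NIBBLE, ...(8 of them) ], [ NIBBLE, NIBBLE, ...(8 of them) ], ... ]
    - For example, if the start address of the function is 0x7fff7ce80078 and the base address of its chunk (HeapList) is 0x7fff7ce80000, the offset is 120
    - The value of the Nibble is ((120 % 32) / 4) + 1 = 7
    - The Nibble is stored in DWORD number 120 / 32 / 8 = 0, as Nibble number 120 / 32 = 3
    - That is, the DWORD value is &= 0xfff0ffff and then |= 0x00070000
    - The Nibble Map makes it possible to find the start address of a function and its function header from the current PC, which is essential information for both debugging and GC
Enumerate the list of IGs (insGroup)
- Records which GC references (pointers to objects or structs) are on the stack and in registers at the start of the IG, and adds them to the gcInfo lists
- Enumerates the instructions (instrDesc) in the IG
  - Calls emitIssue1Instr to encode the instruction
    - Calls emitOutputInstr (the x86/x64 version)
      - Determines the type of the instruction and writes it; the instruction types include
        - Instructions with no operands, for example nop
        - Instructions with one constant, for example jge, loop, ret
        - Instructions with a jump target (label), for example jmp
        - Instructions with a function or function pointer, for example call
        - Instructions with a single register, for example inc, dec
        - Instructions with two registers, for example mov
        - Instructions whose first operand is a register and whose second operand is memory, for example mov
        - See my JIT notes for more cases
  - The gcInfo lists are updated while the instructions are written
    - For example, register rax contains a GC reference starting at function address+x, and no longer contains one starting at function address+x1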
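
The Nibble Map arithmetic described above for NibbleMapSet can be sketched as follows. This is a minimal illustration of the numbers in the example (32-byte buckets, 4-byte alignment, 8 nibbles per DWORD with nibble 0 in the most significant position, which is what the 0xfff0ffff mask implies); NibbleSlot, locateNibble, and setNibble are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Where a function start is recorded in the nibble map, and with what value.
struct NibbleSlot
{
    std::size_t dwordIndex;  // index into the DWORD array
    unsigned    nibbleIndex; // 0..7, with 0 being the most significant nibble
    uint32_t    value;       // 1..8, the 4-byte slot inside the 32-byte bucket plus 1
};

// Each nibble covers a 32-byte bucket of the code heap; a nonzero nibble
// encodes where inside that bucket a function starts.
NibbleSlot locateNibble(uint64_t codeStart, uint64_t heapBase)
{
    assert(codeStart >= heapBase);
    uint64_t offset = codeStart - heapBase;
    uint64_t bucket = offset / 32;

    NibbleSlot slot;
    slot.value       = static_cast<uint32_t>((offset % 32) / 4 + 1);
    slot.dwordIndex  = static_cast<std::size_t>(bucket / 8);
    slot.nibbleIndex = static_cast<unsigned>(bucket % 8);
    return slot;
}

// Clears the target nibble in the DWORD and stores the new value there.
uint32_t setNibble(uint32_t dword, unsigned nibbleIndex, uint32_t value)
{
    assert(nibbleIndex < 8 && value <= 0xF);
    unsigned shift = (7 - nibbleIndex) * 4;
    uint32_t mask  = ~(static_cast<uint32_t>(0xF) << shift);
    return (dword & mask) | (value << shift);
}
```

For the addresses in the example this selects DWORD 0, nibble 3, value 7, and applying setNibble to an empty DWORD yields 0x00070000.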

這個階段完成了對函數中機器代碼的寫入, 接下來就是最後一個階段.

PHASE_EMIT_GCEH

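This phase writes the information associated with the function, that is, the contents of the "real function header" mentioned above.
The type of the "real function header" is _hpRealCodeHdr, and it contains the following information:

phdrDebugInfo: the index from PC to IL offset
phdrJitEHInfo: the array of EH Clauses
phdrJitGCInfo: the information GC uses to scan the stack and registers
phdrMDesc: the MethodDesc of the function
nUnwindInfos: the number of unwindInfos
unwindInfos: the unwind information (stack unwinding information)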

DebugInfo

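phdrDebugInfo is an array of DWORDs in the Nibble Stream format, which stores numbers in units of 4 bits.
For example, 0xa9 0xa0 0x03 represents the two numbers 80 and 19:

0xa9 = 0b1010'1001 (a leading 1 in a nibble means the next nibble must also be read)
0xa0 = 0b1010'0000 (a leading 0 in a nibble means the current number ends here)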
0x03 = 0b0000'0011
001 010 000 => 80
010 011 => 19
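
A decoder for the example above can be sketched as follows. The reading order (low nibble of each byte first, most significant 3-bit group first) is inferred from the worked example rather than taken from the runtime source, and decodeNibbleStream is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes `count` numbers from a nibble stream. In each nibble the top bit
// says whether another nibble follows, and the low 3 bits carry data.
std::vector<uint32_t> decodeNibbleStream(const std::vector<uint8_t>& bytes, std::size_t count)
{
    std::vector<uint32_t> numbers;
    uint32_t              current = 0;

    for (std::size_t i = 0; i < bytes.size() * 2 && numbers.size() < count; ++i)
    {
        uint8_t byte   = bytes[i / 2];
        uint8_t nibble = (i % 2 == 0) ? static_cast<uint8_t>(byte & 0x0F) : static_cast<uint8_t>(byte >> 4);

        // Append the 3 data bits below the bits collected so far.
        current = (current << 3) | static_cast<uint32_t>(nibble & 0x7);

        // A clear continuation bit ends the current number.
        if ((nibble & 0x8) == 0)
        {
            numbers.push_back(current);
            current = 0;
        }
    }
    return numbers;
}
```

The count parameter is needed because the stream is padded to a whole byte, so a trailing zero nibble would otherwise decode as an extra 0.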

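The list of numbers has the following structure:

header, containing two numbers: the first is the encoded length (in bytes) of the offset mapping, the second is the encoded length (in bytes) of the native vars
offset mapping
- The number of offset mappings
- native offset, written as the delta from the previous record
- il offset
- source flags, such as SOURCE_TYPE_INVALID, SEQUENCE_POINT, STACK_EMPTY
native vars (information about the scopes that internal variables live in)
- The number of native vars
- startOffset: the start offset of the scope
- endOffset: the end offset of the scope, written as the delta from start
- var number: the index of the variable
- var type (reg or stack)
- The information that follows depends on the var type; see DoNativeVarInfo for details

From the DebugInfo an IDE knows which memory address to use when setting a breakpoint, which memory address to stop at when stepping over, and so on.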

EHInfo

phdrJitEHInfo is a pointer to a CorILMethod_Sect_FatFormat structure, which contains the number of EH Clauses and an array of EE_ILEXCEPTION_CLAUSE.

Take the following C# code:

var x = GetString();
try {
    Console.WriteLine(x);
    throw new Exception("abc");
} catch (Exception ex) {
    Console.WriteLine(ex);
    Console.WriteLine(x);
}

It can produce the following assembly code:

IN0016: 000000 push     rbp
IN0017: 000001 push     rbx
IN0018: 000002 sub      rsp, 24
IN0019: 000006 lea      rbp, [rsp+20H]
IN001a: 00000B mov      qword ptr [V06 rbp-20H], rsp
G_M21556_IG02:        ; offs=00000FH, size=0009H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
IN0001: 00000F call     ConsoleApplication.Program:GetString():ref
IN0002: 000014 mov      gword ptr [V01 rbp-10H], rax
G_M21556_IG03:        ; offs=000018H, size=0043H, gcVars=0000000000000001 {V01}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
IN0003: 000018 mov      rdi, gword ptr [V01 rbp-10H]
IN0004: 00001C call     System.Console:WriteLine(ref)
IN0005: 000021 mov      rdi, 0x7F78892D3CE8
IN0006: 00002B call     CORINFO_HELP_NEWSFAST
IN0007: 000030 mov      rbx, rax
IN0008: 000033 mov      edi, 1
IN0009: 000038 mov      rsi, 0x7F78881BCE70
IN000a: 000042 call     CORINFO_HELP_STRCNS
IN000b: 000047 mov      rsi, rax
IN000c: 00004A mov      rdi, rbx
IN000d: 00004D call     System.Exception:.ctor(ref):this
IN000e: 000052 mov      rdi, rbx
IN000f: 000055 call     CORINFO_HELP_THROW
IN0010: 00005A int3     
G_M21556_IG04:        ; offs=00005BH, size=0007H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, epilog, nogc
IN001b: 00005B lea      rsp, [rbp-08H]
IN001c: 00005F pop      rbx
IN001d: 000060 pop      rbp
IN001e: 000061 ret      
G_M21556_IG05:        ; func=01, offs=000062H, size=000EH, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref, funclet prolog, nogc
IN001f: 000062 push     rbp
IN0020: 000063 push     rbx
IN0021: 000064 push     rax
IN0022: 000065 mov      rbp, qword ptr [rdi]
IN0023: 000068 mov      qword ptr [rsp], rbp
IN0024: 00006C lea      rbp, [rbp+20H]
G_M21556_IG06:        ; offs=000070H, size=0018H, gcVars=0000000000000001 {V01}, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, gcvars, byref, isz
IN0011: 000070 mov      rdi, rsi
IN0012: 000073 call     System.Console:WriteLine(ref)
IN0013: 000078 mov      rdi, gword ptr [V01 rbp-10H]
IN0014: 00007C call     System.Console:WriteLine(ref)
IN0015: 000081 lea      rax, G_M21556_IG04
G_M21556_IG07:        ; offs=000088H, size=0007H, funclet epilog, nogc, emitadd
IN0025: 000088 add      rsp, 8
IN0026: 00008C pop      rbx
IN0027: 00008D pop      rbp
IN0028: 00008E ret

Analyzing this function's EHInfo with lldb gives:

(lldb) p *codePtr
(void *) $1 = 0x00007fff7ceef920
(lldb) p *(CodeHeader*)(0x00007fff7ceef920-8)
(CodeHeader) $2 = {
  pRealCodeHeader = 0x00007fff7cf35c78
}
(lldb) p *(_hpRealCodeHdr*)(0x00007fff7cf35c78)
(_hpRealCodeHdr) $3 = {
  phdrDebugInfo = 0x0000000000000000
  phdrJitEHInfo = 0x00007fff7cf35ce0
  phdrJitGCInfo = 0x0000000000000000
  phdrMDesc = 0x00007fff7baf9200
  nUnwindInfos = 2
  unwindInfos = {}
}
(lldb) me re -s8 -c20 -fx 0x00007fff7cf35ce0-8
0x7fff7cf35cd8: 0x0000000000000001 0x0000000000002040
0x7fff7cf35ce8: 0x0000001800000000 0x000000620000005b
0x7fff7cf35cf8: 0x000000000000008f 0x000000000100000e
0x7fff7cf35d08: 0x0000000000000030 0x0000000000000001
0x7fff7cf35d18: 0x00007ffff628f550 0x0000000000000b4a
0x7fff7cf35d28: 0x0000000000000000 0x0000000000000000
0x7fff7cf35d38: 0x0000000000000000 0x0000000000000000
0x7fff7cf35d48: 0x0000000000000000 0x0000000000000000
0x7fff7cf35d58: 0x0000000000000000 0x0000000000000000
0x7fff7cf35d68: 0x0000000000000000 0x0000000000000000

0x0000000000000001:
the size_t at phdrJitEHInfo - sizeof(size_t) holds the number of clauses, here 1

0x0000000000002040:
members from the base class IMAGE_COR_ILMETHOD_SECT_FAT
Kind = 0x40 = CorILMethod_Sect_FatFormat
DataSize = 0x20 = 32 = 1 * sizeof(EE_ILEXCEPTION_CLAUSE)

(lldb) p ((EE_ILEXCEPTION_CLAUSE*)(0x00007fff7cf35ce0+8))[0]
(EE_ILEXCEPTION_CLAUSE) $29 = {
  Flags = COR_ILEXCEPTION_CLAUSE_NONE
  TryStartPC = 24
  TryEndPC = 91
  HandlerStartPC = 98
  HandlerEndPC = 143
   = (TypeHandle = 0x000000000100000e, ClassToken = 16777230, FilterOffset = 16777230)
}

(lldb) sos Token2EE * 0x000000000100000e
Module:      00007fff7bc04000
Assembly:    System.Private.CoreLib.ni.dll
<invalid module token>
--------------------------------------
Module:      00007fff7baf6e70
Assembly:    coreapp_jit.dll
Token:       000000000100000E
MethodTable: 00007fff7cc0dce8
EEClass:     00007fff7bcb9400
Name:         mdToken: 0100000e (/home/ubuntu/git/coreapp_jitnew/bin/Release/netcoreapp1.1/ubuntu.16.04-x64/publish/coreapp_jit.dll)

(lldb) dumpmt 00007fff7cc0dce8
EEClass:         00007FFF7BCB9400
Module:          00007FFF7BC04000
Name:            System.Exception
mdToken:         0000000002000249
File:            /home/ubuntu/git/coreapp_jitnew/bin/Release/netcoreapp1.1/ubuntu.16.04-x64/publish/System.Private.CoreLib.ni.dll
BaseSize:        0x98
ComponentSize:   0x0
Slots in VTable: 51
Number of IFaces in IFaceMap: 2

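As shown, EE_ILEXCEPTION_CLAUSE contains the start and end PC offsets of the try block, the start and end PC offsets of the handler (all relative to the start of the function), and a reference to the caught exception type (or, for a filter clause, the offset of the filter function).
With the EHInfo, the CLR knows which catch and finally handlers to invoke when an exception is thrown.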
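To make the layout concrete, here is a minimal Python sketch that rebuilds the clause from the raw words in the memory dump above and then looks up the handler for a PC offset. The struct layout (five 32-bit fields, 4 bytes of padding, one 64-bit union) is inferred from the dump and from DataSize = 32; `parse_eh_info` and `find_handler` are hypothetical helper names, and the lookup is a simplification that ignores nesting and exception type matching.

```python
import struct

# 8-byte words copied from the lldb memory dump, starting at
# phdrJitEHInfo - 8 (the clause count) and covering the single clause.
WORDS = [
    0x0000000000000001,  # number of clauses
    0x0000000000002040,  # Kind (low 8 bits), DataSize (next 24 bits)
    0x0000001800000000,  # Flags, TryStartPC
    0x000000620000005B,  # TryEndPC, HandlerStartPC
    0x000000000000008F,  # HandlerEndPC, padding
    0x000000000100000E,  # TypeHandle / ClassToken / FilterOffset union
]
RAW = b"".join(struct.pack("<Q", w) for w in WORDS)

# Assumed layout of one clause: 5 x uint32, 4 padding bytes, 1 x uint64.
CLAUSE = struct.Struct("<5I4xQ")


def parse_eh_info(raw):
    """Return (kind, data_size, clauses) decoded from the dumped bytes."""
    (count,) = struct.unpack_from("<Q", raw, 0)
    (header,) = struct.unpack_from("<I", raw, 8)
    kind, data_size = header & 0xFF, (header >> 8) & 0xFFFFFF
    clauses = []
    for i in range(count):
        flags, try_start, try_end, h_start, h_end, token = CLAUSE.unpack_from(
            raw, 16 + i * CLAUSE.size)
        clauses.append({
            "flags": flags,
            "try_start": try_start,
            "try_end": try_end,
            "handler_start": h_start,
            "handler_end": h_end,
            "class_token": token,
        })
    return kind, data_size, clauses


def find_handler(clauses, pc_offset):
    """Hypothetical lookup: first clause whose try range covers pc_offset."""
    for clause in clauses:
        if clause["try_start"] <= pc_offset < clause["try_end"]:
            return clause
    return None


kind, data_size, clauses = parse_eh_info(RAW)
handler = find_handler(clauses, 0x5A)  # offset just after the call to CORINFO_HELP_THROW
print(hex(kind), data_size, handler)
```

The decoded fields should match what lldb printed for the clause (24, 91, 98, 143 and token 0x0100000e), and an offset inside the handler itself (for example 0x62) is not covered by any try range.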

GCInfo

phdrJitGCInfo is a bit array with a very complex encoding, so here I give a worked example of decoding a real GCInfo.

The C# code and assembly are the same as in the EHInfo example above; analyzing with LLDB gives:

(lldb) p *codePtr
(void *) $1 = 0x00007fff7cee3920
(lldb) p *(CodeHeader*)(0x00007fff7cee3920-8)
(CodeHeader) $2 = {
  pRealCodeHeader = 0x00007fff7cf29c78
}
(lldb) p *(_hpRealCodeHdr*)(0x00007fff7cf29c78)
(_hpRealCodeHdr) $3 = {
  phdrDebugInfo = 0x0000000000000000
  phdrJitEHInfo = 0x00007fff7cf29ce0
  phdrJitGCInfo = 0x00007fff7cf29d28 "\x91\x81G"
  phdrMDesc = 0x00007fff7baed200
  nUnwindInfos = 2
  unwindInfos = {}
}
(lldb) me re -s8 -c20 -fx 0x00007fff7cf29d28
0x7fff7cf29d28: 0x1963d80000478191 0x171f412003325ca8
0x7fff7cf29d38: 0xee92864c5ffe0280 0x1c5c1c1f09bea536
0x7fff7cf29d48: 0xed8a93e5c6872932 0x00000000000000c4
0x7fff7cf29d58: 0x000000000000002a 0x0000000000000001
0x7fff7cf29d68: 0x00007ffff628f550 0x0000000000000b2e
0x7fff7cf29d78: 0x0000000000000000 0x0000000000000000
0x7fff7cf29d88: 0x0000000000000000 0x0000000000000000
0x7fff7cf29d98: 0x0000000000000000 0x0000000000000000
0x7fff7cf29da8: 0x0000000000000000 0x0000000000000000
0x7fff7cf29db8: 0x0000000000000000 0x0000000000000000

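The bit array decodes as follows (bits are listed in stream order, starting from the least significant bit of each byte):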

10001001
1: use fat encoding
0: no var arg
0: no security object
0: no gc cookie
1: have pspsym stack slot
0 0: no generic context parameter
1: have stack base register

1000000
1: wants report only leaf
0: no edit and continue preserved area
0: no reverse pinvoke frame
0 0 0 0: return kind is RT_Scalar

1'11100010
0 10001111: code length is 143

0000000
0 000000: pspsym stack slot is 0

0'0000000
0 000: stack base register is rbp (rbp is 5, normalize function will ^5 so it's 0)
0 000: size of stack outgoing and scratch area is 0

0'000110
0 00: 0 call sites
1 0 0 1: 2 interruptible ranges

11'11000
0 001111: interruptible range 1 begins from 15

110'10011000'000
1 001011 0 000001: interruptible range 1 finished at 91 (15 + 75 + 1)

10101'00
0 010101: interruptible range 2 begins from 112 (91 + 21)

111010'01001100
0 010111: interruptible range 2 finished at 136 (112 + 23 + 1)
1: have register slots
1 00 0 01: 4 register slots

110000
1: have stack slots
0 01: 1 tracked stack slots
0 0: 0 untracked stack slots

00'0000010
0 000: register slot 1 is rax(0)
00: register slot 1 flag is GC_SLOT_IS_REGISTER(8 & 0b11 = 0)
0 10: register slot 2 is rbx(3) (0 + 2 + 1)

0'10000
0 10: register slot 3 is rsi(6) (3 + 2 + 1)
0 00: register slot 4 is rdi(7) (6 + 0 + 1)

010'11111000
01: stack slot 1 base on GC_FRAMEREG_REL(2)
0 111110: stack slot 1 offset is -16 (-16 / 8 = -2)
00: stack slot 1 flag is GC_SLOT_BASE(0)

111 01000
111: num bits per pointer is 7

00000001
0 0000001: chunk 0's bit offset is 0 (1-1)

01000000: chunk 1's bit offset is 63 (64-1)

011111
011111: chunk 0 could be live slot list, simple format, all could live

11'111
11111: chunk 0 final state, all slot lives

1 1010'00
1 000101: transition of register slot 1(rax) at 0x14 (20 = 15 + 5), becomes live

110010'01100001
1 001001: transition of register slot 1(rax) at 0x18 (24 = 15 + 9), becomes dead
1 100001: transition of register slot 1(rax) at 0x30 (48 = 15 + 33), becomes live

01001001
0: terminator, no more transition of register slot 1(rax) in this chunk
1 100100: transition of register slot 2(rbx) at 0x33 (51 = 15 + 36), becomes live

01110111
0: terminator, no more transition of register slot 2(rbx) in this chunk
1 111110: transition of register slot 3(rsi) at 0x4d (77 = 15 + 62), becomes live

01101100
0: terminator, no more transition of register slot 3(rsi) in this chunk
1 001101: transition of register slot 4(rdi) at 0x1c (28 = 15 + 13), becomes live

1010010
1 010010: transition of register slot 4(rdi) at 0x21 (33 = 15 + 18), becomes dead

1'0111110
1 111110: transition of register slot 4(rdi) at 0x4d (77 = 15 + 62), becomes live
0: terminator, no more transition of register slot 4(rdi) in this chunk

1'1001000
1 001001: transition of stack slot 1(rbp-16) at 0x18 (24 = 15 + 9), becomes live
0: terminator, no more transition of stack slot 1(rbp-16) in this chunk

0'11111
0 11111: chunk 1 could be live slot list, simple format, all could live

000'00
00000: chunk 1 final state, all slot dead

111000'00
1 000011: transition of register slot 1(rax) at 0x52 (15 + 64 + 3), becomes dead
0: terminator, no more transition of register slot 1(rax) in this chunk

111010'00
1 001011: transition of register slot 2(rbx) at 0x5a (15 + 64 + 11), becomes dead
0: terminator, no more transition of register slot 2(rbx) in this chunk

111000'01001100
1 000011: transition of register slot 3(rsi) at 0x52 (15 + 64 + 3), becomes dead
1 001100: transition of register slot 3(rsi) at 0x70 (0x70 + (64+12 - (0x5b-0xf))), becomes live

10010100
1 010100: transition of register slot 3(rsi) at 0x78 (0x70 + (64+20 - (0x5b-0xf))), becomes dead
0: terminator, no more transition of register slot 3(rsi) in this chunk

1110000
1 000011: transition of register slot 4(rdi) at 0x52 (15 + 64 + 3), becomes dead

1'011000
1 000110: transition of register slot 4(rdi) at 0x55 (15 + 64 + 6), becomes live

11'10100
1 001011: transition of register slot 4(rdi) at 0x5a (15 + 64 + 11), becomes dead

111'1100
1 001111: transition of register slot 4(rdi) at 0x73 (0x70 + (64+15 - (0x5b-0xf))), becomes live

1001'010
1 010100: transition of register slot 4(rdi) at 0x78 (0x70 + (64+20 - (0x5b-0xf))), becomes dead

10001'10
1 011000: transition of register slot 4(rdi) at 0x7c (0x70 + (64+24 - (0x5b-0xf))), becomes live

110111'00
1 011101: transition of register slot 4(rdi) at 0x81 (0x70 + (64+29 - (0x5b-0xf))), becomes dead
0: terminator, no more transition of register slot 4(rdi) in this chunk

100011'00
1 011000: transition of stack slot 1(rbp-16) at 0x7c (0x70 + (64+24 - (0x5b-0xf))), becomes dead
0: terminator, no more transition of stack slot 1(rbp-16) in this chunk
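The first part of this walkthrough can be reproduced with a small Python sketch. It reads bits least significant bit first and decodes the variable-length numbers as chunks of `base` data bits followed by one continuation bit. The field order and the bases (8, 6, 3, 3, 2, 1, 6, 6) are taken from the walkthrough above and only cover the flag combination of this example; `BitReader` and `decode_header` are hypothetical names, not CoreCLR APIs.

```python
import struct

# First two 8-byte words of the GCInfo dump above, in memory order.
DATA = struct.pack("<2Q", 0x1963D80000478191, 0x171F412003325CA8)


class BitReader:
    """Reads bits from bytes, least significant bit of each byte first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read(self, count):
        value = 0
        for i in range(count):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value

    def _read_chunks(self, base):
        # Each chunk is `base` data bits plus one continuation bit.
        result, bits = 0, 0
        while True:
            chunk = self.read(base + 1)
            result |= (chunk & ((1 << base) - 1)) << bits
            bits += base
            if not chunk & (1 << base):
                return result, bits

    def read_var_unsigned(self, base):
        return self._read_chunks(base)[0]

    def read_var_signed(self, base):
        result, bits = self._read_chunks(base)
        # Sign-extend from the number of data bits actually read.
        return result - (1 << bits) if result >> (bits - 1) else result


def decode_header(data):
    r = BitReader(data)
    h = {"fat": r.read(1)}
    if not h["fat"]:
        raise NotImplementedError("slim header is not covered by this sketch")
    for name in ("var_arg", "security_object", "gs_cookie", "psp_sym"):
        h[name] = r.read(1)
    h["generics_context"] = r.read(2)
    h["has_stack_base_register"] = r.read(1)
    for name in ("report_only_leaf", "enc_area", "reverse_pinvoke"):
        h[name] = r.read(1)
    h["return_kind"] = r.read(4)
    unsupported = ("security_object", "gs_cookie", "generics_context",
                   "enc_area", "reverse_pinvoke")
    if any(h[name] for name in unsupported):
        raise NotImplementedError("flag not covered by this sketch")
    h["code_length"] = r.read_var_unsigned(8)
    if h["psp_sym"]:
        h["psp_sym_slot"] = r.read_var_signed(6)
    if h["has_stack_base_register"]:
        h["stack_base_register"] = r.read_var_unsigned(3) ^ 5  # 5 is rbp
    h["outgoing_area_size"] = r.read_var_unsigned(3)
    h["call_sites"] = r.read_var_unsigned(2)
    ranges, last_stop = [], 0
    for _ in range(r.read_var_unsigned(1)):
        start = last_stop + r.read_var_unsigned(6)
        stop = start + r.read_var_unsigned(6) + 1
        ranges.append((start, stop))
        last_stop = stop
    h["interruptible_ranges"] = ranges
    return h


header = decode_header(DATA)
print(header["code_length"], header["interruptible_ranges"])
```

Run against the dumped bytes, this should report a code length of 143 and the two interruptible ranges [15, 91) and [112, 136), the same values as in the manual decoding.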

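When the CLR performs a GC, it suspends the threads and obtains the PC address at which each one stopped,
then uses that PC address and the Nibble Map to locate the function header,
and finally uses the GCInfo in the function header to find which stack locations and registers of the executing function hold root objects.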

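Because GCInfo records the location and lifetime of every GC reference over the course of the function (within its interruptible parts),
CoreCLR needs an encoding this complex to keep it small.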
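The arithmetic used in the walkthrough, for example 0x70 + (64+12 - (0x5b-0xf)), suggests that transition offsets only count bytes inside the interruptible ranges, 64 per chunk. The following Python sketch shows that mapping under this assumption; `denormalize` and `code_offset` are hypothetical helper names, and the chunk size of 64 is taken from the chunk bit offsets in the example.

```python
# Interruptible ranges from the example, as [start, stop) code offsets.
RANGES = [(0x0F, 0x5B), (0x70, 0x88)]


def denormalize(offset, ranges):
    """Map an offset counted over interruptible bytes only to a code offset."""
    for start, stop in ranges:
        length = stop - start
        if offset < length:
            return start + offset
        offset -= length  # skip this range and continue in the next one
    raise ValueError("offset lies beyond the interruptible ranges")


def code_offset(chunk, delta, ranges, chunk_size=64):
    """Code offset of a transition stored as (chunk index, offset in chunk)."""
    return denormalize(chunk * chunk_size + delta, ranges)


print(hex(code_offset(0, 5, RANGES)))   # rax becomes live in chunk 0
print(hex(code_offset(1, 12, RANGES)))  # rsi becomes live in chunk 1
```

Skipping the non-interruptible gaps keeps the stored offsets small, which fits the goal of a compact encoding.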

UnwindInfo

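unwindInfos is an array of length nUnwindInfos whose elements are of type RUNTIME_FUNCTION.
The value of nUnwindInfos is one (for the main function) plus the number of funclets.
Each RUNTIME_FUNCTION in turn stores an offset to an UNWIND_INFO, and the UNWIND_INFO records the function's operations on the stack pointer.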
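As a rough illustration of how such an array can be used, the Python sketch below finds the RUNTIME_FUNCTION that covers a given address. The three entries are the values printed by lldb later in this section (offsets relative to the module base); `find_runtime_function` is a hypothetical helper and uses a plain linear scan.

```python
from collections import namedtuple

RuntimeFunction = namedtuple("RuntimeFunction", "begin end unwind_data")

# Entries from the lldb session later in this section.
FUNCTIONS = [
    RuntimeFunction(2304, 2412, 2500),  # main function
    RuntimeFunction(2412, 2457, 2516),  # catch funclet
    RuntimeFunction(2457, 2497, 2532),  # finally funclet
]
CODE_START = 2304  # BeginAddress of the main function


def find_runtime_function(functions, rva):
    """Return the index of the entry whose [begin, end) range contains rva."""
    for index, function in enumerate(functions):
        if function.begin <= rva < function.end:
            return index
    return None


# Offsets within the method, as shown in the assembly listing.
for offset in (0x1C, 0x7D, 0xB4):
    print(hex(offset), find_runtime_function(FUNCTIONS, CODE_START + offset))
```

Each funclet gets its own entry because it has its own prolog and therefore its own unwind information.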

Here is another worked example, using the following C# code:

var x = GetString();
try {
    Console.WriteLine(x);
    throw new Exception("abc");
} catch (Exception ex) {
    Console.WriteLine(ex);
    Console.WriteLine(x);
} finally {
    Console.WriteLine("finally");
}

It can produce the following assembly:

G_M21556_IG01:        ; func=00, offs=000000H, size=000FH, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG

IN001e: 000000 push     rbp
IN001f: 000001 push     rbx
IN0020: 000002 sub      rsp, 24
IN0021: 000006 lea      rbp, [rsp+20H]
IN0022: 00000B mov      qword ptr [V06 rbp-20H], rsp

G_M21556_IG02:        ; offs=00000FH, size=0009H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref

IN0001: 00000F call     ConsoleApplication.Program:GetString():ref
IN0002: 000014 mov      gword ptr [V01 rbp-10H], rax

G_M21556_IG03:        ; offs=000018H, size=0043H, gcVars=0000000000000001 {V01}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref

IN0003: 000018 mov      rdi, gword ptr [V01 rbp-10H]
IN0004: 00001C call     System.Console:WriteLine(ref)
IN0005: 000021 mov      rdi, 0x7F94DDF9CCE8
IN0006: 00002B call     CORINFO_HELP_NEWSFAST
IN0007: 000030 mov      rbx, rax
IN0008: 000033 mov      edi, 1
IN0009: 000038 mov      rsi, 0x7F94DCE85E70
IN000a: 000042 call     CORINFO_HELP_STRCNS
IN000b: 000047 mov      rsi, rax
IN000c: 00004A mov      rdi, rbx
IN000d: 00004D call     System.Exception:.ctor(ref):this
IN000e: 000052 mov      rdi, rbx
IN000f: 000055 call     CORINFO_HELP_THROW
IN0010: 00005A int3     

G_M21556_IG04:        ; offs=00005BH, size=0001H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref

IN0011: 00005B nop      

G_M21556_IG05:        ; offs=00005CH, size=0008H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref

IN0012: 00005C mov      rdi, rsp
IN0013: 00005F call     G_M21556_IG11

G_M21556_IG06:        ; offs=000064H, size=0001H, nogc, emitadd

IN0014: 000064 nop      

G_M21556_IG07:        ; offs=000065H, size=0007H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, epilog, nogc

IN0023: 000065 lea      rsp, [rbp-08H]
IN0024: 000069 pop      rbx
IN0025: 00006A pop      rbp
IN0026: 00006B ret      

G_M21556_IG08:        ; func=01, offs=00006CH, size=000EH, gcVars=0000000000000001 {V01}, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, gcvars, byref, funclet prolog, nogc

IN0027: 00006C push     rbp
IN0028: 00006D push     rbx
IN0029: 00006E push     rax
IN002a: 00006F mov      rbp, qword ptr [rdi]
IN002b: 000072 mov      qword ptr [rsp], rbp
IN002c: 000076 lea      rbp, [rbp+20H]

G_M21556_IG09:        ; offs=00007AH, size=0018H, gcVars=0000000000000001 {V01}, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, gcvars, byref, isz

IN0015: 00007A mov      rdi, rsi
IN0016: 00007D call     System.Console:WriteLine(ref)
IN0017: 000082 mov      rdi, gword ptr [V01 rbp-10H]
IN0018: 000086 call     System.Console:WriteLine(ref)
IN0019: 00008B lea      rax, G_M21556_IG04

G_M21556_IG10:        ; offs=000092H, size=0007H, funclet epilog, nogc, emitadd

IN002d: 000092 add      rsp, 8
IN002e: 000096 pop      rbx
IN002f: 000097 pop      rbp
IN0030: 000098 ret      

G_M21556_IG11:        ; func=02, offs=000099H, size=000EH, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, funclet prolog, nogc

IN0031: 000099 push     rbp
IN0032: 00009A push     rbx
IN0033: 00009B push     rax
IN0034: 00009C mov      rbp, qword ptr [rdi]
IN0035: 00009F mov      qword ptr [rsp], rbp
IN0036: 0000A3 lea      rbp, [rbp+20H]

G_M21556_IG12:        ; offs=0000A7H, size=0013H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref

IN001a: 0000A7 mov      rdi, 0x7F94C8001068
IN001b: 0000B1 mov      rdi, gword ptr [rdi]
IN001c: 0000B4 call     System.Console:WriteLine(ref)
IN001d: 0000B9 nop      

G_M21556_IG13:        ; offs=0000BAH, size=0007H, funclet epilog, nogc, emitadd

IN0037: 0000BA add      rsp, 8
IN0038: 0000BE pop      rbx
IN0039: 0000BF pop      rbp
IN003a: 0000C0 ret

Analyzing with LLDB gives:

(lldb) p *codePtr
(void *) $0 = 0x00007fff7ceee920
(lldb) p *(CodeHeader*)(0x00007fff7ceee920-8)
(CodeHeader) $1 = {
  pRealCodeHeader = 0x00007fff7cf34c78
}
(lldb) p *(_hpRealCodeHdr*)(0x00007fff7cf34c78)
(_hpRealCodeHdr) $2 = {
  phdrDebugInfo = 0x0000000000000000
  phdrJitEHInfo = 0x0000000000000000
  phdrJitGCInfo = 0x0000000000000000
  phdrMDesc = 0x00007fff7baf8200
  nUnwindInfos = 3
  unwindInfos = {}
}
(lldb) p ((_hpRealCodeHdr*)(0x00007fff7cf34c78))->unwindInfos[0]
(RUNTIME_FUNCTION) $3 = (BeginAddress = 2304, EndAddress = 2412, UnwindData = 2500)
(lldb) p ((_hpRealCodeHdr*)(0x00007fff7cf34c78))->unwindInfos[1]
(RUNTIME_FUNCTION) $4 = (BeginAddress = 2412, EndAddress = 2457, UnwindData = 2516)
(lldb) p ((_hpRealCodeHdr*)(0x00007fff7cf34c78))->unwindInfos[2]
(RUNTIME_FUNCTION) $5 = (BeginAddress = 2457, EndAddress = 2497, UnwindData = 2532)

first unwind info:
(lldb) p (void*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2304) 
(void *) $13 = 0x00007fff7ceee920
(lldb) p (void*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2412) 
(void *) $14 = 0x00007fff7ceee98c
# range is [0, 0x6c)
(lldb) p *(UNWIND_INFO*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2500)
(UNWIND_INFO) $16 = {
  Version = '\x01'
  Flags = '\x03'
  SizeOfProlog = '\x06'
  CountOfUnwindCodes = '\x03'
  FrameRegister = '\0'
  FrameOffset = '\0'
  UnwindCode = {
    [0] = {
       = (CodeOffset = '\x06', UnwindOp = '\x02', OpInfo = '\x02')
      EpilogueCode = (OffsetLow = '\x06', UnwindOp = '\x02', OffsetHigh = '\x02')
      FrameOffset = 8710
    }
  }
}
(lldb) p ((UNWIND_INFO*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2500))->UnwindCode[0]
(UNWIND_CODE) $17 = {
   = (CodeOffset = '\x06', UnwindOp = '\x02', OpInfo = '\x02')
  EpilogueCode = (OffsetLow = '\x06', UnwindOp = '\x02', OffsetHigh = '\x02')
  FrameOffset = 8710
}
(lldb) p ((UNWIND_INFO*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2500))->UnwindCode[1]
(UNWIND_CODE) $18 = {
   = (CodeOffset = '\x02', UnwindOp = '\0', OpInfo = '\x03')
  EpilogueCode = (OffsetLow = '\x02', UnwindOp = '\0', OffsetHigh = '\x03')
  FrameOffset = 12290
}
(lldb) p ((UNWIND_INFO*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2500))->UnwindCode[2]
(UNWIND_CODE) $19 = {
   = (CodeOffset = '\x01', UnwindOp = '\0', OpInfo = '\x05')
  EpilogueCode = (OffsetLow = '\x01', UnwindOp = '\0', OffsetHigh = '\x05')
  FrameOffset = 20481
}

The UNWIND_CODE entries above may be hard to read; they are easier to follow alongside the output of COMPlus_JitDump:

Unwind Info:
  >> Start offset   : 0x000000 (not in unwind data)
  >>   End offset   : 0x00006c (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x06
  CountOfUnwindCodes: 3
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x06 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 2 * 8 + 8 = 24 = 0x18
    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
allocUnwindInfo(pHotCode=0x00007F94DE27E920, pColdCode=0x0000000000000000, startOffset=0x0, endOffset=0x6c, unwindSize=0xa, pUnwindBlock=0x0000000002029516, funKind=0 (main function))
Unwind Info:
  >> Start offset   : 0x00006c (not in unwind data)
  >>   End offset   : 0x000099 (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x03
  CountOfUnwindCodes: 3
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x03 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 0 * 8 + 8 = 8 = 0x08
    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
allocUnwindInfo(pHotCode=0x00007F94DE27E920, pColdCode=0x0000000000000000, startOffset=0x6c, endOffset=0x99, unwindSize=0xa, pUnwindBlock=0x0000000002029756, funKind=1 (handler))
Unwind Info:
  >> Start offset   : 0x000099 (not in unwind data)
  >>   End offset   : 0x0000c1 (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x03
  CountOfUnwindCodes: 3
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x03 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 0 * 8 + 8 = 8 = 0x08
    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

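Take the first RUNTIME_FUNCTION (the main function) as an example: it contains 3 UnwindCodes, stored last instruction first, which together record these prolog instructions: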

push rbp
push rbx
sub rsp, 24
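The link between the raw UNWIND_CODE values and these three instructions can be checked with a short Python sketch. It splits the 16-bit FrameOffset view printed by lldb into CodeOffset (low byte), UnwindOp (next 4 bits) and OpInfo (top 4 bits), which matches the fields shown in the dump. Only the two operations that occur in this example are handled, and `decode_unwind_code`, `describe` and `prolog` are hypothetical helper names.

```python
UWOP_PUSH_NONVOL = 0
UWOP_ALLOC_SMALL = 2

# x64 register numbers 0..7.
REGISTERS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

# FrameOffset views of UnwindCode[0..2] as printed by lldb above.
RAW_CODES = [8710, 12290, 20481]


def decode_unwind_code(raw):
    """Split a 16-bit unwind code into (CodeOffset, UnwindOp, OpInfo)."""
    code_offset = raw & 0xFF
    unwind_op = (raw >> 8) & 0xF
    op_info = (raw >> 12) & 0xF
    return code_offset, unwind_op, op_info


def describe(raw):
    """Render the prolog instruction that an unwind code records."""
    _, unwind_op, op_info = decode_unwind_code(raw)
    if unwind_op == UWOP_PUSH_NONVOL:
        return f"push {REGISTERS[op_info]}"
    if unwind_op == UWOP_ALLOC_SMALL:
        return f"sub rsp, {op_info * 8 + 8}"
    raise NotImplementedError(f"unwind op {unwind_op} is not covered here")


def prolog(raw_codes):
    """Codes are stored last instruction first, so reverse them."""
    return [describe(raw) for raw in reversed(raw_codes)]


for line in prolog(RAW_CODES):
    print(line)
```

The CodeOffset of each entry (6, 2, 1) is the offset just past the instruction it describes, which is why the array runs backwards through the prolog.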

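When the CLR walks a call chain, for example A => B => C where it needs to find the caller of C,
it can use the current PC to find the top of the current frame => read the Return Address => use the Return Address to find the top of the previous frame => and repeat until it has found every caller.
This process is also called Stack Walking (or Stack Crawling).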

When looking for root objects, the GC also relies on the unwind information to find every function in the call chain.
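A single unwinding step can be simulated in Python as follows. This is only a sketch under simplifying assumptions: the PC is past the prolog and not inside an epilog, no frame register is involved, and the stack is modelled as a dictionary from address to 8-byte value. The addresses and saved values are made up for illustration, and `unwind_frame` is a hypothetical helper.

```python
# Unwind codes of the main function in stored order (last prolog
# instruction first): sub rsp, 24; push rbx; push rbp.
MAIN_CODES = [("alloc", 24), ("push", "rbx"), ("push", "rbp")]


def unwind_frame(rsp, memory, codes):
    """Undo the prolog and return (return_address, caller_rsp, restored)."""
    restored = {}
    for kind, arg in codes:
        if kind == "alloc":
            rsp += arg                    # undo "sub rsp, N"
        elif kind == "push":
            restored[arg] = memory[rsp]   # undo "push reg"
            rsp += 8
        else:
            raise NotImplementedError(kind)
    return_address = memory[rsp]          # pushed by the caller's call
    return return_address, rsp + 8, restored


# Hypothetical stack: the caller had rsp = 0x1000 before its call.
memory = {
    0x0FF8: 0x7FFF00001234,  # return address
    0x0FF0: 0x2000,          # saved rbp
    0x0FE8: 0x0042,          # saved rbx
}
rsp_in_body = 0x1000 - 8 - 8 - 8 - 24  # after call, two pushes and sub

return_address, caller_rsp, restored = unwind_frame(
    rsp_in_body, memory, MAIN_CODES)
print(hex(return_address), hex(caller_rsp), restored)
```

Repeating this step with the unwind information of the function that contains the return address yields the next caller, and so on up the chain.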

References

https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-tutorial.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/ryujit-overview.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/porting-ryujit.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/building/viewing-jit-dumps.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/project-docs/clr-configuration-knobs.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/building/debugging-instructions.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/clr-abi.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/design-docs/finally-optimizations.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/design-docs/jit-call-morphing.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/type-system.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/type-loader.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/method-descriptor.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/virtual-stub-dispatch.md
https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes(v=vs.110).aspx
https://www.microsoft.com/en-us/research/wp-content/uploads/2001/01/designandimplementationofgenerics.pdf
https://www.cs.rice.edu/~keith/EMBED/dom.pdf
https://www.usenix.org/legacy/events/vee05/full_papers/p132-wimmer.pdf
http://aakinshin.net/ru/blog/dotnet/typehandle/
https://en.wikipedia.org/wiki/List_of_CIL_instructions
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.arn0008c/index.html
http://wiki.osdev.org/X86-64_Instruction_Encoding
https://github.com/dotnet/coreclr/issues/12383
https://github.com/dotnet/coreclr/issues/14414
http://ref.x86asm.net/
https://www.onlinedisassembler.com/odaweb/