Exploring the CoreCLR Source (2): What Is new

Date: 2019-11-19


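In the previous article we looked at how CoreCLR defines Object. In this one we look at how CoreCLR defines and handles new.
Every .NET programmer knows the new keyword and uses it daily, but what does new actually do?

To keep the length manageable and the difficulty from climbing too quickly, this article does not cover the following topics in detail; they are left for later articles.

How the GC allocates memory
How the JIT parses IL
How the JIT generates machine code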

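Terms and abbreviations used

The text below uses a number of terms and abbreviations. Refer back to this list if you run into one you don't recognize.

BasicBlock: a group of instructions belonging to the same branch, linked together in a doubly linked list
GenTree: statement tree; node types start with GT
Importation: the process of generating GenTree from BasicBlock
Lowering: making the statement tree concrete, so that each node maps unambiguously to machine code
SSA: Static Single Assignment
R2R: Ready To Run
Phases: the stages the JIT goes through when compiling IL to machine code
JIT: Just In Time
CEE: CLR Execution Engine
ee: Execution Engine
EH: Exception Handling
Cor: CoreCLR
comp: Compiler
fg: FlowGraph
imp: Import
LDLOCA: Load Local Variable Address
gt: GenTree
hlp: Helper
Ftn: Function
MP: Multi Processor
CER: Constrained Execution Regions
TLS: Thread Local Storage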

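The three kinds of new in .NET

Look at the code in the figure and the IL generated from it: although all three use new, three different IL sequences are produced.

new on a class: the IL instruction is newobj
new on an array: the IL instruction is newarr
new on a struct: because myStruct is already a local variable, new only loads its address with ldloca and then calls the constructor

Let's first see how the newobj and newarr instructions are defined in CoreCLR.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/inc/opcode.def#L153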

OPDEF(CEE_NEWOBJ, "newobj", VarPop, PushRef, InlineMethod, IObjModel, 1, 0xFF, 0x73, CALL)
OPDEF(CEE_NEWARR, "newarr", PopI, PushRef, InlineType, IObjModel, 1, 0xFF, 0x8D, NEXT)

These are the definitions of the two instructions. Their names are CEE_NEWOBJ and CEE_NEWARR; keep these two names in mind.
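A list of OPDEF(...) entries like the one above lends itself to the X-macro pattern, where the same list is expanded several times with different macro definitions. The sketch below illustrates that pattern only; TOY_OPCODES, ToyOpcode, opcodeName, opcodeByte and findOpcode are names made up for this example, and just three of the real columns are kept.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical two-entry opcode list standing in for opcode.def.
// Columns kept: enum name, mnemonic, second encoding byte.
#define TOY_OPCODES(OPDEF)            \
    OPDEF(CEE_NEWOBJ, "newobj", 0x73) \
    OPDEF(CEE_NEWARR, "newarr", 0x8D)

// First expansion: the enum of opcode identifiers.
enum ToyOpcode
{
#define AS_ENUM(name, text, byte) name,
    TOY_OPCODES(AS_ENUM)
#undef AS_ENUM
    CEE_COUNT // number of entries, also used as "not found"
};

// Second expansion: a parallel table of mnemonics.
static const char* const kOpcodeNames[] = {
#define AS_NAME(name, text, byte) text,
    TOY_OPCODES(AS_NAME)
#undef AS_NAME
};

// Third expansion: a parallel table of encoding bytes.
static const unsigned char kOpcodeBytes[] = {
#define AS_BYTE(name, text, byte) byte,
    TOY_OPCODES(AS_BYTE)
#undef AS_BYTE
};

// Returns the mnemonic for an opcode identifier.
const char* opcodeName(ToyOpcode op)
{
    assert(op < CEE_COUNT);
    return kOpcodeNames[op];
}

// Returns the encoding byte for an opcode identifier.
unsigned char opcodeByte(ToyOpcode op)
{
    assert(op < CEE_COUNT);
    return kOpcodeBytes[op];
}

// Reverse lookup by mnemonic; returns CEE_COUNT when nothing matches.
ToyOpcode findOpcode(const char* text)
{
    for (int i = 0; i < CEE_COUNT; ++i)
    {
        if (std::strcmp(kOpcodeNames[i], text) == 0)
            return static_cast<ToyOpcode>(i);
    }
    return CEE_COUNT;
}
```

Because every table is generated from the same list, adding an opcode is a one-line change and the enum and tables cannot drift out of step.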

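What machine code the first kind of new (new on a class) generates

Next we look at how CoreCLR turns the CEE_NEWOBJ instruction into machine code.
Before going further, it helps to have a rough picture of how the JIT works. The JIT compiles one function at a time, and compilation is triggered automatically when a function is called. The steps are:

Convert the function's IL into BasicBlocks
Generate GenTree (statement trees) from the BasicBlocks
Morph the GenTree
Lower the GenTree
Generate machine code from the GenTree

The code below has been trimmed down as much as possible but is still fairly long, so please read it patiently.

We start at the JIT entry function, which is called by the EE (execution engine).
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/inc/corjit.h#L350
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/ee_il_dll.cpp#L279
Note: Microsoft's documentation says CILJit is the implementation for 32-bit and PreJit the implementation for 64-bit, but I could not find where PreJit actually is.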

CorJitResult CILJit::compileMethod(
    ICorJitInfo* compHnd, CORINFO_METHOD_INFO* methodInfo, unsigned flags, BYTE** entryAddress, ULONG* nativeSizeOfCode)
{
    // Some code omitted...
    assert(methodInfo->ILCode);
    result = jitNativeCode(methodHandle, methodInfo->scope, compHnd, methodInfo, &methodCodePtr, nativeSizeOfCode,
                           &jitFlags, nullptr);
    // Some code omitted...
    return CorJitResult(result);
}

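jitNativeCode is a static function responsible for JIT-compiling a single function; internally it creates a separate Compiler instance for the function being compiled.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/compiler.cpp#L6075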

int jitNativeCode(CORINFO_METHOD_HANDLE methodHnd,
                  CORINFO_MODULE_HANDLE classPtr,
                  COMP_HANDLE           compHnd,
                  CORINFO_METHOD_INFO*  methodInfo,
                  void**                methodCodePtr,
                  ULONG*                methodCodeSize,
                  JitFlags*             compileFlags,
                  void*                 inlineInfoPtr)
{
    // Some code omitted...
    pParam->pComp->compInit(pParam->pAlloc, pParam->inlineInfo);
    pParam->pComp->jitFallbackCompile = pParam->jitFallbackCompile;
    // Now generate the code
    pParam->result =
        pParam->pComp->compCompile(pParam->methodHnd, pParam->classPtr, pParam->compHnd, pParam->methodInfo,
                                   pParam->methodCodePtr, pParam->methodCodeSize, pParam->compileFlags);
    // Some code omitted...
    return result;
}

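Compiler::compCompile is the entry function exposed by the Compiler class; its job is likewise to compile a function.
Note that this function takes 7 parameters; shortly we will meet another function with the same name that takes only 3.
This function mainly calls Compiler::compCompileHelper.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/compiler.cpp#L4693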

int Compiler::compCompile(CORINFO_METHOD_HANDLE methodHnd,
                          CORINFO_MODULE_HANDLE classPtr,
                          COMP_HANDLE           compHnd,
                          CORINFO_METHOD_INFO*  methodInfo,
                          void**                methodCodePtr,
                          ULONG*                methodCodeSize,
                          JitFlags*             compileFlags)
{
    // Some code omitted...
    pParam->result = pParam->pThis->compCompileHelper(pParam->classPtr, pParam->compHnd, pParam->methodInfo,
                                                      pParam->methodCodePtr, pParam->methodCodeSize,
                                                      pParam->compileFlags, pParam->instVerInfo);
    // Some code omitted...
    return param.result;
}

Let's continue with Compiler::compCompileHelper.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/compiler.cpp#L5294

int Compiler::compCompileHelper(CORINFO_MODULE_HANDLE            classPtr,
                                COMP_HANDLE                      compHnd,
                                CORINFO_METHOD_INFO*             methodInfo,
                                void**                           methodCodePtr,
                                ULONG*                           methodCodeSize,
                                JitFlags*                        compileFlags,
                                CorInfoInstantiationVerification instVerInfo)
{
    // Some code omitted...
    // Initialize the local variable table
    lvaInitTypeRef();

    // Some code omitted...
    // Find all BasicBlocks
    fgFindBasicBlocks();

    // Some code omitted...
    // Call the 3-parameter compCompile, not the 7-parameter one
    compCompile(methodCodePtr, methodCodeSize, compileFlags);

    // Some code omitted...
    return CORJIT_OK;
}

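Now we reach the 3-parameter compCompile, which Microsoft's own comment calls the most interesting top-level function in the JIT.
Microsoft's JIT overview document, linked in that comment, is worth reading as well.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/compiler.cpp#L4078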

//*********************************************************************************************
// #Phases
//
// This is the most interesting 'toplevel' function in the JIT.  It goes through the operations of
// importing, morphing, optimizations and code generation.  This is called from the EE through the
// code:CILJit::compileMethod function.
//
// For an overview of the structure of the JIT, see:
//   https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md
//
void Compiler::compCompile(void** methodCodePtr, ULONG* methodCodeSize, JitFlags* compileFlags)
{
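    // Some code omitted...
    // Convert BasicBlocks into GenTree (statement trees)
    fgImport();

    // Some code omitted...
    // The various phases run here, such as inlining and optimization

    // Some code omitted...
    // Convert GT_ALLOCOBJ nodes into GT_CALL nodes (allocating memory = calling a helper function)
    ObjectAllocator objectAllocator(this);
    objectAllocator.Run();

    // Some code omitted...
    // Build the local variable table and compute the reference count of each variable
    lvaMarkLocalVars();

    // Some code omitted...
    // Lower the statement trees
    Lowering lower(this, m_pLinearScan); // PHASE_LOWERING
    lower.Run();

    // Some code omitted...
    // Generate machine code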
    codeGen->genGenerateCode(methodCodePtr, methodCodeSize);
}

By now you should have a rough idea of what the JIT does overall.
Next we look at Compiler::fgImport, the function that converts BasicBlocks into GenTree (statement trees).
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/flowgraph.cpp#L6663

void Compiler::fgImport()
{
    // Some code omitted...
    impImport(fgFirstBB);
    // Some code omitted...
}

Next, Compiler::impImport.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/importer.cpp#L9207

void Compiler::impImport(BasicBlock* method)
{
    // Some code omitted...
    /* Import blocks in the worker-list until there are no more */
    while (impPendingList)
    {
        PendingDsc* dsc = impPendingList;
        impPendingList  = impPendingList->pdNext;
        // Some code omitted...
        /* Now import the block */
        impImportBlock(dsc->pdBB);
    }
}
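The loop above is a worklist: blocks wait on a pending list, each one is imported, and importing a block can queue its successors. A minimal sketch of that pattern follows. ToyBlock and importAll are hypothetical names, and the real importer also tracks verification state, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a BasicBlock: successor indices plus a flag.
struct ToyBlock
{
    std::vector<int> successors;
    bool             imported = false;
};

// Imports every block reachable from block 0 exactly once and returns
// the order in which the blocks were imported.
std::vector<int> importAll(std::vector<ToyBlock>& blocks)
{
    std::vector<int> order;
    if (blocks.empty())
        return order;

    std::vector<int>  pending;                      // plays the role of impPendingList
    std::vector<bool> queued(blocks.size(), false); // blocks queued at least once

    pending.push_back(0);
    queued[0] = true;

    while (!pending.empty())
    {
        int id = pending.back();
        pending.pop_back();

        blocks[id].imported = true; // "import" the block
        order.push_back(id);

        // Queue successors that have not been queued before.
        for (int succ : blocks[id].successors)
        {
            assert(succ >= 0 && static_cast<std::size_t>(succ) < blocks.size());
            if (!queued[succ])
            {
                queued[succ] = true;
                pending.push_back(succ);
            }
        }
    }
    return order;
}
```

Blocks that are never reachable from the first block are simply never imported, which is why a worklist suits this job better than a plain loop over all blocks.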

Next, Compiler::impImportBlock.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/importer.cpp#L15321

//***************************************************************
// Import the instructions for the given basic block.  Perform
// verification, throwing an exception on failure.  Push any successor blocks that are enabled for the first
// time, or whose verification pre-state is changed.
void Compiler::impImportBlock(BasicBlock* block)
{
    // Some code omitted...
    pParam->pThis->impImportBlockCode(pParam->block);

}

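In the next function, Compiler::impImportBlockCode, we finally see how the CEE_NEWOBJ instruction is handled.
This function is more than 5000 lines long, so the easiest way to find the part shown below is to search for case CEE_NEWOBJ.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/importer.cpp#L9207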

/*****************************************************************************
 *  Import the instr for the given basic block
 */
void Compiler::impImportBlockCode(BasicBlock* block)
{
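    // Some code omitted...
    // Handle the CEE_NEWOBJ instruction
    case CEE_NEWOBJ:
        // Microsoft's comment below lists three cases:
        // the object is an array, the object is some other variable-sized object (e.g. string), or the object is an ordinary class
        // Here we only analyze the third case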
        // There are three different cases for new
        // Object size is variable (depends on arguments)
        //      1) Object is an array (arrays treated specially by the EE)
        //      2) Object is some other variable sized object (e.g. String)
        //      3) Class Size can be determined beforehand (normal case)
        // In the first case, we need to call a NEWOBJ helper (multinewarray)
        // in the second case we call the constructor with a '0' this pointer
        // In the third case we alloc the memory, then call the constuctor
        
        // Some code omitted...
        // Create a GenTree node of type GT_ALLOCOBJ, used to allocate the memory
        op1 = gtNewAllocObjNode(info.compCompHnd->getNewHelper(&resolvedToken, info.compMethodHnd),
                                resolvedToken.hClass, TYP_REF, op1);

        // Some code omitted...
        // GT_ALLOCOBJ only allocates memory, so the constructor still has to be called
        // This reuses the handling of the CEE_CALL instruction
        goto CALL;

        // Some code omitted...
        CALL: // memberRef should be set.

            // Some code omitted...
            // Create a GenTree node of type GT_CALL, used to call the constructor
            callTyp = impImportCall(opcode, &resolvedToken, constraintCall ? &constrainedResolvedToken : nullptr,
                                    newObjThisPtr, prefixFlags, &callInfo, opcodeOffs);

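Keep in mind the two GenTree nodes created in the code above:

The GT_ALLOCOBJ node allocates memory
The GT_CALL node calls the constructor

In the code above, a newHelper argument is also passed in when the GT_ALLOCOBJ node is created. This newHelper is an identifier (an index value) for the memory allocation function.
CoreCLR has many helper functions (HelperFunc) that JIT-generated code can call.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/jitinterface.cpp#L5894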

CorInfoHelpFunc CEEInfo::getNewHelper(CORINFO_RESOLVED_TOKEN * pResolvedToken, CORINFO_METHOD_HANDLE callerHandle)
{
    // Some code omitted...
    MethodTable* pMT = VMClsHnd.AsMethodTable();

    // Some code omitted...
    result = getNewHelperStatic(pMT);

    // Some code omitted...
    return result;
}

Now CEEInfo::getNewHelperStatic.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/jitinterface.cpp#L5941

CorInfoHelpFunc CEEInfo::getNewHelperStatic(MethodTable * pMT)
{
    // Some code omitted...
    // There are many checks here, e.g. whether the type is a COM object or has a finalizer; by default CORINFO_HELP_NEWFAST is returned
    // Slow helper is the default
    CorInfoHelpFunc helper = CORINFO_HELP_NEWFAST;

    // Some code omitted...
    return helper;
}

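At this point we know the two new nodes carry the following information:

GT_ALLOCOBJ node
- The identifier of the memory allocation helper, CORINFO_HELP_NEWFAST by default
GT_CALL node
- The handle of the constructor

The GenTree produced by fgImport cannot be used directly to generate machine code; it first has to go through many transformation steps.
One of those steps converts the GT_ALLOCOBJ node into a GT_CALL node, because allocating memory is really just a call to a helper function dedicated to the JIT.
This transformation is implemented in ObjectAllocator, which is one of the phases of JIT compilation.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/objectalloc.cpp#L27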

void ObjectAllocator::DoPhase()
{
    // Some code omitted...
    MorphAllocObjNodes();
}

MorphAllocObjNodes walks all the nodes and converts each one that is a GT_ALLOCOBJ.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/objectalloc.cpp#L63

void ObjectAllocator::MorphAllocObjNodes()
{
    // Some code omitted...
    for (GenTreeStmt* stmt = block->firstStmt(); stmt; stmt = stmt->gtNextStmt)
    {
        // Some code omitted...
        bool canonicalAllocObjFound = false;

        // Some code omitted...
        if (op2->OperGet() == GT_ALLOCOBJ)
            canonicalAllocObjFound = true;

        // Some code omitted...
        if (canonicalAllocObjFound)
        {
            // Some code omitted...
            op2 = MorphAllocObjNodeIntoHelperCall(asAllocObj);
        }
    }
}

The definition of MorphAllocObjNodeIntoHelperCall:
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/objectalloc.cpp#L152

// MorphAllocObjNodeIntoHelperCall: Morph a GT_ALLOCOBJ node into an
//                                  allocation helper call.
GenTreePtr ObjectAllocator::MorphAllocObjNodeIntoHelperCall(GenTreeAllocObj* allocObj)
{
    // Some code omitted...
    GenTreePtr helperCall = comp->fgMorphIntoHelperCall(allocObj, allocObj->gtNewHelper, comp->gtNewArgList(op1));
    return helperCall;
}

The definition of fgMorphIntoHelperCall:
This function converts the GT_ALLOCOBJ node into a GT_CALL node and looks up the handle of the memory allocation helper.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/morph.cpp#L61

GenTreePtr Compiler::fgMorphIntoHelperCall(GenTreePtr tree, int helper, GenTreeArgList* args)
{
    tree->ChangeOper(GT_CALL);
    tree->gtFlags |= GTF_CALL;
    
    // Some code omitted...
    // If the helper identifier in GT_ALLOCOBJ is CORINFO_HELP_NEWFAST, this is eeFindHelper(CORINFO_HELP_NEWFAST)
    // eeFindHelper converts a helper function identifier into a helper function handle
    tree->gtCall.gtCallType            = CT_HELPER;
    tree->gtCall.gtCallMethHnd         = eeFindHelper(helper);

    // Some code omitted...
    tree = fgMorphArgs(tree->AsCall());
    return tree;
}

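At this point the two new nodes have become:

GT_CALL node (calls a helper function)
- The handle of the memory allocation helper
GT_CALL node (calls a managed function)
- The handle of the constructor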
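The import-then-morph sequence for newobj can be sketched with a toy node type. Everything here (ToyNode, importNewObj, morphAllocObjNodes and the TOY_ enums) is hypothetical and only mirrors the shape of the real code: the importer emits an allocation node plus a constructor call, and a later phase rewrites the allocation node in place into a helper call.

```cpp
#include <cassert>
#include <vector>

enum ToyOper { TOY_ALLOCOBJ, TOY_CALL };
enum ToyCallType { TOY_CT_NONE, TOY_CT_HELPER, TOY_CT_USER };
enum ToyHelper { TOY_HELP_NEWFAST = 1 };

// Hypothetical stand-in for a GenTree node.
struct ToyNode
{
    ToyOper     oper;
    ToyCallType callType;
    int         target; // helper id for allocations, method id for user calls
};

// What the importer produces for newobj in this sketch:
// one allocation node followed by one constructor call.
std::vector<ToyNode> importNewObj(int ctorMethodId)
{
    return { ToyNode{TOY_ALLOCOBJ, TOY_CT_NONE, TOY_HELP_NEWFAST},
             ToyNode{TOY_CALL, TOY_CT_USER, ctorMethodId} };
}

// Rewrites every allocation node in place into a helper call and
// returns how many nodes were rewritten.
int morphAllocObjNodes(std::vector<ToyNode>& nodes)
{
    int morphed = 0;
    for (ToyNode& node : nodes)
    {
        if (node.oper == TOY_ALLOCOBJ)
        {
            assert(node.callType == TOY_CT_NONE);
            node.oper     = TOY_CALL;      // like tree->ChangeOper(GT_CALL)
            node.callType = TOY_CT_HELPER; // like gtCallType = CT_HELPER
            ++morphed;                     // the helper id in target is kept
        }
    }
    return morphed;
}
```

After the morph step both nodes are calls, and the only thing that distinguishes them is the call type and what the target refers to.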

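The JIT does a great deal more processing on the GenTree after this, which we skip here. Next we look at machine code generation.
CodeGen::genCallInstruction is responsible for turning a GT_CALL node into assembly.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/codegenxarch.cpp#L5934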

// Produce code for a GT_CALL node
void CodeGen::genCallInstruction(GenTreePtr node)
{
    // Some code omitted...
    if (callType == CT_HELPER)
    {
        // Convert the handle back into the helper function identifier, CORINFO_HELP_NEWFAST by default
        helperNum = compiler->eeGetHelperNum(methHnd);
        // Get a pointer to the helper function
        // This is equivalent to calling compiler->compGetHelperFtn(CORINFO_HELP_NEWFAST, ...)
        addr = compiler->compGetHelperFtn(helperNum, (void**)&pAddr);
    }
    else
    {
        // Call an ordinary function
        // Direct call to a non-virtual user function.
        addr = call->gtDirectCallAddress;
    }
}

Let's see which function compGetHelperFtn actually maps CORINFO_HELP_NEWFAST to.
The definition of compGetHelperFtn:
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/compiler.hpp#L1907

void* Compiler::compGetHelperFtn(CorInfoHelpFunc ftnNum,        /* IN  */
                                 void**          ppIndirection) /* OUT */
{
    // Some code omitted...
    addr = info.compCompHnd->getHelperFtn(ftnNum, ppIndirection);
    return addr;
}

The definition of getHelperFtn:
Here we can see that the function is fetched from the hlpDynamicFuncTable function table.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/jitinterface.cpp#L10369

void* CEEJitInfo::getHelperFtn(CorInfoHelpFunc    ftnNum,         /* IN  */
                               void **            ppIndirection)  /* OUT */
{
    // Some code omitted...
    pfnHelper = hlpDynamicFuncTable[dynamicFtnNum].pfnHelper;

    // Some code omitted...
    result = (LPVOID)GetEEFuncEntryPoint(pfnHelper);
    return result;
}

The hlpDynamicFuncTable function table is built from the definitions in jithelpers.h, where the function for CORINFO_HELP_NEWFAST is as follows:
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/inc/jithelpers.h#L78

JITHELPER(CORINFO_HELP_NEWFAST,                     JIT_New,    CORINFO_HELP_SIG_REG_ONLY)

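The entry maps to JIT_New, which is the function JIT-generated code calls to allocate memory. JIT_New is defined as shown below.
Note that under certain conditions the JIT_New entry in the function table is replaced with a faster implementation that does the same job; this is covered later.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/jithelpers.cpp#L2908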

HCIMPL1(Object*, JIT_New, CORINFO_CLASS_HANDLE typeHnd_)
{
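    // Some code omitted...
    MethodTable *pMT = typeHnd.AsMethodTable();

    // Some code omitted...
    // AllocateObject is the function that allocates the memory; it is meant to be called by CoreCLR's internal code or by unmanaged code
    // JIT_New is a wrapper around it that is only meant to be called by JIT-generated code
    newobj = AllocateObject(pMT);

    // Some code omitted...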
    return(OBJECTREFToObject(newobj));
}
HCIMPLEND

Summary:
From CEE_NEWOBJ the JIT generates two pieces of code: one calls JIT_New to allocate memory, the other calls the constructor.
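The helper lookup traced above boils down to a table of function pointers indexed by a helper identifier, where an entry can later be swapped for a faster implementation. The sketch below shows only that idea; ToyHelperTable, ToyHelperId and the toy helper functions are hypothetical, and their return values merely tag which implementation ran.

```cpp
#include <cassert>

enum ToyHelperId { TOY_HELP_NEW, TOY_HELP_NEWARR, TOY_HELP_COUNT };

using ToyHelperFn = int (*)(int);

// Portable helpers; the result encodes which one was called.
int toyNewSlow(int size)    { return size + 1000; }
int toyNewArrSlow(int size) { return size + 2000; }

// Hypothetical faster replacement for the "new" helper.
int toyNewFast(int size)    { return size + 3000; }

// A helper table: identifier in, function pointer out.
struct ToyHelperTable
{
    ToyHelperFn entries[TOY_HELP_COUNT] = { toyNewSlow, toyNewArrSlow };

    // Looks up the function currently registered for an identifier.
    ToyHelperFn get(ToyHelperId id) const
    {
        assert(id < TOY_HELP_COUNT);
        return entries[id];
    }

    // Replaces one entry, e.g. with a platform-specific fast path.
    void replace(ToyHelperId id, ToyHelperFn fn)
    {
        assert(id < TOY_HELP_COUNT && fn != nullptr);
        entries[id] = fn;
    }
};
```

Code that calls through the table does not need to know whether it reaches the default helper or a replacement, which is what lets the runtime install faster variants without changing the JIT.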

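What machine code the second kind of new (new on an array) generates

Let's see how the CEE_NEWARR instruction is handled. Since the handling of CEE_NEWOBJ was covered at length above, only the differences are shown here.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/importer.cpp#L13334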

/*****************************************************************************
 *  Import the instr for the given basic block
 */
void Compiler::impImportBlockCode(BasicBlock* block)
{
    // Some code omitted...
    // Handle the CEE_NEWARR instruction
    case CEE_NEWARR:

        // Some code omitted...
        args = gtNewArgList(op1, op2);

        // Create a GT_CALL node that calls the helper function
        /* Create a call to 'new' */
        // Note that this only works for shared generic code because the same helper is used for all
        // reference array types
        op1 = gtNewHelperCallNode(info.compCompHnd->getNewArrHelper(resolvedToken.hClass), TYP_REF, 0, args);
}

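As we can see, CEE_NEWARR produces a GT_CALL node directly, with no further conversion needed, unlike CEE_NEWOBJ.
getNewArrHelper returns the helper function to call, so let's look at getNewArrHelper.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/jitinterface.cpp#L6035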

/***********************************************************************/
// <REVIEW> this only works for shared generic code because all the
// helpers are actually the same. If they were different then things might
// break because the same helper would end up getting used for different but
// representation-compatible arrays (e.g. one with a default constructor
// and one without) </REVIEW>
CorInfoHelpFunc CEEInfo::getNewArrHelper (CORINFO_CLASS_HANDLE arrayClsHnd)
{
    // Some code omitted...
    TypeHandle arrayType(arrayClsHnd);
    result = getNewArrHelperStatic(arrayType);

    // Some code omitted...
    return result;
}

Next, getNewArrHelperStatic. For element types that are generic variables or object references it returns CORINFO_HELP_NEWARR_1_OBJ.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/jitinterface.cpp#L6060

CorInfoHelpFunc CEEInfo::getNewArrHelperStatic(TypeHandle clsHnd)
{
    // Some code omitted...
    if (CorTypeInfo::IsGenericVariable(elemType))
    {
        result = CORINFO_HELP_NEWARR_1_OBJ;
    }
    else if (CorTypeInfo::IsObjRef(elemType))
    {
        // It is an array of object refs
        result = CORINFO_HELP_NEWARR_1_OBJ;
    }
    else
    {
        // These cases always must use the slow helper
        // Some code omitted...
    }
    return result;
}

The function corresponding to CORINFO_HELP_NEWARR_1_OBJ is as follows:
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/inc/jithelpers.h#L86

DYNAMICJITHELPER(CORINFO_HELP_NEWARR_1_OBJ, JIT_NewArr1,CORINFO_HELP_SIG_REG_ONLY)

It maps to JIT_NewArr1, a helper function wrapped for the JIT to call.
As with JIT_New, it is replaced with a faster implementation under certain conditions.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/jithelpers.cpp#L3303

HCIMPL2(Object*, JIT_NewArr1, CORINFO_CLASS_HANDLE arrayTypeHnd_, INT_PTR size)
{
    // Some code omitted...
    CorElementType elemType = pArrayClassRef->GetArrayElementTypeHandle().GetSignatureCorElementType();

    if (CorTypeInfo::IsPrimitiveType(elemType))
    {
        // Some code omitted...
        // For primitive element types (int, double, etc.) use the faster FastAllocatePrimitiveArray function
        newArray = FastAllocatePrimitiveArray(pArrayClassRef->GetMethodTable(), static_cast<DWORD>(size), bAllocateInLargeHeap);
    }
    else
    {
        // Some code omitted...
        // Otherwise fall back to AllocateArrayEx
        INT32 size32 = (INT32)size;
        newArray = AllocateArrayEx(typeHnd, &size32, 1);
    }

    // Some code omitted...
    return(OBJECTREFToObject(newArray));
}
HCIMPLEND

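Summary:
From CEE_NEWARR the JIT generates only one piece of code: the call to JIT_NewArr1.

What machine code the third kind of new (new on a struct) generates

This kind of new allocates memory on the stack, so no memory allocation function needs to be called.
In the example at the beginning, myStruct is already defined as a local variable at compile time, and the memory needed by local variables is allocated all at once on function entry.
Let's first see how the memory required by local variables is computed.

First, Compiler::lvaAssignVirtualFrameOffsetsToLocals.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/lclvars.cpp#L4863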

/*****************************************************************************
 *  lvaAssignVirtualFrameOffsetsToLocals() : Assign virtual stack offsets to
 *  locals, temps, and anything else.  These will all be negative offsets
 *  (stack grows down) relative to the virtual '0'/return address
 */
void Compiler::lvaAssignVirtualFrameOffsetsToLocals()
{
    // Some code omitted...
    for (cur = 0; alloc_order[cur]; cur++)
    {
        // Some code omitted...
        for (lclNum = 0, varDsc = lvaTable; lclNum < lvaCount; lclNum++, varDsc++)
        {
            // Some code omitted...
            // Reserve the stack space for this variable
            stkOffs = lvaAllocLocalAndSetVirtualOffset(lclNum, lvaLclSize(lclNum), stkOffs);
        }
    }
}

Next, Compiler::lvaAllocLocalAndSetVirtualOffset.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/lclvars.cpp#L5537

int Compiler::lvaAllocLocalAndSetVirtualOffset(unsigned lclNum, unsigned size, int stkOffs)
{
    // Some code omitted...
    /* Reserve space on the stack by bumping the frame size */
    lvaIncrementFrameSize(size);
    stkOffs -= size;
    lvaTable[lclNum].lvStkOffs = stkOffs;

    // Some code omitted...
    return stkOffs;
}

Next, Compiler::lvaIncrementFrameSize.
The size ends up being added to compLclFrameSize, which is the total amount of stack memory the current function needs to allocate.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/lclvars.cpp#L3528

inline void Compiler::lvaIncrementFrameSize(unsigned size)
{
    if (size > MAX_FrameSize || compLclFrameSize + size > MAX_FrameSize)
    {
        BADCODE("Frame size overflow");
    }
    compLclFrameSize += size;
}
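Taken together, the three functions above give each local a negative offset from a virtual zero and add up a total frame size. Here is a minimal sketch of that bookkeeping, with hypothetical names (ToyFrameLayout, layoutLocals) and with alignment and allocation order deliberately ignored. The assert stands in for the BADCODE guard and is not how the real code reports the error.

```cpp
#include <cassert>
#include <vector>

// Result of laying out a function's locals in this sketch.
struct ToyFrameLayout
{
    std::vector<int> offsets;   // one negative offset per local
    unsigned         frameSize; // total bytes to subtract from the stack pointer
};

// Assigns each local a slot below the previous one (the stack grows down)
// and accumulates the total frame size.
ToyFrameLayout layoutLocals(const std::vector<unsigned>& localSizes, unsigned maxFrameSize)
{
    ToyFrameLayout layout{ {}, 0 };
    int stkOffs = 0;

    for (unsigned size : localSizes)
    {
        // Same condition as the guard in lvaIncrementFrameSize.
        assert(size <= maxFrameSize && layout.frameSize + size <= maxFrameSize);

        layout.frameSize += size;          // like compLclFrameSize += size
        stkOffs -= static_cast<int>(size); // like stkOffs -= size
        layout.offsets.push_back(stkOffs); // like lvStkOffs = stkOffs
    }
    return layout;
}
```

For locals of 8, 16 and 4 bytes this yields offsets -8, -24 and -28 and a frame size of 28, so a single subtraction from the stack pointer in the prolog reserves room for all of them.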

Now for the code that generates the machine code. The code that allocates stack memory is emitted in CodeGen::genFnProlog.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/codegencommon.cpp#L8140

void CodeGen::genFnProlog()
{
    // Some code omitted...
    // ARM64 calls this at a different point than other platforms, but with the same arguments
    genAllocLclFrame(compiler->compLclFrameSize, initReg, &initRegZeroed, intRegState.rsCalleeRegArgMaskLiveIn);
}

Next, CodeGen::genAllocLclFrame. This is where the stack memory is allocated: it simply subtracts frameSize from rsp (esp).
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/jit/codegencommon.cpp#L5846

/*-----------------------------------------------------------------------------
 *
 *  Probe the stack and allocate the local stack frame: subtract from SP.
 *  On ARM64, this only does the probing; allocating the frame is done when callee-saved registers are saved.
 */
void CodeGen::genAllocLclFrame(unsigned frameSize, regNumber initReg, bool* pInitRegZeroed, regMaskTP maskArgRegsLiveIn)
{
    // Some code omitted...
    //      sub esp, frameSize   6
    inst_RV_IV(INS_sub, REG_SPBASE, frameSize, EA_PTRSIZE);
}

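Summary:
For new on a struct, the JIT emits code that allocates all stack memory in one place, which is why you don't see a "new struct" instruction in the IL.
The code that calls the constructor is generated from the call instruction that follows.

What the first kind of new (new on a class) does

From the analysis above we know that the first kind of new first calls JIT_New to allocate memory and then calls the constructor.
The JIT_New source shown earlier calls AllocateObject internally.

First, the AllocateObject function.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/gchelpers.cpp#L931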

// AllocateObject will throw OutOfMemoryException so don't need to check
// for NULL return value from it.
OBJECTREF AllocateObject(MethodTable *pMT
#ifdef FEATURE_COMINTEROP
                         , bool fHandleCom
#endif
    )
{
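    // Some code omitted...
    Object     *orObject = NULL;

    // If the type has a critical finalizer, prepare all related methods ahead of time (search for CER for details)
    // This is done only once per type
    if (pMT->HasCriticalFinalizer())
        PrepareCriticalFinalizerObject(pMT);

    // Some code omitted...
    DWORD baseSize = pMT->GetBaseSize();

    // Call the GC helper to allocate memory: AllocAlign8 if 8-byte alignment is required, otherwise Alloc
    if (pMT->RequiresAlign8())
    {
        // Some code omitted...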
        orObject = (Object *) AllocAlign8(baseSize,
                                          pMT->HasFinalizer(),
                                          pMT->ContainsPointers(),
                                          pMT->IsValueType());
    }
    else
    {
        orObject = (Object *) Alloc(baseSize,
                                    pMT->HasFinalizer(),
                                    pMT->ContainsPointers());
    }

    // Check that the sync block index (SyncBlock) is 0
    // verify zero'd memory (at least for sync block)
    _ASSERTE( orObject->HasEmptySyncBlockInfo() );

    // Set the type information (MethodTable)
    if ((baseSize >= LARGE_OBJECT_SIZE))
    {
        orObject->SetMethodTableForLargeObject(pMT);
        GCHeap::GetGCHeap()->PublishObject((BYTE*)orObject);
    }
    else
    {
        orObject->SetMethodTable(pMT);
    }
    
    // Some code omitted...
    return UNCHECKED_OBJECTREF_TO_OBJECTREF(oref);
}

Next, the Alloc function.
Source:

// There are only three ways to get into allocate an object.
//     * Call optimized helpers that were generated on the fly. This is how JIT compiled code does most
//         allocations, however they fall back code:Alloc, when for all but the most common code paths. These
//         helpers are NOT used if profiler has asked to track GC allocation (see code:TrackAllocations)
//     * Call code:Alloc - When the jit helpers fall back, or we do allocations within the runtime code
//         itself, we ultimately call here.
//     * Call code:AllocLHeap - Used very rarely to force allocation to be on the large object heap.
//     
// While this is a choke point into allocating an object, it is primitive (it does not want to know about
// MethodTable and thus does not initialize that poitner. It also does not know if the object is finalizable
// or contains pointers. Thus we quickly wrap this function in more user-friendly ones that know about
// MethodTables etc. (see code:FastAllocatePrimitiveArray code:AllocateArrayEx code:AllocateObject)
// 
// You can get an exhaustive list of code sites that allocate GC objects by finding all calls to
// code:ProfilerObjectAllocatedCallback (since the profiler has to hook them all).
inline Object* Alloc(size_t size, BOOL bFinalize, BOOL bContainsPointers )
{
    // Some code omitted...
    // We don't want to throw an SO during the GC, so make sure we have plenty
    // of stack before calling in.
    INTERIOR_STACK_PROBE_FOR(GetThread(), static_cast<unsigned>(DEFAULT_ENTRY_PROBE_AMOUNT * 1.5));
    if (GCHeapUtilities::UseAllocationContexts())
        retVal = GCHeapUtilities::GetGCHeap()->Alloc(GetThreadAllocContext(), size, flags);
    else
        retVal = GCHeapUtilities::GetGCHeap()->Alloc(size, flags);

    if (!retVal)
    {
        ThrowOutOfMemory();
    }

    END_INTERIOR_STACK_PROBE;
    return retVal;
}

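Summary:
The first kind of new mainly does the following:

Call JIT_New
- Request a block of memory from the GCHeap
- Set the type information (MethodTable)
- The sync block index is 0 by default and does not need to be set
Call the constructor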
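The two steps in this summary, allocation followed by construction, can be sketched as two separate functions working on the same storage. This is only an illustration: ToyMethodTable, toyAllocateObject, toyConstructor and toyGetMethodTable are hypothetical, the storage is a plain byte vector rather than GC memory, and the object header that precedes a real object is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical type descriptor; only the instance size matters here.
struct ToyMethodTable
{
    std::size_t baseSize;
};

// Step 1: hand out zeroed storage of the type's base size and stamp the
// type pointer into the first pointer-sized slot.
std::vector<unsigned char> toyAllocateObject(const ToyMethodTable* mt)
{
    assert(mt != nullptr && mt->baseSize >= sizeof(mt));
    std::vector<unsigned char> storage(mt->baseSize); // all bytes start at 0
    std::memcpy(storage.data(), &mt, sizeof(mt));
    return storage;
}

// Step 2: the "constructor" runs on storage that already exists and
// writes one int field right after the type pointer.
void toyConstructor(std::vector<unsigned char>& self, int value)
{
    assert(self.size() >= sizeof(void*) + sizeof(value));
    std::memcpy(self.data() + sizeof(void*), &value, sizeof(value));
}

// Reads the type pointer back out of the first slot.
const ToyMethodTable* toyGetMethodTable(const std::vector<unsigned char>& self)
{
    assert(self.size() >= sizeof(void*));
    const ToyMethodTable* mt = nullptr;
    std::memcpy(&mt, self.data(), sizeof(mt));
    return mt;
}
```

Between the two steps the object already has its type set and every field is zero, which matches the state the constructor sees when it starts running.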

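What the second kind of new (new on an array) does

The second kind of new only calls JIT_NewArr1. From the JIT_NewArr1 source shown earlier:
if the element type is a primitive type (int, double, etc.) it calls FastAllocatePrimitiveArray, otherwise it calls AllocateArrayEx.

First, the FastAllocatePrimitiveArray function.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/gchelpers.cpp#L563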

/*
 * Allocates a single dimensional array of primitive types.
 */
OBJECTREF   FastAllocatePrimitiveArray(MethodTable* pMT, DWORD cElements, BOOL bAllocateInLargeHeap)
{
    // Some code omitted...
    // Check that the element count does not exceed a hard limit
    SIZE_T componentSize = pMT->GetComponentSize();
    if (cElements > MaxArrayLength(componentSize))
        ThrowOutOfMemory();

    // Check that the total size does not overflow
    S_SIZE_T safeTotalSize = S_SIZE_T(cElements) * S_SIZE_T(componentSize) + S_SIZE_T(pMT->GetBaseSize());
    if (safeTotalSize.IsOverflow())
        ThrowOutOfMemory();

    size_t totalSize = safeTotalSize.Value();

    // Some code omitted...
    // Call the GC helper to allocate memory
    ArrayBase* orObject;
    if (bAllocateInLargeHeap)
    {
        orObject = (ArrayBase*) AllocLHeap(totalSize, FALSE, FALSE);
    }
    else 
    {
        ArrayTypeDesc *pArrayR8TypeDesc = g_pPredefinedArrayTypes[ELEMENT_TYPE_R8];
        if (DATA_ALIGNMENT < sizeof(double) && pArrayR8TypeDesc != NULL && pMT == pArrayR8TypeDesc->GetMethodTable() && totalSize < LARGE_OBJECT_SIZE - MIN_OBJECT_SIZE) 
        {
            // Creation of an array of doubles, not in the large object heap.
            // We want to align the doubles to 8 byte boundaries, but the GC gives us pointers aligned
            // to 4 bytes only (on 32 bit platforms). To align, we ask for 12 bytes more to fill with a
            // dummy object.
            // If the GC gives us a 8 byte aligned address, we use it for the array and place the dummy
            // object after the array, otherwise we put the dummy object first, shifting the base of
            // the array to an 8 byte aligned address.
            // Note: on 64 bit platforms, the GC always returns 8 byte aligned addresses, and we don't
            // execute this code because DATA_ALIGNMENT < sizeof(double) is false.

            _ASSERTE(DATA_ALIGNMENT == sizeof(double)/2);
            _ASSERTE((MIN_OBJECT_SIZE % sizeof(double)) == DATA_ALIGNMENT);   // used to change alignment
            _ASSERTE(pMT->GetComponentSize() == sizeof(double));
            _ASSERTE(g_pObjectClass->GetBaseSize() == MIN_OBJECT_SIZE);
            _ASSERTE(totalSize < totalSize + MIN_OBJECT_SIZE);
            orObject = (ArrayBase*) Alloc(totalSize + MIN_OBJECT_SIZE, FALSE, FALSE);

            Object *orDummyObject;
            if((size_t)orObject % sizeof(double))
            {
                orDummyObject = orObject;
                orObject = (ArrayBase*) ((size_t)orObject + MIN_OBJECT_SIZE);
            }
            else
            {
                orDummyObject = (Object*) ((size_t)orObject + totalSize);
            }
            _ASSERTE(((size_t)orObject % sizeof(double)) == 0);
            orDummyObject->SetMethodTable(g_pObjectClass);
        }
        else
        {
            orObject = (ArrayBase*) Alloc(totalSize, FALSE, FALSE);
            bPublish = (totalSize >= LARGE_OBJECT_SIZE);
        }
    }

    // Set the type information (MethodTable)
    // Initialize Object
    orObject->SetMethodTable( pMT );
    _ASSERTE(orObject->GetMethodTable() != NULL);
    
    // Set the array length
    orObject->m_NumComponents = cElements;

    // Some code omitted...
    return( ObjectToOBJECTREF((Object*)orObject) );
}
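The size computation at the top of FastAllocatePrimitiveArray is count * componentSize + baseSize with an overflow check. The sketch below does the same arithmetic with explicit checks instead of S_SIZE_T; arrayTotalSize is a hypothetical name, and the base size of 24 used in the example afterwards is an assumption, not a value taken from the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Computes count * componentSize + baseSize into *total.
// Returns false, leaving *total untouched, if the result would overflow.
bool arrayTotalSize(std::size_t count, std::size_t componentSize,
                    std::size_t baseSize, std::size_t* total)
{
    assert(total != nullptr);
    const std::size_t maxValue = std::numeric_limits<std::size_t>::max();

    // Would the multiplication wrap around?
    if (componentSize != 0 && count > maxValue / componentSize)
        return false;
    const std::size_t payload = count * componentSize;

    // Would adding the base size wrap around?
    if (payload > maxValue - baseSize)
        return false;

    *total = payload + baseSize;
    return true;
}
```

With 888 elements of 4 bytes and an assumed base size of 24, the total is 888 * 4 + 24 = 3576 bytes. A count near the maximum of size_t is rejected, which is the case where the real code throws an out-of-memory error.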

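Next, the AllocateArrayEx function. Compared with the function above, it additionally handles multi-dimensional arrays.
JIT_NewArr1 passes 3 arguments when calling AllocateArrayEx; the remaining 2 parameters are optional.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/gchelpers.cpp#L282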

// Handles arrays of arbitrary dimensions
//
// If dwNumArgs is set to greater than 1 for a SZARRAY this function will recursively 
// allocate sub-arrays and fill them in.  
//
// For arrays with lower bounds, pBounds is <lower bound 1>, <count 1>, <lower bound 2>, ...
OBJECTREF AllocateArrayEx(TypeHandle arrayType, INT32 *pArgs, DWORD dwNumArgs, BOOL bAllocateInLargeHeap 
                          DEBUG_ARG(BOOL bDontSetAppDomain))
{
    // Some code omitted...
    ArrayBase * orArray = NULL;

    // Some code omitted...
    // Call the GC helper to allocate memory
    if (bAllocateInLargeHeap)
    {
        orArray = (ArrayBase *) AllocLHeap(totalSize, FALSE, pArrayMT->ContainsPointers());
        // Set the type information (MethodTable)
        orArray->SetMethodTableForLargeObject(pArrayMT);
    }
    else
    {
#ifdef FEATURE_64BIT_ALIGNMENT
        MethodTable *pElementMT = arrayDesc->GetTypeParam().GetMethodTable();
        if (pElementMT->RequiresAlign8() && pElementMT->IsValueType())
        {
            // This platform requires that certain fields are 8-byte aligned (and the runtime doesn't provide
            // this guarantee implicitly, e.g. on 32-bit platforms). Since it's the array payload, not the
            // header that requires alignment we need to be careful. However it just so happens that all the
            // cases we care about (single and multi-dim arrays of value types) have an even number of DWORDs
            // in their headers so the alignment requirements for the header and the payload are the same.
            _ASSERTE(((pArrayMT->GetBaseSize() - SIZEOF_OBJHEADER) & 7) == 0);
            orArray = (ArrayBase *) AllocAlign8(totalSize, FALSE, pArrayMT->ContainsPointers(), FALSE);
        }
        else
#endif
        {
            orArray = (ArrayBase *) Alloc(totalSize, FALSE, pArrayMT->ContainsPointers());
        }
        // Set the type information (MethodTable)
        orArray->SetMethodTable(pArrayMT);
    }
    
    // Set the array length
    // Initialize Object
    orArray->m_NumComponents = cElements;

    // Some code omitted...
    return ObjectToOBJECTREF((Object *) orArray);
}

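Summary:
The second kind of new mainly does the following:

Call JIT_NewArr1
- Request a block of memory from the GCHeap
- Set the type information (MethodTable)
- Set the array length (m_NumComponents)
- No constructor is called, so the contents are all zero (every element has its default value)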
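One detail of FastAllocatePrimitiveArray worth isolating is how it aligns arrays of double on 32-bit platforms: it asks for room for one extra minimum-size object and places that dummy object either before or after the array, depending on the address the GC returns. The sketch below reproduces only the address arithmetic, on plain integers. ToyPlacement and placeDoubleArray are hypothetical names, and the minimum object size of 12 used in the example afterwards comes from the source comment's "12 bytes more".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Where the array and the dummy padding object end up.
struct ToyPlacement
{
    std::uintptr_t arrayAddr;
    std::uintptr_t dummyAddr;
};

// rawAddr is the start of a block of totalSize + minObjectSize bytes that
// is only guaranteed to be 4-byte aligned. Returns a layout in which the
// array starts on an 8-byte boundary.
ToyPlacement placeDoubleArray(std::uintptr_t rawAddr, std::size_t totalSize,
                              std::size_t minObjectSize)
{
    assert(rawAddr % 4 == 0);       // allocator guarantee in this sketch
    assert(minObjectSize % 8 == 4); // the dummy object flips the alignment

    if (rawAddr % 8 != 0)
    {
        // Misaligned: dummy object first, array right after it.
        return ToyPlacement{ rawAddr + static_cast<std::uintptr_t>(minObjectSize), rawAddr };
    }
    // Already aligned: array first, dummy object after it.
    return ToyPlacement{ rawAddr, rawAddr + static_cast<std::uintptr_t>(totalSize) };
}
```

For a raw address of 0x1004 and a minimum object size of 12, the array lands at 0x1010; for a raw address of 0x1008 it stays at 0x1008 and the dummy object follows the array.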

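What the third kind of new (new on a struct) does

new on a struct neither requests memory from the GCHeap nor sets type information (MethodTable), so we can go straight to the summary.

Summary:
The third kind of new mainly does the following:

Allocate memory from the stack, all at once, on function entry
- The allocated memory contains neither a sync block index (SyncBlock) nor type information (MethodTable)
Call the constructor

Verifying the first kind of new (new on a class)

Open the Disassembly and Memory windows in Visual Studio and let's see what the first kind of new actually does.

The disassembly of the first kind of new is shown below; there are two calls in total.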

00007FF919570B53  mov         rcx,7FF9194161A0h  // set the first argument (pointer to the MethodTable)
00007FF919570B5D  call        00007FF97905E350  // call the memory allocation function, JIT_New by default
00007FF919570B62  mov         qword ptr [rbp+38h],rax  // store the address in a temporary (rbp+38)
00007FF919570B66  mov         r8,37BFC73068h  
00007FF919570B70  mov         r8,qword ptr [r8]  // set the third argument ("hello")
00007FF919570B73  mov         rcx,qword ptr [rbp+38h]  // set the first argument (this)
00007FF919570B77  mov         edx,12345678h  // set the second argument (0x12345678)
00007FF919570B7C  call        00007FF9195700B8  // call the constructor
00007FF919570B81  mov         rcx,qword ptr [rbp+38h]  
00007FF919570B85  mov         qword ptr [rbp+50h],rcx  // copy the temporary into the myClass variable

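The first call is the helper function used to allocate memory, JIT_New by default.
Here, however, the function actually called is not JIT_New but JIT_TrialAllocSFastMP_InlineGetThread, an optimized version that allocates memory quickly from the allocation context.
Let's look at the definition of JIT_TrialAllocSFastMP_InlineGetThread.

Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/amd64/JitHelpers_InlineGetThread.asm#L59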

; IN: rcx: MethodTable*
; OUT: rax: new object
LEAF_ENTRY JIT_TrialAllocSFastMP_InlineGetThread, _TEXT
        mov     edx, [rcx + OFFSET__MethodTable__m_BaseSize] ; read the allocation size from the MethodTable into edx

        ; m_BaseSize is guaranteed to be a multiple of 8.
        PATCHABLE_INLINE_GETTHREAD r11, JIT_TrialAllocSFastMP_InlineGetThread__PatchTLSOffset
        mov     r10, [r11 + OFFSET__Thread__m_alloc_context__alloc_limit] ; load the allocation context's limit address into r10
        mov     rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] ; load the allocation context's current address into rax

        add     rdx, rax ; rdx = current address + allocation size

        cmp     rdx, r10 ; check whether the allocation fits in the allocation context
        ja      AllocFailed ; if (rdx > r10)

        mov     [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], rdx ; store the new current address
        mov     [rax], rcx ; set the MethodTable on the newly allocated memory

ifdef _DEBUG
        call    DEBUG_TrialAllocSetAppDomain_NoScratchArea
endif ; _DEBUG

        ret ; allocation succeeded, return

    AllocFailed:
        jmp     JIT_NEW ; allocation failed, fall back to the default JIT_New function
LEAF_END JIT_TrialAllocSFastMP_InlineGetThread, _TEXT

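As we can see, while the allocation context still has room, memory is allocated from it; once it is used up, JIT_New is called to do more work.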
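The fast path in the assembly above is a bump-pointer allocation: add the size to the current pointer, compare against the limit, and either commit the new pointer or give up. A minimal sketch of the same logic follows. ToyAllocContext and tryBumpAlloc are hypothetical names, addresses are plain integers rather than real memory, and returning 0 stands in for jumping to the slow path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-thread allocation context.
struct ToyAllocContext
{
    std::uintptr_t allocPtr;   // next free address
    std::uintptr_t allocLimit; // end of the region owned by this context
};

// Tries to carve size bytes out of the context. Returns the start address
// of the new object, or 0 when the caller must take the slow path.
std::uintptr_t tryBumpAlloc(ToyAllocContext& ctx, std::size_t size)
{
    assert(size % 8 == 0); // base sizes are multiples of 8 in the source above

    const std::uintptr_t start = ctx.allocPtr;
    const std::uintptr_t end   = start + size; // add rdx, rax

    if (end > ctx.allocLimit)                  // cmp rdx, r10 / ja AllocFailed
        return 0;                              // context left unchanged

    ctx.allocPtr = end;                        // store the new current address
    return start;
}
```

No lock is needed on this path because each thread bumps its own context, which is what makes it so much cheaper than going through the general allocator every time.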
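The second call invokes the constructor. The call target does not match the address below, probably because there is a wrapper layer in between; what that wrapper does has not been worked out yet.

The last call invokes JIT_WriteBarrier.

Verifying the second kind of new (new on an array)

The disassembly shows that the second kind of new has only one call.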

00007FF919570B93  mov         rcx,7FF9195B4CFAh  // set the first argument (pointer to the MethodTable)
00007FF919570B9D  mov         edx,378h  // set the second argument (the array length)
00007FF919570BA2  call        00007FF97905E440  // call the memory allocation function, JIT_NewArr1 by default
00007FF919570BA7  mov         qword ptr [rbp+30h],rax  // store in a temporary (rbp+30)
00007FF919570BAB  mov         rcx,qword ptr [rbp+30h]  
00007FF919570BAF  mov         qword ptr [rbp+48h],rcx  // copy the temporary into the myArray variable

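The call actually goes to JIT_NewArr1VC_MP_InlineGetThread.
Like JIT_TrialAllocSFastMP_InlineGetThread, it allocates memory quickly from the allocation context.
Source: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/amd64/JitHelpers_InlineGetThread.asm#L207
The code is not analyzed here; if you are interested, read the source linked above.

Verifying the third kind of new (new on a struct)

new on a struct allocates memory from the stack on function entry, which here means decreasing the value of the rsp register (the top of the stack).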

00007FF919570B22  push        rsi  // save the original rsi
00007FF919570B23  sub         rsp,60h  // allocate memory from the stack
00007FF919570B27  mov         rbp,rsp  // copy the value into rbp
00007FF919570B2A  mov         rsi,rcx  // save the original rcx in rsi
00007FF919570B2D  lea         rdi,[rbp+28h]  // rdi = rbp+0x28, the start of the 0x38 (56) bytes to zero
00007FF919570B31  mov         ecx,0Eh  // rcx = 14 (count)
00007FF919570B36  xor         eax,eax  // eax = 0
00007FF919570B38  rep stos    dword ptr [rdi]  // store eax (a 4-byte dword) at rdi until rcx reaches 0, zeroing 14*4=56 bytes in total
00007FF919570B3A  mov         rcx,rsi  // restore the original rcx
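The zeroing loop in this prolog is easy to check by hand: rep stos with a dword operand writes 4 bytes per iteration, so a count of 0x0E clears 14 * 4 = 56 (0x38) bytes, which is exactly the range from rbp+0x28 up to rbp+0x60. The sketch below emulates the instruction on a byte buffer to make the arithmetic concrete; repStosDword is a hypothetical helper and not part of any runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Emulates `rep stos dword ptr [rdi]` with eax = value and ecx = count:
// writes count consecutive 4-byte values starting at dst.
// Returns the number of bytes written.
std::size_t repStosDword(unsigned char* dst, std::uint32_t value, std::size_t count)
{
    assert(dst != nullptr);
    for (std::size_t i = 0; i < count; ++i)
    {
        std::memcpy(dst + i * sizeof(std::uint32_t), &value, sizeof(value));
    }
    return count * sizeof(std::uint32_t);
}
```

Calling it on a 0x60-byte buffer at offset 0x28 with a count of 0x0E leaves the first 0x28 bytes untouched and zeroes everything from there to the end of the frame.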

Since the memory has already been allocated on the stack, all that remains is to call the constructor directly.

00007FF919570BBD  lea         rcx,[rbp+40h]  // first argument (this)
00007FF919570BC1  mov         edx,55667788h  // second argument (0x55667788)
00007FF919570BC6  call        00007FF9195700A0 // call the constructor

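Disassembly of the constructor

The call 00007FF97942E260 in the middle invokes JIT_DbgIsJustMyCode.

When the function ends, the memory allocated from the stack is released automatically: at the very end rsp is set to rbp + 0x60, which restores rsp to its original value.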