x264 Source Code Analysis: Macroblock Analysis Part, Intra Macroblocks

Date: 2020-12-26


This article is reposted from http://blog.csdn.net/leixiaohua1020/article/details/45870269 and readers are welcome to visit the original.

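This article documents the source code of x264_macroblock_analyse(), which is called from x264_slice_write() in x264. x264_macroblock_analyse() corresponds to the analysis module of x264. The analysis module mainly does two things:

(1) For intra macroblocks, it analyses the intra prediction mode.
(2) For inter macroblocks, it performs motion estimation and analyses the inter prediction mode.

Because the analysis module is fairly complex, its source code is covered in two articles: this one covers prediction mode analysis for intra macroblocks, and the next one covers prediction mode analysis for inter macroblocks.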
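To give a feel for what "analysing the intra prediction mode" means, here is a minimal sketch that tries three 4x4 predictions (vertical, horizontal, DC) and keeps the one with the lowest sum of absolute differences. It is a simplification under stated assumptions: the names sk_predict_4x4, sk_sad_4x4 and sk_best_intra_4x4 are hypothetical and not part of x264, only 3 of the 9 4x4 modes are covered, and the cost is a plain SAD without the mode bit cost that a real encoder would add.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mode identifiers for this sketch (not x264's enum). */
enum { SK_PRED_V, SK_PRED_H, SK_PRED_DC, SK_PRED_COUNT };

/* Build a 4x4 prediction from the 4 pixels above (top) and the 4 pixels
 * to the left (left) of the block. */
static void sk_predict_4x4( int mode, const uint8_t top[4],
                            const uint8_t left[4], uint8_t pred[16] )
{
    int dc = 4; /* rounding offset for the division by 8 below */
    for( int i = 0; i < 4; i++ )
        dc += top[i] + left[i];
    dc >>= 3;
    for( int y = 0; y < 4; y++ )
        for( int x = 0; x < 4; x++ )
            pred[y*4+x] = mode == SK_PRED_V ? top[x]
                        : mode == SK_PRED_H ? left[y]
                        : (uint8_t)dc;
}

/* Sum of absolute differences between the source block and a prediction. */
static int sk_sad_4x4( const uint8_t src[16], const uint8_t pred[16] )
{
    int sad = 0;
    for( int i = 0; i < 16; i++ )
        sad += abs( src[i] - pred[i] );
    return sad;
}

/* Try every mode and return the cheapest; its cost is written to *best_cost.
 * Ties are resolved in favour of the mode tried first. */
int sk_best_intra_4x4( const uint8_t src[16], const uint8_t top[4],
                       const uint8_t left[4], int *best_cost )
{
    assert( src && top && left && best_cost );
    int best_mode = SK_PRED_V;
    *best_cost = INT_MAX;
    for( int mode = 0; mode < SK_PRED_COUNT; mode++ )
    {
        uint8_t pred[16];
        sk_predict_4x4( mode, top, left, pred );
        int cost = sk_sad_4x4( src, pred );
        if( cost < *best_cost )
        {
            *best_cost = cost;
            best_mode = mode;
        }
    }
    return best_mode;
}
```

A block whose rows all repeat the pixels above it is predicted perfectly by the vertical mode, so sk_best_intra_4x4() would return SK_PRED_V with a cost of 0 in that case.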

Function Call Graph

The figure below shows where the source code of the macroblock analysis (Analysis) part sits within x264 as a whole.

[Figure: position of the macroblock analysis source code within x264]

The figure below shows the function call relationships of the macroblock analysis (Analysis) part.

[Figure: function call graph of the macroblock analysis part]

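As the figure shows, x264_macroblock_analyse() in the analysis module calls the following functions (only a few representative ones are listed):

x264_mb_analyse_init(): initializes the Analysis module.
x264_mb_analyse_intra(): intra prediction mode analysis for Intra macroblocks.
x264_macroblock_probe_pskip(): checks whether the macroblock can use skip mode.
x264_mb_analyse_inter_p16x16(): inter prediction mode analysis for P16x16 macroblocks.
x264_mb_analyse_inter_p8x8(): inter prediction mode analysis for P8x8 macroblocks.
x264_mb_analyse_inter_p16x8(): inter prediction mode analysis for P16x8 macroblocks.
x264_mb_analyse_inter_b16x16(): inter prediction mode analysis for B16x16 macroblocks.
x264_mb_analyse_inter_b8x8(): inter prediction mode analysis for B8x8 macroblocks.
x264_mb_analyse_inter_b16x8(): inter prediction mode analysis for B16x8 macroblocks.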
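The way these functions are grouped by slice type can be sketched roughly as follows. This is only an illustration: sk_slice_t, the SK_STAGE_* flags and sk_analysis_stages() are hypothetical names, and the sketch deliberately ignores the encoder options (fast skip, forced intra, RD level and so on) that steer the real control flow in the listing further down.

```c
#include <assert.h>

/* Hypothetical slice types for this sketch (not x264's constants). */
typedef enum { SK_SLICE_I, SK_SLICE_P, SK_SLICE_B } sk_slice_t;

/* Bit flags naming the analysis stages a macroblock goes through. */
enum
{
    SK_STAGE_INTRA = 1 << 0, /* intra prediction mode analysis */
    SK_STAGE_SKIP  = 1 << 1, /* skip probe */
    SK_STAGE_INTER = 1 << 2  /* inter partition analysis (16x16, 8x8, ...) */
};

/* Return the set of stages for one macroblock.
 * skip_detected stands in for the result of a skip probe. */
int sk_analysis_stages( sk_slice_t slice, int skip_detected )
{
    assert( slice == SK_SLICE_I || slice == SK_SLICE_P || slice == SK_SLICE_B );
    if( slice == SK_SLICE_I )
        return SK_STAGE_INTRA;          /* I slices only use intra prediction */
    if( skip_detected )
        return SK_STAGE_SKIP;           /* a skipped macroblock needs no further analysis */
    /* P and B slices compare inter partitions against the intra modes. */
    return SK_STAGE_SKIP | SK_STAGE_INTER | SK_STAGE_INTRA;
}
```

The point of the sketch is only the shape of the decision: intra analysis runs for every slice type unless the macroblock is skipped, while skip probing and inter analysis are reserved for P and B slices.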

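This article focuses on x264_mb_analyse_intra(), the analysis function for intra macroblocks (Intra macroblocks). The next article analyses x264_mb_analyse_inter_p16x16() and the rest of the analysis functions for inter macroblocks.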
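Once the intra costs are known, the final choice is a "keep the cheapest candidate" comparison. The sketch below mirrors the COPY2_IF_LT pattern that appears in the I slice branch of the listing later in this article, but the macro body is written here for illustration rather than copied from x264, and sk_mb_type, sk_costs_t and sk_pick_intra_type() are hypothetical names.

```c
#include <assert.h>

/* Illustrative "copy if less than" helper: if y < x, replace x with y
 * and record b in a. Assumed shape, not taken from the x264 headers. */
#define SK_COPY2_IF_LT( x, y, a, b ) \
    do { if( (y) < (x) ) { (x) = (y); (a) = (b); } } while( 0 )

/* Hypothetical macroblock types for this sketch. */
typedef enum { SK_I_16x16, SK_I_8x8, SK_I_4x4, SK_I_PCM } sk_mb_type;

/* Hypothetical container for the costs produced by the intra analysis. */
typedef struct
{
    int satd_i16x16;
    int satd_i8x8;
    int satd_i4x4;
    int satd_pcm;
} sk_costs_t;

/* Start from I16x16, switch to I4x4 or I8x8 if either is cheaper,
 * and fall back to PCM only if it beats all of them. */
sk_mb_type sk_pick_intra_type( const sk_costs_t *c )
{
    assert( c );
    int cost = c->satd_i16x16;
    sk_mb_type type = SK_I_16x16;
    SK_COPY2_IF_LT( cost, c->satd_i4x4, type, SK_I_4x4 );
    SK_COPY2_IF_LT( cost, c->satd_i8x8, type, SK_I_8x8 );
    if( c->satd_pcm < cost )
        type = SK_I_PCM;
    return type;
}
```

Because the comparison is strict, a later candidate only replaces an earlier one when it is strictly cheaper, so ties keep the mode that was tested first.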

x264_slice_write()

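x264_slice_write() is the core of the x264 project; it does the work of encoding one slice. For an analysis of that function, see the article "x264 Source Code Analysis: x264_slice_write()". This article analyses the x264_macroblock_analyse() function that it calls.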

x264_macroblock_analyse()

x264_macroblock_analyse() analyses the prediction mode of a macroblock. The function is defined in encoder\analyse.c, as shown below.

 /**************************************************************************** 
  * Analysis: intra prediction mode selection, inter motion estimation, etc.
  *
  * Annotated and edited by: Lei Xiaohua (雷霄驊)
  * http://blog.csdn.net/leixiaohua1020 
  * [email protected] 
  ****************************************************************************/  
 void x264_macroblock_analyse( x264_t *h )  
 {  
     x264_mb_analysis_t analysis;  
     int i_cost = COST_MAX;  
     // Get the QP for this macroblock from rate control
     h->mb.i_qp = x264_ratecontrol_mb_qp( h );  
     /* If the QP of this MB is within 1 of the previous MB, code the same QP as the previous MB, 
      * to lower the bit cost of the qp_delta.  Don't do this if QPRD is enabled. */  
     if( h->param.rc.i_aq_mode && h->param.analyse.i_subpel_refine < 10 )  
         h->mb.i_qp = abs(h->mb.i_qp - h->mb.i_last_qp) == 1 ? h->mb.i_last_qp : h->mb.i_qp;  
   
     if( h->param.analyse.b_mb_info )  
         h->fdec->effective_qp[h->mb.i_mb_xy] = h->mb.i_qp; /* Store the real analysis QP. */  
     // Initialization
     x264_mb_analyse_init( h, &analysis, h->mb.i_qp );  
   
     /*--------------------------- Do the analysis ---------------------------*/  
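     // I slice: intra prediction only. Compute the cost of every luma 16x16 mode (4 modes) and 4x4 mode (9 modes), then pick the cheapest.

     // P slice: evaluate both intra and inter modes (a P slice may contain Intra and P macroblocks; likewise a B slice may contain Intra macroblocks).
     // Run inter prediction for each partitioning of the macroblock to get the best motion vector and the best matching block.
     // Inter prediction steps: pick the best predicted vector -> find the best integer-pel position -> find the best half-pel position -> find the best quarter-pel position.
     // Then take the MV and partitioning with the lowest cost.
     // Finally, choose the cheaper of the intra and inter modes (if no good match was found, intra prediction is used instead of inter prediction).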
   
     if( h->sh.i_type == SLICE_TYPE_I )  
     {  
         //I slice  
         // Compute the cost of each intra prediction mode (4 for 16x16, 9 for 4x4) and take the cheapest as the best mode
 intra_analysis:  
         if( analysis.i_mbrd )  
             x264_mb_init_fenc_cache( h, analysis.i_mbrd >= 2 );  
         // Intra prediction analysis
         // Choose the best option among the 16x16 SAD, the sum of four 8x8 SADs, and the sum of sixteen 4x4 SADs
         x264_mb_analyse_intra( h, &analysis, COST_MAX );  
         if( analysis.i_mbrd )  
             x264_intra_rd( h, &analysis, COST_MAX );  
         // The analysis results are all stored in the analysis struct
         // Cost
         i_cost = analysis.i_satd_i16x16;
         h->mb.i_type = I_16x16;
         // If I4x4 or I8x8 is cheaper, copy its cost and type over
         // (copy if less than)
         COPY2_IF_LT( i_cost, analysis.i_satd_i4x4, h->mb.i_type, I_4x4 );  
         COPY2_IF_LT( i_cost, analysis.i_satd_i8x8, h->mb.i_type, I_8x8 );  
         // PCM is only likely to be chosen when the picture content is extremely unusual
         if( analysis.i_satd_pcm < i_cost )  
             h->mb.i_type = I_PCM;  
   
         else if( analysis.i_mbrd >= 2 )  
             x264_intra_rd_refine( h, &analysis );  
     }  
     else if( h->sh.i_type == SLICE_TYPE_P )  
     {  
         //P slice  
   
         int b_skip = 0;  
   
         h->mc.prefetch_ref( h->mb.pic.p_fref[0][0][h->mb.i_mb_x&3], h->mb.pic.i_stride[0], 0 );  
   
         analysis.b_try_skip = 0;  
         if( analysis.b_force_intra )  
         {  
             if( !h->param.analyse.b_psy )  
             {  
                 x264_mb_analyse_init_qp( h, &analysis, X264_MAX( h->mb.i_qp - h->mb.ip_offset, h->param.rc.i_qp_min ) );  
                 goto intra_analysis;  
             }  
         }  
         else  
         {  
             /* Special fast-skip logic using information from mb_info. */  
             if( h->fdec->mb_info && (h->fdec->mb_info[h->mb.i_mb_xy]&X264_MBINFO_CONSTANT) )  
             {  
                 if( !SLICE_MBAFF && (h->fdec->i_frame - h->fref[0][0]->i_frame) == 1 && !h->sh.b_weighted_pred &&  
                     h->fref[0][0]->effective_qp[h->mb.i_mb_xy] <= h->mb.i_qp )  
                 {  
                     h->mb.i_partition = D_16x16;  
                     /* Use the P-SKIP MV if we can... */  
                     if( !M32(h->mb.cache.pskip_mv) )  
                     {  
                         b_skip = 1;  
                         h->mb.i_type = P_SKIP;  
                     }  
                     /* Otherwise, just force a 16x16 block. */  
                     else  
                     {  
                         h->mb.i_type = P_L0;  
                         analysis.l0.me16x16.i_ref = 0;  
                         M32( analysis.l0.me16x16.mv ) = 0;  
                     }  
                     goto skip_analysis;  
                 }  
                 /* Reset the information accordingly */  
                 else if( h->param.analyse.b_mb_info_update )  
                     h->fdec->mb_info[h->mb.i_mb_xy] &= ~X264_MBINFO_CONSTANT;  
             }  
   
             int skip_invalid = h->i_thread_frames > 1 && h->mb.cache.pskip_mv[1] > h->mb.mv_max_spel[1];  
             /* If the current macroblock is off the frame, just skip it. */  
             if( HAVE_INTERLACED && !MB_INTERLACED && h->mb.i_mb_y * 16 >= h->param.i_height && !skip_invalid )  
                 b_skip = 1;  
             /* Fast P_SKIP detection */  
             else if( h->param.analyse.b_fast_pskip )  
             {  
                 if( skip_invalid )  
                     // FIXME don't need to check this if the reference frame is done  
                     {}  
                 else if( h->param.analyse.i_subpel_refine >= 3 )  
                     analysis.b_try_skip = 1;  
                 else if( h->mb.i_mb_type_left[0] == P_SKIP ||  
                          h->mb.i_mb_type_top == P_SKIP ||  
                          h->mb.i_mb_type_topleft == P_SKIP ||  
                          h->mb.i_mb_type_topright == P_SKIP )  
                     b_skip = x264_macroblock_probe_pskip( h ); // Check whether this is a Skip macroblock
             }  
         }  
   
         h->mc.prefetch_ref( h->mb.pic.p_fref[0][0][h->mb.i_mb_x&3], h->mb.pic.i_stride[0], 1 );  
   
         if( b_skip )  
         {  
             h->mb.i_type = P_SKIP;  
             h->mb.i_partition = D_16x16;  
             assert( h->mb.cache.pskip_mv[1] <= h->mb.mv_max_spel[1] || h->i_thread_frames == 1 );  
 skip_analysis:  
             /* Set up MVs for future predictors */  
             for( int i = 0; i < h->mb.pic.i_fref[0]; i++ )  
                 M32( h->mb.mvr[0][i][h->mb.i_mb_xy] ) = 0;  
         }  
         else  
         {  
             const unsigned int flags = h->param.analyse.inter;  
             int i_type;  
             int i_partition;  
             int i_satd_inter, i_satd_intra;  
   
             x264_mb_analyse_load_costs( h, &analysis );  
             /* 
              * 16x16 inter prediction macroblock analysis (P)
              * 
              * +--------+--------+ 
              * |                 | 
              * |                 | 
              * |                 | 
              * +        +        + 
              * |                 | 
              * |                 | 
              * |                 | 
              * +--------+--------+ 
              * 
              */  
             x264_mb_analyse_inter_p16x16( h, &analysis );  
   
             if( h->mb.i_type == P_SKIP )  
             {  
                 for( int i = 1; i < h->mb.pic.i_fref[0]; i++ )  
                     M32( h->mb.mvr[0][i][h->mb.i_mb_xy] ) = 0;  
                 return;  
             }  
   
             if( flags & X264_ANALYSE_PSUB16x16 )  
             {  
                 if( h->param.analyse.b_mixed_references )  
                     x264_mb_analyse_inter_p8x8_mixed_ref( h, &analysis );  
                 else{  
                     /* 
                      * 8x8 inter prediction macroblock analysis (P)
                      * +--------+ 
                      * |        | 
                      * |        | 
                      * |        | 
                      * +--------+ 
                      */  
                     x264_mb_analyse_inter_p8x8( h, &analysis );  
                 }  
             }  
   
             /* Select best inter mode */  
             i_type = P_L0;  
             i_partition = D_16x16;  
             i_cost = analysis.l0.me16x16.cost;  
   
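             // If the 8x8 cost is lower than the 16x16 cost (or early termination is disabled),
             // go on to process the 8x8 sub-block partitions.

             // The data processed here comes from l0.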
             if( ( flags & X264_ANALYSE_PSUB16x16 ) && (!analysis.b_early_terminate ||  
                 analysis.l0.i_cost8x8 < analysis.l0.me16x16.cost) )  
             {  
                 i_type = P_8x8;  
                 i_partition = D_8x8;  
                 i_cost = analysis.l0.i_cost8x8;  
   
                 /* Do sub 8x8 */  
                 if( flags & X264_ANALYSE_PSUB8x8 )  
                 {  
                     for( int i = 0; i < 4; i++ )  
                     {  
                         // Analyse the sub-blocks of the 8x8 block
                         /* 
                          * 4x4 
                          * +----+----+ 
                          * |    |    | 
                          * +----+----+ 
                          * |    |    | 
                          * +----+----+ 
                          * 
                          */  
                         x264_mb_analyse_inter_p4x4( h, &analysis, i );  
                         int i_thresh8x4 = analysis.l0.me4x4[i][1].cost_mv + analysis.l0.me4x4[i][2].cost_mv;  
                         // If the 4x4 cost is lower than the 8x8 cost (plus a threshold),
                         // go on to analyse the 8x4 and 4x8 costs
                         if( !analysis.b_early_terminate || analysis.l0.i_cost4x4[i] < analysis.l0.me8x8[i].cost + i_thresh8x4 )  
                         {  
                             int i_cost8x8 = analysis.l0.i_cost4x4[i];  
                             h->mb.i_sub_partition[i] = D_L0_4x4;  
                             /* 
                              * 8x4 
                              * +----+----+ 
                              * |         | 
                              * +----+----+ 
                              * |         | 
                              * +----+----+ 
                              * 
                              */  
                             // Analyse 8x4; use it if it is cheaper than the best sub-partition cost so far
                             x264_mb_analyse_inter_p8x4( h, &analysis, i );  
                             COPY2_IF_LT( i_cost8x8, analysis.l0.i_cost8x4[i],  
                                          h->mb.i_sub_partition[i], D_L0_8x4 );  
                             /* 
                              * 4x8 
                              * +----+----+ 
                              * |    |    | 
                              * +    +    + 
                              * |    |    | 
                              * +----+----+ 
                              * 
                              */  
                             // Analyse 4x8; use it if it is cheaper than the best sub-partition cost so far
                             x264_mb_analyse_inter_p4x8( h, &analysis, i );  
                             COPY2_IF_LT( i_cost8x8, analysis.l0.i_cost4x8[i],  
                                          h->mb.i_sub_partition[i], D_L0_4x8 );  
   
                             i_cost += i_cost8x8 - analysis.l0.me8x8[i].cost;  
                         }  
                         x264_mb_cache_mv_p8x8( h, &analysis, i );  
                     }  
                     analysis.l0.i_cost8x8 = i_cost;  
                 }  
             }  
   
             /* Now do 16x8/8x16 */  
             int i_thresh16x8 = analysis.l0.me8x8[1].cost_mv + analysis.l0.me8x8[2].cost_mv;  
   
             // Precondition: the 8x8 cost is below the 16x16 cost plus a threshold (or early termination is disabled)
             if( ( flags & X264_ANALYSE_PSUB16x16 ) && (!analysis.b_early_terminate ||  
                 analysis.l0.i_cost8x8 < analysis.l0.me16x16.cost + i_thresh16x8) )  
             {  
                 int i_avg_mv_ref_cost = (analysis.l0.me8x8[2].cost_mv + analysis.l0.me8x8[2].i_ref_cost  
                                       + analysis.l0.me8x8[3].cost_mv + analysis.l0.me8x8[3].i_ref_cost + 1) >> 1;  
                 analysis.i_cost_est16x8[1] = analysis.i_satd8x8[0][2] + analysis.i_satd8x8[0][3] + i_avg_mv_ref_cost;  
                 /* 
                  * 16x8 macroblock partition
                  * 
                  * +--------+--------+ 
                  * |        |        | 
                  * |        |        | 
                  * |        |        | 
                  * +--------+--------+ 
                  * 
                  */  
                 x264_mb_analyse_inter_p16x8( h, &analysis, i_cost );  
                 COPY3_IF_LT( i_cost, analysis.l0.i_cost16x8, i_type, P_L0, i_partition, D_16x8 );  
   
                 i_avg_mv_ref_cost = (analysis.l0.me8x8[1].cost_mv + analysis.l0.me8x8[1].i_ref_cost  
                                   + analysis.l0.me8x8[3].cost_mv + analysis.l0.me8x8[3].i_ref_cost + 1) >> 1;  
                 analysis.i_cost_est8x16[1] = analysis.i_satd8x8[0][1] + analysis.i_satd8x8[0][3] + i_avg_mv_ref_cost;  
                 /* 
                  * 8x16 macroblock partition
                  * 
                  * +--------+ 
                  * |        | 
                  * |        | 
                  * |        | 
                  * +--------+ 
                  * |        | 
                  * |        | 
                  * |        | 
                  * +--------+ 
                  * 
                  */  
                 x264_mb_analyse_inter_p8x16( h, &analysis, i_cost );  
                 COPY3_IF_LT( i_cost, analysis.l0.i_cost8x16, i_type, P_L0, i_partition, D_8x16 );  
             }  
   
             h->mb.i_partition = i_partition;  
   
             /* refine qpel */  
             // Sub-pixel precision search
             //FIXME mb_type costs?  
             if( analysis.i_mbrd || !h->mb.i_subpel_refine )  
             {  
                 /* refine later */  
             }  
             else if( i_partition == D_16x16 )  
             {  
                 x264_me_refine_qpel( h, &analysis.l0.me16x16 );  
                 i_cost = analysis.l0.me16x16.cost;  
             }  
             else if( i_partition == D_16x8 )  
             {  
                 x264_me_refine_qpel( h, &analysis.l0.me16x8[0] );  
                 x264_me_refine_qpel( h, &analysis.l0.me16x8[1] );  
                 i_cost = analysis.l0.me16x8[0].cost + analysis.l0.me16x8[1].cost;  
             }  
             else if( i_partition == D_8x16 )  
             {  
                 x264_me_refine_qpel( h, &analysis.l0.me8x16[0] );  
                 x264_me_refine_qpel( h, &analysis.l0.me8x16[1] );  
                 i_cost = analysis.l0.me8x16[0].cost + analysis.l0.me8x16[1].cost;  
             }  
             else if( i_partition == D_8x8 )  
             {  
                 i_cost = 0;  
                 for( int i8x8 = 0; i8x8 < 4; i8x8++ )  
                 {  
                     switch( h->mb.i_sub_partition[i8x8] )  
                     {  
                         case D_L0_8x8:  
                             x264_me_refine_qpel( h, &analysis.l0.me8x8[i8x8] );  
                             i_cost += analysis.l0.me8x8[i8x8].cost;  
                             break;  
                         case D_L0_8x4:  
                             x264_me_refine_qpel( h, &analysis.l0.me8x4[i8x8][0] );  
                             x264_me_refine_qpel( h, &analysis.l0.me8x4[i8x8][1] );  
                             i_cost += analysis.l0.me8x4[i8x8][0].cost +  
                                       analysis.l0.me8x4[i8x8][1].cost;  
                             break;  
                         case D_L0_4x8:  
                             x264_me_refine_qpel( h, &analysis.l0.me4x8[i8x8][0] );  
                             x264_me_refine_qpel( h, &analysis.l0.me4x8[i8x8][1] );  
                             i_cost += analysis.l0.me4x8[i8x8][0].cost +  
                                       analysis.l0.me4x8[i8x8][1].cost;  
                             break;  
   
                         case D_L0_4x4:  
                             x264_me_refine_qpel( h, &analysis.l0.me4x4[i8x8][0] );  
                             x264_me_refine_qpel( h, &analysis.l0.me4x4[i8x8][1] );  
                             x264_me_refine_qpel( h, &analysis.l0.me4x4[i8x8][2] );  
                             x264_me_refine_qpel( h, &analysis.l0.me4x4[i8x8][3] );  
                             i_cost += analysis.l0.me4x4[i8x8][0].cost +  
                                       analysis.l0.me4x4[i8x8][1].cost +  
                                       analysis.l0.me4x4[i8x8][2].cost +  
                                       analysis.l0.me4x4[i8x8][3].cost;  
                             break;  
                         default:  
                             x264_log( h, X264_LOG_ERROR, "internal error (!8x8 && !4x4)\n" );  
                             break;  
                     }  
                 }  
             }  
   
             if( h->mb.b_chroma_me )  
             {  
                 if( CHROMA444 )  
                 {  
                     x264_mb_analyse_intra( h, &analysis, i_cost );  
                     x264_mb_analyse_intra_chroma( h, &analysis );  
                 }  
                 else  
                 {  
                     x264_mb_analyse_intra_chroma( h, &analysis );  
                     x264_mb_analyse_intra( h, &analysis, i_cost - analysis.i_satd_chroma );  
                 }  
                 analysis.i_satd_i16x16 += analysis.i_satd_chroma;  
                 analysis.i_satd_i8x8   += analysis.i_satd_chroma;  
                 analysis.i_satd_i4x4   += analysis.i_satd_chroma;  
             }  
             else  
                 x264_mb_analyse_intra( h, &analysis, i_cost ); // A P slice may also contain Intra macroblocks, so intra analysis is needed here too
   
             i_satd_inter = i_cost;  
             i_satd_intra = X264_MIN3( analysis.i_satd_i16x16,  
                                       analysis.i_satd_i8x8,  
                                       analysis.i_satd_i4x4 );  
   
             if( analysis.i_mbrd )  
             {  
                 x264_mb_analyse_p_rd( h, &analysis, X264_MIN(i_satd_inter, i_satd_intra) );  
                 i_type = P_L0;  
                 i_partition = D_16x16;  
                 i_cost = analysis.l0.i_rd16x16;  
                 COPY2_IF_LT( i_cost, analysis.l0.i_cost16x8, i_partition, D_16x8 );  
                 COPY2_IF_LT( i_cost, analysis.l0.i_cost8x16, i_partition, D_8x16 );  
                 COPY3_IF_LT( i_cost, analysis.l0.i_cost8x8, i_partition, D_8x8, i_type, P_8x8 );  
                 h->mb.i_type = i_type;  
                 h->mb.i_partition = i_partition;  
                 if( i_cost < COST_MAX )  
                     x264_mb_analyse_transform_rd( h, &analysis, &i_satd_inter, &i_cost );  
                 x264_intra_rd( h, &analysis, i_satd_inter * 5/4 + 1 );  
             }  
             // Take the minimum cost
             COPY2_IF_LT( i_cost, analysis.i_satd_i16x16, i_type, I_16x16 );  
             COPY2_IF_LT( i_cost, analysis.i_satd_i8x8, i_type, I_8x8 );  
             COPY2_IF_LT( i_cost, analysis.i_satd_i4x4, i_type, I_4x4 );  
             COPY2_IF_LT( i_cost, analysis.i_satd_pcm, i_type, I_PCM );  
   
             h->mb.i_type = i_type;  
   
             if( analysis.b_force_intra && !IS_INTRA(i_type) )  
             {  
                 /* Intra masking: copy fdec to fenc and re-encode the block as intra in order to make it appear as if 
                  * it was an inter block. */  
                 x264_analyse_update_cache( h, &analysis );  
                 x264_macroblock_encode( h );  
                 for( int p = 0; p < (CHROMA444 ? 3 : 1); p++ )  
                     h->mc.copy[PIXEL_16x16]( h->mb.pic.p_fenc[p], FENC_STRIDE, h->mb.pic.p_fdec[p], FDEC_STRIDE, 16 );  
                 if( !CHROMA444 )  
                 {  
                     int height = 16 >> CHROMA_V_SHIFT;  
                     h->mc.copy[PIXEL_8x8]  ( h->mb.pic.p_fenc[1], FENC_STRIDE, h->mb.pic.p_fdec[1], FDEC_STRIDE, height );  
                     h->mc.copy[PIXEL_8x8]  ( h->mb.pic.p_fenc[2], FENC_STRIDE, h->mb.pic.p_fdec[2], FDEC_STRIDE, height );  
                 }  
                 x264_mb_analyse_init_qp( h, &analysis, X264_MAX( h->mb.i_qp - h->mb.ip_offset, h->param.rc.i_qp_min ) );  
                 goto intra_analysis;  
             }  
   
             if( analysis.i_mbrd >= 2 && h->mb.i_type != I_PCM )  
             {  
                 if( IS_INTRA( h->mb.i_type ) )  
                 {  
                     x264_intra_rd_refine( h, &analysis );  
                 }  
                 else if( i_partition == D_16x16 )  
                 {  
                     x264_macroblock_cache_ref( h, 0, 0, 4, 4, 0, analysis.l0.me16x16.i_ref );  
                     analysis.l0.me16x16.cost = i_cost;  
                     x264_me_refine_qpel_rd( h, &analysis.l0.me16x16, analysis.i_lambda2, 0, 0 );  
                 }  
                 else if( i_partition == D_16x8 )  
                 {  
                     h->mb.i_sub_partition[0] = h->mb.i_sub_partition[1] =  
                     h->mb.i_sub_partition[2] = h->mb.i_sub_partition[3] = D_L0_8x8;  
                     x264_macroblock_cache_ref( h, 0, 0, 4, 2, 0, analysis.l0.me16x8[0].i_ref );  
                     x264_macroblock_cache_ref( h, 0, 2, 4, 2, 0, analysis.l0.me16x8[1].i_ref );  
                     x264_me_refine_qpel_rd( h, &analysis.l0.me16x8[0], analysis.i_lambda2, 0, 0 );  
                     x264_me_refine_qpel_rd( h, &analysis.l0.me16x8[1], analysis.i_lambda2, 8, 0 );  
                 }  
                 else if( i_partition == D_8x16 )  
                 {  
                     h->mb.i_sub_partition[0] = h->mb.i_sub_partition[1] =  
                     h->mb.i_sub_partition[2] = h->mb.i_sub_partition[3] = D_L0_8x8;  
                     x264_macroblock_cache_ref( h, 0, 0, 2, 4, 0, analysis.l0.me8x16[0].i_ref );  
                     x264_macroblock_cache_ref( h, 2, 0, 2, 4, 0, analysis.l0.me8x16[1].i_ref );  
                     x264_me_refine_qpel_rd( h, &analysis.l0.me8x16[0], analysis.i_lambda2, 0, 0 );  
                     x264_me_refine_qpel_rd( h, &analysis.l0.me8x16[1], analysis.i_lambda2, 4, 0 );  
                 }  
                 else if( i_partition == D_8x8 )  
                 {  
                     x264_analyse_update_cache( h, &analysis );  
                     for( int i8x8 = 0; i8x8 < 4; i8x8++ )  
                     {  
                         if( h->mb.i_sub_partition[i8x8] == D_L0_8x8 )  
                         {  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me8x8[i8x8], analysis.i_lambda2, i8x8*4, 0 );  
                         }  
                         else if( h->mb.i_sub_partition[i8x8] == D_L0_8x4 )  
                         {  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me8x4[i8x8][0], analysis.i_lambda2, i8x8*4+0, 0 );  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me8x4[i8x8][1], analysis.i_lambda2, i8x8*4+2, 0 );  
                         }  
                         else if( h->mb.i_sub_partition[i8x8] == D_L0_4x8 )  
                         {  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me4x8[i8x8][0], analysis.i_lambda2, i8x8*4+0, 0 );  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me4x8[i8x8][1], analysis.i_lambda2, i8x8*4+1, 0 );  
                         }  
                         else if( h->mb.i_sub_partition[i8x8] == D_L0_4x4 )  
                         {  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me4x4[i8x8][0], analysis.i_lambda2, i8x8*4+0, 0 );  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me4x4[i8x8][1], analysis.i_lambda2, i8x8*4+1, 0 );  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me4x4[i8x8][2], analysis.i_lambda2, i8x8*4+2, 0 );  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me4x4[i8x8][3], analysis.i_lambda2, i8x8*4+3, 0 );  
                         }  
                     }  
                 }  
             }  
         }  
     }  
     else if( h->sh.i_type == SLICE_TYPE_B ) // B slice
     {  
         int i_bskip_cost = COST_MAX;  
         int b_skip = 0;  
   
         if( analysis.i_mbrd )  
             x264_mb_init_fenc_cache( h, analysis.i_mbrd >= 2 );  
   
         h->mb.i_type = B_SKIP;  
         if( h->mb.b_direct_auto_write )  
         {  
             /* direct=auto heuristic: prefer whichever mode allows more Skip macroblocks */  
             for( int i = 0; i < 2; i++ )  
             {  
                 int b_changed = 1;  
                 h->sh.b_direct_spatial_mv_pred ^= 1;  
                 analysis.b_direct_available = x264_mb_predict_mv_direct16x16( h, i && analysis.b_direct_available ? &b_changed : NULL );  
                 if( analysis.b_direct_available )  
                 {  
                     if( b_changed )  
                     {  
                         x264_mb_mc( h );  
                         b_skip = x264_macroblock_probe_bskip( h );  
                     }  
                     h->stat.frame.i_direct_score[ h->sh.b_direct_spatial_mv_pred ] += b_skip;  
                 }  
                 else  
                     b_skip = 0;  
             }  
         }  
         else  
             analysis.b_direct_available = x264_mb_predict_mv_direct16x16( h, NULL );  
   
         analysis.b_try_skip = 0;  
         if( analysis.b_direct_available )  
         {  
             if( !h->mb.b_direct_auto_write )  
                 x264_mb_mc( h );  
             /* If the current macroblock is off the frame, just skip it. */  
             if( HAVE_INTERLACED && !MB_INTERLACED && h->mb.i_mb_y * 16 >= h->param.i_height )  
                 b_skip = 1;  
             else if( analysis.i_mbrd )  
             {  
                 i_bskip_cost = ssd_mb( h );  
                 /* 6 = minimum cavlc cost of a non-skipped MB */  
                 b_skip = h->mb.b_skip_mc = i_bskip_cost <= ((6 * analysis.i_lambda2 + 128) >> 8);  
             }  
             else if( !h->mb.b_direct_auto_write )  
             {  
                 /* Conditioning the probe on neighboring block types 
                  * doesn't seem to help speed or quality. */  
                 analysis.b_try_skip = x264_macroblock_probe_bskip( h );  
                 if( h->param.analyse.i_subpel_refine < 3 )  
                     b_skip = analysis.b_try_skip;  
             }  
             /* Set up MVs for future predictors */  
             if( b_skip )  
             {  
                 for( int i = 0; i < h->mb.pic.i_fref[0]; i++ )  
                     M32( h->mb.mvr[0][i][h->mb.i_mb_xy] ) = 0;  
                 for( int i = 0; i < h->mb.pic.i_fref[1]; i++ )  
                     M32( h->mb.mvr[1][i][h->mb.i_mb_xy] ) = 0;  
             }  
         }  
   
         if( !b_skip )  
         {  
             const unsigned int flags = h->param.analyse.inter;  
             int i_type;  
             int i_partition;  
             int i_satd_inter;  
             h->mb.b_skip_mc = 0;  
             h->mb.i_type = B_DIRECT;  
   
             x264_mb_analyse_load_costs( h, &analysis );  
   
             /* select best inter mode */  
             /* direct must be first */  
             if( analysis.b_direct_available )  
                 x264_mb_analyse_inter_direct( h, &analysis );  
             /* 
              * 16x16 inter prediction macroblock analysis (B)
              * 
              * +--------+--------+ 
              * |                 | 
              * |                 | 
              * |                 | 
              * +        +        + 
              * |                 | 
              * |                 | 
              * |                 | 
              * +--------+--------+ 
              * 
              */  
             x264_mb_analyse_inter_b16x16( h, &analysis );  
   
             if( h->mb.i_type == B_SKIP )  
             {  
                 for( int i = 1; i < h->mb.pic.i_fref[0]; i++ )  
                     M32( h->mb.mvr[0][i][h->mb.i_mb_xy] ) = 0;  
                 for( int i = 1; i < h->mb.pic.i_fref[1]; i++ )  
                     M32( h->mb.mvr[1][i][h->mb.i_mb_xy] ) = 0;  
                 return;  
             }  
   
             i_type = B_L0_L0;  
             i_partition = D_16x16;  
             i_cost = analysis.l0.me16x16.cost;  
             COPY2_IF_LT( i_cost, analysis.l1.me16x16.cost, i_type, B_L1_L1 );  
             COPY2_IF_LT( i_cost, analysis.i_cost16x16bi, i_type, B_BI_BI );  
             COPY2_IF_LT( i_cost, analysis.i_cost16x16direct, i_type, B_DIRECT );  
   
             if( analysis.i_mbrd && analysis.b_early_terminate && analysis.i_cost16x16direct <= i_cost * 33/32 )  
             {  
                 x264_mb_analyse_b_rd( h, &analysis, i_cost );  
                 if( i_bskip_cost < analysis.i_rd16x16direct &&  
                     i_bskip_cost < analysis.i_rd16x16bi &&  
                     i_bskip_cost < analysis.l0.i_rd16x16 &&  
                     i_bskip_cost < analysis.l1.i_rd16x16 )  
                 {  
                     h->mb.i_type = B_SKIP;  
                     x264_analyse_update_cache( h, &analysis );  
                     return;  
                 }  
             }  
   
             if( flags & X264_ANALYSE_BSUB16x16 )  
             {  
   
                 /* 
                  * 8x8 inter prediction macroblock analysis (B)
                  * +--------+ 
                  * |        | 
                  * |        | 
                  * |        | 
                  * +--------+ 
                  * 
                  */  
   
                 if( h->param.analyse.b_mixed_references )  
                     x264_mb_analyse_inter_b8x8_mixed_ref( h, &analysis );  
                 else  
                     x264_mb_analyse_inter_b8x8( h, &analysis );  
   
                 COPY3_IF_LT( i_cost, analysis.i_cost8x8bi, i_type, B_8x8, i_partition, D_8x8 );  
   
                 /* Try to estimate the cost of b16x8/b8x16 based on the satd scores of the b8x8 modes */  
                 int i_cost_est16x8bi_total = 0, i_cost_est8x16bi_total = 0;  
                 int i_mb_type, i_partition16x8[2], i_partition8x16[2];  
                 for( int i = 0; i < 2; i++ )  
                 {  
                     int avg_l0_mv_ref_cost, avg_l1_mv_ref_cost;  
                     int i_l0_satd, i_l1_satd, i_bi_satd, i_best_cost;  
                     // 16x8  
                     i_best_cost = COST_MAX;  
                     i_l0_satd = analysis.i_satd8x8[0][i*2] + analysis.i_satd8x8[0][i*2+1];  
                     i_l1_satd = analysis.i_satd8x8[1][i*2] + analysis.i_satd8x8[1][i*2+1];  
                     i_bi_satd = analysis.i_satd8x8[2][i*2] + analysis.i_satd8x8[2][i*2+1];  
                     avg_l0_mv_ref_cost = ( analysis.l0.me8x8[i*2].cost_mv + analysis.l0.me8x8[i*2].i_ref_cost  
                                          + analysis.l0.me8x8[i*2+1].cost_mv + analysis.l0.me8x8[i*2+1].i_ref_cost + 1 ) >> 1;  
                     avg_l1_mv_ref_cost = ( analysis.l1.me8x8[i*2].cost_mv + analysis.l1.me8x8[i*2].i_ref_cost  
                                          + analysis.l1.me8x8[i*2+1].cost_mv + analysis.l1.me8x8[i*2+1].i_ref_cost + 1 ) >> 1;  
                     COPY2_IF_LT( i_best_cost, i_l0_satd + avg_l0_mv_ref_cost, i_partition16x8[i], D_L0_8x8 );  
                     COPY2_IF_LT( i_best_cost, i_l1_satd + avg_l1_mv_ref_cost, i_partition16x8[i], D_L1_8x8 );  
                     COPY2_IF_LT( i_best_cost, i_bi_satd + avg_l0_mv_ref_cost + avg_l1_mv_ref_cost, i_partition16x8[i], D_BI_8x8 );  
                     analysis.i_cost_est16x8[i] = i_best_cost;  
   
                     // 8x16  
                     i_best_cost = COST_MAX;  
                     i_l0_satd = analysis.i_satd8x8[0][i] + analysis.i_satd8x8[0][i+2];  
                     i_l1_satd = analysis.i_satd8x8[1][i] + analysis.i_satd8x8[1][i+2];  
                     i_bi_satd = analysis.i_satd8x8[2][i] + analysis.i_satd8x8[2][i+2];  
                     avg_l0_mv_ref_cost = ( analysis.l0.me8x8[i].cost_mv + analysis.l0.me8x8[i].i_ref_cost  
                                          + analysis.l0.me8x8[i+2].cost_mv + analysis.l0.me8x8[i+2].i_ref_cost + 1 ) >> 1;  
                     avg_l1_mv_ref_cost = ( analysis.l1.me8x8[i].cost_mv + analysis.l1.me8x8[i].i_ref_cost  
                                          + analysis.l1.me8x8[i+2].cost_mv + analysis.l1.me8x8[i+2].i_ref_cost + 1 ) >> 1;  
                     COPY2_IF_LT( i_best_cost, i_l0_satd + avg_l0_mv_ref_cost, i_partition8x16[i], D_L0_8x8 );  
                     COPY2_IF_LT( i_best_cost, i_l1_satd + avg_l1_mv_ref_cost, i_partition8x16[i], D_L1_8x8 );  
                     COPY2_IF_LT( i_best_cost, i_bi_satd + avg_l0_mv_ref_cost + avg_l1_mv_ref_cost, i_partition8x16[i], D_BI_8x8 );  
                     analysis.i_cost_est8x16[i] = i_best_cost;  
                 }  
                 i_mb_type = B_L0_L0 + (i_partition16x8[0]>>2) * 3 + (i_partition16x8[1]>>2);  
                 analysis.i_cost_est16x8[1] += analysis.i_lambda * i_mb_b16x8_cost_table[i_mb_type];  
                 i_cost_est16x8bi_total = analysis.i_cost_est16x8[0] + analysis.i_cost_est16x8[1];  
                 i_mb_type = B_L0_L0 + (i_partition8x16[0]>>2) * 3 + (i_partition8x16[1]>>2);  
                 analysis.i_cost_est8x16[1] += analysis.i_lambda * i_mb_b16x8_cost_table[i_mb_type];  
                 i_cost_est8x16bi_total = analysis.i_cost_est8x16[0] + analysis.i_cost_est8x16[1];  
   
                 /* We can gain a little speed by checking the mode with the lowest estimated cost first */  
                 int try_16x8_first = i_cost_est16x8bi_total < i_cost_est8x16bi_total;  
                 if( try_16x8_first && (!analysis.b_early_terminate || i_cost_est16x8bi_total < i_cost) )  
                 {  
                     x264_mb_analyse_inter_b16x8( h, &analysis, i_cost );  
                     COPY3_IF_LT( i_cost, analysis.i_cost16x8bi, i_type, analysis.i_mb_type16x8, i_partition, D_16x8 );  
                 }  
                 if( !analysis.b_early_terminate || i_cost_est8x16bi_total < i_cost )  
                 {  
                     x264_mb_analyse_inter_b8x16( h, &analysis, i_cost );  
                     COPY3_IF_LT( i_cost, analysis.i_cost8x16bi, i_type, analysis.i_mb_type8x16, i_partition, D_8x16 );  
                 }  
                 if( !try_16x8_first && (!analysis.b_early_terminate || i_cost_est16x8bi_total < i_cost) )  
                 {  
                     x264_mb_analyse_inter_b16x8( h, &analysis, i_cost );  
                     COPY3_IF_LT( i_cost, analysis.i_cost16x8bi, i_type, analysis.i_mb_type16x8, i_partition, D_16x8 );  
                 }  
             }  
   
             if( analysis.i_mbrd || !h->mb.i_subpel_refine )  
             {  
                 /* refine later */  
             }  
             /* refine qpel */  
             else if( i_partition == D_16x16 )  
             {  
                 analysis.l0.me16x16.cost -= analysis.i_lambda * i_mb_b_cost_table[B_L0_L0];  
                 analysis.l1.me16x16.cost -= analysis.i_lambda * i_mb_b_cost_table[B_L1_L1];  
                 if( i_type == B_L0_L0 )  
                 {  
                     x264_me_refine_qpel( h, &analysis.l0.me16x16 );  
                     i_cost = analysis.l0.me16x16.cost  
                            + analysis.i_lambda * i_mb_b_cost_table[B_L0_L0];  
                 }  
                 else if( i_type == B_L1_L1 )  
                 {  
                     x264_me_refine_qpel( h, &analysis.l1.me16x16 );  
                     i_cost = analysis.l1.me16x16.cost  
                            + analysis.i_lambda * i_mb_b_cost_table[B_L1_L1];  
                 }  
                 else if( i_type == B_BI_BI )  
                 {  
                     x264_me_refine_qpel( h, &analysis.l0.bi16x16 );  
                     x264_me_refine_qpel( h, &analysis.l1.bi16x16 );  
                 }  
             }  
             else if( i_partition == D_16x8 )  
             {  
                 for( int i = 0; i < 2; i++ )  
                 {  
                     if( analysis.i_mb_partition16x8[i] != D_L1_8x8 )  
                         x264_me_refine_qpel( h, &analysis.l0.me16x8[i] );  
                     if( analysis.i_mb_partition16x8[i] != D_L0_8x8 )  
                         x264_me_refine_qpel( h, &analysis.l1.me16x8[i] );  
                 }  
             }  
             else if( i_partition == D_8x16 )  
             {  
                 for( int i = 0; i < 2; i++ )  
                 {  
                     if( analysis.i_mb_partition8x16[i] != D_L1_8x8 )  
                         x264_me_refine_qpel( h, &analysis.l0.me8x16[i] );  
                     if( analysis.i_mb_partition8x16[i] != D_L0_8x8 )  
                         x264_me_refine_qpel( h, &analysis.l1.me8x16[i] );  
                 }  
             }  
             else if( i_partition == D_8x8 )  
             {  
                 for( int i = 0; i < 4; i++ )  
                 {  
                     x264_me_t *m;  
                     int i_part_cost_old;  
                     int i_type_cost;  
                     int i_part_type = h->mb.i_sub_partition[i];  
                     int b_bidir = (i_part_type == D_BI_8x8);  
   
                     if( i_part_type == D_DIRECT_8x8 )  
                         continue;  
                     if( x264_mb_partition_listX_table[0][i_part_type] )  
                     {  
                         m = &analysis.l0.me8x8[i];  
                         i_part_cost_old = m->cost;  
                         i_type_cost = analysis.i_lambda * i_sub_mb_b_cost_table[D_L0_8x8];  
                         m->cost -= i_type_cost;  
                         x264_me_refine_qpel( h, m );  
                         if( !b_bidir )  
                             analysis.i_cost8x8bi += m->cost + i_type_cost - i_part_cost_old;  
                     }  
                     if( x264_mb_partition_listX_table[1][i_part_type] )  
                     {  
                         m = &analysis.l1.me8x8[i];  
                         i_part_cost_old = m->cost;  
                         i_type_cost = analysis.i_lambda * i_sub_mb_b_cost_table[D_L1_8x8];  
                         m->cost -= i_type_cost;  
                         x264_me_refine_qpel( h, m );  
                         if( !b_bidir )  
                             analysis.i_cost8x8bi += m->cost + i_type_cost - i_part_cost_old;  
                     }  
                     /* TODO: update mvp? */  
                 }  
             }  
   
             i_satd_inter = i_cost;  
   
             if( analysis.i_mbrd )  
             {  
                 x264_mb_analyse_b_rd( h, &analysis, i_satd_inter );  
                 i_type = B_SKIP;  
                 i_cost = i_bskip_cost;  
                 i_partition = D_16x16;  
                 COPY2_IF_LT( i_cost, analysis.l0.i_rd16x16, i_type, B_L0_L0 );  
                 COPY2_IF_LT( i_cost, analysis.l1.i_rd16x16, i_type, B_L1_L1 );  
                 COPY2_IF_LT( i_cost, analysis.i_rd16x16bi, i_type, B_BI_BI );  
                 COPY2_IF_LT( i_cost, analysis.i_rd16x16direct, i_type, B_DIRECT );  
                 COPY3_IF_LT( i_cost, analysis.i_rd16x8bi, i_type, analysis.i_mb_type16x8, i_partition, D_16x8 );  
                 COPY3_IF_LT( i_cost, analysis.i_rd8x16bi, i_type, analysis.i_mb_type8x16, i_partition, D_8x16 );  
                 COPY3_IF_LT( i_cost, analysis.i_rd8x8bi, i_type, B_8x8, i_partition, D_8x8 );  
   
                 h->mb.i_type = i_type;  
                 h->mb.i_partition = i_partition;  
             }  
   
             if( h->mb.b_chroma_me )  
             {  
                 if( CHROMA444 )  
                 {  
                     x264_mb_analyse_intra( h, &analysis, i_satd_inter );  
                     x264_mb_analyse_intra_chroma( h, &analysis );  
                 }  
                 else  
                 {  
                     x264_mb_analyse_intra_chroma( h, &analysis );  
                     x264_mb_analyse_intra( h, &analysis, i_satd_inter - analysis.i_satd_chroma );  
                 }  
                 analysis.i_satd_i16x16 += analysis.i_satd_chroma;  
                 analysis.i_satd_i8x8   += analysis.i_satd_chroma;  
                 analysis.i_satd_i4x4   += analysis.i_satd_chroma;  
             }  
             else  
                 x264_mb_analyse_intra( h, &analysis, i_satd_inter );  
   
             if( analysis.i_mbrd )  
             {  
                 x264_mb_analyse_transform_rd( h, &analysis, &i_satd_inter, &i_cost );  
                 x264_intra_rd( h, &analysis, i_satd_inter * 17/16 + 1 );  
             }  
   
             COPY2_IF_LT( i_cost, analysis.i_satd_i16x16, i_type, I_16x16 );  
             COPY2_IF_LT( i_cost, analysis.i_satd_i8x8, i_type, I_8x8 );  
             COPY2_IF_LT( i_cost, analysis.i_satd_i4x4, i_type, I_4x4 );  
             COPY2_IF_LT( i_cost, analysis.i_satd_pcm, i_type, I_PCM );  
   
             h->mb.i_type = i_type;  
             h->mb.i_partition = i_partition;  
   
             if( analysis.i_mbrd >= 2 && IS_INTRA( i_type ) && i_type != I_PCM )  
                 x264_intra_rd_refine( h, &analysis );  
             if( h->mb.i_subpel_refine >= 5 )  
                 x264_refine_bidir( h, &analysis );  
   
             if( analysis.i_mbrd >= 2 && i_type > B_DIRECT && i_type < B_SKIP )  
             {  
                 int i_biweight;  
                 x264_analyse_update_cache( h, &analysis );  
   
                 if( i_partition == D_16x16 )  
                 {  
                     if( i_type == B_L0_L0 )  
                     {  
                         analysis.l0.me16x16.cost = i_cost;  
                         x264_me_refine_qpel_rd( h, &analysis.l0.me16x16, analysis.i_lambda2, 0, 0 );  
                     }  
                     else if( i_type == B_L1_L1 )  
                     {  
                         analysis.l1.me16x16.cost = i_cost;  
                         x264_me_refine_qpel_rd( h, &analysis.l1.me16x16, analysis.i_lambda2, 0, 1 );  
                     }  
                     else if( i_type == B_BI_BI )  
                     {  
                         i_biweight = h->mb.bipred_weight[analysis.l0.bi16x16.i_ref][analysis.l1.bi16x16.i_ref];  
                         x264_me_refine_bidir_rd( h, &analysis.l0.bi16x16, &analysis.l1.bi16x16, i_biweight, 0, analysis.i_lambda2 );  
                     }  
                 }  
                 else if( i_partition == D_16x8 )  
                 {  
                     for( int i = 0; i < 2; i++ )  
                     {  
                         h->mb.i_sub_partition[i*2] = h->mb.i_sub_partition[i*2+1] = analysis.i_mb_partition16x8[i];  
                         if( analysis.i_mb_partition16x8[i] == D_L0_8x8 )  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me16x8[i], analysis.i_lambda2, i*8, 0 );  
                         else if( analysis.i_mb_partition16x8[i] == D_L1_8x8 )  
                             x264_me_refine_qpel_rd( h, &analysis.l1.me16x8[i], analysis.i_lambda2, i*8, 1 );  
                         else if( analysis.i_mb_partition16x8[i] == D_BI_8x8 )  
                         {  
                             i_biweight = h->mb.bipred_weight[analysis.l0.me16x8[i].i_ref][analysis.l1.me16x8[i].i_ref];  
                             x264_me_refine_bidir_rd( h, &analysis.l0.me16x8[i], &analysis.l1.me16x8[i], i_biweight, i*2, analysis.i_lambda2 );  
                         }  
                     }  
                 }  
                 else if( i_partition == D_8x16 )  
                 {  
                     for( int i = 0; i < 2; i++ )  
                     {  
                         h->mb.i_sub_partition[i] = h->mb.i_sub_partition[i+2] = analysis.i_mb_partition8x16[i];  
                         if( analysis.i_mb_partition8x16[i] == D_L0_8x8 )  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me8x16[i], analysis.i_lambda2, i*4, 0 );  
                         else if( analysis.i_mb_partition8x16[i] == D_L1_8x8 )  
                             x264_me_refine_qpel_rd( h, &analysis.l1.me8x16[i], analysis.i_lambda2, i*4, 1 );  
                         else if( analysis.i_mb_partition8x16[i] == D_BI_8x8 )  
                         {  
                             i_biweight = h->mb.bipred_weight[analysis.l0.me8x16[i].i_ref][analysis.l1.me8x16[i].i_ref];  
                             x264_me_refine_bidir_rd( h, &analysis.l0.me8x16[i], &analysis.l1.me8x16[i], i_biweight, i, analysis.i_lambda2 );  
                         }  
                     }  
                 }  
                 else if( i_partition == D_8x8 )  
                 {  
                     for( int i = 0; i < 4; i++ )  
                     {  
                         if( h->mb.i_sub_partition[i] == D_L0_8x8 )  
                             x264_me_refine_qpel_rd( h, &analysis.l0.me8x8[i], analysis.i_lambda2, i*4, 0 );  
                         else if( h->mb.i_sub_partition[i] == D_L1_8x8 )  
                             x264_me_refine_qpel_rd( h, &analysis.l1.me8x8[i], analysis.i_lambda2, i*4, 1 );  
                         else if( h->mb.i_sub_partition[i] == D_BI_8x8 )  
                         {  
                             i_biweight = h->mb.bipred_weight[analysis.l0.me8x8[i].i_ref][analysis.l1.me8x8[i].i_ref];  
                             x264_me_refine_bidir_rd( h, &analysis.l0.me8x8[i], &analysis.l1.me8x8[i], i_biweight, i, analysis.i_lambda2 );  
                         }  
                     }  
                 }  
             }  
         }  
     }  
   
     x264_analyse_update_cache( h, &analysis );  
   
     /* In rare cases we can end up qpel-RDing our way back to a larger partition size 
      * without realizing it.  Check for this and account for it if necessary. */  
     if( analysis.i_mbrd >= 2 )  
     {  
         /* Don't bother with bipred or 8x8-and-below, the odds are incredibly low. */  
         static const uint8_t check_mv_lists[X264_MBTYPE_MAX] = {[P_L0]=1, [B_L0_L0]=1, [B_L1_L1]=2};  
         int list = check_mv_lists[h->mb.i_type] - 1;  
         if( list >= 0 && h->mb.i_partition != D_16x16 &&  
             M32( &h->mb.cache.mv[list][x264_scan8[0]] ) == M32( &h->mb.cache.mv[list][x264_scan8[12]] ) &&  
             h->mb.cache.ref[list][x264_scan8[0]] == h->mb.cache.ref[list][x264_scan8[12]] )  
                 h->mb.i_partition = D_16x16;  
     }  
   
     if( !analysis.i_mbrd )  
         x264_mb_analyse_transform( h );  
   
     if( analysis.i_mbrd == 3 && !IS_SKIP(h->mb.i_type) )  
         x264_mb_analyse_qp_rd( h, &analysis );  
   
     h->mb.b_trellis = h->param.analyse.i_trellis;  
     h->mb.b_noise_reduction = h->mb.b_noise_reduction || (!!h->param.analyse.i_noise_reduction && !IS_INTRA( h->mb.i_type ));  
   
     if( !IS_SKIP(h->mb.i_type) && h->mb.i_psy_trellis && h->param.analyse.i_trellis == 1 )  
         x264_psy_trellis_init( h, 0 );  
     if( h->mb.b_trellis == 1 || h->mb.b_noise_reduction )  
         h->mb.i_skip_intra = 0;  
 }  
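
x264_macroblock_analyse() repeatedly uses COPY2_IF_LT() and COPY3_IF_LT() to keep a running minimum cost together with the macroblock type and partition that produced it. The following is a minimal sketch of that pattern, not x264 code: the macros are re-declared here as an assumption about how x264's common headers define them (the do/while wrapper is an addition for safety), and every demo_* / DEMO_* name is hypothetical.

```c
#include <assert.h>

/* Assumed to behave like x264's macros of the same name: if the candidate
 * cost y is strictly lower than the current best x, take y and its tags. */
#define COPY2_IF_LT( x, y, a, b ) \
    do { if( (y) < (x) ) { (x) = (y); (a) = (b); } } while( 0 )
#define COPY3_IF_LT( x, y, a, b, c, d ) \
    do { if( (y) < (x) ) { (x) = (y); (a) = (b); (c) = (d); } } while( 0 )

/* Hypothetical stand-ins for x264's macroblock type and partition enums. */
enum { DEMO_B_L0_L0, DEMO_B_L1_L1, DEMO_B_8x8 };
enum { DEMO_D_16x16, DEMO_D_8x8 };

typedef struct { int cost, type, partition; } demo_choice;

/* Keep the cheapest of three candidate costs, in the same order of
 * comparisons as the B-slice branch: L0 16x16, L1 16x16, then 8x8. */
demo_choice demo_pick_best( int cost_l0, int cost_l1, int cost_8x8 )
{
    assert( cost_l0 >= 0 && cost_l1 >= 0 && cost_8x8 >= 0 );
    demo_choice c = { cost_l0, DEMO_B_L0_L0, DEMO_D_16x16 };
    /* Same partition, different type: only two values change. */
    COPY2_IF_LT( c.cost, cost_l1, c.type, DEMO_B_L1_L1 );
    /* Different partition as well: three values change. */
    COPY3_IF_LT( c.cost, cost_8x8, c.type, DEMO_B_8x8, c.partition, DEMO_D_8x8 );
    return c;
}
```

Because the comparison is strict, a tie keeps the candidate that was tested first, so the order of the COPYn_IF_LT() calls acts as a tie-breaker.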

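Although the source code of x264_macroblock_analyse() is fairly long, its logic is clear and can be summarized as follows:

(1) If the current slice is an I slice, call x264_mb_analyse_intra() to analyse the intra prediction modes of the macroblock.
(2) If the current slice is a P slice, proceed as follows:
a) Call x264_macroblock_probe_pskip() to check whether the macroblock can be coded as a Skip macroblock. If it can, the remaining steps are skipped.

b) Call x264_mb_analyse_inter_p16x16() to compute the cost of P16x16 inter prediction.

c) Call x264_mb_analyse_inter_p8x8() to compute the cost of P8x8 inter prediction.

d) If the P8x8 cost is lower than the P16x16 cost, examine each of the four 8x8 sub-macroblock partitions in turn:

i. Call x264_mb_analyse_inter_p4x4() to compute the cost of P4x4 inter prediction.

ii. If the P4x4 cost is lower than the P8x8 cost, call x264_mb_analyse_inter_p8x4() and x264_mb_analyse_inter_p4x8() to compute the costs of P8x4 and P4x8 inter prediction.

e) If the P8x8 cost is lower than the P16x16 cost, call x264_mb_analyse_inter_p16x8() and x264_mb_analyse_inter_p8x16() to compute the costs of P16x8 and P8x16 inter prediction.

f) In addition, call x264_mb_analyse_intra() to check whether coding the macroblock as an Intra macroblock costs less than coding it as a P macroblock (Intra macroblocks are allowed in P slices).
(3) If the current slice is a B slice, the processing is similar to that of a P slice.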
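
The P-slice steps above can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the x264 implementation: the costs are passed in as plain integers instead of being computed by the x264_mb_analyse_inter_*() functions, the sub-8x8 partitions of step d) are left out, early-termination flags and motion-vector cost thresholds are ignored, and all sk_* / SK_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode labels; x264 itself tracks an i_type / i_partition pair. */
typedef enum { SK_PSKIP, SK_P16x16, SK_P8x8, SK_P16x8, SK_P8x16, SK_INTRA } sk_mode;

/* Hypothetical pre-computed results standing in for the analysis functions. */
typedef struct
{
    bool is_pskip;                      /* result of the skip probe, step a) */
    int  p16x16, p8x8, p16x8, p8x16;    /* inter costs, steps b), c), e)     */
    int  intra;                         /* best intra cost, step f)          */
} sk_costs;

sk_mode sk_choose_p_mode( const sk_costs *c )
{
    assert( c != NULL );

    /* a) a Skip macroblock ends the analysis immediately */
    if( c->is_pskip )
        return SK_PSKIP;

    /* b) P16x16 is the baseline */
    sk_mode best = SK_P16x16;
    int best_cost = c->p16x16;

    /* c), e) smaller partitions are only pursued when P8x8 beats P16x16 */
    if( c->p8x8 < c->p16x16 )
    {
        best = SK_P8x8;
        best_cost = c->p8x8;
        if( c->p16x8 < best_cost ) { best = SK_P16x8; best_cost = c->p16x8; }
        if( c->p8x16 < best_cost ) { best = SK_P8x16; best_cost = c->p8x16; }
    }

    /* f) intra coding is always compared against the best inter mode */
    if( c->intra < best_cost )
        best = SK_INTRA;
    return best;
}
```

Note that in this sketch a cheap P16x8 candidate is never selected when P8x8 does not beat P16x16, which reflects the gating described in steps d) and e).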

This article covers the part of this flow that analyses the intra prediction modes of Intra macroblocks, namely x264_mb_analyse_intra().
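
The core idea of intra mode analysis is to build a prediction from neighbouring pixels for each candidate mode, measure how far it is from the source block, and keep the cheapest mode. The sketch below illustrates this for a 4x4 block with only the Vertical, Horizontal and DC modes. It is an assumption-laden simplification: it uses plain SAD, assumes both top and left neighbours are available, omits the lambda-weighted mode bit cost that x264 adds, and all demo_* / DEMO_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

enum { DEMO_PRED_V, DEMO_PRED_H, DEMO_PRED_DC };

/* Build a 4x4 prediction from the 4 top and 4 left neighbouring pixels. */
void demo_predict_4x4( int mode, const uint8_t top[4], const uint8_t left[4],
                       uint8_t pred[16] )
{
    int sum = 4; /* rounding term for the mean of 8 samples */
    for( int i = 0; i < 4; i++ )
        sum += top[i] + left[i];
    for( int y = 0; y < 4; y++ )
        for( int x = 0; x < 4; x++ )
            pred[4*y + x] = mode == DEMO_PRED_V ? top[x]      /* copy column */
                          : mode == DEMO_PRED_H ? left[y]     /* copy row    */
                          : (uint8_t)( sum >> 3 );            /* DC mean     */
}

/* Sum of absolute differences between two 4x4 blocks. */
int demo_sad_4x4( const uint8_t a[16], const uint8_t b[16] )
{
    int sad = 0;
    for( int i = 0; i < 16; i++ )
        sad += abs( a[i] - b[i] );
    return sad;
}

/* Try every candidate mode and return the one with the lowest SAD. */
int demo_best_intra_4x4( const uint8_t src[16], const uint8_t top[4],
                         const uint8_t left[4] )
{
    assert( src && top && left );
    int best_mode = DEMO_PRED_V, best_cost = INT_MAX;
    for( int mode = DEMO_PRED_V; mode <= DEMO_PRED_DC; mode++ )
    {
        uint8_t pred[16];
        demo_predict_4x4( mode, top, left, pred );
        int cost = demo_sad_4x4( src, pred );
        if( cost < best_cost ) /* same idea as COPY2_IF_LT() */
        {
            best_cost = cost;
            best_mode = mode;
        }
    }
    return best_mode;
}
```

A block whose columns repeat the top neighbours selects the Vertical mode, one whose rows repeat the left neighbours selects Horizontal, and a flat block whose value equals the neighbour mean selects DC.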

x264_mb_analyse_intra()

x264_mb_analyse_intra() analyses the intra prediction modes of an Intra macroblock. The function is defined in encoder\analyse.c, as shown below.

    
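 //Intra prediction analysis: choose the best option among the 16x16 cost, the sum of the four 8x8 costs, and the sum of the sixteen 4x4 costs (SAD or SATD)  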
 static void x264_mb_analyse_intra( x264_t *h, x264_mb_analysis_t *a, int i_satd_inter )  
 {  
     const unsigned int flags = h->sh.i_type == SLICE_TYPE_I ? h->param.analyse.intra : h->param.analyse.inter;  
     //Pointers used for the cost computation  
     //p_fenc is the frame being encoded (source pixels)  
     pixel *p_src = h->mb.pic.p_fenc[0];  
     //p_fdec is the reconstructed frame (predictions are written here)  
     pixel *p_dst = h->mb.pic.p_fdec[0];  
   
     static const int8_t intra_analysis_shortcut[2][2][2][5] =  
     {  
         {{{I_PRED_4x4_HU, -1, -1, -1, -1},  
           {I_PRED_4x4_DDL, I_PRED_4x4_VL, -1, -1, -1}},  
          {{I_PRED_4x4_DDR, I_PRED_4x4_HD, I_PRED_4x4_HU, -1, -1},  
           {I_PRED_4x4_DDL, I_PRED_4x4_DDR, I_PRED_4x4_VR, I_PRED_4x4_VL, -1}}},  
         {{{I_PRED_4x4_HU, -1, -1, -1, -1},  
           {-1, -1, -1, -1, -1}},  
          {{I_PRED_4x4_DDR, I_PRED_4x4_HD, I_PRED_4x4_HU, -1, -1},  
           {I_PRED_4x4_DDR, I_PRED_4x4_VR, -1, -1, -1}}},  
     };  
   
     int idx;  
     int lambda = a->i_lambda;  
   
     /*---------------- Try all mode and calculate their score ---------------*/  
     /* Disabled i16x16 for AVC-Intra compat */  
     //Intra 16x16  
     if( !h->param.i_avcintra_class )  
     {  
         //Get the available intra prediction modes for Intra 16x16  
         /* 
          * 16x16 block 
          * 
          * +--------+--------+ 
          * |                 | 
          * |                 | 
          * |                 | 
          * +        +        + 
          * |                 | 
          * |                 | 
          * |                 | 
          * +--------+--------+ 
          * 
          */  
         //The result depends on whether left and top neighbouring data are available  
         const int8_t *predict_mode = predict_16x16_mode_available( h->mb.i_neighbour_intra );  
   
         /* Not heavily tuned */  
         static const uint8_t i16x16_thresh_lut[11] = { 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4 };  
         int i16x16_thresh = a->b_fast_intra ? (i16x16_thresh_lut[h->mb.i_subpel_refine]*i_satd_inter)>>1 : COST_MAX;  
   
         if( !h->mb.b_lossless && predict_mode[3] >= 0 )  
         {  
             h->pixf.intra_mbcmp_x3_16x16( p_src, p_dst, a->i_satd_i16x16_dir );  
             a->i_satd_i16x16_dir[0] += lambda * bs_size_ue(0);  
             a->i_satd_i16x16_dir[1] += lambda * bs_size_ue(1);  
             a->i_satd_i16x16_dir[2] += lambda * bs_size_ue(2);  
             COPY2_IF_LT( a->i_satd_i16x16, a->i_satd_i16x16_dir[0], a->i_predict16x16, 0 );  
             COPY2_IF_LT( a->i_satd_i16x16, a->i_satd_i16x16_dir[1], a->i_predict16x16, 1 );  
             COPY2_IF_LT( a->i_satd_i16x16, a->i_satd_i16x16_dir[2], a->i_predict16x16, 2 );  
   
             /* Plane is expensive, so don't check it unless one of the previous modes was useful. */  
             if( a->i_satd_i16x16 <= i16x16_thresh )  
             {  
                 h->predict_16x16[I_PRED_16x16_P]( p_dst );  
                 a->i_satd_i16x16_dir[I_PRED_16x16_P] = h->pixf.mbcmp[PIXEL_16x16]( p_dst, FDEC_STRIDE, p_src, FENC_STRIDE );  
                 a->i_satd_i16x16_dir[I_PRED_16x16_P] += lambda * bs_size_ue(3);  
                 COPY2_IF_LT( a->i_satd_i16x16, a->i_satd_i16x16_dir[I_PRED_16x16_P], a->i_predict16x16, 3 );  
             }  
         }  
         else  
         {  
             //Loop over all available Intra 16x16 prediction modes  
             //There are at most 4 of them  
             for( ; *predict_mode >= 0; predict_mode++ )  
             {  
                 int i_satd;  
                 int i_mode = *predict_mode;  
   
                 //Intra prediction function (may be assembly-optimized): computes the prediction from the left and top neighbouring pixels  
                 /* 
                  * Intra prediction examples 
                  * Vertical prediction 
                  *    |X1 X2 ... X16 
                  *  --+--------------- 
                  *    |X1 X2 ... X16 
                  *    |X1 X2 ... X16 
                  *    |.. .. ... X16 
                  *    |X1 X2 ... X16 
                  * 
                  * Horizontal prediction 
                  *    | 
                  *  --+--------------- 
                  *  X1| X1  X1 ...  X1 
                  *  X2| X2  X2 ...  X2 
                  *  ..| ..  .. ...  .. 
                  * X16|X16 X16 ... X16 
                  * 
                  * DC prediction 
                  *    |X1 X2 ... X16 
                  *  --+--------------- 
                  * X17| 
                  * X18|     Y 
                  *  ..| 
                  * X32| 
                  * 
                  * Y = (X1+X2+X3+X4+...+X31+X32 + 16) >> 5, i.e. the rounded mean of the 32 neighbours 
                  * 
                  */  
                 if( h->mb.b_lossless )  
                     x264_predict_lossless_16x16( h, 0, i_mode );  
                 else  
                     h->predict_16x16[i_mode]( p_dst );//the prediction is written into the reconstructed frame p_dst  
   
                 //Compute the SAD or SATD (SATD is the sum of absolute values of the Hadamard-transformed residual)  
                 //This is the coding cost  
                 //The data is located in p_dst and p_src  
                 i_satd = h->pixf.mbcmp[PIXEL_16x16]( p_dst, FDEC_STRIDE, p_src, FENC_STRIDE ) +  
                          lambda * bs_size_ue( x264_mb_pred_mode16x16_fix[i_mode] );  
   
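                 //COPY2_IF_LT() is a macro whose name means "copy if less than": if the new value is smaller (lower cost), copy it.  
                 //After macro expansion, the call below is equivalent to  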
                 //if((i_satd)<(a->i_satd_i16x16))  
                 //{  
                 //    (a->i_satd_i16x16)=(i_satd);  
                 //    (a->i_predict16x16)=(i_mode);  
                 //}  
                 COPY2_IF_LT( a->i_satd_i16x16, i_satd, a->i_predict16x16, i_mode );  
                 //The cost of every mode is stored  
                 a->i_satd_i16x16_dir[i_mode] = i_satd;  
             }  
         }  
   
         if( h->sh.i_type == SLICE_TYPE_B )  
             /* cavlc mb type prefix */  
             a->i_satd_i16x16 += lambda * i_mb_b_cost_table[I_16x16];  
   
         if( a->i_satd_i16x16 > i16x16_thresh )  
             return;  
     }  
   
     uint16_t *cost_i4x4_mode = (uint16_t*)ALIGN((intptr_t)x264_cost_i4x4_mode,64) + a->i_qp*32 + 8;  
     /* 8x8 prediction selection */  
     //Intra 8x8 (not examined in this article)  
     if( flags & X264_ANALYSE_I8x8 )  
     {  
         ALIGNED_ARRAY_32( pixel, edge,[36] );  
         x264_pixel_cmp_t sa8d = (h->pixf.mbcmp[0] == h->pixf.satd[0]) ? h->pixf.sa8d[PIXEL_8x8] : h->pixf.mbcmp[PIXEL_8x8];  
         int i_satd_thresh = a->i_mbrd ? COST_MAX : X264_MIN( i_satd_inter, a->i_satd_i16x16 );  
   
         // FIXME some bias like in i4x4?  
         int i_cost = lambda * 4; /* base predmode costs */  
         h->mb.i_cbp_luma = 0;  
   
         if( h->sh.i_type == SLICE_TYPE_B )  
             i_cost += lambda * i_mb_b_cost_table[I_8x8];  
   
         for( idx = 0;; idx++ )  
         {  
             int x = idx&1;  
             int y = idx>>1;  
             pixel *p_src_by = p_src + 8*x + 8*y*FENC_STRIDE;  
             pixel *p_dst_by = p_dst + 8*x + 8*y*FDEC_STRIDE;  
             int i_best = COST_MAX;  
             int i_pred_mode = x264_mb_predict_intra4x4_mode( h, 4*idx );  
   
             const int8_t *predict_mode = predict_8x8_mode_available( a->b_avoid_topright, h->mb.i_neighbour8[idx], idx );  
             h->predict_8x8_filter( p_dst_by, edge, h->mb.i_neighbour8[idx], ALL_NEIGHBORS );  
   
             if( h->pixf.intra_mbcmp_x9_8x8 && predict_mode[8] >= 0 )  
             {  
                 /* No shortcuts here. The SSSE3 implementation of intra_mbcmp_x9 is fast enough. */  
                 i_best = h->pixf.intra_mbcmp_x9_8x8( p_src_by, p_dst_by, edge, cost_i4x4_mode-i_pred_mode, a->i_satd_i8x8_dir[idx] );  
                 i_cost += i_best & 0xffff;  
                 i_best >>= 16;  
                 a->i_predict8x8[idx] = i_best;  
                 if( idx == 3 || i_cost > i_satd_thresh )  
                     break;  
                 x264_macroblock_cache_intra8x8_pred( h, 2*x, 2*y, i_best );  
             }  
             else  
             {  
                 if( !h->mb.b_lossless && predict_mode[5] >= 0 )  
                 {  
                     ALIGNED_ARRAY_16( int32_t, satd,[9] );  
                     h->pixf.intra_mbcmp_x3_8x8( p_src_by, edge, satd );  
                     int favor_vertical = satd[I_PRED_4x4_H] > satd[I_PRED_4x4_V];  
                     satd[i_pred_mode] -= 3 * lambda;  
                     for( int i = 2; i >= 0; i-- )  
                     {  
                         int cost = satd[i];  
                         a->i_satd_i8x8_dir[idx][i] = cost + 4 * lambda;  
                         COPY2_IF_LT( i_best, cost, a->i_predict8x8[idx], i );  
                     }  
   
                     /* Take analysis shortcuts: don't analyse modes that are too 
                      * far away direction-wise from the favored mode. */  
                     if( a->i_mbrd < 1 + a->b_fast_intra )  
                         predict_mode = intra_analysis_shortcut[a->b_avoid_topright][predict_mode[8] >= 0][favor_vertical];  
                     else  
                         predict_mode += 3;  
                 }  
   
                 for( ; *predict_mode >= 0 && (i_best >= 0 || a->i_mbrd >= 2); predict_mode++ )  
                 {  
                     int i_satd;  
                     int i_mode = *predict_mode;  
   
                     if( h->mb.b_lossless )  
                         x264_predict_lossless_8x8( h, p_dst_by, 0, idx, i_mode, edge );  
                     else  
                         h->predict_8x8[i_mode]( p_dst_by, edge );  
   
                     i_satd = sa8d( p_dst_by, FDEC_STRIDE, p_src_by, FENC_STRIDE );  
                     if( i_pred_mode == x264_mb_pred_mode4x4_fix(i_mode) )  
                         i_satd -= 3 * lambda;  
   
                     COPY2_IF_LT( i_best, i_satd, a->i_predict8x8[idx], i_mode );  
                     a->i_satd_i8x8_dir[idx][i_mode] = i_satd + 4 * lambda;  
                 }  
                 i_cost += i_best + 3*lambda;  
   
                 if( idx == 3 || i_cost > i_satd_thresh )  
                     break;  
                 if( h->mb.b_lossless )  
                     x264_predict_lossless_8x8( h, p_dst_by, 0, idx, a->i_predict8x8[idx], edge );  
                 else  
                     h->predict_8x8[a->i_predict8x8[idx]]( p_dst_by, edge );  
                 x264_macroblock_cache_intra8x8_pred( h, 2*x, 2*y, a->i_predict8x8[idx] );  
             }  
             /* we need to encode this block now (for next ones) */  
             x264_mb_encode_i8x8( h, 0, idx, a->i_qp, a->i_predict8x8[idx], edge, 0 );  
         }  
   
         if( idx == 3 )  
         {  
             a->i_satd_i8x8 = i_cost;  
             if( h->mb.i_skip_intra )  
             {  
                 h->mc.copy[PIXEL_16x16]( h->mb.pic.i8x8_fdec_buf, 16, p_dst, FDEC_STRIDE, 16 );  
                 h->mb.pic.i8x8_nnz_buf[0] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 0]] );  
                 h->mb.pic.i8x8_nnz_buf[1] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 2]] );  
                 h->mb.pic.i8x8_nnz_buf[2] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 8]] );  
                 h->mb.pic.i8x8_nnz_buf[3] = M32( &h->mb.cache.non_zero_count[x264_scan8[10]] );  
                 h->mb.pic.i8x8_cbp = h->mb.i_cbp_luma;  
                 if( h->mb.i_skip_intra == 2 )  
                     h->mc.memcpy_aligned( h->mb.pic.i8x8_dct_buf, h->dct.luma8x8, sizeof(h->mb.pic.i8x8_dct_buf) );  
             }  
         }  
         else  
         {  
             static const uint16_t cost_div_fix8[3] = {1024,512,341};  
             a->i_satd_i8x8 = COST_MAX;  
             i_cost = (i_cost * cost_div_fix8[idx]) >> 8;  
         }  
         /* Not heavily tuned */  
         static const uint8_t i8x8_thresh[11] = { 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6 };  
         if( a->b_early_terminate && X264_MIN(i_cost, a->i_satd_i16x16) > (i_satd_inter*i8x8_thresh[h->mb.i_subpel_refine])>>2 )  
             return;  
     }  
   
     /* 4x4 prediction selection */  
     //Intra 4x4  
     if( flags & X264_ANALYSE_I4x4 )  
     {  
         /* 
          * The 16x16 macroblock is split into sixteen 4x4 sub-blocks 
          * 
          * +----+----+----+----+ 
          * |    |    |    |    | 
          * +----+----+----+----+ 
          * |    |    |    |    | 
          * +----+----+----+----+ 
          * |    |    |    |    | 
          * +----+----+----+----+ 
          * |    |    |    |    | 
          * +----+----+----+----+ 
          * 
          */  
         int i_cost = lambda * (24+16); /* 24 from JVT (SATD0), 16 from base predmode costs */  
         int i_satd_thresh = a->b_early_terminate ? X264_MIN3( i_satd_inter, a->i_satd_i16x16, a->i_satd_i8x8 ) : COST_MAX;  
         h->mb.i_cbp_luma = 0;  
   
         if( a->b_early_terminate && a->i_mbrd )  
             i_satd_thresh = i_satd_thresh * (10-a->b_fast_intra)/8;  
   
         if( h->sh.i_type == SLICE_TYPE_B )  
             i_cost += lambda * i_mb_b_cost_table[I_4x4];  
         //Loop over all 4x4 blocks  
         for( idx = 0;; idx++ )  
         {  
             //Pixels in the frame being encoded  
             //block_idx_xy_fenc[] holds the offset of each 4x4 block within p_fenc  
             pixel *p_src_by = p_src + block_idx_xy_fenc[idx];  
             //Pixels in the reconstructed frame  
             //block_idx_xy_fdec[] holds the offset of each 4x4 block within p_fdec  
             pixel *p_dst_by = p_dst + block_idx_xy_fdec[idx];  
   
             int i_best = COST_MAX;  
             int i_pred_mode = x264_mb_predict_intra4x4_mode( h, idx );  
             //Get the available intra prediction modes for Intra 4x4  
             //The result depends on whether left and top neighbouring data are available  
             const int8_t *predict_mode = predict_4x4_mode_available( a->b_avoid_topright, h->mb.i_neighbour4[idx], idx );  
   
             if( (h->mb.i_neighbour4[idx] & (MB_TOPRIGHT|MB_TOP)) == MB_TOP )  
                 /* emulate missing topright samples */  
                 MPIXEL_X4( &p_dst_by[4 - FDEC_STRIDE] ) = PIXEL_SPLAT_X4( p_dst_by[3 - FDEC_STRIDE] );  
   
             if( h->pixf.intra_mbcmp_x9_4x4 && predict_mode[8] >= 0 )  
             {  
                 /* No shortcuts here. The SSSE3 implementation of intra_mbcmp_x9 is fast enough. */  
                 i_best = h->pixf.intra_mbcmp_x9_4x4( p_src_by, p_dst_by, cost_i4x4_mode-i_pred_mode );  
                 i_cost += i_best & 0xffff;  
                 i_best >>= 16;  
                 a->i_predict4x4[idx] = i_best;  
                 if( i_cost > i_satd_thresh || idx == 15 )  
                     break;  
                 h->mb.cache.intra4x4_pred_mode[x264_scan8[idx]] = i_best;  
             }  
             else  
             {  
                 if( !h->mb.b_lossless && predict_mode[5] >= 0 )  
                 {  
                     ALIGNED_ARRAY_16( int32_t, satd,[9] );  
   
                     h->pixf.intra_mbcmp_x3_4x4( p_src_by, p_dst_by, satd );  
                     int favor_vertical = satd[I_PRED_4x4_H] > satd[I_PRED_4x4_V];  
                     satd[i_pred_mode] -= 3 * lambda;  
                     i_best = satd[I_PRED_4x4_DC]; a->i_predict4x4[idx] = I_PRED_4x4_DC;  
                     COPY2_IF_LT( i_best, satd[I_PRED_4x4_H], a->i_predict4x4[idx], I_PRED_4x4_H );  
                     COPY2_IF_LT( i_best, satd[I_PRED_4x4_V], a->i_predict4x4[idx], I_PRED_4x4_V );  
   
                     /* Take analysis shortcuts: don't analyse modes that are too 
                      * far away direction-wise from the favored mode. */  
                     if( a->i_mbrd < 1 + a->b_fast_intra )  
                         predict_mode = intra_analysis_shortcut[a->b_avoid_topright][predict_mode[8] >= 0][favor_vertical];  
                     else  
                         predict_mode += 3;  
                 }  
   
                 if( i_best > 0 )  
                 {  
                     //Loop over the candidate Intra4x4 modes, 9 at most  
                     for( ; *predict_mode >= 0; predict_mode++ )  
                     {  
                         int i_satd;  
                         int i_mode = *predict_mode;  
                         /* 
                          * Examples of 4x4 intra prediction 
                          * 
                          * Vertical prediction 
                          *   |X1 X2 X3 X4 
                          * --+----------- 
                          *   |X1 X2 X3 X4 
                          *   |X1 X2 X3 X4 
                          *   |X1 X2 X3 X4 
                          *   |X1 X2 X3 X4 
                          * 
                          * Horizontal prediction 
                          *   | 
                          * --+----------- 
                          * X5|X5 X5 X5 X5 
                          * X6|X6 X6 X6 X6 
                          * X7|X7 X7 X7 X7 
                          * X8|X8 X8 X8 X8 
                          * 
                          * DC prediction 
                          *   |X1 X2 X3 X4 
                          * --+----------- 
                          * X5| 
                          * X6|     Y 
                          * X7| 
                          * X8| 
                          * 
                          * Y=(X1+X2+X3+X4+X5+X6+X7+X8)/8 
                          * 
                          */  
                         if( h->mb.b_lossless )  
                             x264_predict_lossless_4x4( h, p_dst_by, 0, idx, i_mode );  
                         else  
                             h->predict_4x4[i_mode]( p_dst_by );//intra prediction assembly function; the result is written into the reconstructed frame  
   
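                         //Compute SAD or SATD (SATD, the "Transformed" variant, is SAD taken after a Hadamard transform of the differences)  
                         //i.e. the coding cost  
                         //p_src_by: frame being encoded, p_dst_by: reconstructed frame  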
                         i_satd = h->pixf.mbcmp[PIXEL_4x4]( p_dst_by, FDEC_STRIDE, p_src_by, FENC_STRIDE );  
                         if( i_pred_mode == x264_mb_pred_mode4x4_fix(i_mode) )  
                         {  
                             i_satd -= lambda * 3;  
                             if( i_satd <= 0 )  
                             {  
                                 i_best = i_satd;  
                                 a->i_predict4x4[idx] = i_mode;  
                                 break;  
                             }  
                         }  
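                         //COPY2_IF_LT() is a macro meaning "copy 2 values if less than": if the new cost is lower, copy the cost and its mode.  
                         //After expansion the macro reads:  
                         //if((i_satd)<(i_best))  
                         //{  
                         //    (i_best)=(i_satd);  
                         //    (a->i_predict4x4[idx])=(i_mode);  
                         //}  
   
                         //Check whether the cost is lower  
                         //i_best holds the lowest cost so far  
                         //i_predict4x4[idx] holds the mode with the lowest cost (idx is the index of the 4x4 block)  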
                         COPY2_IF_LT( i_best, i_satd, a->i_predict4x4[idx], i_mode );  
                     }  
                 }  
                 //Accumulate the cost over the 4x4 blocks (adding the minimum cost of each block)  
                 i_cost += i_best + 3 * lambda;  
                 if( i_cost > i_satd_thresh || idx == 15 )  
                     break;  
                 if( h->mb.b_lossless )  
                     x264_predict_lossless_4x4( h, p_dst_by, 0, idx, a->i_predict4x4[idx] );  
                 else  
                     h->predict_4x4[a->i_predict4x4[idx]]( p_dst_by );  
   
                 /* 
                  * Store the mode in intra4x4_pred_mode_cache 
                  * 
                  * A simple picture of intra4x4_pred_mode_cache follows. The numbers give the fill order (16 fills in total) 
                  *   | 
                  * --+------------------- 
                  *   | 0 0 0 0  0  0  0  0 
                  *   | 0 0 0 0  1  2  5  6 
                  *   | 0 0 0 0  3  4  7  8 
                  *   | 0 0 0 0  9 10 13 14 
                  *   | 0 0 0 0 11 12 15 16 
                  * 
                  */  
                 h->mb.cache.intra4x4_pred_mode[x264_scan8[idx]] = a->i_predict4x4[idx];  
             }  
             /* we need to encode this block now (for next ones) */  
             x264_mb_encode_i4x4( h, 0, idx, a->i_qp, a->i_predict4x4[idx], 0 );  
         }  
         if( idx == 15 )//the loop ran through the last 4x4 block (16 blocks in total)  
         {  
             //Accumulated cost  
             a->i_satd_i4x4 = i_cost;  
             if( h->mb.i_skip_intra )  
             {  
                 h->mc.copy[PIXEL_16x16]( h->mb.pic.i4x4_fdec_buf, 16, p_dst, FDEC_STRIDE, 16 );  
                 h->mb.pic.i4x4_nnz_buf[0] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 0]] );  
                 h->mb.pic.i4x4_nnz_buf[1] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 2]] );  
                 h->mb.pic.i4x4_nnz_buf[2] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 8]] );  
                 h->mb.pic.i4x4_nnz_buf[3] = M32( &h->mb.cache.non_zero_count[x264_scan8[10]] );  
                 h->mb.pic.i4x4_cbp = h->mb.i_cbp_luma;  
                 if( h->mb.i_skip_intra == 2 )  
                     h->mc.memcpy_aligned( h->mb.pic.i4x4_dct_buf, h->dct.luma4x4, sizeof(h->mb.pic.i4x4_dct_buf) );  
             }  
         }  
         else  
             a->i_satd_i4x4 = COST_MAX;  
     }  
 }  
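
In the intra_mbcmp_x9_4x4 branch above, a single return value carries two results: the code masks the low 16 bits to get the cost and shifts right by 16 to get the chosen mode. The following is a minimal sketch of that packing. The helper names pack_mode_cost(), unpack_cost() and unpack_mode() are hypothetical and do not exist in x264; only the bit layout is taken from the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of x264): put the mode index in the
 * high 16 bits and the cost in the low 16 bits of one int. */
static int pack_mode_cost(int mode, int cost)
{
    assert(mode >= 0 && mode < 0x8000);   /* keep the packed value positive */
    assert(cost >= 0 && cost <= 0xffff);  /* cost must fit in 16 bits */
    return (int)(((uint32_t)mode << 16) | (uint32_t)cost);
}

/* Same operations as "i_best & 0xffff" and "i_best >>= 16" in the listing. */
static int unpack_cost(int packed) { return packed & 0xffff; }
static int unpack_mode(int packed) { return packed >> 16; }
```

Packing both values lets one function call return the best mode and its cost without an extra output parameter.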

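Overall, x264_mb_analyse_intra() computes the cost of three intra prediction types, Intra16x16, Intra8x8 (not studied here yet) and Intra4x4, and obtains the best intra prediction mode by comparing them. The flow of the function is roughly as follows:

(1) Intra16x16 mode prediction
a) Call predict_16x16_mode_available() to determine which prediction modes are usable given the neighbouring macroblocks (mainly whether the left and top blocks are available).

b) Loop over the 4 Intra16x16 prediction modes:

i. Call the predict_16x16[]() assembly function to perform Intra16x16 prediction.

ii. Call mbcmp[]() in x264_pixel_function_t to compute the coding cost (mbcmp[]() points to a SAD or SATD assembly function).

c) Keep the Intra16x16 mode with the lowest cost.
(2) Intra8x8 mode prediction (not studied; the flow should be similar)
(3) Intra4x4 mode prediction
a) Loop over the 16 4x4 blocks:

i. Call x264_mb_predict_intra4x4_mode() to derive the predicted mode from the neighbouring blocks, and predict_4x4_mode_available() to get the modes usable for this block.

ii. Loop over the Intra4x4 prediction modes (9 at most):

1) Call the predict_4x4[]() assembly function to perform Intra4x4 prediction.

2) Call mbcmp[]() in x264_pixel_function_t to compute the coding cost (mbcmp[]() points to a SAD or SATD assembly function).

iii. Keep the Intra4x4 mode with the lowest cost.

b) Add up the minimum costs of the 16 4x4 blocks to get the total cost.

(4) Compare the costs of the three types above and take the cheapest as the intra prediction mode of the current macroblock.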
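
The "predict, measure the cost, keep the cheapest" loop in the steps above can be sketched in plain C for a single 4x4 block. This is a simplified illustration, not x264's code: the names toy_predict_4x4(), toy_sad_4x4(), toy_best_mode_4x4() and the STRIDE value are made up for this example, only three modes are tried, plain SAD is used, and the lambda-weighted mode-bit cost from the real function is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define STRIDE 8 /* toy row stride (hypothetical; not x264's FDEC_STRIDE) */

enum { MODE_V, MODE_H, MODE_DC, MODE_COUNT };

/* Fill the 4x4 block at dst from the row above it and the column to its left. */
static void toy_predict_4x4(uint8_t *dst, int mode)
{
    int dc = 4; /* rounding term for the division by 8 */
    for (int i = 0; i < 4; i++)
        dc += dst[i - STRIDE] + dst[i * STRIDE - 1];
    dc >>= 3;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * STRIDE + x] = mode == MODE_V ? dst[x - STRIDE]
                                : mode == MODE_H ? dst[y * STRIDE - 1]
                                : (uint8_t)dc;
}

/* SAD between the predicted block (row stride STRIDE) and a contiguous 4x4 source block. */
static int toy_sad_4x4(const uint8_t *pred, const uint8_t *src)
{
    int sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sum += abs(pred[y * STRIDE + x] - src[y * 4 + x]);
    return sum;
}

/* Predict with every mode, measure the cost, keep the cheapest one. */
static int toy_best_mode_4x4(uint8_t *dst, const uint8_t *src, int *best_cost)
{
    int best_mode = -1;
    assert(dst && src && best_cost);
    *best_cost = INT_MAX;
    for (int mode = 0; mode < MODE_COUNT; mode++) {
        toy_predict_4x4(dst, mode);
        int cost = toy_sad_4x4(dst, src);
        if (cost < *best_cost) {
            *best_cost = cost;
            best_mode = mode;
        }
    }
    return best_mode;
}
```

The caller is expected to pass a dst pointer that has one valid row above it and one valid column to its left, in the same way the real code relies on neighbouring pixels in the reconstructed frame.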

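The assembly functions involved are analysed later in this article. Before looking at the source code, here is a brief note on the relevant background.

Intra prediction background

A brief note on how intra prediction works. Intra prediction derives the pixel values inside a macroblock from the boundary pixels to its left and above. The effect is shown in the figure below: the left image is the original picture, and the right image is the picture after intra prediction, before the residual is added.

Two luma intra prediction block sizes of H.264 are covered here: 16x16 and 4x4 (Intra8x8 is not discussed). There are 4 Intra16x16 prediction modes, shown in the figure below.

These 4 modes are listed below.

Mode	Description
Vertical	Each pixel is derived from the pixels above the block
Horizontal	Each pixel is derived from the pixels to the left of the block
DC	Each pixel is set to the average of the pixels above and to the left
Plane	Each pixel is derived from the pixels above and to the left (a linear plane fit)

There are 9 Intra4x4 prediction modes, shown in the figure below.

As the figure shows, the first three Intra4x4 modes (Vertical, Horizontal, DC) work the same way as in Intra16x16. The remaining modes are directional. Besides the 45-degree diagonals, several of them use prediction arrows that are not at 45 degrees: the earlier arrows fit inside a single square (like the character 「口」), while the later ones span a rectangle of two stacked squares (like the character 「日」).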

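Pixel comparison background

Computing the intra prediction cost involves the SAD and SATD pixel metrics, so a few related concepts are noted here. SAD, SATD and SSD are defined as follows:

SAD (Sum of Absolute Differences), also called SAE (Sum of Absolute Errors): take the difference between corresponding pixels of two blocks, take the absolute value of each difference, and add them up.
SATD (Sum of Absolute Transformed Differences): apply a Hadamard transform to the differences, then sum the absolute values. It differs from SAD by the extra "transform" step.
SSD (Sum of Squared Differences), also called SSE (Sum of Squared Errors): the sum of the squared differences. It differs from SAD by squaring instead of taking the absolute value.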
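
To make the three definitions concrete, here is a minimal C sketch for two contiguous 4x4 blocks of 8-bit pixels. The function names are chosen for this example and are not x264's. The SATD here returns the raw sum of absolute Hadamard coefficients with no normalisation; a real encoder may scale the result differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* All three functions compare two contiguous 4x4 blocks (16 pixels each). */

static int sad_4x4(const uint8_t *a, const uint8_t *b)
{
    int sum = 0;
    assert(a && b);
    for (int i = 0; i < 16; i++)
        sum += abs(a[i] - b[i]);
    return sum;
}

static int ssd_4x4(const uint8_t *a, const uint8_t *b)
{
    int sum = 0;
    assert(a && b);
    for (int i = 0; i < 16; i++) {
        int d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* In-place 4-point Hadamard transform of v[0], v[s], v[2s], v[3s]. */
static void hadamard4(int *v, int s)
{
    int a = v[0] + v[s],         b = v[0] - v[s];
    int c = v[2 * s] + v[3 * s], d = v[2 * s] - v[3 * s];
    v[0] = a + c;
    v[s] = b + d;
    v[2 * s] = a - c;
    v[3 * s] = b - d;
}

static int satd_4x4(const uint8_t *a, const uint8_t *b)
{
    int d[16], sum = 0;
    assert(a && b);
    for (int i = 0; i < 16; i++)
        d[i] = a[i] - b[i];
    for (int r = 0; r < 4; r++)
        hadamard4(d + 4 * r, 1);   /* transform each row */
    for (int c = 0; c < 4; c++)
        hadamard4(d + c, 4);       /* transform each column */
    for (int i = 0; i < 16; i++)
        sum += abs(d[i]);
    return sum; /* unnormalised sum of absolute Hadamard coefficients */
}
```

With this sketch, a uniform difference of 3 on every pixel gives SAD 48 and SATD 48 (all the energy lands in one coefficient), while a single pixel that differs by 5 gives SAD 5 but SATD 80, because the transform spreads that one difference over all 16 coefficients.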

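H.264 encoders use SAD and SATD to decide macroblock prediction modes. Early encoders used SAD; more recent ones mostly use SATD. Why SATD rather than SAD? The key reason is that the size of the coded bitstream is closely tied to the frequency-domain content of the block after the DCT, and less so to the spatial-domain content before the transform. SAD only reflects spatial-domain information, whereas SATD also reflects frequency-domain information while being cheaper to compute than the DCT, which makes it a suitable basis for mode decision.
An example of mode selection with SAD follows. The figure below shows the pixels of an ordinary Intra16x16 macroblock; beneath it are the pixels predicted by the four modes Vertical, Horizontal, DC and Plane. The SAD (SAE) between each prediction and the original pixels works out to 3985, 5097, 4991 and 2539 respectively. Since Plane has the smallest SAD, Plane is the best intra prediction mode for this macroblock.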
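
The decision in this example is simply "take the index of the smallest cost". A minimal sketch using the four SAD values quoted above, in the order Vertical, Horizontal, DC, Plane; the helper name argmin_cost() is made up for this illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the smallest cost; the first one wins on ties. */
static size_t argmin_cost(const int *cost, size_t n)
{
    size_t best = 0;
    assert(cost && n > 0);
    for (size_t i = 1; i < n; i++)
        if (cost[i] < cost[best])
            best = i;
    return best;
}
```

Called on {3985, 5097, 4991, 2539}, it returns index 3, which corresponds to Plane in this ordering.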

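The following sections go through the assembly function source code of each module, in this order: Intra16x16 prediction, Intra4x4 prediction, pixel computation.

Intra16x16 intra prediction source code

The initialisation function of the Intra16x16 prediction module is x264_predict_16x16_init(). It assigns the function pointers in an array of x264_predict_t. While x264 runs, calling through these x264_predict_t function pointers is all that is needed to perform the corresponding prediction.

x264_predict_16x16_init()

x264_predict_16x16_init() initialises the Intra16x16 intra prediction assembly functions. It is defined in x264\common\predict.c, as shown below.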

 //Initialise the Intra16x16 intra prediction assembly functions  
 void x264_predict_16x16_init( int cpu, x264_predict_t pf[7] )  
 {  
     //C versions  
     //================================================  
     //Vertical  
     pf[I_PRED_16x16_V ]     = x264_predict_16x16_v_c;  
     //Horizontal  
     pf[I_PRED_16x16_H ]     = x264_predict_16x16_h_c;  
     //DC  
     pf[I_PRED_16x16_DC]     = x264_predict_16x16_dc_c;  
     //Plane  
     pf[I_PRED_16x16_P ]     = x264_predict_16x16_p_c;  
     //DC variants for when neighbours are missing: left only, top only, or neither (constant value)  
     pf[I_PRED_16x16_DC_LEFT]= x264_predict_16x16_dc_left_c;  
     pf[I_PRED_16x16_DC_TOP ]= x264_predict_16x16_dc_top_c;  
     pf[I_PRED_16x16_DC_128 ]= x264_predict_16x16_dc_128_c;  
     //================================================  
     //MMX version  
 #if HAVE_MMX  
     x264_predict_16x16_init_mmx( cpu, pf );  
 #endif  
     //ALTIVEC version  
 #if HAVE_ALTIVEC  
     if( cpu&X264_CPU_ALTIVEC )  
         x264_predict_16x16_init_altivec( pf );  
 #endif  
     //ARMV6 version  
 #if HAVE_ARMV6  
     x264_predict_16x16_init_arm( cpu, pf );  
 #endif  
     //AARCH64 version  
 #if ARCH_AARCH64  
     x264_predict_16x16_init_aarch64( cpu, pf );  
 #endif  
 }  

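As the source shows, x264_predict_16x16_init() first fills the intra prediction function pointer array x264_predict_t[] with the C versions x264_predict_16x16_v_c(), x264_predict_16x16_h_c(), x264_predict_16x16_dc_c() and x264_predict_16x16_p_c(). It then checks the features of the platform and, where supported, calls x264_predict_16x16_init_mmx(), x264_predict_16x16_init_arm() and so on to overwrite the entries of x264_predict_t[] with assembly-optimised functions. Below we first look at the C versions of the 4 Intra16x16 prediction modes, and then, for comparison, at the x86 assembly and NEON assembly versions of the Vertical mode.

x264_predict_16x16_v_c()

x264_predict_16x16_v_c() is the C version of the Vertical mode of Intra16x16 prediction. It is defined in common\predict.c, as shown below.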

 //16x16 intra prediction  
 //Vertical prediction  
 void x264_predict_16x16_v_c( pixel *src )  
 {  
     /* 
      * Vertical prediction 
      *   |X1 X2 X3 X4 
      * --+----------- 
      *   |X1 X2 X3 X4 
      *   |X1 X2 X3 X4 
      *   |X1 X2 X3 X4 
      *   |X1 X2 X3 X4 
      * 
      */  
     /* 
      * [Macro expansion] 
      * uint32_t v0 = ((x264_union32_t*)(&src[ 0-FDEC_STRIDE]))->i; 
      * uint32_t v1 = ((x264_union32_t*)(&src[ 4-FDEC_STRIDE]))->i; 
      * uint32_t v2 = ((x264_union32_t*)(&src[ 8-FDEC_STRIDE]))->i; 
      * uint32_t v3 = ((x264_union32_t*)(&src[12-FDEC_STRIDE]))->i; 
      * Here, the code above is effectively equivalent to: 
      * uint32_t v0 = *((uint32_t*)(&src[ 0-FDEC_STRIDE])); 
      * uint32_t v1 = *((uint32_t*)(&src[ 4-FDEC_STRIDE])); 
      * uint32_t v2 = *((uint32_t*)(&src[ 8-FDEC_STRIDE])); 
      * uint32_t v3 = *((uint32_t*)(&src[12-FDEC_STRIDE])); 
      * That is, 4 reads of 4 pixels each (16 pixels in total), assigned to v0, v1, v2, v3 
      * The values come from the row of pixels directly above the 16x16 block 
      *    0|          4          8          12         16 
      *    ||    v0    |    v1    |    v2    |    v3    | 
      * ---++==========+==========+==========+==========+ 
      *    || 
      *    || 
      *    || 
      *    || 
      *    || 
      *    || 
      * 
      */  
     //pixel4 is effectively uint32_t (32 bits) and holds 4 pixel values of 8 bits each (this holds at 8-bit depth)  
   
     pixel4 v0 = MPIXEL_X4( &src[ 0-FDEC_STRIDE] );  
     pixel4 v1 = MPIXEL_X4( &src[ 4-FDEC_STRIDE] );  
     pixel4 v2 = MPIXEL_X4( &src[ 8-FDEC_STRIDE] );  
     pixel4 v3 = MPIXEL_X4( &src[12-FDEC_STRIDE] );  
   
     //Write 16 rows  
     for( int i = 0; i < 16; i++ )  
     {  
         //[Macro expansion]  
         //(((x264_union32_t*)(src+ 0))->i) = v0;  
         //(((x264_union32_t*)(src+ 4))->i) = v1;  
         //(((x264_union32_t*)(src+ 8))->i) = v2;  
         //(((x264_union32_t*)(src+12))->i) = v3;  
         //i.e. 4 writes of 4 pixels each  
         //  
         MPIXEL_X4( src+ 0 ) = v0;  
         MPIXEL_X4( src+ 4 ) = v1;  
         MPIXEL_X4( src+ 8 ) = v2;  
         MPIXEL_X4( src+12 ) = v3;  
         //Next row  
         //FDEC_STRIDE=32 is the size of one row of the reconstructed macroblock buffer fdec_buf  
         src += FDEC_STRIDE;  
     }  
 }  

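As the source shows, x264_predict_16x16_v_c() first reads the row of pixels directly above the 16x16 block into v0, v1, v2 and v3, then loops 16 times to write those values into each of the 16 rows of the block.

x264_predict_16x16_h_c()

x264_predict_16x16_h_c() is the C version of the Horizontal mode of Intra16x16 prediction. It is defined in common\predict.c, as shown below.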

 //16x16 intra prediction  
 //Horizontal prediction  
 void x264_predict_16x16_h_c( pixel *src )  
 {  
     /* 
      * Horizontal prediction 
      *   | 
      * --+----------- 
      * X5|X5 X5 X5 X5 
      * X6|X6 X6 X6 X6 
      * X7|X7 X7 X7 X7 
      * X8|X8 X8 X8 X8 
      * 
      */  
     /* 
      * const pixel4 v = PIXEL_SPLAT_X4( src[-1] ); 
      * After macro expansion: 
      * const uint32_t v = (src[-1])*0x01010101U; 
      * 
      * PIXEL_SPLAT_X4() copies one 8-bit pixel into all four bytes of a 32-bit word, 
      * e.g. it turns 0x9F into 0x9F9F9F9F 
      * Derivation: 
      * assuming x occupies 8 bits (one pixel) 
      * y=x*0x01010101 
      *  =x*(0x00000001+0x00000100+0x00010000+0x01000000) 
      *  =(x<<0)+(x<<8)+(x<<16)+(x<<24) 
      * 
      * Meaning of const uint32_t v = (src[-1])*0x01010101U: 
      * for each row, a pixel value in src[-1] such as 0x02 yields v = 0x02020202 
      * src[-1] is the pixel immediately to the left of the current row of the 16x16 block 
      */  
     //Write 16 rows  
     for( int i = 0; i < 16; i++ )  
     {  
         const pixel4 v = PIXEL_SPLAT_X4( src[-1] );  
         //After macro expansion:  
         //((x264_union32_t*)(src+ 0))->i=v;  
         //((x264_union32_t*)(src+ 4))->i=v;  
         //((x264_union32_t*)(src+ 8))->i=v;  
         //((x264_union32_t*)(src+12))->i=v;  
         //i.e. 4 writes of 4 pixels each (the 16 pixels of a row all get the same value)  
         //  
         //   0|          4          8         12         16  
         //   ||          |          |          |          |  
         //---++==========+==========+==========+==========+  
         //   ||  
         // v ||    v     |    v     |    v     |    v     |  
         //   ||  
         //   ||  
         //   ||  
         //  
         MPIXEL_X4( src+ 0 ) = v;  
         MPIXEL_X4( src+ 4 ) = v;  
         MPIXEL_X4( src+ 8 ) = v;  
         MPIXEL_X4( src+12 ) = v;  
         //Next row  
         //FDEC_STRIDE=32 is the size of one row of the reconstructed macroblock buffer fdec_buf  
         src += FDEC_STRIDE;  
     }  
 }  

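As the source shows, for each row x264_predict_16x16_h_c() takes the single pixel to the left of the 16x16 block, replicates it 4 times into v, and then writes v to that row in 4 stores. "PIXEL_SPLAT_X4()" replicates the low 8 bits of its argument into all four bytes of a 32-bit word; the derivation is recorded in the comments of the source code and is not repeated here.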
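
The multiplication trick can be checked with a few lines of C. This is a sketch for 8-bit pixels only; splat_x4() is a stand-in name chosen for this example, not the x264 macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate one 8-bit pixel into all four bytes of a 32-bit word. */
static uint32_t splat_x4(uint8_t x)
{
    /* Multiplying by 0x01010101 adds x shifted by 0, 8, 16 and 24 bits. */
    uint32_t by_mul = x * 0x01010101u;
    uint32_t by_shift = (uint32_t)x | (uint32_t)x << 8
                      | (uint32_t)x << 16 | (uint32_t)x << 24;
    /* The shifted copies never overlap, so both forms give the same word. */
    assert(by_mul == by_shift);
    return by_mul;
}
```

For example, splat_x4(0x9F) yields 0x9F9F9F9F. The trick relies on the input fitting in 8 bits; a wider input would make the shifted copies overlap.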

x264_predict_16x16_dc_c()

x264_predict_16x16_dc_c() is the C version of the DC mode of Intra16x16 prediction. It is defined in common\predict.c, as shown below.

 #define PREDICT_16x16_DC(v)\  
     for( int i = 0; i < 16; i++ )\  
     {\  
         MPIXEL_X4( src+ 0 ) = v;\  
         MPIXEL_X4( src+ 4 ) = v;\  
         MPIXEL_X4( src+ 8 ) = v;\  
         MPIXEL_X4( src+12 ) = v;\  
         src += FDEC_STRIDE;\  
     }  
   
 void x264_predict_16x16_dc_c( pixel *src )  
 {  
     /* 
      * DC prediction 
      *   |X1 X2 X3 X4 
      * --+----------- 
      * X5| 
      * X6|     Y 
      * X7| 
      * X8| 
      * 
      * Y=(X1+X2+X3+X4+X5+X6+X7+X8)/8 
      */  
   
     int dc = 0;  
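     //Sum the 16 pixels to the left of the 16x16 block and the 16 pixels above it into dc  
     for( int i = 0; i < 16; i++ )  
     {  
         //Pixel on the left  
         dc += src[-1 + i * FDEC_STRIDE];  
         //Pixel above  
         dc += src[i - FDEC_STRIDE];  
     }  
     //Divide the sum by 32 (16+16 samples in total)  
     //The "+16" rounds to the nearest integer before the right shift by 5  
     //PIXEL_SPLAT_X4() copies one 8-bit pixel into all four bytes of a 32-bit word  
     //e.g. it turns 0x9F into 0x9F9F9F9F  
     pixel4 dcsplat = PIXEL_SPLAT_X4( ( dc + 16 ) >> 5 );  
     //Write the value to every pixel of the 16x16 block  
     /* 
      * Result after macro expansion 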
      * for( int i = 0; i < 16; i++ ) 
      * { 
      *  (((x264_union32_t*)(src+ 0))->i) = dcsplat; 
      *  (((x264_union32_t*)(src+ 4))->i) = dcsplat; 
      *  (((x264_union32_t*)(src+ 8))->i) = dcsplat; 
      *  (((x264_union32_t*)(src+12))->i) = dcsplat; 
      *  src += 32; 
      * } 
      */
