This article is reposted from: http://blog.csdn.net/leixiaohua1020/article/details/45870269. Please visit the original!
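This article walks through the source code of x264_macroblock_analyse(), which is called from x264_slice_write() in x264. x264_macroblock_analyse() corresponds to the analysis module of x264. The analysis module is responsible for two things:
(1) For intra macroblocks, it analyses the intra prediction mode.
(2) For inter macroblocks, it performs motion estimation and analyses the inter prediction mode.
Because the analysis module is fairly complex, its source code is covered in two articles: this one covers prediction mode analysis for intra macroblocks, and the next one covers prediction mode analysis for inter macroblocks.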
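Function call graph
The figure below shows where the macroblock analysis (Analysis) source code sits within x264 as a whole.
The figure below shows the function call relationships of the macroblock analysis (Analysis) part.
As the figure shows, x264_macroblock_analyse() in the analysis module calls the following functions (only a few representative ones are listed):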
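x264_mb_analyse_init(): initializes the Analysis module.
x264_mb_analyse_intra(): analyses the intra prediction mode of an Intra macroblock.
x264_macroblock_probe_pskip(): checks whether the macroblock can be coded in skip mode.
x264_mb_analyse_inter_p16x16(): analyses the inter prediction mode of a P16x16 macroblock.
x264_mb_analyse_inter_p8x8(): analyses the inter prediction mode of a P8x8 macroblock.
x264_mb_analyse_inter_p16x8(): analyses the inter prediction mode of a P16x8 macroblock.
x264_mb_analyse_inter_b16x16(): analyses the inter prediction mode of a B16x16 macroblock.
x264_mb_analyse_inter_b8x8(): analyses the inter prediction mode of a B8x8 macroblock.
x264_mb_analyse_inter_b16x8(): analyses the inter prediction mode of a B16x8 macroblock.
This article focuses on x264_mb_analyse_intra(), the analysis function for intra macroblocks (Intra macroblocks). The next article analyses x264_mb_analyse_inter_p16x16() and the other inter macroblock analysis functions.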
x264_slice_write()
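x264_slice_write() is the core of the x264 project: it does the work of encoding one slice. For an analysis of that function, see the article "Simple Analysis of the x264 Source Code: x264_slice_write()". This article analyses the x264_macroblock_analyse() function that it calls.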
x264_macroblock_analyse()
x264_macroblock_analyse() analyses the prediction mode of a macroblock. The function is defined in encoder\analyse.c, as shown below.
- /****************************************************************************
- * Analysis: intra prediction mode selection, inter motion estimation, etc.
- *
- * Annotated by: 雷霄驊 (Lei Xiaohua)
- * http://blog.csdn.net/leixiaohua1020
- * [email protected]
- ****************************************************************************/
- void x264_macroblock_analyse( x264_t *h )
- {
- x264_mb_analysis_t analysis;
- int i_cost = COST_MAX;
- //Get the QP of this macroblock from rate control
- h->mb.i_qp = x264_ratecontrol_mb_qp( h );
- /* If the QP of this MB is within 1 of the previous MB, code the same QP as the previous MB,
- * to lower the bit cost of the qp_delta. Don't do this if QPRD is enabled. */
- if( h->param.rc.i_aq_mode && h->param.analyse.i_subpel_refine < 10 )
- h->mb.i_qp = abs(h->mb.i_qp - h->mb.i_last_qp) == 1 ? h->mb.i_last_qp : h->mb.i_qp;
-
- if( h->param.analyse.b_mb_info )
- h->fdec->effective_qp[h->mb.i_mb_xy] = h->mb.i_qp; /* Store the real analysis QP. */
- //Initialization
- x264_mb_analyse_init( h, &analysis, h->mb.i_qp );
-
- /*--------------------------- Do the analysis ---------------------------*/
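- //I frame: only intra prediction is used. The cost of every luma 16x16 mode (4 modes) and 4x4 mode (9 modes) is computed, and the mode with the lowest cost is chosen.
-
- //P frame: both intra and inter modes are evaluated (a P slice may contain Intra macroblocks as well as P macroblocks; likewise, B frames also allow Intra macroblocks).
- //Inter prediction is run for each partition of the P frame to find the best motion vector and the best matching block.
- //Inter prediction steps: pick the best MV predictor -> find the best integer-pel position -> find the best half-pel position -> find the best quarter-pel position
- //The MV and partition with the lowest cost are then taken as the best ones.
- //Finally the cheaper of the intra and inter modes is chosen (a good match may not be found, in which case intra prediction is used instead of inter prediction).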
-
- if( h->sh.i_type == SLICE_TYPE_I )
- {
- //I slice
- //Compute the cost of a series of intra prediction modes (4 modes for 16x16, 9 modes for 4x4) and take the cheapest as the best mode
- intra_analysis:
- if( analysis.i_mbrd )
- x264_mb_init_fenc_cache( h, analysis.i_mbrd >= 2 );
- //Intra prediction analysis
- //Pick the best among the 16x16 cost, the sum of the four 8x8 costs, and the sum of the sixteen 4x4 costs (SAD or SATD)
- x264_mb_analyse_intra( h, &analysis, COST_MAX );
- if( analysis.i_mbrd )
- x264_intra_rd( h, &analysis, COST_MAX );
- //All analysis results are stored in the analysis struct
- //Cost
- i_cost = analysis.i_satd_i16x16;
- h->mb.i_type = I_16x16;
- //If I4x4 or I8x8 is cheaper, copy it over
- //COPY2_IF_LT: copy if less than
- COPY2_IF_LT( i_cost, analysis.i_satd_i4x4, h->mb.i_type, I_4x4 );
- COPY2_IF_LT( i_cost, analysis.i_satd_i8x8, h->mb.i_type, I_8x8 );
- //PCM is only likely to be used when the picture content is very unusual
- if( analysis.i_satd_pcm < i_cost )
- h->mb.i_type = I_PCM;
-
- else if( analysis.i_mbrd >= 2 )
- x264_intra_rd_refine( h, &analysis );
- }
- else if( h->sh.i_type == SLICE_TYPE_P )
- {
- //P slice
-
- int b_skip = 0;
-
- h->mc.prefetch_ref( h->mb.pic.p_fref[0][0][h->mb.i_mb_x&3], h->mb.pic.i_stride[0], 0 );
-
- analysis.b_try_skip = 0;
- if( analysis.b_force_intra )
- {
- if( !h->param.analyse.b_psy )
- {
- x264_mb_analyse_init_qp( h, &analysis, X264_MAX( h->mb.i_qp - h->mb.ip_offset, h->param.rc.i_qp_min ) );
- goto intra_analysis;
- }
- }
- else
- {
- /* Special fast-skip logic using information from mb_info. */
- if( h->fdec->mb_info && (h->fdec->mb_info[h->mb.i_mb_xy]&X264_MBINFO_CONSTANT) )
- {
- if( !SLICE_MBAFF && (h->fdec->i_frame - h->fref[0][0]->i_frame) == 1 && !h->sh.b_weighted_pred &&
- h->fref[0][0]->effective_qp[h->mb.i_mb_xy] <= h->mb.i_qp )
- {
- h->mb.i_partition = D_16x16;
- /* Use the P-SKIP MV if we can... */
- if( !M32(h->mb.cache.pskip_mv) )
- {
- b_skip = 1;
- h->mb.i_type = P_SKIP;
- }
- /* Otherwise, just force a 16x16 block. */
- else
- {
- h->mb.i_type = P_L0;
- analysis.l0.me16x16.i_ref = 0;
- M32( analysis.l0.me16x16.mv ) = 0;
- }
- goto skip_analysis;
- }
- /* Reset the information accordingly */
- else if( h->param.analyse.b_mb_info_update )
- h->fdec->mb_info[h->mb.i_mb_xy] &= ~X264_MBINFO_CONSTANT;
- }
-
- int skip_invalid = h->i_thread_frames > 1 && h->mb.cache.pskip_mv[1] > h->mb.mv_max_spel[1];
- /* If the current macroblock is off the frame, just skip it. */
- if( HAVE_INTERLACED && !MB_INTERLACED && h->mb.i_mb_y * 16 >= h->param.i_height && !skip_invalid )
- b_skip = 1;
- /* Fast P_SKIP detection */
- else if( h->param.analyse.b_fast_pskip )
- {
- if( skip_invalid )
- // FIXME don't need to check this if the reference frame is done
- {}
- else if( h->param.analyse.i_subpel_refine >= 3 )
- analysis.b_try_skip = 1;
- else if( h->mb.i_mb_type_left[0] == P_SKIP ||
- h->mb.i_mb_type_top == P_SKIP ||
- h->mb.i_mb_type_topleft == P_SKIP ||
- h->mb.i_mb_type_topright == P_SKIP )
- b_skip = x264_macroblock_probe_pskip( h );//Check whether this is a Skip macroblock
- }
- }
-
- h->mc.prefetch_ref( h->mb.pic.p_fref[0][0][h->mb.i_mb_x&3], h->mb.pic.i_stride[0], 1 );
-
- if( b_skip )
- {
- h->mb.i_type = P_SKIP;
- h->mb.i_partition = D_16x16;
- assert( h->mb.cache.pskip_mv[1] <= h->mb.mv_max_spel[1] || h->i_thread_frames == 1 );
- skip_analysis:
- /* Set up MVs for future predictors */
- for( int i = 0; i < h->mb.pic.i_fref[0]; i++ )
- M32( h->mb.mvr[0][i][h->mb.i_mb_xy] ) = 0;
- }
- else
- {
- const unsigned int flags = h->param.analyse.inter;
- int i_type;
- int i_partition;
- int i_satd_inter, i_satd_intra;
-
- x264_mb_analyse_load_costs( h, &analysis );
- /*
- * 16x16 inter prediction macroblock analysis (P)
- *
- * +--------+--------+
- * |                 |
- * |                 |
- * |                 |
- * +        +        +
- * |                 |
- * |                 |
- * |                 |
- * +--------+--------+
- *
- */
- x264_mb_analyse_inter_p16x16( h, &analysis );
-
- if( h->mb.i_type == P_SKIP )
- {
- for( int i = 1; i < h->mb.pic.i_fref[0]; i++ )
- M32( h->mb.mvr[0][i][h->mb.i_mb_xy] ) = 0;
- return;
- }
-
- if( flags & X264_ANALYSE_PSUB16x16 )
- {
- if( h->param.analyse.b_mixed_references )
- x264_mb_analyse_inter_p8x8_mixed_ref( h, &analysis );
- else{
- /*
- * 8x8 inter prediction macroblock analysis (P)
- * +--------+
- * |        |
- * |        |
- * |        |
- * +--------+
- */
- x264_mb_analyse_inter_p8x8( h, &analysis );
- }
- }
-
- /* Select best inter mode */
- i_type = P_L0;
- i_partition = D_16x16;
- i_cost = analysis.l0.me16x16.cost;
-
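- //If the 8x8 cost is lower than the 16x16 cost (or early termination is disabled),
- //process the 8x8 sub-block partition
-
- //The data processed here comes from l0 (reference list 0)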
- if( ( flags & X264_ANALYSE_PSUB16x16 ) && (!analysis.b_early_terminate ||
- analysis.l0.i_cost8x8 < analysis.l0.me16x16.cost) )
- {
- i_type = P_8x8;
- i_partition = D_8x8;
- i_cost = analysis.l0.i_cost8x8;
-
- /* Do sub 8x8 */
- if( flags & X264_ANALYSE_PSUB8x8 )
- {
- for( int i = 0; i < 4; i++ )
- {
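- //Analyse the sub-blocks of each 8x8 block
- /*
- * 4x4
- * +----+----+
- * |    |    |
- * +----+----+
- * |    |    |
- * +----+----+
- *
- */
- x264_mb_analyse_inter_p4x4( h, &analysis, i );
- int i_thresh8x4 = analysis.l0.me4x4[i][1].cost_mv + analysis.l0.me4x4[i][2].cost_mv;
- //If the 4x4 cost is lower than the 8x8 cost plus the threshold (or early termination is disabled),
- //also analyse the 8x4 and 4x8 costs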
- if( !analysis.b_early_terminate || analysis.l0.i_cost4x4[i] < analysis.l0.me8x8[i].cost + i_thresh8x4 )
- {
- int i_cost8x8 = analysis.l0.i_cost4x4[i];
- h->mb.i_sub_partition[i] = D_L0_4x4;
- /*
- * 8x4
- * +----+----+
- * |         |
- * +----+----+
- * |         |
- * +----+----+
- *
- */
- //Switch to 8x4 if it is cheaper than the best cost found so far for this 8x8 block
- x264_mb_analyse_inter_p8x4( h, &analysis, i );
- COPY2_IF_LT( i_cost8x8, analysis.l0.i_cost8x4[i],
- h->mb.i_sub_partition[i], D_L0_8x4 );
- /*
- * 4x8
- * +----+----+
- * |    |    |
- * +    +    +
- * |    |    |
- * +----+----+
- *
- */
- //Switch to 4x8 if it is cheaper than the best cost found so far for this 8x8 block
- x264_mb_analyse_inter_p4x8( h, &analysis, i );
- COPY2_IF_LT( i_cost8x8, analysis.l0.i_cost4x8[i],
- h->mb.i_sub_partition[i], D_L0_4x8 );
-
- i_cost += i_cost8x8 - analysis.l0.me8x8[i].cost;
- }
- x264_mb_cache_mv_p8x8( h, &analysis, i );
- }
- analysis.l0.i_cost8x8 = i_cost;
- }
- }
-
- /* Now do 16x8/8x16 */
- int i_thresh16x8 = analysis.l0.me8x8[1].cost_mv + analysis.l0.me8x8[2].cost_mv;
-
- //Precondition: the 8x8 cost is lower than the 16x16 cost plus a threshold (or early termination is disabled)
- if( ( flags & X264_ANALYSE_PSUB16x16 ) && (!analysis.b_early_terminate ||
- analysis.l0.i_cost8x8 < analysis.l0.me16x16.cost + i_thresh16x8) )
- {
- int i_avg_mv_ref_cost = (analysis.l0.me8x8[2].cost_mv + analysis.l0.me8x8[2].i_ref_cost
- + analysis.l0.me8x8[3].cost_mv + analysis.l0.me8x8[3].i_ref_cost + 1) >> 1;
- analysis.i_cost_est16x8[1] = analysis.i_satd8x8[0][2] + analysis.i_satd8x8[0][3] + i_avg_mv_ref_cost;
- /*
- * 16x8 macroblock partition
- *
- * +--------+--------+
- * |        |        |
- * |        |        |
- * |        |        |
- * +--------+--------+
- *
- */
- x264_mb_analyse_inter_p16x8( h, &analysis, i_cost );
- COPY3_IF_LT( i_cost, analysis.l0.i_cost16x8, i_type, P_L0, i_partition, D_16x8 );
-
- i_avg_mv_ref_cost = (analysis.l0.me8x8[1].cost_mv + analysis.l0.me8x8[1].i_ref_cost
- + analysis.l0.me8x8[3].cost_mv + analysis.l0.me8x8[3].i_ref_cost + 1) >> 1;
- analysis.i_cost_est8x16[1] = analysis.i_satd8x8[0][1] + analysis.i_satd8x8[0][3] + i_avg_mv_ref_cost;
- /*
- * 8x16 macroblock partition
- *
- * +--------+
- * |        |
- * |        |
- * |        |
- * +--------+
- * |        |
- * |        |
- * |        |
- * +--------+
- *
- */
- x264_mb_analyse_inter_p8x16( h, &analysis, i_cost );
- COPY3_IF_LT( i_cost, analysis.l0.i_cost8x16, i_type, P_L0, i_partition, D_8x16 );
- }
-
- h->mb.i_partition = i_partition;
-
- /* refine qpel */
- //Sub-pixel precision search
- //FIXME mb_type costs?
- if( analysis.i_mbrd || !h->mb.i_subpel_refine )
- {
- /* refine later */
- }
- else if( i_partition == D_16x16 )
- {
- x264_me_refine_qpel( h, &analysis.l0.me16x16 );
- i_cost = analysis.l0.me16x16.cost;
- }
- else if( i_partition == D_16x8 )
- {
- x264_me_refine_qpel( h, &analysis.l0.me16x8[0] );
- x264_me_refine_qpel( h, &analysis.l0.me16x8[1] );
- i_cost = analysis.l0.me16x8[0].cost + analysis.l0.me16x8[1].cost;
- }
- else if( i_partition == D_8x16 )
- {
- x264_me_refine_qpel( h, &analysis.l0.me8x16[0] );
- x264_me_refine_qpel( h, &analysis.l0.me8x16[1] );
- i_cost = analysis.l0.me8x16[0].cost + analysis.l0.me8x16[1].cost;
- }
- else if( i_partition == D_8x8 )
- {
- i_cost = 0;
- for( int i8x8 = 0; i8x8 < 4; i8x8++ )
- {
- switch( h->mb.i_sub_partition[i8x8] )
- {
- case D_L0_8x8:
- x264_me_refine_qpel( h, &analysis.l0.me8x8[i8x8] );
- i_cost += analysis.l0.me8x8[i8x8].cost;
- break;
- case D_L0_8x4:
- x264_me_refine_qpel( h, &analysis.l0.me8x4[i8x8][0] );
- x264_me_refine_qpel( h, &analysis.l0.me8x4[i8x8][1] );
- i_cost += analysis.l0.me8x4[i8x8][0].cost +
- analysis.l0.me8x4[i8x8][1].cost;
- break;
- case D_L0_4x8:
- x264_me_refine_qpel( h, &analysis.l0.me4x8[i8x8][0] );
- x264_me_refine_qpel( h, &analysis.l0.me4x8[i8x8][1] );
- i_cost += analysis.l0.me4x8[i8x8][0].cost +
- analysis.l0.me4x8[i8x8][1].cost;
- break;
-
- case D_L0_4x4:
- x264_me_refine_qpel( h, &analysis.l0.me4x4[i8x8][0] );
- x264_me_refine_qpel( h, &analysis.l0.me4x4[i8x8][1] );
- x264_me_refine_qpel( h, &analysis.l0.me4x4[i8x8][2] );
- x264_me_refine_qpel( h, &analysis.l0.me4x4[i8x8][3] );
- i_cost += analysis.l0.me4x4[i8x8][0].cost +
- analysis.l0.me4x4[i8x8][1].cost +
- analysis.l0.me4x4[i8x8][2].cost +
- analysis.l0.me4x4[i8x8][3].cost;
- break;
- default:
- x264_log( h, X264_LOG_ERROR, "internal error (!8x8 && !4x4)\n" );
- break;
- }
- }
- }
-
- if( h->mb.b_chroma_me )
- {
- if( CHROMA444 )
- {
- x264_mb_analyse_intra( h, &analysis, i_cost );
- x264_mb_analyse_intra_chroma( h, &analysis );
- }
- else
- {
- x264_mb_analyse_intra_chroma( h, &analysis );
- x264_mb_analyse_intra( h, &analysis, i_cost - analysis.i_satd_chroma );
- }
- analysis.i_satd_i16x16 += analysis.i_satd_chroma;
- analysis.i_satd_i8x8 += analysis.i_satd_chroma;
- analysis.i_satd_i4x4 += analysis.i_satd_chroma;
- }
- else
- x264_mb_analyse_intra( h, &analysis, i_cost );//Intra macroblocks are also allowed in a P slice, so intra analysis is needed as well
-
- i_satd_inter = i_cost;
- i_satd_intra = X264_MIN3( analysis.i_satd_i16x16,
- analysis.i_satd_i8x8,
- analysis.i_satd_i4x4 );
-
- if( analysis.i_mbrd )
- {
- x264_mb_analyse_p_rd( h, &analysis, X264_MIN(i_satd_inter, i_satd_intra) );
- i_type = P_L0;
- i_partition = D_16x16;
- i_cost = analysis.l0.i_rd16x16;
- COPY2_IF_LT( i_cost, analysis.l0.i_cost16x8, i_partition, D_16x8 );
- COPY2_IF_LT( i_cost, analysis.l0.i_cost8x16, i_partition, D_8x16 );
- COPY3_IF_LT( i_cost, analysis.l0.i_cost8x8, i_partition, D_8x8, i_type, P_8x8 );
- h->mb.i_type = i_type;
- h->mb.i_partition = i_partition;
- if( i_cost < COST_MAX )
- x264_mb_analyse_transform_rd( h, &analysis, &i_satd_inter, &i_cost );
- x264_intra_rd( h, &analysis, i_satd_inter * 5/4 + 1 );
- }
- //Pick the lowest cost
- COPY2_IF_LT( i_cost, analysis.i_satd_i16x16, i_type, I_16x16 );
- COPY2_IF_LT( i_cost, analysis.i_satd_i8x8, i_type, I_8x8 );
- COPY2_IF_LT( i_cost, analysis.i_satd_i4x4, i_type, I_4x4 );
- COPY2_IF_LT( i_cost, analysis.i_satd_pcm, i_type, I_PCM );
-
- h->mb.i_type = i_type;
-
- if( analysis.b_force_intra && !IS_INTRA(i_type) )
- {
- /* Intra masking: copy fdec to fenc and re-encode the block as intra in order to make it appear as if
- * it was an inter block. */
- x264_analyse_update_cache( h, &analysis );
- x264_macroblock_encode( h );
- for( int p = 0; p < (CHROMA444 ? 3 : 1); p++ )
- h->mc.copy[PIXEL_16x16]( h->mb.pic.p_fenc[p], FENC_STRIDE, h->mb.pic.p_fdec[p], FDEC_STRIDE, 16 );
- if( !CHROMA444 )
- {
- int height = 16 >> CHROMA_V_SHIFT;
- h->mc.copy[PIXEL_8x8] ( h->mb.pic.p_fenc[1], FENC_STRIDE, h->mb.pic.p_fdec[1], FDEC_STRIDE, height );
- h->mc.copy[PIXEL_8x8] ( h->mb.pic.p_fenc[2], FENC_STRIDE, h->mb.pic.p_fdec[2], FDEC_STRIDE, height );
- }
- x264_mb_analyse_init_qp( h, &analysis, X264_MAX( h->mb.i_qp - h->mb.ip_offset, h->param.rc.i_qp_min ) );
- goto intra_analysis;
- }
-
- if( analysis.i_mbrd >= 2 && h->mb.i_type != I_PCM )
- {
- if( IS_INTRA( h->mb.i_type ) )
- {
- x264_intra_rd_refine( h, &analysis );
- }
- else if( i_partition == D_16x16 )
- {
- x264_macroblock_cache_ref( h, 0, 0, 4, 4, 0, analysis.l0.me16x16.i_ref );
- analysis.l0.me16x16.cost = i_cost;
- x264_me_refine_qpel_rd( h, &analysis.l0.me16x16, analysis.i_lambda2, 0, 0 );
- }
- else if( i_partition == D_16x8 )
- {
- h->mb.i_sub_partition[0] = h->mb.i_sub_partition[1] =
- h->mb.i_sub_partition[2] = h->mb.i_sub_partition[3] = D_L0_8x8;
- x264_macroblock_cache_ref( h, 0, 0, 4, 2, 0, analysis.l0.me16x8[0].i_ref );
- x264_macroblock_cache_ref( h, 0, 2, 4, 2, 0, analysis.l0.me16x8[1].i_ref );
- x264_me_refine_qpel_rd( h, &analysis.l0.me16x8[0], analysis.i_lambda2, 0, 0 );
- x264_me_refine_qpel_rd( h, &analysis.l0.me16x8[1], analysis.i_lambda2, 8, 0 );
- }
- else if( i_partition == D_8x16 )
- {
- h->mb.i_sub_partition[0] = h->mb.i_sub_partition[1] =
- h->mb.i_sub_partition[2] = h->mb.i_sub_partition[3] = D_L0_8x8;
- x264_macroblock_cache_ref( h, 0, 0, 2, 4, 0, analysis.l0.me8x16[0].i_ref );
- x264_macroblock_cache_ref( h, 2, 0, 2, 4, 0, analysis.l0.me8x16[1].i_ref );
- x264_me_refine_qpel_rd( h, &analysis.l0.me8x16[0], analysis.i_lambda2, 0, 0 );
- x264_me_refine_qpel_rd( h, &analysis.l0.me8x16[1], analysis.i_lambda2, 4, 0 );
- }
- else if( i_partition == D_8x8 )
- {
- x264_analyse_update_cache( h, &analysis );
- for( int i8x8 = 0; i8x8 < 4; i8x8++ )
- {
- if( h->mb.i_sub_partition[i8x8] == D_L0_8x8 )
- {
- x264_me_refine_qpel_rd( h, &analysis.l0.me8x8[i8x8], analysis.i_lambda2, i8x8*4, 0 );
- }
- else if( h->mb.i_sub_partition[i8x8] == D_L0_8x4 )
- {
- x264_me_refine_qpel_rd( h, &analysis.l0.me8x4[i8x8][0], analysis.i_lambda2, i8x8*4+0, 0 );
- x264_me_refine_qpel_rd( h, &analysis.l0.me8x4[i8x8][1], analysis.i_lambda2, i8x8*4+2, 0 );
- }
- else if( h->mb.i_sub_partition[i8x8] == D_L0_4x8 )
- {
- x264_me_refine_qpel_rd( h, &analysis.l0.me4x8[i8x8][0], analysis.i_lambda2, i8x8*4+0, 0 );
- x264_me_refine_qpel_rd( h, &analysis.l0.me4x8[i8x8][1], analysis.i_lambda2, i8x8*4+1, 0 );
- }
- else if( h->mb.i_sub_partition[i8x8] == D_L0_4x4 )
- {
- x264_me_refine_qpel_rd( h, &analysis.l0.me4x4[i8x8][0], analysis.i_lambda2, i8x8*4+0, 0 );
- x264_me_refine_qpel_rd( h, &analysis.l0.me4x4[i8x8][1], analysis.i_lambda2, i8x8*4+1, 0 );
- x264_me_refine_qpel_rd( h, &analysis.l0.me4x4[i8x8][2], analysis.i_lambda2, i8x8*4+2, 0 );
- x264_me_refine_qpel_rd( h, &analysis.l0.me4x4[i8x8][3], analysis.i_lambda2, i8x8*4+3, 0 );
- }
- }
- }
- }
- }
- }
- else if( h->sh.i_type == SLICE_TYPE_B )//B slice
- {
- int i_bskip_cost = COST_MAX;
- int b_skip = 0;
-
- if( analysis.i_mbrd )
- x264_mb_init_fenc_cache( h, analysis.i_mbrd >= 2 );
-
- h->mb.i_type = B_SKIP;
- if( h->mb.b_direct_auto_write )
- {
- /* direct=auto heuristic: prefer whichever mode allows more Skip macroblocks */
- for( int i = 0; i < 2; i++ )
- {
- int b_changed = 1;
- h->sh.b_direct_spatial_mv_pred ^= 1;
- analysis.b_direct_available = x264_mb_predict_mv_direct16x16( h, i && analysis.b_direct_available ? &b_changed : NULL );
- if( analysis.b_direct_available )
- {
- if( b_changed )
- {
- x264_mb_mc( h );
- b_skip = x264_macroblock_probe_bskip( h );
- }
- h->stat.frame.i_direct_score[ h->sh.b_direct_spatial_mv_pred ] += b_skip;
- }
- else
- b_skip = 0;
- }
- }
- else
- analysis.b_direct_available = x264_mb_predict_mv_direct16x16( h, NULL );
-
- analysis.b_try_skip = 0;
- if( analysis.b_direct_available )
- {
- if( !h->mb.b_direct_auto_write )
- x264_mb_mc( h );
- /* If the current macroblock is off the frame, just skip it. */
- if( HAVE_INTERLACED && !MB_INTERLACED && h->mb.i_mb_y * 16 >= h->param.i_height )
- b_skip = 1;
- else if( analysis.i_mbrd )
- {
- i_bskip_cost = ssd_mb( h );
- /* 6 = minimum cavlc cost of a non-skipped MB */
- b_skip = h->mb.b_skip_mc = i_bskip_cost <= ((6 * analysis.i_lambda2 + 128) >> 8);
- }
- else if( !h->mb.b_direct_auto_write )
- {
- /* Conditioning the probe on neighboring block types
- * doesn't seem to help speed or quality. */
- analysis.b_try_skip = x264_macroblock_probe_bskip( h );
- if( h->param.analyse.i_subpel_refine < 3 )
- b_skip = analysis.b_try_skip;
- }
- /* Set up MVs for future predictors */
- if( b_skip )
- {
- for( int i = 0; i < h->mb.pic.i_fref[0]; i++ )
- M32( h->mb.mvr[0][i][h->mb.i_mb_xy] ) = 0;
- for( int i = 0; i < h->mb.pic.i_fref[1]; i++ )
- M32( h->mb.mvr[1][i][h->mb.i_mb_xy] ) = 0;
- }
- }
-
- if( !b_skip )
- {
- const unsigned int flags = h->param.analyse.inter;
- int i_type;
- int i_partition;
- int i_satd_inter;
- h->mb.b_skip_mc = 0;
- h->mb.i_type = B_DIRECT;
-
- x264_mb_analyse_load_costs( h, &analysis );
-
- /* select best inter mode */
- /* direct must be first */
- if( analysis.b_direct_available )
- x264_mb_analyse_inter_direct( h, &analysis );
- /*
- * 16x16 inter prediction macroblock analysis (B)
- *
- * +--------+--------+
- * |                 |
- * |                 |
- * |                 |
- * +        +        +
- * |                 |
- * |                 |
- * |                 |
- * +--------+--------+
- *
- */
- x264_mb_analyse_inter_b16x16( h, &analysis );
-
- if( h->mb.i_type == B_SKIP )
- {
- for( int i = 1; i < h->mb.pic.i_fref[0]; i++ )
- M32( h->mb.mvr[0][i][h->mb.i_mb_xy] ) = 0;
- for( int i = 1; i < h->mb.pic.i_fref[1]; i++ )
- M32( h->mb.mvr[1][i][h->mb.i_mb_xy] ) = 0;
- return;
- }
-
- i_type = B_L0_L0;
- i_partition = D_16x16;
- i_cost = analysis.l0.me16x16.cost;
- COPY2_IF_LT( i_cost, analysis.l1.me16x16.cost, i_type, B_L1_L1 );
- COPY2_IF_LT( i_cost, analysis.i_cost16x16bi, i_type, B_BI_BI );
- COPY2_IF_LT( i_cost, analysis.i_cost16x16direct, i_type, B_DIRECT );
-
- if( analysis.i_mbrd && analysis.b_early_terminate && analysis.i_cost16x16direct <= i_cost * 33/32 )
- {
- x264_mb_analyse_b_rd( h, &analysis, i_cost );
- if( i_bskip_cost < analysis.i_rd16x16direct &&
- i_bskip_cost < analysis.i_rd16x16bi &&
- i_bskip_cost < analysis.l0.i_rd16x16 &&
- i_bskip_cost < analysis.l1.i_rd16x16 )
- {
- h->mb.i_type = B_SKIP;
- x264_analyse_update_cache( h, &analysis );
- return;
- }
- }
-
- if( flags & X264_ANALYSE_BSUB16x16 )
- {
-
- /*
- * 8x8 inter prediction macroblock analysis (B)
- * +--------+
- * |        |
- * |        |
- * |        |
- * +--------+
- *
- */
-
- if( h->param.analyse.b_mixed_references )
- x264_mb_analyse_inter_b8x8_mixed_ref( h, &analysis );
- else
- x264_mb_analyse_inter_b8x8( h, &analysis );
-
- COPY3_IF_LT( i_cost, analysis.i_cost8x8bi, i_type, B_8x8, i_partition, D_8x8 );
-
- /* Try to estimate the cost of b16x8/b8x16 based on the satd scores of the b8x8 modes */
- int i_cost_est16x8bi_total = 0, i_cost_est8x16bi_total = 0;
- int i_mb_type, i_partition16x8[2], i_partition8x16[2];
- for( int i = 0; i < 2; i++ )
- {
- int avg_l0_mv_ref_cost, avg_l1_mv_ref_cost;
- int i_l0_satd, i_l1_satd, i_bi_satd, i_best_cost;
- // 16x8
- i_best_cost = COST_MAX;
- i_l0_satd = analysis.i_satd8x8[0][i*2] + analysis.i_satd8x8[0][i*2+1];
- i_l1_satd = analysis.i_satd8x8[1][i*2] + analysis.i_satd8x8[1][i*2+1];
- i_bi_satd = analysis.i_satd8x8[2][i*2] + analysis.i_satd8x8[2][i*2+1];
- avg_l0_mv_ref_cost = ( analysis.l0.me8x8[i*2].cost_mv + analysis.l0.me8x8[i*2].i_ref_cost
- + analysis.l0.me8x8[i*2+1].cost_mv + analysis.l0.me8x8[i*2+1].i_ref_cost + 1 ) >> 1;
- avg_l1_mv_ref_cost = ( analysis.l1.me8x8[i*2].cost_mv + analysis.l1.me8x8[i*2].i_ref_cost
- + analysis.l1.me8x8[i*2+1].cost_mv + analysis.l1.me8x8[i*2+1].i_ref_cost + 1 ) >> 1;
- COPY2_IF_LT( i_best_cost, i_l0_satd + avg_l0_mv_ref_cost, i_partition16x8[i], D_L0_8x8 );
- COPY2_IF_LT( i_best_cost, i_l1_satd + avg_l1_mv_ref_cost, i_partition16x8[i], D_L1_8x8 );
- COPY2_IF_LT( i_best_cost, i_bi_satd + avg_l0_mv_ref_cost + avg_l1_mv_ref_cost, i_partition16x8[i], D_BI_8x8 );
- analysis.i_cost_est16x8[i] = i_best_cost;
-
- // 8x16
- i_best_cost = COST_MAX;
- i_l0_satd = analysis.i_satd8x8[0][i] + analysis.i_satd8x8[0][i+2];
- i_l1_satd = analysis.i_satd8x8[1][i] + analysis.i_satd8x8[1][i+2];
- i_bi_satd = analysis.i_satd8x8[2][i] + analysis.i_satd8x8[2][i+2];
- avg_l0_mv_ref_cost = ( analysis.l0.me8x8[i].cost_mv + analysis.l0.me8x8[i].i_ref_cost
- + analysis.l0.me8x8[i+2].cost_mv + analysis.l0.me8x8[i+2].i_ref_cost + 1 ) >> 1;
- avg_l1_mv_ref_cost = ( analysis.l1.me8x8[i].cost_mv + analysis.l1.me8x8[i].i_ref_cost
- + analysis.l1.me8x8[i+2].cost_mv + analysis.l1.me8x8[i+2].i_ref_cost + 1 ) >> 1;
- COPY2_IF_LT( i_best_cost, i_l0_satd + avg_l0_mv_ref_cost, i_partition8x16[i], D_L0_8x8 );
- COPY2_IF_LT( i_best_cost, i_l1_satd + avg_l1_mv_ref_cost, i_partition8x16[i], D_L1_8x8 );
- COPY2_IF_LT( i_best_cost, i_bi_satd + avg_l0_mv_ref_cost + avg_l1_mv_ref_cost, i_partition8x16[i], D_BI_8x8 );
- analysis.i_cost_est8x16[i] = i_best_cost;
- }
- i_mb_type = B_L0_L0 + (i_partition16x8[0]>>2) * 3 + (i_partition16x8[1]>>2);
- analysis.i_cost_est16x8[1] += analysis.i_lambda * i_mb_b16x8_cost_table[i_mb_type];
- i_cost_est16x8bi_total = analysis.i_cost_est16x8[0] + analysis.i_cost_est16x8[1];
- i_mb_type = B_L0_L0 + (i_partition8x16[0]>>2) * 3 + (i_partition8x16[1]>>2);
- analysis.i_cost_est8x16[1] += analysis.i_lambda * i_mb_b16x8_cost_table[i_mb_type];
- i_cost_est8x16bi_total = analysis.i_cost_est8x16[0] + analysis.i_cost_est8x16[1];
-
- /* We can gain a little speed by checking the mode with the lowest estimated cost first */
- int try_16x8_first = i_cost_est16x8bi_total < i_cost_est8x16bi_total;
- if( try_16x8_first && (!analysis.b_early_terminate || i_cost_est16x8bi_total < i_cost) )
- {
- x264_mb_analyse_inter_b16x8( h, &analysis, i_cost );
- COPY3_IF_LT( i_cost, analysis.i_cost16x8bi, i_type, analysis.i_mb_type16x8, i_partition, D_16x8 );
- }
- if( !analysis.b_early_terminate || i_cost_est8x16bi_total < i_cost )
- {
- x264_mb_analyse_inter_b8x16( h, &analysis, i_cost );
- COPY3_IF_LT( i_cost, analysis.i_cost8x16bi, i_type, analysis.i_mb_type8x16, i_partition, D_8x16 );
- }
- if( !try_16x8_first && (!analysis.b_early_terminate || i_cost_est16x8bi_total < i_cost) )
- {
- x264_mb_analyse_inter_b16x8( h, &analysis, i_cost );
- COPY3_IF_LT( i_cost, analysis.i_cost16x8bi, i_type, analysis.i_mb_type16x8, i_partition, D_16x8 );
- }
- }
-
- if( analysis.i_mbrd || !h->mb.i_subpel_refine )
- {
- /* refine later */
- }
- /* refine qpel */
- else if( i_partition == D_16x16 )
- {
- analysis.l0.me16x16.cost -= analysis.i_lambda * i_mb_b_cost_table[B_L0_L0];
- analysis.l1.me16x16.cost -= analysis.i_lambda * i_mb_b_cost_table[B_L1_L1];
- if( i_type == B_L0_L0 )
- {
- x264_me_refine_qpel( h, &analysis.l0.me16x16 );
- i_cost = analysis.l0.me16x16.cost
- + analysis.i_lambda * i_mb_b_cost_table[B_L0_L0];
- }
- else if( i_type == B_L1_L1 )
- {
- x264_me_refine_qpel( h, &analysis.l1.me16x16 );
- i_cost = analysis.l1.me16x16.cost
- + analysis.i_lambda * i_mb_b_cost_table[B_L1_L1];
- }
- else if( i_type == B_BI_BI )
- {
- x264_me_refine_qpel( h, &analysis.l0.bi16x16 );
- x264_me_refine_qpel( h, &analysis.l1.bi16x16 );
- }
- }
- else if( i_partition == D_16x8 )
- {
- for( int i = 0; i < 2; i++ )
- {
- if( analysis.i_mb_partition16x8[i] != D_L1_8x8 )
- x264_me_refine_qpel( h, &analysis.l0.me16x8[i] );
- if( analysis.i_mb_partition16x8[i] != D_L0_8x8 )
- x264_me_refine_qpel( h, &analysis.l1.me16x8[i] );
- }
- }
- else if( i_partition == D_8x16 )
- {
- for( int i = 0; i < 2; i++ )
- {
- if( analysis.i_mb_partition8x16[i] != D_L1_8x8 )
- x264_me_refine_qpel( h, &analysis.l0.me8x16[i] );
- if( analysis.i_mb_partition8x16[i] != D_L0_8x8 )
- x264_me_refine_qpel( h, &analysis.l1.me8x16[i] );
- }
- }
- else if( i_partition == D_8x8 )
- {
- for( int i = 0; i < 4; i++ )
- {
- x264_me_t *m;
- int i_part_cost_old;
- int i_type_cost;
- int i_part_type = h->mb.i_sub_partition[i];
- int b_bidir = (i_part_type == D_BI_8x8);
-
- if( i_part_type == D_DIRECT_8x8 )
- continue;
- if( x264_mb_partition_listX_table[0][i_part_type] )
- {
- m = &analysis.l0.me8x8[i];
- i_part_cost_old = m->cost;
- i_type_cost = analysis.i_lambda * i_sub_mb_b_cost_table[D_L0_8x8];
- m->cost -= i_type_cost;
- x264_me_refine_qpel( h, m );
- if( !b_bidir )
- analysis.i_cost8x8bi += m->cost + i_type_cost - i_part_cost_old;
- }
- if( x264_mb_partition_listX_table[1][i_part_type] )
- {
- m = &analysis.l1.me8x8[i];
- i_part_cost_old = m->cost;
- i_type_cost = analysis.i_lambda * i_sub_mb_b_cost_table[D_L1_8x8];
- m->cost -= i_type_cost;
- x264_me_refine_qpel( h, m );
- if( !b_bidir )
- analysis.i_cost8x8bi += m->cost + i_type_cost - i_part_cost_old;
- }
- /* TODO: update mvp? */
- }
- }
-
- i_satd_inter = i_cost;
-
- if( analysis.i_mbrd )
- {
- x264_mb_analyse_b_rd( h, &analysis, i_satd_inter );
- i_type = B_SKIP;
- i_cost = i_bskip_cost;
- i_partition = D_16x16;
- COPY2_IF_LT( i_cost, analysis.l0.i_rd16x16, i_type, B_L0_L0 );
- COPY2_IF_LT( i_cost, analysis.l1.i_rd16x16, i_type, B_L1_L1 );
- COPY2_IF_LT( i_cost, analysis.i_rd16x16bi, i_type, B_BI_BI );
- COPY2_IF_LT( i_cost, analysis.i_rd16x16direct, i_type, B_DIRECT );
- COPY3_IF_LT( i_cost, analysis.i_rd16x8bi, i_type, analysis.i_mb_type16x8, i_partition, D_16x8 );
- COPY3_IF_LT( i_cost, analysis.i_rd8x16bi, i_type, analysis.i_mb_type8x16, i_partition, D_8x16 );
- COPY3_IF_LT( i_cost, analysis.i_rd8x8bi, i_type, B_8x8, i_partition, D_8x8 );
-
- h->mb.i_type = i_type;
- h->mb.i_partition = i_partition;
- }
-
- if( h->mb.b_chroma_me )
- {
- if( CHROMA444 )
- {
- x264_mb_analyse_intra( h, &analysis, i_satd_inter );
- x264_mb_analyse_intra_chroma( h, &analysis );
- }
- else
- {
- x264_mb_analyse_intra_chroma( h, &analysis );
- x264_mb_analyse_intra( h, &analysis, i_satd_inter - analysis.i_satd_chroma );
- }
- analysis.i_satd_i16x16 += analysis.i_satd_chroma;
- analysis.i_satd_i8x8 += analysis.i_satd_chroma;
- analysis.i_satd_i4x4 += analysis.i_satd_chroma;
- }
- else
- x264_mb_analyse_intra( h, &analysis, i_satd_inter );
-
- if( analysis.i_mbrd )
- {
- x264_mb_analyse_transform_rd( h, &analysis, &i_satd_inter, &i_cost );
- x264_intra_rd( h, &analysis, i_satd_inter * 17/16 + 1 );
- }
-
- COPY2_IF_LT( i_cost, analysis.i_satd_i16x16, i_type, I_16x16 );
- COPY2_IF_LT( i_cost, analysis.i_satd_i8x8, i_type, I_8x8 );
- COPY2_IF_LT( i_cost, analysis.i_satd_i4x4, i_type, I_4x4 );
- COPY2_IF_LT( i_cost, analysis.i_satd_pcm, i_type, I_PCM );
-
- h->mb.i_type = i_type;
- h->mb.i_partition = i_partition;
-
- if( analysis.i_mbrd >= 2 && IS_INTRA( i_type ) && i_type != I_PCM )
- x264_intra_rd_refine( h, &analysis );
- if( h->mb.i_subpel_refine >= 5 )
- x264_refine_bidir( h, &analysis );
-
- if( analysis.i_mbrd >= 2 && i_type > B_DIRECT && i_type < B_SKIP )
- {
- int i_biweight;
- x264_analyse_update_cache( h, &analysis );
-
- if( i_partition == D_16x16 )
- {
- if( i_type == B_L0_L0 )
- {
- analysis.l0.me16x16.cost = i_cost;
- x264_me_refine_qpel_rd( h, &analysis.l0.me16x16, analysis.i_lambda2, 0, 0 );
- }
- else if( i_type == B_L1_L1 )
- {
- analysis.l1.me16x16.cost = i_cost;
- x264_me_refine_qpel_rd( h, &analysis.l1.me16x16, analysis.i_lambda2, 0, 1 );
- }
- else if( i_type == B_BI_BI )
- {
- i_biweight = h->mb.bipred_weight[analysis.l0.bi16x16.i_ref][analysis.l1.bi16x16.i_ref];
- x264_me_refine_bidir_rd( h, &analysis.l0.bi16x16, &analysis.l1.bi16x16, i_biweight, 0, analysis.i_lambda2 );
- }
- }
- else if( i_partition == D_16x8 )
- {
- for( int i = 0; i < 2; i++ )
- {
- h->mb.i_sub_partition[i*2] = h->mb.i_sub_partition[i*2+1] = analysis.i_mb_partition16x8[i];
- if( analysis.i_mb_partition16x8[i] == D_L0_8x8 )
- x264_me_refine_qpel_rd( h, &analysis.l0.me16x8[i], analysis.i_lambda2, i*8, 0 );
- else if( analysis.i_mb_partition16x8[i] == D_L1_8x8 )
- x264_me_refine_qpel_rd( h, &analysis.l1.me16x8[i], analysis.i_lambda2, i*8, 1 );
- else if( analysis.i_mb_partition16x8[i] == D_BI_8x8 )
- {
- i_biweight = h->mb.bipred_weight[analysis.l0.me16x8[i].i_ref][analysis.l1.me16x8[i].i_ref];
- x264_me_refine_bidir_rd( h, &analysis.l0.me16x8[i], &analysis.l1.me16x8[i], i_biweight, i*2, analysis.i_lambda2 );
- }
- }
- }
- else if( i_partition == D_8x16 )
- {
- for( int i = 0; i < 2; i++ )
- {
- h->mb.i_sub_partition[i] = h->mb.i_sub_partition[i+2] = analysis.i_mb_partition8x16[i];
- if( analysis.i_mb_partition8x16[i] == D_L0_8x8 )
- x264_me_refine_qpel_rd( h, &analysis.l0.me8x16[i], analysis.i_lambda2, i*4, 0 );
- else if( analysis.i_mb_partition8x16[i] == D_L1_8x8 )
- x264_me_refine_qpel_rd( h, &analysis.l1.me8x16[i], analysis.i_lambda2, i*4, 1 );
- else if( analysis.i_mb_partition8x16[i] == D_BI_8x8 )
- {
- i_biweight = h->mb.bipred_weight[analysis.l0.me8x16[i].i_ref][analysis.l1.me8x16[i].i_ref];
- x264_me_refine_bidir_rd( h, &analysis.l0.me8x16[i], &analysis.l1.me8x16[i], i_biweight, i, analysis.i_lambda2 );
- }
- }
- }
- else if( i_partition == D_8x8 )
- {
- for( int i = 0; i < 4; i++ )
- {
- if( h->mb.i_sub_partition[i] == D_L0_8x8 )
- x264_me_refine_qpel_rd( h, &analysis.l0.me8x8[i], analysis.i_lambda2, i*4, 0 );
- else if( h->mb.i_sub_partition[i] == D_L1_8x8 )
- x264_me_refine_qpel_rd( h, &analysis.l1.me8x8[i], analysis.i_lambda2, i*4, 1 );
- else if( h->mb.i_sub_partition[i] == D_BI_8x8 )
- {
- i_biweight = h->mb.bipred_weight[analysis.l0.me8x8[i].i_ref][analysis.l1.me8x8[i].i_ref];
- x264_me_refine_bidir_rd( h, &analysis.l0.me8x8[i], &analysis.l1.me8x8[i], i_biweight, i, analysis.i_lambda2 );
- }
- }
- }
- }
- }
- }
-
- x264_analyse_update_cache( h, &analysis );
-
- /* In rare cases we can end up qpel-RDing our way back to a larger partition size
- * without realizing it. Check for this and account for it if necessary. */
- if( analysis.i_mbrd >= 2 )
- {
- /* Don't bother with bipred or 8x8-and-below, the odds are incredibly low. */
- static const uint8_t check_mv_lists[X264_MBTYPE_MAX] = {[P_L0]=1, [B_L0_L0]=1, [B_L1_L1]=2};
- int list = check_mv_lists[h->mb.i_type] - 1;
- if( list >= 0 && h->mb.i_partition != D_16x16 &&
- M32( &h->mb.cache.mv[list][x264_scan8[0]] ) == M32( &h->mb.cache.mv[list][x264_scan8[12]] ) &&
- h->mb.cache.ref[list][x264_scan8[0]] == h->mb.cache.ref[list][x264_scan8[12]] )
- h->mb.i_partition = D_16x16;
- }
-
- if( !analysis.i_mbrd )
- x264_mb_analyse_transform( h );
-
- if( analysis.i_mbrd == 3 && !IS_SKIP(h->mb.i_type) )
- x264_mb_analyse_qp_rd( h, &analysis );
-
- h->mb.b_trellis = h->param.analyse.i_trellis;
- h->mb.b_noise_reduction = h->mb.b_noise_reduction || (!!h->param.analyse.i_noise_reduction && !IS_INTRA( h->mb.i_type ));
-
- if( !IS_SKIP(h->mb.i_type) && h->mb.i_psy_trellis && h->param.analyse.i_trellis == 1 )
- x264_psy_trellis_init( h, 0 );
- if( h->mb.b_trellis == 1 || h->mb.b_noise_reduction )
- h->mb.i_skip_intra = 0;
- }
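Although the source of x264_macroblock_analyse() is long, its logic is fairly clear:
(1) For an I slice, call x264_mb_analyse_intra() to analyse the intra prediction modes of the Intra macroblock.
(2) For a P slice, proceed as follows:
a) Call x264_macroblock_probe_pskip() to test whether the macroblock can be coded as Skip; if so, the remaining analysis is skipped.
b) Call x264_mb_analyse_inter_p16x16() to compute the cost of P16x16 inter prediction.
c) Call x264_mb_analyse_inter_p8x8() to compute the cost of P8x8 inter prediction.
d) If the P8x8 cost is lower than the P16x16 cost, examine each of the four 8x8 sub-macroblock partitions in turn:
i. Call x264_mb_analyse_inter_p4x4() to compute the cost of P4x4 inter prediction.
ii. If the P4x4 cost is lower than the P8x8 cost, call x264_mb_analyse_inter_p8x4() and x264_mb_analyse_inter_p4x8() to compute the costs of P8x4 and P4x8 inter prediction.
e) If the P8x8 cost is lower than the P16x16 cost, call x264_mb_analyse_inter_p16x8() and x264_mb_analyse_inter_p8x16() to compute the costs of P16x8 and P8x16 inter prediction.
f) In addition, call x264_mb_analyse_intra() to check whether coding the macroblock as Intra is cheaper than coding it as a P macroblock (Intra macroblocks are allowed in P slices).
(3) For a B slice, the processing is similar to that of a P slice.
This article covers the Intra macroblock analysis function of this flow, x264_mb_analyse_intra().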
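The P slice cascade described above can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not x264's code: the names sketch_mode_t, sketch_costs_t and sketch_choose_p_mode() are hypothetical, the costs are supplied by the caller instead of coming from motion estimation, and sub-8x8 partitions, early-termination thresholds and RD refinement are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for this sketch; these are not x264's real enums. */
typedef enum { SK_SKIP, SK_P16x16, SK_P8x8, SK_P16x8, SK_P8x16, SK_INTRA } sketch_mode_t;

/* Costs that a real encoder would obtain from motion estimation and
 * intra analysis; in this sketch they are plain inputs. */
typedef struct
{
    int can_skip;
    int cost16x16;
    int cost8x8;
    int cost16x8;
    int cost8x16;
    int cost_intra;
} sketch_costs_t;

static sketch_mode_t sketch_choose_p_mode( const sketch_costs_t *c )
{
    assert( c != NULL );

    /* a) A Skip macroblock ends the analysis immediately. */
    if( c->can_skip )
        return SK_SKIP;

    /* b) Start from P16x16. */
    sketch_mode_t best = SK_P16x16;
    int best_cost = c->cost16x16;

    /* c)-e) Smaller partitions are only considered when P8x8 beats P16x16. */
    if( c->cost8x8 < c->cost16x16 )
    {
        best = SK_P8x8;
        best_cost = c->cost8x8;
        if( c->cost16x8 < best_cost ) { best = SK_P16x8; best_cost = c->cost16x8; }
        if( c->cost8x16 < best_cost ) { best = SK_P8x16; best_cost = c->cost8x16; }
    }

    /* f) Intra coding may still win inside a P slice. */
    if( c->cost_intra < best_cost )
        best = SK_INTRA;
    return best;
}
```

Note that in this sketch a cheap P16x8 cost is ignored when P8x8 does not beat P16x16, which mirrors the gating in step e).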
x264_mb_analyse_intra()
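x264_mb_analyse_intra() analyses the intra prediction modes of an Intra macroblock. It is defined in encoder\analyse.c, as shown below.
- //Intra prediction analysis: choose the best among the 16x16 SAD, the sum of four 8x8 SADs and the sum of sixteen 4x4 SADs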
- static void x264_mb_analyse_intra( x264_t *h, x264_mb_analysis_t *a, int i_satd_inter )
- {
- const unsigned int flags = h->sh.i_type == SLICE_TYPE_I ? h->param.analyse.intra : h->param.analyse.inter;
- //Buffers used for the cost computation
- //p_fenc is the source (to-be-encoded) frame
- pixel *p_src = h->mb.pic.p_fenc[0];
- //p_fdec is the reconstructed frame
- pixel *p_dst = h->mb.pic.p_fdec[0];
-
- static const int8_t intra_analysis_shortcut[2][2][2][5] =
- {
- {{{I_PRED_4x4_HU, -1, -1, -1, -1},
- {I_PRED_4x4_DDL, I_PRED_4x4_VL, -1, -1, -1}},
- {{I_PRED_4x4_DDR, I_PRED_4x4_HD, I_PRED_4x4_HU, -1, -1},
- {I_PRED_4x4_DDL, I_PRED_4x4_DDR, I_PRED_4x4_VR, I_PRED_4x4_VL, -1}}},
- {{{I_PRED_4x4_HU, -1, -1, -1, -1},
- {-1, -1, -1, -1, -1}},
- {{I_PRED_4x4_DDR, I_PRED_4x4_HD, I_PRED_4x4_HU, -1, -1},
- {I_PRED_4x4_DDR, I_PRED_4x4_VR, -1, -1, -1}}},
- };
-
- int idx;
- int lambda = a->i_lambda;
-
- /*---------------- Try all mode and calculate their score ---------------*/
- /* Disabled i16x16 for AVC-Intra compat */
- //Intra 16x16
- if( !h->param.i_avcintra_class )
- {
- //Get the available intra prediction modes for Intra 16x16
- /*
- * 16x16 block
- *
- * +--------+--------+
- * | |
- * | |
- * | |
- * + + +
- * | |
- * | |
- * | |
- * +--------+--------+
- *
- */
- //Is data available on the left? Is data available on top?
- const int8_t *predict_mode = predict_16x16_mode_available( h->mb.i_neighbour_intra );
-
- /* Not heavily tuned */
- static const uint8_t i16x16_thresh_lut[11] = { 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4 };
- int i16x16_thresh = a->b_fast_intra ? (i16x16_thresh_lut[h->mb.i_subpel_refine]*i_satd_inter)>>1 : COST_MAX;
-
- if( !h->mb.b_lossless && predict_mode[3] >= 0 )
- {
- h->pixf.intra_mbcmp_x3_16x16( p_src, p_dst, a->i_satd_i16x16_dir );
- a->i_satd_i16x16_dir[0] += lambda * bs_size_ue(0);
- a->i_satd_i16x16_dir[1] += lambda * bs_size_ue(1);
- a->i_satd_i16x16_dir[2] += lambda * bs_size_ue(2);
- COPY2_IF_LT( a->i_satd_i16x16, a->i_satd_i16x16_dir[0], a->i_predict16x16, 0 );
- COPY2_IF_LT( a->i_satd_i16x16, a->i_satd_i16x16_dir[1], a->i_predict16x16, 1 );
- COPY2_IF_LT( a->i_satd_i16x16, a->i_satd_i16x16_dir[2], a->i_predict16x16, 2 );
-
- /* Plane is expensive, so don't check it unless one of the previous modes was useful. */
- if( a->i_satd_i16x16 <= i16x16_thresh )
- {
- h->predict_16x16[I_PRED_16x16_P]( p_dst );
- a->i_satd_i16x16_dir[I_PRED_16x16_P] = h->pixf.mbcmp[PIXEL_16x16]( p_dst, FDEC_STRIDE, p_src, FENC_STRIDE );
- a->i_satd_i16x16_dir[I_PRED_16x16_P] += lambda * bs_size_ue(3);
- COPY2_IF_LT( a->i_satd_i16x16, a->i_satd_i16x16_dir[I_PRED_16x16_P], a->i_predict16x16, 3 );
- }
- }
- else
- {
- //Loop over all available Intra16x16 prediction modes
- //At most 4
- for( ; *predict_mode >= 0; predict_mode++ )
- {
- int i_satd;
- int i_mode = *predict_mode;
-
- //Intra prediction assembly function: computes the predicted values from the left and top pixels
- /*
- * Intra prediction examples
- * Vertical prediction
- * |X1 X2 ... X16
- * --+---------------
- * |X1 X2 ... X16
- * |X1 X2 ... X16
- * |.. .. ... X16
- * |X1 X2 ... X16
- *
- * Horizontal prediction
- * |
- * --+---------------
- * X1| X1 X1 ... X1
- * X2| X2 X2 ... X2
- * ..| .. .. ... ..
- * X16|X16 X16 ... X16
- *
- * DC prediction
- * |X1 X2 ... X16
- * --+---------------
- * X17|
- * X18| Y
- * ..|
- * X32|
- *
- * Y=(X1+X2+X3+X4+...+X31+X32)/32
- *
- */
- if( h->mb.b_lossless )
- x264_predict_lossless_16x16( h, 0, i_mode );
- else
- h->predict_16x16[i_mode]( p_dst );//the prediction is written into p_dst (the reconstructed frame buffer)
-
- //Compute SAD or SATD (SATD is the sum of absolute values after a Hadamard transform of the difference)
- //This is the coding cost
- //The data is in p_dst and p_src
- i_satd = h->pixf.mbcmp[PIXEL_16x16]( p_dst, FDEC_STRIDE, p_src, FENC_STRIDE ) +
- lambda * bs_size_ue( x264_mb_pred_mode16x16_fix[i_mode] );
-
- //The COPY2_IF_LT() macro means "copy if less than": when the new cost is smaller, both the cost and the mode are copied.
- //The macro expands as follows
- //if((i_satd)<(a->i_satd_i16x16))
- //{
- // (a->i_satd_i16x16)=(i_satd);
- // (a->i_predict16x16)=(i_mode);
- //}
- COPY2_IF_LT( a->i_satd_i16x16, i_satd, a->i_predict16x16, i_mode );
- //The cost of every mode is stored
- a->i_satd_i16x16_dir[i_mode] = i_satd;
- }
- }
-
- if( h->sh.i_type == SLICE_TYPE_B )
- /* cavlc mb type prefix */
- a->i_satd_i16x16 += lambda * i_mb_b_cost_table[I_16x16];
-
- if( a->i_satd_i16x16 > i16x16_thresh )
- return;
- }
-
- uint16_t *cost_i4x4_mode = (uint16_t*)ALIGN((intptr_t)x264_cost_i4x4_mode,64) + a->i_qp*32 + 8;
- /* 8x8 prediction selection */
- //Intra 8x8 (not studied by the author)
- if( flags & X264_ANALYSE_I8x8 )
- {
- ALIGNED_ARRAY_32( pixel, edge,[36] );
- x264_pixel_cmp_t sa8d = (h->pixf.mbcmp[0] == h->pixf.satd[0]) ? h->pixf.sa8d[PIXEL_8x8] : h->pixf.mbcmp[PIXEL_8x8];
- int i_satd_thresh = a->i_mbrd ? COST_MAX : X264_MIN( i_satd_inter, a->i_satd_i16x16 );
-
- // FIXME some bias like in i4x4?
- int i_cost = lambda * 4; /* base predmode costs */
- h->mb.i_cbp_luma = 0;
-
- if( h->sh.i_type == SLICE_TYPE_B )
- i_cost += lambda * i_mb_b_cost_table[I_8x8];
-
- for( idx = 0;; idx++ )
- {
- int x = idx&1;
- int y = idx>>1;
- pixel *p_src_by = p_src + 8*x + 8*y*FENC_STRIDE;
- pixel *p_dst_by = p_dst + 8*x + 8*y*FDEC_STRIDE;
- int i_best = COST_MAX;
- int i_pred_mode = x264_mb_predict_intra4x4_mode( h, 4*idx );
-
- const int8_t *predict_mode = predict_8x8_mode_available( a->b_avoid_topright, h->mb.i_neighbour8[idx], idx );
- h->predict_8x8_filter( p_dst_by, edge, h->mb.i_neighbour8[idx], ALL_NEIGHBORS );
-
- if( h->pixf.intra_mbcmp_x9_8x8 && predict_mode[8] >= 0 )
- {
- /* No shortcuts here. The SSSE3 implementation of intra_mbcmp_x9 is fast enough. */
- i_best = h->pixf.intra_mbcmp_x9_8x8( p_src_by, p_dst_by, edge, cost_i4x4_mode-i_pred_mode, a->i_satd_i8x8_dir[idx] );
- i_cost += i_best & 0xffff;
- i_best >>= 16;
- a->i_predict8x8[idx] = i_best;
- if( idx == 3 || i_cost > i_satd_thresh )
- break;
- x264_macroblock_cache_intra8x8_pred( h, 2*x, 2*y, i_best );
- }
- else
- {
- if( !h->mb.b_lossless && predict_mode[5] >= 0 )
- {
- ALIGNED_ARRAY_16( int32_t, satd,[9] );
- h->pixf.intra_mbcmp_x3_8x8( p_src_by, edge, satd );
- int favor_vertical = satd[I_PRED_4x4_H] > satd[I_PRED_4x4_V];
- satd[i_pred_mode] -= 3 * lambda;
- for( int i = 2; i >= 0; i-- )
- {
- int cost = satd[i];
- a->i_satd_i8x8_dir[idx][i] = cost + 4 * lambda;
- COPY2_IF_LT( i_best, cost, a->i_predict8x8[idx], i );
- }
-
- /* Take analysis shortcuts: don't analyse modes that are too
- * far away direction-wise from the favored mode. */
- if( a->i_mbrd < 1 + a->b_fast_intra )
- predict_mode = intra_analysis_shortcut[a->b_avoid_topright][predict_mode[8] >= 0][favor_vertical];
- else
- predict_mode += 3;
- }
-
- for( ; *predict_mode >= 0 && (i_best >= 0 || a->i_mbrd >= 2); predict_mode++ )
- {
- int i_satd;
- int i_mode = *predict_mode;
-
- if( h->mb.b_lossless )
- x264_predict_lossless_8x8( h, p_dst_by, 0, idx, i_mode, edge );
- else
- h->predict_8x8[i_mode]( p_dst_by, edge );
-
- i_satd = sa8d( p_dst_by, FDEC_STRIDE, p_src_by, FENC_STRIDE );
- if( i_pred_mode == x264_mb_pred_mode4x4_fix(i_mode) )
- i_satd -= 3 * lambda;
-
- COPY2_IF_LT( i_best, i_satd, a->i_predict8x8[idx], i_mode );
- a->i_satd_i8x8_dir[idx][i_mode] = i_satd + 4 * lambda;
- }
- i_cost += i_best + 3*lambda;
-
- if( idx == 3 || i_cost > i_satd_thresh )
- break;
- if( h->mb.b_lossless )
- x264_predict_lossless_8x8( h, p_dst_by, 0, idx, a->i_predict8x8[idx], edge );
- else
- h->predict_8x8[a->i_predict8x8[idx]]( p_dst_by, edge );
- x264_macroblock_cache_intra8x8_pred( h, 2*x, 2*y, a->i_predict8x8[idx] );
- }
- /* we need to encode this block now (for next ones) */
- x264_mb_encode_i8x8( h, 0, idx, a->i_qp, a->i_predict8x8[idx], edge, 0 );
- }
-
- if( idx == 3 )
- {
- a->i_satd_i8x8 = i_cost;
- if( h->mb.i_skip_intra )
- {
- h->mc.copy[PIXEL_16x16]( h->mb.pic.i8x8_fdec_buf, 16, p_dst, FDEC_STRIDE, 16 );
- h->mb.pic.i8x8_nnz_buf[0] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 0]] );
- h->mb.pic.i8x8_nnz_buf[1] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 2]] );
- h->mb.pic.i8x8_nnz_buf[2] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 8]] );
- h->mb.pic.i8x8_nnz_buf[3] = M32( &h->mb.cache.non_zero_count[x264_scan8[10]] );
- h->mb.pic.i8x8_cbp = h->mb.i_cbp_luma;
- if( h->mb.i_skip_intra == 2 )
- h->mc.memcpy_aligned( h->mb.pic.i8x8_dct_buf, h->dct.luma8x8, sizeof(h->mb.pic.i8x8_dct_buf) );
- }
- }
- else
- {
- static const uint16_t cost_div_fix8[3] = {1024,512,341};
- a->i_satd_i8x8 = COST_MAX;
- i_cost = (i_cost * cost_div_fix8[idx]) >> 8;
- }
- /* Not heavily tuned */
- static const uint8_t i8x8_thresh[11] = { 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6 };
- if( a->b_early_terminate && X264_MIN(i_cost, a->i_satd_i16x16) > (i_satd_inter*i8x8_thresh[h->mb.i_subpel_refine])>>2 )
- return;
- }
-
- /* 4x4 prediction selection */
- //Intra 4x4
- if( flags & X264_ANALYSE_I4x4 )
- {
- /*
- * The 16x16 macroblock is split into 16 4x4 sub-blocks
- *
- * +----+----+----+----+
- * | | | | |
- * +----+----+----+----+
- * | | | | |
- * +----+----+----+----+
- * | | | | |
- * +----+----+----+----+
- * | | | | |
- * +----+----+----+----+
- *
- */
- int i_cost = lambda * (24+16); /* 24from JVT (SATD0), 16 from base predmode costs */
- int i_satd_thresh = a->b_early_terminate ? X264_MIN3( i_satd_inter, a->i_satd_i16x16, a->i_satd_i8x8 ) : COST_MAX;
- h->mb.i_cbp_luma = 0;
-
- if( a->b_early_terminate && a->i_mbrd )
- i_satd_thresh = i_satd_thresh * (10-a->b_fast_intra)/8;
-
- if( h->sh.i_type == SLICE_TYPE_B )
- i_cost += lambda * i_mb_b_cost_table[I_4x4];
- //Loop over all 4x4 blocks
- for( idx = 0;; idx++ )
- {
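- //Pixels in the source (to-be-encoded) frame
- //block_idx_xy_fenc[] holds the offset of each 4x4 block within p_fenc
- pixel *p_src_by = p_src + block_idx_xy_fenc[idx];
- //Pixels in the reconstructed frame
- //block_idx_xy_fdec[] holds the offset of each 4x4 block within p_fdec
- pixel *p_dst_by = p_dst + block_idx_xy_fdec[idx];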
-
- int i_best = COST_MAX;
- int i_pred_mode = x264_mb_predict_intra4x4_mode( h, idx );
- //Get the available intra prediction modes for Intra 4x4
- //Is data available on the left? Is data available on top?
- const int8_t *predict_mode = predict_4x4_mode_available( a->b_avoid_topright, h->mb.i_neighbour4[idx], idx );
-
- if( (h->mb.i_neighbour4[idx] & (MB_TOPRIGHT|MB_TOP)) == MB_TOP )
- /* emulate missing topright samples */
- MPIXEL_X4( &p_dst_by[4 - FDEC_STRIDE] ) = PIXEL_SPLAT_X4( p_dst_by[3 - FDEC_STRIDE] );
-
- if( h->pixf.intra_mbcmp_x9_4x4 && predict_mode[8] >= 0 )
- {
- /* No shortcuts here. The SSSE3 implementation of intra_mbcmp_x9 is fast enough. */
- i_best = h->pixf.intra_mbcmp_x9_4x4( p_src_by, p_dst_by, cost_i4x4_mode-i_pred_mode );
- i_cost += i_best & 0xffff;
- i_best >>= 16;
- a->i_predict4x4[idx] = i_best;
- if( i_cost > i_satd_thresh || idx == 15 )
- break;
- h->mb.cache.intra4x4_pred_mode[x264_scan8[idx]] = i_best;
- }
- else
- {
- if( !h->mb.b_lossless && predict_mode[5] >= 0 )
- {
- ALIGNED_ARRAY_16( int32_t, satd,[9] );
-
- h->pixf.intra_mbcmp_x3_4x4( p_src_by, p_dst_by, satd );
- int favor_vertical = satd[I_PRED_4x4_H] > satd[I_PRED_4x4_V];
- satd[i_pred_mode] -= 3 * lambda;
- i_best = satd[I_PRED_4x4_DC]; a->i_predict4x4[idx] = I_PRED_4x4_DC;
- COPY2_IF_LT( i_best, satd[I_PRED_4x4_H], a->i_predict4x4[idx], I_PRED_4x4_H );
- COPY2_IF_LT( i_best, satd[I_PRED_4x4_V], a->i_predict4x4[idx], I_PRED_4x4_V );
-
- /* Take analysis shortcuts: don't analyse modes that are too
- * far away direction-wise from the favored mode. */
- if( a->i_mbrd < 1 + a->b_fast_intra )
- predict_mode = intra_analysis_shortcut[a->b_avoid_topright][predict_mode[8] >= 0][favor_vertical];
- else
- predict_mode += 3;
- }
-
- if( i_best > 0 )
- {
- //Loop over all Intra4x4 prediction modes, at most 9
- for( ; *predict_mode >= 0; predict_mode++ )
- {
- int i_satd;
- int i_mode = *predict_mode;
- /*
- * 4x4 intra prediction examples
- *
- * Vertical prediction
- * |X1 X2 X3 X4
- * --+-----------
- * |X1 X2 X3 X4
- * |X1 X2 X3 X4
- * |X1 X2 X3 X4
- * |X1 X2 X3 X4
- *
- * Horizontal prediction
- * |
- * --+-----------
- * X5|X5 X5 X5 X5
- * X6|X6 X6 X6 X6
- * X7|X7 X7 X7 X7
- * X8|X8 X8 X8 X8
- *
- * DC prediction
- * |X1 X2 X3 X4
- * --+-----------
- * X5|
- * X6| Y
- * X7|
- * X8|
- *
- * Y=(X1+X2+X3+X4+X5+X6+X7+X8)/8
- *
- */
- if( h->mb.b_lossless )
- x264_predict_lossless_4x4( h, p_dst_by, 0, idx, i_mode );
- else
- h->predict_4x4[i_mode]( p_dst_by );//intra prediction assembly function; the result is stored in the reconstructed frame
-
- //Compute SAD or SATD (SATD is the sum of absolute values after a Hadamard transform of the difference)
- //This is the coding cost
- //p_src_by: source block, p_dst_by: reconstructed block
- i_satd = h->pixf.mbcmp[PIXEL_4x4]( p_dst_by, FDEC_STRIDE, p_src_by, FENC_STRIDE );
- if( i_pred_mode == x264_mb_pred_mode4x4_fix(i_mode) )
- {
- i_satd -= lambda * 3;
- if( i_satd <= 0 )
- {
- i_best = i_satd;
- a->i_predict4x4[idx] = i_mode;
- break;
- }
- }
- //The COPY2_IF_LT() macro means "copy if less than": when the new cost is smaller, both the cost and the mode are copied.
- //The macro expands as follows
- //if((i_satd)<(i_best))
- //{
- // (i_best)=(i_satd);
- // (a->i_predict4x4[idx])=(i_mode);
- //}
-
- //Check whether the cost is lower
- //i_best holds the lowest cost found so far
- //i_predict4x4[idx] holds the cheapest prediction mode (idx is the index of the 4x4 block)
- COPY2_IF_LT( i_best, i_satd, a->i_predict4x4[idx], i_mode );
- }
- }
- //Accumulate the cost of each 4x4 block (the minimum cost of every block is added)
- i_cost += i_best + 3 * lambda;
- if( i_cost > i_satd_thresh || idx == 15 )
- break;
- if( h->mb.b_lossless )
- x264_predict_lossless_4x4( h, p_dst_by, 0, idx, a->i_predict4x4[idx] );
- else
- h->predict_4x4[a->i_predict4x4[idx]]( p_dst_by );
-
- /*
- * Store the mode into intra4x4_pred_mode_cache
- *
- * A simple picture of intra4x4_pred_mode_cache follows. The numbers give the fill order (16 fills in total)
- * |
- * --+-------------------
- * | 0 0 0 0 0 0 0 0
- * | 0 0 0 0 1 2 5 6
- * | 0 0 0 0 3 4 7 8
- * | 0 0 0 0 9 10 13 14
- * | 0 0 0 0 11 12 15 16
- *
- */
- h->mb.cache.intra4x4_pred_mode[x264_scan8[idx]] = a->i_predict4x4[idx];
- }
- /* we need to encode this block now (for next ones) */
- x264_mb_encode_i4x4( h, 0, idx, a->i_qp, a->i_predict4x4[idx], 0 );
- }
- if( idx == 15 )//the loop reached the last 4x4 block, so all 16 blocks were analysed
- {
- //Total cost (fully accumulated)
- a->i_satd_i4x4 = i_cost;
- if( h->mb.i_skip_intra )
- {
- h->mc.copy[PIXEL_16x16]( h->mb.pic.i4x4_fdec_buf, 16, p_dst, FDEC_STRIDE, 16 );
- h->mb.pic.i4x4_nnz_buf[0] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 0]] );
- h->mb.pic.i4x4_nnz_buf[1] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 2]] );
- h->mb.pic.i4x4_nnz_buf[2] = M32( &h->mb.cache.non_zero_count[x264_scan8[ 8]] );
- h->mb.pic.i4x4_nnz_buf[3] = M32( &h->mb.cache.non_zero_count[x264_scan8[10]] );
- h->mb.pic.i4x4_cbp = h->mb.i_cbp_luma;
- if( h->mb.i_skip_intra == 2 )
- h->mc.memcpy_aligned( h->mb.pic.i4x4_dct_buf, h->dct.luma4x4, sizeof(h->mb.pic.i4x4_dct_buf) );
- }
- }
- else
- a->i_satd_i4x4 = COST_MAX;
- }
- }
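Overall, x264_mb_analyse_intra() computes the cost of three intra macroblock types, Intra16x16, Intra8x8 (not studied here yet) and Intra4x4, so that the cheapest one can be selected. The flow of the function is roughly as follows:
(1) Intra16x16 prediction
a) Call predict_16x16_mode_available() to determine which prediction modes are usable given the neighbouring macroblocks (mainly whether the left and top blocks are available).
b) Loop over the (up to) 4 Intra16x16 prediction modes:
i. Call the predict_16x16[]() assembly function to perform Intra16x16 prediction.
ii. Call mbcmp[]() in x264_pixel_function_t to compute the coding cost (mbcmp[]() points to a SAD or SATD assembly function).
c) Keep the Intra16x16 mode with the lowest cost.
(2) Intra8x8 prediction (not studied; the flow should be similar)
(3) Intra4x4 prediction
a) Loop over the sixteen 4x4 blocks:
i. Call x264_mb_predict_intra4x4_mode() to derive the predicted mode of the block from its neighbours, and predict_4x4_mode_available() to determine which prediction modes are usable for the block.
ii. Loop over the (up to) 9 Intra4x4 prediction modes:
1) Call the predict_4x4[]() assembly function to perform Intra4x4 prediction.
2) Call mbcmp[]() in x264_pixel_function_t to compute the coding cost (mbcmp[]() points to a SAD or SATD assembly function).
iii. Keep the Intra4x4 mode with the lowest cost.
b) Add up the minimum costs of the sixteen 4x4 blocks to obtain the total cost.
(4) The costs of the three types are compared and the smallest one determines the intra prediction type of the current macroblock. x264_mb_analyse_intra() itself only stores the three costs (i_satd_i16x16, i_satd_i8x8, i_satd_i4x4); the comparison is performed by its caller, x264_macroblock_analyse().
Later sections analyse several of the assembly functions involved. Before reading the source code, here is a brief note on the relevant background.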
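The "keep the cheapest mode" step that recurs throughout the flow above can be sketched as follows. This is a minimal illustration, assuming each candidate mode has a distortion (SAD or SATD) and a bit count for signalling the mode, combined as distortion + lambda * bits in the same way the listing adds lambda * bs_size_ue(...). The macro SKETCH_COPY2_IF_LT and the function sketch_best_mode() are hypothetical names; the macro has the same shape as the COPY2_IF_LT expansion shown in the listing.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Same shape as the expansion shown in the listing:
 * when b < a, copy b into a and d into c. */
#define SKETCH_COPY2_IF_LT( a, b, c, d ) \
    do { if( (b) < (a) ) { (a) = (b); (c) = (d); } } while( 0 )

/* Returns the index of the cheapest of n candidate modes and writes its
 * cost to *best_cost. cost = distortion + lambda * bits. */
static int sketch_best_mode( const int *distortion, const int *bits, int n,
                             int lambda, int *best_cost )
{
    assert( distortion != NULL && bits != NULL && best_cost != NULL && n > 0 );
    int cost = INT_MAX;
    int mode = -1;
    for( int i = 0; i < n; i++ )
    {
        int c = distortion[i] + lambda * bits[i];
        /* Strict "less than": on a tie the earlier mode is kept. */
        SKETCH_COPY2_IF_LT( cost, c, mode, i );
    }
    *best_cost = cost;
    return mode;
}
```

With a larger lambda the bit cost of signalling a mode weighs more, so a mode with slightly higher distortion but cheaper signalling can win.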
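Intra prediction basics
A brief note on how intra prediction works. Intra prediction derives the pixel values inside a macroblock from the boundary pixels to its left and above it. The effect is shown in the figure below: the left image is the original picture, and the right image is the intra-predicted picture before the residual is added.
This article covers two kinds of luma intra prediction in H.264: 16x16 and 4x4 (the 8x8 luma prediction used by Intra8x8 macroblocks is not covered). There are 4 Intra16x16 prediction modes, as shown in the figure below.
The 4 modes are listed below.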
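Mode | Description
Vertical | Each pixel is derived from the pixel above it in the top row
Horizontal | Each pixel is derived from the pixel to its left in the left column
DC | Every pixel is set to the average of the top and left pixels
Plane | Each pixel is derived from the top and left pixels by a linear (plane) fit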
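There are 9 Intra4x4 prediction modes, as shown in the figure below.
As the figure shows, the first three Intra4x4 modes (Vertical, Horizontal, DC) are the same as in Intra16x16; Intra4x4 has no Plane mode. The remaining modes are directional, and several of them use prediction arrows that are not at a 45-degree angle: the 45-degree arrows run along the diagonal of a square, like the character 「口」, while the others run along the diagonal of a 1:2 rectangle, like the character 「日」.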
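Pixel comparison basics
Computing the intra prediction cost involves SAD and SATD, so a few related concepts are noted here. SAD, SATD and SSD are defined as follows:
SAD (Sum of Absolute Difference), also called SAE (Sum of Absolute Error): take the difference between corresponding pixels of two blocks, take the absolute value of each difference, and add them up.
SATD (Sum of Absolute Transformed Difference): apply a Hadamard transform to the difference block, then sum the absolute values. It differs from SAD by the extra "transform".
SSD (Sum of Squared Difference), also called SSE (Sum of Squared Error): the sum of the squared differences. It differs from SAD by the extra "square".
H.264 encoders use SAD and SATD to decide the macroblock prediction mode. Early encoders used SAD; more recent ones mostly use SATD. Why SATD rather than SAD? The key reason is that the size of the encoded bitstream is closely tied to the frequency-domain content of the block after the DCT, and less closely to the spatial-domain content before the transform. SAD only reflects spatial-domain information, whereas SATD reflects frequency-domain information and is cheaper to compute than the DCT, which makes it a suitable basis for mode selection.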
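To make the three definitions concrete, here is a minimal C sketch for 4x4 blocks of 8-bit pixels. The function names are hypothetical and this is not x264's implementation; in particular, the SATD here is the plain sum of absolute 4x4 Hadamard coefficients without any scaling, whereas real encoders typically apply a normalisation factor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a 4x4 block. */
static int sketch_sad4x4( const uint8_t *a, int stride_a, const uint8_t *b, int stride_b )
{
    assert( a != NULL && b != NULL );
    int sum = 0;
    for( int y = 0; y < 4; y++ )
        for( int x = 0; x < 4; x++ )
            sum += abs( a[y*stride_a + x] - b[y*stride_b + x] );
    return sum;
}

/* Sum of squared differences over a 4x4 block. */
static int sketch_ssd4x4( const uint8_t *a, int stride_a, const uint8_t *b, int stride_b )
{
    assert( a != NULL && b != NULL );
    int sum = 0;
    for( int y = 0; y < 4; y++ )
        for( int x = 0; x < 4; x++ )
        {
            int d = a[y*stride_a + x] - b[y*stride_b + x];
            sum += d * d;
        }
    return sum;
}

/* Unscaled SATD: 2D 4x4 Hadamard transform of the difference block,
 * followed by the sum of absolute coefficients. */
static int sketch_satd4x4( const uint8_t *a, int stride_a, const uint8_t *b, int stride_b )
{
    assert( a != NULL && b != NULL );
    int t[4][4];
    int sum = 0;
    /* Horizontal butterflies on each row of the difference block. */
    for( int y = 0; y < 4; y++ )
    {
        int d0 = a[y*stride_a + 0] - b[y*stride_b + 0];
        int d1 = a[y*stride_a + 1] - b[y*stride_b + 1];
        int d2 = a[y*stride_a + 2] - b[y*stride_b + 2];
        int d3 = a[y*stride_a + 3] - b[y*stride_b + 3];
        int s01 = d0 + d1, m01 = d0 - d1, s23 = d2 + d3, m23 = d2 - d3;
        t[y][0] = s01 + s23; t[y][1] = s01 - s23;
        t[y][2] = m01 + m23; t[y][3] = m01 - m23;
    }
    /* Vertical butterflies on each column, accumulating absolute values. */
    for( int x = 0; x < 4; x++ )
    {
        int s01 = t[0][x] + t[1][x], m01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], m23 = t[2][x] - t[3][x];
        sum += abs( s01 + s23 ) + abs( s01 - s23 ) + abs( m01 + m23 ) + abs( m01 - m23 );
    }
    return sum;
}
```

For a prediction that is off by 1 in every pixel, this sketch gives SAD = 16 and SATD = 16, because all the energy falls into the single DC coefficient. For a prediction that is off by 4 in a single pixel, SAD drops to 4 while SATD rises to 64, because an isolated error spreads over all 16 coefficients. This is the kind of frequency-domain sensitivity the paragraph above refers to.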
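An example of mode selection with SAD follows. The figure below shows the pixels of an ordinary Intra16x16 macroblock. Beneath it are the pixels predicted with the Vertical, Horizontal, DC and Plane intra prediction modes. The SAD (SAE) values between these predictions and the original pixels are 3985, 5097, 4991 and 2539 respectively. Since Plane has the smallest SAD, Plane is the best intra prediction mode for this macroblock.
The following sections record the source code of the assembly functions of each module, in the order Intra16x16 prediction, Intra4x4 prediction, pixel computation.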
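Intra16x16 prediction source code
The initialisation function of the Intra16x16 prediction module is x264_predict_16x16_init(). It assigns the entries of an array of x264_predict_t, which is a function pointer type. While x264 runs, calling through these function pointers performs the corresponding prediction.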
x264_predict_16x16_init()
x264_predict_16x16_init() initialises the Intra16x16 prediction assembly functions. It is defined in x264\common\predict.c, as shown below.
- //Initialisation of the Intra16x16 prediction assembly functions
- void x264_predict_16x16_init( int cpu, x264_predict_t pf[7] )
- {
- //C versions
- //================================================
- //Vertical
- pf[I_PRED_16x16_V ] = x264_predict_16x16_v_c;
- //Horizontal
- pf[I_PRED_16x16_H ] = x264_predict_16x16_h_c;
- //DC
- pf[I_PRED_16x16_DC] = x264_predict_16x16_dc_c;
- //Plane
- pf[I_PRED_16x16_P ] = x264_predict_16x16_p_c;
- //DC variants used when only the left neighbours, only the top neighbours, or no neighbours are available
- pf[I_PRED_16x16_DC_LEFT]= x264_predict_16x16_dc_left_c;
- pf[I_PRED_16x16_DC_TOP ]= x264_predict_16x16_dc_top_c;
- pf[I_PRED_16x16_DC_128 ]= x264_predict_16x16_dc_128_c;
- //================================================
- //MMX version
- #if HAVE_MMX
- x264_predict_16x16_init_mmx( cpu, pf );
- #endif
- //ALTIVEC version
- #if HAVE_ALTIVEC
- if( cpu&X264_CPU_ALTIVEC )
- x264_predict_16x16_init_altivec( pf );
- #endif
- //ARMV6 version
- #if HAVE_ARMV6
- x264_predict_16x16_init_arm( cpu, pf );
- #endif
- //AARCH64 version
- #if ARCH_AARCH64
- x264_predict_16x16_init_aarch64( cpu, pf );
- #endif
- }
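As the source shows, x264_predict_16x16_init() first fills the array of x264_predict_t function pointers with the C versions x264_predict_16x16_v_c(), x264_predict_16x16_h_c(), x264_predict_16x16_dc_c() and x264_predict_16x16_p_c(). It then checks the features of the platform and, where supported, calls x264_predict_16x16_init_mmx(), x264_predict_16x16_init_arm() and so on, which overwrite entries of the array with assembly-optimised functions. The following sections first look at the C versions of the 4 Intra16x16 prediction modes and then, for comparison, at the x86 assembly and NEON assembly versions of the Vertical mode.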
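The "C defaults first, optimised overrides second" pattern can be sketched in a few lines. This is an illustration only: sketch_predict_t, SK_CPU_FAST and every sketch_* function are hypothetical names, the blocks are 4x4 rather than 16x16, and the "fast" variant is just a memcpy-based C function standing in for an assembly routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Function pointer type for a predictor that writes a 4x4 block in place. */
typedef void (*sketch_predict_t)( uint8_t *dst, int stride );

enum { SK_PRED_V, SK_PRED_DC_128, SK_PRED_COUNT };

#define SK_CPU_FAST 0x1u /* hypothetical CPU capability flag */

/* Plain C vertical prediction: copy the row above into all 4 rows. */
static void sketch_predict_v_c( uint8_t *dst, int stride )
{
    for( int y = 0; y < 4; y++ )
        for( int x = 0; x < 4; x++ )
            dst[y*stride + x] = dst[x - stride];
}

/* Stand-in for an optimised version; must produce identical output. */
static void sketch_predict_v_fast( uint8_t *dst, int stride )
{
    for( int y = 0; y < 4; y++ )
        memcpy( dst + y*stride, dst - stride, 4 );
}

/* DC fallback when no neighbours are available: mid-grey for 8-bit pixels. */
static void sketch_predict_dc_128_c( uint8_t *dst, int stride )
{
    for( int y = 0; y < 4; y++ )
        memset( dst + y*stride, 128, 4 );
}

static void sketch_predict_init( unsigned cpu, sketch_predict_t pf[SK_PRED_COUNT] )
{
    assert( pf != NULL );
    /* Step 1: every slot gets a portable C implementation. */
    pf[SK_PRED_V]      = sketch_predict_v_c;
    pf[SK_PRED_DC_128] = sketch_predict_dc_128_c;
    /* Step 2: overwrite slots for which the CPU offers something faster. */
    if( cpu & SK_CPU_FAST )
        pf[SK_PRED_V] = sketch_predict_v_fast;
}
```

A caller would invoke pf[SK_PRED_V]( block, stride ) without knowing which variant sits in the slot. The design keeps every slot valid on every platform, since an optimised routine only ever replaces a working C one.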
x264_predict_16x16_v_c()
x264_predict_16x16_v_c() is the C version of the Vertical mode of Intra16x16 prediction. It is defined in common\predict.c, as shown below.
- //16x16 intra prediction
- //Vertical prediction
- void x264_predict_16x16_v_c( pixel *src )
- {
- /*
- * Vertical prediction (only 4 of the 16 columns are drawn)
- * |X1 X2 X3 X4
- * --+-----------
- * |X1 X2 X3 X4
- * |X1 X2 X3 X4
- * |X1 X2 X3 X4
- * |X1 X2 X3 X4
- *
- */
- /*
- * [Macro expansion]
- * uint32_t v0 = ((x264_union32_t*)(&src[ 0-FDEC_STRIDE]))->i;
- * uint32_t v1 = ((x264_union32_t*)(&src[ 4-FDEC_STRIDE]))->i;
- * uint32_t v2 = ((x264_union32_t*)(&src[ 8-FDEC_STRIDE]))->i;
- * uint32_t v3 = ((x264_union32_t*)(&src[12-FDEC_STRIDE]))->i;
- * Here the code above is effectively equivalent to:
- * uint32_t v0 = *((uint32_t*)(&src[ 0-FDEC_STRIDE]));
- * uint32_t v1 = *((uint32_t*)(&src[ 4-FDEC_STRIDE]));
- * uint32_t v2 = *((uint32_t*)(&src[ 8-FDEC_STRIDE]));
- * uint32_t v3 = *((uint32_t*)(&src[12-FDEC_STRIDE]));
- * i.e. four loads of 4 pixels each (16 pixels in total), assigned to v0, v1, v2, v3
- * The values come from the row of pixels directly above the 16x16 block
- * 0| 4 8 12 16
- * || v0 | v1 | v2 | v3 |
- * ---++==========+==========+==========+==========+
- * ||
- * ||
- * ||
- * ||
- * ||
- * ||
- *
- */
- //With 8-bit pixels, pixel4 is a uint32_t (32 bits) holding the values of 4 pixels (8 bits each)
-
- pixel4 v0 = MPIXEL_X4( &src[ 0-FDEC_STRIDE] );
- pixel4 v1 = MPIXEL_X4( &src[ 4-FDEC_STRIDE] );
- pixel4 v2 = MPIXEL_X4( &src[ 8-FDEC_STRIDE] );
- pixel4 v3 = MPIXEL_X4( &src[12-FDEC_STRIDE] );
-
- //Assign the 16 rows in a loop
- for( int i = 0; i < 16; i++ )
- {
- //[Macro expansion]
- //(((x264_union32_t*)(src+ 0))->i) = v0;
- //(((x264_union32_t*)(src+ 4))->i) = v1;
- //(((x264_union32_t*)(src+ 8))->i) = v2;
- //(((x264_union32_t*)(src+12))->i) = v3;
- //i.e. four stores, each writing 4 pixels
- //
- MPIXEL_X4( src+ 0 ) = v0;
- MPIXEL_X4( src+ 4 ) = v1;
- MPIXEL_X4( src+ 8 ) = v2;
- MPIXEL_X4( src+12 ) = v3;
- //Next row
- //FDEC_STRIDE=32 is the amount of data in one row of the reconstructed macroblock buffer fdec_buf
- src += FDEC_STRIDE;
- }
- }
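As the source shows, x264_predict_16x16_v_c() first reads the row of pixels above the 16x16 block into v0, v1, v2 and v3, then loops 16 times to write them into each of the 16 rows of the block.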
x264_predict_16x16_h_c()
x264_predict_16x16_h_c() is the C version of the Horizontal mode of Intra16x16 prediction. It is defined in common\predict.c, as shown below.
- //16x16 intra prediction
- //Horizontal prediction
- void x264_predict_16x16_h_c( pixel *src )
- {
- /*
- * Horizontal prediction (only 4 of the 16 rows and columns are drawn)
- * |
- * --+-----------
- * X5|X5 X5 X5 X5
- * X6|X6 X6 X6 X6
- * X7|X7 X7 X7 X7
- * X8|X8 X8 X8 X8
- *
- */
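- /*
- * const pixel4 v = PIXEL_SPLAT_X4( src[-1] );
- * After macro expansion (8-bit pixels):
- * const uint32_t v = (src[-1])*0x01010101U;
- *
- * PIXEL_SPLAT_X4() replicates one 8-bit pixel value into all four bytes of a 32-bit word,
- * e.g. 0x9F becomes 0x9F9F9F9F
- * Derivation:
- * assuming x occupies 8 bits (one pixel)
- * y=x*0x01010101
- * =x*(0x00000001+0x00000100+0x00010000+0x01000000)
- * =(x<<0)+(x<<8)+(x<<16)+(x<<24)
- *
- * Meaning of const uint32_t v = (src[-1])*0x01010101U:
- * for each row, the pixel value in src[-1] (e.g. 0x02) is replicated into v, so v becomes 0x02020202
- * src[-1] is the pixel immediately to the left of the current row of the 16x16 block
- */
- //Assign the 16 rows in a loop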
- for( int i = 0; i < 16; i++ )
- {
- const pixel4 v = PIXEL_SPLAT_X4( src[-1] );
- //After macro expansion:
- //((x264_union32_t*)(src+ 0))->i=v;
- //((x264_union32_t*)(src+ 4))->i=v;
- //((x264_union32_t*)(src+ 8))->i=v;
- //((x264_union32_t*)(src+12))->i=v;
- //i.e. four stores of 4 pixels each (the row has 16 pixels, all with the same value)
- //
- // 0| 4 8 12 16
- // || | | | |
- //---++==========+==========+==========+==========+
- // ||
- // v || v | v | v | v |
- // ||
- // ||
- // ||
- //
- MPIXEL_X4( src+ 0 ) = v;
- MPIXEL_X4( src+ 4 ) = v;
- MPIXEL_X4( src+ 8 ) = v;
- MPIXEL_X4( src+12 ) = v;
- //Next row
- //FDEC_STRIDE=32 is the amount of data in one row of the reconstructed macroblock buffer fdec_buf
- src += FDEC_STRIDE;
- }
- }
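As the source shows, for each row x264_predict_16x16_h_c() reads the single pixel to the left of the 16x16 block, replicates it 4 times into v, and then writes v to the row in 4 stores. PIXEL_SPLAT_X4() takes an 8-bit value and replicates it into all four bytes of a 32-bit word; the derivation is recorded in the source comments and is not repeated here.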
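The multiplication trick behind PIXEL_SPLAT_X4() can be checked with a small standalone sketch. The names sketch_splat_x4() and sketch_predict_h() are hypothetical, the code assumes 8-bit pixels, and memcpy is used for the 4-byte stores instead of the union cast from the listing so that the sketch stays free of alignment and aliasing concerns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Replicate an 8-bit value into all four bytes of a 32-bit word:
 * x * 0x01010101 == (x<<0) + (x<<8) + (x<<16) + (x<<24) for x < 256. */
static uint32_t sketch_splat_x4( uint8_t x )
{
    return x * 0x01010101U;
}

/* Horizontal prediction for a size x size block (size is a multiple of 4).
 * src points at the top-left pixel of the block; src[-1] is its left neighbour. */
static void sketch_predict_h( uint8_t *src, int stride, int size )
{
    assert( src != NULL && size > 0 && size % 4 == 0 );
    for( int y = 0; y < size; y++ )
    {
        /* All four bytes of v are equal, so byte order does not matter. */
        uint32_t v = sketch_splat_x4( src[-1] );
        for( int x = 0; x < size; x += 4 )
            memcpy( src + x, &v, sizeof(v) );
        src += stride; /* next row */
    }
}
```

Calling sketch_predict_h( block, stride, 16 ) would fill a 16x16 block the same way the listing does, one splatted word per 4 pixels.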
x264_predict_16x16_dc_c()
x264_predict_16x16_dc_c() is the C version of the DC mode of Intra16x16 prediction. It is defined in common\predict.c, as shown below.
- #define PREDICT_16x16_DC(v)\
- for( int i = 0; i < 16; i++ )\
- {\
- MPIXEL_X4( src+ 0 ) = v;\
- MPIXEL_X4( src+ 4 ) = v;\
- MPIXEL_X4( src+ 8 ) = v;\
- MPIXEL_X4( src+12 ) = v;\
- src += FDEC_STRIDE;\
- }
-
- void x264_predict_16x16_dc_c( pixel *src )
- {
- /*
- * DC prediction (drawn with 4 pixels per edge; the 16x16 case averages 16+16 pixels)
- * |X1 X2 X3 X4
- * --+-----------
- * X5|
- * X6| Y
- * X7|
- * X8|
- *
- * Y=(X1+X2+X3+X4+X5+X6+X7+X8)/8
- */
-
- int dc = 0;
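- //Sum the 16 left and 16 top neighbouring pixels of the 16x16 block into dc
- for( int i = 0; i < 16; i++ )
- {
- //Left neighbour
- dc += src[-1 + i * FDEC_STRIDE];
- //Top neighbour
- dc += src[i - FDEC_STRIDE];
- }
- //Divide the sum by 32 (16+16 pixels in total)
- //The "+16" makes the right shift by 5 round to nearest
- //PIXEL_SPLAT_X4() replicates the 8-bit DC value into all four bytes of a pixel4,
- //e.g. 0x9F becomes 0x9F9F9F9F
- pixel4 dcsplat = PIXEL_SPLAT_X4( ( dc + 16 ) >> 5 );
- //Assign it to every pixel of the 16x16 block
- /*
- * Result after macro expansion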
- * for( int i = 0; i < 16; i++ )
- * {
- * (((x264_union32_t*)(src+ 0))->i) = dcsplat;
- * (((x264_union32_t*)(src+ 4))->i) = dcsplat;
- * (((x264_union32_t*)(src+ 8))->i) = dcsplat;
- * (((x264_union32_t*)(src+12))->i) = dcsplat;
- * src += 32;
- * }
- */