I've been working on DXVA2 hardware acceleration these past few days. There isn't much material out there, so I translated two related Microsoft documents. This post records how to implement DXVA2 with ffmpeg.
The first translation, on the Direct3D device manager: http://www.cnblogs.com/betterwgo/p/6124588.html
The second translation, on supporting DXVA 2.0 in DirectShow: http://www.cnblogs.com/betterwgo/p/6125351.html
While working on dxva2 I referred to a lot of code found online, most of which in turn draws on the examples in VLC and ffmpeg.
1. Formats ffmpeg supports for dxva2 hardware acceleration
The ffmpeg version I am currently using is 3.2, and it supports dxva2 hardware acceleration for the following codecs: AV_CODEC_ID_MPEG2VIDEO, AV_CODEC_ID_H264, AV_CODEC_ID_VC1, AV_CODEC_ID_WMV3, AV_CODEC_ID_HEVC and AV_CODEC_ID_VP9. Any file ffmpeg identifies as one of these codecs can try dxva2 hardware acceleration. That does not mean every such file is guaranteed to support it, though: I ran into an AV_CODEC_ID_HEVC file that failed during dxva2 initialization, and PotPlayer could not use dxva2 when playing that file either.
2. Things to watch out for
(1) ffmpeg only implements the decoding side of dxva2. Everything else covered in the two translated articles, apart from the decoding itself, has to be implemented by you. This part is somewhat complex, but don't worry: both VLC and ffmpeg have examples to refer to. You will need some familiarity with the two translations to follow the logic of that code.
(2) To actually see the benefit of hardware acceleration, do not copy the decoded data back to system memory for CPU processing. At first I copied the decoded frames back to memory, and as a result GPU usage showed no visible change while CPU usage was actually higher than without dxva2. Once I changed the code to display the decoded data directly, GPU usage shot up and CPU usage came down.
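To make the cost concrete, here is roughly what copying a decoded frame back entails: locking the D3D9 surface and memcpy-ing it row by row on the CPU. This is a sketch modeled on the retrieve path in ffmpeg's own dxva2 example; the function name and the `dst_y`/`dst_uv` destination buffers are placeholders of mine, not names from the project:

```cpp
#include <d3d9.h>
#include <string.h>
extern "C" {
#include "libavutil/frame.h"
}

// Sketch only: what a GPU -> CPU readback of a decoded NV12 surface costs.
// dst_y / dst_uv are caller-provided system-memory buffers (hypothetical names).
static int copy_surface_to_sysmem(AVFrame *frame, uint8_t *dst_y, uint8_t *dst_uv,
                                  int width, int height)
{
    LPDIRECT3DSURFACE9 surface = (LPDIRECT3DSURFACE9)frame->data[3]; // dxva2 keeps the surface here
    D3DSURFACE_DESC desc;
    D3DLOCKED_RECT  lock;

    surface->GetDesc(&desc);
    if (FAILED(surface->LockRect(&lock, NULL, D3DLOCK_READONLY)))
        return -1;

    // NV12 layout: full-size Y plane, then interleaved UV at half height.
    uint8_t *src_y  = (uint8_t *)lock.pBits;
    uint8_t *src_uv = src_y + lock.Pitch * desc.Height;

    for (int y = 0; y < height; y++)      // every row crosses the bus and burns CPU
        memcpy(dst_y + y * width, src_y + y * lock.Pitch, width);
    for (int y = 0; y < height / 2; y++)
        memcpy(dst_uv + y * width, src_uv + y * lock.Pitch, width);

    surface->UnlockRect();
    return 0;
}
```

Every one of those memcpy calls runs on the CPU, which is exactly why this path erases the speedup.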
3. Key code
Since code that extracts the dxva2 decoder setup from ffmpeg's examples is already available online, the actual implementation is fairly straightforward.
(1) Header file ffmpeg_dxva2.h
```cpp
/*
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

#ifndef FFMPEG_DXVA2_H
#define FFMPEG_DXVA2_H

//#include "windows.h"

extern "C" {
#include "libavcodec/avcodec.h"
#include "libavutil/pixfmt.h"
#include "libavutil/rational.h"
}

enum HWAccelID {
    HWACCEL_NONE = 0,
    HWACCEL_AUTO,
    HWACCEL_VDPAU,
    HWACCEL_DXVA2,
    HWACCEL_VDA,
    HWACCEL_VIDEOTOOLBOX,
    HWACCEL_QSV,
};

typedef struct AVStream AVStream;
typedef struct AVCodecContext AVCodecContext;
typedef struct AVCodec AVCodec;
typedef struct AVFrame AVFrame;
typedef struct AVDictionary AVDictionary;

typedef struct InputStream {
    int file_index;
    AVStream *st;
    int discard;             /* true if stream data should be discarded */
    int user_set_discard;
    int decoding_needed;     /* non zero if the packets must be decoded in 'raw_fifo', see DECODING_FOR_* */
#define DECODING_FOR_OST    1
#define DECODING_FOR_FILTER 2

    AVCodecContext *dec_ctx;
    AVCodec *dec;
    AVFrame *decoded_frame;
    AVFrame *filter_frame;   /* a ref of decoded_frame, to be sent to filters */

    int64_t start;           /* time when read started */
    /* predicted dts of the next packet read for this stream or (when there are
     * several frames in a packet) of the next frame in current packet (in AV_TIME_BASE units) */
    int64_t next_dts;
    int64_t dts;             ///< dts of the last packet read for this stream (in AV_TIME_BASE units)
    int64_t next_pts;        ///< synthetic pts for the next decode frame (in AV_TIME_BASE units)
    int64_t pts;             ///< current pts of the decoded frame (in AV_TIME_BASE units)
    int wrap_correction_done;

    int64_t filter_in_rescale_delta_last;

    int64_t min_pts;         /* pts with the smallest value in a current stream */
    int64_t max_pts;         /* pts with the higher value in a current stream */
    int64_t nb_samples;      /* number of samples in the last decoded audio frame before looping */

    double ts_scale;
    int saw_first_ts;
    int showed_multi_packet_warning;
    AVDictionary *decoder_opts;
    AVRational framerate;    /* framerate forced with -r */
    int top_field_first;
    int guess_layout_max;

    int autorotate;
    int resample_height;
    int resample_width;
    int resample_pix_fmt;

    int resample_sample_fmt;
    int resample_sample_rate;
    int resample_channels;
    uint64_t resample_channel_layout;

    int fix_sub_duration;
    struct { /* previous decoded subtitle and related variables */
        int got_output;
        int ret;
        AVSubtitle subtitle;
    } prev_sub;

    struct sub2video {
        int64_t last_pts;
        int64_t end_pts;
        AVFrame *frame;
        int w, h;
    } sub2video;

    int dr1;

    /* decoded data from this stream goes into all those filters
     * currently video and audio only */
    //InputFilter **filters;
    //int nb_filters;
    //int reinit_filters;

    /* hwaccel options */
    enum HWAccelID hwaccel_id;
    char *hwaccel_device;

    /* hwaccel context */
    enum HWAccelID active_hwaccel_id;
    void *hwaccel_ctx;
    void (*hwaccel_uninit)(AVCodecContext *s);
    int  (*hwaccel_get_buffer)(AVCodecContext *s, AVFrame *frame, int flags);
    int  (*hwaccel_retrieve_data)(AVCodecContext *s, AVFrame *frame);
    enum AVPixelFormat hwaccel_pix_fmt;
    enum AVPixelFormat hwaccel_retrieved_pix_fmt;

    /* stats */
    // combined size of all the packets read
    uint64_t data_size;
    /* number of packets successfully read for this stream */
    uint64_t nb_packets;
    // number of frames/samples retrieved from the decoder
    uint64_t frames_decoded;
    uint64_t samples_decoded;
} InputStream;

int dxva2_init(AVCodecContext *s, HWND hwnd);
int dxva2_retrieve_data_call(AVCodecContext *s, AVFrame *frame);

#endif /* FFMPEG_DXVA2_H */
```
The code above is extracted from ffmpeg. HWAccelID identifies the hardware accelerator and is used when setting up the decoder; what we actually use is HWACCEL_DXVA2. There is a lot going on in the InputStream struct: it carries data used during initialization as well as several function pointers, so pay attention to how those pointers are used. What I really want to highlight are the following two functions:
```cpp
int dxva2_init(AVCodecContext *s, HWND hwnd);
int dxva2_retrieve_data_call(AVCodecContext *s, AVFrame *frame);
```
dxva2_init is the entry point that sets up the dxva2 decoder; it does most of the configuration work. I will upload the full project source at the end of this post. Almost everything in the two translated articles exists to serve this function, and ffmpeg_dxva2.cpp in the uploaded source is mainly devoted to it (dxva2_retrieve_data_call lives there too). To follow the logic of dxva2_init you should read the two translations, and you will also need some basic knowledge of D3D rendering.
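To give a flavor of what dxva2_init has to do: one of its first steps (as the first translated article describes) is creating the Direct3D device manager through which the decoder shares the device. A minimal sketch, assuming an already-created IDirect3DDevice9; the helper name is mine, the two API calls are the real DXVA2 ones:

```cpp
#include <d3d9.h>
#include <dxva2api.h>

// Minimal sketch: wrap an existing D3D9 device in an IDirect3DDeviceManager9,
// which is how the dxva2 decoder obtains the device.
static IDirect3DDeviceManager9 *create_device_manager(IDirect3DDevice9 *device)
{
    UINT reset_token = 0;
    IDirect3DDeviceManager9 *devmgr = NULL;

    if (FAILED(DXVA2CreateDirect3DDeviceManager9(&reset_token, &devmgr)))
        return NULL;
    // Hand the rendering device to the manager; the token ties the two together.
    if (FAILED(devmgr->ResetDevice(device, reset_token))) {
        devmgr->Release();
        return NULL;
    }
    return devmgr;
}
```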
dxva2_retrieve_data_call is used to get at the decoded data. As I said above, unless you really have to, do not copy it out at the end; just draw it with D3D directly. Copying the data from the GPU back to system memory drastically reduces GPU utilization. In my experiments it completely defeated the purpose of GPU acceleration, and CPU usage went up instead. That is why the uploaded source draws the data directly.
```cpp
static int dxva2_retrieve_data(AVCodecContext *s, AVFrame *frame)
{
    LPDIRECT3DSURFACE9 surface = (LPDIRECT3DSURFACE9)frame->data[3];
    InputStream  *ist = (InputStream *)s->opaque;
    DXVA2Context *ctx = (DXVA2Context *)ist->hwaccel_ctx;

    EnterCriticalSection(&cs);

    // render directly
    ctx->d3d9device->Clear(0, NULL, D3DCLEAR_TARGET, D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0);
    ctx->d3d9device->BeginScene();
    if (m_pBackBuffer) {
        m_pBackBuffer->Release();
        m_pBackBuffer = NULL;
    }
    ctx->d3d9device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &m_pBackBuffer);
    GetClientRect(d3dpp.hDeviceWindow, &m_rtViewport);
    ctx->d3d9device->StretchRect(surface, NULL, m_pBackBuffer, &m_rtViewport, D3DTEXF_LINEAR);
    ctx->d3d9device->EndScene();
    ctx->d3d9device->Present(NULL, NULL, NULL, NULL);

    LeaveCriticalSection(&cs);
    return 0;
}
```
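Note that cs, m_pBackBuffer, d3dpp and m_rtViewport are globals in ffmpeg_dxva2.cpp. The critical section in particular needs one-time initialization before the first frame is rendered; a sketch of that setup (where exactly this happens in the project may differ):

```cpp
#include <windows.h>
#include <d3d9.h>

// One-time setup of the globals used by dxva2_retrieve_data (sketch).
static CRITICAL_SECTION   cs;                    // guards the render path
static LPDIRECT3DSURFACE9 m_pBackBuffer = NULL;  // released and re-fetched every frame
static RECT               m_rtViewport;

static void render_globals_init(void)
{
    InitializeCriticalSection(&cs);  // must precede any dxva2_retrieve_data call
}
```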
(2) Implementation
With ffmpeg_dxva2.h and ffmpeg_dxva2.cpp in place, the implementation is very simple.
The code in the main flow that sets up dxva2:
```cpp
switch (codec->id) {
case AV_CODEC_ID_MPEG2VIDEO:
case AV_CODEC_ID_H264:
case AV_CODEC_ID_VC1:
case AV_CODEC_ID_WMV3:
case AV_CODEC_ID_HEVC:
case AV_CODEC_ID_VP9:
{
    codecctx->thread_count = 1; // Multithreading is apparently not compatible with hardware decoding
    InputStream *ist = new InputStream();
    ist->hwaccel_id = HWACCEL_AUTO;
    ist->active_hwaccel_id = HWACCEL_AUTO;
    ist->hwaccel_device = "dxva2";
    ist->dec = codec;
    ist->dec_ctx = codecctx;
    codecctx->opaque = ist;
    if (dxva2_init(codecctx, hWnd) == 0) {
        codecctx->get_buffer2 = ist->hwaccel_get_buffer;
        codecctx->get_format = GetHwFormat;
        codecctx->thread_safe_callbacks = 1;
        break;
    }
    bAccel = false;
    break;
}
default:
    bAccel = false;
    break;
}
```
As you can see, the core of it is just the call to dxva2_init.
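The switch also installs a GetHwFormat callback that is not shown above. A minimal sketch of what it needs to do (the real one is in the project source, and ffmpeg's own example does the same): pick AV_PIX_FMT_DXVA2_VLD out of the formats the decoder offers.

```cpp
extern "C" {
#include "libavcodec/avcodec.h"
}

// Sketch of the get_format callback wired up above: ffmpeg calls it with the
// list of pixel formats it can decode to; returning AV_PIX_FMT_DXVA2_VLD
// selects the dxva2 hwaccel.
static enum AVPixelFormat GetHwFormat(AVCodecContext *s, const enum AVPixelFormat *pix_fmts)
{
    for (const enum AVPixelFormat *p = pix_fmts; *p != AV_PIX_FMT_NONE; p++) {
        if (*p == AV_PIX_FMT_DXVA2_VLD)
            return *p;
    }
    return pix_fmts[0]; // dxva2 not offered: fall back to the first (software) format
}
```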
The decode-and-render code:
```cpp
if (pkt.stream_index == videoindex) {
    int got_picture = 0;
    DWORD t_start = GetTickCount();
    int bytes_used = avcodec_decode_video2(codecctx, picture, &got_picture, &pkt);
    if (got_picture) {
        if (bAccel) {
            // retrieve the data and render it at the same time
            dxva2_retrieve_data_call(codecctx, picture);
            DWORD t_end = GetTickCount();
            printf("dxva2 time using: %lu\n", t_end - t_start);
        } else {
            // non-dxva2 case
            if (img_convert_ctx && pFrameBGR && out_buffer) {
                // convert the data and render it
                sws_scale(img_convert_ctx, (const uint8_t* const*)picture->data,
                          picture->linesize, 0, codecctx->height,
                          pFrameBGR->data, pFrameBGR->linesize);
                m_D3DVidRender.Render_YUV(out_buffer, picture->width, picture->height);
                DWORD t_end = GetTickCount();
                printf("normal time using: %lu\n", t_end - t_start);
            }
        }
        count++;
    }
    av_packet_unref(&pkt);
}
```
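For completeness, the software fallback path assumes img_convert_ctx, pFrameBGR and out_buffer were set up once after the codec was opened. A hedged sketch of that setup using the ffmpeg 3.2-era API; the BGR24 target format is a guess from the variable names, and the project source is authoritative:

```cpp
extern "C" {
#include "libavcodec/avcodec.h"
#include "libswscale/swscale.h"
}

// Sketch: one-time setup for the sws_scale fallback path.
static bool init_sw_path(AVCodecContext *codecctx, SwsContext **img_convert_ctx,
                         AVFrame **pFrameBGR, uint8_t **out_buffer)
{
    *img_convert_ctx = sws_getContext(
        codecctx->width, codecctx->height, codecctx->pix_fmt,   // decoded format
        codecctx->width, codecctx->height, AV_PIX_FMT_BGR24,    // assumed target format
        SWS_BILINEAR, NULL, NULL, NULL);

    *pFrameBGR  = av_frame_alloc();
    *out_buffer = (uint8_t *)av_malloc(
        avpicture_get_size(AV_PIX_FMT_BGR24, codecctx->width, codecctx->height));
    avpicture_fill((AVPicture *)*pFrameBGR, *out_buffer, AV_PIX_FMT_BGR24,
                   codecctx->width, codecctx->height);

    return *img_convert_ctx && *pFrameBGR && *out_buffer;
}
```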
dxva2_init already configures the D3D rendering, so you only need to pass in a window handle and then call dxva2_retrieve_data_call to draw the data directly onto the window that handle refers to.
Source code: http://download.csdn.net/download/qq_33892166/9698473
The project is built with VS2013 and assumes some familiarity with both ffmpeg and D3D. Be sure to change the path of the video to be played in the code; otherwise the console does not exit cleanly and VS can hang. I only just noticed this problem, so adjust the console code yourself if needed. Commenting out the code that opens the console also runs fine; you just lose the debug output.
--------------------------------------------------------------Update 2017.3.5---------------------------------------------
This update only answers one question I get asked a lot: how do you copy the hardware-decoded data back from the video card (to system memory) without increasing CPU usage?
My advice: don't. If you need to do further processing on the hardware-decoded data, do it directly on the GPU; for many kinds of image processing, GPU parallel computation is much faster than the CPU. If you render with OpenGL, you can write shaders in GLSL; if you render with D3D, you can write shaders in HLSL; with D3D11, compute shaders can probably handle the more complex processing you need (I have not used compute shaders myself, only read a little about them, so this may not be accurate). If you can use CUDA, don't hesitate; if you can't, I recommend OpenCL. If none of those are options, my knowledge runs out and I have no good answer. The hardware-decoded frames are large, so copying them from video memory to system memory inevitably costs a lot of time and CPU; in my measurements the copy back to system memory takes far longer than the hardware decode itself. So, as far as my limited experience goes, if you absolutely must copy the data back to system memory, you might as well give up on hardware decoding altogether.