Getting Started with FFmpeg (1): Capturing Video Frames

Date: 2019-11-17


Reposted from: Getting Started with FFmpeg (1): Capturing Video Frames | www.samirchen.com

Background

To run the code in this tutorial on Mac OS, you first need to install FFmpeg. Installing it with brew is recommended:

# Install FFmpeg with brew:
brew install ffmpeg

Alternatively, you can build and install FFmpeg from source by following "Compiling FFmpeg on Mac OS".

Original tutorial: http://dranger.com/ffmpeg/tutorial01.html. The code in this article has been partially corrected.

Overview

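A media file has a few basic components. First, the file itself is called a "container", and the type of container determines how the information in the file is stored; AVI and QuickTime are examples of container formats. Next come "streams": for example, a file typically has one audio stream and one video stream. The data elements in a stream are called "frames". Each stream is encoded or decoded by a corresponding "codec" (the name comes from COder and DECoder). The codec defines how the actual data is encoded and decoded; examples of codecs are DivX and MP3. "Packets" are pieces of data read from a stream; they contain the bits that are decoded into the raw frames our application finally works with. For our purposes, each packet contains complete frames; in the case of audio, one packet may contain multiple audio frames.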

Given these basics, processing video and audio streams is quite simple:

1: Open video_stream from the file video.avi.
2: Read a packet from video_stream into frame.
3: If frame is not complete, go to step 2.
4: Process frame.
5: Go to step 2.
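
The control flow of the steps above can be sketched without FFmpeg at all. The following is a toy model only: ToyPacket and count_complete_frames are hypothetical names invented for this illustration, not FFmpeg types or functions, and "processing" a frame is reduced to counting it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a demuxed packet; this is not an FFmpeg type. */
typedef struct {
    int stream_index;     /* which stream the packet belongs to */
    int completes_frame;  /* nonzero if decoding this packet yields a full frame */
} ToyPacket;

/* Walk the packets in file order and count the complete frames that belong
 * to the chosen stream, mirroring steps 2 to 5 of the list above. */
int count_complete_frames(const ToyPacket *packets, size_t n, int wanted_stream) {
    int frames = 0;
    for (size_t i = 0; i < n; i++) {           /* step 2: read the next packet */
        if (packets[i].stream_index != wanted_stream) {
            continue;                           /* packet from another stream */
        }
        if (!packets[i].completes_frame) {
            continue;                           /* step 3: frame incomplete, read more */
        }
        frames++;                               /* step 4: "process" the frame */
    }                                           /* step 5: back to step 2 */
    return frames;
}
```

The real program later in this article has the same shape: a read loop, a check on the stream index, and a check on whether the decoder produced a complete frame.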

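Although step 4, processing the frame, can be very complex in some programs, the FFmpeg handling in this article's example is kept fairly simple: we open a media file, read its video stream, and save the video frames we obtain from the stream as PPM files.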

Let's implement it step by step.

Opening the Media File

First, let's see how to open a media file. Before using FFmpeg, you need to initialize the library.

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>
#include <libavutil/imgutils.h>
//...

int main(int argc, char *argv[]) {

    // Register all formats and codecs.
    av_register_all();

    // ...
}

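The code above registers all the file formats and codecs available in the FFmpeg libraries, so that when the library opens a media file it can find the matching format handler and codec. Note that av_register_all() only needs to be called once, which is why we call it in main. You can also register only specific formats and codecs if you need to, but normally there is no reason to do so. In FFmpeg 4.0 and later, av_register_all() is deprecated and no longer required, and it was removed in FFmpeg 5.0.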

Next we are ready to open the media file. What information in a media file is worth paying attention to?

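Whether it contains audio, video, or both.
The container format of the stream, used for demuxing.
The video codec, used to initialize the video decoder.
The audio codec, used to initialize the audio decoder.
The video resolution, frame rate, and bit rate, used for rendering the video.
The audio sample rate, bit depth, and channel count, used to initialize the audio player.
The total duration of the stream, used for display and seeking.
Other metadata such as author and date, used for display.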

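This key media information is called metadata and is usually recorded at the beginning or end of the stream. For example, the WAV format records the sample rate, channel count, bit depth, and other key information in the WAV header; MP4 stores it in the moov box; and FLV records it in onMetaData.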
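
To make the WAV case concrete, here is a minimal sketch that reads those three fields from a header held in memory. It assumes the canonical 44-byte PCM layout (the "fmt " chunk directly after "RIFF"/"WAVE"); real files can carry extra chunks, which this sketch rejects rather than handles. WavInfo and parse_wav_header are hypothetical names for this illustration, and FFmpeg's own demuxer is far more thorough.

```c
#include <stdint.h>
#include <string.h>

/* The three fields mentioned in the text. */
typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
} WavInfo;

/* WAV header fields are stored little-endian. */
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is not a canonical 44-byte PCM WAV header. */
int parse_wav_header(const uint8_t *buf, size_t len, WavInfo *out) {
    if (len < 44) return -1;
    if (memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0) return -1;
    if (memcmp(buf + 12, "fmt ", 4) != 0) return -1;
    out->channels = read_le16(buf + 22);         /* offset 22: channel count */
    out->sample_rate = read_le32(buf + 24);      /* offset 24: samples per second */
    out->bits_per_sample = read_le16(buf + 34);  /* offset 34: bit depth */
    return 0;
}
```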

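The avformat_open_input function is mainly responsible for connecting to the server and fetching the stream's header information. We use it to open the media file: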

AVFormatContext *pFormatCtx = NULL;

// Open video file.
if (avformat_open_input(&pFormatCtx, argv[1], NULL, NULL) != 0) {
    return -1; // Couldn't open file.
}

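We take the path of the file to open from the program's command-line arguments and pass it as the second argument to avformat_open_input. The function reads the file header and stores information about the file format in the AVFormatContext passed as the first argument. The third argument specifies the media file format and the fourth holds format-specific options. If NULL is passed for the third argument, libavformat detects the file format automatically; NULL for the fourth means no extra options.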

Probing and analyzing the media information is then the job of avformat_find_stream_info:

// Retrieve stream information.
if (avformat_find_stream_info(pFormatCtx, NULL) < 0) {
    return -1; // Couldn't find stream information.
}

avformat_find_stream_info fills pFormatCtx->streams with the corresponding information. There is also a handy debugging function, av_dump_format, that prints what pFormatCtx contains.

// Dump information about file onto standard error.
av_dump_format(pFormatCtx, 0, argv[1], 0);

AVFormatContext contains the following members related to media information:

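struct AVInputFormat *iformat; // Container format information
unsigned int nb_streams; // Number of streams contained in this URL
AVStream **streams; // Array of AVStream pointers; each entry describes one stream in detail
int64_t start_time; // Timestamp of the first frame, in AV_TIME_BASE units
int64_t duration; // Total duration of the stream, in AV_TIME_BASE units
int64_t bit_rate; // Total bit rate of the stream, in bps
AVDictionary *metadata; // File-level metadata as key/value strings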

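When you compare these values with the output of av_dump_format you may notice some differences. In that case, look at the implementation of av_dump_format in the FFmpeg source: it applies some processing to the values before printing them. For example, start_time is handled like this: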

if (ic->start_time != AV_NOPTS_VALUE) {
    int secs, us;
    av_log(NULL, AV_LOG_INFO, ", start: ");
    secs = ic->start_time / AV_TIME_BASE;
    us = llabs(ic->start_time % AV_TIME_BASE);
    av_log(NULL, AV_LOG_INFO, "%d.%06d", secs, (int) av_rescale(us, 1000000, AV_TIME_BASE));
}
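
The arithmetic in that snippet can be tried in isolation. The sketch below repeats the same division and remainder on a plain int64_t; TIME_BASE is a local stand-in for AV_TIME_BASE (1000000, i.e. microseconds), and SplitTime and split_start_time are hypothetical names for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define TIME_BASE 1000000  /* local stand-in for AV_TIME_BASE (units per second) */

typedef struct {
    int secs;  /* whole seconds, truncated toward zero */
    int us;    /* absolute value of the fractional part, in microseconds */
} SplitTime;

/* Split a timestamp expressed in TIME_BASE units the same way the
 * av_dump_format snippet above does. */
SplitTime split_start_time(int64_t start_time) {
    SplitTime t;
    t.secs = (int)(start_time / TIME_BASE);
    t.us = (int)llabs(start_time % TIME_BASE);
    return t;
}
```

For example, 1500000 splits into 1 second and 500000 microseconds. With this particular code, -250000 splits into 0 seconds and 250000 microseconds, so the sign is not visible in the printed value; this is one kind of difference you might notice between the raw field and the dump output.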

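So after avformat_find_stream_info has run, we can obtain the container format, total duration, and total bit rate of the media resource. In addition, pFormatCtx->streams is an array of AVStream pointers holding the information for each stream in the media resource; the array has pFormatCtx->nb_streams entries.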

Key members of the AVStream structure include:

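AVCodecContext *codec; // Encoding information for this stream
int64_t start_time; // Timestamp of the first frame, in the stream's time_base units
int64_t duration; // Duration of this stream, in the stream's time_base units
int64_t nb_frames; // Total number of frames in this stream (0 if unknown)
AVDictionary *metadata; // Stream-level metadata as key/value strings
AVRational avg_frame_rate; // Average frame rate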

The average frame rate can be read from here.
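
avg_frame_rate is an AVRational, a numerator/denominator pair, so it has to be divided out to get frames per second. libavutil provides av_q2d() for this; the sketch below shows the same idea on a local struct so it compiles without FFmpeg. Rational and rational_to_double are hypothetical names, and the zero-denominator guard is an addition of this sketch.

```c
/* Mirrors the two integer fields of AVRational (num, den). */
typedef struct {
    int num;
    int den;
} Rational;

/* Convert a rational to a double; returns 0.0 when the denominator is 0,
 * which is how an unknown frame rate commonly shows up. */
double rational_to_double(Rational r) {
    return r.den != 0 ? (double)r.num / (double)r.den : 0.0;
}
```

A frame rate stored as 30000/1001 comes out as roughly 29.97 fps, which is why the value is kept as a fraction rather than a rounded number.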

AVCodecContext records the detailed encoding information for one stream. Its key members include:

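const struct AVCodec *codec; // Detailed codec information
enum AVCodecID codec_id; // Codec type
int bit_rate; // Average bit rate
video only:
- int width, height; // Image width and height; not always present in the stream, overwritten after decoding
- enum AVPixelFormat pix_fmt; // Raw image pixel format; not always present in the stream, overwritten after decoding
audio only:
- int sample_rate; // Audio sample rate
- int channels; // Number of audio channels
- enum AVSampleFormat sample_fmt; // Audio sample format (bit depth)
- int frame_size; // Number of samples per audio frame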

As you can see, the codec type, the image width and height, and the audio parameters are all here.

With these data structures understood, we move on and walk through the streams until we find a video stream:

// Find the first video stream.
videoStream = -1;
for (i = 0; i < pFormatCtx->nb_streams; i++) {
    if(pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
        videoStream = i;
        break;
    }
}
if (videoStream == -1) {
    return -1; // Didn't find a video stream.
}

// Get a pointer to the codec context for the video stream.
pCodecCtxOrig = pFormatCtx->streams[videoStream]->codec;

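The codec-related part of the stream information is stored in the codec context, which contains all the information about the codec this stream uses. We now have a pointer to it, but we still need to find the actual codec and open it: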

// Find the decoder for the video stream.
pCodec = avcodec_find_decoder(pCodecCtxOrig->codec_id);
if (pCodec == NULL) {
    fprintf(stderr, "Unsupported codec!\n");
    return -1; // Codec not found.
}
// Copy context.
pCodecCtx = avcodec_alloc_context3(pCodec);
if (avcodec_copy_context(pCodecCtx, pCodecCtxOrig) != 0) {
    fprintf(stderr, "Couldn't copy codec context\n");
    return -1; // Error copying codec context.
}

// Open codec.
if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0) {
    return -1; // Could not open codec.
}

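Note that we must not use the AVCodecContext from the video stream directly, so we use avcodec_copy_context() to copy it into a newly allocated AVCodecContext. (In newer FFmpeg releases, AVStream.codec and avcodec_copy_context() are deprecated in favor of AVStream.codecpar and avcodec_parameters_to_context().)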

Storing the Data

Next, we need a place to store a frame from the video:

AVFrame *pFrame = NULL;

// Allocate video frame.
pFrame = av_frame_alloc();

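Since we plan to save video frames as PPM files, and PPM files store 24-bit RGB data, we need to convert each frame from its native format to RGB. FFmpeg can do this for us. Most projects need to convert the original frame to some specific format. So let's allocate an AVFrame for the converted frame: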

// Allocate an AVFrame structure.
pFrameRGB = av_frame_alloc();
if (pFrameRGB == NULL) {
    return -1;
}

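Even though we have allocated the frame, we still need a place to store the raw pixel data when we convert. We use av_image_get_buffer_size to get the required size and then allocate the memory manually.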

int numBytes;
uint8_t *buffer = NULL;

// Determine required buffer size and allocate buffer.
numBytes = av_image_get_buffer_size(AV_PIX_FMT_RGB24, pCodecCtx->width, pCodecCtx->height, 1);
buffer = (uint8_t *) av_malloc(numBytes * sizeof(uint8_t));

av_malloc is FFmpeg's wrapper around malloc that takes care of things such as memory alignment. It does not protect you from memory leaks, double frees, or other malloc problems.
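
For packed RGB24 the size returned for the buffer is easy to reason about: three bytes per pixel, with each row padded up to the requested alignment. The sketch below approximates that calculation for this one pixel format only; align_up and rgb24_buffer_size are hypothetical helper names, not FFmpeg functions, and other pixel formats (planar, subsampled) need a different computation.

```c
#include <stddef.h>

/* Round value up to the next multiple of align (align must be >= 1). */
size_t align_up(size_t value, size_t align) {
    return (value + align - 1) / align * align;
}

/* Bytes needed for a packed RGB24 image: 3 bytes per pixel,
 * with every row padded to a multiple of align bytes. */
size_t rgb24_buffer_size(size_t width, size_t height, size_t align) {
    size_t linesize = align_up(width * 3, align);
    return linesize * height;
}
```

With an alignment of 1, as passed in the code above, there is no padding, so a 640x480 frame needs 640 * 3 * 480 = 921600 bytes.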

Now we use av_image_fill_arrays to associate the frame with the memory we just allocated.

// Assign appropriate parts of buffer to image planes in pFrameRGB.
// Note that pFrameRGB is an AVFrame, but AVFrame is a superset of AVPicture.
av_image_fill_arrays(pFrameRGB->data, pFrameRGB->linesize, buffer, AV_PIX_FMT_RGB24, pCodecCtx->width, pCodecCtx->height, 1);

Now we are ready to read data from the video stream.

Reading the Data

Next we read packets from the whole video stream and decode them into our frame. Once a frame is complete, we convert its format and save it.

AVPacket packet;
int frameFinished;
struct SwsContext *sws_ctx = NULL;

// Initialize SWS context for software scaling.
sws_ctx = sws_getContext(pCodecCtx->width, pCodecCtx->height, pCodecCtx->pix_fmt, pCodecCtx->width, pCodecCtx->height, AV_PIX_FMT_RGB24, SWS_BILINEAR, NULL, NULL, NULL);

// Read frames and save first five frames to disk.
i = 0;
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    // Is this a packet from the video stream?
    if (packet.stream_index == videoStream) {
        // Decode video frame
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);

        // Did we get a video frame?
        if (frameFinished) {
            // Convert the image from its native format to RGB.
            sws_scale(sws_ctx, (uint8_t const * const *) pFrame->data, pFrame->linesize, 0, pCodecCtx->height, pFrameRGB->data, pFrameRGB->linesize);

            // Save the frame to disk.
            if (++i <= 5) {
                SaveFrame(pFrameRGB, pCodecCtx->width, pCodecCtx->height, i);
            }
        }
    }

    // Free the packet that was allocated by av_read_frame.
    av_packet_unref(&packet);
}

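The rest of the program is fairly easy to follow. av_read_frame() reads one packet from the file and stores it in the AVPacket structure. Note that we only create the packet structure; FFmpeg fills in its data, packet.data points to that data, and the memory it occupies must be released with av_packet_unref(). avcodec_decode_video2() converts the packet into a video frame. However, decoding a single packet may not yield a complete frame; several packets may be needed, so avcodec_decode_video2() sets frameFinished to true once a complete frame has been decoded. (avcodec_decode_video2() has been deprecated since FFmpeg 3.1 in favor of avcodec_send_packet() and avcodec_receive_frame().) When a complete frame is available, we use sws_scale() to convert it from its native format, pCodecCtx->pix_fmt, to RGB. Remember that you can cast an AVFrame pointer to an AVPicture pointer. Finally, we call our SaveFrame function to write the frame to a file.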
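
To give a feel for what the conversion to RGB involves, here is a single-pixel sketch assuming full-range BT.601 YUV input. This is an assumption made for illustration: the actual native format depends on the video, and swscale additionally handles chroma subsampling, limited-range video, and many other pixel formats. RgbPixel and yuv_to_rgb_fullrange are hypothetical names, not part of FFmpeg.

```c
#include <stdint.h>

typedef struct {
    uint8_t r, g, b;
} RgbPixel;

/* Round to nearest and clamp into the 0..255 range of one byte. */
static uint8_t clamp_u8(double v) {
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* Convert one full-range BT.601 YUV sample to RGB.
 * u and v are chroma values centered on 128. */
RgbPixel yuv_to_rgb_fullrange(uint8_t y, uint8_t u, uint8_t v) {
    double cb = (double)u - 128.0;
    double cr = (double)v - 128.0;
    RgbPixel p;
    p.r = clamp_u8(y + 1.402 * cr);
    p.g = clamp_u8(y - 0.344136 * cb - 0.714136 * cr);
    p.b = clamp_u8(y + 1.772 * cb);
    return p;
}
```

With neutral chroma (u = v = 128) the output is a gray whose three channels equal y, which is a quick sanity check for any such conversion.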

In the SaveFrame function, we write the RGB data to a PPM file.

void SaveFrame(AVFrame *pFrame, int width, int height, int iFrame) {
    FILE *pFile;
    char szFilename[32];
    int y;
  
    // Open file.
    sprintf(szFilename, "frame%d.ppm", iFrame);
    pFile = fopen(szFilename, "wb");
    if (pFile == NULL) {
        return;
    }
  
    // Write header.
    fprintf(pFile, "P6\n%d %d\n255\n", width, height);
  
    // Write pixel data.
    for (y = 0; y < height; y++) {
        fwrite(pFrame->data[0]+y*pFrame->linesize[0], 1, width*3, pFile);
    }
  
    // Close file.
    fclose(pFile);
}
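
The same PPM-writing logic can be exercised without FFmpeg by passing a raw RGB buffer and its row stride directly. The sketch below is a variant under that assumption: write_ppm is a hypothetical name, stride plays the role of pFrame->linesize[0], and, unlike SaveFrame, it reports write failures to the caller.

```c
#include <stdint.h>
#include <stdio.h>

/* Write packed RGB24 rows to a binary PPM (P6) file.
 * stride is the byte distance between the starts of consecutive rows and
 * may be larger than width * 3 when rows are padded.
 * Returns 0 on success, -1 on failure. */
int write_ppm(const char *path, const uint8_t *rgb, int width, int height, int stride) {
    FILE *f = fopen(path, "wb");
    if (f == NULL) {
        return -1;
    }

    /* Header: magic number, dimensions, maximum channel value. */
    fprintf(f, "P6\n%d %d\n255\n", width, height);

    /* Pixel data: write only the visible width * 3 bytes of each row. */
    size_t row_bytes = (size_t)width * 3;
    for (int y = 0; y < height; y++) {
        if (fwrite(rgb + (size_t)y * (size_t)stride, 1, row_bytes, f) != row_bytes) {
            fclose(f);
            return -1;
        }
    }

    return fclose(f) == 0 ? 0 : -1;
}
```

Writing row by row, rather than dumping the whole buffer at once, is what keeps any row padding out of the file.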

Back in main, once we have finished reading the video stream, we need to clean up:

// Free the RGB image.
av_free(buffer);
av_frame_free(&pFrameRGB);

// Free the YUV frame.
av_frame_free(&pFrame);

// Close the codecs.
avcodec_close(pCodecCtx);
avcodec_close(pCodecCtxOrig);

// Close the video file.
avformat_close_input(&pFormatCtx);

return 0;

As you can see, we use av_free() to release the memory we allocated with av_malloc().

That is all for this part of the tutorial. The complete code is available here: https://github.com/samirchen/TestFFmpeg

Compiling and Running

You can compile it with the following command:

$ gcc -o tutorial01 tutorial01.c -lavutil -lavformat -lavcodec -lswscale -lz -lm

Find a media file and run it like this:

$ ./tutorial01 myvideofile.mp4
