Parsing Audio and Video Streams with FFmpeg on iOS

Date: 2019-11-06


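Requirements

Use FFmpeg to parse (demux) an audio/video stream. The source can be a standard RTMP URL or a file. Once the audio and video streams have been parsed, they can be decoded: the video is then rendered on screen and the audio is played through the speaker.

How It Works

FFmpeg's libavformat module can extract the audio and video data from a stream through av_read_frame. If you decode with FFmpeg itself, parsing down to an AVPacket is enough, since the packet can be handed straight to the decoding module. If you use VideoToolbox hardware decoding, the video data additionally requires the parameter sets, i.e. the SPS and PPS NAL units (plus the VPS for H.265), for later use.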

Prerequisites:

Setting up the FFmpeg environment on iOS
FFmpeg basics
Audio and video fundamentals

GitHub (with code): iOS Parse

Juejin: iOS Parse

Jianshu: iOS Parse

Blog: iOS Parse

Quick Overview

Usage flow

Initialize the parser: - (instancetype)initWithPath:(NSString *)path;
Start parsing: startParseWithCompletionHandler
Get the parsed data: the block passed to startParseWithCompletionHandler in the previous step receives the parsed audio and video data.

FFmpeg parsing flow

Create the format context: avformat_alloc_context
Open the input stream: avformat_open_input
Find the stream information: avformat_find_stream_info
Get the audio and video stream indexes: formatContext->streams[i]->codecpar->codec_type == (isVideoStream ? AVMEDIA_TYPE_VIDEO : AVMEDIA_TYPE_AUDIO)
Get the audio and video streams: m_formatContext->streams[m_audioStreamIndex]
Read the audio and video packets: av_read_frame
Get the extra data: av_bitstream_filter_filter

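Detailed Steps

1. Import the FFmpeg framework into the project

The link below covers the detailed steps for building the FFmpeg environment needed on iOS. Read it first if you need to.

Compiling FFmpeg for iOS

After importing FFmpeg, rename every file that uses FFmpeg to .mm. Because the code mixes C and C++, the file extension has to be changed.

Then include the FFmpeg headers in your header file.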

// FFmpeg Header File
#ifdef __cplusplus
extern "C" {
#endif
    
#include "libavformat/avformat.h"
#include "libavcodec/avcodec.h"
#include "libavutil/avutil.h"
#include "libswscale/swscale.h"
#include "libswresample/swresample.h"
#include "libavutil/opt.h"
    
#ifdef __cplusplus
}
#endif

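Note: FFmpeg is a widely used framework with a complex structure. Its headers are normally included in the form shown above, with the library folder name as the root of the include path. For the exact project settings, see the link above.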

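2. Initialization

2.1. Register FFmpeg

void av_register_all(void); initializes libavformat and registers all muxers, demuxers and protocols. If you do not call this function, you can instead register only the specific formats you want to support. Note that av_register_all is deprecated as of FFmpeg 4.0, where explicit registration is no longer required, so the call is only needed with older versions.

It is usually called once, either in the program's main function or in the app delegate method - (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions.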

av_register_all();

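2.2. Create a format context from the video file

avformat_alloc_context(): allocates a format context.
int avformat_open_input(AVFormatContext **ps, const char *url, AVInputFormat *fmt, AVDictionary **options): opens the input and reads its header.
- fmt: if non-NULL, forces a specific input format; if NULL, the format is detected automatically.
int avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options): reads packets from the media file to obtain stream information.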

- (AVFormatContext *)createFormatContextbyFilePath:(NSString *)filePath {
    if (filePath == nil) {
        log4cplus_error(kModuleName, "%s: file path is NULL",__func__);
        return NULL;
    }
    
    AVFormatContext  *formatContext = NULL;
    AVDictionary     *opts          = NULL;
    
    av_dict_set(&opts, "timeout", "1000000", 0); // set the timeout to 1 second
    
    formatContext = avformat_alloc_context();
    BOOL isSuccess = avformat_open_input(&formatContext, [filePath cStringUsingEncoding:NSUTF8StringEncoding], NULL, &opts) < 0 ? NO : YES;
    av_dict_free(&opts);
    if (!isSuccess) {
        if (formatContext) {
            avformat_free_context(formatContext);
        }
        return NULL;
    }
    
    if (avformat_find_stream_info(formatContext, NULL) < 0) {
        avformat_close_input(&formatContext);
        return NULL;
    }
    
    return formatContext;
}

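2.3. Get the audio / video stream index

Iterate over the streams array of the format context (nb_streams holds the number of entries) to find the index of the audio or video stream for later use.

Note: in the code that follows, the audio and video indexes are all that is needed to look up the corresponding stream's information in the format context.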

- (int)getAVStreamIndexWithFormatContext:(AVFormatContext *)formatContext isVideoStream:(BOOL)isVideoStream {
    int avStreamIndex = -1;
    for (int i = 0; i < formatContext->nb_streams; i++) {
        if ((isVideoStream ? AVMEDIA_TYPE_VIDEO : AVMEDIA_TYPE_AUDIO) == formatContext->streams[i]->codecpar->codec_type) {
            avStreamIndex = i;
        }
    }
    
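    if (avStreamIndex == -1) {
        log4cplus_error(kModuleName, "%s: %s stream not found", __func__, isVideoStream ? "video" : "audio");
        return -1; // the return type is int, so report failure with -1 rather than NULL
    }else {
        return avStreamIndex;
    }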
}

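
The lookup above can be tried without linking FFmpeg. The following is a minimal sketch, assuming hypothetical stand-in types (MockFormatContext, MockStream, MediaType) that mirror only the nb_streams / streams layout. Unlike the method above, it stops at the first matching stream, and it returns -1 when nothing matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins that mirror only the fields used here; the real
 * types are AVFormatContext, AVStream and AVCodecParameters. */
enum MediaType { MEDIA_TYPE_VIDEO = 0, MEDIA_TYPE_AUDIO = 1 };

struct MockStream {
    enum MediaType codec_type;
};

struct MockFormatContext {
    unsigned int        nb_streams; /* number of entries in streams[] */
    struct MockStream **streams;    /* array of stream pointers       */
};

/* Returns the index of the first stream of the wanted type, or -1. */
static int find_stream_index(const struct MockFormatContext *ctx,
                             enum MediaType wanted)
{
    assert(ctx != NULL && ctx->streams != NULL);
    for (unsigned int i = 0; i < ctx->nb_streams; i++) {
        if (ctx->streams[i] != NULL && ctx->streams[i]->codec_type == wanted) {
            return (int)i; /* stop at the first match */
        }
    }
    return -1; /* callers must check this before indexing streams[] */
}
```

Callers should treat a negative result as "no such stream" and skip that media type instead of indexing streams[] with it.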

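2.4. Check whether the audio and video streams are supported

Currently only H.264 and H.265 encoded video is supported. In practice, the rotation angle of the decoded video can differ from file to file, and the formats that can be decoded differ between device models, so this method can be used to filter out unsupported cases by hand. See the downloadable code for details. Only the combinations verified in real-world testing are listed here.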

/*
         Highest resolution and FPS combinations supported per device:
         
         iPhone 6S: 60fps -> 720P
         30fps -> 4K
         
         iPhone 7P: 60fps -> 1080p
         30fps -> 4K
         
         iPhone 8: 60fps -> 1080p
         30fps -> 4K
         
         iPhone 8P: 60fps -> 1080p
         30fps -> 4K
         
         iPhone X: 60fps -> 1080p
         30fps -> 4K
         
         iPhone XS: 60fps -> 1080p
         30fps -> 4K
         */
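
One way to turn the table above into a filter is sketched below. This is only an illustration: the names (VideoCodec, DeviceLimit, is_video_supported) are hypothetical and not taken from the project, and treating anything above 60 fps as unsupported is an assumption, since the table does not cover it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not part of the original project. */
enum VideoCodec { CODEC_H264, CODEC_H265, CODEC_OTHER };

struct DeviceLimit {
    int max_height_at_30fps; /* e.g. 2160 for 4K              */
    int max_height_at_60fps; /* e.g. 1080, or 720 on iPhone 6S */
};

/* Returns true when the codec is H.264/H.265 and the frame height fits the
 * device limit for the given frame rate. */
static bool is_video_supported(enum VideoCodec codec, int height, int fps,
                               struct DeviceLimit limit)
{
    assert(height > 0 && fps > 0);
    if (codec != CODEC_H264 && codec != CODEC_H265) {
        return false;
    }
    if (fps <= 30) {
        return height <= limit.max_height_at_30fps;
    }
    if (fps <= 60) {
        return height <= limit.max_height_at_60fps;
    }
    return false; /* assumption: above 60 fps is not covered by the table */
}
```

With the iPhone 6S row encoded as { 2160, 720 }, a 1080p stream at 60 fps is rejected while a 4K stream at 30 fps is accepted.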

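In this example, audio is supported in AAC format only. Other formats can be added as needed.

3. Start parsing

Initialize an AVPacket to hold the parsed data

The AVPacket structure stores compressed data. For video it usually contains one compressed frame; for audio it may contain several. Its memory is allocated with av_malloc(), it is copied by reference with av_packet_ref(), and it is released with av_packet_unref().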

AVPacket    packet;
av_init_packet(&packet);


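Parse the data

int av_read_frame(AVFormatContext *s, AVPacket *pkt); returns what is stored in the file and does not validate that it contains valid frames for the decoder. It splits what is stored in the file into frames and returns one per call. It does not omit invalid data between valid frames, so that the decoder gets the maximum information possible for decoding. It returns 0 on success and a negative value on error or end of file.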

int ret = av_read_frame(formatContext, &packet); // 0 on success, < 0 on error or end of file
if (ret < 0 || packet.size < 0) {
    handler(YES, YES, NULL, NULL);
    log4cplus_error(kModuleName, "%s: Parse finish",__func__);
    break;
}

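Get the SPS, PPS and other parameter-set information

Calling av_bitstream_filter_filter filters the bitstream so that the SPS, PPS and related parameter sets can be obtained.

av_bitstream_filter_init: creates and initializes a bitstream filter context for the given bitstream filter name.

av_bitstream_filter_filter: filters the buf_size bytes in the buf parameter and places the filtered data in the buffer pointed to by poutbuf. The output buffer must be freed by the caller. This API is marked deprecated; newer FFmpeg versions provide the av_bsf_* functions instead.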

attribute_deprecated int av_bitstream_filter_filter(
    AVBitStreamFilterContext *bsfc,
    AVCodecContext *avctx,
    const char *args,       // filter configuration arguments
    uint8_t **poutbuf,      // filtered data
    int *poutbuf_size,      // size of the filtered data
    const uint8_t *buf,     // original data passed to the filter
    int buf_size,           // size of the original data
    int keyframe            // set to non-zero if the buffer to filter corresponds to a key-frame packet
)

Note: new_packet is used below to work around the memory leak that av_bitstream_filter_filter would otherwise cause. Simply free new_packet's data after each use.

if (packet.stream_index == videoStreamIndex) {
    static char filter_name[32];
    if (formatContext->streams[videoStreamIndex]->codecpar->codec_id == AV_CODEC_ID_H264) {
        strncpy(filter_name, "h264_mp4toannexb", 32);
        videoInfo.videoFormat = XDXH264EncodeFormat;
    } else if (formatContext->streams[videoStreamIndex]->codecpar->codec_id == AV_CODEC_ID_HEVC) {
        strncpy(filter_name, "hevc_mp4toannexb", 32);
        videoInfo.videoFormat = XDXH265EncodeFormat;
    } else {
        break;
    }
    
    AVPacket new_packet = packet;
    if (self->m_bitFilterContext == NULL) {
        self->m_bitFilterContext = av_bitstream_filter_init(filter_name);
    }
    av_bitstream_filter_filter(self->m_bitFilterContext, formatContext->streams[videoStreamIndex]->codec, NULL, &new_packet.data, &new_packet.size, packet.data, packet.size, 0);
    
}

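
To make the "mp4toannexb" step less opaque, here is a minimal sketch of the core transformation, written without FFmpeg. It assumes the input uses 4-byte big-endian length prefixes, as is common for H.264 in MP4, and rewrites each prefix as the Annex B start code 00 00 00 01. The function name avcc_to_annexb is hypothetical. The real filter does more than this (for example, it can also prepend the parameter sets from the extradata to key frames), so treat this as an illustration of the idea only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts length-prefixed NAL units (4-byte big-endian length) into Annex B
 * form (each NAL unit preceded by 00 00 00 01). Returns the number of bytes
 * written to out, or -1 on malformed input or insufficient output space.
 * Both prefixes are 4 bytes long, so the output is as large as the input. */
static long avcc_to_annexb(const uint8_t *in, size_t in_size,
                           uint8_t *out, size_t out_capacity)
{
    static const uint8_t start_code[4] = { 0x00, 0x00, 0x00, 0x01 };
    size_t pos = 0;
    size_t written = 0;

    assert(in != NULL && out != NULL);
    while (pos < in_size) {
        if (in_size - pos < 4) {
            return -1; /* truncated length field */
        }
        size_t nal_size = ((size_t)in[pos] << 24) | ((size_t)in[pos + 1] << 16) |
                          ((size_t)in[pos + 2] << 8) | (size_t)in[pos + 3];
        pos += 4;
        if (nal_size == 0 || nal_size > in_size - pos) {
            return -1; /* empty or truncated payload */
        }
        if (out_capacity - written < 4 + nal_size) {
            return -1; /* output buffer too small */
        }
        memcpy(out + written, start_code, 4);
        memcpy(out + written + 4, in + pos, nal_size);
        written += 4 + nal_size;
        pos += nal_size;
    }
    return (long)written;
}
```

An output buffer of the same size as the input is always sufficient here, because each 4-byte length prefix is replaced by a 4-byte start code.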

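Generate timestamps according to a specific rule

You can define your own timestamp rule as needed. Here the timestamp is the current system timestamp plus the pts/dts carried in the packet.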

CMSampleTimingInfo timingInfo;
CMTime presentationTimeStamp     = kCMTimeInvalid;
presentationTimeStamp            = CMTimeMakeWithSeconds(current_timestamp + packet.pts * av_q2d(formatContext->streams[videoStreamIndex]->time_base), fps);
timingInfo.presentationTimeStamp = presentationTimeStamp;
timingInfo.decodeTimeStamp       = CMTimeMakeWithSeconds(current_timestamp + av_rescale_q(packet.dts, formatContext->streams[videoStreamIndex]->time_base, input_base), fps);
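
The time-base arithmetic behind these calls is easy to get wrong, so here is a small sketch of it in plain C. Rational is a hypothetical stand-in for AVRational. ticks_to_seconds performs the same calculation as pts * av_q2d(time_base), and rescale_ticks is a simplified version of what av_rescale_q does: it truncates instead of rounding and has no overflow protection. Note that a rescaled value is a tick count in the target time base, not a number of seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for AVRational: the fraction num/den. */
struct Rational {
    int num;
    int den;
};

/* Ticks -> seconds, the same arithmetic as pts * av_q2d(time_base). */
static double ticks_to_seconds(int64_t ticks, struct Rational time_base)
{
    assert(time_base.den != 0);
    return (double)ticks * ((double)time_base.num / (double)time_base.den);
}

/* Ticks in one time base -> ticks in another, using truncating integer
 * division. Simplified: no rounding and no overflow protection. */
static int64_t rescale_ticks(int64_t ticks, struct Rational from,
                             struct Rational to)
{
    assert(from.den != 0 && to.num != 0);
    return ticks * from.num * to.den / ((int64_t)from.den * to.num);
}
```

For example, with a 1/90000 time base a pts of 45000 corresponds to 0.5 seconds, and rescaling 90000 ticks to a 1/1000000 time base gives 1000000 microsecond ticks.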

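Get the parsed video data

In this example the parsed data is stored in a custom struct and passed back to the caller through a block callback, where the caller can process the parsed video data.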

struct XDXParseVideoDataInfo {
    uint8_t                 *data;
    int                     dataSize;
    uint8_t                 *extraData;
    int                     extraDataSize;
    Float64                 pts;
    Float64                 time_base;
    int                     videoRotate;
    int                     fps;
    CMSampleTimingInfo      timingInfo;
    XDXVideoEncodeFormat    videoFormat;
};

...

    videoInfo.data          = video_data;
    videoInfo.dataSize      = video_size;
    videoInfo.extraDataSize = formatContext->streams[videoStreamIndex]->codec->extradata_size;
    videoInfo.extraData     = (uint8_t *)malloc(videoInfo.extraDataSize);
    videoInfo.timingInfo    = timingInfo;
    videoInfo.pts           = packet.pts * av_q2d(formatContext->streams[videoStreamIndex]->time_base);
    videoInfo.fps           = fps;
    
    memcpy(videoInfo.extraData, formatContext->streams[videoStreamIndex]->codec->extradata, videoInfo.extraDataSize);
    av_free(new_packet.data);
    
    // send videoInfo
    if (handler) {
        handler(YES, NO, &videoInfo, NULL);
    }
    
    free(videoInfo.extraData);
    free(videoInfo.data);
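
A consumer of extraData typically has to pull the individual parameter sets out of it. The sketch below assumes the extradata is in Annex B form, i.e. NAL units separated by 00 00 01 or 00 00 00 01 start codes. That is an assumption about the filter's output; if the buffer is still in avcC form, this scan does not apply. The function names are hypothetical. For H.264 the NAL unit type is the low 5 bits of the first byte (7 = SPS, 8 = PPS). For H.265 the type would instead be (byte >> 1) & 0x3F, with 32 = VPS, 33 = SPS and 34 = PPS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the next 00 00 01 sequence at or after `from`,
 * or `size` if there is none. */
static size_t find_start_code(const uint8_t *buf, size_t size, size_t from)
{
    for (size_t i = from; i + 3 <= size; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            return i;
        }
    }
    return size;
}

/* Locates the first H.264 NAL unit of the given type (7 = SPS, 8 = PPS) in
 * an Annex B buffer. On success stores the unit's offset and length
 * (start code excluded) and returns 1; otherwise returns 0. */
static int find_h264_nal(const uint8_t *buf, size_t size, int wanted_type,
                         size_t *offset, size_t *length)
{
    assert(buf != NULL && offset != NULL && length != NULL);
    size_t sc = find_start_code(buf, size, 0);
    while (sc < size) {
        size_t start = sc + 3; /* first byte of the NAL unit */
        size_t next_sc = find_start_code(buf, size, start);
        size_t end = next_sc;  /* equals size for the last unit */
        /* Trailing zero bytes are padding or the leading zero of a
         * 4-byte start code, so they are not part of this unit. */
        while (end > start && buf[end - 1] == 0) {
            end--;
        }
        if (start < end && (buf[start] & 0x1F) == wanted_type) {
            *offset = start;
            *length = end - start;
            return 1;
        }
        sc = next_sc;
    }
    return 0;
}
```

The resulting offset and length pairs are the kind of input a decoder setup step needs, for example when building a VideoToolbox format description from the SPS and PPS.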

Get the parsed audio data

struct XDXParseAudioDataInfo {
    uint8_t     *data;
    int         dataSize;
    int         channel;
    int         sampleRate;
    Float64     pts;
};

...

    if (packet.stream_index == audioStreamIndex) {
        XDXParseAudioDataInfo audioInfo = {0};
        audioInfo.data = (uint8_t *)malloc(packet.size);
        memcpy(audioInfo.data, packet.data, packet.size);
        audioInfo.dataSize = packet.size;
        audioInfo.channel = formatContext->streams[audioStreamIndex]->codecpar->channels;
        audioInfo.sampleRate = formatContext->streams[audioStreamIndex]->codecpar->sample_rate;
        audioInfo.pts = packet.pts * av_q2d(formatContext->streams[audioStreamIndex]->time_base);
        
        // send audio info
        if (handler) {
            handler(NO, NO, NULL, &audioInfo);
        }
        
        free(audioInfo.data);
    }

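Release the packet

Because the key data in the packet has already been copied into the custom struct, the packet must be released once it is no longer needed.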

av_packet_unref(&packet);

Release resources after parsing finishes

- (void)freeAllResources {
    if (m_formatContext) {
        avformat_close_input(&m_formatContext);
        m_formatContext = NULL;
    }
    
    if (m_bitFilterContext) {
        av_bitstream_filter_close(m_bitFilterContext);
        m_bitFilterContext = NULL;
    }
}

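Note: if you decode with FFmpeg, the AVPacket itself is all you need. There is no need to repackage the data into a custom struct.

4. Calling from outside

Once the steps above are complete, the parsed data is delivered through the block below. The next step is usually to decode the audio and video, which later articles will cover.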

XDXAVParseHandler *parseHandler = [[XDXAVParseHandler alloc] initWithPath:path];
    [parseHandler startParseGetAVPackeWithCompletionHandler:^(BOOL isVideoFrame, BOOL isFinish, AVPacket packet) {
        if (isFinish) {
            // parse finish
            ...
            return;
        }
        
        if (isVideoFrame) {
            // decode video
            ...
        }else {
            // decode audio
            ...
        }
    }];