As is well known, raw audio and video data cannot be transmitted over the network directly. Pushing a stream requires the encoded audio and video to be muxed into a container stream, such as FLV, MOV, or ASF, assembled in whatever format the receiver expects. Taking an ASF stream as the example, this article walks through a complete streaming pipeline: capturing and encoding audio and video, synchronizing them, and muxing them into an ASF stream that can then be transmitted. For convenience, this demo writes the muxed stream into an .asf file for testing.
Note: to test, play the file recorded by the demo with ffplay from a terminal. ASF is a format supported on Windows, so the built-in macOS player cannot play it.
On iOS we can capture video and audio frames through the low-level APIs. Video frames come from AVCaptureSession in the AVFoundation framework. It can capture audio as well, but since we want the lowest latency and highest audio quality, we instead use the lowest-level audio capture framework, Audio Unit. A VTCompressionSessionRef from the VideoToolbox framework encodes the video data, and AudioConverter encodes the audio data. At capture time we take the system time at which the first I-frame is produced as the origin of the audio and video timestamps; every later video timestamp is relative to it, and the audio/video synchronization scheme can be built on that basis. Finally, we feed the encoded audio and video into FFmpeg to mux them, here into an ASF stream, and write the generated ASF stream to a file for testing. The generated ASF stream can be used directly for network transmission.
Capturing video

- Create an AVCaptureSession and set the resolution through its sessionPreset/activeFormat.
- Pin the frame rate with setActiveVideoMinFrameDuration/setActiveVideoMaxFrameDuration on the AVCaptureDevice.
- Choose the output pixel format via kCVPixelBufferPixelFormatTypeKey.
- Receive frames by registering - (void)setSampleBufferDelegate:(nullable id<AVCaptureVideoDataOutputSampleBufferDelegate>)sampleBufferDelegate queue:(nullable dispatch_queue_t)sampleBufferCallbackQueue.
- Preview with AVCaptureVideoPreviewLayer; each captured frame arrives as a CMSampleBufferRef. A minimal configuration sketch follows this list.
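For orientation, here is a minimal capture-session sketch based on the APIs above. The 720p preset, 30 fps rate, queue name, and the assumption that self adopts AVCaptureVideoDataOutputSampleBufferDelegate are illustrative, not taken from the demo project.

#import <AVFoundation/AVFoundation.h>

- (AVCaptureSession *)buildCaptureSession {
    AVCaptureSession *session = [[AVCaptureSession alloc] init];
    session.sessionPreset = AVCaptureSessionPreset1280x720;

    AVCaptureDevice *camera = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
    NSError *error = nil;
    AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:camera error:&error];
    if (input && [session canAddInput:input]) {
        [session addInput:input];
    }

    // Lock the device to pin the frame rate (30 fps assumed here).
    if ([camera lockForConfiguration:&error]) {
        camera.activeVideoMinFrameDuration = CMTimeMake(1, 30);
        camera.activeVideoMaxFrameDuration = CMTimeMake(1, 30);
        [camera unlockForConfiguration];
    }

    AVCaptureVideoDataOutput *output = [[AVCaptureVideoDataOutput alloc] init];
    // NV12 is a format VTCompressionSessionRef consumes directly.
    output.videoSettings = @{(id)kCVPixelBufferPixelFormatTypeKey :
                                 @(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)};
    [output setSampleBufferDelegate:self
                              queue:dispatch_queue_create("capture.queue", DISPATCH_QUEUE_SERIAL)];
    if ([session canAddOutput:output]) {
        [session addOutput:output];
    }
    [session startRunning];
    return session;
}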
Capturing audio

- Request a small IO buffer with setPreferredIOBufferDuration to keep latency low.
- Create the audio unit with AudioComponentInstanceNew.
- Disable internal buffer allocation via kAudioUnitProperty_ShouldAllocateBuffer so we can supply our own buffers.
- Start capturing with AudioOutputUnitStart and pull PCM inside the input callback with AudioUnitRender. A sketch follows this list.
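A minimal Audio Unit capture sketch following the calls listed above; error handling is elided, and the 0.01 s buffer duration and RemoteIO subtype are assumed values, not taken from the demo.

#import <AudioToolbox/AudioToolbox.h>
#import <AVFoundation/AVFoundation.h>

static AudioUnit BuildCaptureAudioUnit(void) {
    // Ask the audio session for a small IO buffer to keep latency low.
    [[AVAudioSession sharedInstance] setPreferredIOBufferDuration:0.01 error:nil];

    AudioComponentDescription desc = {
        .componentType         = kAudioUnitType_Output,
        .componentSubType      = kAudioUnitSubType_RemoteIO,
        .componentManufacturer = kAudioUnitManufacturer_Apple,
    };
    AudioComponent component = AudioComponentFindNext(NULL, &desc);
    AudioUnit unit = NULL;
    AudioComponentInstanceNew(component, &unit);

    // Enable input on bus 1 (the microphone) of the remote IO unit.
    UInt32 enable = 1;
    AudioUnitSetProperty(unit, kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Input, 1, &enable, sizeof(enable));

    // We supply our own buffers to AudioUnitRender, so disable allocation.
    UInt32 allocate = 0;
    AudioUnitSetProperty(unit, kAudioUnitProperty_ShouldAllocateBuffer,
                         kAudioUnitScope_Output, 1, &allocate, sizeof(allocate));

    AudioUnitInitialize(unit);
    AudioOutputUnitStart(unit);
    // In the input callback, pull PCM with:
    //   AudioUnitRender(unit, ioActionFlags, inTimeStamp, 1, inNumberFrames, &bufferList);
    return unit;
}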
Encoding video

- Create the encoder with VTCompressionSessionCreate.
- Call VTCompressionSessionPrepareToEncodeFrames, then submit each frame with VTCompressionSessionEncodeFrame.
- Encoded NALUs come back inside a CMBlockBufferRef. A sketch follows this list.
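A compressed sketch of that VideoToolbox flow; the callback name, 720p dimensions, and real-time property are illustrative assumptions, not the demo's exact configuration.

#import <VideoToolbox/VideoToolbox.h>

static void OnEncoded(void *outputCallbackRefCon, void *sourceFrameRefCon,
                      OSStatus status, VTEncodeInfoFlags infoFlags,
                      CMSampleBufferRef sampleBuffer) {
    if (status != noErr || sampleBuffer == NULL) return;
    // Encoded NALUs arrive wrapped in a CMBlockBufferRef.
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    // Copy the H.264 data out of blockBuffer, convert AVCC -> Annex B, then mux.
    (void)blockBuffer;
}

static VTCompressionSessionRef BuildEncoder(void) {
    VTCompressionSessionRef session = NULL;
    VTCompressionSessionCreate(kCFAllocatorDefault, 1280, 720,
                               kCMVideoCodecType_H264, NULL, NULL, NULL,
                               OnEncoded, NULL, &session);
    VTSessionSetProperty(session, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
    VTCompressionSessionPrepareToEncodeFrames(session);
    return session;
}

// For every captured frame:
//   VTCompressionSessionEncodeFrame(session,
//       CMSampleBufferGetImageBuffer(sampleBuffer),
//       pts, kCMTimeInvalid, NULL, NULL, NULL);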
Encoding audio

- Select the AAC codec via kAudioEncoderComponentType and create the converter with AudioConverterNewSpecific.
- Encode PCM into AAC with AudioConverterFillComplexBuffer. A sketch follows this list.
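A sketch of creating the AAC converter; the 44.1 kHz mono 16-bit PCM input format and the choice of the software codec are assumptions for illustration.

#import <AudioToolbox/AudioToolbox.h>

static AudioConverterRef BuildAACEncoder(void) {
    AudioStreamBasicDescription inFormat = {0};
    inFormat.mSampleRate       = 44100;
    inFormat.mFormatID         = kAudioFormatLinearPCM;
    inFormat.mFormatFlags      = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
    inFormat.mBytesPerPacket   = 2;
    inFormat.mFramesPerPacket  = 1;
    inFormat.mBytesPerFrame    = 2;
    inFormat.mChannelsPerFrame = 1;
    inFormat.mBitsPerChannel   = 16;

    AudioStreamBasicDescription outFormat = {0};
    outFormat.mSampleRate       = 44100;
    outFormat.mFormatID         = kAudioFormatMPEG4AAC;
    outFormat.mFramesPerPacket  = 1024;   // AAC packs 1024 samples per packet
    outFormat.mChannelsPerFrame = 1;

    // Pick a specific codec explicitly via kAudioEncoderComponentType.
    AudioClassDescription codecDesc = {
        .mType         = kAudioEncoderComponentType,
        .mSubType      = kAudioFormatMPEG4AAC,
        .mManufacturer = kAppleSoftwareAudioCodecManufacturer,
    };
    AudioConverterRef converter = NULL;
    AudioConverterNewSpecific(&inFormat, &outFormat, 1, &codecDesc, &converter);
    // Feed PCM and pull AAC with AudioConverterFillComplexBuffer(converter,
    //   inputDataProc, userData, &ioOutputDataPacketSize, &outBufferList, packetDesc);
    return converter;
}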
Audio/video synchronization

Take the system time of the first encoded video frame as the base timestamp for both audio and video; from then on, subtract that base from each captured frame's timestamp to get its presentation timestamp. There are two synchronization strategies. The first treats the audio timestamp as the master: when drift occurs, the video timestamps chase the audio, which shows up as the picture briefly fast-forwarding or rewinding. The second treats the video timestamp as the master: the audio chases the video, which can make the sound grate on the ear, so it is not recommended. We therefore generally use the first strategy: estimate the timestamp of the next video frame and, if it falls outside the allowed synchronization range, resynchronize (a sketch follows).
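A minimal sketch of strategy one in C: audio is the master clock, and video is resynchronized when its predicted next timestamp drifts too far. All values are in milliseconds; g_baseTime, the 100 ms tolerance, and the function names are illustrative, not the demo's actual implementation.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static int64_t g_baseTime = -1;   // system time of the first encoded I-frame

// Rebase a captured system time onto the shared timeline.
static int64_t RebasedTimestamp(int64_t systemTimeMs) {
    if (g_baseTime < 0) g_baseTime = systemTimeMs;   // first I-frame fixes the origin
    return systemTimeMs - g_baseTime;
}

// Predict the next video timestamp; if it lands outside the allowed drift
// from the audio clock, the caller should snap video back to audio.
static bool NeedResync(int64_t lastVideoPtsMs, int64_t frameDurationMs, int64_t audioPtsMs) {
    const int64_t kMaxDriftMs = 100;   // assumed tolerance
    return llabs((lastVideoPtsMs + frameDurationMs) - audioPtsMs) > kMaxDriftMs;
}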
Muxing the stream with FFmpeg

- AVFormatContext: created with avformat_alloc_context.
- AVCodec: looked up with avcodec_find_encoder (video: AV_CODEC_ID_H264/AV_CODEC_ID_HEVC, audio: AV_CODEC_ID_AAC).
- AVStream: created with avformat_new_stream.
- Set video_codec_id and audio_codec_id on the output context.
- Write the header with avformat_write_header, then mux each packet with av_write_frame.
- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    [self configureCamera];
    [self configureAudioCapture];
    [self configureAudioEncoder];
    [self configureVideoEncoder];
    [self configureAVMuxHandler];
    [self configureAVRecorder];
}
- (void)xdxCaptureOutput:(AVCaptureOutput *)output didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    if ([output isKindOfClass:[AVCaptureVideoDataOutput class]]) {
        if (self.videoEncoder) {
            [self.videoEncoder startEncodeDataWithBuffer:sampleBuffer
                                        isNeedFreeBuffer:NO];
        }
    }
}
#pragma mark Video Encoder
- (void)receiveVideoEncoderData:(XDXVideEncoderDataRef)dataRef {
    [self.muxHandler addVideoData:dataRef->data
                             size:(int)dataRef->size
                        timestamp:dataRef->timestamp
                       isKeyFrame:dataRef->isKeyFrame
                      isExtraData:dataRef->isExtraData
                      videoFormat:XDXMuxVideoFormatH264];
}
#pragma mark Audio Capture and Audio Encode
- (void)receiveAudioDataByDevice:(XDXCaptureAudioDataRef)audioDataRef {
    [self.audioEncoder encodeAudioWithSourceBuffer:audioDataRef->data
                                  sourceBufferSize:audioDataRef->size
                                               pts:audioDataRef->pts
                                   completeHandler:^(XDXAudioEncderDataRef dataRef) {
        if (dataRef->size > 10) {
            [self.muxHandler addAudioData:(uint8_t *)dataRef->data
                                     size:dataRef->size
                               channelNum:1
                               sampleRate:44100
                                timestamp:dataRef->pts];
        }
        free(dataRef->data);
    }];
}
#pragma mark Mux
- (IBAction)startRecordBtnDidClicked:(id)sender {
    int size = 0;
    char *data = (char *)[self.muxHandler getAVStreamHeadWithSize:&size];
    [self.recorder startRecordWithIsHead:YES data:data size:size];
    self.isRecording = YES;
}

- (void)receiveAVStreamWithIsHead:(BOOL)isHead data:(uint8_t *)data size:(int)size {
    if (isHead) {
        return;
    }

    if (self.isRecording) {
        [self.recorder startRecordWithIsHead:NO data:(char *)data size:size];
    }
}
- (void)configureFFmpegWithFormat:(const char *)format {
    if (m_outputContext != NULL) {
        // Use avformat_free_context rather than av_free: the context owns its streams.
        avformat_free_context(m_outputContext);
        m_outputContext = NULL;
    }

    m_outputContext = avformat_alloc_context();
    m_outputFormat  = av_guess_format(format, NULL, NULL);

    m_outputContext->oformat    = m_outputFormat;
    m_outputFormat->audio_codec = AV_CODEC_ID_NONE;
    m_outputFormat->video_codec = AV_CODEC_ID_NONE;
    m_outputContext->nb_streams = 0;

    m_video_stream = avformat_new_stream(m_outputContext, NULL);
    m_video_stream->id = 0;
    m_audio_stream = avformat_new_stream(m_outputContext, NULL);
    m_audio_stream->id = 1;

    log4cplus_info(kModuleName, "configure ffmpeg finish.");
}
Fill in the details of the encoded video stream: codec type, configuration, raw pixel format, width and height, bit rate, frame rate, time base, extra data, and so on.
The most important item here is the extra data, because the correct header can only be generated from it. An ASF stream needs Annex B formatted data, while Apple's encoder emits AVCC format, so the encoder module has already converted it to Annex B and passes it in as a parameter, ready to use here (a conversion sketch follows below). The difference between the two formats is covered in the bitstream articles listed in the reading prerequisites.
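For reference, a minimal sketch of that AVCC-to-Annex B rewrite (the demo performs it inside its encoder module, not shown here). It assumes well-formed AVCC data with 4-byte big-endian length prefixes, which is what Apple's encoder emits by default; since the start code is also 4 bytes, the rewrite can happen in place.

#include <stdint.h>
#include <string.h>

static void AVCCToAnnexB(uint8_t *data, size_t size) {
    size_t offset = 0;
    while (offset + 4 <= size) {
        // Read the 4-byte big-endian NALU length.
        uint32_t naluLength = ((uint32_t)data[offset]     << 24) |
                              ((uint32_t)data[offset + 1] << 16) |
                              ((uint32_t)data[offset + 2] <<  8) |
                               (uint32_t)data[offset + 3];
        // Overwrite the length field with the Annex B start code.
        static const uint8_t startCode[4] = {0x00, 0x00, 0x00, 0x01};
        memcpy(data + offset, startCode, 4);
        offset += 4 + naluLength;
    }
}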
- (void)configureVideoStreamWithVideoFormat:(XDXMuxVideoFormat)videoFormat extraData:(uint8_t *)extraData extraDataSize:(int)extraDataSize {
    if (m_outputContext == NULL) {
        log4cplus_error(kModuleName, "%s: m_outputContext is null",__func__);
        return;
    }

    if (m_outputFormat == NULL) {
        log4cplus_error(kModuleName, "%s: m_outputFormat is null",__func__);
        return;
    }

    // A throwaway context is used only to create a fully initialized stream,
    // which is then shallow-copied into m_video_stream below.
    AVFormatContext *formatContext = avformat_alloc_context();
    AVStream *stream = NULL;
    if (XDXMuxVideoFormatH264 == videoFormat) {
        AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
        stream = avformat_new_stream(formatContext, codec);
        stream->codecpar->codec_id = AV_CODEC_ID_H264;
    } else if (XDXMuxVideoFormatH265 == videoFormat) {
        AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_HEVC);
        stream = avformat_new_stream(formatContext, codec);
        stream->codecpar->codec_tag = MKTAG('h', 'e', 'v', 'c');
        stream->codecpar->profile   = FF_PROFILE_HEVC_MAIN;
        stream->codecpar->codec_id  = AV_CODEC_ID_HEVC;
    }

    stream->codecpar->format     = AV_PIX_FMT_YUVJ420P;
    stream->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
    stream->codecpar->width      = 1280;
    stream->codecpar->height     = 720;
    stream->codecpar->bit_rate   = 1024*1024;
    stream->time_base            = (AVRational){1, 1000};
    stream->codec->flags        |= AV_CODEC_FLAG_GLOBAL_HEADER;

    memcpy(m_video_stream, stream, sizeof(AVStream));

    if (extraData) {
        // FFmpeg expects extradata to be av_malloc'd with padding.
        int newExtraDataSize = extraDataSize + AV_INPUT_BUFFER_PADDING_SIZE;
        m_video_stream->codecpar->extradata_size = extraDataSize;
        m_video_stream->codecpar->extradata      = (uint8_t *)av_mallocz(newExtraDataSize);
        memcpy(m_video_stream->codecpar->extradata, extraData, extraDataSize);
    }
    av_free(stream);

    m_outputContext->video_codec_id = m_video_stream->codecpar->codec_id;
    m_outputFormat->video_codec     = m_video_stream->codecpar->codec_id;

    self.isReadyForVideo = YES;

    [self productStreamHead];
}
First create the encoder and the stream object according to the audio codec type, then fill in the stream's details: compressed data format, sample rate, channel count, bit rate, extra data, and so on. Note that the extra data exists so that a player can decode the audio correctly when the stream is saved to an MP4 file; for background, see these articles: audio extra data1, audio extra data2.
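The per-sample-rate branches in the code below can also be derived directly from the AudioSpecificConfig layout. A small helper sketch, assuming AAC-LC and a sampling-frequency index below 15 (so no explicit 24-bit sample rate is embedded); the helper name is illustrative:

static void MakeAACExtraData(uint8_t asc[2], int freqIndex, int channelNum) {
    const int aot = 2;   // AOT_LC
    // 5 bits object type | 4 bits frequency index | 4 bits channel config | 3 reserved bits
    asc[0] = (uint8_t)((aot << 3) | (freqIndex >> 1));
    asc[1] = (uint8_t)(((freqIndex & 0x01) << 7) | (channelNum << 3));
}
// e.g. freqIndex 4 (44.1 kHz), 1 channel -> 0x12 0x08, matching the table below.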
- (void)configureAudioStreamWithChannelNum:(int)channelNum sampleRate:(int)sampleRate {
    // As with video, build the stream on a throwaway context, then copy it.
    AVFormatContext *formatContext = avformat_alloc_context();
    AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_AAC);
    AVStream *stream = avformat_new_stream(formatContext, codec);
    stream->index      = 1;
    stream->id         = 1;
    stream->duration   = 0;
    stream->time_base  = (AVRational){1, 25};
    stream->start_time = 0;
    stream->priv_data  = NULL;

    stream->codecpar->codec_type  = AVMEDIA_TYPE_AUDIO;
    stream->codecpar->codec_id    = AV_CODEC_ID_AAC;
    stream->codecpar->format      = AV_SAMPLE_FMT_S16;
    stream->codecpar->sample_rate = sampleRate;
    stream->codecpar->channels    = channelNum;
    stream->codecpar->bit_rate    = 0;
    stream->codecpar->extradata_size = 2;
    // FFmpeg expects extradata to be av_malloc'd with padding.
    stream->codecpar->extradata   = (uint8_t *)av_mallocz(2 + AV_INPUT_BUFFER_PADDING_SIZE);

    /*
     * Why we attach extradata here: without it, a player cannot decode the
     * audio correctly when the stream is saved to an MP4 file.
     * http://ffmpeg-users.933282.n4.nabble.com/AAC-decoder-td1013071.html
     * http://ffmpeg.org/doxygen/trunk/mpeg4audio_8c.html#aa654ec3126f37f3b8faceae3b92df50e
     * The extradata (AudioSpecificConfig) has 16 bits:
     *   Audio object type     - normally 5 bits, but 11 bits if AOT_ESCAPE
     *   Sampling index        - 4 bits
     *   if (Sampling index == 15) Sample rate - 24 bits
     *   Channel configuration - 4 bits
     *   Reserved              - 3 bits
     * For example, "Low Complexity, 44100 Hz, 1 channel (mono)":
     *   AOT_LC == 2  -> 00010
     *   44.1kHz == 4 -> 0100      (48kHz == 3 -> 0011)
     *   mono == 1    -> 0001
     * so extradata: 00010 0100 0001 000 -> 0x12 0x08
     * (48kHz mono:  00010 0011 0001 000 -> 0x11 0x88)
     */
    if (stream->codecpar->sample_rate == 44100) {
        stream->codecpar->extradata[0] = 0x12;
        // two-channel devices (e.g. the iRig Mic HD) use channel config 2
        if (channelNum == 1)
            stream->codecpar->extradata[1] = 0x08;
        else
            stream->codecpar->extradata[1] = 0x10;
    } else if (stream->codecpar->sample_rate == 48000) {
        stream->codecpar->extradata[0] = 0x11;
        if (channelNum == 1)
            stream->codecpar->extradata[1] = 0x88;
        else
            stream->codecpar->extradata[1] = 0x90;
    } else if (stream->codecpar->sample_rate == 32000) {
        stream->codecpar->extradata[0] = 0x12;
        if (channelNum == 1)
            stream->codecpar->extradata[1] = 0x88;
        else
            stream->codecpar->extradata[1] = 0x90;
    } else if (stream->codecpar->sample_rate == 16000) {
        stream->codecpar->extradata[0] = 0x14;
        if (channelNum == 1)
            stream->codecpar->extradata[1] = 0x08;
        else
            stream->codecpar->extradata[1] = 0x10;
    } else if (stream->codecpar->sample_rate == 8000) {
        stream->codecpar->extradata[0] = 0x15;
        if (channelNum == 1)
            stream->codecpar->extradata[1] = 0x88;
        else
            stream->codecpar->extradata[1] = 0x90;
    }

    stream->codec->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

    memcpy(m_audio_stream, stream, sizeof(AVStream));
    av_free(stream);

    // Read the codec id from m_audio_stream: `stream` was just freed.
    m_outputContext->audio_codec_id = m_audio_stream->codecpar->codec_id;
    m_outputFormat->audio_codec     = m_audio_stream->codecpar->codec_id;

    self.isReadyForAudio = YES;

    [self productStreamHead];
}
Once steps 2 and 3 above are both configured, we attach the audio and video streams to the format context and its output format, and can then generate the header data with avformat_write_header.
- (void)productStreamHead {
    log4cplus_debug("record", "%s,line:%d",__func__,__LINE__);

    if (m_outputFormat->video_codec == AV_CODEC_ID_NONE) {
        log4cplus_error(kModuleName, "%s: video codec is NULL.",__func__);
        return;
    }

    if (m_outputFormat->audio_codec == AV_CODEC_ID_NONE) {
        log4cplus_error(kModuleName, "%s: audio codec is NULL.",__func__);
        return;
    }

    /* prepare the header and capture its bytes in a dynamic buffer */
    if (avio_open_dyn_buf(&m_outputContext->pb) < 0) {
        log4cplus_error(kModuleName, "%s: AVFormat_HTTP_FF_OPEN_DYURL_ERROR.",__func__);
        return;
    }

    /*
     * HACK to avoid the mpeg ps muxer spitting many underflow errors.
     * Default value from FFmpeg; try to set it via a configuration option.
     */
    m_outputContext->max_delay = (int)(0.7*AV_TIME_BASE);

    int result = avformat_write_header(m_outputContext,NULL);
    if (result < 0) {
        log4cplus_error(kModuleName, "%s: Error writing output header, res:%d",__func__,result);
        return;
    }

    uint8_t *output = NULL;
    int len = avio_close_dyn_buf(m_outputContext->pb, (uint8_t **)(&output));
    if (len > 0 && output != NULL) {
        self.isReadyForHead = YES;
        if (m_avhead_data) {
            free(m_avhead_data);
        }
        m_avhead_data_size = len;
        m_avhead_data = (uint8_t *)malloc(len);
        memcpy(m_avhead_data, output, len);

        if ([self.delegate respondsToSelector:@selector(receiveAVStreamWithIsHead:data:size:)]) {
            [self.delegate receiveAVStreamWithIsHead:YES data:output size:len];
        }
        // Free the dynamic buffer only after its contents have been copied and delivered.
        av_free(output);
        log4cplus_info(kModuleName, "%s: create head length = %d",__func__, len);
    } else {
        self.isReadyForHead = NO;
        log4cplus_error(kModuleName, "%s: product stream header failed.",__func__);
    }
}
The queue here is a lightweight data structure wrapping a C++ std::vector to buffer the audio and video packets; a guess at its shape follows.
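The actual class is not shown in this article, so this is a sketch of an Objective-C++ wrapper under stated assumptions: the queue class name and pushData: selector are hypothetical, while the element fields and the popData:/count/reset selectors match how dispatchAVData below uses them.

#import <Foundation/Foundation.h>
#include <vector>
#include "libavcodec/avcodec.h"

typedef struct XDXMuxMediaList {
    uint64_t  timeStamp;
    AVPacket *data;
    BOOL      extraDataHasChanged;
} XDXMuxMediaList;

@interface XDXMuxPacketQueue : NSObject
- (void)pushData:(XDXMuxMediaList)item;
- (void)popData:(XDXMuxMediaList *)outItem;  // copies and removes the front element
- (int)count;
- (void)reset;
@end

@implementation XDXMuxPacketQueue {
    std::vector<XDXMuxMediaList> _list;   // the wrapped C++ vector
    NSLock *_lock;                        // guards cross-thread access
}
- (instancetype)init {
    if (self = [super init]) { _lock = [[NSLock alloc] init]; }
    return self;
}
- (void)pushData:(XDXMuxMediaList)item {
    [_lock lock]; _list.push_back(item); [_lock unlock];
}
- (void)popData:(XDXMuxMediaList *)outItem {
    [_lock lock];
    if (!_list.empty()) { *outItem = _list.front(); _list.erase(_list.begin()); }
    [_lock unlock];
}
- (int)count {
    [_lock lock]; int n = (int)_list.size(); [_lock unlock];
    return n;
}
- (void)reset {
    [_lock lock]; _list.clear(); [_lock unlock];
}
@end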
A dedicated thread muxes the audio and video. The strategy is to pop whichever of the pending audio and video frames has the smaller timestamp and write it first. Since the two do not drift much overall, the ideal pattern is one video frame followed by one audio frame; because audio is sampled faster, there may be an extra audio frame or two. If audio and video fall out of sync for some reason, the thread waits until the timestamps line up again before resuming the mux.
int err = pthread_create(&m_muxThread, NULL, MuxAVPacket, (__bridge_retained void *)self);
if (err != 0) {
    log4cplus_error(kModuleName, "%s: create thread failed: %s",__func__, strerror(err));
}

void *MuxAVPacket(void *arg) {
    pthread_setname_np("XDX_MUX_THREAD");
    XDXAVStreamMuxHandler *instance = (__bridge_transfer XDXAVStreamMuxHandler *)arg;
    if (instance != nil) {
        [instance dispatchAVData];
    }
    return NULL;
}
#pragma mark Mux
- (void)dispatchAVData {
    XDXMuxMediaList audioPack;
    XDXMuxMediaList videoPack;

    memset(&audioPack, 0, sizeof(XDXMuxMediaList));
    memset(&videoPack, 0, sizeof(XDXMuxMediaList));

    [m_AudioListPack reset];
    [m_VideoListPack reset];

    while (true) {
        int videoCount = [m_VideoListPack count];
        int audioCount = [m_AudioListPack count];
        if (videoCount == 0 || audioCount == 0) {
            usleep(5*1000);
            log4cplus_debug(kModuleName, "%s: Mux dispatch list: v:%d, a:%d",__func__,videoCount, audioCount);
            continue;
        }

        if (audioPack.timeStamp == 0) {
            [m_AudioListPack popData:&audioPack];
        }

        if (videoPack.timeStamp == 0) {
            [m_VideoListPack popData:&videoPack];
        }

        if (audioPack.timeStamp >= videoPack.timeStamp) {
            // The video frame is older: write it first.
            log4cplus_debug(kModuleName, "%s: Mux dispatch input video time stamp = %llu",__func__,videoPack.timeStamp);
            if (videoPack.data != NULL && videoPack.data->data != NULL) {
                [self addVideoPacket:videoPack.data
                           timestamp:videoPack.timeStamp
                 extraDataHasChanged:videoPack.extraDataHasChanged];
                av_free(videoPack.data->data);
                av_free(videoPack.data);
            } else {
                log4cplus_error(kModuleName, "%s: Mux Video AVPacket data abnormal",__func__);
            }
            videoPack.timeStamp = 0;
        } else {
            // The audio frame is older: write it first.
            log4cplus_debug(kModuleName, "%s: Mux dispatch input audio time stamp = %llu",__func__,audioPack.timeStamp);
            if (audioPack.data != NULL && audioPack.data->data != NULL) {
                [self addAudioPacket:audioPack.data
                           timestamp:audioPack.timeStamp];
                av_free(audioPack.data->data);
                av_free(audioPack.data);
            } else {
                log4cplus_error(kModuleName, "%s: Mux audio AVPacket data abnormal",__func__);
            }
            audioPack.timeStamp = 0;
        }
    }
}
Each call to av_write_frame then yields a chunk of the muxed stream data.
- (void)productAVDataPacket:(AVPacket *)packet extraDataHasChanged:(BOOL)extraDataHasChanged {
    BOOL isVideoIFrame = NO;
    uint8_t *output = NULL;
    int len = 0;

    if (avio_open_dyn_buf(&m_outputContext->pb) < 0) {
        return;
    }

    if (packet->stream_index == 0 && packet->flags != 0) {
        isVideoIFrame = YES;
    }

    if (av_write_frame(m_outputContext, packet) < 0) {
        avio_close_dyn_buf(m_outputContext->pb, (uint8_t **)(&output));
        if (output != NULL)
            av_free(output);  // buffers from avio_close_dyn_buf must be freed with av_free
        log4cplus_error(kModuleName, "%s: Error writing output data",__func__);
        return;
    }

    len = avio_close_dyn_buf(m_outputContext->pb, (uint8_t **)(&output));
    if (len == 0 || output == NULL) {
        log4cplus_debug(kModuleName, "%s: mux len:%d or data abnormal",__func__,len);
        if (output != NULL)
            av_free(output);
        return;
    }

    if ([self.delegate respondsToSelector:@selector(receiveAVStreamWithIsHead:data:size:)]) {
        [self.delegate receiveAVStreamWithIsHead:NO data:output size:len];
    }

    if (output != NULL)
        av_free(output);
}