FFmpeg Primer (3): Playing Audio

Date: 2019-11-12

Reposted from: FFmpeg Primer (3): Playing Audio | www.samirchen.com

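Audio

SDL provides methods for playing audio. The SDL_OpenAudio() function opens the audio device; it takes an SDL_AudioSpec struct that holds all the information about the audio we are going to output.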

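Before showing the code, let's look at how a PC handles audio. Digital audio consists of a long stream of samples, each representing one value of the audio waveform. Sound is recorded at a certain sample rate, which is the number of samples per second and therefore also determines how fast the audio must be played back. Common sample rates are 22050 and 44100, used for radio and CD respectively. In addition, most audio can have more than one channel for stereo or surround sound; stereo audio, for example, delivers samples 2 at a time. When we pull data from a media file we don't know how many samples we will get, but FFmpeg will not hand us partial samples, which also means it will not split the samples of a multi-channel (e.g. stereo) frame.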
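To make the arithmetic concrete, here is a small sketch in plain C (no FFmpeg or SDL). It assumes interleaved PCM, which is the layout this tutorial works with later; the function names are made up for illustration.

```c
#include <stddef.h>

/* Bytes of raw PCM produced per second of audio. */
static size_t pcm_bytes_per_second(int sample_rate, int channels, int bytes_per_sample) {
    return (size_t)sample_rate * (size_t)channels * (size_t)bytes_per_sample;
}

/* Position of one channel's sample inside an interleaved buffer (L R L R ...).
   "frame" counts sample frames, i.e. one sample for every channel. */
static size_t interleaved_index(size_t frame, int channel, int channels) {
    return frame * (size_t)channels + (size_t)channel;
}
```

For 16-bit stereo at 44100 Hz, pcm_bytes_per_second(44100, 2, 2) gives 176400 bytes per second. The right-channel sample of frame 3 sits at interleaved_index(3, 1, 2), which is index 7.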

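SDL's audio playback works roughly like this: you create an SDL_AudioSpec struct and fill in your playback parameters: sample rate (freq), audio format (format), number of channels (channels), buffer size in samples (samples), a callback function (callback) and user data (userdata). You then call SDL_OpenAudio() with this struct; it opens the audio device and returns another SDL_AudioSpec. That returned spec is the one actually in effect, and it may differ from the one we asked for. Once playback starts, SDL keeps calling the callback, asking it to fill the audio buffer with a fixed number of bytes.

Setting Up the Audio

With the audio basics covered, let's look at the code. Just as we did for the video stream, we locate the audio stream in the media file.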

// Find the first video stream and the first audio stream.
videoStream = -1;
audioStream = -1;
for (i = 0; i < pFormatCtx->nb_streams; i++) {
    if (pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO && videoStream < 0) {
        videoStream = i;
    }
    if (pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO && audioStream < 0) {
        audioStream = i;
    }
}
if (videoStream == -1) {
    return -1; // Didn't find a video stream.
}
if (audioStream == -1) {
    return -1; // Didn't find an audio stream.
}

Next we can get everything we need from the AVCodecContext of the AVStream. Once we have the AVCodecContext, we can use its information to set up the audio:

AVCodecContext *aCodecCtx = NULL;
AVCodec *aCodec = NULL;
SDL_AudioSpec wanted_spec, spec;

aCodecCtx = pFormatCtx->streams[audioStream]->codec;
aCodec = avcodec_find_decoder(aCodecCtx->codec_id);
if (!aCodec) {
    fprintf(stderr, "Unsupported codec!\n");
    return -1;
}

// Set audio settings from codec info.
wanted_spec.freq = aCodecCtx->sample_rate;
wanted_spec.format = AUDIO_S16SYS;
wanted_spec.channels = aCodecCtx->channels;
wanted_spec.silence = 0;
wanted_spec.samples = SDL_AUDIO_BUFFER_SIZE;
wanted_spec.callback = audio_callback;
wanted_spec.userdata = aCodecCtx;

if (SDL_OpenAudio(&wanted_spec, &spec) < 0) {
    fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
    return -1;
}

avcodec_open2(aCodecCtx, aCodec, &audioOptionsDict);

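A few notes on the SDL_AudioSpec members:

freq: The sample rate.
format: Tells SDL the audio format. In S16SYS, S means signed, 16 means each sample is 16 bits, and SYS means the byte order follows the system's. This tutorial assumes that this is the format avcodec_decode_audio4() gives us; a decoder that outputs another sample format (see aCodecCtx->sample_fmt) would need a conversion step first.
channels: The number of audio channels.
silence: The value that represents silence. Since the audio is signed, 0 is the usual value.
samples: The size of the audio buffer, in samples. Recommended values are 512 to 8192; ffplay uses 1024.
callback: The callback function.
userdata: The user data passed to the callback.

Finally, we open the audio device with SDL_OpenAudio().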
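As a rough guide to choosing samples, the sketch below works out how many bytes one callback would have to fill and how much playback time that covers. It is plain C with hypothetical helper names, and it assumes the device accepts the requested spec unchanged and uses a packed format such as AUDIO_S16SYS (2 bytes per sample).

```c
#include <stddef.h>

/* Bytes requested per callback: sample frames * channels * bytes per sample. */
static size_t callback_buffer_bytes(int samples, int channels, int bytes_per_sample) {
    return (size_t)samples * (size_t)channels * (size_t)bytes_per_sample;
}

/* Playback time covered by one buffer, in milliseconds. */
static double buffer_duration_ms(int samples, int freq) {
    return 1000.0 * (double)samples / (double)freq;
}
```

With samples = 1024, stereo, 16-bit audio, each callback covers 4096 bytes, or a little over 23 ms at 44100 Hz. A larger buffer lowers the risk of underruns at the cost of higher latency.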

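Queues

Now we can pull audio data from the stream, but what do we do with it? We keep reading packets from the media file while, at the same time, SDL keeps calling the callback. One solution is to create a global structure that we can keep stuffing packets into and that audio_callback can keep pulling from for further processing. So next we build a packet queue. FFmpeg already provides a structure to help with this: AVPacketList, a linked list of packets. Based on it, we define our PacketQueue: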

typedef struct PacketQueue {
    AVPacketList *first_pkt, *last_pkt;
    int nb_packets;
    int size;
    SDL_mutex *mutex;
    SDL_cond *cond;
} PacketQueue;

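Note that nb_packets is not the same as size: size is the byte count accumulated from packet->size. You will also notice the SDL_mutex and SDL_cond members. SDL runs the audio processing in a separate thread, so we need mutual exclusion to keep the queue consistent.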
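The difference between the two counters is easy to see in a stripped-down, single-threaded sketch. ToyQueue and its helpers are hypothetical stand-ins written in plain C; they leave out the locking and store only a payload size per node instead of a real AVPacket.

```c
#include <stdlib.h>

typedef struct Node {
    int size;               /* payload size in bytes */
    struct Node *next;
} Node;

typedef struct ToyQueue {
    Node *first, *last;
    int nb_packets;         /* number of queued nodes */
    int size;               /* sum of all payload sizes, in bytes */
} ToyQueue;

/* Append a node; returns 0 on success, -1 if allocation fails. */
static int toy_put(ToyQueue *q, int payload_size) {
    Node *n = malloc(sizeof(*n));
    if (!n) {
        return -1;
    }
    n->size = payload_size;
    n->next = NULL;
    if (q->last) {
        q->last->next = n;
    } else {
        q->first = n;
    }
    q->last = n;
    q->nb_packets++;
    q->size += payload_size;
    return 0;
}

/* Non-blocking get: returns 1 and stores the payload size, or 0 if empty. */
static int toy_get(ToyQueue *q, int *payload_size) {
    Node *n = q->first;
    if (!n) {
        return 0;
    }
    q->first = n->next;
    if (!q->first) {
        q->last = NULL;
    }
    q->nb_packets--;
    q->size -= n->size;
    *payload_size = n->size;
    free(n);
    return 1;
}
```

After putting payloads of 100 and 250 bytes, nb_packets is 2 while size is 350. The byte total is the more useful figure when deciding whether enough data has been buffered.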

Here is the queue initialization code:

void packet_queue_init(PacketQueue *q) {
    memset(q, 0, sizeof(PacketQueue));
    q->mutex = SDL_CreateMutex();
    q->cond = SDL_CreateCond();
}

Here is the code that adds a packet to the queue:

int packet_queue_put(PacketQueue *q, AVPacket *pkt) {
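    AVPacketList *pkt1;

    pkt1 = av_malloc(sizeof(AVPacketList));
    if (!pkt1) {
        return -1;
    }
    // Move the caller's reference into the queue node. pkt is left empty,
    // so the queue now owns the packet data until it is dequeued.
    av_packet_move_ref(&pkt1->pkt, pkt);
    pkt1->next = NULL;
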
    SDL_LockMutex(q->mutex);
    
    if (!q->last_pkt) {
        q->first_pkt = pkt1;
    }
    else {
        q->last_pkt->next = pkt1;
    }
    q->last_pkt = pkt1;
    q->nb_packets++;
    q->size += pkt1->pkt.size;
    SDL_CondSignal(q->cond);
    
    SDL_UnlockMutex(q->mutex);
    return 0;
}

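SDL_LockMutex() locks the mutex so that we can safely write to the queue. After the packet is added, SDL_CondSignal() signals the consumer that data is available, and SDL_UnlockMutex() then releases the mutex so the consumer can fetch it.

And here is the matching code that takes a packet from the queue: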

int quit = 0;
static int packet_queue_get(PacketQueue *q, AVPacket *pkt, int block) {
    AVPacketList *pkt1;
    int ret;
    
    SDL_LockMutex(q->mutex);
    
    for (;;) {
        if (quit) {
            ret = -1;
            break;
        }
        
        pkt1 = q->first_pkt;
        if (pkt1) {
            q->first_pkt = pkt1->next;
            if (!q->first_pkt) {
                q->last_pkt = NULL;
            }
            q->nb_packets--;
            q->size -= pkt1->pkt.size;
            *pkt = pkt1->pkt;
            av_free(pkt1);
            ret = 1;
            break;
        } else if (!block) {
            ret = 0;
            break;
        } else {
            SDL_CondWait(q->cond, q->mutex);
        }
    }
    SDL_UnlockMutex(q->mutex);
    return ret;
}

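In the code above, the work is wrapped in a for loop so that, when we ask to block, we are sure to come back with a packet. SDL_CondWait() keeps the loop from spinning forever: it waits until another thread calls SDL_CondSignal() or SDL_CondBroadcast(), then continues. It may look as if we are deadlocked on the mutex, because packet_queue_put() cannot write to the queue while we hold the lock. In fact, SDL_CondWait() unlocks the mutex we pass in while it waits and locks it again once it has been signaled.

Quitting the Program

The code uses a global variable quit. When the user closes the window, it tells the threads to exit.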

SDL_PollEvent(&event);
switch (event.type) {
    case SDL_QUIT:
        quit = 1;
        SDL_Quit();
        exit(0);
        break;
    default:
        break;
}

Filling the Queue

Next we create our queue:

PacketQueue audioq;
int main(int argc, char *argv[]) {
    // ... code ...

    avcodec_open2(aCodecCtx, aCodec, &audioOptionsDict);

    // audio_st = pFormatCtx->streams[index].
    packet_queue_init(&audioq);
    SDL_PauseAudio(0);
    
    // ... code ...
}

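SDL_PauseAudio(0) finally starts the audio device. If no data is available yet, it plays silence.

Once the queue is set up, we can start filling it with packets. Here is the loop that does so: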

while (av_read_frame(pFormatCtx, &packet) >= 0) {
    // Is this a packet from the video stream?
    if (packet.stream_index == videoStream) {
        // Decode video frame.
        
        // ... code ...

    } else if (packet.stream_index == audioStream) {
        packet_queue_put(&audioq, &packet);
    } else {
        // Free the packet that was allocated by av_read_frame.
        av_packet_unref(&packet);
    }

    // ... code ...

}

Note that we do not free the audio packet right after putting it in the queue; we free it later, after it has been decoded.
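This is an ownership hand-off: once a packet is queued, the reader no longer owns its data, and the consumer becomes responsible for releasing it. The sketch below shows the idea in plain C with a hypothetical ToyPacket type. FFmpeg's av_packet_move_ref() performs a comparable hand-off for a real AVPacket, but nothing here uses the FFmpeg API.

```c
#include <stdlib.h>

typedef struct ToyPacket {
    unsigned char *data;
    int size;
} ToyPacket;

/* Transfer ownership: dst takes over src's buffer and src is reset to empty. */
static void toy_packet_move(ToyPacket *dst, ToyPacket *src) {
    *dst = *src;
    src->data = NULL;
    src->size = 0;
}

/* Release the buffer. Safe to call on a packet that is already empty. */
static void toy_packet_free(ToyPacket *p) {
    free(p->data);
    p->data = NULL;
    p->size = 0;
}
```

Because the source is reset after the move, freeing it by mistake is harmless, and the buffer is released exactly once by whoever holds the destination.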

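Fetching Packets from the Queue

Now we implement the callback that fetches and processes the queued data. It must have the form void callback(void *userdata, Uint8 *stream, int len), where userdata is the pointer we gave to SDL, stream is the buffer we write audio data into, and len is the size of that buffer.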

void audio_callback(void *userdata, Uint8 *stream, int len) {
    AVCodecContext *aCodecCtx = (AVCodecContext *)userdata;
    int len1, audio_size;
    
    static uint8_t audio_buf[(MAX_AUDIO_FRAME_SIZE * 3) / 2];
    static unsigned int audio_buf_size = 0;
    static unsigned int audio_buf_index = 0;
    
    while (len > 0) {
        if (audio_buf_index >= audio_buf_size) {
            // We have already sent all our data; get more.
            // Pass the capacity of audio_buf, not the amount of data currently in it.
            audio_size = audio_decode_frame(aCodecCtx, audio_buf, sizeof(audio_buf));
            if (audio_size < 0) {
                // If error, output silence.
                audio_buf_size = 1024; // arbitrary?
                memset(audio_buf, 0, audio_buf_size);
            } else {
                audio_buf_size = audio_size;
            }
            audio_buf_index = 0;
        }
        len1 = audio_buf_size - audio_buf_index;
        if (len1 > len) {
            len1 = len;
        }
        memcpy(stream, (uint8_t *) audio_buf + audio_buf_index, len1);
        len -= len1;
        stream += len1;
        audio_buf_index += len1;
    }
}

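This is a simple loop that pulls in data: audio_decode_frame() stores its decoded output in an intermediate buffer, and the data in that buffer is then copied to stream. The audio_buf buffer is 1.5 times the size of the largest audio frame FFmpeg will give us, which leaves a comfortable margin.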
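The same drain-then-refill pattern can be tried without SDL or FFmpeg. In this sketch, toy_decode() is a hypothetical stand-in for audio_decode_frame() that produces 5 bytes at a time (the values 0, 1, 2, ...), and toy_fill() plays the role of the callback. The chunk size is arbitrary and chosen so that it does not divide the requested lengths evenly.

```c
#include <string.h>

#define CHUNK 5  /* bytes produced per "decode" call (arbitrary for the demo) */

static unsigned char next_value = 0;  /* toy source: emits 0, 1, 2, ... */

/* Stand-in for audio_decode_frame(): writes CHUNK bytes, returns the count. */
static int toy_decode(unsigned char *buf) {
    int i;
    for (i = 0; i < CHUNK; i++) {
        buf[i] = next_value++;
    }
    return CHUNK;
}

/* Same shape as audio_callback(): use up leftovers first, decode when empty. */
static void toy_fill(unsigned char *stream, int len) {
    static unsigned char buf[CHUNK];
    static int buf_size = 0;
    static int buf_index = 0;
    int n;

    while (len > 0) {
        if (buf_index >= buf_size) {
            buf_size = toy_decode(buf);   /* intermediate buffer is empty */
            buf_index = 0;
        }
        n = buf_size - buf_index;         /* bytes still waiting in buf */
        if (n > len) {
            n = len;
        }
        memcpy(stream, buf + buf_index, (size_t)n);
        len -= n;
        stream += n;
        buf_index += n;
    }
}
```

Calling toy_fill() for 12 bytes and then for 8 bytes yields the values 0 through 19 in order. The static index keeps the 3 bytes left over from the first call, so nothing is lost or repeated between calls.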

Decoding the Audio

All of the audio decoding happens in audio_decode_frame():

int audio_decode_frame(AVCodecContext *aCodecCtx, uint8_t *audio_buf, int buf_size) {
    static AVPacket pkt;
    static uint8_t *audio_pkt_data = NULL;
    static int audio_pkt_size = 0;
    static AVFrame frame;
    
    int len1, data_size = 0;
    
    for (;;) {
        while(audio_pkt_size > 0) {
            int got_frame = 0;
            len1 = avcodec_decode_audio4(aCodecCtx, &frame, &got_frame, &pkt);
            if (len1 < 0) {
                // if error, skip frame.
                audio_pkt_size = 0;
                break;
            }
            audio_pkt_data += len1;
            audio_pkt_size -= len1;
            if (got_frame) {
                data_size = av_samples_get_buffer_size(NULL, aCodecCtx->channels, frame.nb_samples, aCodecCtx->sample_fmt, 1);
                memcpy(audio_buf, frame.data[0], data_size);
            }
            if (data_size <= 0) {
                // No data yet, get more frames.
                continue;
            }
            // We have data, return it and come back for more later.
            return data_size;
        }
        if (pkt.data) {
            av_packet_unref(&pkt);
        }
        
        if (quit) {
            return -1;
        }
        
        if (packet_queue_get(&audioq, &pkt, 1) < 0) {
            return -1;
        }
        audio_pkt_data = pkt.data;
        audio_pkt_size = pkt.size;
    }
}

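Let's read this code from the bottom up. We call packet_queue_get() to fetch a packet from the queue and keep it. Once we have a packet, we call avcodec_decode_audio4() to decode it. In some cases a packet contains more than one frame, so this logic may have to run several times to get all the data out of the packet. Once we have a frame, we copy it into the audio buffer. Mind the type conversion here: SDL gives us a buffer of 8-bit integers, while FFmpeg gives us data as 16-bit integers. Also note the difference between len1 and data_size: len1 is the number of bytes of the packet that have been consumed, and data_size is the amount of raw data returned.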
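The 8-bit versus 16-bit point comes down to counting in bytes. Below is a minimal sketch in plain C with hypothetical helper names, assuming packed signed 16-bit samples. For that case the byte count is nb_samples * channels * 2, which is also what av_samples_get_buffer_size() reports when called with align set to 1.

```c
#include <stdint.h>
#include <string.h>

/* Size in bytes of nb_samples packed 16-bit sample frames. */
static int pcm16_bytes(int nb_samples, int channels) {
    return nb_samples * channels * (int)sizeof(int16_t);
}

/* Copy 16-bit samples into a byte-addressed output buffer.
   Returns the number of bytes written, or -1 if out is too small. */
static int copy_pcm16(uint8_t *out, int out_size,
                      const int16_t *samples, int nb_samples, int channels) {
    int n = pcm16_bytes(nb_samples, channels);
    if (n > out_size) {
        return -1;
    }
    memcpy(out, samples, (size_t)n);
    return n;
}
```

Using memcpy() keeps the sample bytes exactly as they are, so the receiver can read them back as 16-bit values in the system's byte order. The size check is the kind of guard a real decode function could apply with its buf_size argument.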

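As soon as we have some data, we return to see whether we still need more from the queue or already have enough. If the packet still holds more data to process, we keep it around for later; if we have finished with it, we free it.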
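To see how consumed input and produced output can differ, consider a toy packet format invented purely for this sketch: each frame is one length byte N followed by N payload bytes, and "decoding" expands every payload byte into two output bytes. None of this is FFmpeg API; it only mirrors the roles of len1 and data_size.

```c
#include <stdint.h>

/* Decode one frame. Returns bytes consumed from the input (like len1),
   or -1 on malformed input or a too-small output buffer.
   *data_size receives the number of output bytes produced. */
static int toy_decode_frame(const uint8_t *in, int in_size,
                            uint8_t *out, int out_size, int *data_size) {
    int n, i;
    *data_size = 0;
    if (in_size < 1) {
        return -1;
    }
    n = in[0];
    if (1 + n > in_size || 2 * n > out_size) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        out[2 * i] = in[1 + i];
        out[2 * i + 1] = 0;
    }
    *data_size = 2 * n;
    return 1 + n;
}

/* Walk a packet holding several frames. Returns total output bytes, or -1. */
static int toy_decode_packet(const uint8_t *pkt, int pkt_size,
                             uint8_t *out, int out_size) {
    int total = 0;
    while (pkt_size > 0) {
        int data_size;
        int len1 = toy_decode_frame(pkt, pkt_size, out + total,
                                    out_size - total, &data_size);
        if (len1 < 0) {
            return -1;
        }
        pkt += len1;        /* skip what the decoder consumed */
        pkt_size -= len1;
        total += data_size;
    }
    return total;
}
```

For the 5-byte packet {2, 10, 20, 1, 30}, the first call consumes 3 bytes and produces 4; the second consumes the remaining 2 and produces 2. The packet is finished only when the remaining size reaches zero, which is the point at which the real code frees it.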

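To sum up: the read loop in the main thread reads data from the media file and puts it on the queue; audio_callback takes data off the queue, decodes it and hands it to SDL, and SDL passes it to the sound card to be played.

That's all for this tutorial. The complete code is available here: https://github.com/samirchen/TestFFmpeg

Compiling and Running

You can compile it with the following command: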

$ gcc -o tutorial03 tutorial03.c -lavutil -lavformat -lavcodec -lswscale -lz -lm `sdl-config --cflags --libs`

Find a video file and try running it like this:

$ ./tutorial03 myvideofile.mp4
