FFmpeg Decoding Flow

AVCodecContext

This data structure describes a codec context and holds many of the parameters a codec needs. Some of the more important fields are listed below:

typedef struct AVCodecContext {
  
     ......
  
     /**
      * some codecs need / can use extradata like Huffman tables.
      * mjpeg: Huffman tables
      * rv10: additional flags
      * mpeg4: global headers (they can be in the bitstream or here)
      * The allocated memory should be FF_INPUT_BUFFER_PADDING_SIZE bytes larger
      * than extradata_size to avoid problems if it is read with the bitstream reader.
      * The bytewise contents of extradata must not depend on the architecture or CPU endianness.
      * - encoding: Set/allocated/freed by libavcodec.
      * - decoding: Set/allocated/freed by user.
      */
     uint8_t *extradata;
     int extradata_size;
     /**
      * This is the fundamental unit of time (in seconds) in terms
      * of which frame timestamps are represented. For fixed-fps content,
      * timebase should be 1/framerate and timestamp increments should be
      * identically 1.
      * - encoding: MUST be set by user.
      * - decoding: Set by libavcodec.
      */
     AVRational time_base;
  
     /* video only */
     /**
      * picture width / height.
      * - encoding: MUST be set by user.
      * - decoding: Set by libavcodec.
      * Note: For compatibility it is possible to set this instead of
      * coded_width/height before decoding.
      */
     int width, height;
  
     ......
  
     /* audio only */
     int sample_rate; ///< samples per second
     int channels;    ///< number of audio channels
  
     /**
      * audio sample format
      * - encoding: Set by user.
      * - decoding: Set by libavcodec.
      */
     enum SampleFormat sample_fmt;  ///< sample format
  
     /* The following data should not be initialized. */
     /**
      * Samples per packet, initialized when calling 'init'.
      */
     int frame_size;
     int frame_number;   ///< audio or video frame number
  
     ......
  
     char codec_name[32];
     enum AVMediaType codec_type; /* see AVMEDIA_TYPE_xxx */
     enum CodecID codec_id; /* see CODEC_ID_xxx */
  
     /**
      * fourcc (LSB first, so "ABCD" -> ('D'<<24) + ('C'<<16) + ('B'<<8) + 'A').
      * This is used to work around some encoder bugs.
      * A demuxer should set this to what is stored in the field used to identify the codec.
      * If there are multiple such fields in a container then the demuxer should choose the one
      * which maximizes the information about the used codec.
      * If the codec tag field in a container is larger than 32 bits then the demuxer should
      * remap the longer ID to 32 bits with a table or other structure. Alternatively a new
      * extra_codec_tag + size could be added but for this a clear advantage must be demonstrated
      * first.
      * - encoding: Set by user, if not then the default based on codec_id will be used.
      * - decoding: Set by user, will be converted to uppercase by libavcodec during init.
      */
     unsigned int codec_tag;           
  
     ......
  
     /**
      * Size of the frame reordering buffer in the decoder.
      * For MPEG-2 it is 1 IPB or 0 low delay IP.
      * - encoding: Set by libavcodec.
      * - decoding: Set by libavcodec.
      */
     int has_b_frames;
  
     /**
      * number of bytes per packet if constant and known or 0
      * Used by some WAV based audio codecs.
      */
     int block_align;
  
     ......
  
     /**
      * bits per sample/pixel from the demuxer (needed for huffyuv).
      * - encoding: Set by libavcodec.
      * - decoding: Set by user.
      */
      int bits_per_coded_sample; 
  
      ......
  
} AVCodecContext;
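The codec_tag comment in the structure above says the fourcc is stored least significant byte first. The following is a minimal sketch of that packing in plain C. The helper names make_fourcc and fourcc_to_string are hypothetical and are not part of the FFmpeg API.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four characters into a 32-bit tag, least significant byte first,
 * so "ABCD" becomes ('D'<<24) + ('C'<<16) + ('B'<<8) + 'A'. */
static uint32_t make_fourcc(char a, char b, char c, char d)
{
    return  (uint32_t)(uint8_t)a
         | ((uint32_t)(uint8_t)b << 8)
         | ((uint32_t)(uint8_t)c << 16)
         | ((uint32_t)(uint8_t)d << 24);
}

/* Unpack a tag into a NUL-terminated string of four characters. */
static void fourcc_to_string(uint32_t tag, char out[5])
{
    assert(out != 0);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((tag >> (8 * i)) & 0xFF);
    out[4] = '\0';
}
```

Packing 'A', 'B', 'C', 'D' this way gives 0x44434241, with 'A' in the lowest byte.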

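If libavcodec is used on its own, the caller must initialize the fields of this structure. If the whole FFmpeg library is used, they are initialized during the calls to avformat_open_input and avformat_find_stream_info, from the file header and from the header information inside the media streams. The main fields are explained below:

  1. extradata/extradata_size: this buffer holds extra information the decoder may need and is filled in av_read_frame. Generally, the demuxer for a specific format fills extradata first, while it reads the format header. If the demuxer does not do so, for example because the header carries no codec information at all, the corresponding parser goes on to look for it in the demultiplexed media stream. If no extra information is found, the buffer pointer is NULL.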

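  2. time_base: the fundamental unit of time, in seconds, in which frame timestamps are expressed. For fixed-frame-rate content it should be 1/framerate.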

  3. width/height: the width and height of the video.

  4. sample_rate/channels: the audio sample rate and number of channels.

  5. sample_fmt: the raw audio sample format.

  6. codec_name/codec_type/codec_id/codec_tag: information about the codec.
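When the caller supplies extradata itself, the header comment asks for the allocation to be FF_INPUT_BUFFER_PADDING_SIZE bytes larger than extradata_size. The sketch below shows that allocation pattern in plain C, under stated assumptions. PADDING_SIZE is a stand-in constant with an illustrative value, and alloc_padded_extradata is a hypothetical helper. With the real library the padding constant comes from the libavcodec headers, and the buffer would normally be allocated with the library's own allocator, such as av_mallocz.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for FF_INPUT_BUFFER_PADDING_SIZE. The real value is defined by
 * the libavcodec headers and differs between versions. */
#define PADDING_SIZE 16

/* Copy `size` bytes of codec header data into a new buffer that has
 * PADDING_SIZE zeroed bytes after the payload, so a bitstream reader that
 * reads slightly past the end stays inside the allocation.
 * Returns NULL if the allocation fails. */
static uint8_t *alloc_padded_extradata(const uint8_t *src, int size)
{
    assert(size >= 0);
    uint8_t *buf = malloc((size_t)size + PADDING_SIZE);
    if (!buf)
        return NULL;
    if (size > 0)
        memcpy(buf, src, (size_t)size);
    memset(buf + size, 0, PADDING_SIZE);
    return buf;
}
```

The returned pointer and the unpadded size are what would go into extradata and extradata_size. The padding is not counted in extradata_size.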

    AVStream

    This structure describes one media stream. It is defined as follows:

    typedef struct AVStream {
         int index;    /**< stream index in AVFormatContext */
         int id;       /**< format-specific stream ID */
         AVCodecContext *codec; /**< codec context */
         /**
          * Real base framerate of the stream.
          * This is the lowest framerate with which all timestamps can be
          * represented accurately (it is the least common multiple of all
          * framerates in the stream). Note, this value is just a guess!
          * For example, if the time base is 1/90000 and all frames have either
          * approximately 3600 or 1800 timer ticks, then r_frame_rate will be 50/1.
          */
         AVRational r_frame_rate;
      
         ......
      
         /**
          * This is the fundamental unit of time (in seconds) in terms
          * of which frame timestamps are represented. For fixed-fps content,
          * time base should be 1/framerate and timestamp increments should be 1.
          */
         AVRational time_base;
      
         ......
      
         /**
          * Decoding: pts of the first frame of the stream, in stream time base.
          * Only set this if you are absolutely 100% sure that the value you set
          * it to really is the pts of the first frame.
          * This may be undefined (AV_NOPTS_VALUE).
          * @note The ASF header does NOT contain a correct start_time the ASF
          * demuxer must NOT set this.
          */
         int64_t start_time;
         /**
          * Decoding: duration of the stream, in stream time base.
          * If a source file does not specify a duration, but does specify
          * a bitrate, this value will be estimated from bitrate and file size.
          */
         int64_t duration;
      
    #if LIBAVFORMAT_VERSION_INT < (53<<16)
         char language[4]; /** ISO 639-2/B 3-letter language code (empty string if undefined) */
    #endif
      
         /* av_read_frame() support */
         enum AVStreamParseType need_parsing;
         struct AVCodecParserContext *parser;
      
         ......
      
         /* av_seek_frame() support */
         AVIndexEntry *index_entries; /**< Only used if the format does not
                                         support seeking natively. */
         int nb_index_entries;
         unsigned int index_entries_allocated_size;
      
         int64_t nb_frames;                 ///< number of frames in this stream if known or 0
      
         ......
      
         /**
          * Average framerate
          */
         AVRational avg_frame_rate;
         ......
    } AVStream;
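The r_frame_rate comment gives a worked example: a time base of 1/90000 with frame durations of about 3600 or 1800 ticks yields 50/1. The sketch below reproduces only that arithmetic and is not FFmpeg's actual estimation code. Rational is a stand-in for AVRational, and base_frame_rate is a hypothetical helper that assumes exact tick durations.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Rational { int num, den; } Rational; /* stand-in for AVRational */

static int64_t gcd64(int64_t a, int64_t b)
{
    while (b) { int64_t t = a % b; a = b; b = t; }
    return a;
}

/* Given the stream time base and the observed frame durations in ticks,
 * return the lowest rate at which every timestamp falls on a frame
 * boundary: 1 / (time_base * gcd(durations)), reduced to lowest terms. */
static Rational base_frame_rate(Rational tb, const int64_t *dur, int n)
{
    assert(n > 0 && tb.num > 0 && tb.den > 0);
    int64_t g = dur[0];
    for (int i = 1; i < n; i++)
        g = gcd64(g, dur[i]);
    assert(g > 0);
    int64_t num = tb.den;
    int64_t den = (int64_t)tb.num * g;
    int64_t r = gcd64(num, den);
    Rational out = { (int)(num / r), (int)(den / r) };
    return out;
}
```

With durations of 3600 and 1800 ticks the common divisor is 1800, and 90000 / 1800 gives the 50/1 quoted in the comment.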
     
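    The main fields of AVStream are explained below. avformat_open_input can determine most of their values from the file header; the missing information has to be obtained by calling avformat_find_stream_info, which reads frames and decodes them in software:
     
         index/id: index is the index of the stream. This number is generated automatically, and it can be used to look the stream up in the AVFormatContext::streams table. id is the identifier of the stream and depends on the container format; for MPEG TS, for example, id is the PID.
         time_base: the time base of the stream, a rational number (AVRational). The pts and dts of the media data in this stream use this time base as their granularity. av_rescale/av_rescale_q are normally used to convert between different time bases.
         start_time: the start time of the stream, in units of the stream time base; usually the pts of the first frame in the stream.
         duration: the total duration of the stream, in units of the stream time base.
         need_parsing: the control field for the parsing process of this stream.
         nb_frames: the number of frames in the stream.
         r_frame_rate/framerate/avg_frame_rate: frame-rate related fields.
         codec: points to the AVCodecContext structure for this stream, created when avformat_open_input is called.
         parser: points to the AVCodecParserContext structure for this stream, created when avformat_find_stream_info is called.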
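The time_base entry above says that pts and dts are counted in ticks of the stream time base and that av_rescale_q converts between time bases. The sketch below shows the underlying arithmetic in plain C so that it runs without the library. Rational and rescale_ts are stand-ins, not FFmpeg API. This simplified version assumes non-negative timestamps and small enough values that the intermediate product does not overflow.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Rational { int num, den; } Rational; /* stand-in for AVRational */

/* Convert timestamp `ts` from time base `from` to time base `to`,
 * rounding to the nearest tick: ts * from / to.
 * Simplified: requires ts >= 0 and does not guard against overflow. */
static int64_t rescale_ts(int64_t ts, Rational from, Rational to)
{
    assert(ts >= 0);
    assert(from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    int64_t mul = (int64_t)from.num * to.den;
    int64_t div = (int64_t)from.den * to.num;
    return (ts * mul + div / 2) / div;
}
```

For example, a pts of 3600 in a 1/90000 time base corresponds to 40 in a 1/1000 (millisecond) time base, and one frame at 1/25 corresponds to 3600 ticks at 1/90000.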
     
    AVFormatContext
     
    This structure describes the composition and basic information of a media file or media stream. It is defined as follows:
     
    typedef struct AVFormatContext {
         const AVClass *av_class; /**< Set by avformat_alloc_context. */
         /* Can only be iformat or oformat, not both at the same time. */
         struct AVInputFormat *iformat;
         struct AVOutputFormat *oformat;
         void *priv_data;
         ByteIOContext *pb;
         unsigned int nb_streams;
         AVStream *streams[MAX_STREAMS];
         char filename[1024]; /**< input or output filename */
         /* stream info */
         int64_t timestamp;
    #if LIBAVFORMAT_VERSION_INT < (53<<16)
         char title[512];
         char author[512];
         char copyright[512];
         char comment[512];
         char album[512];
         int year;  /**< ID3 year, 0 if none */
         int track; /**< track number, 0 if none */
         char genre[32]; /**< ID3 genre */
    #endif
      
         int ctx_flags; /**< Format-specific flags, see AVFMTCTX_xx */
         /* private data for pts handling (do not modify directly). */
         /** This buffer is only needed when packets were already buffered but
            not decoded, for example to get the codec parameters in MPEG
            streams. */
         struct AVPacketList *packet_buffer;
      
         /** Decoding: position of the first frame of the component, in
            AV_TIME_BASE fractional seconds. NEVER set this value directly:
            It is deduced from the AVStream values.  */
         int64_t start_time;
         /** Decoding: duration of the stream, in AV_TIME_BASE fractional
            seconds. Only set this value if you know none of the individual stream
            durations and also dont set any of them. This is deduced from the
            AVStream values if not set.  */
         int64_t duration;
         /** decoding: total file size, 0 if unknown */
         int64_t file_size;
         /** Decoding: total stream bitrate in bit/s, 0 if not
            available. Never set it directly if the file_size and the
            duration are known as FFmpeg can compute it automatically. */
         int bit_rate;
      
         /* av_read_frame() support */
         AVStream *cur_st;
    #if LIBAVFORMAT_VERSION_INT < (53<<16)
         const uint8_t *cur_ptr_deprecated;
         int cur_len_deprecated;
         AVPacket cur_pkt_deprecated;
    #endif
      
         /* av_seek_frame() support */
         int64_t data_offset; /** offset of the first packet */
         int index_built;
      
         int mux_rate;
         unsigned int packet_size;
         int preload;
         int max_delay;
      
    #define AVFMT_NOOUTPUTLOOP -1
    #define AVFMT_INFINITEOUTPUTLOOP 0
         /** number of times to loop output in formats that support it */
         int loop_output;
      
         int flags;
    #define AVFMT_FLAG_GENPTS       0x0001 ///< Generate missing pts even if it requires parsing future frames.
    #define AVFMT_FLAG_IGNIDX       0x0002 ///< Ignore index.
    #define AVFMT_FLAG_NONBLOCK     0x0004 ///< Do not block when reading packets from input.
    #define AVFMT_FLAG_IGNDTS       0x0008 ///< Ignore DTS on frames that contain both DTS & PTS
    #define AVFMT_FLAG_NOFILLIN     0x0010 ///< Do not infer any values from other values, just return what is stored in the container
    #define AVFMT_FLAG_NOPARSE      0x0020 ///< Do not use AVParsers, you also must set AVFMT_FLAG_NOFILLIN as the fillin code works on frames and no parsing -> no frames. Also seeking to frames can not work if parsing to find frame boundaries has been disabled
    #define AVFMT_FLAG_RTP_HINT     0x0040 ///< Add RTP hinting to the output file
      
         int loop_input;
         /** decoding: size of data to probe; encoding: unused. */
         unsigned int probesize;
      
         /**
          * Maximum time (in AV_TIME_BASE units) during which the input should
          * be analyzed in avformat_find_stream_info().
          */
         int max_analyze_duration;
      
         const uint8_t *key;
         int keylen;
      
         unsigned int nb_programs;
         AVProgram **programs;
      
         /**
          * Forced video codec_id.
          * Demuxing: Set by user.
          */
         enum CodecID video_codec_id;
         /**
          * Forced audio codec_id.
          * Demuxing: Set by user.
          */
         enum CodecID audio_codec_id;
         /**
          * Forced subtitle codec_id.
          * Demuxing: Set by user.
          */
         enum CodecID subtitle_codec_id;
      
         /**
          * Maximum amount of memory in bytes to use for the index of each stream.
          * If the index exceeds this size, entries will be discarded as
          * needed to maintain a smaller size. This can lead to slower or less
          * accurate seeking (depends on demuxer).
          * Demuxers for which a full in-memory index is mandatory will ignore
          * this.
          * muxing  : unused
          * demuxing: set by user
          */
         unsigned int max_index_size;
      
         /**
          * Maximum amount of memory in bytes to use for buffering frames
          * obtained from realtime capture devices.
          */
         unsigned int max_picture_buffer;
      
         unsigned int nb_chapters;
         AVChapter **chapters;
      
         /**
          * Flags to enable debugging.
          */
         int debug;
    #define FF_FDEBUG_TS        0x0001
      
         /**
          * Raw packets from the demuxer, prior to parsing and decoding.
          * This buffer is used for buffering packets until the codec can
          * be identified, as parsing cannot be done without knowing the
          * codec.
          */
         struct AVPacketList *raw_packet_buffer;
         struct AVPacketList *raw_packet_buffer_end;
      
         struct AVPacketList *packet_buffer_end;
      
         AVMetadata *metadata;
      
         /**
          * Remaining size available for raw_packet_buffer, in bytes.
          * NOT PART OF PUBLIC API
          */
    #define RAW_PACKET_BUFFER_SIZE 2500000
         int raw_packet_buffer_remaining_size;
      
         /**
          * Start time of the stream in real world time, in microseconds
          * since the unix epoch (00:00 1st January 1970). That is, pts=0
          * in the stream was captured at this real world time.
          * - encoding: Set by user.
          * - decoding: Unused.
          */
         int64_t start_time_realtime;
    } AVFormatContext;

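    This is the most basic structure in FFmpeg: the root of all the other structures and the fundamental abstraction of a multimedia file or stream. Its main members are described in the member notes at the end of this article.

    Normally this structure is created internally by avformat_open_input, which initializes some of its members to default values. If the caller wants to create the structure itself, however, it must explicitly set default values for some members; without them, later operations can fail. The members that need attention are also listed in the member notes at the end of this article.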
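Other members are derived rather than set. The bit_rate comment in the structure says FFmpeg can compute it when file_size and duration are known. The sketch below shows that arithmetic under stated assumptions: duration is in AV_TIME_BASE units (microseconds), TIME_BASE is a stand-in for that constant, and estimate_bit_rate is a hypothetical helper rather than a library function.

```c
#include <assert.h>
#include <stdint.h>

#define TIME_BASE 1000000 /* stand-in for AV_TIME_BASE: ticks per second */

/* Average bit rate in bit/s from the total size in bytes and the duration
 * in TIME_BASE units. Returns 0 when either value is unknown (zero). */
static int64_t estimate_bit_rate(int64_t file_size, int64_t duration)
{
    assert(file_size >= 0 && duration >= 0);
    if (file_size == 0 || duration == 0)
        return 0;
    return (int64_t)((double)file_size * 8.0 * TIME_BASE / (double)duration);
}
```

A file of 1,000,000 bytes lasting 8 seconds (8,000,000 ticks) comes out at 1,000,000 bit/s.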

    AVPacket

    AVPacket is defined in avcodec.h as follows:

    typedef struct AVPacket {
         /**
          * Presentation timestamp in AVStream->time_base units; the time at which
          * the decompressed packet will be presented to the user.
          * Can be AV_NOPTS_VALUE if it is not stored in the file.
          * pts MUST be larger or equal to dts as presentation cannot happen before
          * decompression, unless one wants to view hex dumps. Some formats misuse
          * the terms dts and pts/cts to mean something different. Such timestamps
          * must be converted to true pts/dts before they are stored in AVPacket.
          */
         int64_t pts;
         /**
          * Decompression timestamp in AVStream->time_base units; the time at which
          * the packet is decompressed.
          * Can be AV_NOPTS_VALUE if it is not stored in the file.
          */
         int64_t dts;
         uint8_t *data;
         int   size;
         int   stream_index;
         int   flags;
         /**
          * Duration of this packet in AVStream->time_base units, 0 if unknown.
          * Equals next_pts - this_pts in presentation order.
          */
         int   duration;
         void  (*destruct)( struct AVPacket *);
         void  *priv;
         int64_t pos;                            ///< byte position in stream, -1 if unknown
      
         /**
          * Time difference in AVStream->time_base units from the pts of this
          * packet to the point at which the output from the decoder has converged
          * independent from the availability of previous frames. That is, the
          * frames are virtually identical no matter if decoding started from
          * the very first frame or from this keyframe.
          * Is AV_NOPTS_VALUE if unknown.
          * This field is not the display duration of the current packet.
          *
          * The purpose of this field is to allow seeking in streams that have no
          * keyframes in the conventional sense. It corresponds to the
          * recovery point SEI in H.264 and match_time_delta in NUT. It is also
          * essential for some types of subtitle streams to ensure that all
          * subtitles are correctly displayed after seeking.
          */
         int64_t convergence_duration;
    } AVPacket;

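    FFmpeg uses AVPacket to hold media data after demultiplexing and before decoding (one audio or video frame, one subtitle packet, and so on) together with its side information (decoding timestamp, presentation timestamp, duration, and so on).

    The AVPacket structure itself is only a container; its data member refers to the actual data buffer. This buffer is usually created by av_new_packet, but it may also be created by other FFmpeg APIs (such as av_read_frame). When the data buffer of an AVPacket is no longer in use, it must be released by calling av_free_packet. av_free_packet calls the structure's own destruct function, whose value falls into two cases: 1) av_destruct_packet_nofree or 0; 2) av_destruct_packet. In case 1) the values of data and size are merely reset to 0; only in case 2) is the buffer actually freed.

    FFmpeg uses the AVPacket structure internally to set up buffers that carry data, and supplies the destruct function along with them. If FFmpeg intends to maintain the buffer itself, it sets destruct to av_destruct_packet_nofree, so a call to av_free_packet by the user does not free the buffer. If FFmpeg intends to hand the buffer over to the caller completely, it sets destruct to av_destruct_packet, meaning the buffer can be freed. To be safe, a user who wants to use an AVPacket created inside FFmpeg freely should call av_dup_packet to clone the buffer, which turns the packet into one whose buffer can be freed and avoids errors caused by holding on to the buffer improperly. For an AVPacket whose destruct pointer is av_destruct_packet_nofree, av_dup_packet allocates a new buffer, copies the data from the original buffer into it, sets data to the address of the new buffer, and sets destruct to av_destruct_packet.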
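The ownership rules described above can be modeled in a few lines of plain C. This is a simplified sketch, not the library's implementation. Packet stands in for AVPacket with only three members, and destruct_nofree, destruct_free, packet_free and packet_dup are hypothetical counterparts of av_destruct_packet_nofree, av_destruct_packet, av_free_packet and av_dup_packet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for AVPacket: only the members used here. */
typedef struct Packet {
    uint8_t *data;
    int      size;
    void   (*destruct)(struct Packet *);
} Packet;

/* Case 1: the buffer belongs to someone else, so only forget it. */
static void destruct_nofree(Packet *pkt)
{
    pkt->data = NULL;
    pkt->size = 0;
}

/* Case 2: the packet owns the buffer, so release it. */
static void destruct_free(Packet *pkt)
{
    free(pkt->data);
    pkt->data = NULL;
    pkt->size = 0;
}

/* Release a packet by dispatching through its destruct pointer.
 * A NULL destruct behaves like case 1. */
static void packet_free(Packet *pkt)
{
    assert(pkt != NULL);
    if (pkt->destruct)
        pkt->destruct(pkt);
    pkt->data = NULL;
    pkt->size = 0;
}

/* Turn a borrowed buffer into an owned copy.
 * Returns 0 on success, -1 if the allocation fails. */
static int packet_dup(Packet *pkt)
{
    if (pkt->destruct == destruct_free || !pkt->data || pkt->size <= 0)
        return 0;                       /* already owned, or nothing to copy */
    uint8_t *copy = malloc((size_t)pkt->size);
    if (!copy)
        return -1;
    memcpy(copy, pkt->data, (size_t)pkt->size);
    pkt->data = copy;                   /* the packet now points at its own copy */
    pkt->destruct = destruct_free;      /* and is responsible for freeing it */
    return 0;
}
```

After packet_dup the packet no longer refers to the shared buffer, so the original owner can reuse or release that buffer without affecting the packet, and packet_free releases only the copy.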

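    Members of AVPacket:

    • dts is the decoding timestamp and pts is the presentation timestamp; both are in units of the time base of the media stream the packet belongs to;

    • stream_index gives the index of the media stream the packet belongs to;

    • data is the pointer to the data buffer and size is its length;

    • duration is the duration of the data, also in units of the time base of the owning media stream;

    • pos is the byte offset of the data within the media stream;

    • destruct is the function pointer used to release the data buffer;

    • flags is the flag field; the lowest bit set to 1 marks the data as a key frame.

    Members of AVFormatContext that need explicit default values when the caller creates the structure itself:

    • probesize

    • mux_rate

    • packet_size

    • flags

    • max_analyze_duration

    • key

    • max_index_size

    • max_picture_buffer

    • max_delay

    Main members of AVFormatContext:

    • nb_streams and the array of AVStream pointers in streams together describe all the embedded media streams;

    • iformat and oformat point to the corresponding demuxer and muxer;

    • pb points to a ByteIOContext structure that controls low-level data reads and writes;

    • start_time and duration are the start time and length of the multimedia file, inferred from the AVStream entries in the streams array, in microseconds.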
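Because the file-level start_time and duration are counted in microseconds (AV_TIME_BASE units), they are usually converted before being shown to a user. The sketch below is one way to do that in plain C. TIME_BASE is a stand-in for AV_TIME_BASE, and format_time is a hypothetical helper, not part of FFmpeg.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TIME_BASE 1000000 /* stand-in for AV_TIME_BASE: microseconds per second */

/* Write `us`, a non-negative time in microseconds, into buf as HH:MM:SS.cc
 * (hours, minutes, seconds, hundredths of a second). */
static void format_time(int64_t us, char *buf, size_t len)
{
    assert(us >= 0);
    int64_t secs  = us / TIME_BASE;
    int     centi = (int)((us % TIME_BASE) / (TIME_BASE / 100));
    int     hours = (int)(secs / 3600);
    int     mins  = (int)((secs % 3600) / 60);
    int     s     = (int)(secs % 60);
    snprintf(buf, len, "%02d:%02d:%02d.%02d", hours, mins, s, centi);
}
```

A duration of 3,723,500,000 microseconds is 3723.5 seconds, which this helper renders as 01:02:03.50.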
