MP4 File Format Analysis

Date: 2019-11-05

Tags: mp4, file format analysis

Reposted from: http://www.cnblogs.com/CoderTian/p/8277965.html

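1.The ISO/IEC 14496 standard

ISO/IEC 14496 is the MPEG-4 standard developed by the Moving Picture Experts Group (MPEG). It is divided into many parts and is still being extended.

Reference: https://en.wikipedia.org/wiki/Category:ISO/IEC_14496

Part 1 (ISO/IEC 14496-1): Systems: describes how video and audio streams are controlled, synchronized and combined (i.e. multiplexing, abbreviated MUX).

Part 2 (ISO/IEC 14496-2): Visual: defines a codec for many kinds of visual information (natural video, still textures, computer-generated graphics and so on). (The XviD encoder, for example, implements MPEG-4 Part 2.)

Part 3 (ISO/IEC 14496-3): Audio: defines a collection of codecs for coding many kinds of audio signals, including several variants of Advanced Audio Coding (AAC) and other audio/speech coding tools.

Part 4 (ISO/IEC 14496-4): Conformance testing: defines procedures for testing conformance to the other parts of the standard.

Part 5 (ISO/IEC 14496-5): Reference software: provides software that demonstrates and illustrates the functionality of the other parts of the standard.

Part 6 (ISO/IEC 14496-6): Delivery Multimedia Integration Framework (DMIF).

Part 7 (ISO/IEC 14496-7): Optimized reference software: provides examples of optimized implementations (here "implementation" refers to the Part 5 software).

Part 8 (ISO/IEC 14496-8): Carriage over IP networks: defines how MPEG-4 content is transported over IP networks.

Part 9 (ISO/IEC 14496-9): Reference hardware: provides hardware designs demonstrating how the functionality of the other parts can be implemented in hardware.

Part 10 (ISO/IEC 14496-10): Advanced Video Coding (AVC): defines a video codec. AVC and XviD both belong to MPEG-4, but AVC (Part 10) is technically more advanced than XviD, which implements Part 2. AVC is also technically aligned with the ITU-T H.264 standard, which is why it is also called H.264.

Part 12 (ISO/IEC 14496-12): ISO base media file format: defines a file format for storing media content.

Part 13 (ISO/IEC 14496-13): Intellectual Property Management and Protection (IPMP) extensions.

Part 14 (ISO/IEC 14496-14): MPEG-4 (MP4) file format: defines a file format, based on Part 12, for storing MPEG-4 content.

Part 15 (ISO/IEC 14496-15): AVC file format: defines a file format, based on Part 12, for storing Part 10 video content.

Part 16 (ISO/IEC 14496-16): Animation Framework eXtension (AFX).

Part 17 (ISO/IEC 14496-17): Synchronized text subtitle format.

Part 18 (ISO/IEC 14496-18): Font compression and streaming (for the Open Font Format).

Part 19 (ISO/IEC 14496-19): Synthesized Texture Stream.

Part 20 (ISO/IEC 14496-20): Lightweight Scene Representation (LASeR).

Part 21 (ISO/IEC 14496-21): MPEG-J extensions for rendering.

Part 22 (ISO/IEC 14496-22): Open Font Format.

Part 23 (ISO/IEC 14496-23): Symbolic Music Representation.

Part 24 (ISO/IEC 14496-24): Audio and systems interaction.

Part 25 (ISO/IEC 14496-25): 3D Graphics Compression Model.

Part 26 (ISO/IEC 14496-26): Audio conformance: defines methods for testing whether audio data conforms to ISO/IEC 14496-3.

Part 27 (ISO/IEC 14496-27): 3D Graphics conformance: defines methods for testing whether 3D graphics data conforms to ISO/IEC 14496-11:2005, ISO/IEC 14496-16:2006, ISO/IEC 14496-21:2006 and ISO/IEC 14496-25:2009.

MP4 is a multimedia container format defined in ISO/IEC 14496-14 and is part of the MPEG-4 (ISO/IEC 14496) standard. It is an implementation of the media file format defined in ISO/IEC 14496-12 (MPEG-4 Part 12, ISO base media file format).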

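1.1.Standards referenced in this article

This article mainly refers to the following standard documents:

c068960_ISO_IEC_14496-12_2015.pdf
ISO_IEC_14496-14_2003-11-15.pdf

1.2.Terminology from the standards

box: an object-oriented building block defined by a unique type identifier and a length
container box: a box whose sole purpose is to contain and group a set of related boxes; container boxes are normally not full boxes
chunk: a contiguous set of samples for one track
hint track: a track that contains no media data but instead contains instructions for packaging one or more tracks into a streaming channel
media data box: a box that holds the actual media data
movie box: a container box whose sub-boxes define the metadata of a presentation
sample: all the data associated with a single timestamp; a video sample is one frame of video (or a group of consecutive video frames), and an audio sample is a contiguous run of compressed audio
sample description: a structure that defines and describes the format of the samples in a track
sample table: a table that specifies the timing and physical layout of the samples in a track
track: a time-ordered sequence of related samples; for media data, a track corresponds to a video or audio sequence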

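2.MP4 container format

MP4 is a fairly comprehensive container format that is considered able to embed almost any kind of data, including audio and video in many different codecs. In practice, however, most MP4 files we encounter contain AVC (H.264) or MPEG-4 Part 2 video and AAC audio.

The structure of MP4 resembles a set of Russian nesting dolls, with boxes nested inside boxes; it can also be viewed as a tree of boxes. The figure below shows the tree structure of the common boxes and gives a rough picture of how an MP4 file is organized:

2.1.Box structure in MP4

As introduced above, an MP4 file is a tree made of boxes, and all data lives inside boxes. Let us look at the basic structure of a box. A box consists of a box header followed by the data the box contains, as shown in the figure below.

A box begins with its box header, which contains the box's size and type, among other information. size is the total number of bytes the box occupies, header included. If the box is very large (for example an mdat box holding the actual video data) and its size exceeds the maximum uint32 value, size is set to 1 and the real size is stored in the 8-byte (uint64) largesize field that follows the type. A size of 0 means the box extends to the end of the file. All values in a box are stored in network byte order, i.e. big-endian.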
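As a minimal sketch of the header rules above, the Python function below reads one box header from a bytes buffer. The name read_box_header and its (type, size, header_size) return value are illustrative choices, not part of the standard or of any library; the size == 0 case assumes the buffer holds the rest of the file.

```python
import struct


def read_box_header(data: bytes, offset: int = 0):
    """Read the box header at `offset`.

    Returns (box_type, box_size, header_size): box_size covers the whole
    box including its header, and header_size is where the payload begins.
    """
    # 32-bit size followed by a 4-character type, both big-endian
    size, raw_type = struct.unpack_from(">I4s", data, offset)
    box_type = raw_type.decode("latin-1")
    header_size = 8
    if size == 1:
        # the real size is in the 64-bit largesize field after the type
        (size,) = struct.unpack_from(">Q", data, offset + 8)
        header_size = 16
    elif size == 0:
        # the box extends to the end of the file (assumed: end of `data`)
        size = len(data) - offset
    if box_type == "uuid":
        # a 16-byte extended type (usertype) follows
        header_size += 16
    return box_type, size, header_size
```

Adding the returned size to the current offset gives the offset of the next sibling box, which is how a parser steps through the boxes at one level of the tree.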

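Depending on the information contained in the header, boxes are divided into plain boxes and full boxes, as shown in the figure below:

Box and FullBox are defined in ISO_IEC_14496-12_2015 as follows: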

aligned(8) class Box (unsigned int(32) boxtype, optional unsigned int(8)[16] extended_type) 
{
    unsigned int(32) size;
    unsigned int(32) type = boxtype;
    if (size==1) 
    {
        unsigned int(64) largesize;
    } 
    else if (size==0) 
    {
        // box extends to end of file
    }
    if (boxtype=='uuid')
    {
        unsigned int(8)[16] usertype = extended_type;
    }
}

aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f) extends Box(boxtype) 
{
    unsigned int(8) version = v;
    bit(24) flags = f;
}

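In a full box, version is an integer that specifies the version of the box's format.

flags is a map of bit flags.

2.2.Analysis of the boxes in MP4

ftyp: the file type box, which identifies the type of the file. There is exactly one ftyp box; it may only appear at the top level of the file and cannot be contained in any other box. It should also appear at the very beginning of the file. The ftyp box contains a 32-bit major brand (4 characters), a 32-bit minor version (an integer) and an array of 32-bit compatible brands. These fields describe the file at the application level. The ftyp box is defined in the standard as follows: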

aligned(8) class FileTypeBox extends Box('ftyp')
{
    unsigned int(32) major_brand;
    unsigned int(32) minor_version;
    unsigned int(32) compatible_brands[]; // to end of the box
}

1.major_brand: a brand identifier, such as mp42

2.minor_version: the minor version of the major brand

3.compatible_brands: a list of brands that runs to the end of the box
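The ftyp layout above can be parsed with a few lines of Python. This sketch assumes payload holds the bytes after the 8-byte box header; parse_ftyp is a hypothetical helper name, and the 24-byte box in the usage lines is a made-up example rather than the sample file analyzed below.

```python
import struct


def parse_ftyp(payload: bytes) -> dict:
    """Parse an ftyp payload (the bytes after the 8-byte box header)."""
    major, minor = struct.unpack_from(">4sI", payload, 0)
    # compatible_brands: 4-character codes up to the end of the box
    brands = [payload[i:i + 4].decode("latin-1")
              for i in range(8, len(payload) - 3, 4)]
    return {"major_brand": major.decode("latin-1"),
            "minor_version": minor,
            "compatible_brands": brands}


# Hypothetical 24-byte ftyp box: mp42, minor version 0, brands mp42 + isom
box = struct.pack(">I4s4sI", 24, b"ftyp", b"mp42", 0) + b"mp42isom"
print(parse_ftyp(box[8:]))
# {'major_brand': 'mp42', 'minor_version': 0, 'compatible_brands': ['mp42', 'isom']}
```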

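Let us analyze a sample file:

In this sample, the box type is ftyp and the box is 24 bytes long; its major_brand and minor_version are both mp42.

mdat: the media data box holds the media data. It is located at the top level of the file; a file may contain several mdat boxes, or none at all (when all media data is referenced from external files). The data follows immediately after the box header, and its structure is described by the metadata, which references the media data by absolute offsets in the file. It is defined in the standard as follows: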

aligned(8) class MediaDataBox extends Box('mdat')
{
    bit(8) data[];
}

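free: the contents of a free box are irrelevant and can be ignored; deleting the box has no effect on playback. Its type field may be either free or skip. The free box is defined in the standard as follows: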

aligned(8) class FreeSpaceBox extends Box(free_type) 
{
    unsigned int(8) data[];
}

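moov: the movie box, which holds the metadata of the media; its contents are expressed through its sub-boxes. There is exactly one moov box, located at the top level of the file. It usually follows the ftyp box directly, but some files place it at the end instead. It is defined in the standard as: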

aligned(8) class MovieBox extends Box('moov')
{

}
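Because moov may sit either right after ftyp or at the end of the file, tools often check the top-level order. The sketch below assumes the whole file is loaded into memory as bytes; it lists the top-level boxes and reports whether moov precedes the first mdat (the layout often called "fast start", which lets playback begin before the whole file has arrived). top_level_boxes and moov_before_mdat are hypothetical helper names.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box of an in-memory file."""
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        if size == 1:                       # 64-bit largesize follows
            (size,) = struct.unpack_from(">Q", data, offset + 8)
        elif size == 0:                     # box runs to end of file
            size = len(data) - offset
        if size < 8:
            raise ValueError(f"bad box size {size} at offset {offset}")
        yield raw_type.decode("latin-1"), offset, size
        offset += size


def moov_before_mdat(data: bytes) -> bool:
    """True if moov comes before the first mdat (or there is no mdat)."""
    types = [t for t, _, _ in top_level_boxes(data)]
    if "moov" not in types:
        return False
    return "mdat" not in types or types.index("moov") < types.index("mdat")
```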

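mvhd: the movie header box holds overall information about the file, such as its duration and creation time. It is media-independent and applies to the presentation as a whole. The mvhd box is defined in the standard as follows: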

aligned(8) class MovieHeaderBox extends FullBox('mvhd', version, 0)
{
    if (version==1) 
    {
        unsigned int(64) creation_time;
        unsigned int(64) modification_time;
        unsigned int(32) timescale;
        unsigned int(64) duration;
    } 
    else 
    { // version==0
        unsigned int(32) creation_time;
        unsigned int(32) modification_time;
        unsigned int(32) timescale;
        unsigned int(32) duration;
    }
    template int(32) rate = 0x00010000; // typically 1.0
    template int(16) volume = 0x0100; // typically, full volume
    const bit(16) reserved = 0;
    const unsigned int(32)[2] reserved = 0;
    // Unity matrix
    template int(32)[9] matrix = { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
    bit(32)[6] pre_defined = 0;
    unsigned int(32) next_track_ID;
}

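1.version: the version of this box, 0 or 1; usually 0

2.creation_time: the creation time, in seconds since midnight, January 1, 1904, UTC

3.modification_time: the time of the most recent modification, in the same units

4.timescale: the number of time units that pass in one second, i.e. how many ticks make up one second

5.duration: the length of the presentation in timescale units; it equals the duration of the longest track. Dividing a duration by its timescale gives seconds. For example, an audio track with timescale = 8000 and duration = 560128 lasts 70.016 seconds, and a video track with timescale = 600 and duration = 42000 lasts 70 seconds

6.rate: the preferred playback rate, a [16.16] fixed-point number whose upper and lower 16 bits are the integer and fractional parts; 1.0 (0x00010000) means normal forward playback

7.volume: the preferred playback volume, a [8.8] fixed-point number analogous to rate; 1.0 (0x0100) is full volume

8.matrix: the transformation matrix for the video

9.next_track_ID: the track ID to use for the next track added to this presentation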
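A sketch of decoding mvhd, assuming payload is the box content after the 8-byte header (starting with the FullBox version and flags). It converts the 1904-based timestamps to datetime, divides duration by timescale as in the example above, and turns the [16.16] rate and [8.8] volume into floats. The offsets follow the field order of the definition; parse_mvhd is a hypothetical name.

```python
import struct
from datetime import datetime, timedelta, timezone

# creation_time/modification_time count seconds from this instant
MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def parse_mvhd(payload: bytes) -> dict:
    """Parse an mvhd payload that starts at the FullBox version byte."""
    version = payload[0]                  # followed by 3 bytes of flags
    if version == 1:
        ctime, mtime, timescale, duration = struct.unpack_from(">QQIQ", payload, 4)
        pos = 4 + 28
    else:
        ctime, mtime, timescale, duration = struct.unpack_from(">IIII", payload, 4)
        pos = 4 + 16
    rate, volume = struct.unpack_from(">ih", payload, pos)
    # skip rate(4) + volume(2) + reserved(2 + 8) + matrix(36) + pre_defined(24)
    (next_track_id,) = struct.unpack_from(">I", payload, pos + 76)
    return {
        "version": version,
        "creation_time": MP4_EPOCH + timedelta(seconds=ctime),
        "modification_time": MP4_EPOCH + timedelta(seconds=mtime),
        "timescale": timescale,
        "duration_seconds": duration / timescale,
        "rate": rate / 0x10000,           # [16.16] fixed point
        "volume": volume / 0x100,         # [8.8] fixed point
        "next_track_ID": next_track_id,
    }
```

For instance, a version 0 box with timescale 600 and duration 42000 yields 70.0 seconds, and a creation_time of 2082844800 corresponds to 1970-01-01 00:00 UTC, the Unix epoch.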

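Let us analyze a sample file:

trak: the trak box is also a container box. Its sub-boxes contain the references to, and descriptions of, the track's media data (except for hint tracks). The media in an MP4 file may consist of several tracks, and there is at least one. The tracks are independent of each other, each with its own temporal and spatial information. A trak box must contain a tkhd box and an mdia box. It is defined in the standard as follows: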

aligned(8) class TrackBox extends Box('trak')
{
}
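Since trak, like moov, is a pure container, a generic recursive walk is enough to visit the box tree. The sketch below descends into a non-exhaustive set of container types taken from the tree described earlier and checks the rule that every trak has a tkhd and an mdia child. CONTAINERS, walk and check_traks are illustrative names; a real parser would need to know about more container types than this set.

```python
import struct

# Container types from the box tree above (not an exhaustive list)
CONTAINERS = {"moov", "trak", "mdia", "minf", "dinf", "stbl"}


def walk(data: bytes, start: int = 0, end=None, depth: int = 0):
    """Yield (depth, type, offset, size) for each box, recursing into containers."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            size = end - offset
        if size < header:
            raise ValueError(f"bad box size {size} at offset {offset}")
        box_type = raw_type.decode("latin-1")
        yield depth, box_type, offset, size
        if box_type in CONTAINERS:
            yield from walk(data, offset + header, offset + size, depth + 1)
        offset += size


def check_traks(data: bytes) -> list:
    """Report every trak that lacks a tkhd or mdia child."""
    boxes = list(walk(data))
    problems = []
    for depth, box_type, offset, size in boxes:
        if box_type != "trak":
            continue
        children = {t for d, t, o, _ in boxes
                    if d == depth + 1 and offset < o < offset + size}
        for required in ("tkhd", "mdia"):
            if required not in children:
                problems.append(f"trak at {offset} lacks {required}")
    return problems
```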

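tkhd: the track header box contains the characteristics and overall information of the track, such as its duration, width and height. It is defined in the standard as follows: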

aligned(8) class TrackHeaderBox extends FullBox('tkhd', version, flags)
{
    if (version==1) 
    {
        unsigned int(64) creation_time;
        unsigned int(64) modification_time;
        unsigned int(32) track_ID;
        const unsigned int(32) reserved = 0;
        unsigned int(64) duration;
    } 
    else 
    { // version==0
        unsigned int(32) creation_time;
        unsigned int(32) modification_time;
        unsigned int(32) track_ID;
        const unsigned int(32) reserved = 0;
        unsigned int(32) duration;
    }
    const unsigned int(32)[2] reserved = 0;
    template int(16) layer = 0;
    template int(16) alternate_group = 0;
    template int(16) volume = {if track_is_audio 0x0100 else 0};
    const unsigned int(16) reserved = 0;
    // unity matrix
    template int(32)[9] matrix= { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
    unsigned int(32) width;
    unsigned int(32) height;
}

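1.creation_time: the creation time (in seconds since midnight, January 1, 1904, UTC)

2.modification_time: the modification time

3.track_ID: the track's ID; it must be unique and cannot be 0

4.reserved: reserved bits

5.duration: the track's duration, in the timescale of the mvhd box

6.reserved: reserved bits

7.layer: the front-to-back order of video tracks; the default is 0, and tracks with lower values are displayed on top

8.alternate_group: the track's group; the default 0 means the track has no group relationship with other tracks

9.volume: the track volume as a [8.8] fixed-point value; 1.0 (0x0100) is full volume for an audio track, otherwise the value is 0

10.reserved: reserved bits

11.matrix: the transformation matrix for the video

12.width: the presentation width, a [16.16] fixed-point value. It need not match the actual picture size given in the sample description; the picture is scaled to this size for display

13.height: the presentation height, in the same format as width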
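A sketch of reading the tkhd fields listed above, assuming payload starts at the FullBox version byte. It skips the reserved fields and the 36-byte matrix and converts the [16.16] width and height to floats; parse_tkhd is a hypothetical helper.

```python
import struct


def parse_tkhd(payload: bytes) -> dict:
    """Parse a tkhd payload that starts at the FullBox version byte."""
    version = payload[0]
    if version == 1:
        # creation, modification, track_ID, reserved, duration
        _, _, track_id, _, duration = struct.unpack_from(">QQIIQ", payload, 4)
        pos = 4 + 32
    else:
        _, _, track_id, _, duration = struct.unpack_from(">IIIII", payload, 4)
        pos = 4 + 20
    # after 8 reserved bytes: layer, alternate_group, volume (+2 reserved)
    layer, alternate_group, volume = struct.unpack_from(">hhh", payload, pos + 8)
    # then the 36-byte matrix, then width and height in [16.16]
    width, height = struct.unpack_from(">II", payload, pos + 16 + 36)
    return {
        "track_ID": track_id,
        "duration": duration,          # in the mvhd timescale
        "layer": layer,
        "alternate_group": alternate_group,
        "volume": volume / 0x100,      # [8.8]
        "width": width / 0x10000,
        "height": height / 0x10000,
    }
```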

mdia: contains all the media information of the track, such as the media type and the sample information. It is defined in the standard as follows:

aligned(8) class MediaBox extends Box('mdia')
{
}

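mdhd: the media header box contains overall information about the track's media. mdhd and tkhd carry much the same kind of information: tkhd sets properties for the track, whereas mdhd describes the media inside it, and the values usually agree. One difference is that mdhd's timescale and duration belong to the media itself and may differ from the movie timescale in mvhd. It is defined in the standard as follows: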

aligned(8) class MediaHeaderBox extends FullBox('mdhd', version, 0)
{
    if (version==1) 
    {
        unsigned int(64) creation_time;
        unsigned int(64) modification_time;
        unsigned int(32) timescale;
        unsigned int(64) duration;
    } 
    else 
    { 
        // version==0
        unsigned int(32) creation_time;
        unsigned int(32) modification_time;
        unsigned int(32) timescale;
        unsigned int(32) duration;
    }
    bit(1) pad = 0;
    unsigned int(5)[3] language; // ISO-639-2/T language code
    unsigned int(16) pre_defined = 0;
}

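1.creation_time: the creation time (in seconds since midnight, January 1, 1904, UTC)

2.modification_time: the modification time

3.timescale: as described for mvhd, but here it is the timescale of this track's media

4.duration: the duration of the track's media, in this box's timescale

5.language: the language code of the media. The top bit is a pad bit set to 0, and the remaining 15 bits hold three 5-bit characters (an ISO 639-2/T code); each 5-bit value is the character's ASCII code minus 0x60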
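The packed language code is easy to get wrong, so here is a sketch that decodes it together with timescale and duration from an mdhd payload (assumed to start at the version byte). For example, 0x15C7 decodes to eng and 0x55C4 to und (undetermined). decode_language and parse_mdhd are hypothetical names.

```python
import struct


def decode_language(code: int) -> str:
    """Three 5-bit letters, each stored as its ASCII code minus 0x60."""
    return "".join(chr(((code >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))


def parse_mdhd(payload: bytes) -> dict:
    """Parse an mdhd payload that starts at the FullBox version byte."""
    version = payload[0]
    if version == 1:
        _, _, timescale, duration = struct.unpack_from(">QQIQ", payload, 4)
        pos = 4 + 28
    else:
        _, _, timescale, duration = struct.unpack_from(">IIII", payload, 4)
        pos = 4 + 16
    (lang,) = struct.unpack_from(">H", payload, pos)   # pad bit + 15 bits
    return {
        "timescale": timescale,
        "duration": duration,
        "seconds": duration / timescale,
        "language": decode_language(lang & 0x7FFF),
    }
```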

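hdlr: the handler reference box declares the type of the media in the track, and thus how it should be presented. This box can also appear inside a meta box. It is defined in the standard as follows: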

aligned(8) class HandlerBox extends FullBox('hdlr', version = 0, 0)
{
    unsigned int(32) pre_defined = 0;
    unsigned int(32) handler_type;
    const unsigned int(32)[3] reserved = 0;
    string name;
}

1.handler_type: inside a media box, a four-character code:

'vide': video track

'soun': audio track

'hint': hint track

2.name: a human-readable name for the track type
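A sketch of reading handler_type and name from an hdlr payload (assumed to start at the version byte), mapping the three codes above to a track kind. It treats name as a null-terminated UTF-8 string, as ISO/IEC 14496-12 specifies; parse_hdlr and HANDLER_KINDS are hypothetical names.

```python
HANDLER_KINDS = {"vide": "video", "soun": "audio", "hint": "hint"}


def parse_hdlr(payload: bytes) -> dict:
    """Parse an hdlr payload that starts at the FullBox version byte."""
    # version/flags (4 bytes) and pre_defined (4 bytes) come first
    handler_type = payload[8:12].decode("latin-1")
    # three reserved uint32s, then a null-terminated UTF-8 name
    name = payload[24:].split(b"\x00", 1)[0].decode("utf-8")
    return {"handler_type": handler_type,
            "kind": HANDLER_KINDS.get(handler_type, "other"),
            "name": name}
```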

minf: the media information box contains all the objects that describe the media in this track; the information is stored in its sub-boxes. It is defined in the standard as:

aligned(8) class MediaInformationBox extends Box('minf')
{
}

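vmhd: the video media header box, used in video tracks. It contains general presentation information about the video, independent of the codec (the codec details are stored in stsd). It is defined in the standard as: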

aligned(8) class VideoMediaHeaderBox extends FullBox('vmhd', version = 0, 1)
{
    template unsigned int(16) graphicsmode = 0; // copy, see below
    template unsigned int(16)[3] opcolor = {0, 0, 0};
}

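1.graphicsmode: the composition mode for this video track; 0 means copy the image as is, other values combine it with opcolor

2.opcolor: three colour values {red, green, blue} used by the graphics mode

smhd: the sound media header box, used in audio tracks. It contains general presentation information about the audio, independent of the codec (the codec details are stored in stsd). It is defined in the standard as: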

aligned(8) class SoundMediaHeaderBox extends FullBox('smhd', version = 0, 0)
{
    template int(16) balance = 0;
    const unsigned int(16) reserved = 0;
}

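1.balance: stereo balance, a [8.8] fixed-point value, normally 0 (centre); -1.0 means fully left and 1.0 fully right

dinf: the data information box declares where the media information is located. It is a container box and usually contains a single dref (data reference) box. It is defined in the standard as follows: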

aligned(8) class DataInformationBox extends Box('dinf')
{
}

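dref: the data reference box contains a table of data entries, 'url ' or 'urn ' boxes, that locate the track's data. Put simply, a track can be split into several parts, each fetched from the location named by a 'url ' or 'urn ' entry; the sample descriptions refer to these entries by index (data_reference_index) to assemble the complete track. Usually, when the data is entirely contained in the file itself, the location string in the 'url ' or 'urn ' entry is empty. These boxes are defined in the standard as follows: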

aligned(8) class DataEntryUrlBox (bit(24) flags) extends FullBox('url ', version = 0, flags)
{
    string location;
}
aligned(8) class DataEntryUrnBox (bit(24) flags) extends FullBox('urn ', version = 0, flags)
{
    string name;
    string location;
}
aligned(8) class DataReferenceBox extends FullBox('dref', version = 0, 0)
{
    unsigned int(32) entry_count;
    for (i=1; i <= entry_count; i++) 
    {
        DataEntryBox(entry_version, entry_flags) data_entry;
    }
}

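1.entry_version: specifies the format of the entry

2.entry_flags: its value is not fixed, but one value is special: 0x000001 means the media data is in the same file as the movie box containing this data reference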
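The following sketch lists the entries of a dref payload (assumed to start at the version byte) and exposes the 0x000001 self-contained flag. It splits each entry's remaining bytes into null-terminated strings, so a 'url ' entry yields [location] and a 'urn ' entry yields its name and location; an empty list means no location string was stored. parse_dref and is_self_contained are hypothetical helpers.

```python
import struct


def parse_dref(payload: bytes) -> list:
    """Return (type, flags, strings) for each entry of a dref payload
    that starts at the FullBox version byte."""
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    entries, offset = [], 8
    for _ in range(entry_count):
        size, raw_type = struct.unpack_from(">I4s", payload, offset)
        # each entry is itself a FullBox: version (1 byte) + flags (3 bytes)
        flags = int.from_bytes(payload[offset + 9:offset + 12], "big")
        body = payload[offset + 12:offset + size]
        strings = [s.decode("utf-8") for s in body.split(b"\x00") if s]
        entries.append((raw_type.decode("latin-1"), flags, strings))
        offset += size
    return entries


def is_self_contained(flags: int) -> bool:
    """Flag 0x000001: media data is in the same file as the movie box."""
    return bool(flags & 0x000001)
```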

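stbl: the sample table box is arguably the most complex box in an ordinary MP4 file. First recall the concept of a sample: a sample is the unit in which media data is stored. Samples are stored in chunks in the media data, and chunks and samples can both vary in length, as shown in the figure below.

'stbl' contains all the timing and location information for the samples in a track, as well as their coding information. Using these tables, you can work out each sample's timing, type and size, and its position in its container. 'stbl' is a container box; its sub-boxes include the sample description box (stsd), time to sample box (stts), sample size box (stsz or stz2), sample to chunk box (stsc), chunk offset box (stco or co64), composition time to sample box (ctts), sync sample box (stss), and others.

'stsd' is mandatory and contains at least one entry. Its entries carry the data reference index used to retrieve sample data through the data reference box; without 'stsd', the storage location of the media samples cannot be computed. 'stsd' also holds the coding information, whose content depends on the media type.

The stbl box is defined in the standard as follows: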

aligned(8) class SampleTableBox extends Box('stbl')
{
}

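stsd: after the box header and the version/flags fields comes an entry_count field, followed by that many sample entries. The layout of each entry depends on the track's handler type ('vide', 'soun', etc.): a video track uses a VisualSampleEntry and an audio track uses an AudioSampleEntry, while the entry's own type is the coding name, such as avc1 or mp4a. The video codec, width and height, and the audio channel count, sample rate and similar information all appear in this box.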

aligned(8) abstract class SampleEntry (unsigned int(32) format) extends Box(format)
{
    const unsigned int(8)[6] reserved = 0;
    unsigned int(16) data_reference_index;
}

aligned(8) class SampleDescriptionBox (unsigned int(32) handler_type) extends FullBox('stsd', version, 0)
{
    int i ;
    unsigned int(32) entry_count;
    for (i = 1 ; i <= entry_count ; i++)
    {
        SampleEntry(); // an instance of a class derived from SampleEntry
    }
}
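A sketch that walks the entries of an stsd payload (assumed to start at the version byte) and reads each entry's coding name and data_reference_index from the SampleEntry header. It stops there; decoding VisualSampleEntry or AudioSampleEntry fields would need their full layouts from the standard. parse_stsd is a hypothetical helper.

```python
import struct


def parse_stsd(payload: bytes) -> list:
    """Return (format, data_reference_index) for each sample entry of an
    stsd payload that starts at the FullBox version byte."""
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    entries, offset = [], 8
    for _ in range(entry_count):
        size, fmt = struct.unpack_from(">I4s", payload, offset)
        # SampleEntry header: 6 reserved bytes, then data_reference_index
        (dref_index,) = struct.unpack_from(">H", payload, offset + 8 + 6)
        entries.append((fmt.decode("latin-1"), dref_index))
        offset += size
    return entries
```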

The stsd of a video track:

          [stsd] size=12+149
            entry-count = 1
            [avc1] size=8+137
              data_reference_index = 1
              width = 720
              height = 576
              compressor = 
              [avcC] size=8+51
                Configuration Version = 1
                Profile = High
                Profile Compatibility = 0
                Level = 40
                NALU Length Size = 4
                Sequence Parameter = [67 64 00 28 ac d1 00 b4 12 6c 08 40 00 00 03 00 40 00 00 0c b8 08 00 16 e3 40 00 5b 8d e4 93 00 f8 c1 88 90]
                Picture Parameter = [68 eb ef 2c]
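The avcC box in the dump above is an AVCDecoderConfigurationRecord (specified in ISO/IEC 14496-15). A sketch of parsing its fixed header and parameter sets follows; the small PROFILES table covers only a few common profile_idc values, and parse_avcc is a hypothetical name. Rebuilding the record from the SPS and PPS shown above, assuming all reserved bits are set to 1, gives exactly 51 bytes, matching size=8+51, with profile 100 (High), level 40 and a NALU length size of 4. The SPS bytes themselves agree: 67 is the NAL header, 64 is profile_idc 100, and after the constraint byte 28 is level_idc 40.

```python
import struct

PROFILES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High"}


def parse_avcc(data: bytes) -> dict:
    """Parse an AVCDecoderConfigurationRecord (the avcC payload)."""
    version, profile, compat, level, b4, b5 = data[:6]
    pos, sps = 6, []
    for _ in range(b5 & 0x1F):              # numOfSequenceParameterSets
        (n,) = struct.unpack_from(">H", data, pos)
        sps.append(data[pos + 2:pos + 2 + n])
        pos += 2 + n
    count, pos, pps = data[pos], pos + 1, []
    for _ in range(count):                  # numOfPictureParameterSets
        (n,) = struct.unpack_from(">H", data, pos)
        pps.append(data[pos + 2:pos + 2 + n])
        pos += 2 + n
    return {
        "version": version,
        "profile": PROFILES.get(profile, str(profile)),
        "compatibility": compat,
        "level": level,
        "nalu_length_size": (b4 & 0x03) + 1,  # lengthSizeMinusOne + 1
        "sps": sps,
        "pps": pps,
    }
```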

The stsd of an audio track:

          [stsd] size=12+79
            entry-count = 1
            [mp4a] size=8+67
              data_reference_index = 1
              channel_count = 2
              sample_size = 16
              sample_rate = 48000
              [esds] size=12+27
                [ESDescriptor] size=2+25
                  es_id = 0
                  stream_priority = 31
                  [DecoderConfig] size=2+17
                    stream_type = 5
                    object_type = 64
                    up_stream = 0
                    buffer_size = 531
                    max_bitrate = 129336
                    avg_bitrate = 125368
                    DecoderSpecificInfo = 11 90 
                  [Descriptor:06] size=2+1
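In the audio dump, object_type = 64 (0x40) means MPEG-4 Audio, and the two DecoderSpecificInfo bytes 11 90 are an AudioSpecificConfig as defined in ISO/IEC 14496-3. The sketch below decodes only its first three fields and rejects the escape values (object type 31, frequency index 15), which need more bits; parse_audio_specific_config is a hypothetical name. For 11 90 it gives audio object type 2 (AAC LC), sampling frequency index 3 (48000 Hz) and channel configuration 2, consistent with sample_rate and channel_count above.

```python
# Index -> sampling rate table for samplingFrequencyIndex 0..12
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_audio_specific_config(asc: bytes) -> dict:
    """Decode the leading fields of an AudioSpecificConfig (no escape codes)."""
    bits = int.from_bytes(asc[:2], "big")
    object_type = bits >> 11              # 5 bits: audioObjectType
    freq_index = (bits >> 7) & 0x0F       # 4 bits: samplingFrequencyIndex
    channels = (bits >> 3) & 0x0F         # 4 bits: channelConfiguration
    if object_type == 31 or freq_index >= len(SAMPLE_RATES):
        raise ValueError("escape or reserved values are not handled here")
    return {"object_type": object_type,
            "sample_rate": SAMPLE_RATES[freq_index],
            "channels": channels}


print(parse_audio_specific_config(bytes.fromhex("11 90")))
# {'object_type': 2, 'sample_rate': 48000, 'channels': 2}
```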

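ctts: the composition time to sample box gives the offset between each sample's decoding time and its composition (display) time. It is needed when frames are decoded in a different order from the one in which they are displayed, e.g. with B-frames; the composition time is CT(n) = DT(n) + offset(n). For more about its role, see http://blog.csdn.net/w839687571/article/details/41725811. It is defined in the standard as: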

aligned(8) class CompositionOffsetBox extends FullBox('ctts', version, 0)
{
    unsigned int(32) entry_count;
    int i;
    if (version==0) 
    {
        for (i=0; i < entry_count; i++) 
        {
            unsigned int(32) sample_count;
            unsigned int(32) sample_offset;
        }
    }
    else if (version == 1) 
    {
        for (i=0; i < entry_count; i++) 
        {
            unsigned int(32) sample_count;
            signed int(32) sample_offset;
        }
    }
}
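Like stts, ctts is run-length coded. Here is a sketch of expanding it and applying CT(n) = DT(n) + offset(n), with the decoding times assumed to come from stts; composition_times is a hypothetical helper and the I/P/B numbers below are a made-up example. Version 1 allows negative offsets, which this code handles unchanged because Python integers are signed.

```python
def composition_times(decode_times, ctts_entries):
    """Apply CT(n) = DT(n) + offset(n), expanding run-length ctts entries."""
    offsets = []
    for sample_count, sample_offset in ctts_entries:
        offsets.extend([sample_offset] * sample_count)
    if len(offsets) != len(decode_times):
        raise ValueError("ctts must cover every sample exactly once")
    return [dt + off for dt, off in zip(decode_times, offsets)]


# Hypothetical stream stored as I, P, B, B with a sample delta of 512
cts = composition_times([0, 512, 1024, 1536], [(1, 512), (1, 1536), (2, 0)])
print(cts)  # [512, 2048, 1024, 1536]
```

Sorting by these composition times puts the frames in display order I, B, B, P, even though they are stored and decoded as I, P, B, B.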

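stts: the time to sample box stores the duration of each sample and describes how decoding times map to samples, so the sample at any given time can be found. It holds a compact, run-length table mapping time to sample number, while other tables give the size and location of each sample. Each entry gives a count of consecutive samples that share the same duration (sample_delta), together with that delta. Accumulating these deltas yields a complete time-to-sample map (from timestamps to sample numbers). It is defined in the standard as: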

aligned(8) class TimeToSampleBox extends FullBox('stts', version = 0, 0)
{
    unsigned int(32) entry_count;
    int i;
    for (i=0; i < entry_count; i++) 
    {
        unsigned int(32) sample_count;
        unsigned int(32) sample_delta;
    }
}
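A sketch of expanding stts into per-sample decoding times, and of the reverse lookup from a time (in media timescale units) to a 1-based sample number; decode_times and sample_at_time are hypothetical helper names.

```python
def decode_times(stts_entries):
    """Expand (sample_count, sample_delta) runs into per-sample decoding times."""
    times, t = [], 0
    for count, delta in stts_entries:
        for _ in range(count):
            times.append(t)
            t += delta
    return times


def sample_at_time(stts_entries, t):
    """1-based number of the sample whose decoding interval contains time t."""
    if t < 0:
        raise ValueError("time must be non-negative")
    sample, start = 1, 0
    for count, delta in stts_entries:
        end = start + count * delta
        if t < end:
            return sample + (t - start) // delta
        sample, start = sample + count, end
    raise ValueError("time is beyond the last sample")
```

With entries [(3, 512), (2, 1024)], the decoding times are [0, 512, 1024, 1536, 2560], and time 2600 falls in sample 5.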

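stsz: the sample size box gives the size of each sample: it contains the total number of samples in the media and a table with the size of each one. If all samples have the same size, sample_size holds that size and no table follows; otherwise sample_size is 0 and a per-sample entry_size table follows. This box is usually fairly large. It is defined in the standard as: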

aligned(8) class SampleSizeBox extends FullBox('stsz', version = 0, 0)
{
    unsigned int(32) sample_size;
    unsigned int(32) sample_count;
    if (sample_size==0) 
    {
        for (i=1; i <= sample_count; i++) 
        {
            unsigned int(32) entry_size;
        }
    }
}
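Reading stsz follows the two cases above directly. This sketch assumes payload starts at the version byte; parse_stsz is a hypothetical name.

```python
import struct


def parse_stsz(payload: bytes) -> list:
    """Per-sample sizes from an stsz payload starting at the version byte."""
    sample_size, sample_count = struct.unpack_from(">II", payload, 4)
    if sample_size != 0:
        # every sample has the same size; no table follows
        return [sample_size] * sample_count
    # otherwise one 32-bit entry_size per sample
    return list(struct.unpack_from(f">{sample_count}I", payload, 12))
```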

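stsc: grouping samples into chunks makes data access easier to optimize; a chunk contains one or more samples. 'stsc' holds a table describing how samples map to chunks, so you can look up which chunk contains a given sample and from there find the sample itself. Each entry gives the first chunk of a run of chunks that share the same samples_per_chunk and sample_description_index; the run lasts until the first_chunk of the next entry. It is defined in the standard as: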

aligned(8) class SampleToChunkBox extends FullBox('stsc', version = 0, 0)
{
    unsigned int(32) entry_count;
    for (i=1; i <= entry_count; i++) 
    {
        unsigned int(32) first_chunk;
        unsigned int(32) samples_per_chunk;
        unsigned int(32) sample_description_index;
    }
}
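Resolving a sample to its chunk needs the total number of chunks (the entry_count of stco/co64) to know where the last run ends. A sketch, with sample and chunk numbers 1-based as in the standard; chunk_of_sample is a hypothetical helper.

```python
def chunk_of_sample(stsc_entries, total_chunks, sample_number):
    """Map a 1-based sample number to
    (chunk_number, index_within_chunk, sample_description_index)."""
    first_sample = 1
    for i, (first_chunk, per_chunk, desc_index) in enumerate(stsc_entries):
        # a run of chunks lasts until the next entry's first_chunk
        if i + 1 < len(stsc_entries):
            next_first = stsc_entries[i + 1][0]
        else:
            next_first = total_chunks + 1
        run_samples = (next_first - first_chunk) * per_chunk
        if sample_number < first_sample + run_samples:
            k = sample_number - first_sample
            return first_chunk + k // per_chunk, k % per_chunk, desc_index
        first_sample += run_samples
    raise ValueError("sample number out of range")
```

For entries [(1, 3, 1), (3, 1, 1)] and 4 chunks, chunks 1 and 2 hold three samples each and chunks 3 and 4 one each, so sample 5 is the second sample of chunk 2 and sample 7 is the only sample of chunk 3.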

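stss: the sync sample box identifies the key frames in the media. For compressed media, a key frame starts a compressed sequence: it can be decoded without depending on earlier frames, and later frames depend on it for decoding. 'stss' marks the random access points in the media very compactly: it contains a table of sample numbers (starting at 1) in strictly increasing order, each identifying a sample that is a key frame. If this box is absent, every sample is a key frame and a random access point. It is defined in the standard as: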

aligned(8) class SyncSampleBox extends FullBox('stss', version = 0, 0)
{
    unsigned int(32) entry_count;
    int i;
    for (i=0; i < entry_count; i++) 
    {
        unsigned int(32) sample_number;
    }
}
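Because the stss table is sorted, sync-sample checks and seeking can use binary search. A sketch, where None stands for an absent stss box (every sample is then a sync sample); seeking here means jumping back to the nearest preceding key frame, one common strategy rather than the only one. is_sync and seek_sample are hypothetical names.

```python
import bisect


def is_sync(stss, sample_number):
    """stss is the sorted sample_number list, or None if the box is absent."""
    if stss is None:
        return True                      # no stss: every sample is a sync sample
    i = bisect.bisect_left(stss, sample_number)
    return i < len(stss) and stss[i] == sample_number


def seek_sample(stss, sample_number):
    """Nearest sync sample at or before sample_number."""
    if stss is None:
        return sample_number
    i = bisect.bisect_right(stss, sample_number)
    if i == 0:
        raise ValueError("no sync sample at or before this sample")
    return stss[i - 1]
```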

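stco: the chunk offset box gives the position of each chunk in the file; the offset of each sample can then be derived from the other boxes. Offsets come in two sizes, 32-bit (stco) and 64-bit (co64); the latter is useful for very large movies. A given table uses only one of the two. The offsets are relative to the start of the file rather than to any box, so media data can be located directly in the file without interpreting boxes. Note that whenever anything in front of the media data changes, this table must be rebuilt, because the positions have changed. Both forms are defined in the standard as: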

aligned(8) class ChunkOffsetBox extends FullBox('stco', version = 0, 0)
{
    unsigned int(32) entry_count;
    for (i=1; i <= entry_count; i++) 
    {
        unsigned int(32) chunk_offset;
    }
}
aligned(8) class ChunkLargeOffsetBox extends FullBox('co64', version = 0, 0)
{
    unsigned int(32) entry_count;
    for (i=1; i <= entry_count; i++) 
    {
        unsigned int(64) chunk_offset;
    }
}
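Putting stsc, stco/co64 and stsz together gives a sample's absolute file offset: find the sample's chunk, take that chunk's offset, then add the sizes of the samples that precede it in the same chunk. A sketch under the assumption that the tables are already parsed into Python lists (sample sizes in sample order, 1-based numbering); parse_chunk_offsets and sample_file_offset are hypothetical names.

```python
import struct


def parse_chunk_offsets(box_type: str, payload: bytes) -> list:
    """Chunk offsets from an stco (32-bit) or co64 (64-bit) payload."""
    (count,) = struct.unpack_from(">I", payload, 4)
    code = "I" if box_type == "stco" else "Q"
    return list(struct.unpack_from(f">{count}{code}", payload, 8))


def sample_file_offset(sample_number, stsc_entries, chunk_offsets, sample_sizes):
    """Absolute file offset of a 1-based sample, from stsc + stco/co64 + stsz."""
    first_sample = 1
    for i, (first_chunk, per_chunk, _) in enumerate(stsc_entries):
        if i + 1 < len(stsc_entries):
            next_first = stsc_entries[i + 1][0]
        else:
            next_first = len(chunk_offsets) + 1
        run_samples = (next_first - first_chunk) * per_chunk
        if sample_number < first_sample + run_samples:
            k = sample_number - first_sample
            chunk = first_chunk + k // per_chunk
            first_in_chunk = sample_number - k % per_chunk
            # chunk start plus the sizes of earlier samples in the same chunk
            before = sum(sample_sizes[first_in_chunk - 1:sample_number - 1])
            return chunk_offsets[chunk - 1] + before
        first_sample += run_samples
    raise ValueError("sample number out of range")
```

With chunk offsets [1000, 5000], a single stsc entry (1, 2, 1) and sample sizes [100, 200, 300, 400], sample 2 starts at 1100 and sample 4 at 5300.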