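Speex in Detail (updated September 25, 2019)

Compiled by: 赤勇玄心行天道

QQ: 280604597

WeChat: qq280604597

QQ group: 511046632

Blog: www.cnblogs.com/gaoyaguo

If anything here is unclear, or you would like more detail on a topic, feel free to contact me; I will reply carefully.

You may repost this freely, and no attribution is required.

Writing documentation takes real effort. If you would like to support me, donations of any amount are welcome, even 1 yuan, and I will keep helping people solve their problems.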

  1. Introduction

Speex is an Open Source/Free Software patent-free audio compression format designed for speech. The Speex Project aims to lower the barrier of entry for voice applications by providing a free alternative to expensive proprietary speech codecs. Moreover, Speex is well-adapted to Internet applications and provides useful features that are not present in most other codecs. Finally, Speex is part of the GNU Project and is available under the revised BSD license.

Speex is pronounced [spi:ks].

 

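Official Speex website: http://www.speex.org/

Official Speex API reference (English): http://www.speex.org/docs/api/speex-api-reference/index.html

NSpeex (a Speex library for .NET and Silverlight) website: http://nspeex.codeplex.com/

The latest releases are Speex 1.2.0 and SpeexDSP 1.2.0.

Note: the Speex codec has been superseded by the Opus codec. Speex can still be used, but because Opus outperforms it in every respect, users are encouraged to switch to Opus. Keep in mind that Opus only encodes and decodes; it does not provide noise suppression, acoustic echo cancellation, or the other processing functions.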

  1. History

    1. SpeexDSP 1.2.0 out

June 7, 2019

 

This is the latest stable release of the SpeexDSP library.

  1. Speex 1.2.0 out

December 7, 2016

 

This is the latest stable release of the Speex 1.2.0 codec library.

  1. SpeexDSP 1.2rc3 is out

January 3, 2015

 

This brown-paper-bag release adds two headers that should have been included with SpeexDSP 1.2rc2. These are needed to build the resampler with NEON optimizations and to build SpeexDSP without the Speex codec library.

  1. Speex 1.2rc2 and SpeexDSP 1.2rc2 are out

December 6, 2014

 

This release splits the speex codec library and the speex DSP library into separate source trees. Both projects received build-system improvements, bugfixes, and cleanup. The speex codec's VBR tuning was improved, while the speexdsp resampler got some NEON optimizations.

  1. Speex 1.2rc1 is out

July 23, 2008

 

This release adds support for acoustic echo cancellation with multiple microphones and multiple loudspeakers. It also adds an API to decorrelate loudspeaker signals to improve multi-channel performance. In the bugfix department, there are fixes for a few bugs in the echo canceller, jitter buffer and preprocessor. At this point, the API for 1.2 should be stable and only a few very minor additions are planned.

  1. Speex 1.2beta3 is out

December 11, 2007

 

The most obvious change in this release is that all the non-codec components (preprocessor, echo cancellation, jitter buffer) have been moved to a new libspeexdsp library. Other changes include a new jitter buffer algorithm and resampler improvements/fixes. This is also the first release where libspeex can be built without any floating point support. To do this, the float compatibility API must be disabled (--disable-float-api or DISABLE_FLOAT_API) and the VBR feature must be disabled (--disable-vbr or DISABLE_VBR).

  1. Speex 1.2beta2: Fixed-point improvements and more

May 24, 2007

 

Again, this new release brings many improvements. The RAM requirement for wideband has gone down drastically (i.e. more than 2x). A new resampler module has been added, providing arbitrary sampling rate conversion -- fast. The echo canceller has also been improved. A bug in 1.2beta1 that made the echo canceller unstable has been fixed. The echo canceller should now converge faster, be robust and tolerant of incorrect capture-playback synchronisation. The preprocessor has also been greatly improved. Not only should the quality be better, but it is now fully converted to fixed-point. At last, early TriMedia support (incomplete) has been merged.

  1. Speex 1.2beta1: Better, smaller, faster and more

September 4, 2006

 

This new release brings many significant improvements. The quality has been improved, both at the encoder level and the decoder level. These include enhancer improvements (now on by default), input/output high-pass filters, as well as fixing minor regressions in previous 1.1.x releases. A strange and rare instability problem with pure sinusoids has also been fixed. On top of that, memory use has been greatly reduced, especially for fixed-point and narrowband. The fixed-point narrowband encoder+decoder memory use has been cut by more than half, making it possible to fit both in less than 6 kB of RAM. In general, CPU requirement had gone down, especially for the fixed-point port. The Blackfin port has been speeded up significantly, thanks to David Rowe. There are also a few fixes for the TI C5X DSPs, as well as better support for C++ compilers and crappy MS compilers. Oh, and before anyone starts worrying, the format (bit-stream) itself has not changed, so Speex is still compatible with version 1.0 and will continue to be in the future.

 

Non-codec improvements include an extension (easier to use) to the echo canceller API and a Speex-independent version of the jitter buffer. The echo canceller should also be more robust to saturation in the capture path. Last, but not least, the documentation has been updated.

  1. How the Echo Canceller works

August 7, 2006

 

Always wanted to know how the echo canceller works? On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk has just been accepted for IEEE Trans. on Audio, Speech and Language Processing.

  1. New website design

August 6, 2006

 

We have a new website design. There may be a few problems, so please report them and be patient.

  1. On How Speex Works

March 19, 2006

 

Interested in how Speex works? Have a look at "Improved Noise Weighting in CELP Coding of Speech - Applying the Vorbis Psychoacoustic Model To Speex", authored by Jean-Marc Valin and Christopher Montgomery.

  1. Speex 1.1.12 Released

February 19, 2006

 

New things:

 

echo canceller converted to fixed-point (sponsored by Analog Devices)

Improvements to the experimental Vorbis-based masking model (use --enable-vorbis-psy as an argument to the configure script)

several bug fixes

  1. Speex 1.1.11.1 Released

December 2, 2005

 

This is a brown-paper-bag release fixing a pretty bad bug that affected the fixed-point port in 1.1.11. Architectures that use float were not affected at all. Architectures that use fixed-point had a big drop in audio quality. Only version 1.1.11 is affected. Sorry about the inconvenience.

  1. Speex 1.1.11 Released

November 20, 2005

 

This release includes lots of bug fixes. These include SSE, fixed-point and Blackfin. Also, the echo canceller and packet loss concealment have been improved.

  1. Speex 1.1.10 Released

June 11, 2005

 

The main improvement in this release is a Blackfin port funded by Analog Devices. This includes Blackfin assembly optimizations that reduce cpu time by a factor of two. Also, the packet loss concealment code has now been converted to fixed-point and some of bugs for 16-bit architectures were fixed.

  1. Speex 1.1.9 Released

June 1, 2005

 

The main improvement in this release is that the acoustic echo canceller is finally usable. This work has been sponsored by Tipic Inc. Also, several bugs have been fixed for the TI C5x port.

  1. Speex 1.1.8 Released

May 7, 2005

 

Lots of changes in this release. Initial TI C5x port, some fixed-point improvements and fixes, better temporary memory allocation (smaller), and the size of integer types are now detected automatically.

 

There is also a new SPEEX_PLC_TUNING option.

  1. Speex 1.0.5 Released

May 6, 2005

 

The main change with this release is that it includes API additions from the 1.1.x branch (while being backward compatible), so that transition from 1.0.x to 1.1.x can be made easier.

  1. Speex 1.1.7 Released

March 2, 2005

 

The changes for this release are very broad and include generic optimizations in the encoder, ARM-specific optimizations (gcc inline assembly), optional shortcuts in the encoder sacrificing quality for speed, fixed-point improvements (perceptual enhancement converted), reduction in memory usage, the Symbian code now uses the same API, and several bug fixes.

  1. New Specialized libs Released

November 18, 2004

 

libspeex_emce.lib is an x86 emulator build (with debug information)

libspeex_armce.lib is an ARMV4 release build

  1. Speex 1.1.6 Released

July 28, 2004

 

There are seven changes in this release.

 

Improved jitter buffer (now actually works)

Denoiser tuning

Improved echo canceller (please send feedback)

Support for Symbian OS

Gapless playback for speexenc/speexdec

Run-time identification of Speex version with a new speex_lib_ctl() call

Moved the includes to /usr/include/speex/

  1. Speex 1.0.4 Released

July 21, 2004

 

There are three changes in this release.

 

Headers are now in /usr/include/speex/ (but a copy is still in /usr/include for compat reasons).

Pseudo-gapless playback (i.e. playback has the same number of samples)

Fixed a potential bug (unconfirmed) that might cause a segfault in special circumstances.

  1. Speex 1.1.5 Released

April 21, 2004

 

The main change in this release is that the 1.1.5 API and ABI are now compatible with 1.0.x. The versions of the functions taking a short* now have an "_int" suffix, as in speex_encode_int().

  1. Speex 1.1.4 Released

January 20, 2004

 

Happy Belated New Year. This release has minor fixed-point improvements and a code cleanup. The SSE code has been converted from inline assembly to SSE intrinsics, so it should now work on win32. More functions have been written to use SSE.

  1. Speex 1.1.3 Released

December 2, 2003

 

This unstable release brings more improvements to the fixed-point port. Many new functions have been converted and most modes now work in real-time. I encourage everyone to test this code by compiling with --enable-fixed-point and --enable-fixed-point-debug and report any error messages and send the (smallest possible) file which reproduces the problem.

  1. Speex 1.0.3 Released

November 19, 2003

 

In this bugfix release: a fix for a multithreading bug and a correction for an underflow problem that could slow decoding dramatically on x86 processors.

  1. Speex 1.1.2 Released

November 11, 2003

 

This new unstable release improves on the fixed-point port started in 1.1.1. The port is not yet complete, but many modes are now usable in real-time on ARM processors. The fixed-point version is enabled with --enable-fixed-point and ARM-specific optimizations can be enabled with --enable-arm-asm.

  1. Speex 1.1.1 Released

November 1, 2003

 

This release adds a partial fixed-point port which can be enabled using the --enable-fixed-point option at configure time. Not all floating-point operations have been converted yet, but all the code should work.

  1. Speex 1.0.2 Released

September 24, 2003

 

Just a bugfix release. This update adds soundcard support for Solaris and the BSDs as well as minor bugfixes and a documentation update.

  1. Features

Speex is based on CELP (Code-Excited Linear Prediction) and is designed to compress voice at bitrates ranging from 2 to 44 kbps. Some of Speex's features include:

  • Encoding and decoding in three band modes only: 8000 Hz narrowband, 16000 Hz wideband, and 32000 Hz ultra-wideband. Other sampling rates are not supported.
  • Mono only; there is no general multichannel support (stereo is available only through the intensity stereo feature listed below).
  • It only processes audio data. It does not handle audio input or output, that is, recording and playback.
  • Intensity stereo encoding.
  • Packet loss concealment (PLC).
  • Constant bit rate (CBR).
  • Variable bit rate (VBR).
  • Average bit rate (ABR).
  • Discontinuous transmission (DTX).
  • Fixed-point implementation.
  • Floating-point implementation.
  • Acoustic echo cancellation (AEC).
  • Residual echo cancellation (REC).
  • Noise suppression (NS).
  • Dereverberation (Dereverb).
  • Automatic gain control (AGC).
  • Voice activity detection (VAD).
  • Multi-rate operation.
  • Embedded bitstreams.
  • Resampling.

 

Codec Quality Comparison

Warning: these are machine-generated results (not from real listeners) and hence should be taken with a grain of salt.

 

Codec Feature Comparison

| Codec | Rate (kHz) | Bitrate (kbps) | Delay, frame+lookahead (ms) | Multi-rate | Embedded | VBR | PLC | Bit-robust | License |
|---|---|---|---|---|---|---|---|---|---|
| Speex | 8, 16, 32 | 2.15-24.6 (NB), 4-44.2 (WB) | 20+10 (NB), 20+14 (WB) | yes | yes | yes | yes | | open-source/free software |
| iLBC | 8 | 15.2 or 13.3 | 20+5 or 30+10 | | | | yes | | no charge, but not open-source |
| AMR-NB | 8 | 4.75-12.2 | 20+5? | yes | | | yes | yes | proprietary |
| AMR-WB (G.722.2) | 16 | 6.6-23.85 | 20+5? | yes | | | yes | yes | proprietary |
| G.722.1 (Siren7) | 16 | (16) 24, 32 | 20+20 | yes | | | yes | yes | no charge, but not open-source |
| G.729 | 8 | 8 | 10+5 | | | | yes | yes | proprietary |
| GSM-FR | 8 | 13 | 20+? | | | | ? | ? | patented? |
| GSM-EFR | 8 | 12.2 | 20+? | | | | yes | yes | proprietary |
| G.723.1 | 8 | 5.3, 6.3 | 37.5 | | | | yes | ? | proprietary |
| G.728 | 8 | 16 | 0.625 | | | | | | proprietary |
| G.722 | 16 | 48, 56, 64 | ? | | yes | | | ? | ? |

 

Definitions

multi-rate: allows the codec to change bitrate dynamically, at any moment.

embedded: a codec that embeds narrowband bitstreams in wideband bitstreams.

VBR: variable bitrate.

PLC: packet loss concealment.

bit-robust: robust to corruption at the bit level, as found on wireless networks.

Special Features

Speex: Speex supports intensity stereo encoding and 32 kHz sampling.

iLBC: iLBC frames are encoded completely independently; while this provides better quality when 10% (or more) of the packets are being dropped, it makes the codec suboptimal for clean line conditions.

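  1. Download

    1. speex-1.2.0

speex-1.2.0 contains the following projects:

  1. libspeex: the libspeex static library, containing the encoding and decoding functions.
  2. speexenc: the Speex encoder console program. Its input is a raw PCM or WAVE file and its output is an .spx file (Speex data in an Ogg container). It prints help for its command-line options. Depends on libogg.
  3. speexdec: the Speex decoder console program. Its input is an .spx file (Speex data in an Ogg container) and its output is a raw PCM or WAVE file. It prints help for its command-line options. Depends on libogg.
  4. testenc: narrowband encoding test.
  5. testenc_wb: wideband encoding test.
  6. testenc_uwb: ultra-wideband encoding test.
  1. speexdsp-1.2.0

speexdsp-1.2.0 contains the following projects:

  1. libspeexdsp: the libspeexdsp static library, containing the preprocessor, acoustic echo canceller, resampler, adaptive jitter buffer, and related functions.
  2. testdenoise: noise suppression test.
  3. testecho: acoustic echo cancellation test.
  4. testresample: resampler test.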
  1. Building

    1. Building libspeex with Visual Studio

      1. Visual Studio + speex-1.2.0
      2. Simply build speex-1.2.0.
    2. Building speexenc and speexdec with Visual Studio

      1. Download libogg from: http://www.linuxfromscratch.org/blfs/view/svn/multimedia/libogg.html
      2. Open libogg-1.3.2\win32\VS2010\libogg_static.sln.
      3. Build the libogg_static project.
      4. Create an ogg folder under speex-1.2.0\include.
      5. Copy ogg.h and os_types.h from libogg-1.3.2\include\ogg into speex-1.2.0\include\ogg.
      6. Create a lib folder under speex-1.2.0.
      7. Copy libogg_static.lib from libogg-1.3.2\win32\VS2010\Win32\Debug into speex-1.2.0\lib.
      8. Open speex-1.2.0\win32\VS2008\libspeex.sln.
      9. Add speex-1.2.0\lib\libogg_static.lib to the speexenc and speexdec projects.
      10. Build the speexenc and speexdec projects.
      11. The resulting speexenc.exe and speexdec.exe are in speex-1.2.0\win32\VS2008\Debug.
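  2. Encoding workflow

Compressing audio with the Speex API involves the following steps:

  • Declare a SpeexBits structure, vSpeexBits, to hold the Speex bitstream, and a pointer to the encoder state, enc.
  • Call speex_bits_init( &vSpeexBits ) to initialize vSpeexBits.
  • Call enc = speex_encoder_init( &speex_nb_mode ) to create the encoder. speex_nb_mode is a SpeexMode variable that selects narrowband mode; speex_wb_mode selects wideband mode and speex_uwb_mode selects ultra-wideband mode.
  • Call int speex_encoder_ctl( void * state, int request, void * ptr ) to query or set encoder parameters. state is the encoder state; request identifies the parameter, for example SPEEX_GET_FRAME_SIZE reads the frame size (in samples) and SPEEX_SET_QUALITY sets the encoding quality level; ptr points to the value to set, or to the variable that receives the value.
  • After initialization, process each frame as follows: call speex_bits_reset( &vSpeexBits ) to reset vSpeexBits, then call speex_encode( enc, input_frame, &vSpeexBits ) to encode the frame into vSpeexBits, and finally call speex_bits_write( &vSpeexBits, bytes, max_len ) to copy the encoded frame out of vSpeexBits into the byte buffer bytes. The return value is the number of bytes written.
  • When encoding is finished, call speex_bits_destroy( &vSpeexBits ) and speex_encoder_destroy( enc ) to release the SpeexBits structure and the encoder.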
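To size the buffer passed to speex_bits_write, it helps to know roughly how many bytes one frame occupies. The sketch below is plain arithmetic and does not call libspeex. It assumes a 160-sample narrowband frame at 8000 Hz and the 15 kbps figure that the sample code in this document gives for quality 8; the function names are hypothetical. In a real program the frame size should be read with SPEEX_GET_FRAME_SIZE rather than hard-coded.

```c
#include <assert.h>

/* Duration of one frame in milliseconds for a given sampling rate. */
double frame_duration_ms(int frame_size, int sample_rate_hz)
{
    assert(frame_size > 0 && sample_rate_hz > 0);
    return 1000.0 * frame_size / sample_rate_hz;
}

/* Bytes needed for one frame at a constant bitrate, rounded up to whole bytes. */
int bytes_per_frame(int bitrate_bps, int frame_size, int sample_rate_hz)
{
    assert(bitrate_bps > 0 && frame_size > 0 && sample_rate_hz > 0);
    /* bits per frame = bitrate * frame_size / sample_rate, rounded up */
    long bits = ((long)bitrate_bps * frame_size + sample_rate_hz - 1) / sample_rate_hz;
    return (int)((bits + 7) / 8);
}
```

With these assumptions, frame_duration_ms(160, 8000) gives 20 ms and bytes_per_frame(15000, 160, 8000) gives 38, which agrees with the 38-byte frames reported in the reposted example later in this document.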
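  1. Decoding workflow

Decoding Speex-encoded audio frames involves the following steps:

  • Declare a SpeexBits variable, bits, and a pointer to the decoder state, dec.
  • Call speex_bits_init( &bits ) to initialize bits.
  • Call dec = speex_decoder_init( &speex_nb_mode ) to create the decoder.
  • Call speex_decoder_ctl( void * state, int request, void * ptr ) to query or set decoder parameters.
  • For each frame, load the encoded bytes into bits with speex_bits_read_from( &bits, bytes, len ), then call speex_decode( dec, &bits, out ) to decode the frame; out is a float array that receives the decoded samples.
  • Call speex_bits_destroy( &bits ) and speex_decoder_destroy( dec ) to release the SpeexBits structure and the decoder.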
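Because speex_decode writes float samples, they usually have to be converted back to 16-bit PCM before playback or writing to a file. The sketch below shows one way to do that with rounding and saturation. It assumes the decoded floats use the same numeric range as 16-bit samples (-32768 to 32767); the helper names are hypothetical and not part of libspeex. Calling speex_decode_int instead avoids the conversion altogether.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one float sample to int16_t, rounding to nearest and
   saturating instead of wrapping around on overflow. */
int16_t float_to_pcm16(float sample)
{
    if (sample != sample)    return 0;          /* NaN maps to silence */
    if (sample >= 32767.0f)  return INT16_MAX;
    if (sample <= -32768.0f) return INT16_MIN;
    /* Round half away from zero without needing <math.h>. */
    return (int16_t)(sample >= 0.0f ? sample + 0.5f : sample - 0.5f);
}

/* Convert a whole decoded frame of n samples. */
void frame_to_pcm16(const float *in, int16_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = float_to_pcm16(in[i]);
}
```

Saturating matters mostly for loud passages, where a plain cast of an out-of-range float would be undefined behavior in C.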

 

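The following example encodes a raw 16-bit PCM file:

#include <speex/speex.h>
#include <stdio.h>

/* The frame size is hard-coded in this example, but it does not have to be. */
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
    char * inFile;
    FILE * fin;
    short in[FRAME_SIZE];
    float input[FRAME_SIZE];
    char cbits[200];
    int nbBytes;
    void * state;   /* Holds the encoder state */
    SpeexBits bits; /* Holds the bits so they can be read and written by the Speex routines */
    int i, tmp;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <raw 16-bit PCM file>\n", argv[0]);
        return 1;
    }
    inFile = argv[1];
    /* Open in binary mode so the samples are not altered on Windows */
    fin = fopen(inFile, "rb");
    if (fin == NULL)
    {
        perror(inFile);
        return 1;
    }

    /* Create a new encoder state in narrowband mode */
    state = speex_encoder_init(&speex_nb_mode);
    /* Set the quality to 8 (15 kbps) */
    tmp = 8;
    speex_encoder_ctl(state, SPEEX_SET_QUALITY, &tmp);
    /* Initialize the structure that holds the bits */
    speex_bits_init(&bits);
    while (1)
    {
        /* Read one frame of 16-bit samples; stop at end of file or on a short read */
        if (fread(in, sizeof(short), FRAME_SIZE, fin) != FRAME_SIZE)
            break;
        /* Copy the 16-bit values to float so that Speex can work on them */
        for (i = 0; i < FRAME_SIZE; i++)
            input[i] = in[i];
        /* Flush all the bits in the struct so we can encode a new frame */
        speex_bits_reset(&bits);
        /* Encode the frame */
        speex_encode(state, input, &bits);
        /* Copy the bits to an array of char that can be written */
        nbBytes = speex_bits_write(&bits, cbits, 200);
        /* Write the size of the frame first. This is what sampledec expects, but your application will probably differ */
        fwrite(&nbBytes, sizeof(int), 1, stdout);
        /* Write the compressed data */
        fwrite(cbits, 1, nbBytes, stdout);
    }

    /* Destroy the encoder state */
    speex_encoder_destroy(state);
    /* Destroy the bit-packing struct */
    speex_bits_destroy(&bits);
    fclose(fin);
    return 0;
}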

 

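  1. Compressing a voice stream with Speex (repost)

Original article: https://blog.csdn.net/u011473714/article/details/47010445

I recently needed to build a UDP-based real-time voice chat application, and I chose Speex to compress the voice stream.

Speex is an open-source, free, patent-free audio compression format designed mainly for speech. The Speex project aims to lower the barrier to entry for voice applications by providing a free alternative to expensive proprietary speech codecs. Compared with other codecs, Speex is also well suited to network applications, where it has some distinctive advantages. Speex is part of the GNU Project and is available under the revised BSD license.

I then read through the Speex manual and API documentation and wrote a simple example.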

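1. A brief introduction to the Speex API

1. Encoding:

a) Declare a SpeexBits variable, ebits, and an encoder state pointer, enc_state.

b) Call speex_bits_init(&ebits) to initialize ebits.

c) Call speex_encoder_init(&speex_nb_mode) to initialize enc_state. speex_nb_mode is a SpeexMode variable that selects narrowband mode; speex_wb_mode selects wideband mode and speex_uwb_mode selects ultra-wideband mode.

d) Call int speex_encoder_ctl(void *state, int request, void *ptr) to query or set encoder parameters. state is the encoder state; request identifies the parameter, for example SPEEX_GET_FRAME_SIZE reads the frame size and SPEEX_SET_QUALITY sets the quality level, which determines the encoding quality; ptr points to the value to set, or to the variable that receives the value.

e) After initialization, process each frame as follows: call speex_bits_reset(&ebits) to reset the SpeexBits structure, then call speex_encode_int(enc_state, input_frame, &ebits); the encoded data is stored in ebits.

f) When encoding is finished, call speex_bits_destroy(&ebits) and speex_encoder_destroy(enc_state) to release the SpeexBits structure and the encoder.

2. Decoding:

The interface mirrors the encoder's, so I will not go through it here.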

 

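2. Configuration and installation

Before using Speex you need to set it up. Download the Speex source from the official site; I used version 1.2rc1.

       tar zxvf speex-1.2rc1.tar.gz

       cd speex-1.2rc1

       ./configure --prefix=/home/yzf/lib/speex   (change the path to whatever you prefer)

       make && make install

       After building and installing, copy the files under /home/yzf/lib/speex/include to /usr/include.

       Copy /home/yzf/lib/speex/lib/libspeex.so.1.5.0 to /usr/lib

       and rename it to libspeex.so.

       Then create a symbolic link named libspeex.so.1 that points to it:  ln -s libspeex.so libspeex.so.1

       This is needed because on some systems a program linked with -lspeex looks for libspeex.so.1, for example the Red Hat server I used.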

 

3. Example

Below is an example I wrote. I wrapped the Speex interface in a "pseudo-singleton" style to make it convenient to use.

voice.h

#ifndef VOICE_H
#define VOICE_H

/*
* Initialization and teardown
*/
void voice_encode_init();
void voice_encode_release();
void voice_decode_init();
void voice_decode_release();
/*
* Encode (compress)
* short in[] audio samples
* int size number of samples
* char encoded[] buffer that receives the encoded data
* int max_buffer_size capacity of encoded[] in bytes
*/
int voice_encode(short in[], int size,
char encoded[], int max_buffer_size);
/*
* Decode
* char encoded[] encoded audio data
* int size length of the encoded data in bytes
* short output[] buffer that receives the decoded samples
* int max_buffer_size capacity of output[] in samples
*/
int voice_decode(char encoded[], int size,
short output[], int max_buffer_size);
#endif // VOICE_H

voice.cpp

#include <speex/speex.h>
#include <cstring>
#include <cstdio>
#include "voice.h"

#define MAX_FRAME_SIZE 640 // largest Speex frame (ultra-wideband), in samples

static int enc_frame_size; // encoder frame size, in samples
static int dec_frame_size; // decoder frame size, in samples

static void *enc_state;
static SpeexBits ebits;
static bool is_enc_init = false;

static void *dec_state;
static SpeexBits dbits;
static bool is_dec_init = false;

// Initialize the encoder
void voice_encode_init() {
    printf("enc init\n");
    int quality = 8;
    speex_bits_init(&ebits);
    enc_state = speex_encoder_init(&speex_nb_mode);
    speex_encoder_ctl(enc_state, SPEEX_SET_QUALITY, &quality);
    speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &enc_frame_size);
    is_enc_init = true;
}

// Destroy the encoder
void voice_encode_release() {
    printf("enc release\n");
    speex_bits_destroy(&ebits);
    speex_encoder_destroy(enc_state);
    is_enc_init = false;
}

// Initialize the decoder
void voice_decode_init() {
    printf("dec init\n");
    int enh = 1;
    speex_bits_init(&dbits);
    dec_state = speex_decoder_init(&speex_nb_mode);
    speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &dec_frame_size);
    speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);
    is_dec_init = true;
}

// Destroy the decoder
void voice_decode_release() {
    printf("dec release\n");
    speex_bits_destroy(&dbits);
    speex_decoder_destroy(dec_state);
    is_dec_init = false;
}

// Encode an audio stream. Each frame is stored as a 4-byte length
// (native byte order) followed by the encoded bytes.
int voice_encode(short in[], int size,
                 char encoded[], int max_buffer_size) {
    if (! is_enc_init) {
        voice_encode_init();
    }
    if (size <= 0) {
        return 0;
    }

    short buffer[MAX_FRAME_SIZE];
    char output_buffer[1024 + 4];
    int nframes = (size - 1) / enc_frame_size + 1; // number of frames, rounded up
    int tot_bytes = 0;
    for (int i = 0; i < nframes; ++ i) {
        // Copy the samples that are left and zero-pad a short final frame
        int remaining = size - i * enc_frame_size;
        int count = remaining < enc_frame_size ? remaining : enc_frame_size;
        memset(buffer, 0, enc_frame_size * sizeof(short));
        memcpy(buffer, in + i * enc_frame_size, count * sizeof(short));

        speex_bits_reset(&ebits);
        speex_encode_int(enc_state, buffer, &ebits);
        int nbBytes = speex_bits_write(&ebits, output_buffer + 4, 1024);
        memcpy(output_buffer, &nbBytes, 4);

        // Stop rather than write a truncated frame when the output is full
        if (tot_bytes + nbBytes + 4 > max_buffer_size) {
            break;
        }
        memcpy(encoded + tot_bytes, output_buffer, nbBytes + 4);
        tot_bytes += nbBytes + 4;
    }
    return tot_bytes;
}

// Decode an audio stream produced by voice_encode
int voice_decode(char encoded[], int size,
                 short output[], int max_buffer_size) {
    if (! is_dec_init) {
        voice_decode_init();
    }

    short output_buffer[MAX_FRAME_SIZE];
    int consumed = 0; // bytes of encoded[] consumed so far
    int produced = 0; // samples written to output[] so far

    while (consumed + 4 <= size) {
        int nbBytes;
        memcpy(&nbBytes, encoded + consumed, 4); // avoids an unaligned int read
        if (nbBytes < 0 || nbBytes > size - consumed - 4) {
            break; // corrupt or truncated frame
        }
        if (produced + dec_frame_size > max_buffer_size) {
            break; // no room for another full frame
        }
        speex_bits_reset(&dbits);
        speex_bits_read_from(&dbits, encoded + consumed + 4, nbBytes);
        speex_decode_int(dec_state, &dbits, output_buffer);
        memcpy(output + produced, output_buffer, dec_frame_size * sizeof(short));

        consumed += nbBytes + 4;
        produced += dec_frame_size;
    }
    return produced;
}

main.cpp (main program)

#include "voice.h"
#include <cstdio>

#define FRAME_SIZE 160
#define HEAD_SIZE 44

int main(int argc, char **argv) {
    char head[HEAD_SIZE];
    short in[FRAME_SIZE];
    char encoded[FRAME_SIZE * 2];
    short decoded[FRAME_SIZE];

    size_t read_count;
    int encoded_count;
    int decoded_count;

    // Binary mode keeps the sample data intact on Windows
    FILE *fp = fopen("female.wav", "rb");
    FILE *fp2 = fopen("encoded", "wb");
    FILE *fp3 = fopen("decoded.wav", "wb");
    if (!fp || !fp2 || !fp3) {
        perror("fopen");
        return 1;
    }
    // Copy the WAV header to the decoded file; encoding and decoding
    // operate on the raw audio samples only
    fread(head, sizeof(char), HEAD_SIZE, fp);
    fwrite(head, sizeof(char), HEAD_SIZE, fp3);

    voice_encode_init();
    voice_decode_init();
    // Process whole frames; a trailing partial frame is dropped
    while ((read_count = fread(in, sizeof(short), FRAME_SIZE, fp)) == FRAME_SIZE) {
        encoded_count = voice_encode(in, (int)read_count, encoded, FRAME_SIZE * 2);
        decoded_count = voice_decode(encoded, encoded_count, decoded, FRAME_SIZE);
        fwrite(encoded, sizeof(char), encoded_count, fp2);
        fwrite(decoded, sizeof(short), decoded_count, fp3);
    }
    voice_encode_release();
    voice_decode_release();

    fclose(fp);
    fclose(fp2);
    fclose(fp3);

    return 0;
}
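The main program copies a fixed 44-byte header and then treats everything after it as samples for the narrowband codec. That only works if the input is a canonical PCM WAV file with 8000 Hz, mono, 16-bit audio. The sketch below shows one way to check this before encoding. It assumes the canonical 44-byte layout (RIFF, WAVE, a 16-byte fmt chunk, then data); files with extra chunks will be rejected. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
} WavInfo;

/* WAV header fields are little-endian regardless of the host CPU. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse a canonical 44-byte PCM WAV header. Returns 1 on success, 0 otherwise. */
int parse_wav44(const uint8_t h[44], WavInfo *out)
{
    assert(h != NULL && out != NULL);
    if (memcmp(h, "RIFF", 4) != 0 || memcmp(h + 8, "WAVE", 4) != 0 ||
        memcmp(h + 12, "fmt ", 4) != 0 || memcmp(h + 36, "data", 4) != 0)
        return 0;
    if (rd32(h + 16) != 16 || rd16(h + 20) != 1)  /* 16-byte fmt chunk, PCM */
        return 0;
    out->channels = rd16(h + 22);
    out->sample_rate = rd32(h + 24);
    out->bits_per_sample = rd16(h + 34);
    return 1;
}

/* True when the audio matches what the narrowband example expects. */
int is_narrowband_input(const WavInfo *w)
{
    return w->channels == 1 && w->sample_rate == 8000 && w->bits_per_sample == 16;
}
```

In main.cpp this could be called on head right after the first fread, stopping with an error message when either check fails.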

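Build and run:

g++ main.cpp voice.cpp -lspeex -lm -Wall -o speex

./speex

After the run, the current directory contains encoded (the compressed data) and decoded.wav. female.wav is a speech sample downloaded from the Speex website. Played back, decoded.wav sounds much the same as female.wav; I cannot tell them apart. As for compression, with the parameters I chose, each block of 160 shorts (320 bytes) is compressed to 38 bytes. Besides the encoded data, decoding also needs the byte count of each encoded block (38 here, stored as a 4-byte int), so each frame takes 42 bytes in total. I find the compression quite good.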
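The 4-byte length stored in front of each 38-byte frame is a simple length-prefixed framing scheme. The sketch below shows the idea on its own, with bounds checks on both the writing and the reading side. It is not the code from the example above: it assumes a little-endian prefix (the example copies a native int, which is only portable between machines with the same byte order), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREFIX_SIZE 4u  /* bytes used for the length field */

/* Write one frame as [4-byte little-endian length][payload].
   Returns the bytes written, or 0 if the frame does not fit in cap. */
size_t put_frame(uint8_t *dst, size_t cap, const uint8_t *payload, uint32_t len)
{
    assert(dst != NULL && (payload != NULL || len == 0));
    if (cap < PREFIX_SIZE || len > cap - PREFIX_SIZE)
        return 0;
    dst[0] = (uint8_t)(len & 0xffu);
    dst[1] = (uint8_t)((len >> 8) & 0xffu);
    dst[2] = (uint8_t)((len >> 16) & 0xffu);
    dst[3] = (uint8_t)((len >> 24) & 0xffu);
    if (len > 0)
        memcpy(dst + PREFIX_SIZE, payload, len);
    return PREFIX_SIZE + (size_t)len;
}

/* Read one frame. On success sets *payload and *len and returns the
   bytes consumed; returns 0 if the data is truncated. */
size_t get_frame(const uint8_t *src, size_t avail,
                 const uint8_t **payload, uint32_t *len)
{
    assert(src != NULL && payload != NULL && len != NULL);
    if (avail < PREFIX_SIZE)
        return 0;
    uint32_t n = (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
                 ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
    if (n > avail - PREFIX_SIZE)
        return 0;
    *payload = src + PREFIX_SIZE;
    *len = n;
    return PREFIX_SIZE + (size_t)n;
}
```

Writing a 38-byte payload this way consumes 42 bytes, matching the per-frame total described above. Over UDP, where one datagram can carry one frame, the prefix can often be dropped because the datagram length already delimits the frame.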

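  1. Speex and latency (repost)

Original article: https://blog.csdn.net/lishaoqi_scau/article/details/7548934

The voice latency discussed here is not network latency. That depends on network conditions and is essentially fixed unless you change the transport.

The latency problem I mean arises like this:

A speaks for ten seconds, and the network delay is 3 seconds.

Normally B starts hearing the speech after 3 seconds and finishes hearing it at second 13.

But suppose B's network stalls for 1 second at second 8 (which happens quite often).

Then B hears the last 5 seconds of what A said during seconds 9 to 14.

This is where the problem appears: after several such stalls, what B hears lags further and further behind, and more and more data piles up in the buffer.

Data received later can only be played after the data received earlier has finished playing,

so the delay keeps growing.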

 

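There are several ways to address this:

1. My application already relays traffic through a server, so latency is high to begin with and can hardly meet the needs of a normal call. I could simply switch to a walkie-talkie model: hold a key to talk and release it to send. That is asynchronous by nature, so the problem would not arise.

JS: I still find the walkie-talkie model unfriendly. I will try hard to solve the latency problem and keep this as a last resort.

2. Discard the delayed packets. In the example above, at second 9 B receives what A said at second 5 and at second 6 at the same time; drop the second-5 packet and play the second-6 content, catching up with the speaker that way.

JS: This does stop the delay from growing, but some content is thrown away. The user experience suffers badly: sentences go missing for no apparent reason, which is maddening.

3. When too many packets have accumulated, skip the packets that contain no sound. This exploits the gaps in speech: if the receiver has too many packets queued, there has been a network delay, so it inspects the packets, discards the silent ones without playing them, and moves on to the later packets to catch up with the speaker.

Of these I finally chose the third, because it does not hurt the user experience: only silent packets are discarded to free up time, and the pauses in the other party's speech are used to catch up.

After all, nobody talks for hours without pausing.

It has a drawback: when the background is very noisy it is hard to tell whether the other party is speaking. I will set that problem aside for now.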

 

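My first idea was to decode each packet, analyze the PCM waveform data, and compute the packet's average amplitude; if the average fell below some threshold, I would treat the packet as silent and discard it.

That is quite easy to implement, because waveform data is easy to interpret.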
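A minimal sketch of that first idea follows. One detail matters: a plain average of signed samples is close to zero for almost any audio, so the sketch averages absolute values instead. The function names and the threshold are hypothetical and would need tuning against real recordings; this is not the method the author finally used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean absolute amplitude of a frame of 16-bit PCM samples. */
uint32_t mean_abs_amplitude(const int16_t *pcm, size_t n)
{
    assert(pcm != NULL && n > 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t s = pcm[i];               /* widen so -32768 can be negated */
        sum += (uint64_t)(s < 0 ? -s : s);
    }
    return (uint32_t)(sum / n);
}

/* Treat a frame as silent when its mean absolute amplitude is below threshold. */
int is_silent_frame(const int16_t *pcm, size_t n, uint32_t threshold)
{
    return mean_abs_amplitude(pcm, n) < threshold;
}
```

As the author notes, this kind of energy test breaks down with a noisy background, which is one reason to prefer the codec's own detector.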

 

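Later I found a better way: use the voice activity detection (VAD) option that Speex itself provides.

Voice activity detection (VAD) detects whether the audio being encoded is speech or silence/background noise. VAD is always enabled when encoding with variable bit rate (VBR), so the option only has an effect in non-VBR operation. In that case Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG).

int vad = 1;

speex_encoder_ctl(state, SPEEX_SET_VAD, &vad);

Tracing through it, I found that with the option off every packet has a fixed size, while with it on some packets are 15 bytes and others only 2 bytes.

So my plan is to discard the 2-byte packets outright when too many packets have accumulated. This is still only a theory, of course; whether it works remains to be tested.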
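The catch-up policy described above can be sketched independently of Speex. The function below looks only at packet sizes: when more packets are queued than a chosen backlog limit, it skips just enough small (presumed silent) packets to get back under the limit. The function name, the backlog limit, and the size cutoff are hypothetical; the 2-byte figure is the author's observation for his settings and may differ for other modes or quality levels.

```c
#include <assert.h>
#include <stddef.h>

/* Choose which queued packets to play.
   sizes[i]          encoded size in bytes of queued packet i (oldest first)
   max_backlog       number of queued packets considered acceptable
   silence_max_bytes packets this small are treated as silence
   keep[]            receives the indices of the packets to play
   Returns the number of indices written to keep[]. */
size_t select_packets(const size_t *sizes, size_t count,
                      size_t max_backlog, size_t silence_max_bytes,
                      size_t *keep)
{
    assert(sizes != NULL && keep != NULL);
    /* How many packets we would like to get rid of to catch up. */
    size_t excess = count > max_backlog ? count - max_backlog : 0;
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (excess > 0 && sizes[i] <= silence_max_bytes) {
            --excess;          /* drop this silent packet */
            continue;
        }
        keep[kept++] = i;      /* speech, or no need to catch up */
    }
    return kept;
}
```

Speech packets are never dropped, so if the backlog consists only of speech the delay stays until the next pause. Skipped packets are also never fed to the decoder, which this sketch does not account for.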

 

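There are two related features: variable bit rate (VBR) and discontinuous transmission (DTX).

VBR is an important feature. By default every packet Speex produces has the same size; with VBR enabled, the encoded length varies with the actual speech content of each segment.

Discontinuous transmission (DTX) is an addition to VAD/VBR operation that makes it possible to stop transmitting completely when the background noise is stationary. In file-based operation, since we cannot simply stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps).

With all three options enabled, the encoded data shrinks considerably; in my test it was roughly halved.

Unfortunately, I had disabled floating-point arithmetic on WinCE for performance reasons, and VBR relies entirely on floating point, so it had to be disabled as well.

Even so, enabling only VAD and DTX still reduces the amount of data transmitted to some extent.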

 

 

 

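Once it worked on the PC, I checked how it behaved on WinCE.

It turned out to have no effect at all: with VAD and DTX enabled, the output was the same as without them.

Then I remembered that, because the M8 has limited floating-point performance, I had disabled floating point, and since VBR depends on floating point it had to be disabled along with it.

Searching online, I found that VAD and DTX both depend on VBR.

Was this another dead end? Could VAD really not be used on WinCE?

I went back to the section of the Speex manual on CPU optimization and read it carefully: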

 

The single factor that will affect the CPU usage of Speex the most is whether it is compiled for floating point or fixed-point. If your CPU/DSP does not have a floating-point unit FPU, then compiling as fixed-point will be orders of magnitudes faster. If there is an FPU present, then it is important to test which version is faster. On the x86 architecture, floating-point is generally faster, but not always. To compile Speex as fixed-point, you need to pass --fixed-point to the configure script or define the FIXED_POINT macro for the compiler. As of 1.2beta3, it is now possible to disable the floating-point compatibility API, which means that your code can link without a float emulation library. To do that configure with --disable-float-api or define the DISABLE_FLOAT_API macro. Until the VBR feature is ported to fixed-point, you will also need to configure with --disable-vbr or define DISABLE_VBR.

 

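That reminded me that when I built Speex 1.2rc1 I had indeed seen a macro for fixed-point (FIXED_POINT), but had paid no attention to it at the time.

Perhaps the M8's CPU is powerful enough when running fixed-point arithmetic.

So I removed the DISABLE_FLOAT_API and DISABLE_VBR definitions and rebuilt libspeex.

Running the WinCE program again, I found it was fast, almost as fast as the build with floating point disabled.

And with VBR available, VAD and DTX both worked well.

For testing I used a 19-second, 826 KB WAV file. To exercise silence detection, only a few sentences are spoken during the 19 seconds; the rest is silent.

All tests used speex_uwb_mode.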

 

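| Platform | Speex version | No features, quality 8 | No features, quality 2 | VAD+DTX, quality 8 | VAD+DTX, quality 2 | VAD+DTX+VBR, quality 8 | VAD+DTX+VBR, quality 2 |
|---|---|---|---|---|---|---|---|
| PC | 1.0.5 | 50K | 17K | 32K | 12K | 24K | 24K |
| WINCE | 1.2rc1 | 40K | 16K | 19K | 8K | 16K | 16K |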

 

 

 

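The results show that VAD and DTX reduce the size of the compressed data at both high and low quality.

VBR reduces the data size at high quality, but at low quality it actually increases it.

Also, with VBR enabled, the run on WinCE was noticeably slower, by a second or two.

Finally, it is worth marveling that about 800 KB was compressed to 8 KB, a factor of 100, and the decoded audio was still clear. Technology is a powerful thing.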

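  1. Speex encoding and decoding on Android via JNI (repost)

Original article: http://www.javashuo.com/article/p-ehqzbeef-kc.html

1. Download the latest Speex source from the official Speex website.

2. Create a new Android project and create a jni folder in it.

3. Copy the libspeex and include directories from the Speex source tree, with all their subdirectories and files, into $project/jni.

4. Add an Android.mk file in the jni directory with the following content: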

LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)

LOCAL_MODULE := libspeex
LOCAL_CFLAGS = -DFIXED_POINT -DUSE_KISS_FFT -DEXPORT="" -UHAVE_CONFIG_H
LOCAL_C_INCLUDES := $(LOCAL_PATH)/include

LOCAL_SRC_FILES :=\
libspeex/bits.c \
libspeex/buffer.c \
libspeex/cb_search.c \
libspeex/exc_10_16_table.c \
libspeex/exc_10_32_table.c \
libspeex/exc_20_32_table.c \
libspeex/exc_5_256_table.c \
libspeex/exc_5_64_table.c \
libspeex/exc_8_128_table.c \
libspeex/fftwrap.c \
libspeex/filterbank.c \
libspeex/filters.c \
libspeex/gain_table.c \
libspeex/gain_table_lbr.c \
libspeex/hexc_10_32_table.c \
libspeex/hexc_table.c \
libspeex/high_lsp_tables.c \
libspeex/jitter.c \
libspeex/kiss_fft.c \
libspeex/kiss_fftr.c \
libspeex/lpc.c \
libspeex/lsp.c \
libspeex/lsp_tables_nb.c \
libspeex/ltp.c \
libspeex/mdf.c \
libspeex/modes.c \
libspeex/modes_wb.c \
libspeex/nb_celp.c \
libspeex/preprocess.c \
libspeex/quant_lsp.c \
libspeex/resample.c \
libspeex/sb_celp.c \
libspeex/scal.c \
libspeex/smallft.c \
libspeex/speex.c \
libspeex/speex_callbacks.c \
libspeex/speex_header.c \
libspeex/stereo.c \
libspeex/vbr.c \
libspeex/vq.c \
libspeex/window.c \
speex_jni.cpp

include $(BUILD_SHARED_LIBRARY)

 

5. Add an Application.mk file in the jni directory with the following content:

APP_ABI := armeabi armeabi-v7a

 

6. Add a speex_config_types.h file in $project/jni/include/speex/ with the following content:

 

#ifndef __SPEEX_TYPES_H__

#define __SPEEX_TYPES_H__

typedef short spx_int16_t;

typedef unsigned short spx_uint16_t;

typedef int spx_int32_t;

typedef unsigned int spx_uint32_t;

#endif

 

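7. Create the JNI wrapper speex_jni.cpp, which calls the C functions in Speex, with the following content:

#include <jni.h>

#include <string.h>
#include <unistd.h>

#include <speex/speex.h>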

 

static int codec_open = 0;

 

static int dec_frame_size;

static int enc_frame_size;

 

static SpeexBits ebits, dbits;

void *enc_state;

void *dec_state;

 

static JavaVM *gJavaVM;

 

extern "C"

JNIEXPORT jint JNICALL Java_com_trunkbow_speextest_Speex_open

(JNIEnv *env, jobject obj, jint compression) {

int tmp;

 

if (codec_open++ != 0)

return (jint)0;

 

speex_bits_init(&ebits);

speex_bits_init(&dbits);

 

enc_state = speex_encoder_init(&speex_nb_mode);

dec_state = speex_decoder_init(&speex_nb_mode);

tmp = compression;

speex_encoder_ctl(enc_state, SPEEX_SET_QUALITY, &tmp);

speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &enc_frame_size);

speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &dec_frame_size);

 

return (jint)0;

}

 

extern "C"

JNIEXPORT jint JNICALL Java_com_trunkbow_speextest_Speex_encode

(JNIEnv *env, jobject obj, jshortArray lin, jint offset, jbyteArray encoded, jint size) {

 

jshort buffer[enc_frame_size];

jbyte output_buffer[enc_frame_size];

int nsamples = (size-1)/enc_frame_size + 1;

int i, tot_bytes = 0;

 

if (!codec_open)

return 0;

 

speex_bits_reset(&ebits);

 

for (i = 0; i < nsamples; i++) {

env->GetShortArrayRegion(lin, offset + i*enc_frame_size, enc_frame_size, buffer);

speex_encode_int(enc_state, buffer, &ebits);

}

//env->GetShortArrayRegion(lin, offset, enc_frame_size, buffer);

//speex_encode_int(enc_state, buffer, &ebits);

 

tot_bytes = speex_bits_write(&ebits, (char *)output_buffer,

enc_frame_size);

env->SetByteArrayRegion(encoded, 0, tot_bytes,

output_buffer);

 

return (jint)tot_bytes;

}

 

extern "C"

JNIEXPORT jint JNICALL Java_com_trunkbow_speextest_Speex_decode

(JNIEnv *env, jobject obj, jbyteArray encoded, jshortArray lin, jint size) {

 

jbyte buffer[dec_frame_size];

jshort output_buffer[dec_frame_size];

jsize encoded_length = size;

 

if (!codec_open)

return 0;

 

env->GetByteArrayRegion(encoded, 0, encoded_length, buffer);

speex_bits_read_from(&dbits, (char *)buffer, encoded_length);

speex_decode_int(dec_state, &dbits, output_buffer);

env->SetShortArrayRegion(lin, 0, dec_frame_size,

output_buffer);

 

return (jint)dec_frame_size;

}

 

extern "C"

JNIEXPORT jint JNICALL Java_com_trunkbow_speextest_Speex_getFrameSize

(JNIEnv *env, jobject obj) {

 

if (!codec_open)

return 0;

return (jint)enc_frame_size;

 

}

 

extern "C"

JNIEXPORT void JNICALL Java_com_trunkbow_speextest_Speex_close

(JNIEnv *env, jobject obj) {

 

if (--codec_open != 0)

return;

 

speex_bits_destroy(&ebits);

speex_bits_destroy(&dbits);

speex_decoder_destroy(dec_state);

speex_encoder_destroy(enc_state);

}

 

8. Create a Speex utility class in the Java layer with the following content:

 

package com.trunkbow.speextest;

 

public class Speex {

 

private static final int DEFAULT_COMPRESSION = 8;

 

Speex() {

}

 

public void init() {

load();

open(DEFAULT_COMPRESSION);

}

 

private void load() {

try {

System.loadLibrary("speex");

} catch (Throwable e) {

e.printStackTrace();

}

 

}

 

public native int open(int compression);

public native int getFrameSize();

public native int decode(byte encoded[], short lin[], int size);

public native int encode(short lin[], int offset, byte encoded[], int size);

public native void close();

}

 

9. Compile with cygwin to generate the .so file.

  1. WeChat Speex voice development

https://mp.weixin.qq.com/advanced/wiki?t=t=resource/res_main&id=mp1444738727

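  1. Opus FEC (Forward Error Correction)

Forward error correction (FEC) is a way of making data communication more reliable. On a one-way channel, the receiver cannot request a retransmission once an error has been detected. FEC sends redundant information along with the data, so that the receiver can rebuild the data when an error occurs in transit.

 

In Opus, FEC redundantly encodes part of the current frame's data into the following frame, so when the current frame turns out to be lost it can be recovered from the next one.

int opus_decode(OpusDecoder *st, const unsigned char *data, opus_int32 len, opus_int16 *pcm, int frame_size, int decode_fec)

opus_decode can try to recover data either when it is given an empty packet or when FEC is enabled.

When data is NULL, len should be 0; Opus then produces one frame of PCM by guessing the content of the missing frame.

When decode_fec is 1, the FEC mechanism is used to try to recover the previous frame from the packet that is passed in; otherwise the packet is decoded as the current frame.

 

The three possible cases are listed below with pseudocode:

1. The previous frame and the current frame both arrived: decode the previous frame normally.

opus_decode(decoder, previous_frame, frame_size, pcm, pcm_size, 0);

2. The previous frame is lost and the current frame arrived: decode the current frame with FEC enabled to try to recover the previous frame.

opus_decode(decoder, current_frame, frame_size, pcm, pcm_size, 1);

3. The previous frame and the current frame are both lost: pass an empty packet so the decoder tries to guess the previous frame.

opus_decode(decoder, NULL, 0, pcm, pcm_size, 0);

 

Decoding can therefore be done by buffering one frame in advance: each time a frame is received, the previous frame is decoded, and the three cases above decide what is passed to the decoder.

 

How well FEC recovers lost frames depends on the expected packet-loss setting, the rate-control mode, and the bit-rate.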
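
The three cases above can be sketched as a small decision function. This is only an illustration of the one-frame look-ahead logic described in this section: the enum and function names are hypothetical, and the actual opus_decode call for each case is shown in the comments rather than executed.

```c
/* Which opus_decode() call shape to use for the frame being played out.
 * All names here are hypothetical and exist only for illustration. */
typedef enum {
    DECODE_NORMAL,   /* opus_decode(dec, previous_frame, len, pcm, n, 0) */
    DECODE_FEC,      /* opus_decode(dec, current_frame,  len, pcm, n, 1) */
    DECODE_CONCEAL   /* opus_decode(dec, NULL,           0,   pcm, n, 0) */
} decode_action;

/* prev_received and cur_received are non-zero when that packet arrived. */
static decode_action choose_decode_action(int prev_received, int cur_received)
{
    if (prev_received)
        return DECODE_NORMAL;   /* case 1: the previous frame is available */
    if (cur_received)
        return DECODE_FEC;      /* case 2: recover it from the next packet */
    return DECODE_CONCEAL;      /* case 3: nothing to recover from */
}
```

Note that when the previous frame arrived but the current one did not, the previous frame is still decoded normally; the loss is handled one step later, when the current frame becomes the previous one.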

 

  1. The Speex Codec Manual Speex編解碼器手冊

     

     

    The Speex Codec Manual
    Version 1.2 Beta 3

    Speex編解碼器手冊
    版本1.2 Beta 3

     

Author: Jean-Marc Valin
翻譯:赤勇玄心行天道,AMG

 

December 8, 2007
2007年12月8日

 

Copyright 2002-2007 Jean-Marc Valin/Xiph.org Foundation

版權全部2002-2007 Jean-Marc Valin / Xiph.org Foundation

 

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Section, with no Front-Cover Texts, and with no Back-Cover. A copy of the license is included in the section entitled "GNU Free Documentation License".

在自由軟件基金會發布的GNU自由文檔許可證版本1.1及任何之後發佈的版本下,保證本文檔被賦予複製、分發、或修改的權利;沒有不可變章節,沒有封面文字,而且沒有封底。該協議副本包含"GNU自由文檔許可證"的部分標題。

  1. Contents 目錄

13    The Speex Codec Manual Speex編解碼器手冊

13.1    Contents 目錄

13.2    Speex 介紹

13.2.1    Getting help 獲取幫助

13.2.2    About this document 關於本文檔

13.3    Codec description 編解碼器描述

13.3.1    Concepts 概念

13.3.2    Codec 編解碼器

13.3.3    Preprocessor 預處理器

13.3.4    Adaptive Jitter Buffer 自適應抖動緩衝器

13.3.5    Acoustic Echo Canceller 聲學回音消除器

13.3.6    Resampler 從新採樣器

13.4    Compiling and Porting 編譯和移植

13.4.1    Platforms 平臺

13.4.2    Porting and Optimising 移植和優化

13.5    Command-line encoder/decoder

13.5.1    speexenc

13.5.2    speexdec

13.6    Using the Speex Codec API (libspeex) 使用Speex編解碼器API(libspeex)

13.7    Encoding 編碼

13.7.1    Decoding 解碼

13.7.2    Codec Options (speex_*_ctl) 編解碼器選項(speex_*_ctl)

13.7.3    Mode queries 模式查詢

13.7.4    Packing and in-band signalling 封包和帶內信號

13.8    Speech Processing API (libspeexdsp) 語音處理API(libspeexdsp)

13.8.1    Preprocessor 預處理器

13.8.2    Echo Cancellation

13.8.3    Jitter Buffer 抖動緩衝器

13.8.4    Resampler 重採樣器

13.8.5    Ring Buffer 環形緩衝器

 

  1. Introduction to Speex Speex介紹

The Speex codec (http://www.speex.org/) exists because there is a need for a speech codec that is open-source and free from software patent royalties. These are essential conditions for being usable in any open-source software. In essence, Speex is to speech what Vorbis is to audio/music. Unlike many other speech codecs, Speex is not designed for mobile phones but rather for packet networks and voice over IP (VoIP) applications. File-based compression is of course also supported.

Speex編解碼器(http://www.speex.org/)之因此存在,是由於須要一個開源的免軟件專利使用費的語音編解碼器。這些都是在任何開源軟件中可用的必要條件。從本質上講,Speex是針對於語音,就像音頻壓縮格式是針對於音頻或音樂同樣。與許多其餘語音編解碼器不一樣,Speex不是爲移動電話而設計的,而是爲分組網絡和網絡電話(VoIP)應用程序而設計的。固然也支持基於文件的壓縮。

 

The Speex codec is designed to be very flexible and support a wide range of speech quality and bit-rate. Support for very good quality speech also means that Speex can encode wideband speech (16 kHz sampling rate) in addition to narrowband speech (telephone quality, 8 kHz sampling rate).

Speex編解碼器設計得很是靈活,並支持不少種語音質量和比特率。對高質量語音的支持也意味着,Speex除了能編碼窄帶語音(電話質量,8kHz採樣頻率)外,還能編碼寬帶語音(16kHz採樣頻率)。

 

Designing for VoIP instead of mobile phones means that Speex is robust to lost packets, but not to corrupted ones. This is based on the assumption that in VoIP, packets either arrive unaltered or don't arrive at all. Because Speex is targeted at a wide range of devices, it has modest (adjustable) complexity and a small memory footprint.

爲網絡電話而不是移動電話而設計,意味着Speex對丟失的數據包魯棒,但對損壞的數據包不魯棒。這是基於這樣的假設:在網絡電話中,數據包要麼原封不動地到達,要麼根本不到達。因爲Speex的目標是在不少種設備上,所以它具備適度(可調)的複雜性和較小的內存佔用量。

 

All the design goals led to the choice of CELP as the encoding technique. One of the main reasons is that CELP has long proved that it could work reliably and scale well to both low bit-rates (e.g. DoD CELP @ 4.8 kbps) and high bit-rates (e.g. G.728 @ 16 kbps).

全部設計目標都致使選擇CELP做爲編碼技術。主要緣由之一是CELP在長期以來就證實了它能夠可靠地工做,而且能夠很好地擴展到低比特率(例如DoD CELP @ 4.8 kbps)和高比特率(例如G.728 @ 16 kbps)。

  1. Getting help 獲取幫助

As with many open-source projects, there are many ways to get help with Speex. These include:

對於許多開源項目,有不少途徑能夠獲取到Speex的幫助。它們包括:

  • This manual

    本手冊

  • Other documentation on the Speex website (http://www.speex.org/)

    Speex網站上的其餘文檔(http://www.speex.org/)

  • Mailing list: Discuss any Speex-related topic on speex-dev@xiph.org (not just for developers)

    郵件發送清單:討論任何有關Speex的話題發送到speex-dev@xiph.org(不只限於開發人員)

  • IRC: The main channel is #speex on irc.freenode.net. Note that due to time differences, it may take a while to get someone, so please be patient.

    IRC:在irc.freenode.net上的主要頻道是#speex。請注意,因爲時間不一樣,找人可能須要一段時間,所以請耐心等待。

  • Email the author privately at jean-marc.valin@usherbrooke.ca only for private/delicate topics you do not wish to discuss publicly.

    給做者私人郵箱jean-marc.valin@usherbrooke.ca發郵件,但僅限於你不想被公開討論的私人的或精妙的主題。

 

Before asking for help (mailing list or IRC), it is important to first read this manual (OK, so if you made it here it's already a good sign). It is generally considered rude to ask on a mailing list about topics that are clearly detailed in the documentation. On the other hand, it's perfectly OK (and encouraged) to ask for clarifications about something covered in the manual. This manual does not (yet) cover everything about Speex, so everyone is encouraged to ask questions, send comments, feature requests, or just let us know how Speex is being used.

在尋求幫助(郵件列表或IRC)以前,請務必先閱讀本手冊(好的,若是你已經閱讀到此處,這已是一個好兆頭)。一般,在郵件列表中詢問有關文檔中已經明確詳細說明的主題的作法是不禮貌的。另外一方面,徹底能夠(並鼓勵)詢問在手冊中沒有說清楚的問題。本手冊還沒有涵蓋有關Speex的全部內容,所以鼓勵每一個人提出問題,發送評論,特性要求,或者只是讓咱們知道Speex是如何被使用的。

 

Here are some additional guidelines related to the mailing list. Before reporting bugs in Speex to the list, it is strongly recommended (if possible) to first test whether these bugs can be reproduced using the speexenc and speexdec (see Section 4) command-line utilities. Bugs reported based on 3rd party code are both harder to find and far too often caused by errors that have nothing to do with Speex.

這裏是與郵件列表有關的一些其餘準則。在將Speex中的錯誤報告給郵件列表以前,強烈建議(若是可能)首先測試是否可使用speexenc和speexdec(請參見第4部分)命令行實用程序來重現這些錯誤。

  1. About this document 關於本文檔

This document is divided in the following way. Section 2 describes the different Speex features and defines many basic terms that are used throughout this manual. Section 4 documents the standard command-line tools provided in the Speex distribution. Section 5 includes detailed instructions about programming using the libspeex API. Section 7 has some information related to Speex and standards.

本文檔按如下方式劃分。第2章介紹了Speex的不一樣特性,並定義了本手冊中使用的許多基本術語。第4章介紹了Speex發行版中提供的標準命令行工具。第5章包含有關使用libspeex API進行編程的詳細說明。第7章提供了一些與Speex和標準有關的信息。

 

The three last sections describe the algorithms used in Speex. These sections require signal processing knowledge, but are not required for merely using Speex. They are intended for people who want to understand how Speex really works and/or want to do research based on Speex. Section 8 explains the general idea behind CELP, while sections 9 and 10 are specific to Speex.

最後三個章節描述了Speex中使用的算法。這些章節須要信號處理知識,但僅僅是使用Speex則不須要這些知識。它們適用於但願瞭解Speex的工做原理和/或但願基於Speex進行研究的人員。第8章解釋了CELP背後的通常思想,而第9和10章則專門針對Speex。

  1. Codec description 編解碼器描述

This section describes Speex and its features in more detail.

本章深刻詳細介紹Speex和它的特性。

  1. Concepts 概念

Before introducing all the Speex features, here are some concepts in speech coding that help better understand the rest of the manual. Although some are general concepts in speech/audio processing, others are specific to Speex.

在介紹全部的Speex特性以前,這裏有語音編碼的一些概念,能夠有助於咱們更好地理解本手冊的其它部分。雖然一些概念在語音/音頻處理過程當中是常見的,可是也有一些是Speex特有的。

  1. Sampling rate 採樣頻率

The sampling rate expressed in Hertz (Hz) is the number of samples taken from a signal per second. For a sampling rate of Fs kHz, the highest frequency that can be represented is equal to Fs/2 kHz (Fs/2 is known as the Nyquist frequency). This is a fundamental property in signal processing and is described by the sampling theorem. Speex is mainly designed for three different sampling rates: 8 kHz, 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.

採樣頻率就是每秒鐘從信號中採樣的樣本數量,以赫茲(Hz)爲單位。相對於Fs kHz的採樣頻率而言,其最高的頻率能夠達到Fs/2 kHz(Fs/2也被稱爲奈奎斯特頻率)。這是在信號處理中的一個基本屬性,並經過採樣定理說明。Speex主要被設計用於三種不一樣的採樣率:8kHz,16kHz和32kHz。它們分別被稱爲窄帶,寬帶和超寬帶。
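
As a small worked example of the paragraph above, the following sketch computes the Nyquist frequency for a sampling rate and maps the three rates Speex is designed for to their band names. The type and function names are hypothetical and are not part of libspeex.

```c
/* Hypothetical helper types; not part of the libspeex API. */
typedef enum { BAND_UNKNOWN, BAND_NARROW, BAND_WIDE, BAND_ULTRA_WIDE } speex_band;

/* Highest frequency (in Hz) representable at the given sampling rate. */
static unsigned nyquist_hz(unsigned sampling_rate_hz)
{
    return sampling_rate_hz / 2;
}

/* Map the three sampling rates named in the manual to their band. */
static speex_band band_for_rate(unsigned sampling_rate_hz)
{
    switch (sampling_rate_hz) {
    case 8000:  return BAND_NARROW;      /* narrowband */
    case 16000: return BAND_WIDE;        /* wideband */
    case 32000: return BAND_ULTRA_WIDE;  /* ultra-wideband */
    default:    return BAND_UNKNOWN;     /* not one of the three design rates */
    }
}
```

For narrowband audio sampled at 8 kHz, nyquist_hz(8000) gives 4000, so content above 4 kHz cannot be represented.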

  1. Bit-rate 比特率

When encoding a speech signal, the bit-rate is defined as the number of bits per unit of time required to encode the speech. It is measured in bits per second (bps), or generally kilobits per second. It is important to make the distinction between kilobits per second (kbps) and kilobytes per second (kBps).

在對語音信號編碼時,比特率被定義爲單位時間內所須要的比特數。它是以每秒比特位數(bps)來測量的,或者通常爲每秒千比特位數(kbps)。在每秒千比特位數(kbps)和每秒千字節數(kBps)之間進行區分是很是重要的。

  1. Quality(variable) 質量(可變的)

Speex is a lossy codec, which means that it achieves compression at the expense of fidelity of the input speech signal. Unlike some other speech codecs, it is possible to control the tradeoff made between quality and bit-rate. The Speex encoding process is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a float.

Speex是一種有損編解碼器,這意味着它的存檔壓縮是以語音輸入信號的保真度爲代價的。不像一些其餘的語音編解碼器,它會盡量的去控制質量和比特率之間的平衡。在大多數時間,Speex的編碼處理是用一個0到10範圍內的質量參數來控制的。在固定比特率(CBR)操做中,質量參數是整型,對於可變比特率(VBR),則參數爲浮點型。

  1. Complexity(variable) 複雜度(可變的)

With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10 in a way that's similar to the -1 to -9 options to gzip and bzip2 compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about 5 times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones.

在Speex中,它能夠容許咱們改變編碼器的複雜度。用1到10的整數來控制如何執行搜索,就像gzip或bzip2壓縮工具的-1至-9選項同樣。對於正常使用時,複雜度爲1時的噪聲等級會比複雜度爲10時高1至2dB,可是複雜度爲10時對CPU需求是複雜度爲1時的5倍。實踐證實,最佳的權衡是在複雜度爲2至4時,然而較高的複雜度則對非語音進行編碼時(如DTMF雙音多頻音調)較爲有用。

  1. Variable Bit-Rate(VBR) 可變比特率(VBR)

Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically to adapt to the "difficulty" of the audio being encoded. In the example of Speex, sounds like vowels and high-energy transients require a higher bit-rate to achieve good quality, while fricatives (e.g. s,f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve lower bit-rate for the same quality, or a better quality for a certain bit-rate. Despite its advantages, VBR has two main drawbacks: first, by only specifying quality, there's no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.

可變比特率(VBR)容許編解碼器動態改變比特率以適應音頻編碼的"難度"。拿Speex來講,聽起來像元音和瞬間高音的則需較高比特率來達到較好質量,而摩擦音(如S,F音)則適當用較少的比特位數進行編碼。出於這種緣由,可變比特率(VBR)能夠用較低的比特率(bit-rate)達到固定比特率(bit-rate)一樣的質量,或比固定比特率(bit-rate)質量更好。儘管它有優點,但可變比特率(VBR)也有兩個主要缺點:第一,只能指定質量,不能保證最終的平均比特率(ABR);第二,在一些實時應用如IP電話(VoIP)中,儘管擁有高的比特率(bit-rate),但爲了適應通訊信道仍是必需要下降。

  1. Average Bit-Rate(ABR) 平均比特率(ABR)

Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.

平均比特率(ABR)解決了在可變比特率(VBR)中的一個問題,就是平均比特率(ABR)經過動態調整可變比特率(VBR)的質量來得到一個特定目標的比特率。因爲平均比特率(ABR)是實時(開環)調整質量/比特率(bit-rate)的,因此總體質量會略低於經過變比特率(VBR)設置的接近於目標平均比特率進行編碼得到的質量。

  1. Voice Activity Detection(VAD) 語音活動檢測(VAD)

When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG).

當啓用語音活動檢測(VAD)時,它將檢測出被編碼的音頻是語音仍是靜音/背景噪聲。語音活動檢測(VAD)在用可變比特率(VBR)進行編碼時老是默認開啓的,因此這個選項只能用於非變比特率(VBR)。在這種狀況下,Speex能夠檢測到非語音週期,並對它們用足夠的比特位數從新編碼成背景噪聲。這個就叫"溫馨噪聲生成(CNG)"。

  1. Discontinuous Transmission(DTX) 非連續傳輸(DTX)

Discontinuous transmission is an addition to VAD/VBR operation that allows the encoder to stop transmitting completely when the background noise is stationary. In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps).

不連續傳輸(DTX)是對靜音檢測(VAD)/變比特率(VBR)操做的一個補充,它可以在背景噪聲固定的時候徹底中止傳輸。在基於文件的操做中,因爲咱們不能中止對文件的寫入,因此只會有5個比特被用到這種幀內(相對於250bps)。
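
The 250 bps figure can be checked with a line of arithmetic, assuming the 20 ms frame duration that the API part of this manual gives for 8, 16 and 32 kHz. The helper name below is hypothetical.

```c
/* Bit-rate in bits per second of a stream that spends bits_per_frame bits
 * on every frame lasting frame_ms milliseconds. Hypothetical helper. */
static long bitrate_bps(long bits_per_frame, long frame_ms)
{
    return bits_per_frame * 1000 / frame_ms;
}
```

With 5 bits per 20 ms frame there are 50 frames per second, so bitrate_bps(5, 20) evaluates to 250.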

  1. Perceptual enhancement 知覺加強

Perceptual enhancement is a part of the decoder which, when turned on, attempts to reduce the perception of the noise/distortion produced by the encoding/decoding process. In most cases, perceptual enhancement brings the sound further from the original objectively (e.g. considering only SNR), but in the end it still sounds better (subjective improvement).

知覺加強中解碼器的一部分,當被開啓後,將嘗試減小在編碼/解碼過程當中產生的噪音/失真的感知。大多數狀況下,知覺加強產生的會和最原始的聲音會相差較遠(如只考慮信噪比(SNR)),但最終仍然聽起來更好(主觀改善)。

  1. Latency and algorithmic delay 延遲和算法延遲

Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don't account for the CPU time it takes to encode or decode the frames.

每一個語音編解碼器在傳輸過程當中都會有延遲。就Speex來講,它的延遲就等於每幀大小,再加上每幀須要處理的一些"預先的"操做。在窄帶(8kHz)操做中,延遲大概是30ms,寬帶操做中,延遲大概是34ms。這些數據是沒有將CPU進行編解碼幀的時間計算在內的。

  1. Codec 編解碼器

The main characteristics of Speex can be summarized as follows:

Speex的主要特性能夠歸納以下:

  • Free software/open-source, patent and royalty-free

    開源的自由軟件,免專利,免版權

  • Integration of narrowband and wideband using an embedded bit-stream

    經過嵌入的比特流來集成的窄帶和寬帶

  • Wide range of bit-rates available (from 2.15 kbps to 44 kbps)

    可用比特率的範圍廣(bit-rate)(從2.15kbps到44kbps)

  • Dynamic bit-rate switching (AMR) and Variable Bit-Rate (VBR) operation

    動態比特率交換(AMR)和可變比特率(VBR)操做

  • Voice Activity Detection (VAD, integrated with VBR) and discontinuous transmission (DTX)

    語音活動檢測(VAD,和變比特率(VBR)集成)和不連續傳輸(DTX)

  • Variable complexity

    可變複雜度

  • Embedded wideband structure (scalable sampling rate)

    嵌入的寬帶結構(可變的比特率)

  • Ultra-wideband sampling rate at 32 kHz

    32kHz的超寬帶採樣率

  • Intensity stereo encoding option

    強化立體聲編碼選項

  • Fixed-point implementation

    定點執行

  1. Preprocessor 預處理器

This part refers to the preprocessor module introduced in the 1.1.x branch. The preprocessor is designed to be used on the audio before running the encoder. The preprocessor provides three main functionalities:

這部分涉及1.1.x分支介紹的預處理器模塊。預處理器被設計在音頻被編碼前使用。預處理器提供了三個主要功能:

  • noise suppression

    噪音抑制

  • automatic gain control (AGC)

    自動增益控制(AGC)

  • voice activity detection (VAD)

    語音活動檢測(VAD)

 

The denoiser can be used to reduce the amount of background noise present in the input signal. This provides higher quality speech whether or not the denoised signal is encoded with Speex (or at all). However, when using the denoised signal with the codec, there is an additional benefit. Speech codecs in general (Speex included) tend to perform poorly on noisy input, which tends to amplify the noise. The denoiser greatly reduces this effect.

降噪器是用來減小輸入信號中的背景噪音的數量。這樣可提供更高質量的語音,即便降噪的信號沒有通過Speex編碼(或其餘編碼)也同樣。然而,當降噪後的信號與編解碼器一塊兒使用時,有一個額外的好處。通常的語音編解碼器(也包括Speex)每每在噪音輸入方面都表現不佳,一般會放大噪音。而降噪器大大下降了這種影響。

 

Automatic gain control (AGC) is a feature that deals with the fact that the recording volume may vary by a large amount between different setups. The AGC provides a way to adjust a signal to a reference volume. This is useful for voice over IP because it removes the need for manual adjustment of the microphone gain. A secondary advantage is that by setting the microphone gain to a conservative (low) level, it is easier to avoid clipping.

自動增益控制(AGC)是用來處理不一樣設備錄製的音量有很大變化的狀況。它提供了一種方法來調整信號到參考音量。這對IP電話(voice over IP)是很是有用的,由於它避免了須要手動去調整麥克風增益。第二個好處是,將麥克風增益設置爲保守(低)級別,可有效避免削波。
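
To make the idea of "adjusting a signal to a reference volume" concrete, here is a deliberately simplistic sketch that scales one frame so that its peak reaches a target level. It is not the algorithm used by the Speex preprocessor (a real AGC tracks loudness over time and smooths the gain between frames); the function name and the peak-based rule are assumptions made for illustration.

```c
#include <stddef.h>

/* Scale a frame of 16-bit samples so that its peak reaches target_peak.
 * Returns the gain that was applied. Hypothetical helper, not Speex code. */
static float normalize_peak(short *frame, size_t n, int target_peak)
{
    int peak = 0;
    float gain;
    size_t i;

    /* Find the largest absolute sample value in the frame. */
    for (i = 0; i < n; i++) {
        int a = frame[i] < 0 ? -(int)frame[i] : (int)frame[i];
        if (a > peak)
            peak = a;
    }
    if (peak == 0)
        return 1.0f;                      /* silent frame: leave untouched */

    gain = (float)target_peak / (float)peak;
    for (i = 0; i < n; i++) {
        float v = (float)frame[i] * gain;
        if (v > 32767.0f)  v = 32767.0f;  /* clamp to avoid clipping wrap-around */
        if (v < -32768.0f) v = -32768.0f;
        frame[i] = (short)v;
    }
    return gain;
}
```

A frame recorded at a low microphone gain, such as {100, -200, 50}, is brought up by a factor of 5 when the target peak is 1000.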

 

The voice activity detector (VAD) provided by the preprocessor is more advanced than the one directly provided in the codec.

預處理器提供的語音活動檢測(VAD)比直接在編解碼器裏提供的更高級。

  1. Adaptive Jitter Buffer 自適應抖動緩衝器

When transmitting voice (or any content for that matter) over UDP or RTP, packets may be lost, arrive with different delays, or even arrive out of order. The purpose of a jitter buffer is to reorder packets and buffer them long enough (but no longer than necessary) so they can be sent to be decoded.

在用UDP或RTP協議傳輸聲音(或其餘相關內容)的時候,數據包可能會丟失、到達延遲不一樣、亂序到達。自適應抖動緩衝器的目的就是將數據包緩衝到足夠長(但不超過必要的時間)並對這些包進行重排序,而後才送給解碼器進行解碼。
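
The reordering part of that job can be sketched with a small fixed-size buffer indexed by sequence number. This is not the Speex jitter buffer and it is not adaptive (it does not tune its delay); every name and the capacity below are hypothetical.

```c
#define RB_SLOTS 8               /* hypothetical capacity, in packets */

typedef struct {
    int present[RB_SLOTS];       /* 1 when the slot holds a packet */
    int payload[RB_SLOTS];       /* stand-in for an encoded frame */
    unsigned next_seq;           /* sequence number to play out next */
} reorder_buffer;

static void rb_init(reorder_buffer *rb)
{
    int i;
    for (i = 0; i < RB_SLOTS; i++)
        rb->present[i] = 0;
    rb->next_seq = 0;
}

/* Store a packet. Returns 0 if it is too late or too far ahead to keep. */
static int rb_put(reorder_buffer *rb, unsigned seq, int payload)
{
    if (seq < rb->next_seq || seq >= rb->next_seq + RB_SLOTS)
        return 0;
    rb->present[seq % RB_SLOTS] = 1;
    rb->payload[seq % RB_SLOTS] = payload;
    return 1;
}

/* Fetch the next packet in sequence order. Returns 1 and fills *payload when
 * it is available, 0 when it is missing (the caller would then ask the
 * decoder to conceal the loss). The play-out position always advances. */
static int rb_get(reorder_buffer *rb, int *payload)
{
    unsigned slot = rb->next_seq % RB_SLOTS;
    int ok = rb->present[slot];
    if (ok)
        *payload = rb->payload[slot];
    rb->present[slot] = 0;
    rb->next_seq++;
    return ok;
}
```

Packets put in the order 2, 0 come out as 0, "missing", 2, and a packet 1 that shows up after its play-out time is rejected.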

  1. Acoustic Echo Canceller 聲學回音消除器

Figure 2.1: Acoustic echo model

圖 2.1:聲學回音模型

 

In any hands-free communication system (Fig. 2.1), speech from the remote end is played in the local loudspeaker, propagates in the room and is captured by the microphone. If the audio captured from the microphone is sent directly to the remote end, then the remote user hears an echo of his voice. An acoustic echo canceller is designed to remove the acoustic echo before it is sent to the remote end. It is important to understand that the echo canceller is meant to improve the quality on the remote end.

在任何免提通訊系統中(如圖2.1),來自遠端的語音會在本地擴音器中進行播放,而後在房間中傳播,並被麥克風捕獲。若是將這些被麥克風捕獲的音頻被直接發送給遠端,而後遠端用戶就會聽到它本身的聲音。聲學回音消除器就是在發送給遠端用戶以前將聲學回音消除。重要的是要明白,迴音消除是用來提升遠端用戶接收到的語音質量。

  1. Resampler 從新採樣器

In some cases, it may be useful to convert audio from one sampling rate to another. There are many reasons for that. It can be for mixing streams that have different sampling rates, for supporting sampling rates that the soundcard doesn't support, for transcoding, etc. That's why there is now a resampler that is part of the Speex project. This resampler can be used to convert between any two arbitrary rates (the ratio must only be a rational number) and there is control over the quality/complexity tradeoff.

在一些狀況下,會用到將音頻從一種採樣頻率轉換到另外一種。這會有不少緣由。例如將擁有不一樣採樣頻率的流進行混合、爲了支持聲卡不支持的採樣頻率、代碼轉換等。這就是爲何如今有一個從新採樣器會成爲Speex項目的一部分。從新採樣器可用於在兩種任意頻率之間轉換(頻率必須是有理數),它是基於質量/複雜度進行折中的控制。
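
The requirement that the ratio be a rational number can be illustrated by reducing the two rates to lowest terms, which gives the interpolation and decimation factors a rational resampler conceptually works with. The helper names are hypothetical; this is not how libspeexdsp exposes its resampler.

```c
/* Greatest common divisor (Euclid's algorithm). */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce out_rate/in_rate to lowest terms: *num / *den.
 * Both rates must be non-zero. Hypothetical helper. */
static void reduce_ratio(unsigned in_rate, unsigned out_rate,
                         unsigned *num, unsigned *den)
{
    unsigned g = gcd_u(in_rate, out_rate);
    *num = out_rate / g;
    *den = in_rate / g;
}
```

Converting 44100 Hz to 48000 Hz reduces to the ratio 160/147, while 8000 Hz to 16000 Hz is simply 2/1.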

  1. Compiling and Porting 編譯和移植

Compiling Speex under UNIX/Linux or any other platform supported by autoconf (e.g. Win32/cygwin) is as easy as typing:

在UNIX/Linux、或任何其餘支持autoconf的平臺(例如Win32/cygwin)上編譯Speex,就像打字同樣容易:

% ./configure [options選項]

% make

% make install

 

The options supported by the Speex configure script are:

Speex配置腳本支持的選項有:

--prefix=<path> Specifies the base path for installing Speex (e.g. /usr)

指定Speex的安裝路徑(好比/usr)

--enable-shared/--disable-shared Whether to compile shared libraries

是否編譯成動態庫

--enable-static/--disable-static Whether to compile static libraries

是否編譯成靜態庫

--disable-wideband Disable the wideband part of Speex (typically to save space)

禁用Speex的寬帶部分(一般爲了節約空間)

--enable-valgrind Enable extra hints for valgrind for debugging purposes (do not use by default)

爲了調試啓用Valgrind(一款用於內存調試、內存泄漏檢測以及檢查內存其餘問題的工具)的額外的匹配記錄(默認不使用)

--enable-sse Enable use of SSE instructions (x86/float only)

啓用SSE指令(僅限x86/float)

--enable-fixed-point Compile Speex for a processor that does not have a floating point unit (FPU)

在不支持浮點運算單元(FPU)的處理器上編譯Speex

--enable-arm4-asm Enable assembly specific to the ARMv4 architecture (gcc only)

啓用ARMv4架構的指令集(僅限gcc)

--enable-arm5e-asm Enable assembly specific to the ARMv5E architecture (gcc only)

啓用ARMv5E架構的指令集(僅限gcc)

--enable-fixed-point-debug Use only for debugging the fixed-point code (very slow)

僅用於調試定點執行代碼(很是慢)

--enable-epic-48k Enable a special (and non-compatible) 4.8 kbps narrowband mode (broken in 1.1.x and 1.2beta)

啓用特別的(和不兼容的)4.8kbps窄寬模式(1.1.x和1.2beta中不支持)

--enable-ti-c55x Enable support for the TI C5x family

啓用對TI C5x系列的支持

--enable-blackfin-asm Enable assembly specific to the Blackfin DSP architecture (gcc only)

啓用Blackfin DSP架構的指令集(僅限gcc)

--enable-vorbis-psycho Make the encoder use the Vorbis psycho-acoustic model. This is very experimental and may be removed in the future.

啓用編碼器的Vorbis心理聲學模型。這是很是實驗性的,之後可能會被移除。

  1. Platforms 平臺

Speex is known to compile and work on a large number of architectures, both floating-point and fixed-point. In general, any architecture that can natively compute the multiplication of two signed 16-bit numbers (32-bit result) and runs at a sufficient clock rate (architecture-dependent) is capable of running Speex. Architectures on which Speex is known to work (it probably works on many others) are:

  • x86 & x86-64
  • Power
  • SPARC
  • ARM
  • Blackfin
  • Coldfire (68k family)
  • TI C54xx & C55xx
  • TI C6xxx
  • TriMedia (experimental)

Operating systems on top of which Speex is known to work include (it probably works on many others):

  • Linux
  • µClinux
  • MacOS X
  • BSD
  • Other UNIX/POSIX variants
  • Symbian

The source code directory includes additional information for compiling on certain architectures or operating systems in README.xxx files.

  1. Porting and Optimising 移植和優化

Here are a few things to consider when porting or optimising Speex for a new platform or an existing one.

  1. CPU optimisation CPU優化

The single factor that will affect the CPU usage of Speex the most is whether it is compiled for floating-point or fixed-point. If your CPU/DSP does not have a floating-point unit (FPU), then compiling as fixed-point will be orders of magnitude faster. If there is an FPU present, then it is important to test which version is faster. On the x86 architecture, floating-point is generally faster, but not always. To compile Speex as fixed-point, you need to pass --enable-fixed-point to the configure script or define the FIXED_POINT macro for the compiler. As of 1.2beta3, it is now possible to disable the floating-point compatibility API, which means that your code can link without a float emulation library. To do that, configure with --disable-float-api or define the DISABLE_FLOAT_API macro. Until the VBR feature is ported to fixed-point, you will also need to configure with --disable-vbr or define DISABLE_VBR.

Other important things to check on some DSP architectures are:

•    Make sure the cache is set to write-back mode

•    If the chip has SRAM instead of cache, make sure as much code and data are in SRAM, rather than in RAM

If you are going to be writing assembly, then the following functions are usually the first ones you should consider optimising:

•    filter_mem16()

•    iir_mem16()

•    vq_nbest()

•    pitch_xcorr()

•    interp_pitch()

The filtering functions filter_mem16() and iir_mem16() are implemented in the direct form II transposed (DF2T). However, for architectures based on multiply-accumulate (MAC), DF2T requires frequent reloads of the accumulator, which can make the code very slow. For these architectures (e.g. Blackfin and Coldfire), a better approach is to implement those functions as direct form I (DF1), which is easier to express in terms of MAC. When doing that, however, it is important to make sure that the DF1 implementation still behaves like the original DF2T implementation when it comes to filter values. This is necessary because the filter is time-varying and must compute exactly the same value (not counting machine rounding) on any encoder or decoder.
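
The relationship between the two structures can be seen on a first-order filter written both ways. This is a floating-point sketch with constant coefficients, not the fixed-point, higher-order, time-varying code in filter_mem16() or iir_mem16(); all names are hypothetical.

```c
/* First-order section: y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1] */
typedef struct { double b0, b1, a1; } coeffs;

/* Direct form I: remembers the previous input and the previous output,
 * so each output is one sum of products (convenient for MAC units). */
static void run_df1(const coeffs *c, const double *x, double *y, int n)
{
    double prev_x = 0.0, prev_y = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        y[i] = c->b0 * x[i] + c->b1 * prev_x - c->a1 * prev_y;
        prev_x = x[i];
        prev_y = y[i];
    }
}

/* Direct form II transposed: a single state variable that is rewritten
 * after every output sample. */
static void run_df2t(const coeffs *c, const double *x, double *y, int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        y[i] = c->b0 * x[i] + s;
        s = c->b1 * x[i] - c->a1 * y[i];
    }
}

/* Largest absolute difference between two output buffers. */
static double max_abs_diff(const double *a, const double *b, int n)
{
    double worst = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

With constant coefficients the two forms produce the same output. The care the manual asks for arises when the coefficients change from frame to frame, because the two structures store different state and the state carried across a coefficient update must be made to match.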

  1. Memory optimisation 內存優化

Memory optimisation is mainly something that should be considered for small embedded platforms. For PCs, Speex is already so tiny that it's just not worth doing any of the things suggested here. There are several ways to reduce the memory usage of Speex, both in terms of code size and data size. For optimising code size, the trick is to first remove features you do not need. Some examples of things that can easily be disabled if you don't need them are:

•    Wideband support (--disable-wideband)

•    Support for stereo (removing stereo.c)

•    VBR support (--disable-vbr or DISABLE_VBR)

•    Static codebooks that are not needed for the bit-rates you are using (*_table.c files)

Speex also has several methods for allocating temporary arrays. When using a compiler that supports C99 properly (as of 2007, Microsoft compilers don't, but gcc does), it is best to define VAR_ARRAYS. That makes use of the variable-size array feature of C99. The next best is to define USE_ALLOCA so that Speex can use alloca() to allocate the temporary arrays. Note that on many systems, alloca() is buggy so it may not work. If neither VAR_ARRAYS nor USE_ALLOCA is defined, then Speex falls back to allocating a large "scratch space" and doing its own internal allocation. The main disadvantage of this solution is that it is wasteful. It needs to allocate enough stack for the worst case scenario (worst bit-rate, highest complexity setting, ...) and by default, the memory isn't shared between multiple encoder/decoder states. Still, if the "manual" allocation is the only option left, there are a few things that can be improved. By overriding the speex_alloc_scratch() call in os_support.h, it is possible to always return the same memory area for all states. In addition to that, by redefining the NB_ENC_STACK and NB_DEC_STACK (or similar for wideband), it is possible to only allocate memory for a scenario that is known in advance. In this case, it is important to measure the amount of memory required for the specific sampling rate, bit-rate and complexity level being used.
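
The "scratch space" idea can be pictured as a bump allocator over one caller-provided block: temporary arrays are carved off the front and everything is released at once. The sketch below only illustrates the concept; it is not the Speex implementation, and the type and function names are hypothetical.

```c
#include <stddef.h>

typedef struct {
    unsigned char *base;   /* start of the caller-provided scratch area */
    size_t size;           /* total bytes available */
    size_t used;           /* bytes handed out so far */
} scratch_stack;

static void scratch_init(scratch_stack *s, void *mem, size_t size)
{
    s->base = (unsigned char *)mem;
    s->size = size;
    s->used = 0;
}

/* Carve n bytes off the scratch area at an offset rounded up to a multiple
 * of 8 (an assumed alignment). Returns NULL when the area is exhausted. */
static void *scratch_push(scratch_stack *s, size_t n)
{
    size_t start = (s->used + 7u) & ~(size_t)7u;
    if (start > s->size || n > s->size - start)
        return NULL;
    s->used = start + n;
    return s->base + start;
}

/* Release every temporary array at once, e.g. at the end of a frame. */
static void scratch_reset(scratch_stack *s)
{
    s->used = 0;
}
```

Because the area must be sized for the worst case up front, measuring real usage for the chosen sampling rate, bit-rate and complexity is what lets the block be shrunk safely.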

  1. Command-line encoder/decoder 命令行的編碼器/解碼器

The base Speex distribution includes a command-line encoder (speexenc) and decoder (speexdec). Those tools produce and read Speex files encapsulated in the Ogg container. Although it is possible to encapsulate Speex in any container, Ogg is the recommended container for files. This section describes how to use the command line tools for Speex files in Ogg.

  1. speexenc

The speexenc utility is used to create Speex files from raw PCM or wave files. It can be used by calling:

speexenc [options] input_file output_file

The value '-' for input_file or output_file corresponds respectively to stdin and stdout. The valid options are:

--narrowband (-n) Tell Speex to treat the input as narrowband (8 kHz). This is the default

--wideband (-w) Tell Speex to treat the input as wideband (16 kHz)

--ultra-wideband (-u) Tell Speex to treat the input as "ultra-wideband" (32 kHz)

--quality n Set the encoding quality (0-10), default is 8

--bitrate n Encoding bit-rate (use bit-rate n or lower)

--vbr Enable VBR (Variable Bit-Rate), disabled by default

--abr n Enable ABR (Average Bit-Rate) at n kbps, disabled by default

--vad Enable VAD (Voice Activity Detection), disabled by default

--dtx Enable DTX (Discontinuous Transmission), disabled by default

--nframes n Pack n frames in each Ogg packet (this saves space at low bit-rates)

--comp n Set encoding speed/quality tradeoff. The higher the value of n, the slower the encoding (default is 3)

-V Verbose operation, print bit-rate currently in use

--help (-h) Print the help

--version (-v) Print version information

  1. Speex comments

--comment Add the given string as an extra comment. This may be used multiple times.

--author Author of this track.

--title Title for this track.

  1. Raw input options

--rate n Sampling rate for raw input

--stereo Consider raw input as stereo

--le Raw input is little-endian

--be Raw input is big-endian

--8bit Raw input is 8-bit unsigned

--16bit Raw input is 16-bit signed

  1. speexdec

The speexdec utility is used to decode Speex files and can be used by calling:

speexdec [options] speex_file [output_file]

The value '-' for speex_file or output_file corresponds respectively to stdin and stdout. Also, when no output_file is specified, the file is played to the soundcard. The valid options are:

--enh enable post-filter (default)

--no-enh disable post-filter

--force-nb Force decoding in narrowband

--force-wb Force decoding in wideband

--force-uwb Force decoding in ultra-wideband

--mono Force decoding in mono

--stereo Force decoding in stereo

--rate n Force decoding at n Hz sampling rate

--packet-loss n Simulate n % random packet loss

-V Verbose operation, print bit-rate currently in use

--help (-h) Print the help

--version (-v) Print version information

  1. Using the Speex Codec API (libspeex) 使用Speex編解碼器API(libspeex)

The libspeex library contains all the functions for encoding and decoding speech with the Speex codec. When linking on a UNIX system, one must add -lspeex -lm to the compiler command line. One important thing to know is that libspeex calls are reentrant, but not thread-safe. That means that it is fine to use calls from many threads, but calls using the same state from multiple threads must be protected by mutexes. Examples of code can also be found in Appendix A and the complete API documentation is included in the Documentation section of the Speex website (http://www.speex.org/).

libspeex庫包括了全部Speex編解碼器的語音編碼和解碼函數。在Linux系統中連接時,必須在編譯器命令行中加入-lspeex和-lm選項。須要知道的是,雖然libspeex的函數調用是可重入的,但不是線程安全的。這意味着它能夠被多線程調用,可是多線程使用相同的狀態須要用互斥鎖保護。附錄A中有代碼實例,在Speex站點(http://www.speex.org/)的文檔部分能下到完整的API文檔。

  1. Encoding 編碼

In order to encode speech using Speex, one first needs to:

爲了使用Speex進行語音編碼,首先要:

#include <speex/speex.h>

 

Then in the code, a Speex bit-packing struct must be declared, along with a Speex encoder state:

而後在代碼中,必需要聲明一個Speex比特包結構體,和一個Speex編碼器狀態一塊兒聲明:

SpeexBits bits;

void *enc_state;

 

The two are initialized by:

這兩個初始化以下:

speex_bits_init(&bits);

enc_state = speex_encoder_init(&speex_nb_mode);

 

For wideband coding, speex_nb_mode will be replaced by speex_wb_mode. In most cases, you will need to know the frame size used at the sampling rate you are using. You can get that value in the frame_size variable (expressed in samples, not bytes) with:

對於寬帶編碼,將speex_nb_mode替換爲speex_wb_mode。在大多數狀況中,你將須要知道你所使用的採樣頻率的幀大小。你能夠用以下方法獲取該值到frame_size變量(表示爲採樣個數,不是字節個數)中:

speex_encoder_ctl(enc_state,SPEEX_GET_FRAME_SIZE,&frame_size);

 

In practice, frame_size will correspond to 20 ms when using 8, 16, or 32 kHz sampling rate. There are many parameters that can be set for the Speex encoder, but the most useful one is the quality parameter that controls the quality vs bit-rate tradeoff. This is set by:

實際上,當使用8、16或32kHz採樣頻率的時候,frame_size將對應於20ms。Speex編碼器有不少參數能夠設置,可是其中最有用的一個是質量參數,它控制着質量和比特率的權衡,這個設置以下:

speex_encoder_ctl(enc_state,SPEEX_SET_QUALITY,&quality);

 

where quality is an integer value ranging from 0 to 10 (inclusively). The mapping between quality and bit-rate is described in Table 9.2 for narrowband.

quality是一個從0到10(包含10)範圍的整數值,窄帶(narrowband)的質量和比特率(bit-rate)的對應關係如表9.2所示。
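As a rough check of the 20 ms figure above, the frame size in samples can be derived from the sampling rate. This is only a sketch with hypothetical helper names (frame_samples, frame_bytes); in real code the value should still be read with SPEEX_GET_FRAME_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Number of samples in one frame of frame_ms milliseconds at rate_hz.
   Hypothetical helper, not part of libspeex. */
static int frame_samples(int rate_hz, int frame_ms)
{
    assert(rate_hz > 0 && frame_ms > 0);
    return rate_hz * frame_ms / 1000;
}

/* Size in bytes of a 16-bit PCM buffer holding one such frame. */
static size_t frame_bytes(int rate_hz, int frame_ms)
{
    return (size_t)frame_samples(rate_hz, frame_ms) * sizeof(short);
}
```

With 20 ms frames this gives 160, 320 and 640 samples at 8, 16 and 32 kHz, which is the buffer length that input_frame must provide for each call.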

 

Once the initialization is done, for every input frame:

一旦初始化完成後,對於每一個輸入幀:

speex_bits_reset(&bits);

speex_encode_int(enc_state, input_frame, &bits);

nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);

 

where input_frame is a (short *) pointing to the beginning of a speech frame, byte_ptr is a (char *) where the encoded frame will be written, MAX_NB_BYTES is the maximum number of bytes that can be written to byte_ptr without causing an overflow and nbBytes is the number of bytes actually written to byte_ptr (the encoded size in bytes). Before calling speex_bits_write, it is possible to find the number of bytes that need to be written by calling speex_bits_nbytes(&bits), which returns a number of bytes.

input_frame是一個(short *)指針,指向一個語音幀的開始,byte_ptr是一個(char *)指針,已編碼幀將寫入進去,MAX_NB_BYTES是寫入到byte_ptr不會形成溢出的最大字節數,而且nbBytes是實際上寫入到byte_ptr的字節數(就是已編碼的字節長度)。在調用speex_bits_write()以前,能夠經過調用speex_bits_nbytes(&bits)來知道須要被寫入多少個字節,這個函數將返回一個字節數。
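When choosing MAX_NB_BYTES, it can help to estimate how many bytes a frame occupies at a given bit-rate. The sketch below assumes constant bit-rate and a fixed frame duration; the helper names and the bit-rates used in the note are illustrative, and speex_bits_nbytes() remains the authoritative value at run time.

```c
#include <assert.h>

/* Bits produced per frame at a constant bit-rate. */
static int bits_per_frame(int bitrate_bps, int frame_ms)
{
    assert(bitrate_bps > 0 && frame_ms > 0);
    return bitrate_bps * frame_ms / 1000;
}

/* Bytes needed to hold that many bits, rounded up to a whole byte. */
static int bytes_per_frame(int bitrate_bps, int frame_ms)
{
    return (bits_per_frame(bitrate_bps, frame_ms) + 7) / 8;
}
```

For example, 8000 bit/s with 20 ms frames is 160 bits (20 bytes), and 15000 bit/s is 300 bits, which rounds up to 38 bytes.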

 

It is still possible to use the speex_encode() function, which takes a (float *) for the audio. However, this would make an eventual port to an FPU-less platform (like ARM) more complicated. Internally, speex_encode() and speex_encode_int() are processed in the same way. Whether the encoder uses the fixed-point version is only decided by the compile-time flags, not at the API level.

對於拿到(float *)的音頻,仍然可以使用speex_encode()函數。但是,這將使移植到缺乏浮點運算單元(FPU)的平臺(如ARM)變得更復雜。本質上,speex_encode()和speex_encode_int()是用相同的方法處理的。編碼器是否使用定點版本僅僅是被編譯選項決定的,不是在API級別。
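If an application holds its audio as floats but wants to stay on the integer API for portability, the samples can be converted before calling speex_encode_int(). The sketch below assumes the application uses normalized floats in the range -1.0 to 1.0; that is an assumption about the application, and both helper names are hypothetical.

```c
#include <assert.h>

/* Convert one normalized float sample (nominally -1.0 .. 1.0) to a
   16-bit sample, saturating out-of-range input instead of wrapping. */
static short float_to_s16(float x)
{
    float scaled = x * 32767.0f;
    if (scaled >= 32767.0f)  return 32767;
    if (scaled <= -32768.0f) return -32768;
    /* Round to nearest without needing libm. */
    return (short)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert a whole frame of n samples. */
static void floats_to_s16(const float *in, short *out, int n)
{
    assert(in && out && n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = float_to_s16(in[i]);
}
```

Saturating rather than wrapping matters here because a wrapped sample turns a loud peak into a click of the opposite sign.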

 

After you're done with the encoding, free all resources with:

在你完成編碼以後,用如下方式釋放全部的資源:

speex_bits_destroy(&bits);

speex_encoder_destroy(enc_state);

 

That's about it for the encoder.

以上是關於編碼器的內容。

  1. Decoding 解碼

In order to decode speech using Speex, you first need to:

爲了使用Speex解碼語音,你首先須要:

#include <speex/speex.h>

 

You also need to declare a Speex bit-packing struct

你也須要聲明一個Speex比特包結構體

SpeexBits bits;

 

and a Speex decoder state

和一個Speex解碼器狀態

void *dec_state;

 

The two are initialized by:

這兩個初始化以下:

speex_bits_init(&bits);

dec_state = speex_decoder_init(&speex_nb_mode);

 

For wideband decoding, speex_nb_mode will be replaced by speex_wb_mode. If you need to obtain the size of the frames that will be used by the decoder, you can get that value in the frame_size variable (expressed in samples, not bytes) with:

對於寬帶解碼,將speex_nb_mode替換爲speex_wb_mode。若是你須要得到用於解碼器的幀大小,你能夠用以下方法獲取該值到frame_size變量(表示爲採樣個數,不是字節個數)中:

speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);

 

There is also a parameter that can be set for the decoder: whether or not to use a perceptual enhancer. This can be set by:

這裏也有一個設置解碼器的參數:是否使用知覺加強。這個設置以下:

speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);

 

where enh is an int with value 0 to have the enhancer disabled and 1 to have it enabled. As of 1.2-beta1, the default is now to enable the enhancer.

enh是一個整數0就會禁用這個加強,整數1就會啓用這個加強。從1.2-beta1開始,默認啓用這個加強。

 

Again, once the decoder initialization is done, for every input frame:

再次,一旦解碼器初始化完成後,對於每一個輸入幀:

speex_bits_read_from(&bits, input_bytes, nbBytes);

speex_decode_int(dec_state, &bits, output_frame);

 

where input_bytes is a (char *) containing the bit-stream data received for a frame, nbBytes is the size (in bytes) of that bit-stream, and output_frame is a (short *) and points to the area where the decoded speech frame will be written. A NULL value as the second argument indicates that we don't have the bits for the current frame. When a frame is lost, the Speex decoder will do its best to "guess" the correct signal.

input_bytes是一個(char *)指針,包含接收到的一幀比特流數據,nbBytes是這幀比特流數據的長度,output_frame是一個(short *)指針,指向的區域將被寫入已解碼的語音幀。若是一個NULL值做爲第二個參數,則表示咱們沒有當前這幀的比特流。當一幀已經丟失,Speex解碼器將盡量猜想出正確的信號。
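To know how many times the decoder should be called with NULL in place of the bits, the receiver first has to notice how many frames went missing. A minimal sketch, assuming one frame per packet and 16-bit sequence numbers that wrap around (as in RTP); the function name is hypothetical and reordered packets are not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Number of frames missing between the previously received sequence
   number and the current one, with 16-bit wrap-around.
   Returns 0 when the packets are consecutive. */
static unsigned frames_lost(uint16_t prev_seq, uint16_t cur_seq)
{
    uint16_t gap = (uint16_t)(cur_seq - prev_seq);
    assert(gap != 0);          /* duplicate packet: caller should drop it */
    return (unsigned)(gap - 1);
}
```

For each lost frame the application would call the decode function once with a NULL bits argument, then decode the packet that actually arrived.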

 

As for the encoder, the speex_decode() function can still be used, with a (float *) as the output for the audio. After you're done with the decoding, free all resources with:

和編碼器相似,仍然可使用speex_decode()函數,獲取一個(float *)型的音頻輸出。在你完成解碼以後,用如下方式釋放全部的資源:

speex_bits_destroy(&bits);

speex_decoder_destroy(dec_state);

  1. Codec Options (speex_*_ctl) 編解碼器選項(speex_*_ctl)

Entities should not be multiplied beyond necessity – William of Ockham.

實體對象不該該超過所必需的 - William of Ockham。

Just because there's an option for it doesn't mean you have to turn it on – me.

僅僅由於有了一個選項,並不意味着你要打開它 - Speex做者。

 

The Speex encoder and decoder support many options and requests that can be accessed through the speex_encoder_ctl and speex_decoder_ctl functions. These functions are similar to the ioctl system call and their prototypes are:

Speex編碼器和解碼器支持不少選項和請求,它們能夠經過speex_encoder_ctl和speex_decoder_ctl函數訪問。這些函數相似於操做系統的ioctl,它們的原型是:

int speex_encoder_ctl(void *encoder, int request, void *ptr);

int speex_decoder_ctl(void *decoder, int request, void *ptr);

 

Despite those functions, the defaults are usually good for many applications and optional settings should only be used when one understands them and knows that they are needed. A common error is to attempt to set many unnecessary settings.

雖然有這些函數,默認狀況下對於大部分應用程序都是好的,僅當懂得它們並知道它們須要多少纔去修改這些設置。一般犯的錯誤就是嘗試去設置無益的設置。

 

Here is a list of the values allowed for the requests. Some only apply to the encoder or the decoder. Because the last argument is of type void *, the _ctl() functions are not type safe, and should thus be used with care. The type spx_int32_t is the same as the C99 int32_t type.

這裏列出了全部需求的容許值。某些僅僅適用於編碼器或者解碼器。由於最後一個參數類型是void *,因此_ctl()函數不是類型安全的,須要當心使用。這個spx_int32_t類型至關於C99標準的int32_t類型。

 

SPEEX_SET_ENH‡ Set perceptual enhancer to on (1) or off (0) (spx_int32_t, default is on)

SPEEX_SET_ENH‡ 設置知覺加強爲打開(1)或者關閉(0) (spx_int32_t,默認爲打開)

SPEEX_GET_ENH‡ Get perceptual enhancer status (spx_int32_t)

SPEEX_GET_ENH‡ 獲取知覺加強狀態 (spx_int32_t)

SPEEX_GET_FRAME_SIZE Get the number of samples per frame for the current mode (spx_int32_t)

SPEEX_GET_FRAME_SIZE 獲取當前模式下每幀的採樣個數 (spx_int32_t)

SPEEX_SET_QUALITY† Set the encoder speech quality (spx_int32_t from 0 to 10, default is 8)

SPEEX_SET_QUALITY† 設置編碼器語音質量 (spx_int32_t從0到10,默認爲8)

SPEEX_GET_QUALITY† Get the current encoder speech quality (spx_int32_t from 0 to 10)

SPEEX_GET_QUALITY† 獲取當前編碼器語音質量 (spx_int32_t從0到10)

SPEEX_SET_MODE† Set the mode number, as specified in the RTP spec (spx_int32_t)

SPEEX_SET_MODE† 設置模式編號,指定在RTP規範中 (spx_int32_t)

SPEEX_GET_MODE† Get the current mode number, as specified in the RTP spec (spx_int32_t)

SPEEX_GET_MODE† 獲取當前模式編號,指定在RTP規範中 (spx_int32_t)

SPEEX_SET_VBR† Set variable bit-rate (VBR) to on (1) or off (0) (spx_int32_t, default is off)

SPEEX_SET_VBR† 設置動態比特率(VBR)爲打開(1)或者關閉(0) (spx_int32_t,默認爲關閉)

SPEEX_GET_VBR† Get variable bit-rate (VBR) status (spx_int32_t)

SPEEX_GET_VBR† 獲取動態比特率(VBR)狀態 (spx_int32_t)

SPEEX_SET_VBR_QUALITY† Set the encoder VBR speech quality (float 0.0 to 10.0, default is 8.0)

SPEEX_SET_VBR_QUALITY† 設置編碼器的動態比特率語音質量 (float從0.0到10.0,默認爲8.0)

SPEEX_GET_VBR_QUALITY† Get the current encoder VBR speech quality (float 0 to 10)

SPEEX_GET_VBR_QUALITY† 獲取當前編碼器的動態比特率語音質量 (float從0.0到10.0)

SPEEX_SET_COMPLEXITY† Set the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)

SPEEX_SET_COMPLEXITY† 設置編碼器容許使用的CPU資源 (spx_int32_t從1到10,默認爲2)

SPEEX_GET_COMPLEXITY† Get the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)

SPEEX_GET_COMPLEXITY† 獲取編碼器容許使用的CPU資源 (spx_int32_t從1到10,默認爲2)

SPEEX_SET_BITRATE† Set the bit-rate to use the closest value not exceeding the parameter (spx_int32_t in bits per second)

SPEEX_SET_BITRATE† 設置不超過參數設置的最佳比特率 (spx_int32_t,單位每秒比特)

SPEEX_GET_BITRATE Get the current bit-rate in use (spx_int32_t in bits per second)

SPEEX_GET_BITRATE 獲取當前使用的比特率 (spx_int32_t,單位每秒比特)

SPEEX_SET_SAMPLING_RATE Set real sampling rate (spx_int32_t in Hz)

SPEEX_SET_SAMPLING_RATE 設置實時採樣頻率 (spx_int32_t,單位赫茲)

SPEEX_GET_SAMPLING_RATE Get real sampling rate (spx_int32_t in Hz)

SPEEX_GET_SAMPLING_RATE 獲取實時採樣頻率 (spx_int32_t,單位赫茲)

SPEEX_RESET_STATE Reset the encoder/decoder state to its original state, clearing all memories (no argument)

SPEEX_RESET_STATE 重置編碼器或者解碼器狀態爲原始狀態,清除全部的記憶 (無參數)

SPEEX_SET_VAD† Set voice activity detection (VAD) to on (1) or off (0) (spx_int32_t, default is off)

SPEEX_SET_VAD† 設置語音活動檢測(VAD)爲打開(1)或者關閉(0) (spx_int32_t,默認爲關閉)

SPEEX_GET_VAD† Get voice activity detection (VAD) status (spx_int32_t)

SPEEX_GET_VAD† 獲取語音活動檢測(VAD)狀態 (spx_int32_t)

SPEEX_SET_DTX† Set discontinuous transmission (DTX) to on (1) or off (0) (spx_int32_t, default is off)

SPEEX_SET_DTX† 設置非持續性傳輸(DTX)爲打開(1)或者關閉(0) (spx_int32_t,默認爲關閉)

SPEEX_GET_DTX† Get discontinuous transmission (DTX) status (spx_int32_t)

SPEEX_GET_DTX† 獲取非持續性傳輸(DTX)狀態 (spx_int32_t)

SPEEX_SET_ABR† Set average bit-rate (ABR) to a value n in bits per second (spx_int32_t in bits per second)

SPEEX_SET_ABR† 設置平均比特率(ABR)的值,單位每秒比特 (spx_int32_t,單位每秒比特)

SPEEX_GET_ABR† Get average bit-rate (ABR) setting (spx_int32_t in bits per second)

SPEEX_GET_ABR† 獲取平均比特率(ABR)的值 (spx_int32_t,單位每秒比特)

SPEEX_SET_PLC_TUNING† Tell the encoder to optimize encoding for a certain percentage of packet loss (spx_int32_t in percent)

SPEEX_SET_PLC_TUNING† 告訴編碼器對於已肯定的丟包率進行優化編碼 (spx_int32_t,單位百分比)

SPEEX_GET_PLC_TUNING† Get the current tuning of the encoder for PLC (spx_int32_t in percent)

SPEEX_GET_PLC_TUNING† 獲取編碼器的包丟失隱藏的當前調整值 (spx_int32_t,單位百分比)

SPEEX_SET_VBR_MAX_BITRATE† Set the maximum bit-rate allowed in VBR operation (spx_int32_t in bits per second)

SPEEX_SET_VBR_MAX_BITRATE† 設置可變比特率容許的最大比特率 (spx_int32_t,單位每秒比特)

SPEEX_GET_VBR_MAX_BITRATE† Get the current maximum bit-rate allowed in VBR operation (spx_int32_t in bits per second)

SPEEX_GET_VBR_MAX_BITRATE† 獲取當前可變比特率容許的最大比特率 (spx_int32_t,單位每秒比特)

SPEEX_SET_HIGHPASS Set the high-pass filter on (1) or off (0) (spx_int32_t, default is on)

SPEEX_SET_HIGHPASS 設置高通濾波器爲打開(1)或者關閉(0) (spx_int32_t,默認爲打開)

SPEEX_GET_HIGHPASS Get the current high-pass filter status (spx_int32_t)

SPEEX_GET_HIGHPASS 獲取當前高通濾波器狀態 (spx_int32_t)

    † applies only to the encoder

    † 僅適用於編碼器

    ‡ applies only to the decoder

    ‡ 僅適用於解碼器
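Because the request code alone decides how ptr is interpreted, passing a pointer to the wrong type (for example an int where a float is expected) compiles without complaint. One way to reduce that risk is to hide the ioctl-style call behind small typed wrappers. The sketch below uses a mock state and mock request codes (DemoState, DEMO_SET_QUALITY, DEMO_SET_VBR_QUALITY, demo_ctl); all of these names are hypothetical stand-ins and none are part of libspeex.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; these are not the
   libspeex request codes or state layout. */
enum { DEMO_SET_QUALITY = 1, DEMO_SET_VBR_QUALITY = 2 };

typedef struct {
    int32_t quality;      /* integer request payload */
    float   vbr_quality;  /* float request payload   */
} DemoState;

/* ioctl-style entry point: the request code alone decides how ptr is read. */
static int demo_ctl(DemoState *st, int request, void *ptr)
{
    assert(st && ptr);
    switch (request) {
    case DEMO_SET_QUALITY:     st->quality     = *(int32_t *)ptr; return 0;
    case DEMO_SET_VBR_QUALITY: st->vbr_quality = *(float *)ptr;   return 0;
    default:                   return -1;  /* unknown request */
    }
}

/* Typed wrappers let the compiler check the argument type. */
static int demo_set_quality(DemoState *st, int32_t q)
{
    return demo_ctl(st, DEMO_SET_QUALITY, &q);
}

static int demo_set_vbr_quality(DemoState *st, float q)
{
    return demo_ctl(st, DEMO_SET_VBR_QUALITY, &q);
}
```

The same pattern can be applied around the real _ctl() calls: one wrapper per request that the application actually uses, each taking the type listed for that request above.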

  1. Mode queries 模式查詢

Speex modes have a query system similar to the speex_encoder_ctl and speex_decoder_ctl calls. Since modes are read-only, it is only possible to get information about a particular mode. The function used to do that is:

Speex的模式有一個查詢系統,相似於speex_encoder_ctl和speex_decoder_ctl這樣的調用。由於模式是隻讀的,因此它只能獲取模式的詳細信息。函數用法以下:

int speex_mode_query(const SpeexMode *mode, int request, void *ptr);

 

The admissible values for request are (unless otherwise noted, the values are returned through ptr):

容許請求的值爲(除非另有說明,返回值都是放入ptr):

SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode

SPEEX_MODE_FRAME_SIZE 獲取模式的幀大小(單位採樣個數)

SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through ptr (integer in bps).

SPEEX_SUBMODE_BITRATE 獲取子模式數量的比特率放入ptr(整數,單位bps)

  1. Packing and in-band signalling 封包和帶內信號

Sometimes it is desirable to pack more than one frame per packet (or other basic unit of storage). The proper way to do it is to call speex_encode N times before writing the stream with speex_bits_write. In cases where the number of frames is not determined by an out-of-band mechanism, it is possible to include a terminator code. That terminator consists of the code 15 (decimal) encoded with 5 bits, as shown in Table 9.2. Note that as of version 1.0.2, calling speex_bits_write automatically inserts the terminator so as to fill the last byte. This doesn't involve any overhead and makes sure Speex can always detect when there are no more frames in a packet.

有時咱們但願每一個包(或其餘基本存儲單元)打包超過一幀。正確作法是在調用speex_bits_write寫入流以前調用N次speex_encode。這種狀況下的幀數不是由帶外機制決定的,它會包含一個終結碼。如表9.2所示,這個終結碼是由用5bits編碼的Mode 15組成。若是是1.0.2版本需注意,調用speex_bits_write時,爲了填充最後字節,它會自動添加終結碼。這不會增長開銷,並能確保Speex一直檢測到包中沒有更多幀爲止。
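The idea of filling the last byte with a terminator can be illustrated with a tiny MSB-first bit writer. This is a sketch, not the libspeex implementation: the BitWriter type and bw_* functions are hypothetical, and the padding rule used here (a 0 bit followed by 1 bits) is an assumption chosen so that the padding reads as the beginning of the 5-bit code 15 (binary 01111).

```c
#include <assert.h>
#include <string.h>

/* Minimal MSB-first bit writer (hypothetical, for illustration). */
typedef struct {
    unsigned char buf[64];
    int nbits;               /* number of bits written so far */
} BitWriter;

static void bw_init(BitWriter *bw)
{
    memset(bw, 0, sizeof *bw);
}

/* Append the lowest `count` bits of `value`, most significant first. */
static void bw_put(BitWriter *bw, unsigned value, int count)
{
    assert(count >= 0 && bw->nbits + count <= (int)(sizeof bw->buf * 8));
    for (int i = count - 1; i >= 0; i--) {
        if ((value >> i) & 1u)
            bw->buf[bw->nbits / 8] |= (unsigned char)(0x80u >> (bw->nbits % 8));
        bw->nbits++;
    }
}

/* Pad to a byte boundary with a 0 bit followed by 1 bits, so that a reader
   continuing past the last frame either sees the start of code 15
   (binary 01111) or runs out of bits. Returns the total number of bytes. */
static int bw_terminate(BitWriter *bw)
{
    if (bw->nbits % 8 != 0) {
        bw_put(bw, 0, 1);
        while (bw->nbits % 8 != 0)
            bw_put(bw, 1, 1);
    }
    return bw->nbits / 8;
}
```

Writing the three bits 101 and then terminating yields the single byte 0xAF (1010 1111): the payload, one 0 bit, and 1 bits up to the byte boundary, so no extra byte is spent on the terminator.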

 

It is also possible to send in-band "messages" to the other side. All these messages are encoded as "pseudo-frames" of mode 14 which contain a 4-bit message type code, followed by the message. Table 5.1 lists the available codes, their meaning and the size of the message that follows. Most of these messages are requests that are sent to the encoder or decoder on the other end, which is free to comply or ignore them. By default, all in-band messages are ignored.

固然也能夠經過帶內"消息"的方法,全部這些消息是做爲Mode14的"僞幀"編碼的,Mode14包含4bit的消息類型代碼。表5.1列出了可用代碼的說明和大小,發送給編/解碼器的的消息大部分均可隨意的被接受或被忽略。默認狀況下,全部帶內消息都被忽略掉了。

 

Code  Size (bits)  Content
0     1            Asks decoder to set perceptual enhancement off (0) or on (1)
1     1            Asks (if 1) the encoder to be less "aggressive" due to high packet loss
2     4            Asks encoder to switch to mode N
3     4            Asks encoder to switch to mode N for low-band
4     4            Asks encoder to switch to mode N for high-band
5     4            Asks encoder to switch to quality N for VBR
6     4            Request acknowledge (0=no, 1=all, 2=only for in-band data)
7     4            Asks encoder to set CBR (0), VAD (1), DTX (3), VBR (5), VBR+DTX (7)
8     8            Transmit (8-bit) character to the other end
9     8            Intensity stereo information
10    16           Announce maximum bit-rate acceptable (N in bytes/second)
11    16           reserved
12    32           Acknowledge receiving packet N
13    32           reserved
14    64           reserved
15    64           reserved

Table 5.1: In-band signalling codes

表5.1:帶內信號代碼

 

Finally, applications may define custom in-band messages using mode 13. The size of the message in bytes is encoded with 5 bits, so that the decoder can skip it if it doesn't know how to interpret it.

最後,一些應用會使用Mode 13自定義帶內消息,消息的字節大小是用5bits編碼的,因此若是解碼器不知道如何解析它就會跳過。
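A receiver that wants to skip in-band messages it does not understand only needs the sizes from Table 5.1. The sketch below computes how many bits follow the mode marker for a standard (mode 14) message and for a custom (mode 13) message. The function names are hypothetical, and the bits of the mode marker itself are deliberately left out of the count.

```c
#include <assert.h>

/* Payload size in bits for each 4-bit in-band code, copied from Table 5.1. */
static const int inband_payload_bits[16] = {
    1, 1, 4, 4, 4, 4, 4, 4, 8, 8, 16, 16, 32, 32, 64, 64
};

/* Bits that follow the mode-14 marker: the 4-bit message type code
   plus the payload for that code. */
static int inband_body_bits(int code)
{
    assert(code >= 0 && code < 16);
    return 4 + inband_payload_bits[code];
}

/* Bits that follow the mode-13 marker for a custom message carrying
   nbytes bytes: the 5-bit length field plus the payload. */
static int custom_body_bits(int nbytes)
{
    assert(nbytes >= 0 && nbytes < 32);   /* length must fit in 5 bits */
    return 5 + 8 * nbytes;
}
```

For instance, a request to toggle perceptual enhancement (code 0) takes 5 bits after the marker, while announcing a maximum bit-rate (code 10) takes 20.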

  1. Speech Processing API (libspeexdsp) 語音處理API(libspeexdsp)

As of version 1.2beta3, the non-codec parts of the Speex package are now in a separate library called libspeexdsp. This library includes the preprocessor, the acoustic echo canceller, the jitter buffer, and the resampler. In a UNIX environment, it can be linked into a program by adding -lspeexdsp -lm to the compiler command line. Just like for libspeex, libspeexdsp calls are reentrant, but not thread-safe. That means that it is fine to use calls from many threads, but calls using the same state from multiple threads must be protected by mutexes.

從1.2beta3版本開始,Speex包的非編解碼器部分如今在單獨的庫,叫libspeexdsp。這個庫包含了預處理器、聲學回音消除器、抖動緩衝器、重採樣。在UNIX環境下,程序連接時須要給編譯器添加-lspeexdsp -lm命令行選項。像libspeex、libspeexdsp的調用是可重入的,但不是線程安全的。這意味着他能夠正常的使用多線程調用,可是多線程使用相同的狀態須要用互斥鎖保護

  1. Preprocessor 預處理器

In order to use the Speex preprocessor, you first need to:

#include <speex/speex_preprocess.h>

Then, a preprocessor state can be created as:

SpeexPreprocessState *preprocess_state = speex_preprocess_state_init(frame_size, sampling_rate);

and it is recommended to use the same value for frame_size as is used by the encoder (20 ms).

For each input frame, you need to call:

speex_preprocess_run(preprocess_state, audio_frame);

where audio_frame is used both as input and output. In cases where the output audio is not useful for a certain frame, it is possible to use instead:

speex_preprocess_estimate_update(preprocess_state, audio_frame);

This call will update all the preprocessor internal state variables without computing the output audio, thus saving some CPU cycles.

The behaviour of the preprocessor can be changed using:

speex_preprocess_ctl(preprocess_state, request, ptr);

which is used in the same way as the encoder and decoder equivalent. Options are listed in Section 6.1.1.

The preprocessor state can be destroyed using:

speex_preprocess_state_destroy(preprocess_state);

  1. Preprocessor options 預處理器選項

As with the codec, the preprocessor also has options that can be controlled using an ioctl()-like call. The available options are:

 

 

SPEEX_PREPROCESS_SET_DENOISE Turns denoising on (1) or off (0) (spx_int32_t)

SPEEX_PREPROCESS_GET_DENOISE Get denoising status (spx_int32_t)

SPEEX_PREPROCESS_SET_AGC Turns automatic gain control (AGC) on (1) or off (0) (spx_int32_t)

SPEEX_PREPROCESS_GET_AGC Get AGC status (spx_int32_t)

SPEEX_PREPROCESS_SET_VAD Turns voice activity detector (VAD) on (1) or off (0) (spx_int32_t)

SPEEX_PREPROCESS_GET_VAD Get VAD status (spx_int32_t)

SPEEX_PREPROCESS_SET_AGC_LEVEL

SPEEX_PREPROCESS_GET_AGC_LEVEL

SPEEX_PREPROCESS_SET_DEREVERB Turns reverberation removal on (1) or off (0) (spx_int32_t)

SPEEX_PREPROCESS_GET_DEREVERB Get reverberation removal status (spx_int32_t)

SPEEX_PREPROCESS_SET_DEREVERB_LEVEL Not working yet, do not use

SPEEX_PREPROCESS_GET_DEREVERB_LEVEL Not working yet, do not use

SPEEX_PREPROCESS_SET_DEREVERB_DECAY Not working yet, do not use

SPEEX_PREPROCESS_GET_DEREVERB_DECAY Not working yet, do not use

SPEEX_PREPROCESS_SET_PROB_START

SPEEX_PREPROCESS_GET_PROB_START

SPEEX_PREPROCESS_SET_PROB_CONTINUE

SPEEX_PREPROCESS_GET_PROB_CONTINUE

SPEEX_PREPROCESS_SET_NOISE_SUPPRESS Set maximum attenuation of the noise in dB (negative spx_int32_t)

SPEEX_PREPROCESS_GET_NOISE_SUPPRESS Get maximum attenuation of the noise in dB (negative spx_int32_t)

SPEEX_PREPROCESS_SET_ECHO_SUPPRESS Set maximum attenuation of the residual echo in dB (negative spx_int32_t)

SPEEX_PREPROCESS_GET_ECHO_SUPPRESS Get maximum attenuation of the residual echo in dB (negative spx_int32_t)

SPEEX_PREPROCESS_SET_ECHO_SUPPRESS_ACTIVE Set maximum attenuation of the echo in dB when near end is active (negative spx_int32_t)

SPEEX_PREPROCESS_GET_ECHO_SUPPRESS_ACTIVE Get maximum attenuation of the echo in dB when near end is active (negative spx_int32_t)

SPEEX_PREPROCESS_SET_ECHO_STATE Set the associated echo canceller for residual echo suppression (pointer or NULL for no residual echo suppression)

SPEEX_PREPROCESS_GET_ECHO_STATE Get the associated echo canceller (pointer)

  1. Echo Cancellation 迴音消除器

The Speex library now includes an echo cancellation algorithm suitable for Acoustic Echo Cancellation (AEC). In order to use the echo canceller, you first need to

#include <speex/speex_echo.h>

Then, an echo canceller state can be created by:

SpeexEchoState *echo_state = speex_echo_state_init(frame_size, filter_length);

where frame_size is the amount of data (in samples) you want to process at once and filter_length is the length (in samples) of the echo cancelling filter you want to use (also known as tail length). It is recommended to use a frame size on the order of 20 ms (or equal to the codec frame size) and make sure it is easy to perform an FFT of that size (powers of two are better than prime sizes). The recommended tail length is approximately one third of the room reverberation time. For example, in a small room, reverberation time is on the order of 300 ms, so a tail length of 100 ms is a good choice (800 samples at 8000 Hz sampling rate).
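The tail-length rule of thumb above is simple arithmetic, sketched here with two hypothetical helpers. The one-third factor is the manual's recommendation; the reverberation time itself has to come from measurement or an estimate of the room.

```c
#include <assert.h>

/* Recommended tail length in milliseconds: about one third of the
   room reverberation time. */
static int tail_ms_from_reverb(int reverb_ms)
{
    assert(reverb_ms > 0);
    return reverb_ms / 3;
}

/* Convert a duration in milliseconds to a sample count. */
static int ms_to_samples(int ms, int rate_hz)
{
    assert(ms >= 0 && rate_hz > 0);
    return ms * rate_hz / 1000;
}
```

With a 300 ms reverberation time at 8000 Hz this gives a 100 ms tail, i.e. a filter_length of 800 samples, alongside a frame_size of 160 samples for 20 ms frames.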

Once the echo canceller state is created, audio can be processed by:

speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame);

where input_frame is the audio as captured by the microphone, echo_frame is the signal that was played in the speaker (and needs to be removed) and output_frame is the signal with echo removed.

One important thing to keep in mind is the relationship between input_frame and echo_frame. It is important that, at any time, any echo that is present in the input has already been sent to the echo canceller as echo_frame. In other words, the echo canceller cannot remove a signal that it hasn't yet received. On the other hand, the delay between the input signal and the echo signal must be small enough, because otherwise part of the echo cancellation filter is wasted. In the ideal case, your code would look like:

write_to_soundcard(echo_frame, frame_size);

read_from_soundcard(input_frame, frame_size);

speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame);

If you wish to further reduce the echo present in the signal, you can do so by associating the echo canceller to the preprocessor (see Section 6.1). This is done by calling:

speex_preprocess_ctl(preprocess_state, SPEEX_PREPROCESS_SET_ECHO_STATE, echo_state);

in the initialisation.

As of version 1.2-beta2, there is an alternative, simpler API that can be used instead of speex_echo_cancellation(). When audio capture and playback are handled asynchronously (e.g. in different threads or using the poll() or select() system call), it can be difficult to keep track of which input_frame goes with which echo_frame. Instead, the playback context/thread can simply call:

speex_echo_playback(echo_state, echo_frame);

every time an audio frame is played. Then, the capture context/thread calls:

speex_echo_capture(echo_state, input_frame, output_frame);

for every frame captured. Internally, speex_echo_playback() simply buffers the playback frame so it can be used by speex_echo_capture() to call speex_echo_cancellation(). A side effect of using this alternate API is that the playback audio is delayed by two frames, which is the normal delay caused by the soundcard. When capture and playback are already synchronised, speex_echo_cancellation() is preferable since it gives better control on the exact input/echo timing.

The echo cancellation state can be destroyed with:

speex_echo_state_destroy(echo_state);

It is also possible to reset the state of the echo canceller so it can be reused without the need to create another state with:

speex_echo_state_reset(echo_state);

  1. Troubleshooting 發現並修理故障

There are several things that may prevent the echo canceller from working properly. One of them is a bug (or something suboptimal) in the code, but there are many others you should consider first:

•    Using a different soundcard to do the capture and playback will not work, regardless of what you may think. The only exception to that is if the two cards can be made to have their sampling clock "locked" on the same clock source. If not, the clocks will always have a small amount of drift, which will prevent the echo canceller from adapting.

•    The delay between the record and playback signals must be minimal. Any signal played has to "appear" on the playback (far end) signal slightly before the echo canceller "sees" it in the near end signal, but excessive delay means that part of the filter length is wasted. In the worst situations, the delay is such that it is longer than the filter length, in which case, no echo can be cancelled.

•    When it comes to echo tail length (filter length), longer is *not* better. Actually, the longer the tail length, the longer it takes for the filter to adapt. Of course, a tail length that is too short will not cancel enough echo, but the most common problem seen is that people set a very long tail length and then wonder why no echo is being cancelled.

•    Non-linear distortion cannot (by definition) be modeled by the linear adaptive filter used in the echo canceller and thus cannot be cancelled. Use good audio gear and avoid saturation/clipping.
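To get a feel for why unsynchronised soundcard clocks are a problem, the drift between the two can be expressed in samples. The sketch below is plain arithmetic with a hypothetical helper; the 100 ppm figure used in the note is an illustrative assumption, not a measured value for any particular hardware.

```c
#include <assert.h>

/* Samples by which two free-running clocks drift apart after `seconds`,
   given a relative frequency error in parts per million. */
static long drift_samples(long rate_hz, long ppm, long seconds)
{
    assert(rate_hz > 0 && ppm >= 0 && seconds >= 0);
    return rate_hz * ppm * seconds / 1000000L;
}
```

At 8000 Hz and an assumed 100 ppm mismatch, the two streams slide apart by 48 samples per minute and 480 samples (60 ms) after ten minutes, which is already more than half of a 100 ms tail.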

Also useful is reading Echo Cancellation Demystified by Alexey Frunze, which explains the fundamental principles of echo cancellation. The details of the algorithm described in the article are different, but the general ideas of echo cancellation through adaptive filters are the same.

As of version 1.2beta2, a new echo_diagnostic.m tool is included in the source distribution. The first step is to define DUMP_ECHO_CANCEL_DATA during the build. This causes the echo canceller to automatically save the near-end, far-end and output signals to files (aec_rec.sw, aec_play.sw and aec_out.sw). These are exactly what the AEC receives and outputs. From there, it is necessary to start Octave and type:

echo_diagnostic('aec_rec.sw', 'aec_play.sw', 'aec_diagnostic.sw', 1024);

The value of 1024 is the filter length and can be changed. There will be some (hopefully) useful messages printed and echo cancelled audio will be saved to aec_diagnostic.sw. If even that output is bad (almost no cancellation) then there is probably a problem with the playback or recording process.

  1. Jitter Buffer 抖動緩衝器

The jitter buffer can be enabled by including:

包含頭文件能夠啓用抖動緩衝器:

#include <speex/speex_jitter.h>

 

and a new jitter buffer state can be initialised by:

並初始化一個新的抖動緩衝器狀態:

JitterBuffer * state = jitter_buffer_init(step);

 

where the step argument is the default time step (in timestamp units) used for adjusting the delay and doing concealment. A value of 1 is always correct, but higher values may be more convenient sometimes. For example, if you are only able to do concealment on 20ms frames, there is no point in the jitter buffer asking you to do it on one sample. Another example is that for video, it makes no sense to adjust the delay by less than a full frame. The value provided can always be changed at a later time.

其中step參數是用於調整延遲和作隱藏的默認時間步長(以時間戳爲單位)。值爲1始終是正確的,可是有時更高的值會更方便。例如,若是你能在20ms幀上作隱蔽,則抖動緩衝器中的點無需一個一個作。另外一個例子是針對視頻,對少於一幀的延時進行調整沒有意義,後面能夠隨時改變這個值。

 

The jitter buffer API is based on the JitterBufferPacket type, which is defined as:

抖動緩衝器API基於JitterBufferPacket類型,定義以下:

 

typedef struct {

    char *data;    /* Data bytes contained in the packet 包含在數據包中的字節數據 */

    spx_uint32_t len;    /* Length of the packet in bytes 數據包長度,單位字節 */

    spx_uint32_t timestamp;    /* Timestamp for the packet 數據包的時間戳 */

    spx_uint32_t span;    /* Time covered by the packet (timestamp units) 數據包覆蓋的時間(單位時間戳) */

} JitterBufferPacket;

 

As an example, for audio the timestamp field would be what is obtained from the RTP timestamp field and the span would be the number of samples that are encoded in the packet. For Speex narrowband, span would be 160 if only one frame is included in the packet.

例如,對於音頻,時間戳字段將是從RTP時間戳字段得到的值,span參數將是在數據包中已編碼的採樣數量。對於Speex窄帶,若是這個數據包只包含一幀,則span參數將爲160。

 

When a packet arrives, it needs to be inserted into the jitter buffer by:

當一個數據包到達時,它將被插入到抖動緩衝器中:

JitterBufferPacket packet;

/* Fill in each field in the packet struct 填充數據包結構體中每個字段*/

jitter_buffer_put(state, &packet);

 

When the decoder is ready to decode a packet, the packet to be decoded can be obtained by:

當解碼器準備解碼一個數據包時,能夠經過如下方式得到要解碼的數據包:

int start_offset;

err = jitter_buffer_get(state, &packet, desired_span, &start_offset);

 

If jitter_buffer_put() and jitter_buffer_get() are called from different threads, then you need to protect the jitter buffer state with a mutex.

若是jitter_buffer_put()和jitter_buffer_get()函數被不一樣的線程調用,那麼你須要用互斥鎖來保護抖動緩衝器狀態。

 

Because the jitter buffer is designed not to use an explicit timer, it needs to be told about the time explicitly. This is done by calling:

由於抖動緩衝器被設計爲不使用顯式計時器,因此須要明確地告知它時間,經過以下調用實現:

jitter_buffer_tick(state);

 

This needs to be done periodically in the playing thread. This will be the last jitter buffer call before going to sleep (until more data is played back). In some cases, it may be preferable to use:

這須要在播放線程中按期調用。這是播放線程在休眠以前的最後一個抖動緩衝器調用(直到更多數據被回放)。在一些狀況下,以下調用會更好:

jitter_buffer_remaining_span(state, remaining);

 

The second argument is used to specify that we are still holding data that has not been written to the playback device. For instance, if 256 samples were needed by the soundcard (specified by desired_span), but jitter_buffer_get() returned 320 samples, we would have remaining=64.

第二個參數用於指定還沒被寫入回放設備的仍保留在抖動緩衝器中的數據。好比,若聲卡須要256個採樣數據(由desired_span指定),但jitter_buffer_get()函數返回320個採樣數據,則remaining=64。
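The remaining value in the example above is the difference between what was fetched and what the soundcard consumed. A minimal sketch, assuming the jitter buffer hands back whole frames of a fixed span (160 samples for one Speex narrowband frame); both helper names are hypothetical.

```c
#include <assert.h>

/* Whole frames needed to cover desired_span timestamp units. */
static int frames_needed(int desired_span, int frame_span)
{
    assert(desired_span > 0 && frame_span > 0);
    return (desired_span + frame_span - 1) / frame_span;
}

/* Samples fetched but not yet written to the playback device. */
static int remaining_span(int desired_span, int frame_span)
{
    return frames_needed(desired_span, frame_span) * frame_span - desired_span;
}
```

With desired_span = 256 and 160-sample frames, two frames (320 samples) are fetched and 64 samples are left over, matching the figure in the text.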

  1. Resampler 重採樣器

As of version 1.2beta2, Speex includes a resampling module. To make use of the resampler, it is necessary to include its header file:

#include <speex/speex_resampler.h>

For each stream that is to be resampled, it is necessary to create a resampler state with:

SpeexResamplerState *resampler;

resampler = speex_resampler_init(nb_channels, input_rate, output_rate, quality, &err);

where nb_channels is the number of channels that will be used (either interleaved or non-interleaved), input_rate is the sampling rate of the input stream, output_rate is the sampling rate of the output stream and quality is the requested quality setting (0 to 10). The quality parameter is useful for controlling the quality/complexity/latency tradeoff. Using a higher quality setting means less noise/aliasing, a higher complexity and a higher latency. Usually, a quality of 3 is acceptable for most desktop uses and quality 10 is mostly recommended for pro audio work. Quality 0 usually has a decent sound (certainly better than using linear interpolation resampling), but artifacts may be heard.

The actual resampling is performed using

err = speex_resampler_process_int(resampler, channelID, in, &in_length, out, &out_length);

where channelID is the ID of the channel to be processed. For a mono stream, use 0. The in pointer points to the first sample of the input buffer for the selected channel and out points to the first sample of the output. The sizes of the input and output buffers are specified by in_length and out_length respectively. Upon completion, these values are replaced by the number of samples read and written by the resampler. Unless an error occurs, either all input samples will be read or all output samples will be written to (or both). For floating-point samples, the function speex_resampler_process_float() behaves similarly.

It is also possible to process multiple channels at once. To be continued...
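Since out_length must describe the capacity of the output buffer before the call, it helps to size that buffer from the rate ratio. The helper below is a hypothetical sketch that rounds the ratio up. The number of samples actually produced by a given call may differ slightly, which is why the resampler reports it back through out_length.

```c
#include <assert.h>

/* Upper estimate of output samples produced from in_len input samples
   when converting in_rate to out_rate (rounded up). */
static unsigned max_output_samples(unsigned in_len, unsigned in_rate, unsigned out_rate)
{
    assert(in_rate > 0 && out_rate > 0);
    unsigned long long n = (unsigned long long)in_len * out_rate;
    return (unsigned)((n + in_rate - 1) / in_rate);
}
```

For example, 160 samples at 8000 Hz map to 320 samples at 16000 Hz, and 441 samples at 44100 Hz map to 480 samples at 48000 Hz.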

 


  1. Ring Buffer 環形緩衝器

Put some stuff there...

  1. Formats and standards

Speex can encode speech in both narrowband and wideband and provides different bit-rates. However, not all features need to be supported by a certain implementation or device. In order to be called "Speex compatible" (whatever that means), an implementation must implement at least a basic set of features.

At the minimum, all narrowband modes of operation MUST be supported at the decoder. This includes the decoding of a wideband bit-stream by the narrowband decoder (the wideband bit-stream contains an embedded narrowband bit-stream which can be decoded alone). If present, a wideband decoder MUST be able to decode a narrowband stream, and MAY either be able to decode all wideband modes or be able to decode the embedded narrowband part of all modes (which includes ignoring the high-band bits).

For encoders, at least one narrowband or wideband mode MUST be supported. The main reason why all encoding modes do not have to be supported is that some platforms may not be able to handle the complexity of encoding in some modes.

  1. RTP Payload Format

The RTP payload draft is included in appendix C and the latest version is available at http://www.speex.org/drafts/latest. This draft has been sent (2003/02/26) to the Internet Engineering Task Force (IETF) and will be discussed at the March 18th meeting in San Francisco.

  1. MIME Type

For now, you should use the MIME type audio/x-speex for Speex-in-Ogg. We will apply for type audio/speex in the near future.

  1. Ogg file format

Speex bit-streams can be stored in Ogg files. In this case, the first packet of the Ogg file contains the Speex header described in table 7.1. All integer fields in the headers are stored as little-endian. The speex_string field must contain the string "Speex   " (the word Speex followed by 3 trailing spaces, 8 characters in total), which identifies the bit-stream. The next field, speex_version, contains the version of Speex that encoded the file. For now, refer to speex_header.[ch] for more info. The beginning of stream (b_o_s) flag is set to 1 for the header. The header packet has packetno=0 and granulepos=0.

The second packet contains the Speex comment header. The format used is the Vorbis comment format described here: http://www.xiph.org/ogg/vorbis/doc/v-comment.html . This packet has packetno=1 and granulepos=0.

The third and subsequent packets each contain one or more (number found in header) Speex frames. These are identified with packetno starting from 2 and the granulepos is the number of the last sample encoded in that packet. The last of these packets has the end of stream (e_o_s) flag set to 1.

| Field | Type | Size (bytes) |
|---|---|---|
| speex_string | char[] | 8 |
| speex_version | char[] | 20 |
| speex_version_id | int | 4 |
| header_size | int | 4 |
| rate | int | 4 |
| mode | int | 4 |
| mode_bitstream_version | int | 4 |
| nb_channels | int | 4 |
| bitrate | int | 4 |
| frame_size | int | 4 |
| vbr | int | 4 |
| frames_per_packet | int | 4 |
| extra_headers | int | 4 |
| reserved1 | int | 4 |
| reserved2 | int | 4 |

Table 7.1: Ogg/Speex header packet
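The layout in table 7.1 adds up to 8 + 20 + 13 × 4 = 80 bytes. The following is a minimal sketch of reading those fields from a raw header packet in a way that does not depend on host endianness. The names OggSpeexHeader, read_le32 and parse_speex_header are hypothetical and chosen for illustration only; the real definitions live in speex_header.[ch].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPEEX_HEADER_BYTES 80   /* 8 + 20 + 13 * 4, from table 7.1 */

/* Illustrative struct mirroring table 7.1 (not the library's own type). */
typedef struct {
    char    speex_string[8];
    char    speex_version[20];
    int32_t speex_version_id, header_size, rate, mode,
            mode_bitstream_version, nb_channels, bitrate, frame_size,
            vbr, frames_per_packet, extra_headers, reserved1, reserved2;
} OggSpeexHeader;

/* Read one little-endian 32-bit integer, whatever the host byte order. */
static int32_t read_le32(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Returns 0 on success, -1 if the packet is too short or the
   identification string is not "Speex   ". */
static int parse_speex_header(const unsigned char *buf, size_t len,
                              OggSpeexHeader *h)
{
    const unsigned char *p;
    assert(buf != NULL && h != NULL);
    if (len < SPEEX_HEADER_BYTES || memcmp(buf, "Speex   ", 8) != 0)
        return -1;
    memcpy(h->speex_string, buf, 8);
    memcpy(h->speex_version, buf + 8, 20);
    p = buf + 28;                      /* first integer field */
    h->speex_version_id       = read_le32(p); p += 4;
    h->header_size            = read_le32(p); p += 4;
    h->rate                   = read_le32(p); p += 4;
    h->mode                   = read_le32(p); p += 4;
    h->mode_bitstream_version = read_le32(p); p += 4;
    h->nb_channels            = read_le32(p); p += 4;
    h->bitrate                = read_le32(p); p += 4;
    h->frame_size             = read_le32(p); p += 4;
    h->vbr                    = read_le32(p); p += 4;
    h->frames_per_packet      = read_le32(p); p += 4;
    h->extra_headers          = read_le32(p); p += 4;
    h->reserved1              = read_le32(p); p += 4;
    h->reserved2              = read_le32(p);
    return 0;
}
```

Reading the integers byte by byte, rather than casting the buffer to a struct, avoids both alignment and byte-order problems on big-endian hosts.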

  1. Introduction to CELP Coding

Do not meddle in the affairs of poles, for they are subtle and quick to leave the unit circle.

Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section 9. The CELP technique is based on three ideas:

  1. The use of a linear prediction (LP) model to model the vocal tract
  2. The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
  3. The search performed in closed-loop in a "perceptually weighted domain"

This section describes the basic ideas behind CELP. This is still a work in progress.

  1. Source-Filter Model of Speech Prediction

The source-filter model of speech production assumes that the vocal cords are the source of spectrally flat sound (the excitation signal), and that the vocal tract acts as a filter to spectrally shape the various sounds of speech. While still an approximation, the model is widely used in speech coding because of its simplicity. Its use is also the reason why most speech codecs (Speex included) perform badly on music signals. The different phonemes can be distinguished by their excitation (source) and spectral shape (filter). Voiced sounds (e.g. vowels) have an excitation signal that is periodic and that can be approximated by an impulse train in the time domain or by regularly-spaced harmonics in the frequency domain. On the other hand, fricatives (such as the "s", "sh" and "f" sounds) have an excitation signal that is similar to white Gaussian noise. So-called voiced fricatives (such as "z" and "v") have an excitation signal composed of a harmonic part and a noisy part.

The source-filter model is usually tied with the use of linear prediction. The CELP model is based on the source-filter model, as can be seen from the CELP decoder illustrated in Figure 8.1.

  1. Linear Prediction (LPC)

Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal x[n] using a linear combination of its past samples:

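$$y[n] = \sum_{i=1}^{N} a_i x[n-i]$$

where y[n] is the linear prediction of x[n]. The prediction error is thus given by:

$$e[n] = x[n] - y[n] = x[n] - \sum_{i=1}^{N} a_i x[n-i]$$

The goal of the LPC analysis is to find the best prediction coefficients $a_i$ which minimize the quadratic error function (summed over the L samples of the analysis window):

$$E = \sum_{n=0}^{L-1} \left[e[n]\right]^2 = \sum_{n=0}^{L-1} \left[x[n] - \sum_{i=1}^{N} a_i x[n-i]\right]^2$$

That can be done by making all derivatives $\partial E / \partial a_i$ equal to zero:

$$\frac{\partial E}{\partial a_i} = \frac{\partial}{\partial a_i} \sum_{n=0}^{L-1} \left[x[n] - \sum_{i=1}^{N} a_i x[n-i]\right]^2 = 0$$

Figure 8.1: The CELP model of speech synthesis (decoder)

For an order N filter, the filter coefficients $a_i$ are found by solving the N × N linear system $\mathbf{R}\mathbf{a} = \mathbf{r}$, where

$$\mathbf{R} = \begin{bmatrix} R(0) & R(1) & \cdots & R(N-1) \\ R(1) & R(0) & \cdots & R(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(N-1) & R(N-2) & \cdots & R(0) \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(N) \end{bmatrix}$$

with R(m), the auto-correlation of the signal x[n], computed as:

$$R(m) = \sum_{i=0}^{N-1} x[i]\,x[i-m]$$

Because $\mathbf{R}$ is Hermitian Toeplitz, the Levinson-Durbin algorithm can be used, making the solution to the problem $O(N^2)$ instead of $O(N^3)$. Also, it can be proven that all the roots of the prediction error filter $A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i}$ are within the unit circle, which means that 1/A(z) is always stable. This is in theory; in practice, because of finite precision, there are two commonly used techniques to make sure we have a stable filter. First, we multiply R(0) by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Also, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain, reducing sharp resonances.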
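To make the Levinson-Durbin step concrete, here is a minimal floating-point sketch of the recursion that solves Ra = r. It is an illustration written for this text, not the routine used inside Speex; the function name levinson_durbin, the MAX_LPC_ORDER limit and the 1-based coefficient indexing (a[0] is unused) are assumptions made for the example.

```c
#include <assert.h>

#define MAX_LPC_ORDER 16   /* illustrative upper bound on the filter order */

/* Solve the Toeplitz system R a = r by Levinson-Durbin recursion.
   ac[0..order] holds the auto-correlation values R(0)..R(order).
   On return a[1..order] holds the predictor coefficients, so that
   y[n] = sum_{i=1..order} a[i] * x[n-i]. a[0] is not used.
   Returns the residual prediction error energy. */
static double levinson_durbin(const double *ac, double *a, int order)
{
    double err = ac[0];
    double tmp[MAX_LPC_ORDER + 1];
    int i, j;

    assert(order >= 1 && order <= MAX_LPC_ORDER);
    assert(ac[0] > 0.0);
    for (i = 1; i <= order; i++) {
        double acc = ac[i];
        double k;
        for (j = 1; j < i; j++)
            acc -= a[j] * ac[i - j];
        k = acc / err;                 /* reflection coefficient */
        for (j = 1; j < i; j++)        /* update lower-order coefficients */
            tmp[j] = a[j] - k * a[i - j];
        for (j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= 1.0 - k * k;            /* error shrinks at every order */
    }
    return err;
}
```

Each order reuses the solution of the previous one, which is where the O(N²) cost comes from. Multiplying ac[0] by 1.0001 before the call would correspond to the first stabilisation technique mentioned above.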

  1. Pitch Prediction

During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal e[n] by a gain times the past of the excitation:

$$e[n] \approx p[n] = \beta e[n-T]$$

where T is the pitch period and β is the pitch gain. We call that long-term prediction since the excitation is predicted from e[n-T] with T ≫ N.

Figure 8.2: Standard noise shaping in CELP. Arbitrary y-axis offset.

  1. Innovation Codebook

The final excitation e[n] will be the sum of the pitch prediction and an innovation signal c[n] taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by

$$e[n] = p[n] + c[n] = \beta e[n-T] + c[n]$$

The quantization of c[n] is where most of the bits in a CELP codec are allocated. It represents the information that couldn't be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal X(z) as

$$X(z) = \frac{C(z)}{A(z)\left(1 - \beta z^{-T}\right)}$$
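The z-domain expression corresponds to two nested recursions in the time domain: the long-term (pitch) predictor rebuilds the excitation, and the short-term (LP) synthesis filter shapes it. The sketch below illustrates this with a single-tap pitch predictor and assumes the convention A(z) = 1 − Σ a_i z^{-i}. The function name celp_synthesize and the zero initial history are assumptions for the example; a real decoder keeps filter memory across frames.

```c
#include <assert.h>

/* Minimal CELP synthesis sketch.
   c[0..len-1]  innovation (fixed codebook) signal
   a[1..order]  LP coefficients (a[0] unused)
   beta, T      pitch gain and pitch period in samples
   e, x         outputs: excitation and synthesized signal
   Samples before n = 0 are taken as zero (assumption: no history). */
static void celp_synthesize(const double *c, int len,
                            const double *a, int order,
                            double beta, int T,
                            double *e, double *x)
{
    int n, i;
    assert(T >= 1 && order >= 0 && len >= 0);
    for (n = 0; n < len; n++) {
        /* e[n] = beta * e[n-T] + c[n] */
        e[n] = c[n] + (n >= T ? beta * e[n - T] : 0.0);
        /* x[n] = e[n] + sum_i a[i] * x[n-i], i.e. filtering by 1/A(z) */
        x[n] = e[n];
        for (i = 1; i <= order && i <= n; i++)
            x[n] += a[i] * x[n - i];
    }
}
```

Feeding a single impulse as c[n] shows both effects: the pitch predictor repeats the impulse every T samples with gain β, and the synthesis filter adds a decaying tail after each repetition.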

  1. Noise Weighting

Most (if not all) modern audio codecs attempt to "shape" the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. In order to maximize speech quality, CELP codecs minimize the mean square of the error (noise) in the perceptually weighted domain. This means that a perceptual noise weighting filter W(z) is applied to the error signal in the encoder. In most CELP codecs, W(z) is a pole-zero weighting filter derived from the linear prediction coefficients (LPC), generally using bandwidth expansion. With the spectral envelope represented by the synthesis filter 1/A(z), CELP codecs typically derive the noise weighting filter as

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (8.1)$$

where $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$ in the Speex reference implementation. If a filter A(z) has (complex) poles at $p_i$ in the z-plane, the filter A(z/γ) will have its poles at $p'_i = \gamma p_i$, making it a flatter version of A(z).

The weighting filter is applied to the error signal used to optimize the codebook search through analysis-by-synthesis (AbS). This results in a spectral shape of the noise that tends towards 1/W(z). While the simplicity of the model has been an important reason for the success of CELP, it remains that W(z) is a very rough approximation for the perceptually optimal noise weighting function. Fig. 8.2 illustrates the noise shaping that results from Eq. 8.1. Throughout this paper, we refer to W(z) as the noise weighting filter and to 1/W(z) as the noise shaping filter (or curve).
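In terms of coefficients, replacing z by z/γ simply scales the i-th coefficient of A(z) by γ^i, so W(z) can be built directly from the LPC coefficients. The following sketch illustrates Eq. 8.1 under the convention A(z) = 1 − Σ a_i z^{-i}. The names bandwidth_expand and noise_weight, the MAX_ORDER limit and the zero initial filter memory are assumptions made for this example and do not describe the Speex source code.

```c
#include <assert.h>

#define MAX_ORDER 16   /* illustrative upper bound on the LPC order */

/* Coefficients of A(z/gamma) given A(z) = 1 - sum_{i=1..order} a[i] z^-i.
   Substituting z/gamma for z multiplies coefficient i by gamma^i. */
static void bandwidth_expand(const double *a, double *out, int order,
                             double gamma)
{
    double g = 1.0;
    int i;
    assert(order >= 0 && order <= MAX_ORDER);
    out[0] = a[0];                 /* index 0 is unused padding */
    for (i = 1; i <= order; i++) {
        g *= gamma;
        out[i] = a[i] * g;
    }
}

/* Apply W(z) = A(z/g1) / A(z/g2) to x, writing the result to y.
   x and y must not overlap; samples before n = 0 are treated as zero. */
static void noise_weight(const double *x, double *y, int len,
                         const double *a, int order, double g1, double g2)
{
    double num[MAX_ORDER + 1], den[MAX_ORDER + 1];
    int n, i;
    bandwidth_expand(a, num, order, g1);
    bandwidth_expand(a, den, order, g2);
    for (n = 0; n < len; n++) {
        y[n] = x[n];
        for (i = 1; i <= order && i <= n; i++)
            y[n] += den[i] * y[n - i] - num[i] * x[n - i];
    }
}
```

With g1 = 0.9 and g2 = 0.6 this matches the constants quoted above; setting g1 equal to g2 makes W(z) = 1, so no noise shaping takes place.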

  1. Analysis-by-Synthesis

One of the main principles behind CELP is called Analysis-by-Synthesis (AbS), meaning that the encoding (analysis) is performed by perceptually optimising the decoded (synthesis) signal in a closed loop. In theory, the best CELP stream would be produced by trying all possible bit combinations and selecting the one that produces the best-sounding decoded signal. This is obviously not possible in practice for two reasons: the required complexity is beyond any currently available hardware and the "best sounding" selection criterion implies a human listener.

In order to achieve real-time encoding using limited computing resources, the CELP optimisation is broken down into smaller, more manageable, sequential searches using the perceptual weighting function described earlier.

  1. Speex narrowband mode

This section looks at how Speex works for narrowband (8kHz sampling rate) operation. The frame size for this mode is 20ms, corresponding to 160 samples. Each frame is also subdivided into 4 sub-frames of 40 samples each. Also many design decisions were based on the original goals and assumptions:

  • Minimizing the amount of information extracted from past frames (for robustness to packet loss)
  • Dynamically-selectable codebooks (LSP, pitch and innovation)
  • sub-vector fixed (innovation) codebooks
  1. Whole-Frame Analysis

In narrowband, Speex frames are 20 ms long (160 samples) and are subdivided in 4 sub-frames of 5 ms each (40 samples). For most narrowband bit-rates (8 kbps and above), the only parameters encoded at the frame level are the Line Spectral Pairs (LSP) and a global excitation gain gframe, as shown in Fig. 9.1. All other parameters are encoded at the sub-frame level.

Linear prediction analysis is performed once per frame using an asymmetric Hamming window centered on the fourth sub-frame. Because linear prediction coefficients (LPC) are not robust to quantization, they are first converted to line spectral pairs (LSP). The LSPs are considered to be associated with the 4th sub-frame, and the LSPs associated with the first 3 sub-frames are linearly interpolated using the current and previous LSP coefficients. The LSP coefficients are then converted back to the LPC filter Â(z). The non-quantized interpolated filter is denoted A(z) and can be used for the weighting filter W(z) because it does not need to be available to the decoder.

To make Speex more robust to packet loss, no prediction is applied on the LSP coefficients prior to quantization. The LSPs are encoded using vector quantization (VQ) with 30 bits for higher quality modes and 18 bits for lower quality.

  1. Sub-Frame Analysis-by-Synthesis

The analysis-by-synthesis (AbS) encoder loop is described in Fig. 9.2. There are three main aspects where Speex significantly differs from most other CELP codecs. First, while most recent CELP codecs make use of fractional pitch estimation with a single gain, Speex uses an integer to encode the pitch period, but uses a 3-tap predictor (3 gains). The adaptive codebook contribution ea[n] can thus be expressed as:

$$e_a[n] = g_0 e[n-T-1] + g_1 e[n-T] + g_2 e[n-T+1] \qquad (9.1)$$

where $g_0$, $g_1$ and $g_2$ are the jointly quantized pitch gains and e[n] is the codec excitation memory. It is worth noting that when the pitch is smaller than the sub-frame size, we repeat the excitation at a period T. For example, when n-T+1 ≥ 0, we use n-2T+1 instead. In most modes, the pitch period is encoded with 7 bits in the [17,144] range and the pitch gains $g_0$, $g_1$ and $g_2$ are vector-quantized using 7 bits at higher bit-rates (15 kbps narrowband and above) and 5 bits at lower bit-rates (11 kbps narrowband and below).
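A small sketch may help visualise Eq. 9.1 and the period repetition. It is written for this text and is not the Speex implementation: the names past_exc and pitch_predict_3tap are hypothetical, and stepping back by whole periods until the index falls in the past is an assumption that generalises the "use n-2T+1" example above.

```c
#include <assert.h>

#define SUBFRAME_SIZE 40

/* exc points at the first sample of the current sub-frame, so exc[-1],
   exc[-2], ... are past excitation samples. The caller must provide at
   least T + 1 samples of history before exc. */
static double past_exc(const double *exc, int idx, int T)
{
    while (idx >= 0)      /* sample not available yet: go back one period */
        idx -= T;
    return exc[idx];
}

/* Adaptive codebook contribution of Eq. 9.1 for one sub-frame:
   ea[n] = g[0]*e[n-T-1] + g[1]*e[n-T] + g[2]*e[n-T+1]. */
static void pitch_predict_3tap(const double *exc, int T, const double g[3],
                               double *ea)
{
    int n;
    assert(T >= 17 && T <= 144);   /* pitch range quoted in the text */
    for (n = 0; n < SUBFRAME_SIZE; n++)
        ea[n] = g[0] * past_exc(exc, n - T - 1, T)
              + g[1] * past_exc(exc, n - T, T)
              + g[2] * past_exc(exc, n - T + 1, T);
}
```

Using three gains around an integer lag lets the predictor approximate a fractional pitch period without encoding a fractional lag.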

Figure 9.1: Frame open-loop analysis

Figure 9.2: Analysis-by-synthesis closed-loop optimization on a sub-frame.

Many current CELP codecs use moving average (MA) prediction to encode the fixed codebook gain. This provides slightly better coding at the expense of introducing a dependency on previously encoded frames. A second difference is that Speex encodes the fixed codebook gain as the product of the global excitation gain $g_{frame}$ with a sub-frame gain correction $g_{subf}$. This increases robustness to packet loss by eliminating the inter-frame dependency. The sub-frame gain correction is encoded before the fixed codebook is searched (not closed-loop optimized) and uses between 0 and 3 bits per sub-frame, depending on the bit-rate.

The third difference is that Speex uses sub-vector quantization of the innovation (fixed codebook) signal instead of an algebraic codebook. Each sub-frame is divided into sub-vectors of lengths ranging between 5 and 20 samples. Each subvector is chosen from a bitrate-dependent codebook and all sub-vectors are concatenated to form a sub-frame. As an example, the 3.95 kbps mode uses a sub-vector size of 20 samples with 32 entries in the codebook (5 bits). This means that the innovation is encoded with 10 bits per sub-frame, or 2000 bps. On the other hand, the 18.2 kbps mode uses a sub-vector size of 5 samples with 256 entries in the codebook (8 bits), so the innovation uses 64 bits per sub-frame, or 12800 bps.
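The two bit-rate figures above follow from simple arithmetic: a 40-sample sub-frame is split into 40 / size sub-vectors, each costing one codebook index, and there are 4 sub-frames in each of the 50 frames per second. The helper functions below are hypothetical names used only to spell out that calculation.

```c
#include <assert.h>

#define NB_SUBFRAME_SIZE        40   /* samples per narrowband sub-frame */
#define NB_SUBFRAMES_PER_SECOND 200  /* 4 sub-frames x 50 frames per second */

/* Bits spent on the innovation in one sub-frame. */
static int innovation_bits_per_subframe(int subvector_size, int codebook_bits)
{
    assert(subvector_size > 0 && NB_SUBFRAME_SIZE % subvector_size == 0);
    return (NB_SUBFRAME_SIZE / subvector_size) * codebook_bits;
}

/* Resulting bit-rate (in bps) devoted to the innovation. */
static int innovation_bps(int subvector_size, int codebook_bits)
{
    return innovation_bits_per_subframe(subvector_size, codebook_bits)
           * NB_SUBFRAMES_PER_SECOND;
}
```

With sub-vectors of 20 samples and 5-bit indices this gives 10 bits per sub-frame and 2000 bps; with sub-vectors of 5 samples and 8-bit indices it gives 64 bits per sub-frame and 12800 bps.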

  1. Bit allocation

Table 9.1 defines 9 narrowband modes (IDs 0 to 8), with bit-rates ranging from 250 bps to 24.6 kbps, although the modes below 5.9 kbps should not be used for speech. The bit-allocation for each mode is detailed in table 9.1. Each frame starts with the mode ID encoded with 4 bits, which allows a range from 0 to 15, though only values 0 to 8 select a narrowband bit-rate (table 9.2 lists the others as reserved or special codes). The parameters are listed in the table in the order they are packed in the bit-stream. All frame-based parameters are packed before sub-frame parameters. The parameters for a certain sub-frame are all packed before the following sub-frame is packed. Note that the "OL" in the parameter description means that the parameter is an open loop estimation based on the whole frame.

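| Parameter | Update rate | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|
| Wideband bit | frame | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Mode ID | frame | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| LSP | frame | 0 | 18 | 18 | 18 | 18 | 30 | 30 | 30 | 18 |
| OL pitch | frame | 0 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 7 |
| OL pitch gain | frame | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| OL Exc gain | frame | 0 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Fine pitch | sub-frame | 0 | 0 | 0 | 7 | 7 | 7 | 7 | 7 | 0 |
| Pitch gain | sub-frame | 0 | 0 | 5 | 5 | 5 | 7 | 7 | 7 | 0 |
| Innovation gain | sub-frame | 0 | 1 | 0 | 1 | 1 | 3 | 3 | 3 | 0 |
| Innovation VQ | sub-frame | 0 | 0 | 16 | 20 | 35 | 48 | 64 | 96 | 10 |
| Total | frame | 5 | 43 | 119 | 160 | 220 | 300 | 364 | 492 | 79 |

Table 9.1: Bit allocation for narrowband modes (bits per parameter, one column per mode ID)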
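The "Total" row of table 9.1 can be recomputed from the other rows: frame-level parameters are counted once, and sub-frame parameters are counted four times (4 sub-frames per frame). Multiplying the total by 50 frames per second gives the bit-rate. The sketch below encodes the table for that check; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Per-mode bit counts copied from table 9.1 (illustrative layout). */
typedef struct {
    int lsp, ol_pitch, ol_pitch_gain, ol_exc_gain;      /* once per frame */
    int fine_pitch, pitch_gain, innov_gain, innov_vq;   /* per sub-frame  */
} NbModeBits;

static const NbModeBits nb_modes[9] = {
    { 0, 0, 0, 0,  0, 0, 0,  0},   /* mode 0 */
    {18, 7, 4, 5,  0, 0, 1,  0},   /* mode 1 */
    {18, 7, 0, 5,  0, 5, 0, 16},   /* mode 2 */
    {18, 0, 0, 5,  7, 5, 1, 20},   /* mode 3 */
    {18, 0, 0, 5,  7, 5, 1, 35},   /* mode 4 */
    {30, 0, 0, 5,  7, 7, 3, 48},   /* mode 5 */
    {30, 0, 0, 5,  7, 7, 3, 64},   /* mode 6 */
    {30, 0, 0, 5,  7, 7, 3, 96},   /* mode 7 */
    {18, 7, 4, 5,  0, 0, 0, 10},   /* mode 8 */
};

/* Total bits in one 20 ms narrowband frame for the given mode ID. */
static int nb_frame_bits(int mode)
{
    const NbModeBits *m;
    assert(mode >= 0 && mode <= 8);
    m = &nb_modes[mode];
    return 1 + 4                                   /* wideband bit + mode ID */
         + m->lsp + m->ol_pitch + m->ol_pitch_gain + m->ol_exc_gain
         + 4 * (m->fine_pitch + m->pitch_gain + m->innov_gain + m->innov_vq);
}

/* Bit-rate in bps: 50 frames of 20 ms per second. */
static int nb_bitrate_bps(int mode)
{
    return nb_frame_bits(mode) * 50;
}
```

For instance, mode 5 gives 40 frame-level bits plus 4 × 65 sub-frame bits, that is 300 bits per frame or 15000 bps.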

So far, no MOS (Mean Opinion Score) subjective evaluation has been performed for Speex. In order to give an idea of the quality achievable with it, table 9.2 presents my own subjective opinion on it. It should be noted that different people will perceive the quality differently and that the person that designed the codec often has a bias (one way or another) when it comes to subjective evaluation. Finally, it should be noted that for most codecs (including Speex) encoding quality sometimes varies depending on the input. Note that the complexity is only approximate (within 0.5 mflops and using the lowest complexity setting). Decoding requires approximately 0.5 mflops in most modes (1 mflops with perceptual enhancement).

  1. Perceptual enhancement

This section was only valid for version 1.1.12 and earlier. It does not apply to version 1.2-beta1 (and later), for which the new perceptual enhancement is not yet documented.

This part of the codec only applies to the decoder and can even be changed without affecting inter-operability. For that reason, the implementation provided and described here should only be considered as a reference implementation. The enhancement system is divided into two parts. First, the synthesis filter S(z) = 1/A(z) is replaced by an enhanced filter:

$$S'(z) = \frac{A(z/a_2)\,A(z/a_3)}{A(z)\,A(z/a_1)}$$

where $a_1$ and $a_2$ depend on the mode in use and $a_3 = \frac{1}{r}\left(1 - \frac{1 - r a_1}{1 - r a_2}\right)$ with r = 0.9. The second part of the enhancement consists of using a comb filter to enhance the pitch in the excitation domain.

| Mode | Quality | Bit-rate (bps) | mflops | Quality/description |
|---|---|---|---|---|
| 0 | - | 250 | 0 | No transmission (DTX) |
| 1 | 0 | 2,150 | 6 | Vocoder (mostly for comfort noise) |
| 2 | 2 | 5,950 | 9 | Very noticeable artifacts/noise, good intelligibility |
| 3 | 3-4 | 8,000 | 10 | Artifacts/noise sometimes noticeable |
| 4 | 5-6 | 11,000 | 14 | Artifacts usually noticeable only with headphones |
| 5 | 7-8 | 15,000 | 11 | Need good headphones to tell the difference |
| 6 | 9 | 18,200 | 17.5 | Hard to tell the difference even with good headphones |
| 7 | 10 | 24,600 | 14.5 | Completely transparent for voice, good quality music |
| 8 | 1 | 3,950 | 10.5 | Very noticeable artifacts/noise, good intelligibility |
| 9 | - | - | - | reserved |
| 10 | - | - | - | reserved |
| 11 | - | - | - | reserved |
| 12 | - | - | - | reserved |
| 13 | - | - | - | Application-defined, interpreted by callback or skipped |
| 14 | - | - | - | Speex in-band signaling |
| 15 | - | - | - | Terminator code |

Table 9.2: Quality versus bit-rate
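Table 9.2 relates the encoder quality setting (0 to 10) to the narrowband mode ID and its bit-rate. The lookup below simply transcribes the "Mode", "Quality" and "Bit-rate" columns of that table; the array and function names are hypothetical, and the sketch says nothing about how the library stores this mapping internally.

```c
#include <assert.h>

/* Mode ID selected by each quality setting 0..10, read off table 9.2. */
static const int nb_quality_to_mode[11] = {1, 8, 2, 3, 3, 4, 4, 5, 5, 6, 7};

/* Bit-rate in bps of narrowband modes 0..8, read off table 9.2. */
static const int nb_mode_bps[9] = {
    250, 2150, 5950, 8000, 11000, 15000, 18200, 24600, 3950
};

/* Narrowband bit-rate (bps) obtained for a given quality setting. */
static int nb_bps_for_quality(int quality)
{
    assert(quality >= 0 && quality <= 10);
    return nb_mode_bps[nb_quality_to_mode[quality]];
}
```

Quality 8, for example, maps to mode 5 and therefore to 15,000 bps.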

  1. Speex wideband mode (sub-band CELP)

For wideband, the Speex approach uses a quadrature mirror filter (QMF) to split the band in two. The 16 kHz signal is thus divided into two 8 kHz signals, one representing the low band (0-4 kHz), the other the high band (4-8 kHz). The low band is encoded with the narrowband mode described in section 9 in such a way that the resulting "embedded narrowband bit-stream" can also be decoded with the narrowband decoder. Since the low band encoding has already been described, only the high band encoding is described in this section.

  1. Linear Prediction

The linear prediction part used for the high-band is very similar to what is done for narrowband. The only difference is that we use only 12 bits to encode the high-band LSP's using a multi-stage vector quantizer (MSVQ). The first level quantizes the 10 coefficients with 6 bits and the error is then quantized using 6 bits, too.

  1. Pitch Prediction

That part is easy: there's no pitch prediction for the high-band. There are two reasons for that. First, there is usually little harmonic structure in this band (above 4 kHz). Second, it would be very hard to implement since the QMF folds the 4-8 kHz band into 4-0 kHz (reversing the frequency axis), which means that the location of the harmonics is no longer at multiples of the fundamental (pitch).
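The effect of the frequency reversal on harmonics can be checked with a one-line mapping: a component at f Hz in the 4-8 kHz band appears at 8000 − f Hz in the decimated high-band signal. The function name below is hypothetical and the sketch only illustrates the folding described above.

```c
#include <assert.h>

/* Frequency (Hz) at which a 4-8 kHz component appears after the QMF
   folds the high band onto 4-0 kHz (the frequency axis is reversed). */
static int highband_folded_hz(int f_hz)
{
    assert(f_hz >= 4000 && f_hz <= 8000);
    return 8000 - f_hz;
}
```

With a 300 Hz fundamental, the harmonics at 4200, 4500 and 4800 Hz land at 3800, 3500 and 3200 Hz, none of which is a multiple of 300 Hz.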

  1. Excitation Quantization

The high-band excitation is coded in the same way as for narrowband.

  1. Bit allocation

For the wideband mode, the entire narrowband frame is packed before the high-band is encoded. The narrowband part of the bit-stream is as defined in table 9.1. The high-band follows, as described in table 10.1. For wideband, the mode ID is the same as the Speex quality setting and is defined in table 10.2. This also means that a wideband frame may be correctly decoded by a narrowband decoder with the only caveat that if more than one frame is packed in the same packet, the decoder will need to skip the high-band parts in order to sync with the bit-stream.

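| Parameter | Update rate | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| Wideband bit | frame | 1 | 1 | 1 | 1 | 1 |
| Mode ID | frame | 3 | 3 | 3 | 3 | 3 |
| LSP | frame | 0 | 12 | 12 | 12 | 12 |
| Excitation gain | sub-frame | 0 | 5 | 4 | 4 | 4 |
| Excitation VQ | sub-frame | 0 | 0 | 20 | 40 | 80 |
| Total | frame | 4 | 36 | 112 | 192 | 352 |

Table 10.1: Bit allocation for high-band in wideband mode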
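Because a wideband frame is the narrowband frame followed by the high-band bits, its bit-rate is the sum of the two totals times 50 frames per second. The sketch below recomputes the "Total" row of table 10.1 and combines it with a narrowband frame size. The names are hypothetical, and the pairing of a narrowband mode with a high-band mode in the usage note is an assumption picked because the sums match table 10.2.

```c
#include <assert.h>

/* High-band bit counts per mode ID 0..4, copied from table 10.1. */
static const int hb_lsp[5]      = {0, 12, 12, 12, 12};   /* per frame     */
static const int hb_exc_gain[5] = {0,  5,  4,  4,  4};   /* per sub-frame */
static const int hb_exc_vq[5]   = {0,  0, 20, 40, 80};   /* per sub-frame */

/* Total high-band bits in one frame for the given high-band mode ID. */
static int hb_frame_bits(int mode)
{
    assert(mode >= 0 && mode <= 4);
    return 1 + 3 + hb_lsp[mode]                  /* wideband bit + mode ID */
         + 4 * (hb_exc_gain[mode] + hb_exc_vq[mode]);
}

/* Wideband bit-rate (bps) for a narrowband frame of nb_bits bits
   followed by the high-band part, at 50 frames per second. */
static int wideband_bps(int nb_bits, int hb_mode)
{
    assert(nb_bits >= 0);
    return (nb_bits + hb_frame_bits(hb_mode)) * 50;
}
```

For example, a 492-bit narrowband frame plus high-band mode 4 (352 bits) gives 844 bits per frame, or 42,200 bps, and a 364-bit narrowband frame plus high-band mode 3 gives 27,800 bps.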

| Mode/Quality | Bit-rate (bps) | Quality/description |
|---|---|---|
| 0 | 3,950 | Barely intelligible (mostly for comfort noise) |
| 1 | 5,750 | Very noticeable artifacts/noise, poor intelligibility |
| 2 | 7,750 | Very noticeable artifacts/noise, good intelligibility |
| 3 | 9,800 | Artifacts/noise sometimes annoying |
| 4 | 12,800 | Artifacts/noise usually noticeable |
| 5 | 16,800 | Artifacts/noise sometimes noticeable |
| 6 | 20,600 | Need good headphones to tell the difference |
| 7 | 23,800 | Need good headphones to tell the difference |
| 8 | 27,800 | Hard to tell the difference even with good headphones |
| 9 | 34,200 | Hard to tell the difference even with good headphones |
| 10 | 42,200 | Completely transparent for voice, good quality music |

Table 10.2: Quality versus bit-rate for the wideband encoder

  1. A Sample code

This section shows sample code for encoding and decoding speech using the Speex API. The commands can be used to encode and decode a file by calling:

% sampleenc in_file.sw | sampledec out_file.sw

where both files are raw (no header) files encoded at 16 bits per sample (in the machine's natural endianness).

  1. A.1 sampleenc.c

sampleenc takes a raw 16 bits/sample file, encodes it and outputs a Speex stream to stdout. Note that the packing used is not compatible with that of speexenc/speexdec.

Listing A.1: Source code for sampleenc

#include <speex/speex.h>
#include <stdio.h>

/* The frame size is hardcoded for this sample code but it doesn't have to be */
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
    char *inFile;
    FILE *fin;
    short in[FRAME_SIZE];
    float input[FRAME_SIZE];
    char cbits[200];
    int nbBytes;
    /* Holds the state of the encoder */
    void *state;
    /* Holds bits so they can be read and written to by the Speex routines */
    SpeexBits bits;
    int i, tmp;

    if (argc < 2) {
        fprintf(stderr, "usage: %s in_file.sw\n", argv[0]);
        return 1;
    }

    /* Open in binary mode: the input is raw PCM, not text */
    inFile = argv[1];
    fin = fopen(inFile, "rb");
    if (fin == NULL) {
        perror(inFile);
        return 1;
    }

    /* Create a new encoder state in narrowband mode */
    state = speex_encoder_init(&speex_nb_mode);

    /* Set the quality to 8 (15 kbps) */
    tmp = 8;
    speex_encoder_ctl(state, SPEEX_SET_QUALITY, &tmp);

    /* Initialization of the structure that holds the bits */
    speex_bits_init(&bits);
    while (1) {
        /* Read a 16 bits/sample audio frame; stop at end of file,
           on a read error or on an incomplete last frame */
        if (fread(in, sizeof(short), FRAME_SIZE, fin) != FRAME_SIZE)
            break;
        /* Copy the 16 bits values to float so Speex can work on them */
        for (i = 0; i < FRAME_SIZE; i++)
            input[i] = in[i];

        /* Flush all the bits in the struct so we can encode a new frame */
        speex_bits_reset(&bits);

        /* Encode the frame */
        speex_encode(state, input, &bits);
        /* Copy the bits to an array of char that can be written */
        nbBytes = speex_bits_write(&bits, cbits, 200);

        /* Write the size of the frame first. This is what sampledec expects
           but it's likely to be different in your own application */
        fwrite(&nbBytes, sizeof(int), 1, stdout);
        /* Write the compressed data */
        fwrite(cbits, 1, nbBytes, stdout);
    }

    /* Destroy the encoder state */
    speex_encoder_destroy(state);
    /* Destroy the bit-packing struct */
    speex_bits_destroy(&bits);
    fclose(fin);
    return 0;
}
    1. A.2 sampledec.c

sampledec reads a Speex stream from stdin, decodes it and outputs it to a raw 16 bits/sample file. Note that the packing used is not compatible with that of speexenc/speexdec.

Listing A.2: Source code for sampledec

#include <speex/speex.h>
#include <stdio.h>

/* The frame size is hardcoded for this sample code but it doesn't have to be */
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
    char *outFile;
    FILE *fout;
    /* Holds the audio that will be written to file (16 bits per sample) */
    short out[FRAME_SIZE];
    /* Speex handles samples as float, so we need an array of floats */
    float output[FRAME_SIZE];
    char cbits[200];
    int nbBytes;
    /* Holds the state of the decoder */
    void *state;
    /* Holds bits so they can be read and written to by the Speex routines */
    SpeexBits bits;
    int i, tmp;

    if (argc < 2) {
        fprintf(stderr, "usage: %s out_file.sw\n", argv[0]);
        return 1;
    }

    /* Open in binary mode: the output is raw PCM, not text */
    outFile = argv[1];
    fout = fopen(outFile, "wb");
    if (fout == NULL) {
        perror(outFile);
        return 1;
    }

    /* Create a new decoder state in narrowband mode */
    state = speex_decoder_init(&speex_nb_mode);

    /* Set the perceptual enhancement on */
    tmp = 1;
    speex_decoder_ctl(state, SPEEX_SET_ENH, &tmp);

    /* Initialization of the structure that holds the bits */
    speex_bits_init(&bits);
    while (1) {
        /* Read the size encoded by sampleenc, this part will likely be
           different in your application */
        if (fread(&nbBytes, sizeof(int), 1, stdin) != 1)
            break;
        fprintf(stderr, "nbBytes: %d\n", nbBytes);
        /* Reject sizes that would overflow the packet buffer */
        if (nbBytes <= 0 || nbBytes > (int)sizeof(cbits))
            break;

        /* Read the "packet" encoded by sampleenc */
        if (fread(cbits, 1, nbBytes, stdin) != (size_t)nbBytes)
            break;
        /* Copy the data into the bit-stream struct */
        speex_bits_read_from(&bits, cbits, nbBytes);

        /* Decode the data */
        speex_decode(state, &bits, output);

        /* Copy from float to short (16 bits) for output */
        for (i = 0; i < FRAME_SIZE; i++)
            out[i] = (short)output[i];

        /* Write the decoded audio to file */
        fwrite(out, sizeof(short), FRAME_SIZE, fout);
    }

    /* Destroy the decoder state */
    speex_decoder_destroy(state);
    /* Destroy the bit-stream struct */
    speex_bits_destroy(&bits);
    fclose(fout);
    return 0;
}
  1. B Jitter Buffer for Speex

Listing B.1: Example of using the jitter buffer for Speex packets

#include <speex/speex_jitter.h>
#include "speex_jitter_buffer.h"

#ifndef NULL
#define NULL 0
#endif

void speex_jitter_init(SpeexJitter *jitter, void *decoder, int sampling_rate)
{
    jitter->dec = decoder;
    speex_decoder_ctl(decoder, SPEEX_GET_FRAME_SIZE, &jitter->frame_size);

    jitter->packets = jitter_buffer_init(jitter->frame_size);

    speex_bits_init(&jitter->current_packet);
    jitter->valid_bits = 0;
}

void speex_jitter_destroy(SpeexJitter *jitter)
{
    jitter_buffer_destroy(jitter->packets);
    speex_bits_destroy(&jitter->current_packet);
}

void speex_jitter_put(SpeexJitter *jitter, char *packet, int len, int timestamp)
{
    JitterBufferPacket p;
    p.data = packet;
    p.len = len;
    p.timestamp = timestamp;
    p.span = jitter->frame_size;
    jitter_buffer_put(jitter->packets, &p);
}

void speex_jitter_get(SpeexJitter *jitter, spx_int16_t *out, int *current_timestamp)
{
    int i;
    int ret;
    spx_int32_t activity;
    char data[2048];
    JitterBufferPacket packet;
    packet.data = data;
    /* Tell the jitter buffer how much room the destination buffer has */
    packet.len = sizeof(data);

    if (jitter->valid_bits) {
        /* Try decoding last received packet */
        ret = speex_decode_int(jitter->dec, &jitter->current_packet, out);
        if (ret == 0) {
            jitter_buffer_tick(jitter->packets);
            return;
        } else {
            jitter->valid_bits = 0;
        }
    }

    ret = jitter_buffer_get(jitter->packets, &packet, jitter->frame_size, NULL);

    if (ret != JITTER_BUFFER_OK) {
        /* No packet found: it is late or lost, so let the decoder conceal it */
        /*fprintf (stderr, "lost/late frame\n");*/
        speex_decode_int(jitter->dec, NULL, out);
    } else {
        speex_bits_read_from(&jitter->current_packet, packet.data, packet.len);
        /* Decode packet */
        ret = speex_decode_int(jitter->dec, &jitter->current_packet, out);
        if (ret == 0) {
            jitter->valid_bits = 1;
        } else {
            /* Error while decoding: output silence */
            for (i = 0; i < jitter->frame_size; i++)
                out[i] = 0;
        }
    }
    speex_decoder_ctl(jitter->dec, SPEEX_GET_ACTIVITY, &activity);
    if (activity < 30)
        jitter_buffer_update_delay(jitter->packets, &packet, NULL);
    jitter_buffer_tick(jitter->packets);
}

int speex_jitter_get_pointer_timestamp(SpeexJitter *jitter)
{
    return jitter_buffer_get_pointer_timestamp(jitter->packets);
}
  1. C IETF RTP Profile

AVT                                                           G. Herlein
Internet-Draft
Intended status: Standards Track                                J. Valin
Expires: October 24, 2007                       University of Sherbrooke
                                                            A. Heggestad
                                                          April 22, 2007

RTP Payload Format for the Speex Codec

draft-ietf-avt-rtp-speex-01 (non-final)

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on October 24, 2007.

Copyright Notice

Copyright (C) The Internet Society (2007).

 


April 2007

Abstract

Speex is an open-source voice codec suitable for use in Voice over IP (VoIP) type applications. This document describes the payload format for Speex generated bit streams within an RTP packet. Also included here are the necessary details for the use of Speex with the Session Description Protocol (SDP).


Editors Note

All references to RFC XXXX are to be replaced by references to the RFC number of this memo, when published.

Table of Contents

  1. Introduction
  2. Terminology
  3. RTP usage for Speex
    3.1. RTP Speex Header Fields
    3.2. RTP payload format for Speex
    3.3. Speex payload
    3.4. Example Speex packet
    3.5. Multiple Speex frames in a RTP packet
  4. IANA Considerations
    4.1. Media Type Registration
      4.1.1. Registration of media type audio/speex
  5. SDP usage of Speex
  6. Security Considerations
  7. Acknowledgements
  8. References
    8.1. Normative References
    8.2. Informative References

Authors' Addresses

Intellectual Property and Copyright Statements

  1. Introduction

Speex is based on the CELP [CELP] encoding technique with support for either narrowband (nominal 8kHz), wideband (nominal 16kHz) or ultrawideband (nominal 32kHz). The main characteristics can be summarized as follows:

  • Free software/open-source
  • Integration of wideband and narrowband in the same bit-stream
  • Wide range of bit-rates available
  • Dynamic bit-rate switching and variable bit-rate (VBR)
  • Voice Activity Detection (VAD, integrated with VBR)
  • Variable complexity

To be compliant with this specification, implementations MUST support the 8 kHz sampling rate (narrowband) and SHOULD support the 8 kbps bitrate.

The sampling rate MUST be 8, 16 or 32 kHz.


  2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 [RFC2119] and indicate requirement levels for compliant RTP implementations.


  3. RTP usage for Speex

3.1. RTP Speex Header Fields

The RTP header is defined in the RTP specification [RFC3550]. This section defines how fields in the RTP header are used.

Payload Type (PT): The assignment of an RTP payload type for this packet format is outside the scope of this document; it is specified by the RTP profile under which this payload format is used, or signaled dynamically out-of-band (e.g., using SDP).

Marker (M) bit: The M bit is set to one to indicate that the RTP packet payload contains at least one complete frame.

Extension (X) bit: Defined by the RTP profile used.

Timestamp: A 32-bit word that corresponds to the sampling instant for the first frame in the RTP packet.
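As a rough illustration of the Timestamp field, the sketch below computes the timestamp of the next packet. It assumes that the RTP clock runs at the Speex sampling rate and that each frame covers 20 msec; the function name `next_timestamp` is hypothetical and not part of any Speex or RTP library.

```python
FRAME_MS = 20  # Speex frame duration in milliseconds


def next_timestamp(current: int, sample_rate: int, frames_in_packet: int = 1) -> int:
    """Return the RTP timestamp of the packet that follows `current`.

    Assumption: the RTP clock rate equals the sampling rate, so one
    20 msec frame advances the timestamp by sample_rate * 20 / 1000 ticks.
    """
    if sample_rate not in (8000, 16000, 32000):
        raise ValueError("sampling rate must be 8000, 16000 or 32000 Hz")
    ticks_per_frame = sample_rate * FRAME_MS // 1000
    # The timestamp is a 32-bit word, so it wraps around modulo 2**32.
    return (current + ticks_per_frame * frames_in_packet) % (1 << 32)
```

Under these assumptions a narrowband stream with one frame per packet advances by 160 ticks per packet.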

3.2. RTP payload format for Speex

The RTP payload for Speex has the format shown in Figure 1. No additional header fields specific to this payload format are required. For RTP based transportation of Speex encoded audio the standard RTP header [RFC3550] is followed by one or more payload data blocks. An optional padding terminator may also be used.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|               one or more frames of Speex ....                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       one or more frames of Speex ....        |    padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1: RTP payload for Speex

3.3. Speex payload

For the purposes of packetizing the bit stream in RTP, it is only necessary to consider the sequence of bits as output by the Speex encoder [speexenc], and present the same sequence to the decoder.

The payload format described here maintains this sequence.

 

A typical Speex frame, encoded at the maximum bitrate, is approx. 110 octets, and the total size of the Speex frames in a packet SHOULD be kept less than the path MTU to prevent fragmentation. Speex frames MUST NOT be fragmented across multiple RTP packets.

An RTP packet MAY contain Speex frames of the same bit rate or of varying bit rates, since the bit-rate for a frame is conveyed in band with the signal.

The encoding and decoding algorithm can change the bit rate at any 20 msec frame boundary, with the bit rate change notification provided in-band with the bit stream. Each frame contains both "mode" (narrowband, wideband or ultra-wideband) and "sub-mode" (bit-rate) information in the bit stream. No out-of-band notification is required for the decoder to process changes in the bit rate sent by the encoder.

Sampling rate values of 8000, 16000 or 32000 Hz MUST be used. Any other sampling rates MUST NOT be used.

The RTP payload MUST be padded to provide an integer number of octets as the payload length. These padding bits are LSB aligned in network octet order and consist of a 0 followed by all ones (until the end of the octet). This padding is only required for the last frame in the packet, and only to ensure the packet contents ends on an octet boundary.
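The padding rule can be sketched as follows. This is a minimal illustration, assuming the encoded frames are available as a string of '0'/'1' characters in transmission order; `pad_speex_payload` is a hypothetical helper, not a function of the Speex library.

```python
def pad_speex_payload(bits: str) -> bytes:
    """Pad a bit string to a whole number of octets and return the octets.

    The padding is a single 0 followed by all ones up to the end of the
    last octet. No padding is added when the length is already a
    multiple of 8.
    """
    if any(ch not in "01" for ch in bits):
        raise ValueError("bits must contain only '0' and '1'")
    pad_len = (-len(bits)) % 8
    if pad_len:
        bits += "0" + "1" * (pad_len - 1)
    if not bits:
        return b""
    return int(bits, 2).to_bytes(len(bits) // 8, "big")
```

For a payload that ends 5 bits short of an octet boundary, the helper appends the bits 0 1 1 1 1, which is the pattern shown in the example packet.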

3.4. Example Speex packet

In the example below we have a single Speex frame with 5 bits of padding to ensure the packet size falls on an octet boundary.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                        ..speex data..                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   ..speex data..                    |0 1 1 1 1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.5. Multiple Speex frames in a RTP packet

Below is an example of two Speex frames contained within one RTP packet. The Speex frame length in this example falls on an octet boundary so there is no padding.

Speex codecs [speexenc] are able to detect the bitrate from the payload and are responsible for detecting the 20 msec boundaries between each frame.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                       ..speex frame 1..                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       ..speex frame 1..       |       ..speex frame 2..       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       ..speex frame 2..                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


  4. IANA Considerations

This document defines the Speex media type.

4.1. Media Type Registration

This section describes the media types and names associated with this payload format. The section registers the media types, as per RFC4288 [RFC4288].

4.1.1. Registration of media type audio/speex

Media type name: audio

Media subtype name: speex

Required parameters: None

Optional parameters:

ptime: see RFC 4566. SHOULD be a multiple of 20 msec.

maxptime: see RFC 4566. SHOULD be a multiple of 20 msec.

Encoding considerations: This media type is framed and binary, see section 4.8 in [RFC4288].

Security considerations: See Section 6

Interoperability considerations: None.

Published specification: RFC XXXX [This RFC].

Applications which use this media type: Audio streaming and conferencing applications.

Additional information: none

Person and email address to contact for further information: Alfred E. Heggestad: aeh@db.org

Intended usage: COMMON

Restrictions on usage: This media type depends on RTP framing, and hence is only defined for transfer via RTP [RFC3550]. Transport within other framing protocols is not defined at this time.

Author: Alfred E. Heggestad

Change controller: IETF Audio/Video Transport working group delegated from the IESG.


5. SDP usage of Speex

When conveying information by SDP [RFC4566], the encoding name MUST be set to "speex". An example of the media representation in SDP for offering a single channel of Speex at 8000 samples per second might be:

m=audio 8088 RTP/AVP 97
a=rtpmap:97 speex/8000

Note that the RTP payload type code of 97 is defined in this media definition to be 'mapped' to the speex codec at an 8kHz sampling frequency using the 'a=rtpmap' line. Any number from 96 to 127 could have been chosen (the allowed range for dynamic types).

The value of the sampling frequency is typically 8000 for narrow band operation, 16000 for wide band operation, and 32000 for ultra-wide band operation.
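The media description above can be generated with a few lines of code. The sketch below is illustrative only: `speex_sdp_offer` is a hypothetical helper, and it checks just the two constraints stated in this section (a dynamic payload type between 96 and 127, and a sampling rate of 8000, 16000 or 32000).

```python
ALLOWED_RATES = (8000, 16000, 32000)  # narrowband, wideband, ultra-wideband


def speex_sdp_offer(port: int, payload_type: int, sample_rate: int) -> list:
    """Return the SDP lines offering a single Speex stream."""
    if not 96 <= payload_type <= 127:
        raise ValueError("dynamic payload types range from 96 to 127")
    if sample_rate not in ALLOWED_RATES:
        raise ValueError("sampling rate must be 8000, 16000 or 32000 Hz")
    return [
        f"m=audio {port} RTP/AVP {payload_type}",
        # The encoding name is always "speex"; the rate selects the band.
        f"a=rtpmap:{payload_type} speex/{sample_rate}",
    ]
```

Calling `speex_sdp_offer(8088, 97, 8000)` yields the two lines of the narrowband example.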

If for some reason the offerer has bandwidth limitations, the client may use the "b=" header, as explained in SDP [RFC4566]. The following example illustrates the case where the offerer cannot receive more than 10 kbit/s.

m=audio 8088 RTP/AVP 97
b=AS:10
a=rtpmap:97 speex/8000

In this case, if the remote party agrees, it should configure its Speex encoder so that it does not use modes that produce more than 10 kbit/s. Note that the "b=" constraint also applies to all payload types that may be proposed in the media line ("m=").

Another way to make recommendations to the remote Speex encoder is to use its specific parameters via the a=fmtp: directive. The following parameters are defined for use in this way:

ptime: duration of each packet in milliseconds.

sr: actual sample rate in Hz.

ebw: encoding bandwidth - either 'narrow' or 'wide' or 'ultra' (corresponds to nominal 8000, 16000, and 32000 Hz sampling rates).

vbr: variable bit rate - either 'on', 'off' or 'vad' (defaults to off). If on, variable bit rate is enabled. If off, disabled. If set to 'vad' then constant bit rate is used but silence will be encoded with special short frames to indicate a lack of voice for that period.

cng: comfort noise generation - either 'on' or 'off'. If off then silence frames will be silent; if 'on' then those frames will be filled with comfort noise.

mode: Speex encoding mode. Can be {1,2,3,4,5,6,any}; defaults to 3 in narrowband, 6 in wide and ultra-wide.

Examples:

m=audio 8008 RTP/AVP 97
a=rtpmap:97 speex/8000
a=fmtp:97 mode=4

This example illustrates an offerer that wishes to receive a Speex stream at 8000 Hz, but only using Speex mode 4.

Several Speex specific parameters can be given in a single a=fmtp line provided that they are separated by a semi-colon:

a=fmtp:97 mode=any;mode=1

The offerer may indicate that it wishes to send variable bit rate frames with comfort noise:

m=audio 8088 RTP/AVP 97
a=rtpmap:97 speex/8000
a=fmtp:97 vbr=on;cng=on
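A receiver has to split such an a=fmtp line back into its parameters. The following sketch shows one way to do that. `parse_fmtp` is a hypothetical helper, and returning a list of pairs (rather than a dictionary) is a design choice made here so that a repeated key such as `mode=any;mode=1` is preserved.

```python
def parse_fmtp(line: str):
    """Parse 'a=fmtp:<pt> key=value;key=value' into (pt, [(key, value), ...])."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    # The payload type and the parameter list are separated by one space.
    pt_text, _, params_text = line[len(prefix):].partition(" ")
    params = []
    for item in params_text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semi-colon
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        params.append((key.strip(), value.strip()))
    return int(pt_text), params
```

The sketch only handles the syntax. Checking that a value is legal for its parameter (for instance that vbr is 'on', 'off' or 'vad') would be a separate step.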

The "ptime" attribute is used to denote the packetization interval (i.e., how many milliseconds of audio are encoded in a single RTP packet). Since Speex uses 20 msec frames, ptime values of multiples of 20 denote multiple Speex frames per packet. Values of ptime which are not multiples of 20 MUST be ignored and clients MUST use the default value of 20 instead.

Implementations SHOULD support ptime of 20 msec (i.e. one frame per packet).

In the example below the ptime value is set to 40, indicating that there are 2 frames in each packet.

m=audio 8008 RTP/AVP 97
a=rtpmap:97 speex/8000
a=ptime:40

Note that the ptime parameter applies to all payloads listed in the media line and is not used as part of an a=fmtp directive.

Values of ptime that are not a multiple of 20 msec are meaningless, so the receiver of such ptime values MUST ignore them. If the ptime value changes during the life of an RTP session (for example, when multiple Speex frames start being sent per packet), the SDP value must also reflect the new value.

Care must be taken when setting the value of ptime so that the RTP packet size does not exceed the path MTU.
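The ptime rules can be sketched as below. Both function names are hypothetical. The MTU check also rests on assumptions that are not part of this specification: a 1500-octet path MTU, the worst-case frame size of roughly 110 octets mentioned earlier, and 40 octets of IPv4, UDP and fixed RTP header overhead.

```python
FRAME_MS = 20  # duration of one Speex frame in milliseconds


def frames_per_packet(ptime_ms: int) -> int:
    """Return the number of Speex frames per RTP packet for a ptime value.

    A ptime that is not a positive multiple of 20 is ignored and the
    default of 20 msec (one frame per packet) is used instead.
    """
    if ptime_ms <= 0 or ptime_ms % FRAME_MS != 0:
        ptime_ms = FRAME_MS
    return ptime_ms // FRAME_MS


def fits_path_mtu(ptime_ms: int, mtu: int = 1500,
                  frame_octets: int = 110, overhead_octets: int = 40) -> bool:
    """Rough worst-case check that a packet stays within the path MTU.

    Assumed defaults: 1500-octet MTU, about 110 octets per frame at the
    maximum bitrate, 20 + 8 + 12 octets of IPv4/UDP/RTP headers.
    """
    payload = frames_per_packet(ptime_ms) * frame_octets
    return payload + overhead_octets <= mtu
```

With these assumed figures, a ptime of 40 is far below the limit, while a ptime of 280 (14 frames) would exceed it.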


  6. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550], and any appropriate RTP profile. This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed after compression so there is no conflict between the two operations.

A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the stream which are complex to decode and cause the receiver to be overloaded. However, this encoding does not exhibit any significant non-uniformity.

As with any IP-based protocol, in some circumstances a receiver may be overloaded simply by the receipt of too many packets, either desired or undesired. Network-layer authentication may be used to discard packets from undesired sources, but the processing cost of the authentication itself may be too high.


  7. Acknowledgements

The authors would like to thank Equivalence Pty Ltd of Australia for their assistance in attempting to standardize the use of Speex in H.323 applications, and for implementing Speex in their open source OpenH323 stack. The authors would also like to thank Brian C. Wiles <brian@streamcomm.com> of StreamComm for his assistance in developing the proposed standard for Speex use in H.323 applications.

The authors would also like to thank the following members of the Speex and AVT communities for their input: Ross Finlayson, Federico Montesino Pouzols, Henning Schulzrinne, Magnus Westerlund.

Thanks to former authors of this document: Simon Morlat, Roger Hardiman, Phil Kerr.


  8. References

8.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

8.2. Informative References

[CELP] "CELP, U.S. Federal Standard 1016.", National Technical Information Service (NTIS) website http://www.ntis.gov/.

[RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and Registration Procedures", BCP 13, RFC 4288, December 2005.

[speexenc] Valin, J., "Speexenc/speexdec, reference command-line encoder/decoder", Speex website http://www.speex.org/.


Authors' Addresses

Greg Herlein

2034 Filbert Street

San Francisco, California 94123

United States

Email: gherlein@herlein.com

Jean-Marc Valin

University of Sherbrooke

Department of Electrical and Computer Engineering

University of Sherbrooke

2500 blvd Universite

Sherbrooke, Quebec J1K 2R1 Canada

Email: jean-marc.valin@usherbrooke.ca

Alfred E. Heggestad

Biskop J. Nilssonsgt. 20a

Oslo 0659

Norway

Email: aeh@db.org


Full Copyright Statement

Copyright (C) The Internet Society (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).

 


  1. D Speex License

Copyright 2002-2007 Xiph.org Foundation

Copyright 2002-2007 Jean-Marc Valin

Copyright 2005-2007 Analog Devices Inc.

Copyright 2005-2007 Commonwealth Scientific and Industrial Research Organisation (CSIRO)

Copyright 1993, 2002, 2006 David Rowe

Copyright 2003 EpicGames

Copyright 1992-1994 Jutta Degener, Carsten Bormann

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the Xiph.org Foundation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

  1. E GNU Free Documentation License

Version 1.1, March 2000

Copyright (C) 2000 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

  0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other written document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

  1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you".

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

 

  2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3. You may also lend copies, under the same conditions stated above, and you may publicly display copies.

  3. COPYING IN QUANTITY

If you publish printed copies of the Document numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

  4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

  • A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
  • B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has less than five).
  • C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
  • D. Preserve all the copyright notices of the Document.
  • E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
  • F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
  • G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
  • H. Include an unaltered copy of this License.
  • I. Preserve the section entitled "History", and its title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
  • J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
  • K. In any section entitled "Acknowledgements" or "Dedications", preserve the section's title, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
  • L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
  • M. Delete any section entitled "Endorsements". Such a section may not be included in the Modified Version.
  • N. Do not retitle any existing section as "Endorsements" or to conflict in title with any Invariant Section.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties; for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

  5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgements", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements."

  6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

  7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate.

  1. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail.

  1. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

  1. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

 

  1. Index

acoustic echo cancellation, algorithmic delay, analysis-by-synthesis, auto-correlation, average bit-rate

bit-rate

CELP, complexity, constant bit-rate

discontinuous transmission, DTMF

echo cancellation, error weighting

fixed-point

in-band signalling

Levinson-Durbin, libspeex, line spectral pair, linear prediction

mean opinion score

narrowband

Ogg, open-source

patent, perceptual enhancement, pitch, preprocessor

quadrature mirror filter, quality

RTP

sampling rate, speexdec, speexenc, standards

tail length

ultra-wideband

variable

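  1. config.h notes

FLOATING_POINT macro: build with floating-point arithmetic.

FIXED_POINT macro: build with fixed-point arithmetic.

Define exactly one of the two macros above.

USE_SMALLFT macro: selects one of the available Fourier transform implementations.

USE_KISS_FFT macro: selects one of the available Fourier transform implementations.

Define exactly one of the two macros above.

_USE_SSE macro: use the SSE instruction set.

_USE_SSE2 macro: use the SSE2 instruction set.

HAVE_CONFIG_H macro: include the config.h header file.

EXPORT macro: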
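The two "define exactly one" rules can be checked at compile time. The fragment below is only a sketch and is not the config.h shipped with Speex: the macro values and the helper `speex_cfg_arith()` are assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical config.h fragment: one arithmetic mode and one FFT backend. */
#define FLOATING_POINT 1
#define USE_SMALLFT 1

/* Reject builds that define both or neither macro of each pair. */
#if defined(FLOATING_POINT) == defined(FIXED_POINT)
#error "Define exactly one of FLOATING_POINT and FIXED_POINT"
#endif
#if defined(USE_SMALLFT) == defined(USE_KISS_FFT)
#error "Define exactly one of USE_SMALLFT and USE_KISS_FFT"
#endif

/* Hypothetical helper: report which arithmetic mode was selected. */
static const char *speex_cfg_arith(void) {
#ifdef FIXED_POINT
    return "fixed-point";
#else
    return "floating-point";
#endif
}
```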

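  1. Speex function library

    1. Function template (unfinished)

Function name

xxx

Header file

#include "speex/speex.h"

#include "speex/speex_preprocess.h"

#include "speex/speex_echo.h"

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeex.lib" )

#pragma comment( lib, "libspeexdsp.lib" )

Description

Summary of what the function does.

Declaration

type function_name(type param1, type param2, ……);

Parameters

param1, [in|out|in&out]:

Parameter description.

param2, [in|out|in&out]:

Parameter description.

……

Return value

Return value 1: description of the return value.

Return value 2: description of the return value.

……

Error codes

EXXXX: description of the error code.

EXXXX: description of the error code.

……

Thread safety

Yes, no, or unknown: whether calling this function from several threads at once causes interference.

Atomicity

Yes, no, or unknown: whether this function is a single operation rather than a combination of several steps.

Speed

Number of calls that complete per millisecond.

Other notes

……

……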

 

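  1. Speex encoder

    1. speex_encoder_init (unfinished)

Function name

speex_encoder_init

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Creates and initializes a Speex encoder from a Speex mode structure.

A Speex encoder converts PCM audio data into Speex-format audio data.

Declaration

void *speex_encoder_init(const SpeexMode *mode);

Parameters

mode, [in]:

Pointer to the Speex mode structure used when encoding the PCM audio data; see SpeexMode.

This structure cannot be built by hand. It must be one of the modes predefined by the Speex library (choose exactly one):

speex_nb_mode static structure: narrowband mode, for audio sampled at 8000 Hz.

speex_wb_mode static structure: wideband mode, for audio sampled at 16000 Hz.

speex_uwb_mode static structure: ultra-wideband mode, for audio sampled at 32000 Hz.

Return value

Non-NULL: success. The return value is the Speex encoder handle.

NULL: failure. No error code is available; the usual cause is insufficient memory.

Error codes

Thread safety

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Use a single Speex encoder for one PCM audio stream from start to finish. Do not switch encoders partway through, and do not share one encoder between several streams; otherwise the decoded audio will differ noticeably from the audio before encoding.

When the Speex encoder is no longer needed, destroy it with speex_encoder_destroy(); otherwise its memory is leaked.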
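All three modes work on 20 ms frames, so the frame length in samples follows directly from the sampling rate. A minimal sketch of that arithmetic; `frame_samples()` and `frame_pcm_bytes()` are hypothetical helpers, not part of the Speex API.

```c
#include <stddef.h>

/* Frame duration shared by the narrowband, wideband and ultra-wideband modes. */
#define FRAME_MS 20

/* Samples in one frame: 8000 Hz -> 160, 16000 Hz -> 320, 32000 Hz -> 640. */
static int frame_samples(int sampling_rate_hz) {
    return (int)((long long)sampling_rate_hz * FRAME_MS / 1000);
}

/* Bytes of 16-bit mono PCM in one frame (2 bytes per sample). */
static size_t frame_pcm_bytes(int sampling_rate_hz) {
    return (size_t)frame_samples(sampling_rate_hz) * 2u;
}
```

The sample counts match the defaults reported by SPEEX_GET_FRAME_SIZE, and the byte counts are the sizes a caller would allocate for one frame passed to speex_encode_int().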

 

  1. speex_encoder_ctl (unfinished)

Function name

speex_encoder_ctl

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Sets or reads a parameter of a Speex encoder.

Declaration

int speex_encoder_ctl(void *state, int request, void *ptr);

Parameters

state, [in]:

The Speex encoder handle.

request, [in]:

The parameter to set or read (choose exactly one):

SPEEX_SET_ENH macro (0x0000): enables or disables the decoder's perceptual enhancement. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 1. Decoder only.

SPEEX_GET_ENH macro (0x0001): reads whether the decoder's perceptual enhancement is enabled. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 1. Decoder only.

SPEEX_SET_FRAME_SIZE macro (not defined): would set the length, in samples, of each PCM frame used for encoding and decoding. ptr points to a spx_int32_t; the defaults are 160 for narrowband, 320 for wideband and 640 for ultra-wideband. Encoder and decoder. This request is currently unavailable, so the frame length cannot be changed.

SPEEX_GET_FRAME_SIZE macro (0x0003): reads the length, in samples, of each PCM frame used for encoding and decoding. ptr points to a spx_int32_t; the defaults are 160 for narrowband, 320 for wideband and 640 for ultra-wideband. Encoder and decoder.

SPEEX_SET_QUALITY macro (0x0004): sets the quality level used when encoding at a constant bit rate; a higher level gives better audio and less compression. ptr points to a spx_int32_t in the range [0,10]; the default is 8. Encoder only.

SPEEX_GET_QUALITY macro (0x0005): reads the quality level used when encoding at a constant bit rate. ptr points to a spx_int32_t in the range [0,10]; the default is 8. Encoder only. This request is currently unavailable.

/** Set sub-mode to use */
#define SPEEX_SET_MODE 6

/** Get current sub-mode in use */
#define SPEEX_GET_MODE 7

/** Set low-band sub-mode to use (wideband only)*/
#define SPEEX_SET_LOW_MODE 8

/** Get current low-band mode in use (wideband only)*/
#define SPEEX_GET_LOW_MODE 9

/** Set high-band sub-mode to use (wideband only)*/
#define SPEEX_SET_HIGH_MODE 10

/** Get current high-band mode in use (wideband only)*/
#define SPEEX_GET_HIGH_MODE 11

SPEEX_SET_VBR macro (0x000C): enables or disables variable bit rate (VBR) in the encoder. VBR increases compression, and the SPEEX_SET_QUALITY setting remains in effect. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 0. Encoder only.

SPEEX_GET_VBR macro (0x000D): reads whether variable bit rate is enabled in the encoder. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 0. Encoder only.

SPEEX_SET_VBR_QUALITY macro (0x000E): sets the quality level used when encoding at a variable bit rate; a higher level gives better audio and less compression. ptr points to a float in the range [0.0,10.0]; the default is 10.0. Encoder only.

SPEEX_GET_VBR_QUALITY macro (0x000F): reads the quality level used when encoding at a variable bit rate. ptr points to a float in the range [0.0,10.0]; the default is 10.0. Encoder only.

SPEEX_SET_COMPLEXITY macro (0x0010): sets the encoder complexity. Higher complexity leaves the compression ratio unchanged, uses more CPU and gives better audio. ptr points to a spx_int32_t in the range [0,10]; the default is 2. Encoder only.

SPEEX_GET_COMPLEXITY macro (0x0011): reads the encoder complexity. ptr points to a spx_int32_t in the range [0,10]; the default is 2. Encoder only.

/** Set bit-rate used by the encoder (or lower) */
#define SPEEX_SET_BITRATE 18

/** Get current bit-rate used by the encoder or decoder */
#define SPEEX_GET_BITRATE 19

/** Define a handler function for in-band Speex request*/
#define SPEEX_SET_HANDLER 20

/** Define a handler function for in-band user-defined request*/
#define SPEEX_SET_USER_HANDLER 22

SPEEX_SET_SAMPLING_RATE macro (0x0018): sets the audio sampling rate that the encoder or decoder uses for bit-rate calculations. ptr points to a spx_int32_t; the defaults are 8000 for narrowband, 16000 for wideband and 32000 for ultra-wideband. Encoder and decoder.

SPEEX_GET_SAMPLING_RATE macro (0x0019): reads the audio sampling rate that the encoder or decoder uses for bit-rate calculations. ptr points to a spx_int32_t; the defaults are 8000 for narrowband, 16000 for wideband and 32000 for ultra-wideband. Encoder and decoder.

SPEEX_RESET_STATE macro (0x001A): resets the encoder or decoder to its initial state. ptr is ignored. Encoder and decoder.

/** Get VBR info (mostly used internally) */
#define SPEEX_GET_RELATIVE_QUALITY 29

SPEEX_SET_VAD macro (0x001E): enables or disables voice activity detection (VAD) in the encoder, which increases compression while no speech is present. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 0. Encoder only.

SPEEX_GET_VAD macro (0x001F): reads whether voice activity detection is enabled in the encoder. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 0. Encoder only.

/** Set Average Bit-Rate (ABR) to n bits per seconds */
#define SPEEX_SET_ABR 32

/** Get Average Bit-Rate (ABR) setting (in bps) */
#define SPEEX_GET_ABR 33

SPEEX_SET_DTX macro (0x0022): enables or disables discontinuous transmission (DTX) in the encoder, which lowers the bit rate sent over the network. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 0. Encoder only.

SPEEX_GET_DTX macro (0x0023): reads whether discontinuous transmission is enabled in the encoder. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 0. Encoder only.

/** Set submode encoding in each frame (1 for yes, 0 for no, setting to no breaks the standard) */
#define SPEEX_SET_SUBMODE_ENCODING 36

/** Get submode encoding in each frame */
#define SPEEX_GET_SUBMODE_ENCODING 37

/*#define SPEEX_SET_LOOKAHEAD 38*/

/** Returns the lookahead used by Speex separately for an encoder and a decoder.
* Sum encoder and decoder lookahead values to get the total codec lookahead. */
#define SPEEX_GET_LOOKAHEAD 39

SPEEX_SET_PLC_TUNING macro (0x0028): sets the packet loss percentage that the encoder should expect, for packet loss concealment. A higher value makes the stream more tolerant of network loss and lowers compression. ptr points to a spx_int32_t in the range [0,100]; the default is 2. Encoder only.

SPEEX_GET_PLC_TUNING macro (0x0029): reads the packet loss percentage that the encoder expects. ptr points to a spx_int32_t in the range [0,100]; the default is 2. Encoder only.

/** Sets the max bit-rate allowed in VBR mode */
#define SPEEX_SET_VBR_MAX_BITRATE 42

/** Gets the max bit-rate allowed in VBR mode */
#define SPEEX_GET_VBR_MAX_BITRATE 43

SPEEX_SET_HIGHPASS macro (0x002C): enables or disables the high-pass filter. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 1. Encoder and decoder.

SPEEX_GET_HIGHPASS macro (0x002D): reads whether the high-pass filter is enabled. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 1. Encoder and decoder.

/** Get "activity level" of the last decoded frame, i.e.
how much damage we cause if we remove the frame */
#define SPEEX_GET_ACTIVITY 47

ptr, [in&out]:

The value to set or the variable that receives the value read; its type depends on request.

Return value

0: success.

-1: the request is not recognized.

-2: invalid parameter.

Error codes

Thread safety

Different threads may operate on different Speex encoder handles at the same time, but must not operate on the same Speex encoder handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Important: in variable bit-rate mode the Speex encoder can, in rare cases, read or write invalid memory and crash the program. Switching to constant bit rate works around the problem.

 

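  1. speex_encode (unfinished)

Function name

speex_encode

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Uses a Speex encoder to encode one 20 ms frame of mono PCM audio, supplied as float samples, into Speex format.

Declaration

int speex_encode(void *state, float *in, SpeexBits *bits);

Parameters

state, [in]:

The Speex encoder handle.

in, [in]:

Pointer to an array holding one 20 ms frame of mono PCM audio as float samples. Although the samples are floats, speex.h specifies a ±2^15 range for them, that is, the same scale as 16-bit signed PCM ([-32768,32767]) rather than [-1.0,1.0].

bits, [out]:

Pointer to a SpeexBits structure that receives the encoded Speex frame.

Return value

1: the frame was encoded; if discontinuous transmission is enabled, the frame needs to be transmitted.

0: the frame was encoded; if discontinuous transmission is enabled, the frame does not need to be transmitted.

Error codes

Thread safety

Different threads may operate on different Speex encoder handles at the same time, but must not operate on the same Speex encoder handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Before calling this function, call speex_bits_reset() to empty the SpeexBits structure.

After the call, the encoded Speex frame is held in the SpeexBits structure; call speex_bits_write() to copy it out.

Copy each Speex frame out of the SpeexBits structure as soon as it has been stored there, rather than accumulating several frames first. How to decode several frames written out together is not covered here.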
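speex.h documents the float input of speex_encode() as using a ±2^15 range, the same scale as 16-bit PCM. The sketch below prepares such a buffer, either from 16-bit samples or from samples normalized to [-1.0, 1.0]; both helper names are hypothetical and not part of the Speex API.

```c
#include <stddef.h>

/* Copy 16-bit PCM into a float buffer without rescaling (range stays +-32768). */
static void pcm16_to_speex_float(const short *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i];
}

/* Scale samples normalized to [-1.0, 1.0] up to the 16-bit range. */
static void normalized_to_speex_float(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * 32768.0f;
}
```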

 

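  1. speex_encode_int (unfinished)

Function name

speex_encode_int

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Uses a Speex encoder to encode one 20 ms frame of mono 16-bit signed integer PCM audio into Speex format.

Declaration

int speex_encode_int(void *state, spx_int16_t *in, SpeexBits *bits);

Parameters

state, [in]:

The Speex encoder handle.

in, [in]:

Pointer to an array holding one 20 ms frame of mono 16-bit signed integer PCM audio; sample values lie in [-32768,32767].

bits, [out]:

Pointer to a SpeexBits structure that receives the encoded Speex frame.

Return value

1: the frame was encoded; if discontinuous transmission is enabled, the frame needs to be transmitted.

0: the frame was encoded; if discontinuous transmission is enabled, the frame does not need to be transmitted.

Error codes

Thread safety

Different threads may operate on different Speex encoder handles at the same time, but must not operate on the same Speex encoder handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Before calling this function, call speex_bits_reset() to empty the SpeexBits structure.

After the call, the encoded Speex frame is held in the SpeexBits structure; call speex_bits_write() to copy it out.

Copy each Speex frame out of the SpeexBits structure as soon as it has been stored there, rather than accumulating several frames first. How to decode several frames written out together is not covered here.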

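Size of an encoded Speex frame (for variable bit rate, the "Bytes after" column lists the frame sizes that occur):

| Sampling rate (Hz) | Bit rate | Quality level | Complexity | Bytes before | Bytes after | Compression (%) | Bit rate (bps) |
|---|---|---|---|---|---|---|---|
| 8000 | Constant | 0 | 0~10 | 320 | 6 | 1.875 | 2400 |
| 8000 | Constant | 1 | 0~10 | 320 | 10 | 3.125 | 4000 |
| 8000 | Constant | 2 | 0~10 | 320 | 15 | 4.6875 | 6000 |
| 8000 | Constant | 3 | 0~10 | 320 | 20 | 6.25 | 8000 |
| 8000 | Constant | 4 | 0~10 | 320 | 20 | 6.25 | 8000 |
| 8000 | Constant | 5 | 0~10 | 320 | 28 | 8.75 | 11200 |
| 8000 | Constant | 6 | 0~10 | 320 | 28 | 8.75 | 11200 |
| 8000 | Constant | 7 | 0~10 | 320 | 38 | 11.875 | 15200 |
| 8000 | Constant | 8 | 0~10 | 320 | 38 | 11.875 | 15200 |
| 8000 | Constant | 9 | 0~10 | 320 | 46 | 14.375 | 18400 |
| 8000 | Constant | 10 | 0~10 | 320 | 62 | 19.375 | 24800 |
| 8000 | Variable | 0 | 0~10 | 320 | 10,6,1 | | |
| 8000 | Variable | 1 | 0~10 | 320 | 20,15,10,6,1 | | |
| 8000 | Variable | 2 | 0~10 | 320 | 28,20,15,10,6,1 | | |
| 8000 | Variable | 3 | 0~10 | 320 | 28,20,15,10,6,1 | | |
| 8000 | Variable | 4 | 0~10 | 320 | 38,28,20,15,10,6,1 | | |
| 8000 | Variable | 5 | 0~10 | 320 | 38,28,20,15,10,6,1 | | |
| 8000 | Variable | 6 | 0~10 | 320 | 46,38,28,20,15,10,6,1 | | |
| 8000 | Variable | 7 | 0~10 | 320 | 46,38,28,20,15,10,6,1 | | |
| 8000 | Variable | 8 | 0~10 | 320 | 46,38,28,20,15,10,6,1 | | |
| 8000 | Variable | 9 | 0~10 | 320 | 62,46,38,28,20,15,10,6,1 | | |
| 8000 | Variable | 10 | 0~10 | 320 | 62,46,38,28,20,15,10,6,1 | | |
| 16000 | Constant | 0 | 0~10 | 640 | 10 | 1.5625 | 4000 |
| 16000 | Constant | 1 | 0~10 | 640 | 15 | 2.34375 | 6000 |
| 16000 | Constant | 2 | 0~10 | 640 | 20 | 3.125 | 8000 |
| 16000 | Constant | 3 | 0~10 | 640 | 25 | 3.90625 | 10000 |
| 16000 | Constant | 4 | 0~10 | 640 | 32 | 5 | 12800 |
| 16000 | Constant | 5 | 0~10 | 640 | 42 | 6.5625 | 16800 |
| 16000 | Constant | 6 | 0~10 | 640 | 52 | 8.125 | 20800 |
| 16000 | Constant | 7 | 0~10 | 640 | 60 | 9.375 | 24000 |
| 16000 | Constant | 8 | 0~10 | 640 | 70 | 10.9375 | 28000 |
| 16000 | Constant | 9 | 0~10 | 640 | 86 | 13.4375 | 34400 |
| 16000 | Constant | 10 | 0~10 | 640 | 100 | 15.625 | 40000 |
| 16000 | Variable | 0 | 0~10 | 640 | 25,20,15,10,2 | | |
| 16000 | Variable | 1 | 0~10 | 640 | 25,20,15,10,2 | | |
| 16000 | Variable | 2 | 0~10 | 640 | 32,25,20,15,10,2 | | |
| 16000 | Variable | 3 | 0~10 | 640 | 32,25,20,15,10,2 | | |
| 16000 | Variable | 4 | 0~10 | 640 | 52,42,32,25,20,15,10,2 | | |
| 16000 | Variable | 5 | 0~10 | 640 | 52,42,32,25,20,15,10,2 | | |
| 16000 | Variable | 6 | 0~10 | 640 | 60,52,42,32,25,20,15,10,2 | | |
| 16000 | Variable | 7 | 0~10 | 640 | 70,60,52,42,32,25,20,15,10,2 | | |
| 16000 | Variable | 8 | 0~10 | 640 | 86,70,60,52,42,32,25,20,15,10,2 | | |
| 16000 | Variable | 9 | 0~10 | 640 | 100,86,70,60,52,42,32,25,20,15,10,2 | | |
| 16000 | Variable | 10 | 0~10 | 640 | 100,86,70,60,52,42,32,25,20,15,10,2 | | |
| 32000 | Constant | 0 | 0~10 | 1280 | 11 | 0.859375 | 4400 |
| 32000 | Constant | 1 | 0~10 | 1280 | 19 | 1.484375 | 7600 |
| 32000 | Constant | 2 | 0~10 | 1280 | 24 | 1.875 | 9600 |
| 32000 | Constant | 3 | 0~10 | 1280 | 29 | 2.265625 | 11600 |
| 32000 | Constant | 4 | 0~10 | 1280 | 37 | 2.890625 | 14800 |
| 32000 | Constant | 5 | 0~10 | 1280 | 47 | 3.671875 | 18800 |
| 32000 | Constant | 6 | 0~10 | 1280 | 56 | 4.375 | 22400 |
| 32000 | Constant | 7 | 0~10 | 1280 | 64 | 5 | 25600 |
| 32000 | Constant | 8 | 0~10 | 1280 | 74 | 5.78125 | 29600 |
| 32000 | Constant | 9 | 0~10 | 1280 | 90 | 7.03125 | 36000 |
| 32000 | Constant | 10 | 0~10 | 1280 | 100 | 7.8125 | 40000 |
| 32000 | Variable | 0 | 0~10 | 1280 | 29,24,20,15,11,2 | | |
| 32000 | Variable | 1 | 0~10 | 1280 | 37,29,24,20,15,11,2 | | |
| 32000 | Variable | 2 | 0~10 | 1280 | 37,29,24,20,15,11,2 | | |
| 32000 | Variable | 3 | 0~10 | 1280 | 56,47,37,29,24,20,15,11,2 | | |
| 32000 | Variable | 4 | 0~10 | 1280 | 56,47,37,29,24,20,15,11,2 | | |
| 32000 | Variable | 5 | 0~10 | 1280 | 64,56,47,37,29,24,20,15,11,2 | | |
| 32000 | Variable | 6 | 0~10 | 1280 | 64,56,47,37,29,24,20,15,11,2 | | |
| 32000 | Variable | 7 | 0~10 | 1280 | 90,74,64,56,47,37,29,24,20,15,11,2 | | |
| 32000 | Variable | 8 | 0~10 | 1280 | 90,74,64,56,47,37,29,24,20,15,11,2 | | |
| 32000 | Variable | 9 | 0~10 | 1280 | 100,90,74,64,56,47,37,29,24,20,15,11,2 | | |
| 32000 | Variable | 10 | 0~10 | 1280 | 100,90,74,64,56,47,37,29,24,20,15,11,2 | | |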
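The compression and bit-rate figures in this table follow from the byte counts: a frame lasts 20 ms, so 50 frames are produced per second. A small sketch of that arithmetic, with hypothetical helper names:

```c
/* Frames per second when each frame lasts 20 ms. */
#define FRAMES_PER_SECOND 50

/* Bit rate in bits per second for a given encoded frame size in bytes. */
static int bitrate_bps(int encoded_bytes) {
    return encoded_bytes * 8 * FRAMES_PER_SECOND;
}

/* Encoded size as a percentage of the raw PCM frame size. */
static double compression_percent(int encoded_bytes, int raw_bytes) {
    return 100.0 * encoded_bytes / raw_bytes;
}
```

For the 8000 Hz, quality 8 row this gives 38 × 8 × 50 = 15200 bps and 38 ÷ 320 = 11.875 %.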

   

 

 

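  1. speex_encoder_destroy (unfinished)

Function name

speex_encoder_destroy

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Destroys a Speex encoder.

Declaration

void speex_encoder_destroy(void *state);

Parameters

state, [in]:

The Speex encoder handle.

Return value

Error codes

Thread safety

Different threads may operate on different Speex encoder handles at the same time, but must not operate on the same Speex encoder handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

A Speex encoder handle must not be used after it has been destroyed.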

 

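  1. Speex decoder

    1. speex_decoder_init (unfinished)

Function name

speex_decoder_init

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Creates and initializes a Speex decoder from a Speex mode structure.

A Speex decoder converts Speex-format audio data into PCM audio data.

Declaration

void *speex_decoder_init(const SpeexMode *mode);

Parameters

mode, [in]:

Pointer to the Speex mode structure used when decoding the Speex audio data; see SpeexMode.

This structure cannot be built by hand. It must be one of the modes predefined by the Speex library (choose exactly one):

speex_nb_mode static structure: narrowband mode, for audio sampled at 8000 Hz.

speex_wb_mode static structure: wideband mode, for audio sampled at 16000 Hz.

speex_uwb_mode static structure: ultra-wideband mode, for audio sampled at 32000 Hz.

Return value

Non-NULL: success. The return value is the Speex decoder handle.

NULL: failure. No error code is available; the usual cause is insufficient memory.

Error codes

Thread safety

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Use a single Speex decoder for one Speex audio stream from start to finish. Do not switch decoders partway through, and do not share one decoder between several streams; otherwise the decoded audio will differ noticeably from the audio before encoding.

When the Speex decoder is no longer needed, destroy it with speex_decoder_destroy(); otherwise its memory is leaked.

The decoded audio is still usable when the decoder's sampling rate differs from the encoder's. For example, with the encoder set to 8000 Hz and the decoder set to 32000 Hz, the decoded audio is at 32000 Hz but sounds the same as 8000 Hz audio, much as if 8000 Hz audio had been resampled to 32000 Hz. The reverse, with the encoder at 32000 Hz and the decoder at 8000 Hz, also works.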

 

  1. speex_decoder_ctl (unfinished)

Function name

speex_decoder_ctl

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Sets or reads a parameter of a Speex decoder.

Declaration

int speex_decoder_ctl(void *state, int request, void *ptr);

Parameters

state, [in]:

The Speex decoder handle.

request, [in]:

The parameter to set or read (choose exactly one):

SPEEX_SET_ENH macro (0x0000): enables or disables the decoder's perceptual enhancement. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 1. Decoder only.

SPEEX_GET_ENH macro (0x0001): reads whether the decoder's perceptual enhancement is enabled. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 1. Decoder only.

SPEEX_SET_FRAME_SIZE macro (not defined): would set the length, in samples, of each PCM frame used for encoding and decoding. ptr points to a spx_int32_t; the defaults are 160 for narrowband, 320 for wideband and 640 for ultra-wideband. Encoder and decoder. This request is currently unavailable, so the frame length cannot be changed.

SPEEX_GET_FRAME_SIZE macro (0x0003): reads the length, in samples, of each PCM frame used for encoding and decoding. ptr points to a spx_int32_t; the defaults are 160 for narrowband, 320 for wideband and 640 for ultra-wideband. Encoder and decoder.

SPEEX_SET_QUALITY macro (0x0004): sets the quality level used when encoding at a constant bit rate; a higher level gives better audio and less compression. ptr points to a spx_int32_t in the range [0,10]; the default is 8. Encoder only.

SPEEX_GET_QUALITY macro (0x0005): reads the quality level used when encoding at a constant bit rate. ptr points to a spx_int32_t in the range [0,10]; the default is 8. Encoder only. This request is currently unavailable.

/** Set sub-mode to use */
#define SPEEX_SET_MODE 6

/** Get current sub-mode in use */
#define SPEEX_GET_MODE 7

/** Set low-band sub-mode to use (wideband only)*/
#define SPEEX_SET_LOW_MODE 8

/** Get current low-band mode in use (wideband only)*/
#define SPEEX_GET_LOW_MODE 9

/** Set high-band sub-mode to use (wideband only)*/
#define SPEEX_SET_HIGH_MODE 10

/** Get current high-band mode in use (wideband only)*/
#define SPEEX_GET_HIGH_MODE 11

SPEEX_SET_VBR macro (0x000C): enables or disables variable bit rate (VBR) in the encoder. VBR increases compression, and the SPEEX_SET_QUALITY setting remains in effect. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 0. Encoder only.

SPEEX_GET_VBR macro (0x000D): reads whether variable bit rate is enabled in the encoder. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 0. Encoder only.

SPEEX_SET_VBR_QUALITY macro (0x000E): sets the quality level used when encoding at a variable bit rate; a higher level gives better audio and less compression. ptr points to a float in the range [0.0,10.0]; the default is 10.0. Encoder only.

SPEEX_GET_VBR_QUALITY macro (0x000F): reads the quality level used when encoding at a variable bit rate. ptr points to a float in the range [0.0,10.0]; the default is 10.0. Encoder only.

SPEEX_SET_COMPLEXITY macro (0x0010): sets the encoder complexity. Higher complexity leaves the compression ratio unchanged, uses more CPU and gives better audio. ptr points to a spx_int32_t in the range [0,10]; the default is 2. Encoder only.

SPEEX_GET_COMPLEXITY macro (0x0011): reads the encoder complexity. ptr points to a spx_int32_t in the range [0,10]; the default is 2. Encoder only.

/** Set bit-rate used by the encoder (or lower) */
#define SPEEX_SET_BITRATE 18

/** Get current bit-rate used by the encoder or decoder */
#define SPEEX_GET_BITRATE 19

/** Define a handler function for in-band Speex request*/
#define SPEEX_SET_HANDLER 20

/** Define a handler function for in-band user-defined request*/
#define SPEEX_SET_USER_HANDLER 22

SPEEX_SET_SAMPLING_RATE macro (0x0018): sets the audio sampling rate that the encoder or decoder uses for bit-rate calculations. ptr points to a spx_int32_t; the defaults are 8000 for narrowband, 16000 for wideband and 32000 for ultra-wideband. Encoder and decoder.

SPEEX_GET_SAMPLING_RATE macro (0x0019): reads the audio sampling rate that the encoder or decoder uses for bit-rate calculations. ptr points to a spx_int32_t; the defaults are 8000 for narrowband, 16000 for wideband and 32000 for ultra-wideband. Encoder and decoder.

SPEEX_RESET_STATE macro (0x001A): resets the encoder or decoder to its initial state. ptr is ignored. Encoder and decoder.

/** Get VBR info (mostly used internally) */
#define SPEEX_GET_RELATIVE_QUALITY 29

SPEEX_SET_VAD macro (0x001E): enables or disables voice activity detection (VAD) in the encoder, which increases compression while no speech is present. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 0. Encoder only.

SPEEX_GET_VAD macro (0x001F): reads whether voice activity detection is enabled in the encoder. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 0. Encoder only.

/** Set Average Bit-Rate (ABR) to n bits per seconds */
#define SPEEX_SET_ABR 32

/** Get Average Bit-Rate (ABR) setting (in bps) */
#define SPEEX_GET_ABR 33

SPEEX_SET_DTX macro (0x0022): enables or disables discontinuous transmission (DTX) in the encoder, which lowers the bit rate sent over the network. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 0. Encoder only.

SPEEX_GET_DTX macro (0x0023): reads whether discontinuous transmission is enabled in the encoder. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 0. Encoder only.

/** Set submode encoding in each frame (1 for yes, 0 for no, setting to no breaks the standard) */
#define SPEEX_SET_SUBMODE_ENCODING 36

/** Get submode encoding in each frame */
#define SPEEX_GET_SUBMODE_ENCODING 37

/*#define SPEEX_SET_LOOKAHEAD 38*/

/** Returns the lookahead used by Speex separately for an encoder and a decoder.
* Sum encoder and decoder lookahead values to get the total codec lookahead. */
#define SPEEX_GET_LOOKAHEAD 39

SPEEX_SET_PLC_TUNING macro (0x0028): sets the packet loss percentage that the encoder should expect, for packet loss concealment. A higher value makes the stream more tolerant of network loss and lowers compression. ptr points to a spx_int32_t in the range [0,100]; the default is 2. Encoder only.

SPEEX_GET_PLC_TUNING macro (0x0029): reads the packet loss percentage that the encoder expects. ptr points to a spx_int32_t in the range [0,100]; the default is 2. Encoder only.

/** Sets the max bit-rate allowed in VBR mode */
#define SPEEX_SET_VBR_MAX_BITRATE 42

/** Gets the max bit-rate allowed in VBR mode */
#define SPEEX_GET_VBR_MAX_BITRATE 43

SPEEX_SET_HIGHPASS macro (0x002C): enables or disables the high-pass filter. ptr points to a spx_int32_t: non-zero enables, 0 disables; the default is 1. Encoder and decoder.

SPEEX_GET_HIGHPASS macro (0x002D): reads whether the high-pass filter is enabled. ptr points to a spx_int32_t: non-zero means enabled, 0 means disabled; the default is 1. Encoder and decoder.

/** Get "activity level" of the last decoded frame, i.e.
how much damage we cause if we remove the frame */
#define SPEEX_GET_ACTIVITY 47

ptr, [in&out]:

The value to set or the variable that receives the value read; its type depends on request.

Return value

0: success.

-1: the request is not recognized.

-2: invalid parameter.

Error codes

Thread safety

Different threads may operate on different Speex decoder handles at the same time, but must not operate on the same Speex decoder handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

 

 

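  1. speex_decode (unfinished)

Function name

speex_decode

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Uses a Speex decoder to decode one 20 ms Speex frame into mono PCM audio, returned as float samples.

Declaration

int speex_decode(void *state, SpeexBits *bits, float *out);

Parameters

state, [in]:

The Speex decoder handle.

bits, [in]:

Pointer to a SpeexBits structure that already holds one 20 ms Speex frame.

If this parameter is NULL, the frame is treated as lost and the Speex decoder estimates it with its packet loss concealment algorithm.

out, [out]:

Pointer to the array that receives the decoded PCM frame as float samples. The values use the 16-bit PCM scale ([-32768,32767]) rather than [-1.0,1.0].

Return value

0: success.

-1: the audio stream has ended.

-2: the Speex frame is corrupt.

Error codes

Thread safety

Different threads may operate on different Speex decoder handles at the same time, but must not operate on the same Speex decoder handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Before calling this function, call speex_bits_reset() to empty the SpeexBits structure, then call speex_bits_read_from() to load the Speex data into it, and only then call this function to decode.

The decoded PCM frame differs considerably, sample by sample, from the PCM frame before encoding. Only listening can tell whether the audio is correct and meets requirements.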

 

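  1. speex_decode_int (unfinished)

Function name

speex_decode_int

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Uses a Speex decoder to decode one 20 ms Speex frame into mono 16-bit signed integer PCM audio.

Declaration

int speex_decode_int(void *state, SpeexBits *bits, spx_int16_t *out);

Parameters

state, [in]:

The Speex decoder handle.

bits, [in]:

Pointer to a SpeexBits structure that already holds one 20 ms Speex frame.

If this parameter is NULL, the frame is treated as lost and the Speex decoder estimates it with its packet loss concealment algorithm.

out, [out]:

Pointer to the array that receives the decoded PCM frame as 16-bit signed integers; sample values range from -32768 to +32767.

Return value

0: success.

-1: the audio stream has ended.

-2: the Speex frame is corrupt.

Error codes

Thread safety

Different threads may operate on different Speex decoder handles at the same time, but must not operate on the same Speex decoder handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Before calling this function, call speex_bits_reset() to empty the SpeexBits structure, then call speex_bits_read_from() to load the Speex data into it, and only then call this function to decode.

The decoded PCM frame differs considerably, sample by sample, from the PCM frame before encoding. Only listening can tell whether the audio is correct and meets requirements.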

 

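  1. speex_decoder_destroy (unfinished)

Function name

speex_decoder_destroy

Header file

#include "speex/speex.h"

Library file

#pragma comment( lib, "libspeex.lib" )

Description

Destroys a Speex decoder.

Declaration

void speex_decoder_destroy(void *state);

Parameters

state, [in]:

The Speex decoder handle.

Return value

Error codes

Thread safety

Different threads may operate on different Speex decoder handles at the same time, but must not operate on the same Speex decoder handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

A Speex decoder handle must not be used after it has been destroyed.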

 

  1. Speex bits (Speex bit-stream)

    1. speex_bits_advance

void speex_bits_advance(SpeexBits *bits, int n);

Advances the position of the "bit cursor" in the stream

Parameters:

bits: Bit-stream to operate on

n: Number of bits to advance

  1. speex_bits_destroy

void speex_bits_destroy(SpeexBits *bits);

Frees all resources associated to a SpeexBits struct. Right now this does nothing since no resources are allocated, but this could change in the future.

  1. speex_bits_init

void speex_bits_init(SpeexBits *bits);

Initializes and allocates resources for a SpeexBits struct

  1. speex_bits_init_buffer

void speex_bits_init_buffer(SpeexBits *bits, void *buff, int buf_size);

Initializes SpeexBits struct using a pre-allocated buffer

  1. speex_bits_insert_terminator

void speex_bits_insert_terminator(SpeexBits *bits);

Insert a terminator so that the data can be sent as a packet while auto-detecting the number of frames in each packet

Parameters:

bits: Bit-stream to operate on

  1. speex_bits_nbytes

int speex_bits_nbytes(SpeexBits *bits);

Returns the number of bytes in the bit-stream, including the last one even if it is not "full"

Parameters:

bits: Bit-stream to operate on

Returns:

Number of bytes in the stream
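Counting the last byte even when it is not full amounts to rounding the bit count up to a whole number of bytes. A sketch of that rounding; `bytes_for_bits()` is a hypothetical helper and does not read a SpeexBits struct.

```c
/* Bytes needed to hold nb_bits bits, counting a partially filled last byte. */
static int bytes_for_bits(int nb_bits) {
    return (nb_bits + 7) / 8;
}
```

For example, a 300-bit stream occupies 38 bytes, the last of which carries only 4 bits of data.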

  1. speex_bits_pack

void speex_bits_pack(SpeexBits *bits, int data, int nbBits);

Append bits to the bit-stream

Parameters:

bits: Bit-stream to operate on

data: Value to append as integer

nbBits: Number of bits to consider in "data"

  1. speex_bits_peek

int speex_bits_peek(SpeexBits *bits);

Get the value of the next bit in the stream, without modifying the "cursor" position

Parameters:

bits: Bit-stream to operate on

Returns:

Value of the bit peeked (one bit only)

  1. speex_bits_peek_unsigned

unsigned int speex_bits_peek_unsigned(SpeexBits *bits, int nbBits);

Same as speex_bits_unpack_unsigned, but without modifying the cursor position

Parameters:

bits: Bit-stream to operate on

nbBits: Number of bits to look for

Returns:

Value of the bits peeked, interpreted as unsigned

  1. speex_bits_read_from

void speex_bits_read_from(SpeexBits *bits, char *bytes, int len);

Initializes the bit-stream from the data in an area of memory

  1. speex_bits_read_whole_bytes

void speex_bits_read_whole_bytes(SpeexBits *bits, char *bytes, int len);

Append bytes to the bit-stream

Parameters:

bits: Bit-stream to operate on

bytes: Pointer to the bytes that will be appended

len: Number of bytes to append

  1. speex_bits_remaining

int speex_bits_remaining(SpeexBits *bits);

Returns the number of bits remaining to be read in a stream

Parameters:

bits: Bit-stream to operate on

Returns:

Number of bits that can still be read from the stream

  1. speex_bits_reset

void speex_bits_reset(SpeexBits *bits);

Resets bits to initial value (just after initialization, erasing content)

  1. speex_bits_rewind

void speex_bits_rewind(SpeexBits *bits);

Rewind the bit-stream to the beginning (ready for read) without erasing the content

  1. speex_bits_set_bit_buffer

void speex_bits_set_bit_buffer(SpeexBits *bits, void *buff, int buf_size);

Sets the bits in a SpeexBits struct to use data from an existing buffer (for decoding without copying data)

  1. speex_bits_unpack_signed

int speex_bits_unpack_signed(SpeexBits *bits, int nbBits);

Interpret the next bits in the bit-stream as a signed integer

Parameters:

bits: Bit-stream to operate on

nbBits: Number of bits to interpret

Returns:

A signed integer represented by the bits read
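Reading a field as signed means treating its top bit as a two's-complement sign bit. The sketch below applies that interpretation to a value that has already been read as unsigned; it illustrates the idea only, and `sign_extend()` is a hypothetical helper rather than the library's implementation.

```c
/* Interpret the low nb_bits bits of value as a two's-complement integer.
 * Assumes 1 <= nb_bits < 32. */
static int sign_extend(unsigned int value, int nb_bits) {
    unsigned int sign_bit = 1u << (nb_bits - 1);
    value &= (sign_bit << 1) - 1u;  /* keep only the low nb_bits bits */
    return (int)(value ^ sign_bit) - (int)sign_bit;
}
```

With 4 bits, the pattern 1111 reads as -1 and 0111 reads as 7.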

  1. speex_bits_unpack_unsigned

unsigned int speex_bits_unpack_unsigned(SpeexBits *bits, int nbBits);

Interpret the next bits in the bit-stream as an unsigned integer

Parameters:

bits: Bit-stream to operate on

nbBits: Number of bits to interpret

Returns:

An unsigned integer represented by the bits read

  1. speex_bits_write

int speex_bits_write(SpeexBits *bits, char *bytes, int max_len);

Write the content of a bit-stream to an area of memory

Parameters:

bits: Bit-stream to operate on

bytes: Memory location where to write the bits

max_len: Maximum number of bytes to write (i.e. size of the "bytes" buffer)

Returns:

Number of bytes written to the "bytes" buffer

  1. speex_bits_write_whole_bytes

int speex_bits_write_whole_bytes(SpeexBits *bits, char *bytes, int max_len);

Like speex_bits_write, but writes only the complete bytes in the stream. Also removes the written bytes from the stream

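  1. Speex echo (Speex acoustic echo canceller)

    1. speex_echo_state_init (unfinished)

Function name

speex_echo_state_init

Header file

#include "speex/speex_echo.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Creates and initializes a Speex acoustic echo canceller.

A Speex acoustic echo canceller removes echo from 16-bit signed integer mono PCM audio data.

Declaration

SpeexEchoState *speex_echo_state_init(int frame_size, int filter_length);

Parameters

frame_size, [in]:

Length, in samples, of one frame of 16-bit signed integer mono PCM audio; normally 10 ms to 20 ms of audio.

For example, 20 ms at a sampling rate of 8000 Hz gives 160.

filter_length, [in]:

Length, in samples, of the echo canceller's filter; normally 100 ms to 500 ms of audio, with 500 ms recommended.

For example, 300 ms at a sampling rate of 8000 Hz gives 8000 ÷ 1000 × 300 = 2400.

This value has to be tuned gradually; how well it is chosen directly affects the result of the echo cancellation.

Return value

The Speex acoustic echo canceller handle.

Error codes

Thread safety

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Use the same Speex acoustic echo canceller for one audio stream from start to finish. Do not switch cancellers partway through, and do not share one canceller between several streams; otherwise the processed audio will be incorrect.

When the Speex acoustic echo canceller is no longer needed, destroy it with speex_echo_state_destroy(); otherwise its memory is leaked.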
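Both arguments are sample counts, so they are usually derived from a duration in milliseconds and the sampling rate. A minimal sketch; `ms_to_samples()` is a hypothetical helper, not part of the Speex API.

```c
/* Convert a duration in milliseconds to a number of samples. */
static int ms_to_samples(int sampling_rate_hz, int duration_ms) {
    return (int)((long long)sampling_rate_hz * duration_ms / 1000);
}
```

At 8000 Hz, a 20 ms frame and a 300 ms filter would be passed as `ms_to_samples(8000, 20)` and `ms_to_samples(8000, 300)`, that is, 160 and 2400.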

 

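  1. speex_echo_ctl (unfinished)

Function name

speex_echo_ctl

Header file

#include "speex/speex_echo.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Sets or reads a parameter of a Speex acoustic echo canceller.

Declaration

int speex_echo_ctl(SpeexEchoState *st, int request, void *ptr);

Parameters

st, [in]:

The Speex acoustic echo canceller handle.

request, [in]:

The parameter to control (choose exactly one):

SPEEX_ECHO_GET_FRAME_SIZE macro (0x0003): reads the length, in samples, of each PCM frame processed by the echo canceller. ptr points to an int.

SPEEX_ECHO_SET_SAMPLING_RATE macro (0x0018): sets the sampling rate, in hertz, that the echo canceller assumes when processing a frame. ptr points to an int; the default is 8000.

SPEEX_ECHO_GET_SAMPLING_RATE macro (0x0019): reads the sampling rate, in hertz, that the echo canceller assumes when processing a frame. ptr points to an int.

/* Can't set window sizes */
/** Get size of impulse response (int32) */
#define SPEEX_ECHO_GET_IMPULSE_RESPONSE_SIZE 27

/* Can't set window content */
/** Get impulse response (int32[]) */
#define SPEEX_ECHO_GET_IMPULSE_RESPONSE 29

ptr, [in&out]:

The control value.

Its type depends on request.

Return value

0: success.

-1: the request is not recognized.

Error codes

Thread safety

Different threads may operate on different Speex acoustic echo canceller handles at the same time, but must not operate on the same handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes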

 

 

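  1. speex_echo_cancel (deprecated)

This function is deprecated and should not be used.

  1. speex_echo_cancellation (unfinished)

Function name

speex_echo_cancellation

Header file

#include "speex/speex_echo.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Uses a Speex acoustic echo canceller to remove acoustic echo from one frame of mono 16-bit signed integer PCM audio.

Declaration

void speex_echo_cancellation(SpeexEchoState *st, const spx_int16_t *rec, const spx_int16_t *play, spx_int16_t *out);

Parameters

st, [in]:

The Speex acoustic echo canceller handle.

rec, [in]:

Pointer to one frame of 16-bit signed integer mono PCM audio recorded from the audio input device. Must not be NULL.

play, [in]:

Pointer to one frame of 16-bit signed integer mono PCM audio played by the audio output device. Must not be NULL.

out, [out]:

Pointer to the buffer that receives one frame of 16-bit signed integer mono PCM audio after echo cancellation. Must not be NULL.

Return value

Error codes

Thread safety

Different threads may operate on different Speex acoustic echo canceller handles at the same time, but must not operate on the same handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Acoustic echo cancellation works by removing from the input audio the part that was picked up from the output audio. The input and output audio must therefore be synchronized, and the echo captured in the input must always appear after the corresponding output audio. No way of cancelling echo without this synchronization has been found so far.

After echo cancellation, it is best to run the Speex preprocessor on the input audio for noise suppression, dereverberation, automatic gain control, residual echo suppression and similar processing. Do not preprocess before echo cancellation, because that degrades the echo cancellation.

If the echo cancellation seems poor, log the rec, play and out buffers and inspect them, then adjust the filter_length argument of speex_echo_state_init() and test again.

Known issues:

When the echo in the recording is louder than the playback, this function does not treat it as echo and does not remove it. This mainly happens when microphone boost is enabled or the loudspeaker is very close to the microphone.

When the echo in the recording differs substantially from the played sound, this function does not treat it as echo and does not remove it. This is mainly caused by a poor-quality microphone or loudspeaker and occurs mostly on desktop computers; laptops are usually unaffected.

Echo produced during the first few seconds of an audio stream may not be removed, because the echo cancellation algorithm needs time to converge.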

 

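  1. speex_echo_state_reset (unfinished)

Function name

speex_echo_state_reset

Header file

#include "speex/speex_echo.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Resets a Speex acoustic echo canceller to its initial state.

Declaration

void speex_echo_state_reset(SpeexEchoState *st);

Parameters

st, [in]:

The Speex acoustic echo canceller handle.

Return value

Error codes

Thread safety

Different threads may operate on different Speex acoustic echo canceller handles at the same time, but must not operate on the same handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

  1. speex_echo_state_destroy (unfinished)

Function name

speex_echo_state_destroy

Header file

#include "speex/speex_echo.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Destroys a Speex acoustic echo canceller.

Declaration

void speex_echo_state_destroy(SpeexEchoState *st);

Parameters

st, [in]:

The Speex acoustic echo canceller handle.

Return value

Error codes

Thread safety

Different threads may operate on different Speex acoustic echo canceller handles at the same time, but must not operate on the same handle at the same time.

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

A Speex acoustic echo canceller handle must not be used after it has been destroyed.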

 

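  1. Speex preprocess (Speex preprocessor)

    1. speex_preprocess_state_init (unfinished)

Function name

speex_preprocess_state_init

Header file

#include "speex/speex_preprocess.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Creates and initializes a Speex preprocessor.

A Speex preprocessor preprocesses PCM audio data.

Declaration

SpeexPreprocessState *speex_preprocess_state_init(int frame_size, int sampling_rate);

Parameters

frame_size, [in]:

Size of one frame of audio data, in samples.

sampling_rate, [in]:

Sampling rate of the audio data, in hertz.

Return value

The Speex preprocessor handle.

Error codes

Thread safety

Atomicity

Speed

Number of calls that complete per millisecond.

Other notes

Use a single Speex preprocessor for one audio stream from start to finish. Do not switch preprocessors partway through, and do not share one preprocessor between several streams; otherwise the preprocessed audio will be incorrect.

When the Speex preprocessor is no longer needed, destroy it with speex_preprocess_state_destroy(); otherwise its memory is leaked.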

 

  1. speex_preprocess_ctl (incomplete)

Function name

speex_preprocess_ctl

Header file

#include "speex/speex_preprocess.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Sets or gets a parameter of a Speex preprocessor.

Declaration

int speex_preprocess_ctl (

SpeexPreprocessState * st,

int request,

void * ptr

);

Parameters

st, [input]:

Handle of the Speex preprocessor.

request, [input]:

The parameter to control. Exactly one of the following:

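SPEEX_PREPROCESS_SET_DENOISE macro (0x0000): Enables or disables the preprocessor's noise suppression. ptr points to an spx_int32_t variable; nonzero enables it, 0 disables it. Default: 1.

SPEEX_PREPROCESS_GET_DENOISE macro (0x0001): Gets whether the preprocessor's noise suppression is enabled. ptr points to an spx_int32_t variable; nonzero means enabled, 0 means disabled.

SPEEX_PREPROCESS_SET_AGC macro (0x0002): Enables or disables the preprocessor's automatic gain control. ptr points to an spx_int32_t variable; nonzero enables it, 0 disables it. Default: 0.

SPEEX_PREPROCESS_GET_AGC macro (0x0003): Gets whether the preprocessor's automatic gain control is enabled. ptr points to an spx_int32_t variable; nonzero means enabled, 0 means disabled.

SPEEX_PREPROCESS_SET_VAD macro (0x0004): Enables or disables the preprocessor's voice activity detection. ptr points to an spx_int32_t variable; nonzero enables it, 0 disables it. Default: 0.

SPEEX_PREPROCESS_GET_VAD macro (0x0005): Gets whether the preprocessor's voice activity detection is enabled. ptr points to an spx_int32_t variable; nonzero means enabled, 0 means disabled.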

 

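SPEEX_PREPROCESS_SET_AGC_LEVEL macro (0x0006): Sets the target level used by automatic gain control; a higher target level means more gain. ptr points to a float variable in the range [1.0, 32768.0]. Default: 8000.0. This request has the same meaning as SPEEX_PREPROCESS_SET_AGC_TARGET; the only difference is the type that ptr points to.

SPEEX_PREPROCESS_GET_AGC_LEVEL macro (0x0007): Gets the target level used by automatic gain control; a higher target level means more gain. ptr points to a float variable in the range [1.0, 32768.0]. This request has the same meaning as SPEEX_PREPROCESS_GET_AGC_TARGET; the only difference is the type that ptr points to.

SPEEX_PREPROCESS_SET_DEREVERB macro (0x0008): Enables or disables the preprocessor's dereverberation. ptr points to an spx_int32_t variable; nonzero enables it, 0 disables it. Default: 0.

SPEEX_PREPROCESS_GET_DEREVERB macro (0x0009): Gets whether the preprocessor's dereverberation is enabled. ptr points to an spx_int32_t variable; nonzero means enabled, 0 means disabled.

SPEEX_PREPROCESS_SET_DEREVERB_LEVEL macro (0x000A): Sets the dereverberation level. ptr points to a float variable. This parameter is currently unusable.

SPEEX_PREPROCESS_GET_DEREVERB_LEVEL macro (0x000B): Gets the dereverberation level. ptr points to a float variable. This parameter is currently unusable.

SPEEX_PREPROCESS_SET_DEREVERB_DECAY macro (0x000C): Sets the dereverberation decay, in dB. ptr points to a float variable. This parameter is currently unusable.

SPEEX_PREPROCESS_GET_DEREVERB_DECAY macro (0x000D): Gets the dereverberation decay, in dB. ptr points to a float variable. This parameter is currently unusable.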

 

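SPEEX_PREPROCESS_SET_PROB_START macro (0x000E): Sets, for voice activity detection, the probability (in percent) required to switch from no voice activity to voice activity; the higher the value, the harder it is for a frame to be judged as voice. ptr points to an spx_int32_t variable in the range [0, 100]. Default: 34.

SPEEX_PREPROCESS_GET_PROB_START macro (0x000F): Gets, for voice activity detection, the probability (in percent) required to switch from no voice activity to voice activity; the higher the value, the harder it is for a frame to be judged as voice. ptr points to an spx_int32_t variable in the range [0, 100].

SPEEX_PREPROCESS_SET_PROB_CONTINUE macro (0x0010): Sets, for voice activity detection, the probability (in percent) used when switching from voice activity to no voice activity; the higher the value, the more easily a frame is judged as having no voice. ptr points to an spx_int32_t variable in the range [0, 100]. Default: 20.

SPEEX_PREPROCESS_GET_PROB_CONTINUE macro (0x0011): Gets, for voice activity detection, the probability (in percent) used when switching from voice activity to no voice activity; the higher the value, the more easily a frame is judged as having no voice. ptr points to an spx_int32_t variable in the range [0, 100].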

 

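SPEEX_PREPROCESS_SET_NOISE_SUPPRESS macro (0x0012): Sets the maximum attenuation of noise, in dB, during noise suppression; the lower (more negative) the value, the stronger the attenuation. ptr points to an spx_int32_t variable in the range [-2147483648, 0]. Default: -15.

SPEEX_PREPROCESS_GET_NOISE_SUPPRESS macro (0x0013): Gets the maximum attenuation of noise, in dB, during noise suppression; the lower (more negative) the value, the stronger the attenuation. ptr points to an spx_int32_t variable in the range [-2147483648, 0].

SPEEX_PREPROCESS_SET_ECHO_SUPPRESS macro (0x0014): Sets the maximum attenuation of residual echo, in dB, during residual echo suppression; the lower (more negative) the value, the stronger the attenuation. ptr points to an spx_int32_t variable in the range [-2147483648, 0]. Default: -40.

SPEEX_PREPROCESS_GET_ECHO_SUPPRESS macro (0x0015): Gets the maximum attenuation of residual echo, in dB, during residual echo suppression; the lower (more negative) the value, the stronger the attenuation. ptr points to an spx_int32_t variable in the range [-2147483648, 0].

SPEEX_PREPROCESS_SET_ECHO_SUPPRESS_ACTIVE macro (0x0016): Sets the maximum attenuation of residual echo, in dB, while near-end speech is active; the lower (more negative) the value, the stronger the attenuation. ptr points to an spx_int32_t variable in the range [-2147483648, 0]. Default: -15.

SPEEX_PREPROCESS_GET_ECHO_SUPPRESS_ACTIVE macro (0x0017): Gets the maximum attenuation of residual echo, in dB, while near-end speech is active; the lower (more negative) the value, the stronger the attenuation. ptr points to an spx_int32_t variable in the range [-2147483648, 0].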

 

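SPEEX_PREPROCESS_SET_ECHO_STATE macro (0x0018): Enables or disables the preprocessor's residual echo suppression. When it is enabled, run Speex echo cancellation first and Speex preprocessing afterwards. ptr is a Speex acoustic echo canceller handle of type SpeexEchoState *; passing NULL disables residual echo suppression.

SPEEX_PREPROCESS_GET_ECHO_STATE macro (0x0019): Gets whether the preprocessor's residual echo suppression is enabled. ptr is a Speex acoustic echo canceller handle of type SpeexEchoState *; NULL means residual echo suppression is disabled.

SPEEX_PREPROCESS_SET_AGC_INCREMENT macro (0x001A): Sets the maximum gain increase per second, in dB, for automatic gain control; the higher the value, the faster the gain can rise. ptr points to an spx_int32_t variable in the range [0, 2147483647]. Default: 12.

SPEEX_PREPROCESS_GET_AGC_INCREMENT macro (0x001B): Gets the maximum gain increase per second, in dB, for automatic gain control; the higher the value, the faster the gain can rise. ptr points to an spx_int32_t variable in the range [0, 2147483647].

SPEEX_PREPROCESS_SET_AGC_DECREMENT macro (0x001C): Sets the maximum gain decrease per second, in dB, for automatic gain control; the lower (more negative) the value, the faster the gain can fall. ptr points to an spx_int32_t variable in the range [-2147483648, 0]. Default: -40.

SPEEX_PREPROCESS_GET_AGC_DECREMENT macro (0x001D): Gets the maximum gain decrease per second, in dB, for automatic gain control; the lower (more negative) the value, the faster the gain can fall. ptr points to an spx_int32_t variable in the range [-2147483648, 0].

SPEEX_PREPROCESS_SET_AGC_MAX_GAIN macro (0x001E): Sets the maximum gain, in dB, for automatic gain control; the higher the value, the larger the gain can become. ptr points to an spx_int32_t variable in the range [0, 2147483647]. Default: 30.

SPEEX_PREPROCESS_GET_AGC_MAX_GAIN macro (0x001F): Gets the maximum gain, in dB, for automatic gain control; the higher the value, the larger the gain can become. ptr points to an spx_int32_t variable in the range [0, 2147483647].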

 

SPEEX_PREPROCESS_GET_AGC_LOUDNESS macro (0x0021): Gets the loudness. The loudness cannot be set.

SPEEX_PREPROCESS_GET_AGC_GAIN macro (0x0023): Gets the current gain (int32, percent). The gain cannot be set.

SPEEX_PREPROCESS_GET_PSD_SIZE macro (0x0025): Gets the spectrum size for the power spectrum (int32). The spectrum size cannot be set.

SPEEX_PREPROCESS_GET_PSD macro (0x0027): Gets the power spectrum (int32[] of squared values). The power spectrum cannot be set.

SPEEX_PREPROCESS_GET_NOISE_PSD_SIZE macro (0x0029): Gets the spectrum size for the noise estimate (int32). The noise size cannot be set.

SPEEX_PREPROCESS_GET_NOISE_PSD macro (0x002B): Gets the noise estimate (int32[] of squared values). The noise estimate cannot be set.

SPEEX_PREPROCESS_GET_PROB macro (0x002D): Gets the speech probability of the last frame (int32). The speech probability cannot be set.

 

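SPEEX_PREPROCESS_SET_AGC_TARGET macro (0x002E): Sets the target level used by automatic gain control; a higher target level means more gain. ptr points to an spx_int32_t variable in the range [1, 32768]. Default: 8000. This request has the same meaning as SPEEX_PREPROCESS_SET_AGC_LEVEL; the only difference is the type that ptr points to.

SPEEX_PREPROCESS_GET_AGC_TARGET macro (0x002F): Gets the target level used by automatic gain control; a higher target level means more gain. ptr points to an spx_int32_t variable in the range [1, 32768]. This request has the same meaning as SPEEX_PREPROCESS_GET_AGC_LEVEL; the only difference is the type that ptr points to.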
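
The request codes follow a simple numbering pattern: each SET request has an even code, the matching GET request is the next odd code, and the read-only requests (loudness, gain, spectra, speech probability) only have the odd code. The sketch below encodes that observation. The REQ_* constants and both helper functions are hypothetical local names chosen for illustration; real code would use the SPEEX_PREPROCESS_* macros from speex/speex_preprocess.h.

```c
#include <assert.h>

/* A few request codes, with the values documented for speex_preprocess_ctl(). */
enum {
    REQ_SET_DENOISE    = 0x0000,
    REQ_GET_DENOISE    = 0x0001,
    REQ_SET_VAD        = 0x0004,
    REQ_GET_VAD        = 0x0005,
    REQ_GET_PROB       = 0x002D, /* read-only: no SET counterpart */
    REQ_SET_AGC_TARGET = 0x002E,
    REQ_GET_AGC_TARGET = 0x002F
};

/* Returns 1 if the request reads a value (odd code), 0 if it writes one. */
int is_get_request(int request)
{
    assert(request >= 0);
    return (request & 1) != 0;
}

/* Returns the GET request that pairs with a given SET request. */
int matching_get_request(int set_request)
{
    assert(!is_get_request(set_request));
    return set_request + 1;
}
```

A wrapper around speex_preprocess_ctl() could use is_get_request() to decide whether the variable behind ptr is read or written by the call.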

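ptr, [input & output]:

Holds the value to set, or receives the value that is read.

Its type depends on the request argument.

Return value

0: success.

Nonzero: failure; the request argument is invalid.

Error codes

Thread safety

Multiple threads may operate on different Speex preprocessor handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes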

 

 

  1. speex_preprocess (deprecated)

This function is deprecated and has no effect; use speex_preprocess_run() instead.

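  1. speex_preprocess_run (incomplete)

Function name

speex_preprocess_run

Header file

#include "speex/speex_preprocess.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Preprocesses one frame of mono 16-bit signed integer PCM audio with a Speex preprocessor.

Declaration

int speex_preprocess_run (

SpeexPreprocessState * st,

spx_int16_t * x

);

Parameters

st, [input]:

Handle of the Speex preprocessor.

x, [input & output]:

Pointer to an array holding one frame of mono 16-bit signed integer PCM audio. The frame is processed in place.

Return value

1: preprocessing succeeded; if voice activity detection is enabled, this frame contains voice activity.

0: preprocessing succeeded; if voice activity detection is enabled, this frame contains no voice activity.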
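
The 1/0 result is easier to interpret with the two thresholds SPEEX_PREPROCESS_SET_PROB_START and SPEEX_PREPROCESS_SET_PROB_CONTINUE in mind. The sketch below shows how a two-threshold (hysteresis) decision of this kind can behave: a higher probability is needed to enter the speech state than to stay in it. It is an illustration only, not the library's code; VadSketch and vad_sketch_update are hypothetical names, and the per-frame probability is assumed to be a percentage such as the one reported by SPEEX_PREPROCESS_GET_PROB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for a two-threshold voice activity decision. */
typedef struct {
    int prob_start;    /* percent needed to switch from silence to speech */
    int prob_continue; /* percent needed to remain in the speech state */
    int was_speech;    /* decision made for the previous frame */
} VadSketch;

/* Feed the speech probability of one frame (0..100).
 * Returns 1 for "voice activity" and 0 for "no voice activity". */
int vad_sketch_update(VadSketch *v, int speech_prob_percent)
{
    assert(v != NULL);
    assert(speech_prob_percent >= 0 && speech_prob_percent <= 100);

    if (speech_prob_percent > v->prob_start ||
        (v->was_speech && speech_prob_percent > v->prob_continue)) {
        v->was_speech = 1;
    } else {
        v->was_speech = 0;
    }
    return v->was_speech;
}
```

With thresholds of 34 and 20, a frame at 25 percent counts as speech only if the previous frame was already speech, which avoids rapid toggling near the boundary.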

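Error codes

Thread safety

Multiple threads may operate on different Speex preprocessor handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes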

 

 

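  1. speex_preprocess_estimate_update (incomplete)

Function name

speex_preprocess_estimate_update

Header file

#include "speex/speex_preprocess.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Feeds one frame of 16-bit signed integer mono PCM audio to a Speex preprocessor, but only updates the preprocessor's internal state; the audio data is not modified.

Declaration

void speex_preprocess_estimate_update (

SpeexPreprocessState * st,

spx_int16_t * x

);

Parameters

st, [input]:

Handle of the Speex preprocessor.

x, [input]:

Pointer to one frame of 16-bit signed integer mono PCM audio.

Return value

Error codes

Thread safety

Multiple threads may operate on different Speex preprocessor handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes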

 

 

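  1. speex_preprocess_state_destroy (incomplete)

Function name

speex_preprocess_state_destroy

Header file

#include "speex/speex_preprocess.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Destroys a Speex preprocessor.

Declaration

void speex_preprocess_state_destroy (

SpeexPreprocessState * st

);

Parameters

st, [input]:

Handle of the Speex preprocessor.

Return value

Error codes

Thread safety

Multiple threads may operate on different Speex preprocessor handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes

A Speex preprocessor handle must not be used after it has been destroyed.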

 

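  1. Speex resampler

    1. speex_resampler_init (incomplete)

Function name

speex_resampler_init

Header file

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Creates and initializes a Speex resampler with integer input and output sampling rates.

Declaration

SpeexResamplerState * speex_resampler_init(

spx_uint32_t nb_channels,

spx_uint32_t in_rate,

spx_uint32_t out_rate,

int quality,

int * err

);

Parameters

nb_channels, [input]:

Number of channels to process.

in_rate, [input]:

Input sampling rate as an integer, in Hz.

out_rate, [input]:

Output sampling rate as an integer, in Hz.

quality, [input]:

Quality level of the resampled audio, in the range [0, 10]; a higher value gives better quality.

err, [output]:

Pointer to a variable that receives this function's error code. May be NULL.

Return value

Non-NULL: success; the return value is the handle of the Speex resampler.

NULL: failure; check the error code returned through err.

Error codes

RESAMPLER_ERR_SUCCESS enum (0x0000): success.

RESAMPLER_ERR_ALLOC_FAILED enum (0x0001): memory allocation failed.

RESAMPLER_ERR_BAD_STATE enum (0x0002): bad Speex resampler handle.

RESAMPLER_ERR_INVALID_ARG enum (0x0003): invalid argument.

RESAMPLER_ERR_PTR_OVERLAP enum (0x0004): the input and output buffers overlap.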
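
When logging a failed call, it helps to turn the numeric code into text. The sketch below maps the five codes listed above to short messages. The RS_ERR_* constants and resampler_err_text() are hypothetical local names so that the example stands alone; the library has its own speex_resampler_strerror() for the same purpose, which is usually the better choice in real code.

```c
#include <assert.h>

/* Local mirror of the documented resampler error codes. */
enum {
    RS_ERR_SUCCESS      = 0,
    RS_ERR_ALLOC_FAILED = 1,
    RS_ERR_BAD_STATE    = 2,
    RS_ERR_INVALID_ARG  = 3,
    RS_ERR_PTR_OVERLAP  = 4
};

/* Returns a short, static description for a resampler error code. */
const char *resampler_err_text(int err)
{
    assert(err >= 0);
    switch (err) {
    case RS_ERR_SUCCESS:      return "Success";
    case RS_ERR_ALLOC_FAILED: return "Memory allocation failed";
    case RS_ERR_BAD_STATE:    return "Bad resampler state";
    case RS_ERR_INVALID_ARG:  return "Invalid argument";
    case RS_ERR_PTR_OVERLAP:  return "Input and output buffers overlap";
    default:                  return "Unknown error";
    }
}
```

A typical use is to pass the value written to err by speex_resampler_init() to this function when the returned handle is NULL.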

Thread safety

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes

 

 

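  1. speex_resampler_init_frac (incomplete)

Function name

speex_resampler_init_frac

Header file

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Creates and initializes a resampler with fractional input and output sampling rates. The sampling rate ratio can be any rational number whose numerator and denominator are both 32-bit integers.

Declaration

SpeexResamplerState * speex_resampler_init_frac(

spx_uint32_t nb_channels,

spx_uint32_t ratio_num,

spx_uint32_t ratio_den,

spx_uint32_t in_rate,

spx_uint32_t out_rate,

int quality,

int * err

);

Parameters

nb_channels, [input]:

Number of channels to process.

ratio_num, [input]:

Numerator of the sampling rate ratio, as a 32-bit integer.

ratio_den, [input]:

Denominator of the sampling rate ratio, as a 32-bit integer.

in_rate, [input]:

Input sampling rate rounded to the nearest integer, in Hz.

out_rate, [input]:

Output sampling rate rounded to the nearest integer, in Hz.

quality, [input]:

Quality level of the resampled audio, in the range [0, 10]; a higher value gives better quality.

err, [output]:

Pointer to a variable that receives this function's error code. May be NULL.

Return value

Non-NULL: success; the return value is the handle of the Speex resampler.

NULL: failure; check the error code returned through err.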
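
To show what a ratio_num / ratio_den pair looks like for common rates, the sketch below reduces an input/output rate pair to lowest terms with the Euclidean algorithm. The names gcd_u32, RateRatio, and reduce_rate_ratio are hypothetical helpers, and treating the ratio as input rate over output rate is an assumption based on how speex_resampler_init() forwards its in_rate and out_rate arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor (Euclidean algorithm). */
uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Sampling rate ratio in lowest terms. */
typedef struct {
    uint32_t num; /* numerator, derived from the input rate */
    uint32_t den; /* denominator, derived from the output rate */
} RateRatio;

/* Reduce in_rate / out_rate to lowest terms. */
RateRatio reduce_rate_ratio(uint32_t in_rate, uint32_t out_rate)
{
    RateRatio r;
    uint32_t g;

    assert(in_rate > 0 && out_rate > 0);
    g = gcd_u32(in_rate, out_rate);
    r.num = in_rate / g;
    r.den = out_rate / g;
    return r;
}
```

For 44100 Hz to 48000 Hz this gives 147 / 160, and for 8000 Hz to 16000 Hz it gives 1 / 2.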

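Error codes

RESAMPLER_ERR_SUCCESS enum (0x0000): success.

RESAMPLER_ERR_ALLOC_FAILED enum (0x0001): memory allocation failed.

RESAMPLER_ERR_BAD_STATE enum (0x0002): bad Speex resampler handle.

RESAMPLER_ERR_INVALID_ARG enum (0x0003): invalid argument.

RESAMPLER_ERR_PTR_OVERLAP enum (0x0004): the input and output buffers overlap.

Thread safety

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes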

 

 

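  1. speex_resampler_process_float (incomplete)

Function name

speex_resampler_process_float

Header file

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Resamples one frame of 32-bit floating-point PCM audio for one channel of a mono or multi-channel stream with a Speex resampler.

Declaration

int speex_resampler_process_float(

SpeexResamplerState * st,

spx_uint32_t channel_index,

const float * in,

spx_uint32_t * in_len,

float * out,

spx_uint32_t * out_len

);

Parameters

st, [input]:

Handle of the Speex resampler.

channel_index, [input]:

Index of the channel to process, starting at 0.

For mono audio, pass 0.

in, [input]:

Pointer to the array holding the 32-bit floating-point PCM samples before resampling.

in_len, [input & output]:

On input, pointer to a variable holding the length of the input array, in samples.

On output, the variable holds the number of input samples that were processed.

out, [output]:

Pointer to the array that receives the 32-bit floating-point PCM samples after resampling.

out_len, [input & output]:

On input, pointer to a variable holding the length of the output array, in samples.

On output, the variable holds the number of resampled samples that were written.

Return value

RESAMPLER_ERR_SUCCESS enum (0x0000): success.

RESAMPLER_ERR_ALLOC_FAILED enum (0x0001): failure; memory allocation failed.

RESAMPLER_ERR_BAD_STATE enum (0x0002): failure; bad Speex resampler handle.

RESAMPLER_ERR_INVALID_ARG enum (0x0003): failure; invalid argument.

RESAMPLER_ERR_PTR_OVERLAP enum (0x0004): failure.

Error codes

The return value is the error code.

Thread safety

Multiple threads may operate on different Speex resampler handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes

The input and output arrays must not overlap in memory.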
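
Because out_len carries the capacity of the output array on input, the caller has to size that array before the call. The sketch below estimates how many output samples a given number of input samples corresponds to, rounding up. expected_out_len() is a hypothetical helper, and the result is only a sizing estimate: the count actually reported through out_len for a single call may differ slightly because the resampler keeps internal filter state.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated number of output samples for `in_len` input samples,
 * rounded up so the output array is never sized too small. */
uint32_t expected_out_len(uint32_t in_len, uint32_t in_rate, uint32_t out_rate)
{
    uint64_t scaled;

    assert(in_rate > 0 && out_rate > 0);
    /* 64-bit product avoids overflow for large frame sizes and rates. */
    scaled = (uint64_t)in_len * out_rate;
    return (uint32_t)((scaled + in_rate - 1) / in_rate);
}
```

For example, 160 samples at 8000 Hz correspond to 320 samples at 16000 Hz, and 441 samples at 44100 Hz correspond to 480 samples at 48000 Hz.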

 

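  1. speex_resampler_process_int (incomplete)

Function name

speex_resampler_process_int

Header file

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Resamples one frame of 16-bit signed integer PCM audio for one channel of a mono or multi-channel stream with a Speex resampler.

Declaration

int speex_resampler_process_int(

SpeexResamplerState * st,

spx_uint32_t channel_index,

const spx_int16_t * in,

spx_uint32_t * in_len,

spx_int16_t * out,

spx_uint32_t * out_len

);

Parameters

st, [input]:

Handle of the Speex resampler.

channel_index, [input]:

Index of the channel to process, starting at 0.

For mono audio, pass 0.

in, [input]:

Pointer to the array holding the 16-bit signed integer PCM samples before resampling.

in_len, [input & output]:

On input, pointer to a variable holding the length of the input array, in samples.

On output, the variable holds the number of input samples that were processed.

out, [output]:

Pointer to the array that receives the 16-bit signed integer PCM samples after resampling.

out_len, [input & output]:

On input, pointer to a variable holding the length of the output array, in samples.

On output, the variable holds the number of resampled samples that were written.

Return value

RESAMPLER_ERR_SUCCESS enum (0x0000): success.

RESAMPLER_ERR_ALLOC_FAILED enum (0x0001): failure.

RESAMPLER_ERR_BAD_STATE enum (0x0002): failure.

RESAMPLER_ERR_INVALID_ARG enum (0x0003): failure.

RESAMPLER_ERR_PTR_OVERLAP enum (0x0004): failure.

Error codes

The return value is the error code.

Thread safety

Multiple threads may operate on different Speex resampler handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes

The input and output arrays must not overlap in memory.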

 

  1. speex_resampler_process_interleaved_float (incomplete)

Function name

speex_resampler_process_interleaved_float

Header file

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Resamples an interleaved float array. The input and output buffers must not overlap.

Declaration

int speex_resampler_process_interleaved_float(

SpeexResamplerState * st,

const float * in,

spx_uint32_t * in_len,

float * out,

spx_uint32_t * out_len

);

Parameters

st, [input]:

Handle of the Speex resampler.

in, [input]:

Input buffer.

in_len, [input & output]:

On input, the number of input samples in the input buffer.

On output, the number of samples processed. Both counts are per channel.

out, [output]:

Output buffer.

out_len, [input & output]:

On input, the size of the output buffer.

On output, the number of samples written. Both counts are per channel.

Return value

RESAMPLER_ERR_SUCCESS enum (0x0000): success.

Any other RESAMPLER_ERR_* value: failure.

Error codes

The return value is the error code.

Thread safety

Multiple threads may operate on different Speex resampler handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes

The input and output buffers must not overlap in memory.
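
Since in_len and out_len are counted per channel, it is worth spelling out how samples are laid out in an interleaved buffer. The sketch below shows the usual layout, in which the samples of all channels for one point in time are stored next to each other. interleaved_index() and deinterleave_channel() are hypothetical helpers written for this illustration, not Speex functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position of (frame, channel) inside an interleaved buffer. */
size_t interleaved_index(size_t frame, uint32_t channel, uint32_t nb_channels)
{
    assert(nb_channels > 0 && channel < nb_channels);
    return frame * nb_channels + channel;
}

/* Copy one channel out of an interleaved buffer.
 * `frames` is the per-channel sample count, so `in` holds
 * frames * nb_channels floats and `out` holds `frames` floats. */
void deinterleave_channel(const float *in, size_t frames, uint32_t channel,
                          uint32_t nb_channels, float *out)
{
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < frames; i++) {
        out[i] = in[interleaved_index(i, channel, nb_channels)];
    }
}
```

For a stereo stream with a per-channel length of 160, the interleaved buffer therefore holds 320 floats, ordered left, right, left, right, and so on.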

 

  1. speex_resampler_process_interleaved_int (incomplete)

Function name

speex_resampler_process_interleaved_int

Header file

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Resamples an interleaved int array. The input and output buffers must not overlap.

Declaration

int speex_resampler_process_interleaved_int(

SpeexResamplerState * st,

const spx_int16_t * in,

spx_uint32_t * in_len,

spx_int16_t * out,

spx_uint32_t * out_len

);

Parameters

st, [input]:

Handle of the Speex resampler.

in, [input]:

Input buffer.

in_len, [input & output]:

On input, the number of input samples in the input buffer.

On output, the number of samples processed. Both counts are per channel.

out, [output]:

Output buffer.

out_len, [input & output]:

On input, the size of the output buffer.

On output, the number of samples written. Both counts are per channel.

Return value

RESAMPLER_ERR_SUCCESS enum (0x0000): success.

Any other RESAMPLER_ERR_* value: failure.

Error codes

The return value is the error code.

Thread safety

Multiple threads may operate on different Speex resampler handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes

The input and output buffers must not overlap in memory.

 

  1. speex_resampler_skip_zeros (incomplete)

Function name

speex_resampler_skip_zeros

Header file

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Make sure that the first samples to go out of the resamplers don't have leading zeros. This is only useful before starting to use a newly created resampler. It is recommended to use that when resampling an audio file, as it will generate a file with the same length. For real-time processing, it is probably easier not to use this call (so that the output duration is the same for the first frame).

Declaration

int speex_resampler_skip_zeros(

SpeexResamplerState * st

);

Parameters

st, [input]:

Handle of the Speex resampler.

Return value

RESAMPLER_ERR_SUCCESS enum (0x0000): success.

RESAMPLER_ERR_ALLOC_FAILED enum (0x0001): failure.

RESAMPLER_ERR_BAD_STATE enum (0x0002): failure.

RESAMPLER_ERR_INVALID_ARG enum (0x0003): failure.

RESAMPLER_ERR_PTR_OVERLAP enum (0x0004): failure.

Error codes

The return value is the error code.

Thread safety

Multiple threads may operate on different Speex resampler handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes

After this function has been called, the resampled output contains fewer samples than would otherwise be written. For example, 160 samples at 8000 Hz should become 320 samples at 16000 Hz, but in practice only 64 may be produced; without this call, the output is 320 samples. It is therefore recommended not to call this function.

 

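  1. speex_resampler_reset_mem (incomplete)

Function name

speex_resampler_reset_mem

Header file

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Reset a resampler so a new (unrelated) stream can be processed.

Declaration

int speex_resampler_reset_mem(

SpeexResamplerState * st

);

Parameters

st, [input]:

Handle of the Speex resampler.

Return value

RESAMPLER_ERR_SUCCESS enum (0x0000): success.

RESAMPLER_ERR_ALLOC_FAILED enum (0x0001): failure.

RESAMPLER_ERR_BAD_STATE enum (0x0002): failure.

RESAMPLER_ERR_INVALID_ARG enum (0x0003): failure.

RESAMPLER_ERR_PTR_OVERLAP enum (0x0004): failure.

Error codes

The return value is the error code.

Thread safety

Multiple threads may operate on different Speex resampler handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes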

 

 

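  1. speex_resampler_destroy (incomplete)

Function name

speex_resampler_destroy

Header file

#include "speex/speex_resampler.h"

Library file

#pragma comment( lib, "libspeexdsp.lib" )

Description

Destroys a Speex resampler.

Declaration

void speex_resampler_destroy(

SpeexResamplerState * st

);

Parameters

st, [input]:

Handle of the Speex resampler.

Return value

Error codes

Thread safety

Multiple threads may operate on different Speex resampler handles concurrently, but must not operate on the same handle at the same time.

Atomicity

Performance

Number of calls that can be executed per millisecond.

Notes

A Speex resampler handle must not be used after it has been destroyed.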

 

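  1. Adaptive jitter buffer

In the author's experience the results were poor, so it is no longer used.

  1. Speex-specific adaptive jitter buffer

In the author's experience the results were poor, so it is no longer used.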

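  1. Structures

    1. Structure template (incomplete)

Structure name

xxx

Header file

#include "speex/speex.h"

Common name

The name by which the structure is usually called.

Description

The main purpose of the structure.

Related functions

Func1(), Func2(), Func3()…

Declaration

struct xxx

{

type member1;

type member2;

……

};

Members

member1

Description of the member.

member2

Description of the member.

……

Notes

……

……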

 

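  1. SpeexMode (incomplete)

Structure name

SpeexMode

Header file

#include "speex/speex.h"

Common name

Speex mode structure.

Description

Specifies the parameters the codec uses when encoding and decoding.

Related functions

speex_encoder_init()

Declaration

typedef struct SpeexMode {

const void * mode; /** Pointer to the low-level mode data */

mode_query_func query; /** Pointer to the mode query function */

const char * modeName;

int modeID;

int bitstream_version;

encoder_init_func enc_init;

encoder_destroy_func enc_destroy;

encode_func enc;

decoder_init_func dec_init;

decoder_destroy_func dec_destroy;

decode_func dec;

encoder_ctl_func enc_ctl;

decoder_ctl_func dec_ctl;

} SpeexMode;

Members

mode

Pointer to the low-level mode data.

query

Pointer to the query function of the mode.

For the static speex_nb_mode structure, this is a pointer to nb_mode_query().

For the static speex_wb_mode structure, this is a pointer to wb_mode_query().

For the static speex_uwb_mode structure, this is a pointer to wb_mode_query().

modeName

Name string of the mode.

For the static speex_nb_mode structure, this is "narrowband".

For the static speex_wb_mode structure, this is "wideband (sub-band CELP)".

For the static speex_uwb_mode structure, this is "ultra-wideband (sub-band CELP)".

modeID

ID of the mode.

For the static speex_nb_mode structure, this is 0.

For the static speex_wb_mode structure, this is 1.

For the static speex_uwb_mode structure, this is 2.

bitstream_version

Version number of the bitstream; a newer version has a larger value. It is increased mainly when the bitstream changes for compatibility reasons.

enc_init

Pointer to the encoder initialization function of the mode.

enc_destroy

Pointer to the encoder destruction function of the mode.

enc

Pointer to the encoding (compression) function of the mode.

dec_init

Pointer to the decoder initialization function of the mode.

dec_destroy

Pointer to the decoder destruction function of the mode.

dec

Pointer to the decoding (decompression) function of the mode.

enc_ctl

Pointer to the encoder control function of the mode.

dec_ctl

Pointer to the decoder control function of the mode.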
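
To tie the modeID and modeName values together, the sketch below keeps a small lookup table for the three built-in modes. ModeInfo and mode_info() are hypothetical names for this illustration and do not touch the real SpeexMode structure. The sampling rates and 20 ms frame sizes in the table are not taken from this section; they are the commonly documented values for the narrowband, wideband, and ultra-wideband modes and should be treated as assumptions here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one built-in Speex mode. */
typedef struct {
    int mode_id;        /* value of SpeexMode.modeID */
    const char *name;   /* value of SpeexMode.modeName */
    int sample_rate;    /* assumed nominal sampling rate, in Hz */
    int frame_size;     /* assumed samples per 20 ms frame */
} ModeInfo;

static const ModeInfo k_modes[] = {
    {0, "narrowband",                      8000, 160},
    {1, "wideband (sub-band CELP)",       16000, 320},
    {2, "ultra-wideband (sub-band CELP)", 32000, 640}
};

/* Returns the table entry for a mode ID, or NULL if the ID is unknown. */
const ModeInfo *mode_info(int mode_id)
{
    size_t i;

    for (i = 0; i < sizeof(k_modes) / sizeof(k_modes[0]); i++) {
        if (k_modes[i].mode_id == mode_id) {
            return &k_modes[i];
        }
    }
    return NULL;
}
```

A caller could use mode_info(1)->frame_size to size a PCM buffer before encoding wideband audio, after checking that the returned pointer is not NULL.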

Notes
