Audio Encoding, Decoding, and Mixing on Android

Date: 2019-12-25

Related source code: https://github.com/YeDaxia/MusicPlus

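Understanding digital audio:

Before getting to the implementation, let's look at the relevant properties of digital audio.

Sample rate: the number of samples captured per second, expressed in hertz (Hz). (The higher the sample rate, the closer the result is to the original waveform.)
Bit depth: the dynamic range that can be recorded, measured in bits. (It determines the range of amplitude differences that can be represented.)
Channels: the number of audio channels, for example left and right.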
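
As a rough illustration of how these three properties combine, the sketch below computes how many bytes one second of uncompressed PCM occupies and how long a given buffer lasts. The class and method names are hypothetical and only serve this example.

```java
/** Hypothetical helper showing how sample rate, bit depth and channel count determine PCM size. */
final class PcmMath {
    /** Bytes of raw PCM produced per second of audio. */
    static int byteRate(int sampleRateHz, int channels, int bitDepth) {
        return sampleRateHz * channels * (bitDepth / 8);
    }

    /** Duration in milliseconds of a raw PCM buffer of the given size. */
    static long durationMs(long pcmBytes, int sampleRateHz, int channels, int bitDepth) {
        return pcmBytes * 1000L / byteRate(sampleRateHz, channels, bitDepth);
    }

    public static void main(String[] args) {
        // 44100 Hz, stereo, 16-bit: 44100 * 2 * 2 = 176400 bytes per second
        System.out.println("bytes per second: " + byteRate(44100, 2, 16));
        System.out.println("ms in 1 MiB: " + durationMs(1 << 20, 44100, 2, 16));
    }
}
```

With the 44100 Hz, 2-channel, 16-bit settings used throughout this article, one second of audio takes 176400 bytes.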

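After sampling and quantization, audio is ultimately a sequence of numbers. Loudness (amplitude) shows up in the magnitude of each number, while pitch (frequency) and timbre depend on the time dimension and show up in how the numbers change from one to the next.

Before encoding and decoding, let's get a feel for what raw audio data looks like. A WAV file typically holds raw PCM data, so below we play it by writing the bytes straight into an AudioTrack. The WAV format is described at http://soundfile.sapp.org/doc/WaveFormat/ , and you can inspect the binary content of a WAV file with a tool such as Binary Viewer.

Playing a WAV file: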

int sampleRateInHz = 44100;
int channelConfig = AudioFormat.CHANNEL_OUT_STEREO;
int audioFormat = AudioFormat.ENCODING_PCM_16BIT;

int bufferSizeInBytes = AudioTrack.getMinBufferSize(sampleRateInHz, channelConfig, audioFormat);
AudioTrack audioTrack = new  AudioTrack(AudioManager.STREAM_MUSIC, sampleRateInHz, channelConfig, audioFormat, bufferSizeInBytes, AudioTrack.MODE_STREAM);
audioTrack.play();
			
FileInputStream audioInput = null;
try {
	audioInput = new FileInputStream(audioFile);//put your wav file in

	audioInput.read(new byte[44]);//skip the 44-byte wav header
	
	byte[] audioData = new byte[512];
	int readCount;
	
	while((readCount = audioInput.read(audioData)) != -1){
		audioTrack.write(audioData, 0, readCount); //play only the bytes actually read
	}
	
} catch (FileNotFoundException e) {
	e.printStackTrace();
} catch (IOException e) {
	e.printStackTrace();
}finally{
	audioTrack.stop();
	audioTrack.release();
	if(audioInput != null)
		try {
			audioInput.close();
		} catch (IOException e) {
			e.printStackTrace();
		}
}

If you have tried the example above, you should now have a feel for what raw audio data is.
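
The playback code skips a fixed 44 bytes and hard-codes the sample rate and channel count. As a hedged sketch, the same fields can instead be read from the header. This assumes the canonical 44-byte PCM layout described at the WaveFormat link; files with extra chunks have a longer header and would need a real chunk parser. `WavHeader` is a hypothetical name for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical reader for the canonical 44-byte PCM WAV header. */
final class WavHeader {
    final int channels;
    final int sampleRate;
    final int bitsPerSample;
    final int dataSize;

    private WavHeader(int channels, int sampleRate, int bitsPerSample, int dataSize) {
        this.channels = channels;
        this.sampleRate = sampleRate;
        this.bitsPerSample = bitsPerSample;
        this.dataSize = dataSize;
    }

    static WavHeader parse(byte[] header) {
        if (header.length < 44) {
            throw new IllegalArgumentException("need at least 44 bytes");
        }
        String riff = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(header, 8, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE header");
        }
        // All multi-byte numeric fields in a WAV header are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int channels = buf.getShort(22);       // NumChannels
        int sampleRate = buf.getInt(24);       // SampleRate
        int bitsPerSample = buf.getShort(34);  // BitsPerSample
        int dataSize = buf.getInt(40);         // Subchunk2Size: number of PCM bytes
        return new WavHeader(channels, sampleRate, bitsPerSample, dataSize);
    }
}
```

The parsed values could then feed the `sampleRateInHz`, `channelConfig` and `audioFormat` arguments instead of constants.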

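Decoding audio:

From the introduction above it follows that the goal of decoding is to restore encoded data to raw PCM data like that stored in a WAV file.

Use MediaExtractor and MediaCodec to extract the encoded audio data and decode it into raw audio data: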

final String encodeFile = "your encode audio file path";
MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(encodeFile);

MediaFormat mediaFormat = null;
for (int i = 0; i < extractor.getTrackCount(); i++) {
	MediaFormat format = extractor.getTrackFormat(i);
	String mime = format.getString(MediaFormat.KEY_MIME);
	if (mime.startsWith("audio/")) {
		extractor.selectTrack(i);
		mediaFormat = format;
		break;
	}
}

if(mediaFormat == null){
	DLog.e("not a valid file with audio track..");
	extractor.release();
	return null;
}

FileOutputStream fosDecoder = new FileOutputStream(outDecodeFile);//your out file path

String mediaMime = mediaFormat.getString(MediaFormat.KEY_MIME);
MediaCodec codec = MediaCodec.createDecoderByType(mediaMime);
codec.configure(mediaFormat, null, null, 0);
codec.start();

ByteBuffer[] codecInputBuffers = codec.getInputBuffers();
ByteBuffer[] codecOutputBuffers = codec.getOutputBuffers();

final long kTimeOutUs = 5000;
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
boolean sawInputEOS = false;
boolean sawOutputEOS = false;
int totalRawSize = 0;
try{
	while (!sawOutputEOS) {
		if (!sawInputEOS) {
			int inputBufIndex = codec.dequeueInputBuffer(kTimeOutUs);
			if (inputBufIndex >= 0) {
				ByteBuffer dstBuf = codecInputBuffers[inputBufIndex];
				int sampleSize = extractor.readSampleData(dstBuf, 0);
				if (sampleSize < 0) {
					DLog.i(TAG, "saw input EOS.");
					sawInputEOS = true;
					codec.queueInputBuffer(inputBufIndex,0,0,0,MediaCodec.BUFFER_FLAG_END_OF_STREAM );
				} else {
					long presentationTimeUs = extractor.getSampleTime();
					codec.queueInputBuffer(inputBufIndex,0,sampleSize,presentationTimeUs,0);
					extractor.advance();
				}
			}
		}
		int res = codec.dequeueOutputBuffer(info, kTimeOutUs);
		if (res >= 0) {

			 int outputBufIndex = res;
			// Simply ignore codec config buffers.
			if ((info.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG)!= 0) {
				 DLog.i(TAG, "audio decoder: codec config buffer");
				 codec.releaseOutputBuffer(outputBufIndex, false);
				 continue;
			 }
			 
			if(info.size != 0){
				
				ByteBuffer outBuf = codecOutputBuffers[outputBufIndex];
				
				outBuf.position(info.offset);
				outBuf.limit(info.offset + info.size);
				byte[] data = new byte[info.size];
				outBuf.get(data);
				totalRawSize += data.length;
				fosDecoder.write(data);
				
			}
			
			codec.releaseOutputBuffer(outputBufIndex, false);
			
			if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
				DLog.i(TAG, "saw output EOS.");
				sawOutputEOS = true;
			}
			
		} else if (res == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
			codecOutputBuffers = codec.getOutputBuffers();
			DLog.i(TAG, "output buffers have changed.");
		} else if (res == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
			MediaFormat oformat = codec.getOutputFormat();
			DLog.i(TAG, "output format has changed to " + oformat);
		}
	}
}finally{
	fosDecoder.close();
	codec.stop();
	codec.release();
	extractor.release();
}

After decoding, you can play the data with AudioTrack to check that it is correct.
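
Another way to check the decoded output, sketched here under assumptions, is to prepend a WAV header to the raw PCM so that an ordinary desktop player can open it. The sample rate and channel count passed in are assumed to match what the decoder actually produced (for example, as reported by `codec.getOutputFormat()`); `WavWriter` is a hypothetical helper, not part of the original project.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds a canonical 44-byte PCM WAV header. */
final class WavWriter {
    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static byte[] buildHeader(int pcmBytes, int sampleRate, int channels, int bitsPerSample) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put(ascii("RIFF")).putInt(36 + pcmBytes).put(ascii("WAVE"));
        b.put(ascii("fmt ")).putInt(16)        // size of the fmt chunk for PCM
         .putShort((short) 1)                   // audio format 1 = linear PCM
         .putShort((short) channels)
         .putInt(sampleRate)
         .putInt(sampleRate * blockAlign)       // byte rate
         .putShort((short) blockAlign)
         .putShort((short) bitsPerSample);
        b.put(ascii("data")).putInt(pcmBytes);
        return b.array();
    }

    /** Returns header followed by the PCM bytes, ready to be written to a .wav file. */
    static byte[] wrap(byte[] pcm, int sampleRate, int channels, int bitsPerSample) {
        byte[] header = buildHeader(pcm.length, sampleRate, channels, bitsPerSample);
        byte[] out = new byte[header.length + pcm.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(pcm, 0, out, header.length, pcm.length);
        return out;
    }
}
```

For large files it would be better to write the header first and stream the PCM after it rather than holding everything in memory; the in-memory `wrap` is only for brevity.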

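Mixing audio:

The principle of audio mixing: superposing quantized audio signals is equivalent to superposing sound waves in the air.

In terms of audio data, this means simply adding the sample values of the same channel. The catch is that the sum may overflow. Many schemes exist to deal with this; here we use the simple average audio mixing algorithm (the "V algorithm" for short). The demo below assumes that all audio files share the same sample rate, channel count, and bit depth, which makes processing easier. Also note that the raw audio data is stored in little-endian order, and that a PCM value of 0 means silence (zero amplitude).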

public void mixAudios(File[] rawAudioFiles){
	
	final int fileSize = rawAudioFiles.length;

	FileInputStream[] audioFileStreams = new FileInputStream[fileSize];
	File audioFile = null;
	
	FileInputStream inputStream;
	byte[][] allAudioBytes = new byte[fileSize][];
	boolean[] streamDoneArray = new boolean[fileSize];
	byte[] buffer = new byte[512];
	int offset;
	
	try {
		
		for (int fileIndex = 0; fileIndex < fileSize; ++fileIndex) {
			audioFile = rawAudioFiles[fileIndex];
			audioFileStreams[fileIndex] = new FileInputStream(audioFile);
		}

		while(true){
			
			for(int streamIndex = 0 ; streamIndex < fileSize ; ++streamIndex){
				
				inputStream = audioFileStreams[streamIndex];
				if(!streamDoneArray[streamIndex] && (offset = inputStream.read(buffer)) != -1){
					// copy only the bytes actually read; the rest of the chunk stays 0 (silence)
					byte[] chunk = new byte[buffer.length];
					System.arraycopy(buffer, 0, chunk, 0, offset);
					allAudioBytes[streamIndex] = chunk;
				}else{
					streamDoneArray[streamIndex] = true;
					allAudioBytes[streamIndex] = new byte[512];
				}
			}
			
			byte[] mixBytes = averageMix(allAudioBytes);
			
			//mixBytes is the mixed data
			
			boolean done = true;
			for(boolean streamEnd : streamDoneArray){
				if(!streamEnd){
					done = false;
				}
			}
			
			if(done){
				break;
			}
		}
		
	} catch (IOException e) {
		e.printStackTrace();
		if(mOnAudioMixListener != null)
			mOnAudioMixListener.onMixError(1);
	}finally{
		try {
			for(FileInputStream in : audioFileStreams){
				if(in != null)
					in.close();
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}

/**
 * Each row holds the data of one audio stream.
 */
byte[] averageMix(byte[][] bMulRoadAudioes) {
		
		if (bMulRoadAudioes == null || bMulRoadAudioes.length == 0)
			return null;

		byte[] realMixAudio = bMulRoadAudioes[0];
		
		if(bMulRoadAudioes.length == 1)
			return realMixAudio;
		
		for(int rw = 0 ; rw < bMulRoadAudioes.length ; ++rw){
			if(bMulRoadAudioes[rw].length != realMixAudio.length){
				Log.e("app", "length of audio track " + rw + " is different.");
				return null;
			}
		}
		
		int row = bMulRoadAudioes.length;
		int column = realMixAudio.length / 2; // number of 16-bit samples per track
		short[][] sMulRoadAudioes = new short[row][column];

		// assemble little-endian byte pairs into 16-bit samples
		for (int r = 0; r < row; ++r) {
			for (int c = 0; c < column; ++c) {
				sMulRoadAudioes[r][c] = (short) ((bMulRoadAudioes[r][c * 2] & 0xff) | (bMulRoadAudioes[r][c * 2 + 1] & 0xff) << 8);
			}
		}

		// average the samples at the same position across all tracks
		short[] sMixAudio = new short[column];
		int mixVal;
		int sr = 0;
		for (int sc = 0; sc < column; ++sc) {
			mixVal = 0;
			sr = 0;
			for (; sr < row; ++sr) {
				mixVal += sMulRoadAudioes[sr][sc];
			}
			sMixAudio[sc] = (short) (mixVal / row);
		}

		// split the mixed samples back into little-endian bytes
		for (sr = 0; sr < column; ++sr) {
			realMixAudio[sr * 2] = (byte) (sMixAudio[sr] & 0x00FF);
			realMixAudio[sr * 2 + 1] = (byte) ((sMixAudio[sr] & 0xFF00) >> 8);
		}

		return realMixAudio;
}

Likewise, you can play the mixed data with AudioTrack to check how the mix sounds.
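
To make the overflow problem concrete, here is a small self-contained sketch comparing a plain 16-bit sum with the averaging approach used above. The clamped sum is shown only as one common alternative, not as something the original project uses, and `MixDemo` is a hypothetical name.

```java
/** Hypothetical demo of mixing two 16-bit samples. */
final class MixDemo {
    /** Reads one little-endian 16-bit sample from its low and high bytes. */
    static short readLittleEndian(byte lo, byte hi) {
        return (short) ((lo & 0xff) | ((hi & 0xff) << 8));
    }

    /** Plain addition: wraps around when the sum leaves the 16-bit range. */
    static short naiveSum(short a, short b) {
        return (short) (a + b);
    }

    /** Average mixing: never overflows, but makes each source quieter. */
    static short average(short a, short b) {
        return (short) ((a + b) / 2);
    }

    /** Alternative (not from the article): add, then clamp to the 16-bit range. */
    static short clampedSum(short a, short b) {
        int sum = a + b;
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
    }

    public static void main(String[] args) {
        short a = 30000, b = 30000;
        System.out.println(naiveSum(a, b));   // -5536: two loud samples wrap to a negative value
        System.out.println(average(a, b));    // 30000
        System.out.println(clampedSum(a, b)); // 32767
    }
}
```

Averaging trades loudness for safety: mixing a loud track with silence halves the loud track, which is the main drawback of this simple scheme.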

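Encoding audio:

The purpose of encoding audio is to store and transmit it in less space. Encodings are either lossy or lossless; the familiar MP3 and AAC formats are both lossy. In the example below we use MediaCodec to encode the mixed data, using the AAC format.

AAC audio comes in two formats, ADIF and ADTS. The first suits storage on disk, while the second can be used for streaming because it is a sequence of frames. We encode to ADTS here, so first we need to understand how its frame sequence is built:

ADTS frame structure: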

header

body

Fields of the ADTS frame header:

Length (bits)	Description
12	syncword 0xFFF, all bits must be 1
1	MPEG Version: 0 for MPEG-4, 1 for MPEG-2
2	Layer: always 0
1	protection absent, Warning, set to 1 if there is no CRC and 0 if there is CRC
2	profile, the MPEG-4 Audio Object Type minus 1
4	MPEG-4 Sampling Frequency Index (15 is forbidden)
1	private bit, guaranteed never to be used by MPEG, set to 0 when encoding, ignore when decoding
3	MPEG-4 Channel Configuration (in the case of 0, the channel configuration is sent via an inband PCE)
1	originality, set to 0 when encoding, ignore when decoding
1	home, set to 0 when encoding, ignore when decoding
1	copyrighted id bit, the next bit of a centrally registered copyright identifier, set to 0 when encoding, ignore when decoding
1	copyright id start, signals that this frame's copyright id bit is the first bit of the copyright id, set to 0 when encoding, ignore when decoding
13	frame length, this value must include 7 or 9 bytes of header length: FrameLength = (ProtectionAbsent == 1 ? 7 : 9) + size(AACFrame)
11	Buffer fullness
2	Number of AAC frames (RDBs) in ADTS frame minus 1, for maximum compatibility always use 1 AAC frame per ADTS frame
16	CRC if protection absent is 0
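
To show how the bit widths in the table map onto actual bytes, the following sketch unpacks the main fields from a 7-byte header (the case without CRC). The class and field names are hypothetical; only the bit layout comes from the table above.

```java
/** Hypothetical parser for the 7-byte ADTS header (protection absent = 1). */
final class AdtsHeader {
    final int mpegVersionBit;    // 0 = MPEG-4, 1 = MPEG-2
    final int protectionAbsent;
    final int audioObjectType;   // profile field + 1
    final int frequencyIndex;
    final int channelConfig;
    final int frameLength;       // header + AAC payload, in bytes
    final int bufferFullness;

    AdtsHeader(byte[] h) {
        if (h.length < 7) throw new IllegalArgumentException("need 7 bytes");
        int b1 = h[1] & 0xFF, b2 = h[2] & 0xFF, b3 = h[3] & 0xFF;
        int b4 = h[4] & 0xFF, b5 = h[5] & 0xFF, b6 = h[6] & 0xFF;
        int syncword = ((h[0] & 0xFF) << 4) | (b1 >> 4);      // 12 bits
        if (syncword != 0xFFF) throw new IllegalArgumentException("bad syncword");
        mpegVersionBit = (b1 >> 3) & 0x1;                      // 1 bit
        protectionAbsent = b1 & 0x1;                           // 1 bit (layer occupies the 2 bits in between)
        audioObjectType = ((b2 >> 6) & 0x3) + 1;               // 2 bits
        frequencyIndex = (b2 >> 2) & 0xF;                      // 4 bits
        channelConfig = ((b2 & 0x1) << 2) | (b3 >> 6);         // 3 bits across two bytes
        frameLength = ((b3 & 0x3) << 11) | (b4 << 3) | (b5 >> 5); // 13 bits across three bytes
        bufferFullness = ((b5 & 0x1F) << 6) | (b6 >> 2);       // 11 bits
    }
}
```

Reading `frameLength` from each header is also how a reader can step from one ADTS frame to the next in a stream.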

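The approach is now clear: prepend a header to each encoded frame and write it to a file; the saved .aac file should then be recognized and played by a player. For simplicity, we again assume that the mixed source data has a sample rate of 44100 Hz, 2 channels, and a bit depth of 16 bits.

Complete source code for encoding raw audio data to AAC: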

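import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

import android.media.MediaCodec;
import android.media.MediaCodec.BufferInfo;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;

// DLog is a logging helper that is not shown in this article.
class AACAudioEncoder{
	private final static String TAG = "AACAudioEncoder";
	private final static String AUDIO_MIME = "audio/mp4a-latm";
	// Despite the name, this is bytes per second for one channel of 44100 Hz, 16-bit audio.
	private final static long audioBytesPerSample = 44100*16/8;
	private String rawAudioFile;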
	AACAudioEncoder(String rawAudioFile) {
		this.rawAudioFile = rawAudioFile;
	}
	public void encodeToFile(String outEncodeFile) {
		FileInputStream fisRawAudio = null;
		FileOutputStream fosAccAudio = null;
		try {
			fisRawAudio = new FileInputStream(rawAudioFile);
			fosAccAudio = new FileOutputStream(outEncodeFile);
			final MediaCodec audioEncoder = createACCAudioDecoder();
			audioEncoder.start();
			ByteBuffer[] audioInputBuffers = audioEncoder.getInputBuffers();
			ByteBuffer[] audioOutputBuffers = audioEncoder.getOutputBuffers();
			boolean sawInputEOS = false;
	        boolean sawOutputEOS = false;
	        long audioTimeUs = 0 ;
			BufferInfo outBufferInfo = new BufferInfo();
			boolean readRawAudioEOS = false;
			byte[] rawInputBytes = new byte[4096];
			int readRawAudioCount = 0;
			int rawAudioSize = 0;
			long lastAudioPresentationTimeUs = -1; // start below 0 so a first sample with timestamp 0 is not rejected
			int inputBufIndex, outputBufIndex;
	        while(!sawOutputEOS){
	        	if (!sawInputEOS) {
	        		 inputBufIndex = audioEncoder.dequeueInputBuffer(10000);
				     if (inputBufIndex >= 0) {
				           ByteBuffer inputBuffer = audioInputBuffers[inputBufIndex];
				           inputBuffer.clear();
				           int bufferSize = inputBuffer.remaining();
				           if(bufferSize != rawInputBytes.length){
				        	   rawInputBytes = new byte[bufferSize];
				           }
				           if(!readRawAudioEOS){
				        	   readRawAudioCount = fisRawAudio.read(rawInputBytes);
				        	   if(readRawAudioCount == -1){
				        		   readRawAudioEOS = true;
				        	   }
				           }
				           if(readRawAudioEOS){
			        		   audioEncoder.queueInputBuffer(inputBufIndex,0 , 0 , 0 ,MediaCodec.BUFFER_FLAG_END_OF_STREAM);
				        	   sawInputEOS = true;
				           }else{
				        	   inputBuffer.put(rawInputBytes, 0, readRawAudioCount);
					           rawAudioSize += readRawAudioCount;
					           audioEncoder.queueInputBuffer(inputBufIndex, 0, readRawAudioCount, audioTimeUs, 0);
					           audioTimeUs = (long) (1000000 * (rawAudioSize / 2.0) / audioBytesPerSample);
				           }
				     }
	        	}
	        	outputBufIndex = audioEncoder.dequeueOutputBuffer(outBufferInfo, 10000);
	        	if(outputBufIndex >= 0){
	        		// Simply ignore codec config buffers.
	        		if ((outBufferInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG)!= 0) {
	                     DLog.i(TAG, "audio encoder: codec config buffer");
	                     audioEncoder.releaseOutputBuffer(outputBufIndex, false);
	                     continue;
	                 }
	        		if(outBufferInfo.size != 0){
		        		 ByteBuffer outBuffer = audioOutputBuffers[outputBufIndex];
		        		 outBuffer.position(outBufferInfo.offset);
		        		 outBuffer.limit(outBufferInfo.offset + outBufferInfo.size);
		        		 DLog.i(TAG, String.format(" writing audio sample : size=%s , presentationTimeUs=%s", outBufferInfo.size, outBufferInfo.presentationTimeUs));
		        		 if(lastAudioPresentationTimeUs < outBufferInfo.presentationTimeUs){
			        		 lastAudioPresentationTimeUs = outBufferInfo.presentationTimeUs;
			        		 int outBufSize   = outBufferInfo.size;
			        		 int outPacketSize = outBufSize + 7;
			        		 outBuffer.position(outBufferInfo.offset);
			        		 outBuffer.limit(outBufferInfo.offset + outBufSize);
			        		 byte[] outData = new byte[outBufSize + 7];
			        		 addADTStoPacket(outData, outPacketSize);
			        		 outBuffer.get(outData, 7, outBufSize);
			        		 fosAccAudio.write(outData, 0, outData.length);
			                 DLog.i(TAG, outData.length + " bytes written.");
		        		 }else{
		        			 DLog.e(TAG, "error sample! its presentationTimeUs should be greater than the previous one.");
		        		 }
	        		}
	        		audioEncoder.releaseOutputBuffer(outputBufIndex, false);
	                 if ((outBufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
				           sawOutputEOS = true;
				     }
	        	}else if (outputBufIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
	        		audioOutputBuffers = audioEncoder.getOutputBuffers();
			    } else if (outputBufIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
			    	MediaFormat audioFormat = audioEncoder.getOutputFormat();
			    	DLog.i(TAG, "format change : "+ audioFormat);
			    }
	        }
		} catch (FileNotFoundException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			try {
				if (fisRawAudio != null)
					fisRawAudio.close();
				if(fosAccAudio != null)
					fosAccAudio.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
	// Despite its name, this method creates and configures an AAC encoder.
	private MediaCodec createACCAudioDecoder() throws IOException {
		MediaCodec	codec = MediaCodec.createEncoderByType(AUDIO_MIME);
		MediaFormat format = new MediaFormat();
		format.setString(MediaFormat.KEY_MIME, AUDIO_MIME);
		format.setInteger(MediaFormat.KEY_BIT_RATE, 128000);
		format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, 2);
		format.setInteger(MediaFormat.KEY_SAMPLE_RATE, 44100);
		format.setInteger(MediaFormat.KEY_AAC_PROFILE,MediaCodecInfo.CodecProfileLevel.AACObjectLC);
		codec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
		return codec;
	}
	/**
     *  Add ADTS header at the beginning of each and every AAC packet.
     *  This is needed as MediaCodec encoder generates a packet of raw
     *  AAC data.
     *
     *  Note the packetLen must count in the ADTS header itself.
     **/
    private void addADTStoPacket(byte[] packet, int packetLen) {
        int profile = 2;  //AAC LC
        //39=MediaCodecInfo.CodecProfileLevel.AACObjectELD;
        int freqIdx = 4;  //44.1KHz
        int chanCfg = 2;  //CPE
        // fill in ADTS data
        packet[0] = (byte)0xFF;
        packet[1] = (byte)0xF9;
        packet[2] = (byte)(((profile-1)<<6) + (freqIdx<<2) +(chanCfg>>2));
        packet[3] = (byte)(((chanCfg&3)<<6) + (packetLen>>11));
        packet[4] = (byte)((packetLen&0x7FF) >> 3);
        packet[5] = (byte)(((packetLen&7)<<5) + 0x1F);
        packet[6] = (byte)0xFC;
    }
}
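
The `addADTStoPacket` method above hard-codes 44.1 kHz stereo AAC LC. As a hedged sketch of how it might be generalized, the version below takes the parameters as arguments and looks up the sampling frequency index from the standard ADTS index table. `AdtsHeaderWriter` and its method names are hypothetical; the byte layout is the same as in the code above, including the `0xF9` second byte.

```java
/** Hypothetical parameterized variant of addADTStoPacket. */
final class AdtsHeaderWriter {
    // Sampling rates for ADTS frequency indexes 0..12 (13 and 14 are reserved, 15 is forbidden).
    private static final int[] SAMPLE_RATES = {
        96000, 88200, 64000, 48000, 44100, 32000, 24000,
        22050, 16000, 12000, 11025, 8000, 7350
    };

    static int frequencyIndex(int sampleRateHz) {
        for (int i = 0; i < SAMPLE_RATES.length; i++) {
            if (SAMPLE_RATES[i] == sampleRateHz) {
                return i;
            }
        }
        throw new IllegalArgumentException("unsupported sample rate: " + sampleRateHz);
    }

    /**
     * Fills packet[0..6] with an ADTS header.
     * packetLen must include the 7 header bytes; audioObjectType is 2 for AAC LC.
     */
    static void write(byte[] packet, int packetLen, int audioObjectType,
                      int sampleRateHz, int channelConfig) {
        int freqIdx = frequencyIndex(sampleRateHz);
        packet[0] = (byte) 0xFF;
        packet[1] = (byte) 0xF9; // same as the article: version bit 1, layer 0, no CRC
        packet[2] = (byte) (((audioObjectType - 1) << 6) + (freqIdx << 2) + (channelConfig >> 2));
        packet[3] = (byte) (((channelConfig & 3) << 6) + (packetLen >> 11));
        packet[4] = (byte) ((packetLen & 0x7FF) >> 3);
        packet[5] = (byte) (((packetLen & 7) << 5) + 0x1F);
        packet[6] = (byte) 0xFC;
    }
}
```

The values passed here should match the `MediaFormat` given to the encoder; a header that disagrees with the actual stream will generally not play correctly.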

References:

Digital audio: http://en.flossmanuals.net/pure-data/ch003_what-is-digital-audio/

WAV file format: http://soundfile.sapp.org/doc/WaveFormat/

AAC file format: http://www.cnblogs.com/caosiyang/archive/2012/07/16/2594029.html

CTS tests related to Android media programming: https://android.googlesource.com/platform/cts/+/jb-mr2-release/tests/tests/media/src/android/media/cts