Android Audio Development: Real-Time Voice Conversation with an Intercom

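Preface

Because of a company requirement, I was assigned to research real-time voice conversation with an intercom (walkie-talkie). Pressing the button on the intercom starts a conversation, the Android side answers, and then the two sides talk. After several days studying the third-party intercom demo, I found that it only plays audio, provides no Android client code at all, and even for the Java version you have to dig into the low-level implementation yourself. With no other option, I had to build it myself. All I can say is *** !!!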

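Preparation

At first I planned to use a web page as the client, but due to my limited experience I switched midway to an Android (Kotlin) client, with a Spring Boot backend. The client and backend exchange data in real time over WebSocket. Once the technical approach was settled, I looked into the relevant audio formats: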

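  • PCM
    PCM (Pulse Code Modulation) is the raw binary sequence produced directly by analog-to-digital (A/D) conversion of an analog audio signal. A PCM file has no file header and no end-of-file marker. The samples are uncompressed; in a mono file they are simply stored in chronological order. This bare sequence cannot be played on its own, however: it is not self-describing, so no player knows which channel count, sample rate, and bit depth to use.

  • WAV
    WAVE (Waveform Audio File Format), better known by its extension WAV, is likewise lossless in its usual form, where it stores uncompressed PCM. A WAV file can be regarded as a wrapper around PCM data: comparing the hex dumps of a PCM file and the corresponding WAV file shows that the WAV file simply adds 44 bytes at the start (in the canonical layout) describing the channel count, sample rate, bit depth, and so on. Because it is self-describing, WAV can be played by virtually every audio player. It is therefore natural to assume that, to play a raw PCM stream on the web, we only need to prepend those 44 bytes to turn it into a WAV file.

  • GSM
    GSM 06.10 is a lossy format for compressing speech, used in the Global System for Mobile Communications (GSM) standard. It is designed to reduce the size of audio data, but it introduces a significant amount of noise when the same signal is encoded and decoded repeatedly. Some voicemail applications use this format. It is CPU-intensive.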
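Because PCM carries no header, even a basic question such as "how long is this buffer?" depends on parameters agreed out of band. The Java sketch below (class and method names are illustrative, not from the original project) computes the byte rate and duration from sample rate, channel count, and bit depth:

```java
// Illustrative sketch: raw PCM has no header, so its byte rate and duration can
// only be computed once sample rate, channel count and bit depth are known.
final class PcmMath {
    private PcmMath() {}

    // Bytes needed for one second of audio: sampleRate * channels * (bits / 8).
    static int bytesPerSecond(int sampleRate, int channels, int bitsPerSample) {
        return sampleRate * channels * (bitsPerSample / 8);
    }

    // Playback duration, in milliseconds, of a PCM buffer of the given size.
    static long durationMillis(int byteCount, int sampleRate, int channels, int bitsPerSample) {
        return byteCount * 1000L / bytesPerSecond(sampleRate, channels, bitsPerSample);
    }
}
```

With the 8000 Hz, mono, 16-bit settings used later, one second is 16000 bytes and a 640-byte buffer lasts 40 ms. Interpret the same 640 bytes as 16000 Hz audio and it lasts only 20 ms, which is exactly the ambiguity a WAV header removes.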

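Basic Flow

  • The intercom initiates a call (or the Android client starts the conversation directly).
  • The Android client answers by sending a network request.
  • The backend receives the request and calls the intercom's answer method.
  • The backend acts as a relay, forwarding Android's audio data to the intercom and the intercom's callback audio data to Android.
  • Android and the backend exchange data (bytes) in real time over WebSocket.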
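The relay role can be modeled in a few lines of plain Java. This is a hypothetical, dependency-free sketch (AudioRelay and its methods are illustrative names); in the real backend the two kinds of sink would be the client's WebSocket session and the intercom SDK call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical model of the backend's relay role: it holds no audio logic of its
// own, it only forwards byte frames between two endpoints keyed by device ID.
final class AudioRelay {
    // devId -> sink that delivers bytes to the Android client
    private final Map<Integer, Consumer<byte[]>> clients = new ConcurrentHashMap<>();
    // devId -> sink that delivers bytes to the intercom
    private final Map<Integer, Consumer<byte[]>> intercoms = new ConcurrentHashMap<>();

    void registerClient(int devId, Consumer<byte[]> sink) { clients.put(devId, sink); }

    void registerIntercom(int devId, Consumer<byte[]> sink) { intercoms.put(devId, sink); }

    // Android -> backend -> intercom
    boolean fromClient(int devId, byte[] frame) { return forward(intercoms, devId, frame); }

    // intercom callback -> backend -> Android
    boolean fromIntercom(int devId, byte[] frame) { return forward(clients, devId, frame); }

    private static boolean forward(Map<Integer, Consumer<byte[]>> sinks, int devId, byte[] frame) {
        Consumer<byte[]> sink = sinks.get(devId);
        if (sink == null) {
            return false; // nobody is registered for this device; drop the frame
        }
        sink.accept(frame);
        return true;
    }
}
```

Keeping the relay ignorant of the audio format is a deliberate choice here: transcoding happens at the edges (before `fromClient` output reaches the intercom, after the intercom callback), so routing stays trivial.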

Notes

  • The intercom's callback delivers GSM audio data.
  • The audio recorded on Android is PCM data.
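Since the two ends use different formats, it helps to be precise about what the PCM bytes actually are. A minimal sketch, assuming 16-bit mono PCM in little-endian byte order, which is what the backend's pcmFormat declares (bigEndian = false): every two bytes form one signed sample, low byte first. Pcm16 is an illustrative name.

```java
// Illustrative sketch: decode 16-bit little-endian signed PCM bytes into samples.
final class Pcm16 {
    private Pcm16() {}

    static short[] toSamples(byte[] pcm, int length) {
        short[] samples = new short[length / 2];   // a trailing odd byte is ignored
        for (int i = 0; i < samples.length; i++) {
            int lo = pcm[2 * i] & 0xFF;            // low byte first (little-endian), unsigned
            int hi = pcm[2 * i + 1];               // high byte keeps its sign
            samples[i] = (short) ((hi << 8) | lo);
        }
        return samples;
    }
}
```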

Enough talk, here is the code

Android Side

Add the dependencies to the app module's build.gradle:

// WebSocket
api 'org.java-websocket:Java-WebSocket:1.3.6'

api 'com.github.tbruyelle:rxpermissions:0.10.2'

// retrofit
String retrofit_version = '2.4.0'
api "com.squareup.retrofit2:retrofit:$retrofit_version"
api "com.squareup.retrofit2:converter-gson:${retrofit_version}"
api "com.squareup.retrofit2:adapter-rxjava2:${retrofit_version}"

// okhttp
String okhttp_version = '3.4.1'
api "com.squareup.okhttp3:okhttp:${okhttp_version}"
api "com.squareup.okhttp3:logging-interceptor:${okhttp_version}"

// RxKotlin and RxAndroid 2.x
api 'io.reactivex.rxjava2:rxkotlin:2.3.0'
api 'io.reactivex.rxjava2:rxandroid:2.1.0'

Create JWebSocketClient, a subclass of WebSocketClient:

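import android.util.Log
import org.java_websocket.client.WebSocketClient
import org.java_websocket.handshake.ServerHandshake
import java.net.URI
import java.nio.ByteBuffer

class JWebSocketClient(
    serverUri: URI,
    private val callback: (data: ByteBuffer?) -> Unit
) : WebSocketClient(serverUri) {

    companion object {
        private const val TAG = "JWebSocketClient"
    }

    override fun onOpen(handshakedata: ServerHandshake?) {
        Log.d(TAG, "onOpen")
    }

    override fun onClose(code: Int, reason: String?, remote: Boolean) {
        Log.d(TAG, "onClose: code = $code, reason = $reason")
    }

    // Text frames are not used by this protocol
    override fun onMessage(message: String?) {
        //Log.d(TAG, "onMessage = $message")
    }

    // Binary frames carry PCM audio from the backend; hand them to the Activity.
    // This runs on the WebSocket client's own thread, not on the UI thread.
    override fun onMessage(bytes: ByteBuffer?) {
        super.onMessage(bytes)
        callback.invoke(bytes)
    }

    override fun onError(ex: Exception?) {
        Log.d(TAG, "onError = $ex")
    }
}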

The onMessage method receives the data sent by the backend and passes it to the Activity through callback. The relevant MainActivity code:

class MainActivity : AppCompatActivity() {

    private lateinit var client: WebSocketClient

    private var isGranted = false
    // Written from the UI thread and read by the recording thread
    @Volatile
    private var isRecording = false

    private var disposable: Disposable? = null

    private val service by lazy {
        RetrofitFactory.newInstance.create(ApiService::class.java)
    }

    private val sampleRate = 8000
    private val channelIn = AudioFormat.CHANNEL_IN_MONO
    private val channelOut = AudioFormat.CHANNEL_OUT_MONO
    private val audioFormat = AudioFormat.ENCODING_PCM_16BIT

    private val trackBufferSize by lazy { AudioTrack.getMinBufferSize(sampleRate, channelOut, audioFormat) }

    private val recordBufferSize by lazy { AudioRecord.getMinBufferSize(sampleRate, channelIn, audioFormat) }

    private val audioTrack by lazy {
        AudioTrack(AudioManager.STREAM_MUSIC,
                sampleRate,
                channelOut,
                audioFormat,
                trackBufferSize,
                AudioTrack.MODE_STREAM)
    }

    /** MediaRecorder.AudioSource.MIC is the microphone */
    private val audioRecord by lazy {
        AudioRecord(MediaRecorder.AudioSource.MIC,
                sampleRate,
                channelIn,
                audioFormat,
                recordBufferSize)
    }

    private val pcm2WavUtil by lazy {
        FileUtils(sampleRate, channelIn, audioFormat)
    }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        // Request permissions
        requestPermission()

        initWebSocket()

        btnReceive.setOnClickListener {
            if (client.readyState == WebSocket.READYSTATE.NOT_YET_CONNECTED) {
                client.connect()
            }

            audioTrack.play()
            // Pass in the device ID
            service.talkIntercom(IdModel(10))
                    .subscribeOn(Schedulers.io())
                    .observeOn(AndroidSchedulers.mainThread())
                    .subscribe({
                        if (!isGranted) {
                            toast("Permission denied, recording is unavailable")
                            return@subscribe
                        }

                        // Check whether AudioRecord was initialized successfully
                        if (audioRecord.state != AudioRecord.STATE_INITIALIZED) {
                            toast("Failed to initialize recording")
                            return@subscribe
                        }

                        audioRecord.startRecording()
                        isRecording = true

                        thread {
                            val data = ByteArray(recordBufferSize)
                            while (isRecording) {
                                val readSize = audioRecord.read(data, 0, recordBufferSize)

                                if (readSize > 0) {
                                    // Convert PCM to WAV, i.e. prepend a WAV header.
                                    // Only the bytes actually read in this pass are sent.
                                    client.send(pcm2WavUtil.pcm2wav(data.copyOf(readSize)))
                                } else if (readSize < 0) {
                                    "Read failed: $readSize".showLog()
                                }
                            }
                        }
                    }, {
                        "error = $it".showLog()
                    })
        }

        btnHangup.setOnClickListener {
            isRecording = false
            // Stop recording
            audioRecord.stop()
            // Stop playback
            audioTrack.stop()

            service.hangupIntercom(IdModel(10))
                    .subscribeOn(Schedulers.io())
                    .observeOn(AndroidSchedulers.mainThread())
                    .subscribe {
                        toast("Hung up successfully")
                    }
        }
    }

    private fun initWebSocket() {
        val uri = URI.create("ws://192.168.1.140:3014/websocket/16502")
        client = JWebSocketClient(uri) {
            it?.let { byteBuffer ->
                // Copy only the readable bytes; array() would expose the whole
                // backing array regardless of position and limit
                val pcm = ByteArray(byteBuffer.remaining())
                byteBuffer.get(pcm)
                audioTrack.write(pcm, 0, pcm.size)
            }
        }
    }

    private fun requestPermission() {
        disposable = RxPermissions(this)
                .request(android.Manifest.permission.RECORD_AUDIO,
                        android.Manifest.permission.WRITE_EXTERNAL_STORAGE)
                .subscribe { granted ->
                    if (!granted) {
                        toast("Permission denied, recording is unavailable")
                        return@subscribe
                    }

                    isGranted = true
                }
    }

    override fun onDestroy() {
        super.onDestroy()
        isRecording = false
        client.close()
        disposable?.dispose()
        audioRecord.stop()
        audioRecord.release()
        audioTrack.release()
    }
}
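The recording thread relies on one detail worth isolating: read() can return fewer bytes than requested, so only that many bytes should be wrapped and sent. A hypothetical Java sketch of the loop, where PcmSource stands in for AudioRecord (it is not an Android API) and maxReads stands in for the isRecording flag:

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical stand-in for AudioRecord.read(); not an Android API.
interface PcmSource {
    // Returns bytes read (> 0), 0 if nothing was available, or a negative error code.
    int read(byte[] buffer, int offset, int size);
}

final class CaptureLoop {
    private CaptureLoop() {}

    // Reads up to maxReads chunks and passes each exact-size chunk to sink.
    static int pump(PcmSource source, int bufferSize, int maxReads, Consumer<byte[]> sink) {
        byte[] buffer = new byte[bufferSize];
        int forwarded = 0;
        for (int i = 0; i < maxReads; i++) {
            int read = source.read(buffer, 0, bufferSize);
            if (read < 0) {
                break;                                 // error code: stop capturing
            }
            if (read > 0) {
                // copy so the next read() cannot overwrite bytes still being sent
                sink.accept(Arrays.copyOf(buffer, read));
                forwarded += read;
            }
        }
        return forwarded;
    }
}
```

Copying each chunk costs an allocation per read, but it keeps a reused buffer from being overwritten while a previous chunk is still queued for sending.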

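Because recording is involved, the app must request the recording permission; it then initializes the WebSocket. When the answer button is tapped, it sends a request. Once the request succeeds, the logic inside subscribe runs: it reads the recorded data (PCM), converts it to WAV, and sends it to the backend. The conversion code is as follows: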

fun pcm2wav(data: ByteArray): ByteArray{

        val sampleRate = 8000
        val channels = 1
        val byteRate = (16 * sampleRate * channels / 8).toLong()

        val totalAudioLen = data.size
        val totalDataLen = totalAudioLen + 36


        val header = ByteArray(44 + data.size)
        // RIFF/WAVE header
        header[0] = 'R'.toByte()
        header[1] = 'I'.toByte()
        header[2] = 'F'.toByte()
        header[3] = 'F'.toByte()
        header[4] = (totalDataLen and 0xff).toByte()
        header[5] = (totalDataLen shr 8 and 0xff).toByte()
        header[6] = (totalDataLen shr 16 and 0xff).toByte()
        header[7] = (totalDataLen shr 24 and 0xff).toByte()
        //WAVE
        header[8] = 'W'.toByte()
        header[9] = 'A'.toByte()
        header[10] = 'V'.toByte()
        header[11] = 'E'.toByte()
        // 'fmt ' chunk
        header[12] = 'f'.toByte()
        header[13] = 'm'.toByte()
        header[14] = 't'.toByte()
        header[15] = ' '.toByte()
        // 4 bytes: size of 'fmt ' chunk
        header[16] = 16
        header[17] = 0
        header[18] = 0
        header[19] = 0
        // format = 1
        header[20] = 1
        header[21] = 0
        header[22] = channels.toByte()
        header[23] = 0
        header[24] = (sampleRate and 0xff).toByte()
        header[25] = (sampleRate shr 8 and 0xff).toByte()
        header[26] = (sampleRate shr 16 and 0xff).toByte()
        header[27] = (sampleRate shr 24 and 0xff).toByte()
        header[28] = (byteRate and 0xff).toByte()
        header[29] = (byteRate shr 8 and 0xff).toByte()
        header[30] = (byteRate shr 16 and 0xff).toByte()
        header[31] = (byteRate shr 24 and 0xff).toByte()
        // block align = channels * bitsPerSample / 8 (2 for 16-bit mono)
        header[32] = (channels * 16 / 8).toByte()
        header[33] = 0
        // bits per sample
        header[34] = 16
        header[35] = 0
        //data
        header[36] = 'd'.toByte()
        header[37] = 'a'.toByte()
        header[38] = 't'.toByte()
        header[39] = 'a'.toByte()
        header[40] = (totalAudioLen and 0xff).toByte()
        header[41] = (totalAudioLen shr 8 and 0xff).toByte()
        header[42] = (totalAudioLen shr 16 and 0xff).toByte()
        header[43] = (totalAudioLen shr 24 and 0xff).toByte()

        // Append the raw PCM data after the 44-byte header
        System.arraycopy(data, 0, header, 44, data.size)

        return header
    }
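A quick way to check such a header on the receiving side is to parse it back. The sketch below (WavHeader is an illustrative name) assumes the canonical layout: a 16-byte 'fmt ' chunk followed directly by 'data'. For 16-bit mono, blockAlign must equal channels × bitsPerSample / 8 = 2 and byteRate must equal sampleRate × blockAlign = 16000; a block-align value of 4 (what 2 * 16 / 8 yields) would describe stereo frames instead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Illustrative reader for the canonical 44-byte PCM WAV header. Assumes the
// 'fmt ' chunk is exactly 16 bytes and is immediately followed by 'data'.
final class WavHeader {
    final int channels, sampleRate, byteRate, blockAlign, bitsPerSample, dataSize;

    private WavHeader(ByteBuffer b) {
        channels = b.getShort(22);
        sampleRate = b.getInt(24);
        byteRate = b.getInt(28);
        blockAlign = b.getShort(32);
        bitsPerSample = b.getShort(34);
        dataSize = b.getInt(40);
    }

    static WavHeader parse(byte[] wav) {
        if (wav.length < 44) throw new IllegalArgumentException("shorter than 44 bytes");
        ByteBuffer b = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN); // WAV is little-endian
        requireTag(wav, 0, "RIFF");
        requireTag(wav, 8, "WAVE");
        requireTag(wav, 12, "fmt ");
        requireTag(wav, 36, "data");
        if (b.getShort(20) != 1) throw new IllegalArgumentException("not PCM (format != 1)");
        return new WavHeader(b);
    }

    // The derived fields must agree with channels/bits/rate for decoders to trust them.
    boolean isConsistent() {
        return blockAlign == channels * bitsPerSample / 8
                && byteRate == sampleRate * blockAlign;
    }

    private static void requireTag(byte[] wav, int offset, String tag) {
        String actual = new String(wav, offset, 4, StandardCharsets.US_ASCII);
        if (!actual.equals(tag)) throw new IllegalArgumentException("expected " + tag + " at " + offset);
    }
}
```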

Backend

Add the WebSocket dependency to the Spring Boot project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-websocket</artifactId>
</dependency>

Then create a WebSocket endpoint class:

import com.kapark.cloud.context.AudioSender;
import com.xiaoleilu.hutool.log.Log;
import com.xiaoleilu.hutool.log.LogFactory;
import org.springframework.stereotype.Component;

import javax.websocket.*;
import javax.websocket.server.PathParam;
import javax.websocket.server.ServerEndpoint;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;

/**
 * @author: hyzhan
 * @date: 2019/6/14
 * @desc: TODO
 */
@Component
@ServerEndpoint("/websocket/{devId}")
public class AudioSocket {

    private static Log log = LogFactory.get(AudioSocket.class);
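    // Static counter of currently open connections; it must be kept thread-safe.
    private static int onlineCount = 0;
    // Thread-safe map from the concurrent package, holding the AudioSocket for each device ID.
    private static ConcurrentHashMap<Integer, AudioSocket> webSocketMap = new ConcurrentHashMap<>();

    // Session with a particular client; used to send data to that client
    private Session session;

    // Device ID taken from the path
    private int devId;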

    /** Called when a connection is established */
    @OnOpen
    public void onOpen(Session session, @PathParam("devId") int devId) {
        this.session = session;
        this.devId = devId;

        webSocketMap.put(devId, this);
        addOnlineCount();           // online count + 1
        log.info("New connection listening: " + devId + ", online count is now " + getOnlineCount());
    }

    /** Called when a connection is closed */
    @OnClose
    public void onClose() {
        webSocketMap.remove(devId, this);  // remove from the map
        subOnlineCount();           // online count - 1
        log.info("A connection closed! Online count is now " + getOnlineCount());
    }

    @OnMessage
    public void onMessage(String message, Session session) {
        log.info("Received String: " + message);
    }

    /**
     * Called when a binary message arrives from the client
     *
     * @param message the audio data sent by the client (a WAV-wrapped PCM chunk)
     */
    @OnMessage
    public void onMessage(byte[] message, Session session) {
        log.info("Received byte length: " + message.length);
        AudioSender.send2Intercom(devId, message);
    }


    @OnError
    public void onError(Session session, Throwable error) {
        log.error("An error occurred");
        error.printStackTrace();
    }

    /** Sends binary data to the client of the given device */
    public static void send2Client(int devId, byte[] data, int len) {
        AudioSocket audioSocket = webSocketMap.get(devId);
        if (audioSocket != null) {
            try {
                synchronized (audioSocket.session) {
                    audioSocket.session.getBasicRemote().sendBinary(ByteBuffer.wrap(data, 0, len));
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    private static synchronized int getOnlineCount() {
        return onlineCount;
    }

    private static synchronized void addOnlineCount() {
        AudioSocket.onlineCount++;
    }

    private static synchronized void subOnlineCount() {
        AudioSocket.onlineCount--;
    }
}
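The comment on onlineCount notes that it must be thread-safe; the synchronized static methods achieve that. An alternative sketch of the same bookkeeping uses AtomicInteger. ConnectionRegistry is a hypothetical name, and S stands for the per-connection object (AudioSocket in the original). Like the original, it counts open connections rather than distinct devices.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical alternative to the synchronized counter: same semantics,
// lock-free increments via AtomicInteger.
final class ConnectionRegistry<S> {
    private final ConcurrentHashMap<Integer, S> byDevice = new ConcurrentHashMap<>();
    private final AtomicInteger online = new AtomicInteger();

    int open(int devId, S socket) {
        byDevice.put(devId, socket);
        return online.incrementAndGet();
    }

    // remove(key, value) only removes the entry if it still maps to this socket,
    // so a late close of an old connection cannot evict a newer one.
    int close(int devId, S socket) {
        byDevice.remove(devId, socket);
        return online.decrementAndGet();
    }

    S get(int devId) { return byDevice.get(devId); }

    int onlineCount() { return online.get(); }
}
```

The two-argument remove is the same call the original onClose uses; it is what makes a quick reconnect of the same device safe.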

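The code is commented. The parts to focus on are onMessage() and send2Client(): when onMessage() receives a message from the client, it calls send2Intercom with the device ID, which sends the audio to the intercom with that ID. send2Intercom looks like this: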

public static void send2Intercom(int devId, byte[] data) {
        // 1. Wrap the incoming audio (a WAV-wrapped PCM chunk) in an InputStream
        // 2. Parse it into an AudioInputStream; the WAV header tells Java Sound the format
        // 3. Transcode PCM to GSM (requires a GSM 06.10 codec provider, here Tritonus)
        try (InputStream inputStream = new ByteArrayInputStream(data);
             AudioInputStream pcmInputStream = AudioSystem.getAudioInputStream(inputStream);
             AudioInputStream gsmInputStream = AudioSystem.getAudioInputStream(gsmFormat, pcmInputStream)) {

            // Buffer size can be tuned as needed
            byte[] tempBytes = new byte[50];
            int len;
            while ((len = gsmInputStream.read(tempBytes)) != -1) {
                // Send to the intercom via the SDK's requestSendAudioData method
                DongSDKProxy.requestSendAudioData(devId, tempBytes, len);
            }
        } catch (UnsupportedAudioFileException | IOException e) {
            e.printStackTrace();
        }
    }
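gsmInputStream.read(tempBytes) hands the SDK however many bytes each call returns. If the intercom SDK expects whole GSM frames (an assumption; the original does not say), a small re-chunking helper keeps frame boundaries intact. GsmFramer is a hypothetical name; the 33-byte frame size matches the frameSize in gsmFormat.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: re-chunks an arbitrary GSM 06.10 byte stream into whole
// 33-byte frames, holding any partial frame until more bytes arrive.
final class GsmFramer {
    static final int FRAME_SIZE = 33;                  // bytes per GSM 06.10 frame
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    List<byte[]> push(byte[] data, int len) {
        pending.write(data, 0, len);
        byte[] all = pending.toByteArray();
        int whole = all.length / FRAME_SIZE;
        List<byte[]> frames = new ArrayList<>(whole);
        for (int i = 0; i < whole; i++) {
            frames.add(Arrays.copyOfRange(all, i * FRAME_SIZE, (i + 1) * FRAME_SIZE));
        }
        pending.reset();                               // keep only the leftover partial frame
        pending.write(all, whole * FRAME_SIZE, all.length - whole * FRAME_SIZE);
        return frames;
    }
}
```

Each returned frame would then be passed to the SDK call in place of the raw tempBytes chunk.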

The GSM format (gsmFormat) is defined as follows:

AudioFormat gsmFormat = new AudioFormat(org.tritonus.share.sampled.Encodings.getEncoding("GSM0610"),
                    8000.0F,        // sampleRate
                    -1,             // sampleSizeInBits
                    1,              // channels
                    33,             // frameSize
                    50.0F,          // frameRate
                    false);
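These parameters fit together arithmetically. The -1 for sampleSizeInBits is AudioSystem.NOT_SPECIFIED (GSM frames are not built from fixed-size samples), and 50 frames per second at 8000 Hz means each 33-byte frame covers 20 ms of audio. The worked numbers as a small Java sketch (GsmMath is an illustrative name):

```java
// Illustrative worked numbers for gsmFormat (8000 Hz, 33-byte frames, 50 frames/s)
// compared with the 16-bit mono PCM that Android records.
final class GsmMath {
    private GsmMath() {}

    static final int SAMPLE_RATE = 8000;
    static final int FRAME_RATE = 50;           // GSM frames per second
    static final int GSM_FRAME_BYTES = 33;      // one compressed frame

    static int samplesPerFrame()  { return SAMPLE_RATE / FRAME_RATE; }          // 160
    static int frameMillis()      { return 1000 / FRAME_RATE; }                 // 20
    static int pcmBytesPerFrame() { return samplesPerFrame() * 2; }             // 320 (16-bit mono)
    static int gsmBitsPerSecond() { return GSM_FRAME_BYTES * 8 * FRAME_RATE; }  // 13200
    static int pcmBitsPerSecond() { return SAMPLE_RATE * 16; }                  // 128000
}
```

So each 20 ms slice shrinks from 320 bytes of PCM to 33 bytes of GSM: 13.2 kbit/s instead of 128 kbit/s, about 9.7 times smaller.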

Note

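Because real-time voice streams data continuously and WebSocket messages are subject to a default size limit, the backend WebSocket container's buffer sizes must be set larger than the messages being sent. Reference code: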

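import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.server.standard.ServerEndpointExporter;
import org.springframework.web.socket.server.standard.ServletServerContainerFactoryBean;

@Configuration
public class WebSocketConfig {

    // Registers @ServerEndpoint classes (such as AudioSocket) with the container
    @Bean
    public ServerEndpointExporter serverEndpointExporter() {
        return new ServerEndpointExporter();
    }

    // Raise the per-message buffer limits so larger audio chunks are not rejected
    @Bean
    public ServletServerContainerFactoryBean createWebSocketContainer() {
        ServletServerContainerFactoryBean container = new ServletServerContainerFactoryBean();
        container.setMaxTextMessageBufferSize(500000);
        container.setMaxBinaryMessageBufferSize(500000);
        return container;
    }
}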

When the backend receives the intercom's onAudioData callback, it calls audioSender.send2Client, which decodes the data and sends it to the Android client:

/** Audio data callback from the intercom */
    @Override
    public int onAudioData(int dwDeviceID, InfoMediaData audioData) {
        // Decode GSM to PCM and forward it to the Android client
        audioSender.send2Client(dwDeviceID, audioData.pRawData, audioData.nRawLen);
        return 0;
    }
AudioSender.send2Client decodes the GSM data and pushes the resulting PCM to the client:
public void send2Client(int devId, byte[] data, long total) {

        /*
         * 1. The intercom delivers audio in GSM format.
         * 2. Wrap the incoming GSM bytes (the first `total` of them) in an InputStream.
         * 3. Turn the InputStream into an AudioInputStream.
         * 4. Convert it to a PCM AudioInputStream using pcmFormat.
         * 5. Read the PCM stream and send the audio to the Android client.
         */
        try (InputStream inputStream = new ByteArrayInputStream(data, 0, (int) total);
             AudioInputStream gsmInputStream = AudioSystem.getAudioInputStream(inputStream);
             AudioInputStream pcmInputStream = AudioSystem.getAudioInputStream(pcmFormat, gsmInputStream)) {

            // Buffer size can be adjusted as needed
            byte[] tempBytes = new byte[50];
            int len;
            while ((len = pcmInputStream.read(tempBytes)) != -1) {
                // Send to the client over WebSocket
                AudioSocket.send2Client(devId, tempBytes, len);
            }
        } catch (UnsupportedAudioFileException | IOException e) {
            e.printStackTrace();
        }
    }

pcmFormat is defined as follows:

// PCM_SIGNED 8000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian
pcmFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                8000f,
                16,
                1,
                2,
                8000f,
                false);
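The WAV header the Android client prepends matters because AudioSystem.getAudioInputStream(InputStream), as used in send2Intercom, has to recognize a file format before it can decode anything. The JDK alone can demonstrate this round trip with the same pcmFormat parameters: wrap raw PCM in an AudioInputStream, write it out as WAV, and parse it back. A sketch (PcmWavRoundTrip is an illustrative name; only standard javax.sound.sampled calls are used, no Tritonus):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Illustrative round trip: raw PCM -> WAV bytes -> AudioInputStream, JDK only.
final class PcmWavRoundTrip {
    // Same parameters as pcmFormat: signed 16-bit, 8 kHz, mono, little-endian
    static final AudioFormat PCM_FORMAT =
            new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 8000f, 16, 1, 2, 8000f, false);

    static byte[] toWav(byte[] pcm) throws IOException {
        long frames = pcm.length / PCM_FORMAT.getFrameSize();   // 2 bytes per frame
        try (AudioInputStream in =
                     new AudioInputStream(new ByteArrayInputStream(pcm), PCM_FORMAT, frames)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);  // length is known, so a header can be written
            return out.toByteArray();
        }
    }

    static AudioInputStream fromWav(byte[] wav) throws IOException, UnsupportedAudioFileException {
        // ByteArrayInputStream supports mark/reset, which format detection needs
        return AudioSystem.getAudioInputStream(new ByteArrayInputStream(wav));
    }
}
```

Writing decoded PCM to a .wav this way is also a convenient way to listen to exactly what the backend received.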

Now back to how the Android side handles the received data:

private fun initWebSocket() {
        val uri = URI.create("ws://192.168.1.140:3014/websocket/16502")
        client = JWebSocketClient(uri) {
            it?.let { byteBuffer ->
                // Copy only the readable bytes; array() would expose the whole
                // backing array regardless of position and limit
                val pcm = ByteArray(byteBuffer.remaining())
                byteBuffer.get(pcm)
                audioTrack.write(pcm, 0, pcm.size)
            }
        }
    }

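The lambda passed to JWebSocketClient { } is the WebSocket onMessage callback: the received data is simply written into audioTrack. This AudioTrack is configured with ENCODING_PCM_16BIT, so it expects raw PCM samples; since the backend has already transcoded the audio to PCM, it can be played directly.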
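One pitfall in that callback: ByteBuffer.array() returns the entire backing array, ignoring position, limit, and arrayOffset, and it throws if the buffer is read-only or not array-backed. Copying remaining() bytes yields exactly the message payload. A Java sketch of the idea (Buffers.remainingBytes is an illustrative name); the Kotlin equivalent is ByteArray(buf.remaining()).also { buf.get(it) }.

```java
import java.nio.ByteBuffer;

// Illustrative helper: copy exactly the readable bytes of a received ByteBuffer.
final class Buffers {
    private Buffers() {}

    static byte[] remainingBytes(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();   // leave the caller's position untouched
        byte[] out = new byte[view.remaining()];
        view.get(out);                          // works for heap, direct and read-only buffers
        return out;
    }
}
```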

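Finally

That completes a reasonably complete real-time voice pipeline. This was my first time doing Android audio development, so my understanding of some points may be shallow; if anything is wrong, corrections from more experienced readers are very welcome.

Source code

Android source code

Backend source code (only the core Java classes are uploaded, since the project involves company code)

Google online transcoding, handy for testing

Thanks for reading. See you next time.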
