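Smart Camera with One-Stop Speech Recognition and Semantic Understanding (Face Detection, OLAMI)
If the code layout or images do not display properly, please visit the CSDN blog.
When reposting, please cite the CSDN post: http://blog.csdn.net/ls0609/a...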
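The OLAMI SDK converts recorded speech or text into a JSON string that an application can parse, which is how it delivers semantic understanding. Developers can define their own grammars, so the semantics can be tailored to what the app needs. I have already written two posts on speech recognition and semantic understanding, one on an online audiobook player and one on a voice bookkeeping app; this post covers a voice-controlled smart camera.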
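1. What the smart camera does
The rear camera on a phone has the higher resolution, but if you point the rear camera at yourself for a selfie you cannot see the screen, so how do you know whether you are in the frame? The smart camera built in this post solves that problem.
The goal is a camera app that can switch cameras by voice, detect a face, and announce by voice whether the detected face is in the center of the screen or which way it is off. Once the face is centered, the app tells the user that a photo can be taken; when the user says "拍照" (take a photo) or "茄子" (the Chinese equivalent of "cheese"), it captures the shot and saves the image on the phone.
Two screenshots of the app running: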
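2. The lib directory structure in Eclipse
The assets directory holds the resource files for TTS playback.
The libs directory contains:
libtts.so  library required for TTS playback
libspeex.so  library required for speech recognition
libolamsc.so  library required for speech recognition
tts.jar  library required for TTS playback
voicesdk_android.jar  library required for speech recognition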
3. AndroidManifest.xml
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.olami"
    android:versionCode="1"
    android:versionName="1.0" >

    <uses-sdk
        android:minSdkVersion="8"
        android:targetSdkVersion="14" />

    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>
    <uses-permission android:name="android.permission.ACCESS_WIFI_STATE"/>
    <uses-permission android:name="android.permission.READ_PHONE_STATE"/>
    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.MOUNT_UNMOUNT_FILESYSTEMS"/>
    <uses-permission android:name="android.permission.CAMERA" />

    <application
        android:allowBackup="true"
        android:icon="@drawable/ic_launcher"
        android:label="@string/app_name"
        android:theme="@style/AppTheme" >
        <activity
            android:name=".MainActivity"
            android:label="@string/app_name" >
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>
</manifest>
The app needs permissions for audio recording, network access, reading and writing the SD card, and the camera.
4. Layout
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <FrameLayout
        android:layout_width="match_parent"
        android:layout_height="match_parent">
        <SurfaceView
            android:id="@+id/sView"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"/>
        <com.olami.FaceView
            android:id="@+id/faceView"
            android:layout_width="match_parent"
            android:layout_height="match_parent"/>
    </FrameLayout>

    <Button
        android:id="@+id/btn_start"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:layout_centerHorizontal="true"
        android:text="開始" />
</RelativeLayout>
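A custom FaceView is layered over the SurfaceView inside the FrameLayout; FaceView draws the box around the detected face.
There is a button at the bottom of the screen. This version does not yet support voice wake-up (the post will be updated once it is added), so the button lets the user trigger recognition by tapping whenever they want to say "拍照".
5. MainActivity.java and FaceView.java
1. MainActivity.java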
@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.layout_camera);
    initHandler();                    // handles recording-state callback messages
    initView();                       // set up the UI
    initViaVoiceRecognizerListener(); // set up the OLAMI speech callback listener
    init();                           // initialize the OLAMI speech recognition SDK
    initTts();                        // initialize TTS playback
    DisplayMetrics dm = new DisplayMetrics();
    getWindowManager().getDefaultDisplay().getMetrics(dm); // read the display metrics
    mScreenCenterx = dm.widthPixels / 2;  // horizontal center of the screen
    mScreenCentery = dm.heightPixels / 2; // vertical center of the screen
}
The OLAMI SDK is initialized as follows:
public void init()
{
    mOlamiVoiceRecognizer = new OlamiVoiceRecognizer(MainActivity.this);
    TelephonyManager telephonyManager = (TelephonyManager) this.getSystemService(
            Context.TELEPHONY_SERVICE);
    String imei = telephonyManager.getDeviceId();
    mOlamiVoiceRecognizer.init(imei); // device identifier; null may be passed instead
    // listener that receives the recognition results
    mOlamiVoiceRecognizer.setListener(mOlamiVoiceRecognizerListener);
    // supported language; Simplified Chinese is preferred
    mOlamiVoiceRecognizer.setLocalization(
            OlamiVoiceRecognizer.LANGUAGE_SIMPLIFIED_CHINESE);
    // Arguments to setAuthorization, in order:
    //   app key: generated after registering the app on the OLAMI site
    //   api:     use "asr", which selects speech recognition
    //   secret:  generated after registering the app on the OLAMI site
    //   seq:     use "nli"
    mOlamiVoiceRecognizer.setAuthorization(
            "51a4bb56ba954655a4fc834bfdc46af1",
            "asr",
            "68bff251789b426896e70e888f919a6d",
            "nli");
    // trailing-silence timeout that ends recording; 2000 ms is recommended
    mOlamiVoiceRecognizer.setVADTailTimeout(2000);
    // latitude and longitude; pass 0 if you prefer not to upload a location
    mOlamiVoiceRecognizer.setLatitudeAndLongitude(
            31.155364678184498, 121.34882432933009);
}
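Next, define an OlamiVoiceRecognizerListener. The code is omitted here; its callbacks are:
onError(int errCode) // error callback; look the code up in the error codes of the official documentation
onEndOfSpeech() // recording ended
onBeginningOfSpeech() // recording started
onResult(String result, int type) // result is the recognition result as a JSON string
onCancel() // recognition cancelled; no result will be returned
onUpdateVolume(int volume) // volume while recording, on a scale of 1 to 12
The handler's message processing, including the semantic parsing, is shown below: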
private void initHandler()
{
    mHandler = new Handler(){
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what){
            case MessageConst.CLIENT_ACTION_START_RECORED:
                mBtnStart.setText("錄音中"); // "Recording"
                break;
            case MessageConst.CLIENT_ACTION_STOP_RECORED:
                mBtnStart.setText("識別中"); // "Recognizing"
                break;
            case MessageConst.CLIENT_ACTION_CANCEL_RECORED:
                mBtnStart.setText("開始"); // "Start"
                break;
            case MessageConst.CLIENT_ACTION_ON_ERROR:
                mBtnStart.setText("開始");
                break;
            case MessageConst.CLIENT_ACTION_UPDATA_VOLUME:
                //mTextViewVolume.setText("音量: "+msg.arg1);
                break;
            case MessageConst.SERVER_ACTION_RETURN_RESULT:
                mBtnStart.setText("開始");
                try{
                    String message = (String) msg.obj;
                    JSONObject jsonObject = new JSONObject(message);
                    JSONArray jArrayNli = jsonObject.optJSONObject("data").optJSONArray("nli");
                    JSONObject jObj = jArrayNli.optJSONObject(0);
                    JSONArray jArraySemantic = null;
                    if(message.contains("semantic"))
                    {
                        jArraySemantic = jObj.getJSONArray("semantic");
                        // the first modifier carries the command
                        String modifier = jArraySemantic.optJSONObject(0).optJSONArray(
                                "modifier").optString(0);
                        if("take_photo".equals(modifier))
                            capture();
                        else if("switch_camera".equals(modifier))
                            switchCamera();
                    }
                    else{
                        Log.i("ppp","result error");
                    }
                }
                catch(Exception e)
                {
                    e.printStackTrace();
                }
                break;
            case MessageConst.CLIENT_ACTION_UPDATA_FACEDECTION_DATA:
                if(mIsRecording)
                    break;
                RectF rect = (RectF) msg.obj;
                mLeft = rect.left;
                mRight = rect.right;
                mTop = rect.top;
                mBottom = rect.bottom;
                float centerx = mLeft + (mRight - mLeft)/2;
                float centery = mTop + (mBottom - mTop)/2;
                String promptString = "";
                if(centerx < mScreenCenterx && Math.abs(mScreenCenterx - centerx) > 100)
                    promptString = "位置偏左,"; // "too far left,"
                else if((centerx > mScreenCenterx) && (Math.abs(centerx - mScreenCenterx) > 100))
                    promptString = "位置偏右,"; // "too far right,"
                if((centery < mScreenCentery) && (Math.abs(mScreenCentery - centery) > 200))
                {
                    if("".equals(promptString))
                        promptString = "位置偏上"; // "too high"
                    else
                        promptString += "而且偏上"; // "and too high"
                }
                else if((centery > mScreenCentery) && (Math.abs(centery - mScreenCentery) > 200))
                {
                    if("".equals(promptString))
                        promptString = "位置偏下"; // "too low"
                    else
                        promptString += "而且偏下"; // "and too low"
                }
                if("".equals(promptString))
                {
                    promptString = "位置已經居中,能夠拍照了"; // "centered, ready to take the photo"
                    mIsCenter = true;
                }
                else
                {
                    mIsCenter = false;
                }
                ITtsListener ttsListener = new ITtsListener() {
                    @Override
                    public void onPlayEnd() {
                        // start listening only after a "centered" prompt
                        if(mIsCenter)
                        {
                            if(mOlamiVoiceRecognizer != null)
                                mOlamiVoiceRecognizer.start();
                        }
                    }
                    @Override
                    public void onPlayFlagEnd(String arg0) {
                    }
                    @Override
                    public void onTTSPower(long arg0) {
                    }
                };
                TtsPlayer.playText(MainActivity.this, promptString,
                        ttsListener, Tts.TTS_SYSTEM_PRIORITY);
                break;
            }
        }
    };
}
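In the MessageConst.SERVER_ACTION_RETURN_RESULT branch, the JSON string returned by the server is parsed to find the value of the modifier field: take_photo means take a photo, and switch_camera means switch cameras.
When the user says "拍照" or "茄子", the server returns a JSON string like the following: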
[
    {
        "desc_obj": {
            "status": 0
        },
        "semantic": [
            {
                "app": "camera",
                "input": "拍照",
                "slots": [
                ],
                "modifier": [
                    "take_photo"
                ],
                "customer": "58df512384ae11f0bb7b487e"
            }
        ],
        "type": "camera"
    }
]
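The grammars for phrases such as "拍照" and "茄子" are user-defined. For details, see:
Introduction to writing grammars on the OLAMI open platform: http://blog.csdn.net/ls0609/a...
Official introduction to grammars on the OLAMI open platform: https://cn.olami.ai/wiki/?mp=...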
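The mapping from the modifier value to an app action can be isolated from the JSON parsing, which makes it easy to check on its own. The following is only a sketch in plain Java: the `ModifierDispatch` class and the `CameraAction` enum are hypothetical names introduced for illustration and are not part of the OLAMI SDK or of the original project. Only the two modifier strings, take_photo and switch_camera, come from the article.

```java
import java.util.Locale;

public class ModifierDispatch {
    /** Actions the camera app reacts to (illustrative names, not from the SDK). */
    enum CameraAction { TAKE_PHOTO, SWITCH_CAMERA, UNKNOWN }

    /** Maps the first "modifier" value of a semantic result to an action. */
    static CameraAction fromModifier(String modifier) {
        if (modifier == null) {
            return CameraAction.UNKNOWN;
        }
        switch (modifier.trim().toLowerCase(Locale.ROOT)) {
            case "take_photo":
                return CameraAction.TAKE_PHOTO;
            case "switch_camera":
                return CameraAction.SWITCH_CAMERA;
            default:
                // Any grammar the app does not handle is ignored rather than guessed at.
                return CameraAction.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromModifier("take_photo"));    // TAKE_PHOTO
        System.out.println(fromModifier("switch_camera")); // SWITCH_CAMERA
        System.out.println(fromModifier("zoom_in"));       // UNKNOWN
    }
}
```

In the handler, the result of `fromModifier` would then decide whether to call capture() or switchCamera(); an UNKNOWN result can simply be logged.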
2. Face detection: FaceView.java
public class FaceView extends View {
    private Camera.Face[] mFaces;
    private Paint mPaint;
    private Matrix matrix = new Matrix();
    private RectF mRectF = new RectF();
    private Handler mHandler;
    private long mCurrentTime;

    public void setFaces(Camera.Face[] faces) {
        mFaces = faces;
        invalidate();
    }

    public FaceView(Context context) {
        super(context);
        init(context);
    }

    public FaceView(Context context, AttributeSet attrs) {
        super(context, attrs);
        init(context);
    }

    public FaceView(Context context, AttributeSet attrs, int defStyleAttr) {
        super(context, attrs, defStyleAttr);
        init(context);
    }

    public void init(Context context) {
        mPaint = new Paint();
        mPaint.setColor(Color.RED);
        mPaint.setStrokeWidth(5f);
        mPaint.setStyle(Paint.Style.STROKE);
    }

    public void setHandler(Handler handler) {
        mHandler = handler;
    }

    @Override
    protected void onDraw(Canvas canvas) {
        super.onDraw(canvas);
        if (mFaces == null || mFaces.length < 1) {
            return;
        }
        // set up the matrix used to map the face rectangle into view coordinates
        MainActivity.prepareMatrix(matrix, false, 270, getWidth(), getHeight());
        canvas.save();
        matrix.postRotate(0);
        canvas.rotate(-0);
        RectF tempRectF = new RectF();
        long tempTime = System.currentTimeMillis();
        for (int i = 0; i < mFaces.length; i++) {
            mRectF.set(mFaces[i].rect); // read the face rectangle
            float temp = mRectF.top;
            mRectF.top = -mRectF.bottom;
            mRectF.bottom = -temp; // swap (and negate) top and bottom
            matrix.mapRect(mRectF);
            canvas.drawRect(mRectF, mPaint); // draw the rectangle
            tempRectF.set(mRectF);
            if (mHandler != null
                    && ((mCurrentTime == 0) || ((tempTime - mCurrentTime)/1000) >= 4))
            {   // send the face rectangle at most once every 4 seconds
                mHandler.sendMessage(mHandler.obtainMessage(
                        MessageConst.CLIENT_ACTION_UPDATA_FACEDECTION_DATA, tempRectF));
                mCurrentTime = tempTime;
            }
            Log.i("ppp","mRectF.left = "+mRectF.left+" mRectF.right = "+mRectF.right);
        }
        canvas.restore();
    }
}
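Because the custom FaceView applies a 270-degree rotation, the top and bottom values of the face rectangle have to be swapped; otherwise the box fails to track the face either horizontally or vertically. The rectangle is sent at most once every 4 seconds; the handler in MainActivity.java receives this message and checks whether the face is centered.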
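The 4-second interval is a simple throttle: onDraw runs on every frame, but the spoken prompt should only be triggered occasionally. As a sketch of that idea in isolation, the hypothetical `IntervalGate` class below (not part of the original project) takes the current time as a parameter so it can be exercised without Android; in FaceView the value would come from System.currentTimeMillis().

```java
public class IntervalGate {
    private final long intervalMillis;
    private long lastFiredAt = -1; // -1 means the gate has never fired

    IntervalGate(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /**
     * Returns true at most once per interval. The first call always passes,
     * mirroring the mCurrentTime == 0 check in FaceView.
     */
    boolean tryFire(long nowMillis) {
        if (lastFiredAt < 0 || nowMillis - lastFiredAt >= intervalMillis) {
            lastFiredAt = nowMillis;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        IntervalGate gate = new IntervalGate(4000);
        System.out.println(gate.tryFire(0));    // true: first call
        System.out.println(gate.tryFire(1500)); // false: only 1.5 s elapsed
        System.out.println(gate.tryFire(4000)); // true: 4 s elapsed
    }
}
```

Comparing in milliseconds gives the same result as the article's integer division by 1000 followed by a comparison with 4.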
case MessageConst.CLIENT_ACTION_UPDATA_FACEDECTION_DATA:
    if(mIsRecording)
        break;
    RectF rect = (RectF) msg.obj;
    mLeft = rect.left;
    mRight = rect.right;
    mTop = rect.top;
    mBottom = rect.bottom; // keep the four edges of the rectangle
    float centerx = mLeft + (mRight - mLeft)/2; // horizontal center of the rectangle
    float centery = mTop + (mBottom - mTop)/2;  // vertical center of the rectangle
    String promptString = "";
    if(centerx < mScreenCenterx && Math.abs(mScreenCenterx - centerx) > 100)
        promptString = "位置偏左,"; // "too far left,"
    else if((centerx > mScreenCenterx) && (Math.abs(centerx - mScreenCenterx) > 100))
        promptString = "位置偏右,"; // "too far right,"
    if((centery < mScreenCentery) && (Math.abs(mScreenCentery - centery) > 200))
    {
        if("".equals(promptString))
            promptString = "位置偏上"; // "too high"
        else
            promptString += "而且偏上"; // "and too high"
    }
    else if((centery > mScreenCentery) && (Math.abs(centery - mScreenCentery) > 200))
    {
        if("".equals(promptString))
            promptString = "位置偏下"; // "too low"
        else
            promptString += "而且偏下"; // "and too low"
    }
    if("".equals(promptString))
    {
        promptString = "位置已經居中,能夠拍照了"; // "centered, ready to take the photo"
        mIsCenter = true;
    }
    else
    {
        mIsCenter = false;
    }
    ITtsListener ttsListener = new ITtsListener() {
        @Override
        public void onPlayEnd() {
            // start listening only after a "centered" prompt
            if(mIsCenter)
            {
                if(mOlamiVoiceRecognizer != null)
                    mOlamiVoiceRecognizer.start();
            }
        }
        @Override
        public void onPlayFlagEnd(String arg0) {
        }
        @Override
        public void onTTSPower(long arg0) {
        }
    };
    TtsPlayer.playText(MainActivity.this, promptString,
            ttsListener, Tts.TTS_SYSTEM_PRIORITY);
    break;
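With the screen center and the center of the detected face rectangle both known, the code compares the two centers horizontally and vertically. If the horizontal difference exceeds 100 pixels, the face is treated as off-center horizontally, and the sign of the difference tells whether it is too far left or too far right. If the vertical difference exceeds 200 pixels, the face is treated as off-center vertically, either too high or too low. The 100 and 200 pixel thresholds can be tuned to whatever works best.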
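The comparison described above can also be written as a small pure function, which makes the thresholds easy to experiment with outside the handler. This is a sketch only: the `FacePosition` class, the `describe` method, and the English labels are assumptions made for illustration, whereas the app in this post speaks Chinese prompts and keeps the logic inline. Screen coordinates are assumed to grow to the right and downward, as they do on Android.

```java
public class FacePosition {
    /**
     * Describes where the face center lies relative to the screen center.
     * tolX and tolY are the tolerated offsets in pixels (100 and 200 in the article).
     */
    static String describe(float faceCx, float faceCy,
                           float screenCx, float screenCy,
                           float tolX, float tolY) {
        String horizontal = "";
        if (screenCx - faceCx > tolX) {
            horizontal = "left";
        } else if (faceCx - screenCx > tolX) {
            horizontal = "right";
        }

        String vertical = "";
        if (screenCy - faceCy > tolY) {
            vertical = "high"; // smaller y means closer to the top of the screen
        } else if (faceCy - screenCy > tolY) {
            vertical = "low";
        }

        if (horizontal.isEmpty() && vertical.isEmpty()) {
            return "centered";
        }
        if (horizontal.isEmpty()) {
            return vertical;
        }
        if (vertical.isEmpty()) {
            return horizontal;
        }
        return horizontal + " and " + vertical;
    }

    public static void main(String[] args) {
        // Screen center (360, 640), i.e. a 720 x 1280 display.
        System.out.println(describe(360, 640, 360, 640, 100, 200)); // centered
        System.out.println(describe(200, 640, 360, 640, 100, 200)); // left
        System.out.println(describe(500, 900, 360, 640, 100, 200)); // right and low
    }
}
```

Each axis is compared against its own screen-center coordinate, so the vertical check uses screenCy throughout.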
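TtsPlayer.playText speaks the prompt, and onPlayEnd() is called back when playback finishes. If the face is centered, the user has just been told that a photo can be taken, so recording starts at this point: without tapping the button or using a wake word, the user only needs to say "拍照" or "茄子" to take the photo.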
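6. Source code download
https://pan.baidu.com/s/1qXITWs8
7. Related links
Voice-controlled online audiobook player: http://blog.csdn.net/ls0609/a...
Voice bookkeeping demo: http://blog.csdn.net/ls0609/a...
Introduction to writing grammars on the OLAMI open platform: http://blog.csdn.net/ls0609/a...
Official introduction to grammars on the OLAMI open platform: https://cn.olami.ai/wiki/?mp=...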