[Project] tamako: an intelligent AI voice butler

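Project overview

A voice management tool written in C++ that supports AI conversation and voice-command execution.
It relies on third-party platforms and tools such as Turing Robot and Baidu speech recognition and synthesis.
It can run Linux commands, and you can add your own commands for it to execute.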

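Technical points

C++ STL
Third-party HTTP library
Turing Robot
Baidu speech recognition and speech synthesis
Linux system/network programming
Installing and using various third-party libraries and tools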

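Project architecture

Writing the code

I will walk through the project in the order the program runs, including the third-party libraries that need to be prepared in advance. (Note: this is not the order in which the code was actually written.)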

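Recording the user's speech with a recording tool

Recording is done with arecord, which is usually preinstalled. You can type arecord to check whether it is installed on your platform; if it is not, the shell reports "bash: XXX: command not found...", and you can look up how to install it online. The Baidu speech recognition documentation specifies which audio format to record in.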

# arecord -t wav -c 1 -r 16000 -d 5 -f S16_LE demo.wav
// -t: file type; we use wav
// -c: number of channels (1 = mono)
// -r: sample rate in Hz
// -d: recording duration in seconds
// -f: sample format. Formats include: S8  U8  S16_LE (S16_LE: little endian signed 16 bits)
//                       S16_BE  U16_LE U16_BE  S24_LE S24_BE U24_LE U24_BE S32_LE S32_BE
//                       U32_LE U32_BE FLOAT_LE  FLOAT_BE  FLOAT64_LE  FLOAT64_BE
//                       IEC958_SUBFRAME_LE IEC958_SUBFRAME_BE MU_LAW A_LAW IMA_ADPCM MPEG GSM
// We use these settings because Baidu speech recognition requires them; see the Baidu speech recognition documentation for details.
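
Since the program later runs this recording command from C++, it can help to build the command string in one place. The following is a minimal sketch, not code from the project: the helper name BuildRecordCommand and its parameters are hypothetical, and only the arecord flags shown above are used.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the original project): builds the arecord
// command line shown above from a target file path and a duration in seconds.
// The format is fixed to mono, 16000 Hz, S16_LE wav, as used in this post.
std::string BuildRecordCommand(const std::string& wav_path, int seconds) {
    assert(seconds > 0);  // a fixed-length recording needs a positive duration
    return "arecord -t wav -c 1 -r 16000 -d " + std::to_string(seconds) +
           " -f S16_LE " + wav_path;
}
```

Calling BuildRecordCommand("demo.wav", 5) yields exactly the command line used above, so the duration and output path can be changed without retyping the format flags.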

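Baidu speech recognition

What we need to do is send the recorded audio to Baidu speech recognition and get back the recognition result.

1. First, the recording has to be triggered from code, which means the program must be able to run Linux commands. We use the popen function for this.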

// Run a Linux command; optionally print its standard output.
// Needs <cstdio> (popen/pclose), <iostream> and <string>.
bool Exec(string command,bool is_print)
{
    FILE* fp=popen(command.c_str(),"r");
    if(fp==nullptr)
    {
        cout<<"popen error"<<endl;
        return false;
    }
    if(is_print)
    {
        char c;
        // fread returns the number of items read; 0 means EOF or error
        while(fread(&c,1,1,fp)>0)
        {
            cout<<c;
        }
    }
    pclose(fp);
    return true;
}
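
Exec prints the command's output directly. If the output is needed as a value instead, for example to check or log it, a variant can collect it into a string. This is a hedged sketch rather than part of the project: ExecCapture is a hypothetical name, and it assumes a POSIX system where popen is available.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical variant of Exec (not in the original code): runs a shell
// command with popen and appends everything it wrote to stdout to *output.
// Returns false if the pipe could not be opened.
bool ExecCapture(const std::string& command, std::string* output) {
    assert(output != nullptr);
    FILE* fp = popen(command.c_str(), "r");
    if (fp == nullptr) {
        return false;
    }
    char buffer[256];
    std::size_t n = 0;
    // Read in chunks rather than one byte at a time.
    while ((n = fread(buffer, 1, sizeof(buffer), fp)) > 0) {
        output->append(buffer, n);
    }
    pclose(fp);
    return true;
}
```

Reading in chunks avoids one fread call per byte, which matters little for short commands but keeps the loop simple for longer output.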

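2. Next we need to talk to the Baidu speech recognition platform. Read the official documentation and install the third-party libraries it requires. Since I was developing on Linux, I found that updating cmake simply would not download. I searched online for the cause and tried all sorts of fixes, including switching yum mirrors and switching from an IPv4 address to IPv6; after two or three hours it was still unsolved. In the end, switching from Wi-Fi to my phone's mobile data fixed it.

3. Then, following the official documentation, we create a client to communicate with the platform.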

class yuzi
{
    private:
        turing tr;
        aip::Speech *client;
        string appid="21527483";
        string apikey="szxkTzGtRhGIGl2MMoE84qIw";
        string secretKey="hZppPdvzecr8WSPSBfh9gPxP6VgB6lqO";

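4. Following the speech recognition API description, we write the recognition function. Note that the result comes back as a JSON string, so it has to be deserialized. I had not worked with JSON before, so I also learned how to serialize and deserialize it along the way.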

// Speech recognition
string ASR(aip::Speech* client)
{
    string asr_file=VOICE_PATH;
    asr_file+="/";
    asr_file+=SPEECH_ASR;

    map<string,string> options;
    string file_content;
    // Read the recorded wav file and send it for recognition at 16000 Hz
    aip::get_file_content(asr_file.c_str(),&file_content);
    Json::Value root=client->recognize(file_content,"wav",16000,options);
    return RecognizePickup(root);
}

// Deserialization: pull the recognized text out of the response
string RecognizePickup(Json::Value& root)
{
    int err_no=root["err_no"].asInt();
    if(err_no!=0)
    {
        cout<<root["err_msg"]<<":"<<err_no<<endl;
        return "unknown";
    }
    return root["result"][0].asString();
}

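Deciding whether the input is a command

To let the program run Linux commands, we create a command file that stores key-value pairs: a Chinese spoken command (key) and the Linux command it maps to (value). The program reads this file line by line and stores the pairs in a map, so it can tell whether a sentence is a command and run it with the Exec function from earlier. When I first ran this code, I said a phrase matching a key, but the command was not executed. The cause turned out to be that the text returned by Baidu speech recognition ends with "。", so appending it to each key with key+="。" solved the problem.

The code is as follows: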

// Load the command configuration file
void LoadCommandEtc()
{
    LOG(Normal,"Loading command configuration");
    string name=CMD_ETC;
    ifstream in(name);
    if(!in.is_open())
    {
        LOG(Warning,"Load command etc error");
        exit(1);
    }
    char line[SIZE];
    string sep=": ";
    while(in.getline(line,sizeof(line)))
    {
        string str=line;
        size_t pos=str.find(sep);
        if(pos==string::npos)
        {
            LOG(Warning,"command etc format error");
            break;
        }
        string key=str.substr(0,pos);
        string value=str.substr(pos+sep.size());
        key+="。";// recognized text ends with "。", so the key must too
        record_set.insert({key,value});
    }
    in.close();
    LOG(Normal,"Command configuration loaded");
}

// Check whether the message is a command
bool IsCommand(string& message)
{
    return record_set.find(message)!=record_set.end();
}
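
An alternative to appending "。" to every key is to strip the trailing full stop from the recognized text before the lookup, so the command file and the map keep the plain phrases. The sketch below illustrates that idea under stated assumptions: ParseCommands and StripTrailingFullStop are hypothetical names, the input is assumed to be UTF-8, and unlike LoadCommandEtc it skips malformed lines instead of stopping at the first one.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parses lines of the form "<spoken phrase>: <shell command>"
// into a map. Lines without the ": " separator are skipped.
std::map<std::string, std::string> ParseCommands(std::istream& in) {
    const std::string sep = ": ";
    std::map<std::string, std::string> commands;
    std::string line;
    while (std::getline(in, line)) {
        std::size_t pos = line.find(sep);
        if (pos == std::string::npos) continue;
        commands[line.substr(0, pos)] = line.substr(pos + sep.size());
    }
    return commands;
}

// Hypothetical helper: removes one trailing ideographic full stop (UTF-8)
// from the recognized text, if present.
std::string StripTrailingFullStop(std::string message) {
    const std::string stop = "\xE3\x80\x82";  // UTF-8 bytes of U+3002
    if (message.size() >= stop.size() &&
        message.compare(message.size() - stop.size(), stop.size(), stop) == 0) {
        message.erase(message.size() - stop.size());
    }
    return message;
}
```

With this approach the lookup becomes commands.find(StripTrailingFullStop(message)), and the keys stay independent of how the recognizer punctuates its output.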

void run()
{
    while(1)
    {
        LOG(Normal,"Recording:");
        fflush(stdout);
        if(Exec(record,false))
        {
            LOG(Normal,"Recognizing...");
            fflush(stdout);
            string message =ASR(client);
            cout<<endl;
            LOG(Normal,message);
            if(IsCommand(message))
            {
                // It is a command
                LOG(Normal,"Running a command");
                Exec(record_set[message],true);
                continue;
            }
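
The run() excerpt stops at the command branch. As a hedged sketch of one full turn, the control flow can be written with each stage injected as a callable, which makes the branching readable without the Baidu or Turing services. Everything here (Stages, RunOneTurn and the member names) is hypothetical and only mirrors the flow described in this post: recognize, run a command if the text matches, otherwise chat and speak the reply.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical bundle of the stages of one turn. In the real program these
// would wrap recording + ASR, Exec, the Turing Robot chat call, and
// speech synthesis + playback.
struct Stages {
    std::function<std::string()> recognize;
    std::function<void(const std::string&)> run_command;
    std::function<std::string(const std::string&)> chat;
    std::function<void(const std::string&)> speak;
};

// Runs one turn. Returns true if the recognized text was handled as a command.
bool RunOneTurn(const std::map<std::string, std::string>& commands,
                const Stages& s) {
    assert(s.recognize && s.run_command && s.chat && s.speak);
    const std::string message = s.recognize();
    auto it = commands.find(message);
    if (it != commands.end()) {
        s.run_command(it->second);  // command path: run the mapped shell command
        return true;
    }
    s.speak(s.chat(message));       // chat path: get a reply and speak it
    return false;
}
```

Injecting the stages is a design choice for illustration; the original class calls its member functions directly inside the loop.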

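Not a command: send the message to Turing Robot

Following the Turing Robot API integration documentation, we build the turing class. We use the HTTP client bundled with the Baidu speech SDK to send messages to the Turing Robot platform. The documentation requires a POST request, so we use the post interface. Since all data is exchanged as JSON strings, we again need serialization and deserialization.

The code is as follows: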

class turing
{
    private:
        string apiKey="4235c17e20a34e3eb3cad76b5f0d0097";
        string userId="1";
        string url="http://openapi.tuling123.com/openapi/api/v2";
        aip::HttpClient client;
    public:
        turing()
        {

        }

        // Deserialization: extract the reply text from the response JSON
        string ResponsePickup(string& str)
        {
            JSONCPP_STRING errs;
            Json::Value root;
            Json::CharReaderBuilder rb;
            std::unique_ptr<Json::CharReader> const jsonReader(rb.newCharReader());
            bool res=jsonReader->parse(str.data(),str.data()+str.size(),&root,&errs);
            if(!res || !errs.empty())
            {
                LOG(Warning,"jsoncpp parse error");
                return errs;
            }
            Json::Value results = root["results"];
            Json::Value values  = results[0]["values"];
            return values["text"].asString();
        }

        // Serialization: build the request JSON and send it
        string chat(string message)
        {
            Json::Value root;
            root["reqType"]=0;
            Json::Value word;
            word["text"]=message;
            Json::Value text;
            text["inputText"]=word;
            root["perception"]=text;
            Json::Value user;
            user["apiKey"]=apiKey;
            user["userId"]=userId;
            root["userInfo"]=user;

            Json::StreamWriterBuilder wb;
            std::ostringstream os;

            std::unique_ptr<Json::StreamWriter> jsonWriter(wb.newStreamWriter());
            jsonWriter->write(root,&os);
            string body=os.str();// the JSON string is ready

            // Send the HTTP request
            string response;
            int code= client.post(url,nullptr,body,nullptr,&response);
            if(code!=CURLcode::CURLE_OK)
            {
                 LOG(Warning,"http request error!");
                 return "";
            }
            return ResponsePickup(response);
        }

        ~turing()
        {

        }
};

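Send the reply to Baidu speech synthesis

Speech synthesis works much like speech recognition, and the third-party library is the same, so we reuse the same client and follow the interface description in the official documentation. One thing to watch out for: be sure to claim the free quota for both speech recognition and speech synthesis first. At the start I had not claimed it, so the API kept returning an error code. I could not find that code in the error code documentation, and only eventually realized that the unclaimed free quota was the cause.

The code is as follows: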

// Speech synthesis
void TTL(aip::Speech *client,string &str)
{
    ofstream ofile;
    string ttl=VOICE_PATH;
    ttl+="/";
    ttl+=SPEECH_TTL;
    ofile.open(ttl.c_str(),ios::out | ios::binary);
    string file_ret;
    map<string,string> options;
    options["spd"]="6";// speaking speed
    options["per"]="4";// voice selection

    Json::Value result =client->text2audio(str,options,file_ret);
    if(!file_ret.empty())
    {
        // Success: file_ret holds the audio bytes
        ofile<<file_ret;
    }
    else
    {
        // Failure: result holds the error description
        cout<<result.toStyledString()<<endl;
    }
    ofile.close();
}
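
The function above opens the output file before the request is made, so a failed synthesis still leaves an empty file on disk for the player to pick up. One possible refinement, sketched here under the assumption that the audio bytes arrive in a std::string as they do in file_ret, is to open the file only once there is data to write. SaveAudio is a hypothetical helper, not part of the project or the Baidu SDK.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes synthesized audio bytes to `path` in binary mode.
// Returns false without creating the file when `audio` is empty, and false
// if the file cannot be opened or written.
bool SaveAudio(const std::string& path, const std::string& audio) {
    if (audio.empty()) {
        return false;  // nothing to write; do not create an empty file
    }
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out.is_open()) {
        return false;
    }
    // write() copies the exact byte count, including any embedded zero bytes.
    out.write(audio.data(), static_cast<std::streamsize>(audio.size()));
    out.flush();
    return out.good();
}
```

The caller can then skip playback when SaveAudio returns false instead of playing a stale or empty file.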

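Play back the synthesized audio

We play the resulting audio through the Exec function, and with that the whole program is complete. Below is a demonstration of the result; since I could not record a video, the voice output cannot be shown here.

Summary

Overall, the code for this project is not complicated. The main work is learning to use the various tools and third-party platforms, and it is a fun project.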

Project source code

https://github.com/hu1277141113/C-study
