This post continues the character set story from the previous article. It focuses on MySQL's character set settings; for the underlying theory, see here.
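1. MySQL system variables
– character_set_server: the default character set for internal operations
– character_set_client: the character set of the data sent by the client
– character_set_connection: the connection-layer character set
– character_set_results: the character set of query results
– character_set_database: the default character set of the currently selected database
– character_set_system: the character set of system metadata (column names and so on)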
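In short, for those of us who use the MySQL C API, three character sets matter most: character_set_client, character_set_connection, and character_set_results. From my own experience as a user, though, character_set_connection has always felt a little redundant.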
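To make the relationship between these three variables concrete: the MySQL manual describes SET NAMES x as shorthand for setting all three of them to x. The sketch below only builds the equivalent statements as strings; ExpandSetNames is a hypothetical helper name and not part of any MySQL API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the three statements that
// "SET NAMES <charset>" stands for, in the order the manual lists them.
std::vector<std::string> ExpandSetNames(const std::string& charset) {
    assert(!charset.empty());  // sketch: caller must pass a charset name
    std::vector<std::string> stmts;
    stmts.push_back("SET character_set_client = " + charset);
    stmts.push_back("SET character_set_results = " + charset);
    stmts.push_back("SET character_set_connection = " + charset);
    return stmts;
}
```

Running SET NAMES utf8 and then overriding only character_set_client, as the first experiment later does, therefore leaves the connection and results character sets at utf8.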
2. Character set conversion in MySQL
This section is lifted entirely from http://www.laruence.com/2008/01/05/12.html. I reproduce it here for ease of reading.
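1) When MySQL Server receives a request, it converts the request data from character_set_client to character_set_connection;
2) Before the internal operation, it converts the request data from character_set_connection to the internal operating character set, which is determined as follows:
• use the CHARACTER SET value of each column;
• if that is not set, use the DEFAULT CHARACTER SET of the corresponding table (a MySQL extension, not standard SQL);
• if that is not set, use the DEFAULT CHARACTER SET of the corresponding database;
• if that is not set, use the value of character_set_server.
3) It converts the result of the operation from the internal operating character set to character_set_results.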
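The conversion from character_set_connection to the internal operating character set looks complicated, but if we specify the table's character set when creating the table, we can simply treat the "internal operating character set" as that table's character set. I therefore recommend always including "DEFAULT CHARSET=xxx" in the CREATE TABLE statement, where the valid values of xxx can be listed with "select character_set_name from information_schema.CHARACTER_SETS". My suggestion is "UTF8" (note that MySQL's utf8 is the 3-byte utf8mb3; characters outside the Basic Multilingual Plane need utf8mb4).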
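The fallback chain in step 2 (column, then table, then database, then server) can be sketched as a small lookup. This is only an illustration of the rule as quoted above; CharsetSettings and ResolveInternalCharset are hypothetical names, and an empty string stands for "not set at this level".

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the four levels; "" means "not set".
struct CharsetSettings {
    std::string column;    // CHARACTER SET of the column
    std::string table;     // DEFAULT CHARACTER SET of the table
    std::string database;  // DEFAULT CHARACTER SET of the database
    std::string server;    // character_set_server, assumed always set
};

// Returns the first level that is set, in the order
// column -> table -> database -> server.
std::string ResolveInternalCharset(const CharsetSettings& s) {
    assert(!s.server.empty());
    if (!s.column.empty()) return s.column;
    if (!s.table.empty()) return s.table;
    if (!s.database.empty()) return s.database;
    return s.server;
}
```

With a table-level DEFAULT CHARSET and no per-column setting, the lookup stops at the table, which is why specifying the charset at table creation makes the result easy to predict.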
3. Character set conversion experiments in MySQL
My environment is as follows.
CREATE TABLE `tbl_test` (
`id` int ,
name varchar(20000),
uptime date,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
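Experiment 1: handling Chinese correctly
The rough flow of this experiment is as follows.
The point to note is that I first convert the hard-coded char* string in the binary (UTF-8 encoded) into a wchar_t* string and then modify the Chinese character. Before sending it out, I convert the wchar_t* string into a GBK-encoded char* string. In my tests, the code below runs correctly.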
#include <clocale>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <tr1/memory>
#include "common/dbcomm/DbComm.h"

using namespace std;

COMMON::DbLocation dbLocation1;

void InsertBySqlStatmentTest1();

int main()
{
    dbLocation1.SetDbId("TEST_DB1");
    dbLocation1.SetIp("127.0.0.1");
    dbLocation1.SetPort("3306");
    dbLocation1.SetUser("cup_dba");
    dbLocation1.SetPassword("123456");
    InsertBySqlStatmentTest1();
    return 0;
}

void InsertBySqlStatmentTest1()
{
    try
    {
        vector<COMMON::DbLocation> dbLocations_array;
        dbLocations_array.push_back(dbLocation1);

        tr1::shared_ptr<COMMON::IDbTasks> mysqlTasks(
            new COMMON::MysqlDbTasks(dbLocations_array, true));
        mysqlTasks->Connect();
        cout << "Connect success" << endl;

        {
            COMMON::DbExecuteAction* char_action = mysqlTasks->Execute();
            COMMON::ExecuteFilter char_filter("set names utf8");
            char_action->Do(&char_filter, &dbLocation1);
            // change the character_set_client to gbk
            COMMON::ExecuteFilter char_filter2("SET character_set_client = gbk");
            char_action->Do(&char_filter2, &dbLocation1);
            char_action->EndAction();
        }

        COMMON::DbExecuteAction* insert_action = mysqlTasks->Insert(5000);
        stringstream ss;
        ss << "INSERT INTO tbl_test(id, name, uptime) VALUES"
           << "(" << 100 << "," << "'你好'," << "'20130101')";
        string statement = ss.str();

        // use mbstowcs to change the sql statement to wide-char-string
        // we use the default value of fexec-charset, which is utf-8, to compile this file with gcc.
        setlocale(LC_ALL, "zh_CN.utf8");
        size_t wcs_size = mbstowcs(NULL, statement.c_str(), 0);
        wchar_t* dest = new wchar_t[wcs_size + 1];
        wmemset(dest, L'\0', wcs_size + 1);
        // the third argument is the capacity of dest in wide characters
        mbstowcs(dest, statement.c_str(), wcs_size + 1);

        // change the last '好' to '饕'
        wchar_t* tmp = wcsrchr(dest, L'好');
        if (tmp != NULL)
        {
            *tmp = L'饕';
        }

        // change the sql statement to the charset that corresponds to the character_set_client of mysql
        setlocale(LC_ALL, "zh_CN.gbk");
        size_t mbs_size = wcstombs(NULL, dest, 0);
        char* buf_mbs = new char[mbs_size + 1];
        memset(buf_mbs, '\0', mbs_size + 1);
        // the third argument is the capacity of buf_mbs in bytes
        wcstombs(buf_mbs, dest, mbs_size + 1);

        // try to insert into mysql
        COMMON::InsertFilter insertFilter(buf_mbs);
        insert_action->Do(&insertFilter);
        insert_action->EndAction();
        cout << "EndAction success" << endl;

        // release the conversion buffers once the statement has been executed
        delete[] dest;
        delete[] buf_mbs;

        mysqlTasks->Disconnect();
        cout << "Disconnect success" << endl;
    }
    catch (COMMON::ThrowableException& e)
    {
        cout << e.What() << endl;
    }
    catch (...)
    {
        cout << "unknown exception" << std::endl;
    }
}
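The reason the wide-character detour in Experiment 1 is safe is that each Chinese character becomes a single element once the UTF-8 bytes are decoded. The following sketch shows that decoding step without relying on any installed locale. DecodeUtf8 is a hypothetical helper, it assumes well-formed input and performs no validation, and it is not a replacement for mbstowcs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decodes well-formed UTF-8 into Unicode code points.
// Sketch only: no checks for overlong forms or bad continuation bytes.
std::vector<unsigned long> DecodeUtf8(const std::string& in) {
    std::vector<unsigned long> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        unsigned long cp;
        if (lead < 0x80)      { len = 1; cp = lead; }         // 0xxxxxxx
        else if (lead < 0xE0) { len = 2; cp = lead & 0x1F; }  // 110xxxxx
        else if (lead < 0xF0) { len = 3; cp = lead & 0x0F; }  // 1110xxxx
        else                  { len = 4; cp = lead & 0x07; }  // 11110xxx
        assert(i + len <= in.size());  // input assumed well-formed
        for (std::size_t k = 1; k < len; ++k) {
            // each continuation byte contributes its low 6 bits
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For the six bytes E4 BD A0 E5 A5 BD ('你好') this yields two elements, U+4F60 and U+597D, so replacing one element replaces exactly one character.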
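Experiment 2: handling Chinese incorrectly
Now let's make some changes. To keep things simple, we no longer play the trick of setting character_set_client=gbk and only run set names utf8. Then, once the SQL statement has been assembled, we use string::find to locate '你' and use the returned index directly to change it to '饕'. The code is as follows.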
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <tr1/memory>
#include "common/dbcomm/DbComm.h"

using namespace std;

COMMON::DbLocation dbLocation1;

void InsertBySqlStatmentTest1();

int main()
{
    dbLocation1.SetDbId("TEST_DB1");
    dbLocation1.SetIp("127.0.0.1");
    dbLocation1.SetPort("3306");
    dbLocation1.SetUser("cup_dba");
    dbLocation1.SetPassword("123456");
    InsertBySqlStatmentTest1();
    return 0;
}

void InsertBySqlStatmentTest1()
{
    try
    {
        vector<COMMON::DbLocation> dbLocations_array;
        dbLocations_array.push_back(dbLocation1);

        tr1::shared_ptr<COMMON::IDbTasks> mysqlTasks(
            new COMMON::MysqlDbTasks(dbLocations_array, true));
        mysqlTasks->Connect();
        cout << "Connect success" << endl;

        {
            // ************ no more trick of changing character_set_client to gbk ************
            COMMON::DbExecuteAction* char_action = mysqlTasks->Execute();
            COMMON::ExecuteFilter char_filter("set names utf8");
            char_action->Do(&char_filter, &dbLocation1);
            char_action->EndAction();
        }

        COMMON::DbExecuteAction* insert_action = mysqlTasks->Insert(5000);
        stringstream ss;
        ss << "INSERT INTO tbl_test(id, name, uptime) VALUES"
           << "(" << 100 << "," << "'你好'," << "'20130101')";

        // ************ modify the string directly (deliberately wrong) ************
        // with gcc, '你' and '饕' are multi-character constants that get
        // truncated to a single char, so only one byte is matched and replaced
        string statement = ss.str();
        size_t pos = statement.find('你');
        statement[pos] = '饕';

        // try to insert into mysql
        COMMON::InsertFilter insertFilter(statement);
        insert_action->Do(&insertFilter);
        insert_action->EndAction();
        cout << "EndAction success" << endl;

        mysqlTasks->Disconnect();
        cout << "Disconnect success" << endl;
    }
    catch (COMMON::ThrowableException& e)
    {
        cout << e.What() << endl;
    }
    catch (...)
    {
        cout << "unknown exception" << std::endl;
    }
}
The result is not the '饕好' we intended.
To track down the cause of the error, let's look at it in hexadecimal.
As we can see, the code
size_t pos = statement.find('你'); statement[pos] = '饕';
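in effect changes only one byte. In UTF-8, '你' is E4BDA0 and the result is E4BD95, which is '何'; the byte we changed is that 95, the last byte of '饕' (E9A595). This matches what we know about how string behaves: it operates on individual bytes, not on characters.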
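If the statement has to stay in a std::string, one way to avoid the single-byte overwrite is to search for and replace the character's whole byte sequence. The sketch below does this for UTF-8 data; ReplaceFirst is a hypothetical helper, and the byte values are the UTF-8 encodings quoted above. It assumes the string really is UTF-8: in an encoding such as GBK, a byte-level search can match across character boundaries.

```cpp
#include <cassert>
#include <string>

// Replaces the first occurrence of the byte sequence `from` with `to`.
// Returns false when `from` does not occur in `s`.
bool ReplaceFirst(std::string& s, const std::string& from,
                  const std::string& to) {
    assert(!from.empty());
    std::string::size_type pos = s.find(from);
    if (pos == std::string::npos) {
        return false;
    }
    // replace all bytes of the old character, not just one of them
    s.replace(pos, from.size(), to);
    return true;
}
```

Called as ReplaceFirst(statement, "\xE4\xBD\xA0", "\xE9\xA5\x95"), it swaps all three bytes of '你' for the three bytes of '饕', so the neighbouring '好' (E5A5BD) is left untouched.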
4. Summary and recommendations