Inserting Emoji characters with Jade without changing the MySQL server character set to utf8mb4 (character_set_server=utf8mb4)
MySQL server character set settings:
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name            | Value                                 |
+--------------------------+---------------------------------------+
| character_set_client     | utf8                                  |
| character_set_connection | utf8                                  |
| character_set_database   | latin1                                |
| character_set_filesystem | binary                                |
| character_set_results    | utf8                                  |
| character_set_server     | latin1                                |
| character_set_system     | utf8                                  |
| character_sets_dir       | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+

mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `data` varchar(10) CHARACTER SET utf8mb4 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
DAO:
@DAO(catalog="temp")
public interface Utf8mb4TestDAO {

    @SQL("insert into t select :1")
    public void insertEmoji(String data);

    @SQL("set names utf8mb4")
    public void setNamesUtf8mb4();
}
Unit test:
Config:
String options = String.format("connectTimeout=%s&generateSimpleParameterMetadata=true&useUnicode=true&characterEncoding=UTF-8", timeout);
// This also works:
//String options = String.format("connectTimeout=%s&generateSimpleParameterMetadata=true&characterEncoding=UTF-8", timeout);
Summary:
You need to specify characterEncoding=UTF-8 in the JDBC URL and, before inserting an emoji character, first execute: set names utf8mb4
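For readers not using Jade, the two requirements in the summary can be sketched in plain JDBC as below. This is a minimal sketch, not the author's code: the class name Utf8mb4Insert, the method names and the command-line credentials are illustrative assumptions, and host/database are taken from the JDBC URL used later in this article. It only connects when a user and password are passed, so it runs without a MySQL server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class Utf8mb4Insert {

    // Builds the JDBC URL with the same options as the Config above.
    static String buildUrl(String hostPort, String db, int timeoutMs) {
        String options = String.format(
                "connectTimeout=%s&generateSimpleParameterMetadata=true&useUnicode=true&characterEncoding=UTF-8",
                timeoutMs);
        return "jdbc:mysql://" + hostPort + "/" + db + "?" + options;
    }

    // Switches the session to utf8mb4, then inserts. Both statements run on the
    // same Connection because SET NAMES only affects the current session.
    static void insertEmoji(Connection conn, String data) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("set names utf8mb4");
        }
        try (PreparedStatement ps = conn.prepareStatement("insert into t select ?")) {
            ps.setString(1, data);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        String url = buildUrl("localhost:3306", "temp", 1000);
        System.out.println(url);
        // Only connect when credentials are given: java Utf8mb4Insert user password
        if (args.length == 2) {
            try (Connection conn = DriverManager.getConnection(url, args[0], args[1])) {
                insertEmoji(conn, "\uD83D\uDE01"); // U+1F601
            }
        }
    }
}
```

One point worth checking in your own setup: since SET NAMES is per session, it only helps if it runs on the same connection as the insert. With a connection pool, two separate DAO calls are not necessarily guaranteed to share one connection; this is an assumption about the pool, not something the original test verifies.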
Other failure cases:
1. If setNamesUtf8mb4 is not executed and only the insert runs, it fails with:
Caused by: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x81' for column 'data' at row 1
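2. characterEncoding=UTF-8 must be specified in the JDBC URL parameters; otherwise the insert succeeds, but the stored value is garbled.
3. Putting set names and the insert into a single SQL statement is not supported by default, as shown below: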
@SQL("set names utf8mb4; insert into t select :1")
public void insertEmojiWithSetNamesUtf8mb4(String data);
This fails with:
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'insert into t select '????'' at line 1
Solution:
Add the parameter allowMultiQueries=true to the JDBC URL, as follows:
String options = String.format("connectTimeout=%s&generateSimpleParameterMetadata=true&characterEncoding=UTF-8&allowMultiQueries=true", timeout);
Official documentation:
allowMultiQueries Allow the use of ';' to delimit multiple queries during one statement (true/false), defaults to 'false', and does not affect the addBatch() and executeBatch() methods, which instead rely on rewriteBatchStatements. Default: false Since version: 3.1.1
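Back to failure case 1: the insert is rejected because the connection character set is still MySQL's utf8 (see character_set_connection above), which stores at most three bytes per character and therefore only covers the Basic Multilingual Plane (U+0000 to U+FFFF). The emoji U+1F601 lies outside it and takes four bytes in UTF-8, which is exactly the '\xF0\x9F\x98\x81' in the error message. A hypothetical helper like the one below (class and method names are illustrative) can tell in advance whether a string needs utf8mb4, e.g. to decide whether set names utf8mb4 has to run first.

```java
public class Utf8mb4Check {

    // True if any code point lies outside the BMP (above U+FFFF). Such characters
    // take four bytes in UTF-8 and cannot be stored through MySQL's 3-byte utf8.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(cp -> cp > 0xFFFF);
    }

    public static void main(String[] args) {
        System.out.println(needsUtf8mb4("abc"));          // false
        System.out.println(needsUtf8mb4("\u4E2D\u6587"));  // false: CJK is inside the BMP
        System.out.println(needsUtf8mb4("\uD83D\uDE01"));  // true: U+1F601
    }
}
```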
Other notes:
1. You cannot specify characterEncoding=utf8mb4 in the JDBC URL, because
Connector/J did not support utf8mb4 for servers 5.5.2 and newer.
(Source: http://dev.mysql.com/doc/relnotes/connector-j/en/news-5-1-13.html)
Otherwise the connection fails with:
2015-05-01 16:38:00 ERROR com.alibaba.druid.pool.DruidDataSource create connection error java.sql.SQLException: Unsupported character encoding 'utf8mb4'.
In addition, the character sets currently supported by JDBC are listed in:
Table 5.3 MySQL to Java Encoding Name Translations
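Source: http://dev.mysql.com/doc/connector-j/en/connector-j-reference-charsets.html
2. If the MySQL server is configured with character_set_server=utf8mb4, set names utf8mb4 is executed automatically when the connection is established, as the release note states: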
Connector/J now auto-detects servers configured with character_set_server=utf8mb4 or treats the Java encoding utf-8 passed using characterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. (Bug #54175)
(Source: http://dev.mysql.com/doc/relnotes/connector-j/en/news-5-1-13.html)
As shown below:
Option 2:
Store the binary content of the Emoji character directly, i.e. declare the column as blob, as shown below:
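Back to note 1 (characterEncoding=utf8mb4): utf8mb4 is a MySQL character set name, not a Java one, whereas UTF-8 is a Java charset name. The sketch below only queries the JDK's own charset registry (class and method names are illustrative); that Connector/J rejects the name for the same reason when it resolves characterEncoding is an assumption based on the "Unsupported character encoding 'utf8mb4'" message above.

```java
import java.nio.charset.Charset;

public class CharsetNameCheck {

    // Asks the JVM whether it knows a charset by this name or alias.
    static boolean javaSupports(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        System.out.println(javaSupports("UTF-8"));   // true
        System.out.println(javaSupports("utf8mb4")); // false: a MySQL name, unknown to Java
    }
}
```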
CREATE TABLE `t2` (
  `data` blob
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name            | Value                                 |
+--------------------------+---------------------------------------+
| character_set_client     | utf8                                  |
| character_set_connection | utf8                                  |
| character_set_database   | latin1                                |
| character_set_filesystem | binary                                |
| character_set_results    | utf8                                  |
| character_set_server     | latin1                                |
| character_set_system     | utf8                                  |
| character_sets_dir       | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
JDBC URL:
jdbc:mysql://localhost:3306/temp?connectTimeout=1000&generateSimpleParameterMetadata=true
DAO:
@SQL("insert into t2 select :1")
public void insertEmojiAsBlob(byte[] data);

@SQL("select data from t2 limit 1")
public byte[] getEmojiFromT2();
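With a blob column the application decides the encoding itself: it converts the string to UTF-8 bytes before calling insertEmojiAsBlob and decodes the byte[] returned by getEmojiFromT2. The helpers below are a hypothetical sketch of that conversion (EmojiBlobCodec, toBlob and fromBlob are illustrative names). Since the column holds raw bytes, the connection character set should not affect the stored value, which is presumably why the JDBC URL above omits characterEncoding.

```java
import java.nio.charset.StandardCharsets;

public class EmojiBlobCodec {

    // Value to pass to insertEmojiAsBlob(byte[]): the raw UTF-8 bytes.
    static byte[] toBlob(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Turns the byte[] returned by getEmojiFromT2() back into a String.
    static String fromBlob(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE01"; // U+1F601
        byte[] blob = toBlob(emoji);
        System.out.println(blob.length);                  // 4
        System.out.println(fromBlob(blob).equals(emoji)); // true
    }
}
```

In a test this would read roughly dao.insertEmojiAsBlob(EmojiBlobCodec.toBlob(emoji)) followed by EmojiBlobCodec.fromBlob(dao.getEmojiFromT2()), where dao is an instance of the Jade DAO above.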
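Unit test:
Question 1:
Why, even though set names utf8mb4 is explicitly executed before inserting the emoji, is it still necessary to explicitly specify characterEncoding=UTF-8 in the JDBC URL?
Because if characterEncoding is not explicitly set to UTF-8, the default character set is cp1252 (since character_set_server=latin1). The emoji is then encoded by SingleByteCharsetConverter, and a character that should take four bytes ends up as only two bytes, so the net effect is the same as running the following command in the command-line client: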
mysql> insert into t select '??';
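In that case, executing set names utf8mb4 does not help at all. The relevant driver code is as follows:
If characterEncoding=UTF-8 is specified explicitly, the emoji is encoded as UTF-8 instead:
Note: this time the result is not garbled but the correct four bytes: f0, 9f, 98, 81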
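The step that explains "four bytes become two": a Java String stores U+1F601 as a surrogate pair, i.e. two char values, and a single-byte converter produces exactly one byte per char. Neither surrogate has an entry in a single-byte table, so each becomes '?', giving the '??' above. The sketch below imitates that behaviour under a simplifying assumption: it uses a Latin-1 style table (chars up to 0xFF kept, everything else replaced with '?') rather than the real cp1252 table, and it is not Connector/J's actual SingleByteCharsetConverter code.

```java
import java.nio.charset.StandardCharsets;

public class SingleByteSimulation {

    // Encodes char by char, one byte per UTF-16 char, as a single-byte converter
    // does. Simplified table: 0x00-0xFF map to themselves, anything else
    // (including each half of a surrogate pair) becomes '?'.
    static byte[] encodeSingleByte(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[i] = c <= 0xFF ? (byte) c : (byte) '?';
        }
        return out;
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE01"; // U+1F601
        System.out.println(emoji.length());                           // 2 chars
        System.out.println(emoji.codePointCount(0, emoji.length()));  // 1 code point
        byte[] sent = encodeSingleByte(emoji);
        System.out.println(new String(sent, StandardCharsets.US_ASCII)); // ??
    }
}
```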
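The four bytes can be reproduced in plain Java. This small sketch (Utf8Bytes and utf8Hex are illustrative names) encodes the emoji with UTF-8 and prints the bytes in hex; the result matches both the list above and the '\xF0\x9F\x98\x81' in the earlier error message.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {

    // Lower-case hex of the UTF-8 encoding, bytes separated by ", ".
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\uD83D\uDE01")); // f0, 9f, 98, 81
    }
}
```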