Character Set and Collation Concepts
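Every database has the notions of a character set and a collation, and many developers, and even some DBAs, confuse the two. That is understandable. For one thing, the two are complementary and depend on each other; for another, some databases do not separate them clearly. In SQL Server, for example, character set and collation are bundled together: when you create a new database you only get to choose a Collation, and there is no separate character set option. By choosing a collation such as Chinese_PRC_CI_AS you are in fact choosing both the character set and the sort rules of the database. In MySQL the two are separate settings, although they are still related: unless you have special requirements you only need to set one of them, because setting the character set also selects its default collation.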
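Let us first sort out the concepts of character, character set, and character encoding, which many people find confusing. What is a character? What is a character set? And what is a character encoding?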
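A character is the general term for letters and symbols, including text, graphic symbols, mathematical symbols, and so on. Each of the 26 English letters is a character, and so is each Chinese character.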
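A character set is a collection of abstract characters. For example, all Chinese characters form a "character collection", and all English letters form another. Note that we call them character collections, in quotation marks, because a character set is not merely a collection of characters: more precisely, a character set is a set of symbols together with the rules for encoding them. A character set needs some character encoding in order to represent and store its characters. Inside a computer all information is ultimately binary; each bit has two states, 0 and 1, and encoding means using different combinations of 0s and 1s to represent different characters.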
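As for character encoding: characters are ultimately stored on disk in binary form, which is why an encoding is needed. The encoding rule defines which binary value represents which character. In the familiar ASCII table, for example, the binary value 01000011 is decimal 67 and stands for the letter C. More precisely, a character encoding represents each character of a character set as a binary sequence of one or more bytes. Each character set has its own encoding, so the same character may be stored as different bytes under different character sets.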
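The two claims above (C is 01000011 in ASCII, and one character can have different bytes in different character sets) can be checked with a minimal Python sketch. Python is used here only for illustration and is not part of the original article; the helper name bits is hypothetical, and Python's "gbk" codec is assumed to be a reasonable stand-in for the GBK character set.

```python
def bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit binary groups."""
    return " ".join(format(b, "08b") for b in data)

# ASCII: the letter C has code 67 and is stored as the single byte 01000011.
c_bytes = "C".encode("ascii")
print(ord("C"), bits(c_bytes))

# The same character is stored as different bytes under different character sets.
zhong_utf8 = "中".encode("utf-8")   # three bytes
zhong_gbk = "中".encode("gbk")      # two bytes
print(zhong_utf8.hex(), zhong_gbk.hex())
```

The first line printed is the code and bit pattern of C; the second shows that "中" occupies three bytes in UTF-8 but two in GBK.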
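In addition, a character collection only specifies which characters belong to it, while a character encoding assigns a number to every character in that collection. A character set (as opposed to a character collection) is the combination of the two, which is why the name of an encoding scheme is sometimes used to refer to the character set.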
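Even so, some readers may still be unable to tell UTF-8 from Unicode. Take the connection string we use to connect to MySQL, for example, which mentions both Unicode and UTF-8:
<string value="jdbc:mysql://192.168.xxx.xxx/TEST?useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull"/>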
So how are Unicode and UTF-8, UTF-16, and UTF-32 related?
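Unicode is a character set: a scheme defined by an international organization that can accommodate all of the world's scripts and symbols. Unicode maps characters to the numbers 0 to 0x10FFFF, so it has 1,114,112 code points. UTF-8, UTF-16, and UTF-32 are encoding schemes that turn those numbers into program data. In Unicode the Chinese character "中" corresponds to the number 20013 (0x4E2D), and any of UTF-8, UTF-16, or UTF-32 can be used to store that number in a computer. In UTF-8 it is E4B8AD; in UTF-16 it is FEFF4E2D; in UTF-32 it is 0000FEFF00004E2D. (In the last two, the leading FEFF and 0000FEFF are the big-endian byte order mark; the character itself is 4E2D and 00004E2D.) In short, UTF-8, UTF-16, and UTF-32 are different implementations of Unicode, and all of them are Unicode encodings.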
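The numbers quoted for "中" can be reproduced with a short Python sketch. It is only an illustration, not part of the original article; it assumes big-endian byte order and prepends the byte order mark by hand so that the output matches the values given in the text.

```python
import codecs

ch = "中"
code_point = ord(ch)                      # the Unicode number of the character

utf8 = ch.encode("utf-8")
utf16_be = ch.encode("utf-16-be")         # big-endian, no byte order mark
utf32_be = ch.encode("utf-32-be")

# Prepend the big-endian byte order mark to match the values in the text.
utf16_with_bom = codecs.BOM_UTF16_BE + utf16_be
utf32_with_bom = codecs.BOM_UTF32_BE + utf32_be

print(code_point, hex(code_point))
print(utf8.hex().upper())
print(utf16_with_bom.hex().upper())
print(utf32_with_bom.hex().upper())
print(0x10FFFF + 1)                       # number of Unicode code points
```

All three byte sequences decode back to the same code point; only the way the number is laid out in bytes differs.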
Commonly used character sets in MySQL include latin1, gbk, gb2312, big5, utf8, utf8mb4, utf16, and utf32.
A MySQL collation is the set of rules for comparing and sorting strings in a character set. MySQL collations have the following characteristics:
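o Two different character sets cannot share the same collation;
o Every character set has a default collation;
o Collation names follow a convention: they begin with the name of the associated character set, usually include a language name, and end with _ci (case-insensitive), _cs (case-sensitive), or _bin (binary).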
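In practice we rarely care about the fine details of a collation; what we usually care about is whether it is case-sensitive. For example, on a system using the utf8 character set, queries compare strings case-sensitively under the utf8_bin collation and case-insensitively under utf8_general_ci (which is the default collation for utf8).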
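As a rough analogy only, the difference between a _ci and a _bin comparison can be sketched in Python. The function names equal_ci and equal_bin are hypothetical, and this is not how MySQL implements collations: real collations such as utf8_general_ci follow their own weight tables and differ in details such as accent handling.

```python
def equal_ci(a: str, b: str) -> bool:
    """Case-insensitive comparison, loosely analogous to a _ci collation."""
    return a.casefold() == b.casefold()

def equal_bin(a: str, b: str, encoding: str = "utf-8") -> bool:
    """Byte-by-byte comparison, loosely analogous to a _bin collation."""
    return a.encode(encoding) == b.encode(encoding)

print(equal_ci("MySQL", "mysql"))    # compares equal when case is ignored
print(equal_bin("MySQL", "mysql"))   # differs at the byte level
print(equal_bin("mysql", "mysql"))
```

Under the byte-level comparison, "MySQL" and "mysql" are different strings because their encoded bytes differ, which is why a query against a utf8_bin column is case-sensitive.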
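Categories of MySQL character sets
MySQL's character set settings are very flexible and therefore complex (too much flexibility breeds complexity), and it takes some time to understand them. This is also the real reason many people end up with garbled Chinese text. Specifically, MySQL character sets are hierarchical: if a column does not specify a character set, it uses the table's; if the table does not specify one, it uses the database's; and so on. To understand the different categories of character set, we start with the character set system variables: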
mysql> show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
character_set_client: the character set of data sent by the client
The character set of the statements and data that the MySQL client sends to mysqld.
The character set for statements that arrive from the client. The session value of this variable is set using the character set requested by the client when the client connects to the server. (Many clients support a --default-character-set option to enable this character set to be specified explicitly. See also Section 10.1.4, 「Connection Character Sets and Collations」.) The global value of the variable is used to set the session value in cases when the client-requested value is unknown or not available, or the server is configured to ignore client requests:
The client is from a version of MySQL older than MySQL 4.1, and thus does not request a character set.
The client requests a character set not known to the server. For example, a Japanese-enabled client requests sjis when connecting to a server not configured with sjis support.
mysqld was started with the --skip-character-set-client-handshake option, which causes it to ignore client character set configuration. This reproduces MySQL 4.0 behavior and is useful should you wish to upgrade the server without upgrading all the clients.
ucs2, utf16, utf16le, and utf32 cannot be used as a client character set, which means that they also do not work for SET NAMES or SET CHARACTER SET.
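character_set_connection: the connection-layer character set
Many people are baffled by this one: how does it differ from character_set_client? It is the character set used for string literals that have no introducer and for number-to-string conversion.
String literals that do have an introducer are not transcoded again while the request is processed; they are converted directly to the internal character set.
The character set used for literals that do not have a character set introducer and for number-to-string conversion. For information about introducers, see Section 10.1.3.8, 「Character Set Introducers」.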
character_set_database: the database character set
MySQL lets each database within an instance have its own character set, which is similar to SQL Server.
The character set used by the default database. The server sets this variable whenever the default database changes. If there is no default database, the variable has the same value as character_set_server.
character_set_filesystem
The file system character set. This variable is used to interpret string literals that refer to file names, such as in the LOAD DATA INFILE and SELECT ... INTO OUTFILE statements and the LOAD_FILE() function. Such file names are converted from character_set_client to character_set_filesystem before the file opening attempt occurs. The default value is binary, which means that no conversion occurs. For systems on which multibyte file names are permitted, a different value may be more appropriate. For example, if the system represents file names using UTF-8, set character_set_filesystem to 'utf8'.
character_set_results: the character set for query results
The character set mysqld uses when returning result sets or error messages to the client.
The character set used for returning query results such as result sets or error messages to the client.
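character_set_server: the server character set (the default)
The server-level (instance-level) character set. If no character set is specified when a database is created, the server character set is used.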
The server's default character set.
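character_set_system: the character set for system metadata
The character set used to store system metadata (table names, column names, and so on). It is unrelated to the data you store and is always utf8.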
The character set used by the server for storing identifiers. The value is always utf8.
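Earlier versions also had a default-character-set (default_character_set) parameter. It was removed in MySQL 5.5 and replaced by character_set_server.
The three variables character_set_client, character_set_connection, and character_set_results are set by the client each time it connects and are independent of the server-side character set settings. As data passes between these layers, MySQL converts it from one character set to another.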
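Character sets supported by MySQL
The set of supported character sets varies between MySQL versions. Use show charset or show character set to list the character sets supported by your version.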
mysql> show charset;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
+----------+-----------------------------+---------------------+--------+
40 rows in set (0.00 sec)
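The first column is the character set name, the second its description, the third its default collation, and the fourth the maximum number of bytes a single character can occupy. The following SQL statement returns the same information.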
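To make the Maxlen column concrete, here is a small Python sketch that measures how many bytes a character takes under a few encodings. It is only an illustration: the helper max_bytes_per_char is hypothetical, and Python's codec names ("gbk", "big5", "utf-8") are assumed to correspond closely enough to the MySQL character sets of the same name.

```python
def max_bytes_per_char(text: str, encoding: str) -> int:
    """Return the largest number of bytes any single character of text needs."""
    return max(len(ch.encode(encoding)) for ch in text)

sample = "A中"
for enc in ("gbk", "big5", "utf-8"):
    print(enc, max_bytes_per_char(sample, enc))

# A character outside the Basic Multilingual Plane needs four bytes in UTF-8.
emoji = "\U0001F600"
print(len(emoji.encode("utf-8")))
```

The results line up with the table: 2 for gbk and big5, 3 for a Chinese character in UTF-8 (the Maxlen of MySQL's utf8), and 4 for a supplementary character (the Maxlen of utf8mb4).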
mysql> select * from information_schema.character_sets;
Viewing MySQL character sets
Viewing the current character sets
Use show variables like '%character%' to see the character set variables.
mysql> show variables like '%character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
Viewing the client character set
mysql> show variables like '%character_set_client%';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_client | utf8 |
+----------------------+-------+
1 row in set (0.00 sec)
Viewing the connection character set
mysql> show variables like 'character_set_connection';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| character_set_connection | utf8 |
+--------------------------+-------+
1 row in set (0.00 sec)
Viewing the query result character set
mysql> show variables like 'character_set_results';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| character_set_results | utf8 |
+-----------------------+-------+
1 row in set (0.01 sec)
Viewing the server character set
mysql> show variables like 'character_set_server';
+----------------------+--------+
| Variable_name | Value |
+----------------------+--------+
| character_set_server | latin1 |
+----------------------+--------+
1 row in set (0.00 sec)
mysql> status;
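Viewing a database's character set
mysql> use YourDB;
mysql> show variables like 'character_set_database';
mysql> status;
Note: the commands above show the character set of the current (default) database only. To check a specific database, use:
mysql> show create database dbname; -- dbname is the database you want to inspect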
mysql> show create database MyDB;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| MyDB | CREATE DATABASE `MyDB` /*!40100 DEFAULT CHARACTER SET utf8mb4*/ |
+----------+-----------------------------------------------------------------+
1 row in set (0.00 sec)
To list the character set and collation of every database in the current MySQL instance, use the following query:
SELECT SCHEMA_NAME,DEFAULT_CHARACTER_SET_NAME,DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA ;
Viewing a table's character set
Method 1: show create table xxxx;
Method 2: read TABLE_COLLATION from INFORMATION_SCHEMA.TABLES and infer the table's character set from it.
SELECT TABLE_SCHEMA, TABLE_NAME,TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
SELECT TABLE_SCHEMA, TABLE_NAME,TABLE_COLLATION
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME='TEST';
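Viewing a column's character set
As shown below, if a column's character set was specified when the table was created, show create table xxx displays it. If it was not specified explicitly, show create table xxx does not display a character set for the column, which means the column uses the table's default character set.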
mysql> drop table if exists test;
Query OK, 0 rows affected (0.05 sec)
mysql> create table test(name1 varchar(10) character set gbk, name2 varchar(10));
Query OK, 0 rows affected (0.06 sec)
mysql> show create table test;
+-------+---------------------------------------------------+
| Table | Create Table |
+-------+---------------------------------------------------+
| test | CREATE TABLE `test` (
`name1` varchar(10) CHARACTER SET gbk DEFAULT NULL,
`name2` varchar(10) COLLATE utf8_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
+-------+---------------------------------------------------+
1 row in set (0.00 sec)
mysql> drop table if exists test;
Query OK, 0 rows affected (0.01 sec)
mysql> create table test(name varchar(10));
Query OK, 0 rows affected (0.03 sec)
mysql> show create table test;
+-------+------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`name` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
mysql> show full columns from test;
+-------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| name | varchar(10) | utf8_general_ci | YES | | NULL | | select,insert,update,references | |
+-------+-------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
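How to change MySQL character sets
1: Changing a column's character set
The syntax is shown below. There are restrictions on when a character set change succeeds; see Section 10.1.7, Column Character Set Conversion, in the MySQL manual for details. Take a backup and test thoroughly before making the change.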
ALTER TABLE xxx MODIFY xxx VARCHAR(50) CHARACTER SET UTF8;
2: Changing a table's character set
mysql> create table test(name varchar(10));
Query OK, 0 rows affected (0.03 sec)
mysql> show create table test;
+------------------------------------------------------------------------------------+
| Table | Create Table |
+------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`name` varchar(10) COLLATE utf8_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
+------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> alter table test charset=gbk;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table table_name character set xxx;
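Note: the commands above change only the table's default character set, which affects the default definition of columns added later; existing columns keep their character set.
To change the table's character set and the character set of existing columns, and to convert the existing data to the new encoding, use a statement like the following.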
mysql> alter table table_name convert to character set xxx;
3: Changing a database's character set
alter database database_name character set xxx;
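Note: this changes only the database's default character set, which affects the default definition of tables created later; existing tables are not affected.
4: Changing the system variable character_set_database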
mysql> set character_set_database=utf8mb4;
Query OK, 0 rows affected (0.00 sec)
mysql> set global character_set_database=utf8mb4;
Query OK, 0 rows affected (0.00 sec)
5: Changing the server character set
mysql> set global character_set_server=utf8mb4;
Query OK, 0 rows affected (0.00 sec)
mysql> show global variables like 'character_set_server';
+----------------------+---------+
| Variable_name | Value |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+
1 row in set (0.00 sec)
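Note: the command above affects only the running instance. If character_set_server is not set in my.cnf, the change is lost when the MySQL service restarts, so character_set_server should normally be set in the my.cnf configuration file. The same caveat applies to character_set_database: a value set with SET does not survive a restart.
6: Changing the client character sets (character_set_client, character_set_results, character_set_connection)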
mysql> show variables like 'character_set_client';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_client | utf8 |
+----------------------+-------+
1 row in set (0.00 sec)
mysql> set character_set_client=latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'character_set_client';
+----------------------+--------+
| Variable_name | Value |
+----------------------+--------+
| character_set_client | latin1 |
+----------------------+--------+
1 row in set (0.00 sec)
mysql>
set character_set_client = utf8;
set character_set_results = utf8;
set character_set_connection = utf8;
In addition, SET NAMES 'charset_name' [COLLATE 'collation_name'] is equivalent to SET character_set_client = charset_name; SET character_set_results = charset_name; SET character_set_connection = charset_name;
mysql> show variables like 'character_set_client';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_client | utf8 |
+----------------------+-------+
1 row in set (0.01 sec)
mysql> show variables like 'character_set_results';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| character_set_results | utf8 |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like 'character_set_connection';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| character_set_connection | utf8 |
+--------------------------+-------+
1 row in set (0.00 sec)
mysql> set names 'utf8mb4';
Query OK, 0 rows affected (0.02 sec)
mysql> show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql>
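Choosing a MySQL character set
In general we choose utf8mb4 rather than utf8. MySQL's utf8 is not true UTF-8: it uses at most three bytes per character, which saves space but cannot represent all of UTF-8; it supports only the Basic Multilingual Plane (BMP). utf8mb4 is the character set that fully supports UTF-8, and there are articles online devoted to this topic. We also generally choose utf8mb4 over gb2312 or gbk. gb2312 cannot store some rare characters (for example 洺). gbk is a double-byte encoding for Chinese; it saves space but can cause other problems. Nowadays storage space is not a concern for the vast majority of companies.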
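The three-byte limit can be illustrated with a small Python sketch. The function needs_utf8mb4 is a hypothetical helper, not a MySQL API; it simply checks whether a string contains any character outside the BMP, which is the case a three-byte encoding cannot store.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the BMP (code point above 0xFFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)

bmp_char = "\u534e"            # the Chinese character 华, inside the BMP
emoji = "\U0001F600"           # an emoji, outside the BMP

for ch in (bmp_char, emoji):
    print(hex(ord(ch)), len(ch.encode("utf-8")), needs_utf8mb4(ch))
```

A check like this could be run over sample application data before deciding whether existing utf8 columns need to be converted to utf8mb4.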
MySQL collations
Viewing and setting MySQL collations is straightforward, so we do not cover it in detail here.
mysql> show collation;
mysql> show variables like 'collation_%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
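Why garbled text appears in MySQL
Why does garbled text appear? We run into this problem often. To explain the cause, we briefly walk through how MySQL converts data between character sets on input and on output.
For data input:
1. The client encodes the data.
2. When MySQL receives the request, it needs to know how the client encoded the characters; the client tells it through the character_set_client system variable. If the client's character set differs from character_set_connection, MySQL converts the request data from character_set_client to character_set_connection.
3. Before operating on the data internally, MySQL converts it from character_set_connection to the internal character set. When storing, it checks whether the encoding matches the storage character set and converts if it does not. The storage character set is determined in the following order of priority:
• the CHARACTER SET of the column;
• if that is not set, the DEFAULT CHARACTER SET of the table (a MySQL extension, not standard SQL);
• if that is not set, the DEFAULT CHARACTER SET of the database;
• if that is not set, the value of character_set_server.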
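The priority order above amounts to a simple fallback chain. The following Python sketch models it; effective_charset and its parameters are hypothetical names used only for illustration, not anything MySQL exposes.

```python
from typing import Optional

def effective_charset(column: Optional[str],
                      table: Optional[str],
                      database: Optional[str],
                      server: str) -> str:
    """Return the first character set that is defined, from most to least specific."""
    for candidate in (column, table, database):
        if candidate:
            return candidate
    return server  # character_set_server is the last resort

# A column with its own character set wins over everything else.
print(effective_charset("gbk", "utf8", "utf8mb4", "latin1"))
# Without column or table settings, the database default applies.
print(effective_charset(None, None, "utf8mb4", "latin1"))
# With nothing set anywhere, the server character set is used.
print(effective_charset(None, None, None, "latin1"))
```

This is why a server left at latin1 can silently determine the encoding of new databases, tables, and columns that never specify a character set themselves.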
For data output:
The character set the client expects is expressed through character_set_results: the server converts the result to that character set before sending it to the client. (By default character_set_results equals character_set_client.)
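The input and output conversions can be imitated with a minimal Python sketch. It is a simplified model under stated assumptions, not MySQL's implementation: transcode is a hypothetical helper, Python's "latin-1" codec stands in for MySQL's latin1, and the storage character set is arbitrarily taken to be gbk.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Interpret data as charset src and re-encode it as charset dst."""
    return data.decode(src).encode(dst)

sent = "\u534e".encode("utf-8")        # the client really sends UTF-8 bytes for 华

# Consistent settings: the client is declared as utf-8, storage is gbk.
stored = transcode(sent, "utf-8", "gbk")
returned = transcode(stored, "gbk", "utf-8")
print(stored.hex(), returned == sent)

# Mismatch: the same bytes are declared as latin-1, so each byte is treated
# as a separate character and re-encoded, producing six bytes instead of three.
wrong = transcode(sent, "latin-1", "utf-8")
print(wrong.hex(), len(wrong))
```

When the declared client character set matches what the client actually sends, the round trip is lossless; when it does not, the server faithfully converts the wrong interpretation and the stored data is already garbled.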
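We now use a single Chinese character, "华", to show how garbling arises. Its encodings are as follows (http://mytju.com/classcode/tools/encode_gb2312.asp):
Unicode code point: 0000534E (decimal 21326)
UTF-8: E58D8E
UTF-16: FEFF534E
UTF-32: 0000FEFF0000534E
GBK: BBAA
If "华" is stored in UTF-8, its value is E58D8E, which takes three bytes. If it is then converted to latin1 (which uses one byte per character), it becomes garbled. The following session sets up the example: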
mysql> show variables like '%character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
mysql> create database MyDB default character set utf8;
Query OK, 1 row affected (0.00 sec)
mysql> use MyDB;
Database changed
mysql> create table test(name varchar(12));
Query OK, 0 rows affected (0.04 sec)
mysql> insert into test value('华');
Query OK, 1 row affected (0.00 sec)
mysql> select * from test;
+------+
| name |
+------+
| 华 |
+------+
1 row in set (0.00 sec)
If we connect to the database with the EMS MySQL client tool while the system variable character_set_results is latin1, the character "华" comes back garbled.
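Two typical forms of this garbling can be reproduced with a short Python sketch. It is only an approximation under stated assumptions: Python's "latin-1" codec stands in for MySQL's latin1, and the substitution of "?" for an unrepresentable character mirrors what we understand MySQL to do during conversion.

```python
original = "\u534e"                    # the character 华

# Case 1: converting to a character set that has no such character.
# The character cannot be represented, so it is replaced by '?'.
lossy = original.encode("latin-1", errors="replace")
print(lossy)

# Case 2: the bytes are UTF-8 but the reader decodes them as latin-1.
# Each of the three bytes becomes a separate, meaningless character.
misread = original.encode("utf-8").decode("latin-1")
print(len(misread))

# Case 2 is reversible as long as the bytes themselves were not altered.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == original)
```

The first case loses information for good, while the second can often be repaired by decoding with the correct character set, which is worth checking before attempting any data fix.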
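For garbling caused by encoding, by decoding, and by missing fonts, see the article "Analysis and summary of common garbled-text problems" (the IBM developerWorks link in the references).
The key to avoiding garbled text is therefore to avoid encoding mismatches between layers, since a mismatch forces a conversion and a conversion can garble the data. Using the same encoding at every layer avoids most of these problems.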
References:
https://dev.mysql.com/doc/refman/5.7/en/charset-connection.html
https://dev.mysql.com/doc/refman/5.7/en/charset.html
https://dev.mysql.com/doc/refman/5.7/en/charset-introducer.html
https://dev.mysql.com/doc/refman/5.7/en/charset-collation-names.html
https://www.ibm.com/developerworks/cn/java/analysis-and-summary-of-common-random-code-problems/index.html