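Any data a computer stores needs a character set, because stored data is ultimately binary. The mapping from each character to its binary code is the character encoding (character set). How are those characters ordered relative to one another? The rules that decide how characters compare and sort are the collation.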
Viewing the built-in character sets and collations
The show charset; command lists every available character set. Only the commonly used ones are shown below:
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
+----------+---------------------------------+---------------------+--------+
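- ascii: contains 128 characters, including the space, punctuation, digits, upper- and lower-case letters, and some non-printing characters. With only 128 characters, each one fits in 1 byte.
- latin1: contains 256 characters, adding 128 characters commonly used in Western Europe (including the German and French letters) to ASCII. Each character still fits in 1 byte.
- gb2312: covers Chinese characters plus Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters: 6763 Chinese characters and 682 other symbols. It is compatible with ASCII and variable-length: a character that is in ascii takes 1 byte, any other character takes 2 bytes.
- gbk: an extension of gb2312 that adds many more Chinese characters. It is variable-length in the same way: 1 byte for characters in ascii, 2 bytes otherwise.
- utf8 and utf8mb4: UTF-8 aims to cover every character in use anywhere and is still being extended. It is compatible with ASCII and variable-length, using 1 to 4 bytes per character. To save space, MySQL's utf8 is a cut-down UTF-8 that supports only the 1 to 3 byte encodings, which still covers nearly all commonly used characters. To store emoji you need utf8mb4, which is the complete UTF-8.
- utf16: unlike utf8, utf16 encodes each character in either 2 or 4 bytes. It covers the same characters as utf8mb4 but does not save space on ASCII text.
- utf32: always uses 4 bytes per character. It covers the same characters again, in the least space-efficient form.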
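To make the byte counts above concrete, here is a minimal Python sketch that encodes a few sample characters and reports how many bytes each takes. It is only an approximation: Python's codecs stand in for the MySQL character sets of similar name, and the helper names encoded_length and fits_mysql_utf8 are made up for this illustration.

```python
# Python codecs standing in for the MySQL character sets listed above.
# Assumption: each codec behaves like the MySQL charset of similar name.
CODECS = {
    "ascii": "ascii",
    "latin1": "cp1252",
    "gb2312": "gb2312",
    "gbk": "gbk",
    "utf8mb4": "utf-8",
    "utf16": "utf-16-be",  # big-endian, no byte order mark
    "utf32": "utf-32-be",
}


def encoded_length(char, codec):
    """Bytes needed to encode char, or None if the codec cannot represent it."""
    try:
        return len(char.encode(codec))
    except UnicodeEncodeError:
        return None


def fits_mysql_utf8(char):
    """MySQL's 3-byte utf8 only holds characters whose UTF-8 form is at most 3 bytes."""
    return len(char.encode("utf-8")) <= 3


if __name__ == "__main__":
    for char in ("a", "中", "😀"):
        sizes = {name: encoded_length(char, codec) for name, codec in CODECS.items()}
        print(repr(char), sizes, "fits utf8:", fits_mysql_utf8(char))
```

A None in the output means the character is simply not part of that character set, which is why an emoji cannot be stored in a gbk or 3-byte utf8 column.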
The information_schema.character_sets table shows the same information:
mysql> select * from information_schema.character_sets where character_set_name = "utf8";
+--------------------+----------------------+---------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION   | MAXLEN |
+--------------------+----------------------+---------------+--------+
| utf8               | utf8_general_ci      | UTF-8 Unicode |      3 |
+--------------------+----------------------+---------------+--------+
1 row in set (0.06 sec)
The show collation; command lists every collation. Here we look at the collations of utf8mb4:
mysql> show collation like 'utf8mb4%';
+------------------------+---------+-----+---------+----------+---------+
| Collation              | Charset | Id  | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci     | utf8mb4 |  45 | Yes     | Yes      |       1 |
| utf8mb4_bin            | utf8mb4 |  46 |         | Yes      |       1 |
| utf8mb4_unicode_ci     | utf8mb4 | 224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci   | utf8mb4 | 225 |         | Yes      |       8 |
| utf8mb4_latvian_ci     | utf8mb4 | 226 |         | Yes      |       8 |
| utf8mb4_romanian_ci    | utf8mb4 | 227 |         | Yes      |       8 |
| utf8mb4_slovenian_ci   | utf8mb4 | 228 |         | Yes      |       8 |
| utf8mb4_polish_ci      | utf8mb4 | 229 |         | Yes      |       8 |
| utf8mb4_estonian_ci    | utf8mb4 | 230 |         | Yes      |       8 |
| utf8mb4_spanish_ci     | utf8mb4 | 231 |         | Yes      |       8 |
| utf8mb4_swedish_ci     | utf8mb4 | 232 |         | Yes      |       8 |
| utf8mb4_turkish_ci     | utf8mb4 | 233 |         | Yes      |       8 |
| utf8mb4_czech_ci       | utf8mb4 | 234 |         | Yes      |       8 |
| utf8mb4_danish_ci      | utf8mb4 | 235 |         | Yes      |       8 |
| utf8mb4_lithuanian_ci  | utf8mb4 | 236 |         | Yes      |       8 |
| utf8mb4_slovak_ci      | utf8mb4 | 237 |         | Yes      |       8 |
| utf8mb4_spanish2_ci    | utf8mb4 | 238 |         | Yes      |       8 |
| utf8mb4_roman_ci       | utf8mb4 | 239 |         | Yes      |       8 |
| utf8mb4_persian_ci     | utf8mb4 | 240 |         | Yes      |       8 |
| utf8mb4_esperanto_ci   | utf8mb4 | 241 |         | Yes      |       8 |
| utf8mb4_hungarian_ci   | utf8mb4 | 242 |         | Yes      |       8 |
| utf8mb4_sinhala_ci     | utf8mb4 | 243 |         | Yes      |       8 |
| utf8mb4_german2_ci     | utf8mb4 | 244 |         | Yes      |       8 |
| utf8mb4_croatian_ci    | utf8mb4 | 245 |         | Yes      |       8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 |         | Yes      |       8 |
| utf8mb4_vietnamese_ci  | utf8mb4 | 247 |         | Yes      |       8 |
+------------------------+---------+-----+---------+----------+---------+
26 rows in set (0.13 sec)
Querying information_schema.collations works as well:
mysql> select * from information_schema.collations where character_set_name = "utf8mb4";
+------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME         | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+------------------------+--------------------+-----+------------+-------------+---------+
| utf8mb4_general_ci     | utf8mb4            |  45 | Yes        | Yes         |       1 |
| utf8mb4_bin            | utf8mb4            |  46 |            | Yes         |       1 |
| utf8mb4_unicode_ci     | utf8mb4            | 224 |            | Yes         |       8 |
| utf8mb4_icelandic_ci   | utf8mb4            | 225 |            | Yes         |       8 |
| utf8mb4_latvian_ci     | utf8mb4            | 226 |            | Yes         |       8 |
| utf8mb4_romanian_ci    | utf8mb4            | 227 |            | Yes         |       8 |
| utf8mb4_slovenian_ci   | utf8mb4            | 228 |            | Yes         |       8 |
| utf8mb4_polish_ci      | utf8mb4            | 229 |            | Yes         |       8 |
| utf8mb4_estonian_ci    | utf8mb4            | 230 |            | Yes         |       8 |
| utf8mb4_spanish_ci     | utf8mb4            | 231 |            | Yes         |       8 |
| utf8mb4_swedish_ci     | utf8mb4            | 232 |            | Yes         |       8 |
| utf8mb4_turkish_ci     | utf8mb4            | 233 |            | Yes         |       8 |
| utf8mb4_czech_ci       | utf8mb4            | 234 |            | Yes         |       8 |
| utf8mb4_danish_ci      | utf8mb4            | 235 |            | Yes         |       8 |
| utf8mb4_lithuanian_ci  | utf8mb4            | 236 |            | Yes         |       8 |
| utf8mb4_slovak_ci      | utf8mb4            | 237 |            | Yes         |       8 |
| utf8mb4_spanish2_ci    | utf8mb4            | 238 |            | Yes         |       8 |
| utf8mb4_roman_ci       | utf8mb4            | 239 |            | Yes         |       8 |
| utf8mb4_persian_ci     | utf8mb4            | 240 |            | Yes         |       8 |
| utf8mb4_esperanto_ci   | utf8mb4            | 241 |            | Yes         |       8 |
| utf8mb4_hungarian_ci   | utf8mb4            | 242 |            | Yes         |       8 |
| utf8mb4_sinhala_ci     | utf8mb4            | 243 |            | Yes         |       8 |
| utf8mb4_german2_ci     | utf8mb4            | 244 |            | Yes         |       8 |
| utf8mb4_croatian_ci    | utf8mb4            | 245 |            | Yes         |       8 |
| utf8mb4_unicode_520_ci | utf8mb4            | 246 |            | Yes         |       8 |
| utf8mb4_vietnamese_ci  | utf8mb4            | 247 |            | Yes         |       8 |
+------------------------+--------------------+-----+------------+-------------+---------+
26 rows in set (0.11 sec)
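- Every character set has one default collation: the one whose IS_DEFAULT is Yes.
- A collation name starts with the name of its character set, so you can list all collations of a character set by that prefix, or query information_schema.collations with the exact character set name.
- After the character set name comes the language, where there is one. utf8mb4 contains characters from every language, and different languages sort their text differently.
- The trailing ci stands for case insensitive. The possible suffixes are:
  - ai: accent insensitive
  - as: accent sensitive
  - ci: case insensitive
  - cs: case sensitive
  - bin: binary, compares the encoded values directly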
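The naming convention described above can be sketched as a small parser. This is an illustration only: parse_collation is a hypothetical helper, not a MySQL function, and it assumes every name follows the charset, optional language or variant, suffix pattern.

```python
# Suffix meanings as listed in the article.
SUFFIX_MEANINGS = {
    "ai": "accent insensitive",
    "as": "accent sensitive",
    "ci": "case insensitive",
    "cs": "case sensitive",
    "bin": "binary comparison",
}


def parse_collation(name):
    """Split a collation name into its character set, language/variant and suffix flags."""
    charset, *rest = name.split("_")
    flags = []
    # Peel known suffixes off the end; whatever remains is the language or variant.
    while rest and rest[-1] in SUFFIX_MEANINGS:
        flags.insert(0, rest.pop())
    return {"charset": charset, "variant": "_".join(rest), "flags": flags}


if __name__ == "__main__":
    for name in ("utf8mb4_general_ci", "utf8mb4_bin", "utf8mb4_unicode_520_ci"):
        print(name, "->", parse_collation(name))
```

An empty variant, as for utf8mb4_bin, means the name carries no language part at all.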
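Applying character sets and collations
Character set and collation can be configured at four levels:
- MySQL instance level
- Database level
- Table level
- Column level

The finer-grained level takes precedence. For example, if the instance-level character set is utf8mb4 and a table is given latin1, every column of that table that does not specify its own character set is encoded as latin1.
Character set and collation are tied together, so changing only one of them changes the other as well:
- If only the character set is changed, the collation becomes the default collation of the new character set.
- If only the collation is changed, the character set becomes the character set that the new collation belongs to.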
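The precedence and pairing rules above can be modelled in a few lines of Python. This is a simplified sketch, not how the server is implemented: complete_pair and effective_settings are hypothetical names, the default-collation table holds only three entries copied from the show charset output, and the model ignores that MySQL fixes a column's settings at the moment the column is created.

```python
# Default collations copied from the SHOW CHARSET output earlier in the article.
DEFAULT_COLLATION = {
    "latin1": "latin1_swedish_ci",
    "utf8": "utf8_general_ci",
    "utf8mb4": "utf8mb4_general_ci",
}


def complete_pair(charset=None, collation=None):
    """Fill in whichever of charset/collation was left unspecified."""
    if charset and not collation:
        # Only the character set given: use its default collation.
        return charset, DEFAULT_COLLATION[charset]
    if collation and not charset:
        # Only the collation given: its name starts with its character set.
        return collation.split("_")[0], collation
    return charset, collation


def effective_settings(instance, database=None, table=None, column=None):
    """Each argument is a (charset, collation) tuple or None; the most specific level wins."""
    for level in (column, table, database, instance):
        if level is not None:
            return complete_pair(*level)
    raise ValueError("the instance level must always be set")


if __name__ == "__main__":
    instance = ("utf8mb4", "utf8mb4_general_ci")
    # A table declared as latin1 overrides the instance default for its columns.
    print(effective_settings(instance, table=("latin1", None)))
    # A column that only names a collation gets that collation's character set.
    print(effective_settings(instance, table=("latin1", None), column=(None, "utf8_bin")))
```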
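Instance level
Two system variables set the instance-level character set and collation.
Configuration file:
[server]
character_set_server=utf8mb4
collation_server=utf8mb4_general_ci
After startup, both variables can be viewed and changed (a plain set changes the value for the current session only):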
mysql> show variables like 'character_set_server';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+
1 row in set (0.06 sec)

mysql> show variables like 'collation_server';
+------------------+--------------------+
| Variable_name    | Value              |
+------------------+--------------------+
| collation_server | utf8mb4_general_ci |
+------------------+--------------------+
1 row in set (0.05 sec)

mysql> set character_set_server = 'utf8mb4';
Query OK, 0 rows affected (0.00 sec)

mysql> set collation_server = 'utf8mb4_general_ci';
Query OK, 0 rows affected (0.00 sec)
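Database level
A character set and collation can be specified when creating a database.
mysql> create database test_db character set utf8mb4 collate utf8mb4_general_ci;
Query OK, 1 row affected (0.01 sec)
If they are not specified, the instance-level character set and collation are used.
To see the character set and collation of the current database, select it with the use command and then check the character_set_database and collation_database variables: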
mysql> show variables like 'character_set_database';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| character_set_database | utf8mb4 |
+------------------------+---------+
1 row in set (0.07 sec)

mysql> show variables like 'collation_database';
+--------------------+--------------------+
| Variable_name      | Value              |
+--------------------+--------------------+
| collation_database | utf8mb4_general_ci |
+--------------------+--------------------+
1 row in set (0.09 sec)
Setting these two variables directly has no effect:
mysql> set character_set_database = 'utf8';
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'character_set_database';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| character_set_database | utf8mb4 |
+------------------------+---------+
1 row in set (0.09 sec)
To change a database's character set and collation, use alter database:
mysql> alter database test_db character set = 'utf8';
Query OK, 1 row affected (0.01 sec)

mysql> show variables like 'character_set_database';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| character_set_database | utf8  |
+------------------------+-------+
1 row in set (0.08 sec)
This change only applies to tables created afterwards that do not specify their own character set and collation. It does not change the character set or collation of existing tables.
Table level
A character set and collation can be specified when creating a table; if they are not, the database's are used. They can also be changed later.
mysql> create table test (name varchar(32)) character set utf8mb4 collate utf8mb4_bin;
Query OK, 0 rows affected (0.04 sec)

mysql> show create table test;
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                          |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` ( `name` varchar(32) COLLATE utf8mb4_bin DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.09 sec)

mysql> alter table test character set = 'utf8';
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test;
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                         |
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` ( `name` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
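As the output shows, only the table's default character set and collation changed. The existing column keeps its own character set and collation.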
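Column level
Different columns can be given different character sets and collations when the table is created, and a column's character set and collation can be changed later: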
mysql> create table test (name varchar(32) character set utf8 collate utf8_bin) character set utf8mb4 collate utf8mb4_bin;
Query OK, 0 rows affected (0.03 sec)

mysql> show create table test;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                          |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` ( `name` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.09 sec)

mysql> alter table test modify column name varchar(32) COLLATE latin1_bin;
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                              |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` ( `name` varchar(32) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.09 sec)
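MySQL client character encoding issues
Sometimes a program misbehaves because character encodings do not match. For example, a Java program connected through JDBC reads data and prints garbled text, or MySQL cannot interpret the statements the client sends. Both are encoding problems: the encoding used by the Java program must match the encoding specified on the JDBC connection, otherwise text gets garbled.
Setting the Java program's encoding: the startup parameter -Dfile.encoding=UTF-8 sets the default charset (java.nio.charset.Charset.defaultCharset()) to UTF-8, which corresponds to MySQL's utf8 and utf8mb4.
Setting the JDBC connection's encoding:
jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf8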
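The garbling described above is easy to reproduce without a database. The following Python sketch is an illustration rather than anything JDBC-specific: garble is a hypothetical helper that writes text in one encoding and reads the bytes back in another, which is what happens when the two ends of a connection disagree.

```python
def garble(text, written_as="utf-8", read_as="latin-1"):
    """Encode text in one encoding and decode the same bytes in another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    original = "中文"
    broken = garble(original)
    print("mismatched encodings:", repr(broken))
    # When both sides agree, the text survives the round trip.
    print("matching encodings:  ", garble(original, read_as="utf-8"))
    # ASCII text looks fine under either encoding, which can hide the problem.
    print("ASCII only:          ", garble("abc"))
```

Because pure ASCII text survives the mismatch, a misconfigured connection often goes unnoticed until the first non-ASCII character is stored or read.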
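Specifying the character set in the mysql command-line client
mysql -h 127.0.0.1 -P 3306 -u root --default-character-set=utf8mb4 -p
After connecting, the session variables related to encoding all match the character set given here.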
mysql> SHOW VARIABLES LIKE 'character_set_client';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_client | utf8mb4 |
+----------------------+---------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character_set_connection';
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_connection | utf8mb4 |
+--------------------------+---------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character_set_results';
+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| character_set_results | utf8mb4 |
+-----------------------+---------+
1 row in set, 1 warning (0.00 sec)
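Here:
- character_set_client: the character set the server uses to decode the request
- character_set_connection: the character set the server converts the request into for processing. When an operation touches a specific column, the data is then converted again to that column's character set.
- character_set_results: the character set the server uses when returning data to the client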
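MySQL has these three separate settings for the following reasons:
- One MySQL server may serve clients written in different languages and running on different operating systems or in different countries, so character_set_client and character_set_results let each client be accommodated.
- Operating on column data requires a conversion unless character_set_connection matches the column's character set. Setting character_set_connection lets MySQL interpret and process every statement in one encoding, and setting it to the most commonly used column character set reduces the number of conversions.

In most cases it is enough to keep all three the same, which is what setting the connection's character set does.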
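The role of the three variables can be pictured as a chain of conversions. The Python sketch below is a loose model of that chain, not the server's actual code path: server_round_trip is a hypothetical function, Python codec names stand in for MySQL character sets, and the "processing" step does nothing but hold the text in the connection encoding.

```python
def server_round_trip(request_bytes, client, connection, results):
    """Model how a value travels through the three connection character sets."""
    # 1. Decode the incoming bytes using character_set_client.
    text = request_bytes.decode(client)
    # 2. Convert to character_set_connection, the encoding used for processing.
    processed = text.encode(connection)
    # 3. Re-encode the result with character_set_results before sending it back.
    return processed.decode(connection).encode(results)


if __name__ == "__main__":
    # A gbk client talking to a server that processes everything as UTF-8.
    sent = "中".encode("gbk")
    received = server_round_trip(sent, client="gbk", connection="utf-8", results="gbk")
    print(sent, "->", received, "unchanged:", sent == received)
```

When all three settings are the same, every step in this chain becomes a no-op, which is why keeping them identical is the simplest configuration.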