#1, 字符集相關的參數名和概念 mysql
MySQL的字符集設置比較自由。能夠設置不少種組合,相關的變量和參數有: sql
(root@localhost)[sample3]> show global variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec)
其中global variables表示全局變量。也就是默認狀況下,新創建的數據庫若是不顯式的指定字符集相關參數,將使用這些參數。也叫作全局字符集變量數據庫
其中variables沒帶global,表示當前session生效的的參數。所謂當前session,表示若是更改過相關的參數,離開這個session之後,就會恢復默認的參數。也叫作鏈接時字符集變量安全
各個變量的概念:服務器
1,character_set_client:客戶端字符集,即數據在client端時字符集狀態。session
2,character_set_connection:鏈接時轉換字符集,即客戶端和服務端鏈接時,字符集裝換成的字符集ide
3,character_set_server:服務端處理時候使用的字符集編碼
4,character_set_database:數據庫層面存儲默認使用的字符集spa
5,character_set_results:數據返回時所用的字符集code
指定字符集參數有多種方式,
1,在編譯時指定,主要是:
-DDEFAULT_CHARSET=utf8 \
-DDEFAULT_COLLATION=utf8-general_ci \
其中CHARSET是指字符集,COLLATION是指相關的校對規則(也稱排序規則)
2,參數文件,也就是my.cnf中設定
character_set_server=utf8
collation_server=utf8_general_ci
參數人間的設定將會覆蓋編譯時設定的字符集和校對規則。
3,啓動MySQL服務的時候指定:
--character_set_server: 指定全局粒度的默認字符集
--collation_server:指定全局粒度的默認校對規則
啓動時指定的參數將覆蓋參數文件以及編譯時指定的字符集和校對規則。
這些參數,若是從大方向分的話,能夠分爲兩類:
1,鏈接時使用的字符集,即爲show variable like '%character%' 顯示的那些字符集
2,存儲時使用的字符集,分爲4個級別
1)SERVER,全局級別
2)DATABASE,數據庫級別
3)TABLE,表級別
4)column,列級別
2# MySQL查詢的基本過程和亂碼的造成以及如何避免亂碼
1)查詢的基本過程
MySQL查詢基本過程以下:
1, 程序將字符轉換成二進制格式
2,MySQL 客戶端發出查詢(client端字符集)====>到達server端鏈接器(connection字符集)====>
內部轉換並查詢(列字符集、表字符集、數據庫字符集、server端字符集,轉換優先級逐級遞減)=====>
查詢結果返回給result(result字符集)
這裏先要回答一個latin字符集爲什麼能存放漢字的問題(漢字每一個字符佔用2個字節長度,latin不支持雙字節長度)。
其實是由於OS/APP層已經將漢字轉換爲單字符串的形式。通常來講,爲了正常顯示和處理漢字,
OS層面和程序層面也必定設定了字符集,這個字符集就會將漢字先一步處理成二進制。好比OS和程序層面設置的是UTF8,
那麼實際上輸入一個漢字,咱們在mysql client端其實是得到3個單字節,而非雙字節,這樣latin字符集就能夠處理了。
這些字符集的轉換,其實是字符長度的轉換。好比若是CLIENT端時LATIN1,connection是UTF8, 那麼就會發生3個字節長度轉換成6個字節長度。每一個單字符後面都會被填0,這樣變長了天然是沒事的, 頂多顯示的時候再轉回來,把後面填的零都切掉。可是反過來,若是是client段是utf8,connection是latin1, 那麼3個字節長度的utf8就會被轉成1個字節長度的latin1,後兩位丟失,之後是怎麼都變不回來的。 這種丟失非零位的過程是不可逆的。因此咱們必須得保證設定字符集的時候 (列字符集、表字符集、數據庫字符集、server段字符集)>= connection字符集 >= client字符集來避免字符編碼丟失問題。 一種可能發生的情況: client(latin1)==>connection(utf8)==>服務端內部的存儲時字符集(latin1), 並不會發生字符編碼丟失,由於connection轉換到存儲時字符集時,只是切掉了client轉connection時後面填的零,至關於轉回來了。 3,實驗: 1)client字符集爲utf8,connection 爲latin1,存儲字符集爲latin1 (root@localhost)[sample3]> show create table test2; +-------+---------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------------------------------------------------------+ | test2 | CREATE TABLE `test2` ( `id` int(11) DEFAULT NULL, `name` char(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+---------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> insert into test2 values (1,'中國'); Query OK, 1 row affected, 1 warning (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+------+ | id | name | +------+------+ | 1 | ?? | +------+------+ 1 row in set (0.00 sec) 這種狀況,已經丟失了字符,轉換result是沒有用的。 (root@localhost)[sample3]> set character_set_results=latin1; Query OK, 0 rows affected (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+------+ | id | name | +------+------+ | 1 | ?? | +------+------+ 1 row in set (0.00 sec) 2)client字符集爲utf8,connection爲utf8,存儲字符集爲latin1 (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> (root@localhost)[sample3]> insert into test2 values (1,'中國'); ERROR 1366 (HY000): Incorrect string value: '\xE4\xB8\xAD\xE5\x9B\xBD' for column 'name' at row 1 (root@localhost)[sample3]> 這裏直接報錯,5.6之後增強了數據庫數據的安全性,由於會丟失數據,因此不容許插入。 3)client字符集爲latin1,connection爲utf8,存儲字符集爲latin1 (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+---------------+ | id | name | +------+---------------+ | 1 | ä¸å›½ | +------+---------------+ 1 row in set (0.00 sec) 亂碼了,這裏爲什麼會亂碼呢?按理說轉換過程當中只會切掉填充的零纔對。其實是result的問題。這個result是utf8, 而client存的時候就是3位,到connection轉到6位,到存儲時轉回3位,這時候到result又轉到6位,天然是亂碼的。 只要result轉回latin1,就能夠了。 (root@localhost)[sample3]> set character_set_results=latin1; Query OK, 0 rows affected (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+--------+ | id | name | +------+--------+ | 1 | 中國 | +------+--------+ 1 row in set (0.00 sec) 4)不丟失字符,可是字符集不一樣的亂碼. (root@localhost)[sample2]> show create table test; +-------+-------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-------------------------------------------------------------------------------------------+ | test | CREATE TABLE `test` ( `name` char(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | +-------+-------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) (root@localhost)[sample2]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | gbk | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample2]> insert into test values('中國'); Query OK, 1 row affected (0.01 sec) (root@localhost)[sample2]> (root@localhost)[sample2]> select * from test; +------+ | name | +------+ | | +------+ 1 row in set (0.00 sec) (root@localhost)[sample2]> set character_set_results=utf8; Query OK, 0 rows affected (0.00 sec) (root@localhost)[sample2]> select * from test; +--------+ | name | +--------+ | 中國 | +--------+ 1 row in set (0.00 sec) (root@localhost)[sample2]> 修改客戶端字符集的5中方法: 一、 運行,set names <字符集>; 二、 在SQL文件中指定set names <字符集>;,用source命令導入sql文件 如: mysql> source test.sql 三、 在SQL文件中指定set names <字符集>;,而後經過重定向符,或者-e參數來執行 [root@mysql01 3307]# vi test.sql set names utf8; select * from mysql.user; [root@mysql01 3307]# mysql -uroot -p<password> -S /data/3307/mysql.sock <test.sql [root@mysql01 3307]# mysql -uroot -p<password> -S /data/3307/mysql.sock -e "set names <字符集>; select * from <database>.<table>;" 四、 經過指定mysql命令的字符集參數實現 --default-character-set=<字符集> [root@mysql01 3307]# mysql -uroot -p<password> -S /data/3307/mysql.sock --default-chratacter-set=utf8 <database> <test.sql 五、 在配置文件裏設置客戶端即服務器端相關參數,此設置永久生效, [client] default-character-set=utf8 ###對於client參數,退出從新登陸,便可生效。 更改服務端的方法: [mysqld] default-character-set=utf8 ####5.1 character-set-server=utf8 ####5.5 [root@mysql01 3307]# mysql -uroot -p -S /data/3307/mysql.sock -e "show variables like '%chara%';" Warning: Using a password on the command line interface can be insecure. +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | +--------------------------+----------------------------+ [root@mysql01 3307]# 這些參數中,其中client,connection,results默認會跟隨系統的字符集設置,/etc/systemconfig/i18n 迷思。。。 character-set-server=utf8 collation-server=utf8_unicode_ci skip-character-set-client-handshake 設置 [client] default-character-set=utf8 [mysqld] character-set-server=utf8 [mysql] default-character-set=utf8 不能鎖定client端字符集 [client] default-character-set=utf8 [mysqld] character-set-server=utf8 skip-character-set-client-handshake ##加入忽略客戶端設置,使用服務端設置 [mysql] default-character-set=utf8 這樣設置之後,mysql客戶端字符集鎖定爲utf8