MySQL utf8 character sets, collations, and a case-insensitive approach for utf8mb4_bin columns

Date: 2019-12-10


utf8mb4 vs. utf8

utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.

utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character.

utf8: An alias for utf8mb3.
(https://dev.mysql.com/doc/ref...html)

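UTF-8 is a variable-length character encoding that uses 1 to 4 bytes per character.

mb4 stands for "max bytes 4": up to four bytes per character, which is enough to represent all of UTF-8. MySQL's utf8 is an alias for utf8mb3, which uses at most three bytes per character. It saves some space but cannot represent every UTF-8 character; it only supports the Basic Multilingual Plane (BMP).

utf8mb4 is the recommended choice.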
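
The byte counts described above can be checked directly in MySQL. The following is a minimal sketch, assuming the client sends UTF-8 and the connection character set can be switched to utf8mb4 with set names; the sample characters are chosen only for illustration. length() returns a size in bytes, while char_length() returns a count of characters.

```sql
-- Assumption: the client terminal sends UTF-8 and the server supports utf8mb4.
set names utf8mb4;

-- length() counts bytes, char_length() counts characters.
select length('a')       as ascii_bytes,  -- 1 byte
       length('ß')       as latin_bytes,  -- 2 bytes
       length('中')      as cjk_bytes,    -- 3 bytes (inside the BMP)
       length('😀')      as emoji_bytes,  -- 4 bytes (outside the BMP)
       char_length('😀') as emoji_chars;  -- 1 character
```

A column declared as utf8 (utf8mb3) cannot hold the four-byte character; depending on the SQL mode, the insert is typically rejected or the value is truncated with a warning.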

utf8mb4_unicode_ci vs. utf8mb4_general_ci

general_ci is faster; unicode_ci is more accurate.

In German and some other languages, ß is equal to ss.

unicode_ci handles this case correctly.

What exactly are the differences? See the link below.

http://mysql.rjweb.org/utf8mb...

utf8mb4_general_ci           P=p  Q=q  R=r=Ř=ř   S=s=ß=Ś=ś=Ş=ş=Š=š  sh  ss    sz
utf8mb4_unicode_ci           P=p  Q=q  R=r=Ř=ř   S=s=Ś=ś=Ş=ş=Š=š    sh  ss=ß  sz

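As the chart shows, S=ß in utf8mb4_general_ci, whereas ss=ß in utf8mb4_unicode_ci.

utf8mb4_bin distinguishes all of the characters above from one another.

general_ci does not seem to be much faster in practice, so unicode_ci is the better recommendation.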
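
The ß row of the chart can be reproduced with explicit collate clauses. This is a small sketch, assuming a utf8mb4 connection; the expected values in the comments follow the chart above.

```sql
-- Assumption: the client terminal sends UTF-8 and the server supports utf8mb4.
set names utf8mb4;

-- collate binds more tightly than =, so each comparison uses the named collation.
select 'ß' = 'ss' collate utf8mb4_unicode_ci as unicode_ss,  -- 1: ß equals ss
       'ß' = 's'  collate utf8mb4_general_ci as general_s,   -- 1: ß equals s
       'ß' = 'ss' collate utf8mb4_general_ci as general_ss,  -- 0
       'ß' = 'ss' collate utf8mb4_bin        as bin_ss,      -- 0
       'ß' = 's'  collate utf8mb4_bin        as bin_s;       -- 0
```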

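Case sensitivity

utf8mb4_bin is case-sensitive: it compares characters by their code point values.

Collations with a _cs suffix are also case-sensitive, but MySQL does not appear to ship utf8mb4_general_cs or utf8mb4_unicode_cs, possibly because of how those collation algorithms are defined. In MySQL 8.0, utf8mb4_0900_as_cs is a case-sensitive (and accent-sensitive) option.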
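
Which case-sensitive collations exist depends on the server version, so it is worth checking the server itself. A minimal sketch, assuming a utf8mb4 connection: show collation lists what is installed, and the select contrasts a case-insensitive comparison with a binary one.

```sql
-- Assumption: the client terminal sends UTF-8 and the server supports utf8mb4.
set names utf8mb4;

-- List the utf8mb4 collations on this server whose names end in "cs".
show collation like 'utf8mb4%cs';

select 'a' = 'A' collate utf8mb4_general_ci as general_ci_equal,  -- 1
       'a' = 'A' collate utf8mb4_unicode_ci as unicode_ci_equal,  -- 1
       'a' = 'A' collate utf8mb4_bin        as bin_equal;         -- 0
```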

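A case-insensitive approach for utf8mb4_bin columns

Requirement

On insert, Uman, Umān, and uman should be treated as different words.
On query, Uman, Umān, and uman should all be returned together.

Solution

Use a MySQL virtual generated column. See: "MYSQL UTF8_bin case insensitive unique index".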

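-- u8 holds the original value; comparisons on it use utf8mb4_unicode_ci
create table test_utf8_bin_ci
( u8 varchar(50) charset utf8mb4 collate utf8mb4_unicode_ci,
  -- virtual generated column: lowercased copy of u8, compared byte-wise, with a unique index
  u8_bin_ci varchar(50) charset utf8mb4 collate utf8mb4_bin as (lower(u8)) unique
);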

insert into test_utf8_bin_ci (u8)
values ('A'),('Ä'),('Å'),('Â'),('Á'),('À');
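
With the six rows above inserted, the behaviour of the two columns can be sketched as follows. The queries assume the test_utf8_bin_ci table and data exactly as defined above and a utf8mb4 connection. Note that the unique index is built on lower(u8), so values that differ only in accents can coexist, while values that differ only in case (such as 'A' and 'a') collide.

```sql
-- Assumption: the client terminal sends UTF-8 and the server supports utf8mb4.
set names utf8mb4;

-- u8 is compared with utf8mb4_unicode_ci, so all accented variants match:
-- this is expected to return all six rows.
select u8 from test_utf8_bin_ci where u8 = 'a';

-- u8_bin_ci is compared byte-wise on the lowercased value:
-- only the row inserted as 'Ä' is expected to match.
select u8 from test_utf8_bin_ci where u8_bin_ci = 'ä';

-- lower('a') is 'a', which already exists in the unique index,
-- so this insert is expected to fail with a duplicate-key error.
insert into test_utf8_bin_ci (u8) values ('a');
```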