1. How to view the character set;
2. Server-side and client-side character set settings;
3. Analysis of garbled text;
1. How to view the character set:
1.1 Query the database character set through the nls_database_parameters view (its data comes from props$):
SQL> select parameter,value from nls_database_parameters where parameter like '%CHARACTER%';

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               AL32UTF8
NLS_NCHAR_CHARACTERSET         AL16UTF16
Elapsed: 00:00:00.02
1.2 Query the current session's NLS settings through nls_session_parameters (its data comes from X$NLS_PARAMETERS):
SQL> select * from nls_session_parameters;
1.3 Query through the v$nls_parameters view (its data comes from X$NLS_PARAMETERS):
SQL> select parameter,value from v$nls_parameters where parameter like '%CHARACTER%';

PARAMETER                                                        VALUE
---------------------------------------------------------------- ------------------------------
NLS_NUMERIC_CHARACTERS                                           .,
NLS_CHARACTERSET                                                 AL32UTF8
NLS_NCHAR_CHARACTERSET                                           AL16UTF16
Elapsed: 00:00:00.01
1.4 Query through the user environment (userenv):
SQL> select userenv('language') from dual;

USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.AL32UTF8
Elapsed: 00:00:00.03
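The result contains the language (NLS_LANGUAGE), the territory (NLS_TERRITORY), and the character set (NLS_CHARACTERSET). The language and territory are those of the current session, while the character set part is the database character set, not the client's;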
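As a rough illustration of that format, the returned string can be split into its three parts. This is a minimal Python sketch rather than anything Oracle provides: `split_nls_string` is a hypothetical helper name, and it assumes the value always has the shape LANGUAGE_TERRITORY.CHARACTERSET with a single underscore between language and territory.

```python
def split_nls_string(value):
    """Split 'LANGUAGE_TERRITORY.CHARACTERSET' into its three parts.

    Hypothetical helper for illustration; assumes one '.' before the
    character set and one '_' between language and territory.
    """
    lang_territory, _, charset = value.partition(".")
    language, _, territory = lang_territory.partition("_")
    return {
        "NLS_LANGUAGE": language,
        "NLS_TERRITORY": territory,
        "NLS_CHARACTERSET": charset,
    }


if __name__ == "__main__":
    for name, part in split_nls_string("AMERICAN_AMERICA.AL32UTF8").items():
        print(name, "=", part)
```

The same split works for values whose language contains a space, such as SIMPLIFIED CHINESE_CHINA.ZHS16GBK, because only the underscore and the dot are treated as separators.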
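2. Server-side and client-side character set settings:
2.1 Server-side character set settings:
2.1.1 The new character set is a superset of the old one:
The character set is chosen when the database is created and usually follows the operating system platform's character set. It can also be changed after the database has been created, but the new character set must support the old one (that is, it must be a superset of the old character set);
Back up all data before the change, and import the data into the new character set after the change;
Steps: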
SQL> shutdown immediate
SQL> startup nomount
SQL> alter database mount exclusive;  -- mount the database in exclusive mode
SQL> alter system enable restricted session;  -- enable restricted session mode
SQL> alter system set job_queue_processes=0;  -- 'maximum number of job queue slave processes': set it to 0
SQL> alter system set aq_tm_processes=0;
SQL> alter database open;
SQL> alter database character set AL32UTF8;  -- the new character set must be a superset of the old one, otherwise: ORA-12712: new character set must be a superset of old character set
SQL> shutdown immediate
SQL> startup
After the restart, the character set has changed:
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
WARNING: AQ_TM_PROCESSES is set to 0. System operation might be adversely affected.
Completed: ALTER DATABASE OPEN
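2.1.2 The new character set is not a superset of the old one:
If the new character set is not a superset of the old one, for example WE8MSWIN1252 ==> AL32UTF8, change it as follows (test environment: ORACLE 11GR2):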
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET job_queue_processes=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;
SHUTDOWN IMMEDIATE;
STARTUP;
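To make the superset idea more concrete, the sketch below compares byte values between two encodings. It is only an illustration in Python, not how Oracle performs its check: the codec names `ascii`, `cp1252` and `utf-8` are assumed stand-ins for US7ASCII, WE8MSWIN1252 and AL32UTF8, `is_binary_superset` is a hypothetical helper, and it only examines single-byte values of the old character set.

```python
def is_binary_superset(old_codec, new_codec):
    """Return True if every single byte defined in old_codec keeps the
    same byte value when its character is encoded with new_codec.

    Illustration only: it inspects the 256 single-byte values, so it is
    meaningful just for single-byte old character sets.
    """
    for value in range(256):
        raw = bytes([value])
        try:
            char = raw.decode(old_codec)
        except UnicodeDecodeError:
            continue  # this byte is not defined in the old character set
        try:
            if char.encode(new_codec) != raw:
                return False  # same character, different stored bytes
        except UnicodeEncodeError:
            return False  # character does not exist in the new character set
    return True


if __name__ == "__main__":
    print("ascii  -> utf-8:", is_binary_superset("ascii", "utf-8"))
    print("cp1252 -> utf-8:", is_binary_superset("cp1252", "utf-8"))
```

Under these assumptions the ASCII pair passes, while the cp1252 pair fails because characters above 0x7F are stored with different bytes in UTF-8. That is the kind of mismatch that means existing data would need converting rather than just relabelling.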
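In a RAC environment, first stop the database and instances on all nodes, then start the instance and database on a single node, set cluster_database=false, and shut the database and instance down. Continue with the steps in 2.1.1 or 2.1.2, depending on the character set. Once the change has succeeded, set cluster_database=true again, shut down the database and instance on that node, and finally use srvctl to start the instances and database on all nodes: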
alter system set cluster_database=false scope=spfile;
SHUTDOWN IMMEDIATE;
# perform the steps in 2.1.1 or 2.1.2
alter system set cluster_database=true scope=spfile;
shutdown immediate;
srvctl start database -d dbname # finally, use srvctl to start the database on all nodes
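2.2 Client-side character set settings:
When a client connects to the server, it reads the NLS_LANG environment variable and other environment variables. Once NLS_LANG is set, the related settings NLS_LANGUAGE and NLS_TERRITORY change with it, because by default they are derived from NLS_LANG. Other settings (NLS_DATE_FORMAT, NLS_TIMESTAMP_FORMAT, NLS_NUMERIC_CHARACTERS, ...) in turn change with NLS_TERRITORY. On WINDOWS, NLS_LANG is set in the registry; on my machine the default value is SIMPLIFIED CHINESE_CHINA.ZHS16GBK. On LINUX it is set through the NLS_LANG environment variable. Installing with Oracle Universal Installer does not set it, and when it is unset the default value is AMERICAN_AMERICA.US7ASCII;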
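The fallback described above for LINUX can be sketched as a small lookup. This is a minimal Python illustration, not Oracle client code: `effective_nls_lang` is a hypothetical helper, the default string is the one quoted in the text, and treating an empty value as unset is an assumption. It does not model the WINDOWS registry lookup.

```python
import os

# Default quoted in the text for a client with no NLS_LANG set.
DEFAULT_NLS_LANG = "AMERICAN_AMERICA.US7ASCII"


def effective_nls_lang(environ=None):
    """Return (value, was_defaulted) for the NLS_LANG a client would see.

    Hypothetical helper: reads the given mapping (or os.environ) and
    falls back to the documented default when NLS_LANG is missing or
    empty (treating empty as unset is an assumption).
    """
    env = os.environ if environ is None else environ
    value = env.get("NLS_LANG", "").strip()
    if value:
        return value, False
    return DEFAULT_NLS_LANG, True


if __name__ == "__main__":
    value, defaulted = effective_nls_lang()
    print(value, "(default)" if defaulted else "(from environment)")
```

Running this in the same shell that starts SQL*Plus is a quick way to confirm whether the session is relying on the US7ASCII default.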
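3. Analysis of garbled text:
Take the current environment as an example: I have not set the NLS_LANG environment variable, and the database character set is AMERICAN_AMERICA.AL32UTF8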
SQL> select userenv('language') from dual;

USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.AL32UTF8
while the operating system's character set is:
[sywu@wusuyuan ~]$ locale
LANG=zh_CN.UTF-8
Both queried data and inserted data come out garbled:
SQL> select * from tb_distree;

        ID NAME
---------- ------------------------------------------------------------------
         3 ??
         3 ??
         4 ??
         5 ??
SQL> insert into tb_distree values(17,'德國');

1 row created.
The 10046 trace already shows clearly that the value is garbled on the server side:
SQL ID: 5naprsgt1dqj3
Plan Hash: 0
insert into tb_distree
values
(18,'������')


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.02       0.02          0          1          5           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.02       0.02          0          1          5           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 85

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  LOAD TABLE CONVENTIONAL  (cr=1 pr=0 pw=0 time=0 us)
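At this point the database character set is still AMERICAN_AMERICA.AL32UTF8. The problem is only that NLS_LANG is not set, so the character set the client declares does not match the one the machine actually uses (UTF-8). The official documentation states that NLS_LANG defaults to AMERICAN_AMERICA.US7ASCII when it is not set. US7ASCII itself does not support Chinese, and when the data is saved the database converts it from US7ASCII to AL32UTF8;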
SQL> select id,name,dump(name,'1016') from tb_distree;

        ID NAME       DUMP(NAME,'1016')
---------- ---------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         3 ??         Typ=1 Len=6 CharacterSet=AL32UTF8: e4,ba,91,e5,8d,97
         3 ??         Typ=1 Len=6 CharacterSet=AL32UTF8: e5,b9,bf,e4,b8,9c
         4 ??         Typ=1 Len=6 CharacterSet=AL32UTF8: e5,8c,97,e4,ba,ac
         5 ??         Typ=1 Len=6 CharacterSet=AL32UTF8: e5,9b,9b,e5,b7,9d
        18 ??????     Typ=1 Len=18 CharacterSet=AL32UTF8: ef,bf,bd,ef,bf,bd,ef,bf,bd,ef,bf,bd,ef,bf,bd,ef,bf,bd
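The hex bytes in the DUMP column can be decoded outside the database to see what is really stored. The Python sketch below is an illustration under the assumption that Python's `utf-8` codec corresponds to AL32UTF8; `decode_dump` is a hypothetical helper name.

```python
def decode_dump(hex_list, codec="utf-8"):
    """Turn the comma-separated hex bytes from DUMP(..., 1016) into text."""
    raw = bytes.fromhex(hex_list.replace(",", ""))
    return raw.decode(codec)


if __name__ == "__main__":
    row3 = decode_dump("e4,ba,91,e5,8d,97")
    row18 = decode_dump("ef,bf,bd,ef,bf,bd,ef,bf,bd,ef,bf,bd,ef,bf,bd,ef,bf,bd")
    # Print code points so the result does not depend on the terminal's encoding.
    print([hex(ord(c)) for c in row3])
    print([hex(ord(c)) for c in row18])
```

The bytes of row 3 decode to two ordinary Chinese characters, so that row is stored correctly and only its display is wrong. Row 18 decodes to six copies of U+FFFD, the Unicode replacement character, which means the original text was already lost when the row was inserted.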
No conversion I tried gets rid of the garbling:
SQL> select convert('中國','US7ASCII') from dual;

CO
--
??

SQL> select convert(convert('中國','US7ASCII'),'AL32UTF8') FROM DUAL;

CO
--
??
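Why no later conversion can bring the text back can be modelled in a few lines. This Python sketch is an approximation and not Oracle's conversion routine: it assumes the terminal sends UTF-8, that the client declares a 7-bit ASCII character set, and that every byte outside that set is turned into the replacement character before being stored as UTF-8. `simulate_insert` is a hypothetical helper.

```python
def simulate_insert(text, declared_codec="ascii"):
    """Model a UTF-8 terminal whose client declares another character set.

    Returns the bytes that would end up in a UTF-8 database under the
    assumptions stated above.
    """
    sent = text.encode("utf-8")  # bytes the terminal really sends
    # Bytes that are invalid in the declared character set are replaced.
    understood = sent.decode(declared_codec, errors="replace")
    return understood.encode("utf-8")  # what gets stored


if __name__ == "__main__":
    stored = simulate_insert("\u5fb7\u56fd")  # a two-character Chinese word
    print(len(stored), "bytes stored")
    print(stored.decode("utf-8") == "\ufffd" * 6)
```

Two Chinese characters occupy six bytes in UTF-8. Each of those bytes is outside ASCII, so each one is replaced separately, giving six replacement characters and 18 stored bytes, the same length DUMP reports for row 18. Since every input byte maps to the same replacement, nothing of the original remains to convert back.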
Set the environment variable:
[sywu@wusuyuan ~]$ export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
[sywu@wusuyuan ~]$ echo $NLS_LANG
AMERICAN_AMERICA.AL32UTF8

SQL> select * from tb_distree;

        ID NAME
---------- ----------
         3 雲南
         3 廣東
         4 北京
         5 四川
         6 重慶
         7 上海
         8 香港
        15 ������
        17 ������
        18 ������
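Now the characters display correctly, but the rows inserted before NLS_LANG=AMERICAN_AMERICA.AL32UTF8 was set are still garbled. Summary: when the client and server character sets are the same, no character set conversion takes place and the data is stored as-is. When they differ (with NLS_LANG set explicitly, or left at its default of AMERICAN_AMERICA.US7ASCII), the database has to convert characters whenever data is saved or retrieved. Setting the character sets correctly and consistently can improve database efficiency;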
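The rule in the summary can be sketched as a single function. This is a simplified Python model, not Oracle's implementation: `store` is a hypothetical helper, and the codec names `utf-8` and `gbk` are assumed stand-ins for AL32UTF8 and ZHS16GBK.

```python
def store(client_bytes, client_codec, server_codec):
    """Return the bytes a server would keep for data sent by a client.

    Simplified model: identical character sets mean the bytes pass
    through untouched; otherwise they are decoded with the client's
    character set and re-encoded with the server's.
    """
    if client_codec == server_codec:
        return client_bytes  # no conversion and no validation
    return client_bytes.decode(client_codec).encode(server_codec)


if __name__ == "__main__":
    text = "\u4e2d\u56fd"  # a two-character Chinese word
    same = store(text.encode("utf-8"), "utf-8", "utf-8")
    converted = store(text.encode("gbk"), "gbk", "utf-8")
    print(same == converted)
```

Both paths end with the same stored bytes as long as the declared client character set matches what the client really sends. The pass-through branch does no work at all, which is where the efficiency gain comes from, but it also means a wrong declaration goes unnoticed until the data is read back.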