Python Full-Stack Development, Python Basics: Character Encoding and Transcoding
Detailed articles:
http://www.cnblogs.com/yuanchenqi/articles/5956943.html
http://www.diveintopython3.net/strings.html
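Things to know:
1. In Python 2 the default encoding is ASCII; in Python 3 it is UTF-8.
2. Unicode has several encodings: UTF-32 (4 bytes per character), UTF-16 (2 or 4 bytes), and UTF-8 (1 to 4 bytes). UTF-8 is therefore one encoding of Unicode, not Unicode itself.
3. In Python 3, encode converts a str to bytes while encoding it, and decode converts bytes back to a str while decoding it.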
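The points above can be checked quickly in Python 3. This is a minimal sketch: the sample strings are arbitrary, and the `-le` codec variants are used only so that no byte order mark is added to the byte counts.

```python
# Python 3: str is Unicode text; encode() returns bytes, decode() returns str.
text = "你好"

for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)
    print(codec, type(data).__name__, len(data))

# A character outside the Basic Multilingual Plane needs 4 bytes in UTF-16.
emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")))

# decode() turns the bytes back into a str.
roundtrip = text.encode("utf-8").decode("utf-8")
print(type(roundtrip).__name__, roundtrip == text)
```

The two Chinese characters take 6 bytes in UTF-8, 4 in UTF-16, and 8 in UTF-32, which is why the byte length of a string depends on the encoding chosen.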
1. Python 2
2. Python 3
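Web crawling is probably where encoding comes up most. Sites on the internet use a wide mix of encodings; the overall trend is toward UTF-8, but the mix is still messy today, so when you crawl pages you have to convert between encodings yourself. Things are getting better, though, and we can look forward to a world that needs no transcoding.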
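For the crawler case, one common approach is to try a short list of likely encodings in order. The sketch below is only an illustration: `decode_html` and its candidate list are hypothetical names chosen here, and the page bodies are simulated rather than fetched. A real crawler would normally check the charset declared by the server or the page first.

```python
def decode_html(raw, candidates=("utf-8", "gb18030")):
    """Hypothetical helper: try each encoding in order, return (text, encoding)."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# Simulated page bodies; a real crawler would receive these bytes from the network.
utf8_page = "你好".encode("utf-8")
gbk_page = "你好".encode("gbk")

print(decode_html(utf8_page))
print(decode_html(gbk_page))
```

UTF-8 is tried first because bytes in another encoding rarely happen to be valid UTF-8, whereas GBK-family decoders accept many byte sequences that were never GBK.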
Finally, encoding is a piece of fucking shit; nobody likes it.
PS:
Python 2 usage
    [root@python2 scripts]# cat encode.py
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    #Author: nulige

    import sys
    print(sys.getdefaultencoding())

    s = "你好"                              # a UTF-8 byte string in Python 2
    s_to_unicode = s.decode("utf-8")        # bytes -> unicode
    print(s_to_unicode)
    s_to_gbk = s_to_unicode.encode("gbk")   # unicode -> GBK bytes
    print(s_to_gbk)

    gbk_to_utf8 = s_to_gbk.decode("gbk").encode("utf-8")  # GBK -> unicode -> UTF-8
    print(gbk_to_utf8)
Output:
    [root@python2 scripts]# python encode.py
    ascii     # system default encoding
    你好
    ?oí       # GBK bytes, shown garbled because the terminal does not expect GBK
    你好      # GBK converted back to UTF-8
UTF-8 is an encoding of Unicode, not an extension of it. A u"..." literal is already a unicode object, so calling decode on it is a mistake:
    [root@python2 scripts]# cat encode.py
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    #Author: nulige

    import sys
    print(sys.getdefaultencoding())

    s = u"你好"
    print(s)

    s_to_unicode = s.decode("utf-8")        # raises: s is already unicode
    print(s_to_unicode)
    s_to_gbk = s_to_unicode.encode("gbk")
    print(s_to_gbk)

    gbk_to_utf8 = s_to_gbk.decode("gbk").encode("utf-8")
    print(gbk_to_utf8)
Output:
    [root@python2 scripts]# python encode.py
    ascii
    你好      # a unicode object prints directly
    Traceback (most recent call last):
      File "encode.py", line 11, in <module>
        s_to_unicode = s.decode("utf-8")
      File "/usr/local/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
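The traceback reports an encode error even though the code called decode. In Python 2, calling decode on a unicode object first implicitly encodes it with the default codec (ascii), and that hidden step is what fails. The sketch below reproduces the hidden step explicitly in Python 3; it is an illustration, not the original script.

```python
# Python 3 sketch of the step Python 2 performed implicitly:
# unicode.decode("utf-8") first encoded the text with the default codec (ascii).
text = "你好"
error = None

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; "exc" is cleared when the except block ends
    print(type(exc).__name__, exc)

# Python 3 removes the trap: str has no decode() method at all.
print(hasattr(text, "decode"))
```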
Dropping the decode call and encoding the unicode object directly works:
    [root@python2 scripts]# cat encode.py
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    #Author: nulige

    import sys
    print(sys.getdefaultencoding())

    s = u"你好"
    print(s)

    s_to_gbk = s.encode("gbk")              # unicode -> GBK bytes
    print(s_to_gbk)

    gbk_to_utf8 = s_to_gbk.decode("gbk").encode("utf-8")  # GBK -> unicode -> UTF-8
    print(gbk_to_utf8)
Output:
    [root@python2 scripts]# python encode.py
    ascii
    你好
    ?oí
    你好
Python 3
    #!/usr/bin/env python
    #Author: nulige

    import sys
    print(sys.getdefaultencoding())

    s = "你哈"                # str is Unicode in Python 3
    s_gbk = s.encode("gbk")   # Unicode -> GBK bytes

    print(s_gbk)
    print(s.encode())         # encode() defaults to UTF-8
Output:
    utf-8                         # the Python 3 default encoding is UTF-8
    b'\xc4\xe3\xb9\xfe'           # encoded as GBK
    b'\xe4\xbd\xa0\xe5\x93\x88'
    #!/usr/bin/env python
    #Author: nulige

    import sys
    print(sys.getdefaultencoding())

    s = "你哈"
    s_gbk = s.encode("gbk")

    print(s_gbk)
    print(s.encode())

    gbk_to_utf8 = s_gbk.decode("gbk").encode("utf-8")  # GBK -> Unicode -> UTF-8
    print("utf8", gbk_to_utf8)
Output:
    utf-8
    b'\xc4\xe3\xb9\xfe'
    b'\xe4\xbd\xa0\xe5\x93\x88'
    utf8 b'\xe4\xbd\xa0\xe5\x93\x88'
Summary
Set the PyCharm file encoding to GBK
    #!/usr/bin/env python
    # -*-coding:gbk-*-
    #Author: nulige

    # To convert between different encodings, go through Unicode first
    import sys
    print(sys.getdefaultencoding())

    s = '你好'  # str is Unicode by default
    print(s.encode("gbk"))
    print(s.encode("utf-8"))
    print(s.encode("utf-8").decode("utf-8").encode("gb2312"))
    print(s.encode("utf-8").decode("utf-8").encode("gb2312").decode("gb2312"))
Output:
    utf-8
    b'\xc4\xe3\xba\xc3'
    b'\xe4\xbd\xa0\xe5\xa5\xbd'
    b'\xc4\xe3\xba\xc3'
    你好
Homework:
Python 2 or Python 3
Remember: every conversion between character sets goes through Unicode.
1. Convert GB2312 to UTF-8.
2. Convert UTF-8 to GBK.
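One way to approach both exercises in Python 3 is a small helper that always decodes to Unicode before encoding again. This is a sketch under that assumption; `transcode` is a hypothetical name introduced here, not something from the original post.

```python
def transcode(data, src, dst):
    """Hypothetical helper: convert bytes from encoding src to dst via Unicode."""
    text = data.decode(src)   # bytes -> str (Unicode)
    return text.encode(dst)   # str (Unicode) -> bytes


gb2312_bytes = "你好".encode("gb2312")

# Exercise 1: GB2312 -> UTF-8
utf8_bytes = transcode(gb2312_bytes, "gb2312", "utf-8")
# Exercise 2: UTF-8 -> GBK
gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")

print(utf8_bytes)
print(gbk_bytes)
```

Keeping the intermediate value as a str makes the rule explicit: there is no direct step from one byte encoding to another.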