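Python 2 and Python 3 still handle character encoding somewhat differently. This post sorts the issue out from start to finish.
As the figure above shows, in Python 2, converting UTF-8 to GBK requires going through unicode as an intermediate step. Here encode means encoding and decode means decoding.
utf8 --> unicode --> gbk
or gbk --> unicode --> utf8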
    # -*- coding: utf-8 -*-
    # Python 2
    s = "字符编码问题"                    # str holding UTF-8 bytes
    s_to_unicode = s.decode("utf-8")     # UTF-8 bytes -> unicode
    print(s_to_unicode)                  # 字符编码问题 (unicode)
    print(s)                             # 字符编码问题 (UTF-8)
    print(type(s_to_unicode), type(s), s_to_unicode, s)
    # Prints a single tuple:
    # (<type 'unicode'>, <type 'str'>, u'\u5b57\u7b26\u7f16\u7801\u95ee\u9898', '\xe5\xad\x97\xe7\xac\xa6\xe7\xbc\x96\xe7\xa0\x81\xe9\x97\xae\xe9\xa2\x98')

    unicode_to_gbk = s_to_unicode.encode("gbk")   # unicode -> GBK bytes
    print(unicode_to_gbk)                # garbled on a UTF-8 terminal, because the bytes are GBK
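Python 3 streamlines this. A str is already Unicode text, so a string literal needs no explicit decode step and can be encoded straight to GBK or UTF-8:
str (unicode) --> gbk
str (unicode) --> utf8
Bytes in one encoding still have to be decoded to str before they can be re-encoded in another.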
Python 3 demo:

    s = "字符编码问题python 3"      # str, i.e. Unicode text
    s_to_gbk = s.encode("gbk")
    print(s_to_gbk)    # b'\xd7\xd6\xb7\xfb\xb1\xe0\xc2\xeb\xce\xca\xcc\xe2python 3'

    s_to_utf8 = s.encode("utf8")
    print(s_to_utf8)   # b'\xe5\xad\x97\xe7\xac\xa6\xe7\xbc\x96\xe7\xa0\x81\xe9\x97\xae\xe9\xa2\x98python 3'
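When the starting point is bytes rather than a str literal, the conversion passes through str on the way. The following is a minimal sketch, assuming the input bytes are valid UTF-8; the variable names are illustrative and not from the original post.

```python
# Python 3: transcode UTF-8 bytes to GBK bytes via str (Unicode).
# Assumption: utf8_bytes stands in for data read from a file or socket.
utf8_bytes = "字符编码问题".encode("utf-8")

text = utf8_bytes.decode("utf-8")   # bytes -> str (Unicode)
gbk_bytes = text.encode("gbk")      # str -> bytes in GBK

print(gbk_bytes)

# Round trip: GBK bytes -> str -> UTF-8 bytes gives back the original data.
round_trip = gbk_bytes.decode("gbk").encode("utf-8")
print(round_trip == utf8_bytes)
```

Decoding with the wrong codec either raises UnicodeDecodeError or silently produces the wrong characters, so the source encoding has to be known in advance.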
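Code demo (Python 2, then Python 3). Note: the output displays correctly only when the source file's encoding, the string's encoding, and the terminal's encoding all agree. In the Python 2 loop, print(i) on its own does not display properly, because each i is a single byte of a multi-byte UTF-8 character.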
    # -*- coding: utf-8 -*-
    # Python 2: iterating over a str yields one byte at a time
    name = "字符"
    for i in name:
        # print(i)            # a lone byte, not displayable on its own
        print(type(i), i)

    # Result:
    # (<type 'str'>, '\xe5')
    # (<type 'str'>, '\xad')
    # (<type 'str'>, '\x97')
    # (<type 'str'>, '\xe7')
    # (<type 'str'>, '\xac')
    # (<type 'str'>, '\xa6')
    # Python 3: iterating over a str yields one character at a time
    s = "字符编码问题python 3"
    for i in s:
        print(i)

    # Result (one character per line):
    # 字
    # 符
    # 编
    # 码
    # 问
    # 题
    # p
    # y
    # t
    # h
    # o
    # n
    #   (a space)
    # 3
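To get the per-byte view that Python 2 produced, the str has to be encoded first. As a small sketch (variable names are illustrative), iterating over a Python 3 bytes object yields integers in range(256) rather than one-character strings:

```python
# Python 3: str iterates by character, bytes iterates by integer byte value.
name = "字符"

chars = list(name)                         # two characters
byte_values = list(name.encode("utf-8"))   # six integers, three per character

print(chars)
print([hex(b) for b in byte_values])       # ['0xe5', '0xad', '0x97', '0xe7', '0xac', '0xa6']
```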
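Python 3 is friendlier and more advanced than Python 2, so the rest of this post uses Python 3.
Perhaps the most important new feature of Python 3 is a much clearer distinction between text and binary data. Text is always Unicode and is represented by the str type; binary data is represented by the bytes type. Python 3 never mixes str and bytes implicitly, and that is what makes the distinction so sharp. You cannot concatenate a str with a bytes object, you cannot search for a str inside bytes (or vice versa), and you cannot pass a str to a function that expects bytes (or vice versa).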
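The separation described above can be observed directly. This is a minimal sketch with illustrative variable names; it only shows that mixing the two types raises TypeError and that an explicit decode resolves it.

```python
# Python 3: str and bytes never mix implicitly.
text = "price: "
data = "€20".encode("utf-8")     # b'\xe2\x82\xac20'

concat_error = None
search_error = None

try:
    text + data                  # str + bytes
except TypeError as exc:
    concat_error = exc
    print("concatenation failed:", exc)

try:
    text in data                 # searching for a str inside bytes
except TypeError as exc:
    search_error = exc
    print("search failed:", exc)

# Decoding explicitly makes the operation legal.
combined = text + data.decode("utf-8")
print(combined)                  # price: €20
```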
    res1 = '€20'.encode('utf-8')
    res2 = b'\xe2\x82\xac20'.decode('utf-8')
    print(res1, res2)
    # Result:
    # b'\xe2\x82\xac20' €20
First, a look at two functions: bytes() and str().
    def __init__(self, value=b'', encoding=None, errors='strict'):  # known special case of bytes.__init__
        """
        bytes(iterable_of_ints) -> bytes
        bytes(string, encoding[, errors]) -> bytes
        bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
        bytes(int) -> bytes object of size given by the parameter initialized with null bytes
        bytes() -> empty bytes object

        Construct an immutable array of bytes from:
          - an iterable yielding integers in range(256)
          - a text string encoded using the specified encoding
          - any object implementing the buffer API.
          - an integer
        # (copied from class doc)
        """
        pass

    def __init__(self, value='', encoding=None, errors='strict'):  # known special case of str.__init__
        """
        str(object='') -> str
        str(bytes_or_buffer[, encoding[, errors]]) -> str

        Create a new string object from the given object. If encoding or
        errors is specified, then the object must expose a data buffer
        that will be decoded using the given encoding and error handler.
        Otherwise, returns the result of object.__str__() (if defined)
        or repr(object).
        encoding defaults to sys.getdefaultencoding().
        errors defaults to 'strict'.
        # (copied from class doc)
        """
        pass
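The constructor forms listed in those docstrings can be tried out as follows. This is a sketch with illustrative variable names. One detail worth noting: calling str() on a bytes object without an encoding does not decode it; it returns the repr of the bytes object.

```python
# Python 3: a few of the bytes() and str() constructor forms.
b1 = bytes("€20", encoding="utf-8")   # text + encoding -> b'\xe2\x82\xac20'
b2 = bytes([0xE2, 0x82, 0xAC])        # iterable of ints in range(256)
b3 = bytes(3)                         # three null bytes: b'\x00\x00\x00'

s1 = str(b1, encoding="utf-8")        # decodes the bytes -> '€20'
s2 = str(b1)                          # no encoding given: repr text, not a decode

print(b1, b2, b3)
print(s1)
print(s2)                             # b'\xe2\x82\xac20'
```

bytes(text, encoding=...) is equivalent to text.encode(...), and str(data, encoding=...) is equivalent to data.decode(...); the method forms are the more common spelling.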
A simple exercise: display your name in binary (Python 3).
    name = "张三"
    for ch in name:
        ch_bytes = bytes(ch, encoding='utf8')   # UTF-8 bytes of one character
        for b in ch_bytes:                      # each b is an int in range(256)
            print(bin(b))
    # Result:
    # 0b11100101
    # 0b10111100
    # 0b10100000
    # 0b11100100
    # 0b10111000
    # 0b10001001
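A more compact variant of the same exercise, offered as a sketch: encode the whole string once and use format() with the "08b" format spec, which pads every byte to eight binary digits and omits the 0b prefix. The variable name bits is illustrative.

```python
# Python 3: show each UTF-8 byte of a string as an 8-digit binary string.
name = "张三"
bits = [format(b, "08b") for b in name.encode("utf-8")]
print(" ".join(bits))
# 11100101 10111100 10100000 11100100 10111000 10001001
```

Zero padding matters for bytes below 128 (plain ASCII), where bin() would print fewer than eight digits.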