PyTips 0x07 - Python Strings

Date: 2019-12-07


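Everyone who has used Python (2 & 3) has probably seen the following two error messages:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

These are the Python world's "錕斤拷" (the garbled string that Chinese readers know from mis-decoded text)!

This installment and the next few will focus on strings (str) and bytes (bytes) in Python and on converting between the two (encode/decode). It may not solve all of your garbled-text problems at once, but hopefully it will help you locate the source of a problem quickly.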

Definition

Python defines strings as follows:

Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points.

In Python 3.5, a string is an immutable sequence made up of Unicode code points:

('S' 'T' 'R' 'I' 'N' 'G')

'STRING'

Immutable means that the string itself cannot be modified in place:

s = 'Hello'
print(s[3])
s[3] = 'o'

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-2-ce8cf24852f9> in <module>()
      1 s = 'Hello'
      2 print(s[3])
----> 3 s[3] = 'o'


TypeError: 'str' object does not support item assignment
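Since item assignment is not available, the usual way around it is to build a new string. A minimal sketch of two common approaches (the variable names t, chars and u are only for illustration):

```python
s = "Hello"

# Slicing and concatenation create a new str; the original is untouched.
t = s[:3] + "o" + s[4:]
print(s)  # Hello
print(t)  # Heloo

# For many edits, go through a mutable list of characters and join at the end.
chars = list(s)
chars[3] = "o"
u = "".join(chars)
print(u)  # Heloo
```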

Sequence means that strings support the common operations shared by the sequence types (list/tuple/range):

[i.upper() for i in "hello"]

['H', 'E', 'L', 'L', 'O']
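Beyond iteration, a few more of the shared sequence operations applied to a string, as a quick sketch:

```python
s = "hello"

print(len(s))          # 5
print(s[0], s[-1])     # h o
print(s[1:4])          # ell
print("ell" in s)      # True
print(s.count("l"))    # 2
print(s * 2)           # hellohello
print(min(s), max(s))  # e o
```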

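For now, Unicode can be thought of as a very large map that records the symbols of the world's writing systems, and a code point is the coordinate of each symbol on that map (the details will be covered in later installments).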

s = '雨'
print(s)
print(len(s))
print(s.encode())

雨
1
b'\xe9\x9b\xa8'
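To make the "coordinate" idea concrete, the built-ins ord() and chr() convert between a character and its code point. The sketch below also shows that len() counts code points, while the number of bytes depends on the encoding (utf-16-le is picked here just for contrast):

```python
s = "雨"

code_point = ord(s)          # the integer "coordinate" of the character
print(hex(code_point))       # 0x96e8
print(chr(code_point) == s)  # True: chr() is the inverse of ord()

# len() counts code points; the byte count depends on the encoding chosen.
print(len(s))                      # 1
print(len(s.encode("utf-8")))      # 3
print(len(s.encode("utf-16-le")))  # 2
```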

Common operations

len: length of the string;
split & join
find & index
strip
upper & lower & swapcase & title & capitalize
endswith & startswith & is*
zfill

# split & join
s = "Hello world!"
print(",".join(s.split())) # a common split & rejoin operation

"https://github.com/rainyear/pytips".split("/", 2) # limit the number of splits

Hello,world!

['https:', '', 'github.com/rainyear/pytips']
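A few details of split and join that tend to cause surprises, sketched with made-up sample strings (line and "a,,b" are only for illustration):

```python
line = "  a  b   c "

# With no argument, split() splits on runs of whitespace and drops empty strings.
print(line.split())        # ['a', 'b', 'c']

# With an explicit separator, every occurrence counts, so empty strings appear.
print("a,,b".split(","))   # ['a', '', 'b']

# rsplit() applies the split limit from the right-hand side.
url = "https://github.com/rainyear/pytips"
print(url.rsplit("/", 1))  # ['https://github.com/rainyear', 'pytips']

# join() only accepts strings, so convert other values first.
date = "-".join(str(n) for n in [2019, 12, 7])
print(date)                # 2019-12-7
```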

s = "coffee"
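print(s.find('f'))    # search from left to right, return the first matching index
print(s.rfind('f'))   # search from right to left, return the first matching index

print(s.find('a'))    # return -1 if not found
print(s.index('a'))   # raise ValueError if not found; otherwise the same as find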

2
3
-1

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-6-59556fd9319f> in <module>()
      4 
      5 print(s.find('a'))    # return -1 if not found
----> 6 print(s.index('a'))   # raise ValueError if not found; otherwise the same as find


ValueError: substring not found
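Which of the two to use depends on how a missing substring should be handled. A minimal sketch of the three usual patterns (pos and message are illustrative names):

```python
s = "coffee"

# Membership test: the simplest check when the position is not needed.
print("a" in s)  # False

# find() signals "not found" with -1, which is also a valid index (the last
# character), so compare against -1 explicitly before using the result.
pos = s.find("a")
if pos == -1:
    print("not found")

# index() raises instead, which suits code where absence is an error.
try:
    s.index("a")
except ValueError as err:
    message = str(err)
    print(message)
```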

print(" hello world    ".strip())
print("helloworld".strip("heo"))
print("["+"          i         ".lstrip() +"]")
print("["+"          i         ".rstrip() +"]")

hello world
lloworld
[i         ]
[          i]
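Note that the argument to strip is treated as a set of characters, not as a prefix or suffix. The sketch below illustrates this; the domain www.example.com is a placeholder, and the startswith-plus-slice idiom is one possible way to remove an exact prefix:

```python
# The argument is a set of characters, so order and repetition do not matter.
print("helloworld".strip("oeh"))         # lloworld

# Stripping stops at the first character that is not in the set.
print("www.example.com".strip("w.moc"))  # example

# To drop an exact prefix, test with startswith() and slice.
url = "https://github.com/rainyear/pytips"
prefix = "https://"
if url.startswith(prefix):
    url = url[len(prefix):]
print(url)  # github.com/rainyear/pytips
```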

print("{}\n{}\n{}\n{}\n{}".format(
    "hello, WORLD".upper(),
    "hello, WORLD".lower(),
    "hello, WORLD".swapcase(),
    "hello, WORLD".capitalize(),
    "hello, WORLD".title()))

HELLO, WORLD
hello, world
HELLO, world
Hello, world
Hello, World
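Two edge cases of the case methods are worth knowing, sketched below with sample strings chosen for illustration: title() treats any non-letter as a word boundary, and casefold() is generally the safer choice for case-insensitive comparison.

```python
# title() starts a new "word" after every non-letter, including apostrophes.
print("they're here".title())       # They'Re Here

# capitalize() lowercases everything after the first character.
print("hello, WORLD".capitalize())  # Hello, world

# For case-insensitive comparison, casefold() is more thorough than lower().
print("Straße".casefold() == "STRASSE".casefold())  # True
print("Straße".lower() == "STRASSE".lower())        # False
```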

print("""
{}|{}
{}|{}
{}|{}
{}|{}
{}|{}
{}|{}
""".format(
    "Python".startswith("P"),"Python".startswith("y"),
    "Python".endswith("n"),"Python".endswith("o"),
    "i23o6".isalnum(),"1 2 3 0 6".isalnum(),
    "isalpha".isalpha(),"isa1pha".isalpha(),
    "python".islower(),"Python".islower(),
    "PYTHON".isupper(),"Python".isupper(),
))

True|False
True|False
True|False
True|False
True|False
True|False
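The is* checks have a few edge cases that the table above does not show. A short sketch, with inputs picked only for illustration:

```python
# Every is* check here is False for the empty string.
print("".isalnum(), "".isalpha(), "".isdigit())  # False False False

# isalpha() follows Unicode, so non-ASCII letters count too.
print("雨".isalpha())        # True

# islower() ignores uncased characters but needs at least one cased one.
print("python3".islower())  # True
print("123".islower())      # False

# isdigit() rejects signs and decimal points, so it is not a number validator.
print("-1".isdigit(), "1.5".isdigit())  # False False
```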

"101".zfill(8)

'00000101'
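zfill differs from plain left-padding in that it is aware of a leading sign. A brief sketch comparing it with rjust and with the format mini-language (the sample values are arbitrary):

```python
print("101".zfill(8))       # 00000101
print("-42".zfill(5))       # -0042  (zeros go after the sign)
print("-42".rjust(5, "0"))  # 00-42  (rjust does not know about signs)
print("12345".zfill(3))     # 12345  (never truncates)

# The format mini-language gives the same padding for integers.
padded = format(5, "08b")
print(padded)               # 00000101
```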

format / encode

Formatted output with format is a very useful tool and will be covered separately; encode will be covered in detail in bytes-decode-Unicode-encode-bytes.
