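When working with Lua, you often need to take a substring of a string or get its real length in characters. Lua's built-in string.sub() works on bytes, and a common Chinese character takes 3 bytes in UTF-8, so string.sub() treats it as 3 characters, and cutting through the middle of one produces garbled text. That calls for a custom implementation. The SubStringUTF8() function below is my adaptation of a SubString function. In my tests it handles strings that mix Chinese and English: every character, English or Chinese, counts as one position when computing indices, and, like the built-in string.sub(), it accepts negative indices to count from the end of the string. The other functions are helpers that SubStringUTF8() depends on, and they can also be used on their own.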
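To make the problem concrete, here is a minimal sketch that uses only the standard string library. It assumes the source file is saved as UTF-8, so the literal "中文abc" is stored as UTF-8 bytes; the variable names are illustrative only.

```lua
-- Assumes this file is saved as UTF-8.
local s = "中文abc"

-- string.len counts bytes: 3 + 3 for the two Chinese characters, 1 each for a, b, c.
local byteLength = string.len(s)
print(byteLength)  --> 9

-- string.sub also works on bytes, so asking for "the first 2 characters"
-- really returns the first 2 bytes: an incomplete, invalid piece of "中".
local broken = string.sub(s, 1, 2)
print(#broken)  --> 2

-- "中" is U+4E2D, encoded as the bytes 228 184 173. The lead byte 228
-- announces a 3-byte sequence, so a 2-byte prefix is not valid UTF-8.
print(string.byte(s, 1))  --> 228
```

Any function that takes byte indices, not just string.sub, runs into the same issue, which is why the helpers below first translate character indices into byte positions.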
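```lua
-- Extract a substring from a UTF-8 string that mixes Chinese and English.
-- Indices count characters, not bytes; negative indices count from the end,
-- as with string.sub. endIndex may be omitted (defaults to the end of the string).
function SubStringUTF8(str, startIndex, endIndex)
    if startIndex < 0 then
        -- -1 means the last character, -2 the one before it, and so on
        startIndex = SubStringGetTotalIndex(str) + startIndex + 1
    end

    if endIndex ~= nil and endIndex < 0 then
        endIndex = SubStringGetTotalIndex(str) + endIndex + 1
    end

    if endIndex == nil then
        return string.sub(str, SubStringGetTrueIndex(str, startIndex))
    else
        -- Stop at the byte just before the character that follows endIndex,
        -- so the last requested character is included in full.
        return string.sub(str, SubStringGetTrueIndex(str, startIndex),
                          SubStringGetTrueIndex(str, endIndex + 1) - 1)
    end
end
```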
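SubStringUTF8 calls SubStringGetTrueIndex (and, for negative indices, SubStringGetTotalIndex) separately for each bound, and every call rescans the string from its first byte. For comparison, here is a minimal sketch of a different approach, not the author's method: scan the string once, record the byte where each character starts, then slice with string.sub. The names utf8Sub and leadLength are hypothetical; the lead-byte ranges are copied from SubStringGetByteCount, and the explicit clamping is an addition of this sketch.

```lua
-- Hypothetical helper: byte length of a character from its lead byte,
-- using the same ranges as SubStringGetByteCount.
local function leadLength(b)
  if b >= 192 and b <= 223 then return 2 end
  if b >= 224 and b <= 239 then return 3 end
  if b >= 240 and b <= 247 then return 4 end
  return 1  -- ASCII and, like the original, any unexpected byte
end

-- Hypothetical alternative to SubStringUTF8: character-based substring.
-- Negative indices count from the end; endIndex defaults to -1 (last character).
local function utf8Sub(str, startIndex, endIndex)
  -- starts[k] = byte position where character k begins
  local starts = {}
  local i = 1
  while i <= #str do
    starts[#starts + 1] = i
    i = i + leadLength(string.byte(str, i))
  end
  local total = #starts
  starts[total + 1] = #str + 1  -- sentinel: one past the last byte

  endIndex = endIndex or -1
  if startIndex < 0 then startIndex = total + startIndex + 1 end
  if endIndex < 0 then endIndex = total + endIndex + 1 end

  -- Clamp to the valid range, as string.sub does with byte indices.
  if startIndex < 1 then startIndex = 1 end
  if endIndex > total then endIndex = total end
  if startIndex > endIndex then return "" end

  return string.sub(str, starts[startIndex], starts[endIndex + 1] - 1)
end

print(utf8Sub("中文abc", 2, 3))  --> 文a
print(utf8Sub("中文abc", -2))    --> bc
```

The trade-off is memory: the offsets table holds one entry per character, whereas the original helpers allocate nothing but walk the string again for every index.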
```lua
-- Get the real number of characters in a mixed Chinese/English UTF-8 string.
function SubStringGetTotalIndex(str)
    local curIndex = 0
    local i = 1
    local lastCount = 1
    repeat
        lastCount = SubStringGetByteCount(str, i)  -- 0 once i is past the end
        i = i + lastCount
        curIndex = curIndex + 1
    until lastCount == 0
    -- The final iteration only detected the end of the string, so don't count it.
    return curIndex - 1
end
```
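SubStringGetTotalIndex walks the string one character at a time. For well-formed UTF-8 the same count can be obtained with a common idiom: every character has exactly one byte outside the continuation range 128 to 191 (binary 10xxxxxx), so counting those bytes counts characters. A minimal sketch, with utf8Count as a hypothetical name; on malformed input (for example stray continuation bytes) it can disagree with the original, which counts such bytes as characters.

```lua
-- Count the characters in a UTF-8 string by counting the bytes that start a
-- character, i.e. every byte outside the continuation range 128..191.
local function utf8Count(str)
  -- string.gsub returns the new string and the number of matches;
  -- only the match count is needed here.
  local _, count = string.gsub(str, "[^\128-\191]", "")
  return count
end

print(utf8Count("中文abc"))  --> 5
print(utf8Count(""))         --> 0
```

On Lua 5.3 and later, the standard utf8.len(str) also returns the character count, and for malformed input it returns nil plus the position of the first invalid byte instead of guessing.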
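```lua
-- Convert a character index (1-based) into the byte position where that
-- character starts, which is what string.sub expects.
function SubStringGetTrueIndex(str, index)
    local curIndex = 0
    local i = 1
    local lastCount = 1
    repeat
        lastCount = SubStringGetByteCount(str, i)
        i = i + lastCount
        curIndex = curIndex + 1
    until curIndex >= index
    -- i now points just past character number index; step back to its first byte.
    return i - lastCount
end
```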
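SubStringGetTrueIndex maps a character index to the byte where that character starts. A minimal sketch of the same mapping written as a plain byte scan, with charToByte as a hypothetical name: it counts bytes that begin a character until it reaches the n-th one. Like the original on well-formed input, it returns #str + 1 when n is past the end and treats n below 1 as 1.

```lua
-- Return the byte position at which the n-th character (1-based) of a
-- UTF-8 string starts; past the end it returns #str + 1.
local function charToByte(str, n)
  if n < 1 then n = 1 end  -- the original also maps indices below 1 to byte 1
  local count = 0
  for i = 1, #str do
    local b = string.byte(str, i)
    -- Any byte outside 128..191 begins a new character.
    if b < 128 or b > 191 then
      count = count + 1
      if count == n then
        return i
      end
    end
  end
  return #str + 1
end

local s = "中文abc"
print(charToByte(s, 2))  --> 4
print(charToByte(s, 6))  --> 10
```

On Lua 5.3 and later, utf8.offset(str, n) from the standard library returns the same byte position for well-formed input, for n from 1 up to utf8.len(str) + 1.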
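```lua
-- Return the number of bytes used by the character that starts at byte
-- position index, judged from its lead byte; returns 0 past the end.
function SubStringGetByteCount(str, index)
    local curByte = string.byte(str, index)
    local byteCount = 1
    if curByte == nil then
        byteCount = 0                      -- past the end of the string
    elseif curByte <= 127 then
        byteCount = 1                      -- ASCII: 0xxxxxxx
    elseif curByte >= 192 and curByte <= 223 then
        byteCount = 2                      -- 110xxxxx
    elseif curByte >= 224 and curByte <= 239 then
        byteCount = 3                      -- 1110xxxx (most Chinese characters)
    elseif curByte >= 240 and curByte <= 247 then
        byteCount = 4                      -- 11110xxx
    end
    -- Any other byte (e.g. a stray continuation byte) is counted as 1.
    return byteCount
end
```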
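SubStringGetByteCount reads a character's length from its first (lead) byte: 0 to 127 is a 1-byte ASCII character, 192 to 223 starts a 2-byte sequence, 224 to 239 a 3-byte one (most Chinese characters), and 240 to 247 a 4-byte one (for example emoji). To show that table in action, here is a minimal sketch that splits a string into one-character strings using the same ranges; byteCount and utf8Chars are hypothetical names, and the source file is assumed to be saved as UTF-8.

```lua
-- Same lead-byte ranges as SubStringGetByteCount.
local function byteCount(b)
  if b >= 192 and b <= 223 then return 2 end
  if b >= 224 and b <= 239 then return 3 end
  if b >= 240 and b <= 247 then return 4 end
  return 1
end

-- Split a UTF-8 string into an array of one-character strings.
local function utf8Chars(str)
  local chars = {}
  local i = 1
  while i <= #str do
    local n = byteCount(string.byte(str, i))
    chars[#chars + 1] = string.sub(str, i, i + n - 1)
    i = i + n
  end
  return chars
end

-- "a" (1 byte), "中" (3 bytes), and U+1F600 written as escaped bytes (4 bytes)
local parts = utf8Chars("a中\240\159\152\128")
for k, c in ipairs(parts) do
  -- prints each character's position and byte length: 1 1, 2 3, 3 4
  print(k, #c)
end
```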