Original article: http://www.javashuo.com/article/p-vomruado-s.html
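A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:
1. For a 1-byte character, the first bit is 0, followed by its Unicode code.
2. For an n-byte character, the first n bits are all ones, the (n+1)th bit is 0, and it is followed by n-1 bytes whose most significant 2 bits are 10.
This is how the UTF-8 encoding works: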
Char. number range  |        UTF-8 octet sequence
   (hexadecimal)    |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
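The table can be read as a lookup from the first byte of a character to its total length. A minimal sketch in Swift, assuming a hypothetical helper named `sequenceLength(leadingByte:)` that is not part of the original solution and that checks only the bit patterns listed in the table:

```swift
/// Returns the total number of bytes in a UTF-8 character whose first byte is
/// `leadingByte`, or nil if that byte cannot start a character.
/// Hypothetical helper for illustration; only the table's bit patterns are checked.
func sequenceLength(leadingByte: UInt8) -> Int? {
    if leadingByte >> 7 == 0b0 { return 1 }       // 0xxxxxxx
    if leadingByte >> 5 == 0b110 { return 2 }     // 110xxxxx
    if leadingByte >> 4 == 0b1110 { return 3 }    // 1110xxxx
    if leadingByte >> 3 == 0b11110 { return 4 }   // 11110xxx
    return nil                                    // 10xxxxxx or 11111xxx
}

if let length = sequenceLength(leadingByte: 197) {
    print(length) // prints 2, since 197 is 11000101
}
```

A byte of the form 10xxxxxx returns nil here because it can only appear as a continuation byte, never at the start of a character.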
Given an array of integers representing the data, return whether it is a valid UTF-8 encoding.
Note:
The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.
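In code, this note usually amounts to masking each integer before inspecting its bits. A small sketch, where `lowByte` is a hypothetical helper name introduced only for illustration:

```swift
/// Keeps only the least significant 8 bits of `value`.
/// Hypothetical helper; the name is not from the original post.
func lowByte(_ value: Int) -> UInt8 {
    return UInt8(value & 0xFF)
}

// 453 is 256 + 197, so its low 8 bits are the same as those of 197.
let bytes = [197, 130, 1, 453].map(lowByte)
print(bytes) // [197, 130, 1, 197]
```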
Example 1:
data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001. Return true. It is a valid UTF-8 encoding for a 2-byte character followed by a 1-byte character.
Example 2:
data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100. Return false. The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character. The next byte is a continuation byte that starts with 10, which is correct. But the second continuation byte does not start with 10, so the sequence is invalid.
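To check the octet sequences in the two examples by hand, the integers can be printed as 8-bit binary strings. A sketch with a hypothetical `binaryOctets` helper, which is not part of the original post:

```swift
/// Formats the low 8 bits of each integer as a zero-padded binary string.
/// Hypothetical helper for illustration only.
func binaryOctets(_ data: [Int]) -> [String] {
    return data.map { value in
        let bits = String(value & 0xFF, radix: 2)
        // Left-pad with zeros so every octet shows all 8 bits.
        return String(repeating: "0", count: 8 - bits.count) + bits
    }
}

print(binaryOctets([197, 130, 1]))  // ["11000101", "10000010", "00000001"]
print(binaryOctets([235, 140, 4]))  // ["11101011", "10001100", "00000100"]
```

In the second output the last octet starts with 00 rather than 10, which is why Example 2 is rejected.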
Swift solution (104 ms):
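class Solution {
    func validUtf8(_ data: [Int]) -> Bool {
        // Number of continuation bytes still expected for the current character.
        var cnt = 0
        for d in data {
            // Only the least significant 8 bits of each integer carry data.
            let byte = d & 0xFF
            if cnt == 0 {
                // Leading byte: 110, 1110 and 11110 prefixes start 2-, 3- and 4-byte characters.
                // Any other byte with the high bit set (10xxxxxx or 11111xxx) cannot start one.
                if (byte >> 5) == 0b110 { cnt = 1 }
                else if (byte >> 4) == 0b1110 { cnt = 2 }
                else if (byte >> 3) == 0b11110 { cnt = 3 }
                else if (byte >> 7) == 1 { return false }
            } else {
                // Continuation byte: must have the form 10xxxxxx.
                if (byte >> 6) != 0b10 { return false }
                cnt -= 1
            }
        }
        // Valid only if the last character is complete.
        return cnt == 0
    }
}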
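The same check can also be phrased in terms of the number of leading 1 bits in each byte. The following is an alternative sketch, not the post's solution: `isValidUTF8` is a hypothetical name, and it relies on Swift's standard `leadingZeroBitCount` property applied to the bitwise complement of the byte. Like the solution above, it checks only the bit patterns from the problem statement.

```swift
/// Alternative sketch of the same check: count the leading 1 bits of each byte.
/// `isValidUTF8` is a hypothetical name, not the function from the post.
func isValidUTF8(_ data: [Int]) -> Bool {
    var remaining = 0  // continuation bytes still expected
    for value in data {
        let byte = UInt8(value & 0xFF)
        // The leading ones of `byte` are the leading zeros of its complement.
        let leadingOnes = (~byte).leadingZeroBitCount
        if remaining > 0 {
            if leadingOnes != 1 { return false }  // must be 10xxxxxx
            remaining -= 1
        } else {
            switch leadingOnes {
            case 0: break                              // 0xxxxxxx, 1-byte character
            case 2, 3, 4: remaining = leadingOnes - 1  // start of a 2- to 4-byte character
            default: return false                      // 10xxxxxx or 11111xxx
            }
        }
    }
    return remaining == 0
}

print(isValidUTF8([197, 130, 1]))  // true
print(isValidUTF8([235, 140, 4]))  // false
```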