Problem link: https://leetcode.com/problems...
The key to this problem is understanding what it asks.
1 byte: characters from 0 to 127 == ASCII
2 bytes: characters from 128 to 2047
3 bytes: characters from 2048 to 65535
4 bytes: characters from 65536 to 1114111
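The ranges above can be written as a small lookup. This is only a sketch: the class name Utf8Length and the method byteCount are made up for illustration, the boundaries are written in hex (0x7F = 127, 0x7FF = 2047, 0xFFFF = 65535, 0x10FFFF = 1114111), and the surrogate range that UTF-8 does not encode is ignored for simplicity.

```java
final class Utf8Length {
    /**
     * Returns how many bytes UTF-8 needs for the given code point,
     * or -1 if the value is outside 0..0x10FFFF.
     * Hypothetical helper; surrogates (0xD800..0xDFFF) are not special-cased.
     */
    static int byteCount(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) return -1;
        if (codePoint <= 0x7F) return 1;    // 0 to 127, same as ASCII
        if (codePoint <= 0x7FF) return 2;   // 128 to 2047
        if (codePoint <= 0xFFFF) return 3;  // 2048 to 65535
        return 4;                           // 65536 to 1114111
    }

    public static void main(String[] args) {
        System.out.println(byteCount('A'));     // 1
        System.out.println(byteCount(0x00E9));  // 2
        System.out.println(byteCount(0x4E2D));  // 3
        System.out.println(byteCount(0x1F600)); // 4
    }
}
```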
The leading bits of the first byte tell the length of the character in bytes:
1 byte: the 1st bit is 0
2 bytes:
1st byte: start with "110"
2nd byte: start with "10"
3 bytes:
1st byte: start with "1110"
2nd byte: start with "10"
3rd byte: start with "10"
4 bytes:
1st byte: start with "11110"
2nd byte: start with "10"
3rd byte: start with "10"
4th byte: start with "10"
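To make the bit patterns concrete, here is a rough sketch that classifies a single byte with bit masks. The class Utf8ByteKind and the method classify are names invented for this example, and the return-value convention (0 for a continuation byte, -1 for anything else) is an assumption of the sketch, not part of the problem.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ByteKind {
    /**
     * Returns the character length announced by a first byte (1 to 4),
     * 0 for a continuation byte ("10xxxxxx"), or -1 for any other pattern.
     */
    static int classify(int b) {
        b &= 0xFF;                                       // keep only the low 8 bits
        if ((b & 0b1000_0000) == 0b0000_0000) return 1;  // 0xxxxxxx
        if ((b & 0b1100_0000) == 0b1000_0000) return 0;  // 10xxxxxx
        if ((b & 0b1110_0000) == 0b1100_0000) return 2;  // 110xxxxx
        if ((b & 0b1111_0000) == 0b1110_0000) return 3;  // 1110xxxx
        if ((b & 0b1111_1000) == 0b1111_0000) return 4;  // 11110xxx
        return -1;                                       // 11111xxx
    }

    public static void main(String[] args) {
        // U+4E2D is a 3-byte character: one first byte, two continuation bytes.
        byte[] bytes = "\u4E2D".getBytes(StandardCharsets.UTF_8);
        for (byte b : bytes) {
            System.out.println(Integer.toBinaryString(b & 0xFF) + " -> " + classify(b));
        }
    }
}
```

Running main prints 3 for the first byte and 0 for the two bytes that follow it.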
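Once the meaning is clear, the problem is straightforward.
Use a single loop with three steps per iteration. The loop invariant is that data[i] is always the first byte of a new character.
Count the leading 1s in the low 8 bits of data[i] and store the count in the variable ones. The only valid values of ones are 0, 2, 3, and 4.
Starting from data[i+1], check that the top two bits of the low 8 bits are '10'. Check ones - 1 bytes in total.
Advance i to the first byte of the next character: i + ones, or i + 1 when ones is 0.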
public class Solution {
    public boolean validUtf8(int[] data) {
        /* 1. count the leading '1' bits of data[i] = ones
         * 2. check that the next ones - 1 bytes start with '10'
         * 3. advance i past the character
         * valid ones: 0, 2, 3, 4
         */
        int i = 0;
        while (i < data.length) {
            // 1. count the leading 1s in the low 8 bits
            int ones = 0;
            while (ones < 8 && ((data[i] >> (7 - ones)) & 1) == 1) {
                ones++;
            }
            // invalid ones
            if (ones == 1 || ones > 4) return false;
            // 2. move past the first byte, then check the continuation bytes
            i++;
            while (ones-- > 1) {
                if (i >= data.length || ((data[i] >> 6) & 3) != 2) return false;
                // 3. update i
                i++;
            }
        }
        return true;
    }
}
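For comparison, the same three steps can also be expressed with bit masks instead of counting 1s. This is an alternative sketch, not the solution above: the class name Utf8ValidatorSketch is made up, and the sample arrays in main are just inputs chosen to exercise the valid and invalid paths.

```java
final class Utf8ValidatorSketch {
    static boolean validUtf8(int[] data) {
        int i = 0;
        while (i < data.length) {
            int first = data[i] & 0xFF;            // only the low 8 bits matter
            int len;
            if ((first & 0x80) == 0x00) len = 1;        // 0xxxxxxx
            else if ((first & 0xE0) == 0xC0) len = 2;   // 110xxxxx
            else if ((first & 0xF0) == 0xE0) len = 3;   // 1110xxxx
            else if ((first & 0xF8) == 0xF0) len = 4;   // 11110xxx
            else return false;                          // 10xxxxxx or 11111xxx
            if (i + len > data.length) return false;    // character is cut off
            for (int k = 1; k < len; k++) {
                // every following byte must look like 10xxxxxx
                if ((data[i + k] & 0xC0) != 0x80) return false;
            }
            i += len;                                   // first byte of the next character
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(validUtf8(new int[]{197, 130, 1})); // true
        System.out.println(validUtf8(new int[]{235, 140, 4})); // false
    }
}
```

In the first input, 197 (11000101) announces a 2-byte character, 130 (10000010) is its continuation byte, and 1 is a 1-byte character. In the second, 235 (11101011) announces 3 bytes, but 4 (00000100) does not start with "10".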
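Advantages of UTF-8
implements Unicode: can encode symbols from different scripts (Chinese, ...)
web pages, XML, and JSON are often encoded in UTF-8
uses only a binary representation: 0 and 1
endianness independent
Disadvantages of UTF-8
space: some characters take more bytes than in other encodings, so the text can be larger
time: encoding and decoding take extra computation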
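The space point depends on the text. As a quick illustration (the class Utf8SizeDemo and its helper names are invented for this sketch), the following compares the encoded size of two strings in UTF-8 and UTF-16, using the standard StandardCharsets constants. UTF_16BE is used so that no byte order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

final class Utf8SizeDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as big-endian UTF-16 (no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String chinese = "\u4E2D\u6587"; // two Chinese characters
        System.out.println(utf8Bytes(ascii) + " vs " + utf16Bytes(ascii));     // 5 vs 10
        System.out.println(utf8Bytes(chinese) + " vs " + utf16Bytes(chinese)); // 6 vs 4
    }
}
```

So ASCII-heavy text is smaller in UTF-8, while text made mostly of 3-byte characters is larger than in UTF-16.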