In Python 2, calling decode('utf-8') on the string is all it takes:
>>> "\xE5\x85\x84\xE5\xBC\x9F\xE9\x9A\xBE\xE5\xBD\x93 \xE6\x9D\x9C\xE6\xAD\x8C".decode('utf-8')
u'\u5144\u5f1f\u96be\u5f53 \u675c\u6b4c'
>>> print "\xE5\x85\x84\xE5\xBC\x9F\xE9\x9A\xBE\xE5\xBD\x93 \xE6\x9D\x9C\xE6\xAD\x8C".decode('utf-8')
兄弟难当 杜歌
>>>
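The session above is Python 2. In Python 3, str has no decode method, so the escapes have to live in a bytes literal instead. A minimal sketch, assuming Python 3 (the variable names are illustrative only):

```python
# Python 3: the \x escapes must be written in a bytes literal (note the b prefix),
# because only bytes objects have a decode() method.
raw = b"\xE5\x85\x84\xE5\xBC\x9F\xE9\x9A\xBE\xE5\xBD\x93 \xE6\x9D\x9C\xE6\xAD\x8C"
text = raw.decode("utf-8")
print(text)
```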
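In Java I did not find a function that decodes these escapes directly. Once you understand how the data is encoded, though, decoding it yourself is quick. Recommended reading: http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html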
UTF-8 is one concrete way of encoding Unicode code points as bytes:
Unicode range       | UTF-8 encoding
(hexadecimal)       | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
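To make the table concrete, here is a small sketch that builds the UTF-8 bytes for a code point by hand, one branch per table row. The function name encode_code_point is made up for this illustration, and the sketch does not reject surrogate code points (D800-DFFF) the way a strict encoder would.

```python
def encode_code_point(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8, following the table row by row."""
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of Unicode range")


# U+5144 falls in the third row, so it takes three bytes.
encoded = encode_code_point(0x5144)
print(encoded.hex())  # e58584
```

The result matches the first three escapes of the sample string, \xE5\x85\x84.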
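Each \x escape is one byte of UTF-8 encoded data. Applying the rules in the table in reverse turns those bytes back into a Unicode code point, which gives the corresponding Chinese character. The conversion is simple: strip the \x, parse the hex digits as a number, then apply the appropriate bit shifts. The one thing to watch is that you must first work out from the lead byte how many bytes the UTF-8 sequence occupies. The code below is Scala: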
import scala.util.matching.Regex

// Pattern for one access-log line (not used by the decoding functions below)
val pattern = """(\d+\.\d+\.\d+\.\d+) \- (\S+) (\S+) \[([^\]]+)\] \"(\w+) (\S+) \S+\" (\S+) (\S+) \"([^\"]+)\" \"([^\"]+)\" \"([^\"]+)\" \"([^\"]+)""".r
// One or more consecutive \xHH escapes
val decodeDataPattern = """(\\x[0-9A-Fa-f]{2})+""".r

// Replace every run of \xHH escapes in utf8Str with the characters it encodes
def decodeUtf8(utf8Str: String): String =
  decodeDataPattern.replaceAllIn(utf8Str, m => Regex.quoteReplacement(decodeXdata(m.matched)))

// Decode one run of \xHH escapes (UTF-8 bytes) into a string.
// Continuation bytes are not validated, and a truncated trailing sequence is dropped.
def decodeXdata(utf8Str: String): String = {
  val result = new StringBuilder()
  var remaining = 0 // continuation bytes still expected
  var current = 0   // code point being assembled
  for (item <- utf8Str.split("\\\\x") if item.trim.nonEmpty) {
    val code = Integer.parseInt(item.trim, 16)
    if (remaining == 0) {
      // Lead byte: its high bits give the length of the sequence
      if ((code & 0x80) == 0x00) { current = code }                            // 1 byte:  0xxxxxxx
      else if ((code & 0xe0) == 0xc0) { remaining = 1; current = code & 0x1f } // 2 bytes: 110xxxxx
      else if ((code & 0xf0) == 0xe0) { remaining = 2; current = code & 0x0f } // 3 bytes: 1110xxxx
      else if ((code & 0xf8) == 0xf0) { remaining = 3; current = code & 0x07 } // 4 bytes: 11110xxx
      else { current = 0xfffd }                                                // invalid lead byte
    } else {
      // Continuation byte 10xxxxxx: shift in its low 6 bits
      current = (current << 6) | (code & 0x3f)
      remaining -= 1
    }
    if (remaining == 0) {
      val codePoint = if (Character.isValidCodePoint(current)) current else 0xfffd
      result.append(new String(Character.toChars(codePoint)))
    }
  }
  result.toString()
}
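For comparison, the same steps (strip the \x, parse the hex, read the length from the lead byte, shift in the continuation bits) can be sketched in Python 3. This is an illustration under stated assumptions, not the post's method: the names X_RUN, decode_x_run and decode_x_escapes are made up, the sample log fragment is invented, and unlike the Scala version it raises ValueError on malformed input.

```python
import re

# Hypothetical helper names; they do not come from the original post.
# Matches a run of literal backslash-x escapes such as \xE5\x85\x84 in text.
X_RUN = re.compile(r"(?:\\x[0-9A-Fa-f]{2})+")


def decode_x_run(run: str) -> str:
    # Strip the backslash-x prefixes and parse each pair of hex digits.
    byte_values = [int(h, 16) for h in run.split("\\x") if h]
    chars = []
    i = 0
    while i < len(byte_values):
        lead = byte_values[i]
        # Step 1: the high bits of the lead byte give the sequence length.
        if (lead & 0x80) == 0x00:
            length, code_point = 1, lead
        elif (lead & 0xE0) == 0xC0:
            length, code_point = 2, lead & 0x1F
        elif (lead & 0xF0) == 0xE0:
            length, code_point = 3, lead & 0x0F
        elif (lead & 0xF8) == 0xF0:
            length, code_point = 4, lead & 0x07
        else:
            raise ValueError(f"invalid UTF-8 lead byte: {lead:#04x}")
        tail = byte_values[i + 1:i + length]
        if len(tail) != length - 1:
            raise ValueError("truncated UTF-8 sequence")
        # Step 2: every continuation byte (10xxxxxx) adds its low 6 bits.
        for cont in tail:
            if (cont & 0xC0) != 0x80:
                raise ValueError(f"invalid continuation byte: {cont:#04x}")
            code_point = (code_point << 6) | (cont & 0x3F)
        chars.append(chr(code_point))
        i += length
    return "".join(chars)


def decode_x_escapes(text: str) -> str:
    # Replace each run of escapes in the text with the characters it encodes.
    return X_RUN.sub(lambda m: decode_x_run(m.group(0)), text)


# Invented log fragment containing the sample bytes as literal text.
sample = r"GET /search?q=\xE5\x85\x84\xE5\xBC\x9F\xE9\x9A\xBE\xE5\xBD\x93 \xE6\x9D\x9C\xE6\xAD\x8C HTTP/1.1"
print(decode_x_escapes(sample))
```

Passing a function rather than a string to re.sub keeps any backslashes in the decoded text from being read as replacement escapes.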