[thrift] Fixing garbled Chinese strings when using thrift in VC

Date: 2019-11-10


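Problem description:

When Apache thrift is used in VC, strings that contain Chinese characters come out garbled. The cause is that thrift sends strings as UTF-8 so that different languages can interoperate. This has no effect in Java or C#, but it is a real problem in VC. In VC, the encoding of a string follows the project's character set, which is usually multibytes or unicode. Although unicode is the recommended setting, multibyte development is still widespread in practice, so the solutions below mainly address garbled text in multibyte projects.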

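Solution

1. Manual conversion

The first solution is to convert by hand at every point of use: convert from utf-8 to multibytes when reading, and from multibytes to utf-8 when writing. This is obviously tedious and error-prone, and only suits cases where few Chinese strings are involved.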
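The conversion described above can be sketched as follows. The function names utf8_to_mb and mb_to_utf8 are the ones called in the modified library code in this article, but the original post does not show their bodies, so everything here is an assumption: the helper convert_code_page is hypothetical, the system ANSI code page (CP_ACP) is assumed to be the project's multibyte encoding, and on non-Windows builds the functions simply return their input.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

#ifdef _WIN32
// Hypothetical helper: re-encode `in` from code page `from` to code page `to`
// by going through UTF-16, which is what the Win32 conversion APIs work with.
static std::string convert_code_page(const std::string& in, UINT from, UINT to) {
    if (in.empty()) return std::string();
    const int inLen = static_cast<int>(in.size());

    // First call asks for the required UTF-16 length, second call converts.
    int wlen = MultiByteToWideChar(from, 0, in.data(), inLen, NULL, 0);
    if (wlen <= 0) return in;  // conversion failed: return the input unchanged
    std::wstring wide(static_cast<size_t>(wlen), L'\0');
    MultiByteToWideChar(from, 0, in.data(), inLen, &wide[0], wlen);

    // Same two-step pattern for UTF-16 to the target code page.
    int len = WideCharToMultiByte(to, 0, wide.data(), wlen, NULL, 0, NULL, NULL);
    if (len <= 0) return in;
    std::string out(static_cast<size_t>(len), '\0');
    WideCharToMultiByte(to, 0, wide.data(), wlen, &out[0], len, NULL, NULL);
    return out;
}
#endif

// UTF-8 (as received from thrift) to the local multibyte encoding.
inline std::string utf8_to_mb(const std::string& s) {
#ifdef _WIN32
    return convert_code_page(s, CP_UTF8, CP_ACP);
#else
    return s;  // assumption: non-Windows builds already use UTF-8
#endif
}

// Local multibyte encoding to UTF-8 (as expected by thrift).
inline std::string mb_to_utf8(const std::string& s) {
#ifdef _WIN32
    return convert_code_page(s, CP_ACP, CP_UTF8);
#else
    return s;  // assumption: non-Windows builds already use UTF-8
#endif
}
```

With manual conversion, every string field would be wrapped at the call site, for example passing mb_to_utf8(name) into a request and applying utf8_to_mb to each string read from a response.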

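2. Modify the thrift lib

To fix the problem once and for all, the thrift c++ lib can be modified to do the conversion itself. Only the TBinaryProtocol case is analyzed here; if other Protocols show the same problem, apply the same approach.

Open TBinaryProtocol.h and TBinaryProtocol.tcc, and modify the readString and writeString methods: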

template <class Transport_>
template<typename StrType>
uint32_t TBinaryProtocolT<Transport_>::readString(StrType& str) {
  uint32_t result;
  int32_t size;
  result = readI32(size);
  result += readStringBody(str, size);
  // modified by xiaosuiba
  // convert UTF-8 to multibyte after reading
#ifdef _WIN32
  str = utf8_to_mb(str);
#endif
  return result;
}

template <class Transport_>
template<typename StrType>
uint32_t TBinaryProtocolT<Transport_>::writeString(const StrType& str) {
  // modified by xiaosuiba
  // convert multibyte to UTF-8 before writing

#ifdef _WIN32
  StrType theStr = mb_to_utf8(str);
#else
  const StrType &theStr = str;
#endif

  if(theStr.size() > static_cast<size_t>((std::numeric_limits<int32_t>::max)()))
    throw TProtocolException(TProtocolException::SIZE_LIMIT);
  uint32_t size = static_cast<uint32_t>(theStr.size());
  uint32_t result = writeI32((int32_t)size);
  if (size > 0) {
    this->trans_->write((uint8_t*)theStr.data(), size);
  }
  return result + size;
}
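The writeString code above computes the length prefix from theStr, the converted string, not from the original str. The following minimal sketch shows why that matters. It assumes the default TBinaryProtocol layout, a 4-byte length in network (big-endian) byte order followed by the raw bytes. The function encode_binary_string is a hypothetical illustration and not part of thrift.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of thrift: builds the bytes that a
// binary-protocol string field occupies on the wire, assuming big-endian
// byte order for the 4-byte length prefix.
inline std::vector<uint8_t> encode_binary_string(const std::string& utf8) {
    const uint32_t size = static_cast<uint32_t>(utf8.size());
    std::vector<uint8_t> out;
    out.reserve(4 + utf8.size());

    // Length prefix, most significant byte first.
    out.push_back(static_cast<uint8_t>((size >> 24) & 0xFF));
    out.push_back(static_cast<uint8_t>((size >> 16) & 0xFF));
    out.push_back(static_cast<uint8_t>((size >> 8) & 0xFF));
    out.push_back(static_cast<uint8_t>(size & 0xFF));

    // Payload: the string bytes, copied as-is.
    out.insert(out.end(), utf8.begin(), utf8.end());
    return out;
}
```

For example, the character 中 takes three bytes in UTF-8 (E4 B8 AD) but two bytes in GBK, so a length taken from the unconverted multibyte string would not match the payload that is actually sent.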