Use the STL with Caution

Date: 2019-11-10


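I recently fixed a serious memory leak problem caused by heavy use of STL containers, and I am recording it here.

Last week we launched recommendations based on user tags. Every day a server automatically generates new tag data and copies it with scp into a designated directory. The recommendation service monitors that file and reloads and parses it whenever it changes.

The function is as follows: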

void ReposManager::ReLoadUidTagData(const std::string& file_name) {
  std::string file_content;
  bool ret = file::ReadFileToString(file_name, &file_content);
  // Temporary map: uid -> (tag -> weight). It holds every record until return.
  std::unordered_map<int, std::unordered_map<std::string, double> > uid_tags_map;
  if (ret == false) return;
  std::vector<std::string> content;
  SplitString(file_content, '\n', &content);
  // Each line has the form uid:tag=weight.
  for (auto line: content) {
    std::vector<std::string> strs;
    std::vector<std::string> vstr;
    SplitString(line, '=', &strs);
    if (strs.size() < 2) continue;
    double weight = StringToDouble(strs[1]);
    SplitString(strs[0], ':', &vstr);
    if (vstr.size() < 2) continue;
    int uid = StringToInt(vstr[0]);
    std::string tag = vstr[1];
    std::unordered_map<std::string, double>& tags_map = uid_tags_map[uid];
    tags_map[tag] = weight;
  }
  // For every uid, sort its tags by descending weight and store them.
  for (auto it = uid_tags_map.begin(); it != uid_tags_map.end(); ++it) {
    int uid = it->first;
    std::unordered_map<std::string, double>& uid_tags = uid_tags_map[uid];
    std::vector<std::string> tags;
    for (auto it = uid_tags.begin(); it != uid_tags.end(); ++it) {
      tags.push_back(it->first);
    }
    auto cmp = [&](const std::string& a, const std::string& b) {
      return uid_tags[a] > uid_tags[b];
    };
    std::sort(tags.begin(), tags.end(), cmp);
    uid_tags_[uid] = tags;
  }
}

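The file is stored as uid:tag=weight, one record per line, with a great many lines totalling about 220 MB. The function splits each line and stores the result in the member variable uid_tags_. At first glance nothing looks wrong, but the day after launch memory had climbed to 3.6 GB. The production servers have plenty of memory, but that is no reason to waste it. I remembered usage being only 2.0 GB right after startup and the initial load, so why had it grown so much? My guess was that the file update in the early morning had triggered a second parse. I restarted the service and modified the file to test, and that was indeed the cause.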

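The file is 220 MB and is loaded into memory in full; counting the extra data structures built during parsing, some memory growth is expected. But that consumption should only be temporary: local variables are destroyed when the function returns and their memory is freed. Evidently the memory behind some of these local variables was not being given back, a situation I had never run into before.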

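Searching on Google, I learned that the STL memory allocator works in several tiers, and that larger requests are served from the heap. We tend to assume that local variables are released as soon as they leave scope because most local variables live in the function's stack frame, and the whole frame is released when the function finishes. Over the weekend at home I tried releasing the memory manually, using swap on the vectors and clear on the maps. On my own machine this helped somewhat: memory stopped growing, and I thought the problem was solved.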
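A minimal sketch of the manual release described above, not the code from the service. The helper names ReleaseVector and ReleaseMap are made up for illustration. Note that vector::clear() alone keeps the capacity, which is why the swap with an empty temporary is used. Either way the memory goes back to the allocator, which may or may not hand it on to the OS.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: hands a vector's buffer back to the allocator before
// the variable goes out of scope. Swapping with an empty temporary moves the
// buffer into the temporary, which is destroyed at the end of the statement.
template <typename T>
void ReleaseVector(std::vector<T>& v) {
  std::vector<T>().swap(v);
}

// Hypothetical helper: the same idiom for a map. clear() would destroy the
// nodes; swapping with an empty map also drops the bucket array.
template <typename K, typename V>
void ReleaseMap(std::unordered_map<K, V>& m) {
  std::unordered_map<K, V>().swap(m);
}
```

Calling ReleaseVector(content) right after the parsing loop, for example, would free the split lines before the sorting phase starts rather than at function exit.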

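On Monday at the office I deployed it to the test server and found that the problem was still not solved: memory still grew considerably. Why would the same binary behave differently? After some thought, my guess was that the virtual machine and the test server are configured differently. The VM has little memory, so freed memory is reclaimed sooner; the test server has more than ten GB, memory is not under pressure, and so it goes unreclaimed for a long time. (Keeping the test environment consistent with production really is necessary; a consistent runtime environment makes problems easier to track down.)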

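On Zhihu I saw an answer recommending malloc_trim(0), and a Google search showed how to use it, but it turned out to have no effect. I then looked into malloc itself: C only specifies the interface, and the implementation differs from platform to platform. On Linux it is the malloc in glibc. I learned that Google's tcmalloc can replace the default Linux malloc, and after switching, memory dropped from as high as 3.6 GB to 2.6 GB. A great result! Going back over the code afterwards, I also felt that the layout of the stored file could be improved.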
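For reference, a minimal sketch of the malloc_trim(0) call mentioned above. malloc_trim is a glibc extension declared in <malloc.h>; the wrapper name TrimHeap and its -1 return value for other C libraries are assumptions made for this example.

```cpp
#include <cstdlib>  // on glibc this also makes the __GLIBC__ macro visible

#if defined(__GLIBC__)
#include <malloc.h>  // malloc_trim is a glibc extension, not standard C
#endif

// Hypothetical wrapper: asks glibc to return free heap memory to the OS.
// Returns 1 if some memory was released, 0 if none could be released,
// and -1 when the program is not built against glibc.
int TrimHeap() {
#if defined(__GLIBC__)
  return malloc_trim(0);
#else
  return -1;
#endif
}
```

The natural place to call it would be after a reload has finished and the temporary containers are gone. Pages that still hold live allocations cannot be returned, which may be one reason it made no visible difference here. tcmalloc, by contrast, is usually swapped in at link time and needs no code change.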

Original layout, one tag per line:

uid:tag1=weight
uid:tag2=weight
...

改成：google

uid:tag1=weight,tag2=weight,tag3=weight....

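This groups all the tags belonging to each uid onto one line, and the file shrank from 220 MB to about 150 MB. I then modified the file parsing function: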

void ReposManager::ReLoadUidTagData(const std::string& file_name) {
  std::string file_content;
  bool ret = file::ReadFileToString(file_name, &file_content);
  if (ret == false) return;
  // Reused for every line, so only one uid's tags are held at a time.
  std::vector<std::pair<std::string, double>> tag_list;
  using namespace std;
  auto cmp = [](const pair<string, double>& a, const pair<string, double>& b) {
    return a.second > b.second;
  };
  std::vector<std::string> content;
  SplitString(file_content, '\n', &content);
  // Each line has the form uid:tag1=weight,tag2=weight,...
  for (auto line: content) {
    if (line.length() == 0) continue;
    std::vector<std::string> strs;
    SplitString(line, ':', &strs);
    if (strs.size() < 2) continue;
    int uid = StringToInt(strs[0]);
    std::vector<std::string> tags;
    SplitString(strs[1], ',', &tags);
    tag_list.clear();
    for (std::size_t i = 0; i < tags.size(); i++) {
      std::vector<std::string> tmp;
      SplitString(tags[i], '=', &tmp);
      if (tmp.size() < 2) continue;
      std::string tag = tmp[0];
      double weight = StringToDouble(tmp[1]);
      tag_list.push_back(std::pair<string, double>(tag, weight));
    }
    // Sort by descending weight, then replace the stored tags for this uid.
    std::sort(tag_list.begin(), tag_list.end(), cmp);
    uid_tags_[uid].clear();
    for (std::size_t i = 0; i < tag_list.size(); ++i) {
      uid_tags_[uid].push_back(tag_list[i].first);
    }
  }
}

Compared with the version before the change, this removes one local STL variable:

std::unordered_map<int, std::unordered_map<std::string, double> > uid_tags_map;

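With the change done, I tested it on the test server: memory usage stabilised at about 1.5 GB, and repeated reloads no longer caused growth. Problem solved, deployed to production.

Anyone can write code; writing high-performance, high-quality code is where the real skill lies.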
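The per-line work of the revised parser can be sketched in self-contained form. This is an illustration under assumptions, not the service code: Split stands in for the SplitString helper, whose real implementation is not shown in this post, ParseLine is a made-up name, and the tag names in the usage note are sample values.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SplitString: splits s on every occurrence of sep.
std::vector<std::string> Split(const std::string& s, char sep) {
  std::vector<std::string> out;
  std::size_t start = 0;
  while (true) {
    std::size_t pos = s.find(sep, start);
    if (pos == std::string::npos) {
      out.push_back(s.substr(start));
      break;
    }
    out.push_back(s.substr(start, pos - start));
    start = pos + 1;
  }
  return out;
}

// Parses one line of the form "uid:tag1=w1,tag2=w2,...". On success it
// stores the uid, fills *tags sorted by descending weight, and returns true.
// Returns false for a line without ':' or with a non-numeric uid or weight.
bool ParseLine(const std::string& line, int* uid,
               std::vector<std::string>* tags) {
  std::size_t colon = line.find(':');
  if (colon == std::string::npos) return false;
  std::vector<std::pair<std::string, double>> weighted;
  try {
    *uid = std::stoi(line.substr(0, colon));
    for (const std::string& item : Split(line.substr(colon + 1), ',')) {
      std::size_t eq = item.find('=');
      if (eq == std::string::npos) continue;  // skip items without a weight
      weighted.emplace_back(item.substr(0, eq),
                            std::stod(item.substr(eq + 1)));
    }
  } catch (const std::exception&) {
    return false;  // std::stoi or std::stod rejected the text
  }
  std::sort(weighted.begin(), weighted.end(),
            [](const std::pair<std::string, double>& a,
               const std::pair<std::string, double>& b) {
              return a.second > b.second;
            });
  tags->clear();
  for (auto& p : weighted) tags->push_back(std::move(p.first));
  return true;
}
```

For the sample line 42:music=0.5,sports=0.9,news=0.1 this yields uid 42 and the tags in the order sports, music, news. Only one line's worth of temporaries is alive at any time, which is the point of the new file layout.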

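Summary:

malloc()/free() are part of the C standard, but ANSI C does not specify how they must be implemented. Each platform (Windows, macOS, Linux, and so on) implements these two functions differently.
On Linux, malloc()/free() are implemented by glibc. Memory released by STL containers is sometimes not returned directly to the OS; it only goes back to the allocator.
With large amounts of data, be careful about heavy use of local STL containers. The container object itself is allocated on the stack, but the storage it manages is allocated on the heap. That memory is not necessarily released promptly, and it may cause fragmentation.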
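The last point can be made visible with a small counting allocator. This is a sketch for illustration only; CountingAllocator, g_live_bytes and HeapBytesHeldByLocalVector are names invented here. It shows that a local vector takes its storage from the heap and gives it back to the allocator on destruction. Whether glibc then returns those pages to the OS is a separate question that this sketch does not measure.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Bytes currently held through CountingAllocator (illustrative global).
static std::size_t g_live_bytes = 0;

// Minimal allocator that forwards to operator new/delete and keeps a count.
template <typename T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    g_live_bytes += n * sizeof(T);
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t n) noexcept {
    g_live_bytes -= n * sizeof(T);
    ::operator delete(p);
  }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
  return false;
}

// The vector object is a local (stack) variable, but its n elements live in
// heap storage. Returns the heap bytes held while the vector is alive.
std::size_t HeapBytesHeldByLocalVector(std::size_t n) {
  std::vector<int, CountingAllocator<int>> local(n);
  return g_live_bytes;
}
```

After HeapBytesHeldByLocalVector returns, g_live_bytes is back at zero: the container did its part, and anything still resident in the process is being kept by the allocator underneath.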
