STL中map與hash_map容器的選擇

時間 2019-12-18

標籤 stl map hash 容器選擇简体版

原文原文鏈接

先看看alvin_lee 朋友作的解析，我以爲仍是很正確的，從算法角度闡述了他們之間的問題！html

實際上這個問題不光C++會遇到，其餘全部語言的標準容器的實現及選擇上都是要考慮的。作應用程序你可能以爲影響不大，可是寫算法或者核心代碼就要當心了。今天改進代碼，順便又來溫習基礎功課了。ios

　　還記得Herb Sutter那極有味道的《C++對話系列》麼，在其中《產生真正的hash對象》這個故事裏就講了map的選擇。順便回顧一下，也講一下我在實用中的理解。c++

　　選擇map容器，是爲了更快的從關鍵字查找到相關的對象。與使用list這樣的線性表容器相比，一能夠簡化查找的算法，二可使任意的關鍵字作索引，並與目標對象配對，優化查找算法。在C++的STL中map是使用樹來作查找算法，這種算法差很少至關與list線性容器的折半查找的效率同樣，都是O (log2N)，而list就沒有map這樣易定製和操做了。算法

　　相比hash_map，hash_map使用hash表來排列配對，hash表是使用關鍵字來計算表位置。當這個表的大小合適，而且計算算法合適的狀況下，hash表的算法複雜度爲O(1)的，可是這是理想的狀況下的，若是hash表的關鍵字計算與表位置存在衝突，那麼最壞的複雜度爲O(n)。數據結構

　　那麼有了這樣的認識，咱們應該怎麼樣選用算法呢？前兩天看Python文章的時候，不知道哪一個小子說Python的map比c++的map快，如何如何的。可是他並不知道Python是默認使用的 hash_map，並且這些語言特徵本質上是使用c/c++寫出來的，問題在與算法和手段，而不是在於語言自己的優劣，你熟悉了各類算法，各類語言的細節、設計思想，還能在這偏激的嚷嚷孰好孰壞(片面與偏激的看待事物只能代表愚昧與無知，任何事物都有存在的價值，包括技術)。顯然C++的STL默認使用樹結構來實現map，是有考究的。ide

　　樹查找，在總查找效率上比不上hash表，可是它很穩定，它的算法複雜度不會出現波動。在一次查找中，你能夠判定它最壞的狀況下其複雜度不會超過O(log2N)。而hash表就不同，是O(1)，仍是O(N)，或者在其之間，你並不能把握。倘若你在開發一個供外部調用的接口，其內部有關鍵字的查找，可是這個接口調用並不頻繁，你是會但願其調用速度快、但不穩定呢，仍是但願其調用時間平均、且穩定呢。反之倘若你的程序須要查找一個關鍵字，這個操做很是頻繁，你但願這些操做在整體上的時間較短，那麼hash表查詢在總時間上會比其餘要短，平均操做時間也會短。這裏就須要權衡了。函數

　　這裏總結一下，選用map仍是hash_map，關鍵是看關鍵字查詢操做次數，以及你所須要保證的是查詢整體時間仍是單個查詢的時間。若是是要不少次操做，要求其總體效率，那麼使用hash_map，平均處理時間短。若是是少數次的操做，使用 hash_map可能形成不肯定的O(N)，那麼使用平均處理時間相對較慢、單次處理時間恆定的map，考慮總體穩定性應該要高於總體效率，由於前提在操做次數較少。若是在一次流程中，使用hash_map的少數操做產生一個最壞狀況O(N)，那麼hash_map的優點也所以喪盡了。oop

下面先看一段代碼，從Codeproject的 Jay Kint：post

// familiar month example used
// mandatory contrived example to show a simple point
// compiled using MinGW gcc 3.2.3 with gcc -c -o file.o
// file.cpp性能

#include <string>
#include <ext/hash_map>
#include <iostream>

using namespace std;
// some STL implementations do not put hash_map in std
using namespace __gnu_cxx;

hash_map<const char*, int> days_in_month;

class MyClass {
   static int totalDaysInYear;
public:
   void add_days( int days ) { totalDaysInYear += days; }
   static void printTotalDaysInYear(void)
   {
       cout << "Total Days in a year are "
           << totalDaysInYear << endl;
   }
};

int MyClass::totalDaysInYear = 0;

int main(void)
{
   days_in_month["january"] = 31;
   days_in_month["february"] = 28;
   days_in_month["march"] = 31;
   days_in_month["april"] = 30;
   days_in_month["may"] = 31;
   days_in_month["june"] = 30;
   days_in_month["july"] = 31;
   days_in_month["august"] = 31;
   days_in_month["september"] = 30;
   days_in_month["october"] = 31;
   days_in_month["november"] = 30;
   days_in_month["december"] = 31;

   // ERROR: This line doesn't compile.
   accumulate( days_in_month.begin(), days_in_month.end(),
       mem_fun( &MyClass::add_days ));

MyClass::printTotalDaysInYear();

return 0;
}

固然上面的代碼徹底可使用STL來實現：

引用

Standard C++ Solutions
The Standard C++ Library defines certain function adaptors, select1st, select2nd and compose1, that can be used to call a single parameter function with either the key or the data element of a pair associative container.

select1st and select2nd do pretty much what their respective names say they do. They return either the first or second parameter from a pair.

compose1 allows the use of functional composition, such that the return value of one function can be used as the argument to another. compose1(f,g) is the same as f(g(x)).

Using these function adaptors, we can use for_each to call our function.

hash_map my_map;
for_each( my_map.begin(), my_map.end(),
compose1( mem_fun( &MyType::do_something ),
select2nd MyType>::value_type>()));
Certainly, this is much better than having to define helper functions for each pair, but it still seems a bit cumbersome, especially when compared with the clarity that a comparable for loop has.

for( hash_map::iterator i =
my_map.begin();
i != my_map.end(), ++i ) {

i->second.do_something();
}
Considering it was avoiding the for loop for clarity's sake that inspired the use of the STL algorithms in the first place, it doesn't help the case of algorithms vs. hand written loops that the for loop is more clear and concise.

with_data and with_key
with_data and with_key are function adaptors that strive for clarity while allowing the easy use of the STL algorithms with pair associative containers. They have been parameterized much the same way mem_fun has been. This is not exactly rocket science, but it is quickly easy to see that they are much cleaner than the standard function adaptor expansion using compose1 and select2nd.

Using with_data and with_key, any function can be called and will use the data_type or key_type as the function's argument respectively. This allows hash_map, map, and any other pair associative containers in the STL to be used easily with the standard algorithms. It is even possible to use it with other function adaptors, such as mem_fun.

hash_map my_vert_buffers;

void ReleaseBuffers(void)
{
   // release the vertex buffers created so far.
   std::for_each( my_vert_buffers.begin(),
       my_vert_buffers.end(),
       with_data( boost::mem_fn(
           &IDirect3DVertexBuffer9::Release )));
}
Here boost::mem_fn is used instead of mem_fun since it recognizes the __stdcall methods used by COM, if the BOOST_MEM_FN_ENABLE_STDCALL macro is defined.

另外添加一些實戰的例子：
鏈接是:
http://blog.sina.com.cn/u/4755b4ee010004hm

摘錄以下:

引用

一直都用的STL的map,直到最近庫裏數據量急劇增大,聽別的作檢索的同窗說到hash_map,一直都打算換回來,今天好好作了個實驗測試了哈hash_map的功能,效果以及與map比較的性能.
     首先,要說的是這兩種數據結構的都提供了KEY-VALUE的存儲和查找的功能.可是實現是不同的,map是用的紅黑樹,查詢時間複雜度爲log (n),而hash_map是用的哈希表.查詢時間複雜度理論上能夠是常數,可是消耗內存大,是一種以存儲換時間的方法.
     就應用來講,map已是STL標準庫的東西,但是hash_map暫時還未進入標準庫,但也是很是經常使用也很是重要的庫.
     此次所作的測試是對於100W及的文件列表,去重的表現,便是對文件名string,作map!
用到的頭文件:

#include <time.h>    //計算時間性能用
     #include <ext/hash_map>   //包含hash_map 的頭文件
     #include <map>             //stl的map
     using namespace std;        //std 命名空間
     using namespace __gnu_cxx;    //而hash_map是在__gnu_cxx的命名空間裏的

//測試3個環節:用map的效率,hash_map系統hash函數的效率及自寫hash函數的效率.

    11 struct str_hash{      //自寫hash函數
    12     size_t operator()(const string& str) const
    13     {
    14         unsigned long __h = 0;
    15         for (size_t i = 0 ; i < str.size() ; i ++)
    16         {
    17             __h = 107*__h + str[i];
    18         }
    19         return size_t(__h);
    20     }
    21 };

    23 //struct str_hash{    //自帶的string hash函數
    24 //        size_t operator()(const string& str) const
    25 //      {
    26 //          return __stl_hash_string(str.c_str());
    27 //      }
    28 //};

    30 struct str_equal{      //string 判斷相等函數
    31     bool operator()(const string& s1,const string& s2) const
    32     {
    33         return s1==s2;
    34     }
    35 };

//用的時候
    37 int main(void)
    38 {
    39     vector<string> filtered_list;
    40     hash_map<string,int,str_hash,str_equal> file_map;
    41     map<string,int> file2_map;
    42     ifstream in("/dev/shm/list");
    43     time_t now1 = time(NULL);
    44     struct tm * curtime;
    45     curtime = localtime ( &now1 );
    46     cout<<now1<<endl;
    47     char ctemp[20];
    48     strftime(ctemp, 20, "%Y-%m-%d %H:%M:%S" , curtime);
    49     cout<<ctemp<<endl;
    50     string temp;
    51     int i=0;
    52     if(!in)
    53     {
    54         cout<<"open failed!~"<<endl;
    55     }
    56     while(in>>temp)
    57     {
    58         string sub=temp.substr(0,65);
    59         if(file_map.find(sub)==file_map.end())
    60 //      if(file2_map.find(sub)==file2_map.end())
    61         {
    62             file_map[sub]=i;
    63 //          file2_map[sub]=i;
    64             filtered_list.push_back(temp);
    65             i++;
    66 //          cout<<sub<<endl;
    67         }
    68     }
    69     in.close();
    70     cout<<"the total unique file number is:"<<i<<endl;
    71     ofstream out("./file_list");
    72     if(!out)
    73     {
    74         cout<<"failed open"<<endl;
    75     }
    76     for(int j=0;j<filtered_list.size();j++)
    77     {
    78         out<<filtered_list[j]<<endl;
    79     }
    80     time_t now2=time(NULL);
    81     cout<<now2<<endl;
    82     curtime = localtime ( &now2 );
    83     strftime(ctemp, 20, "%Y-%m-%d %H:%M:%S" , curtime);
    84     cout<<now2-now1<<"\t"<<ctemp<<endl;
    85     return 0;
    86 }

引用

得出來的結論是:(文件list有106W,去重後有51W)
   1.map完成去重耗時34秒
   2.hash_map用系統自帶的函數,耗時22秒
   3.hash_map用本身寫的函數,耗時14秒
      測試結果充分說明了hash_map比map的優點,另外,不一樣的hash函數對性能的提高也是不一樣的,上述hash函數爲一同窗,測試N多數據後得出的經驗函數.
能夠預見,當數量級越大時越能體現出hash_map的優點來!~

固然最後做者的結論是錯誤的，hash_map的原理理解錯誤！從第一個朋友的回答就能夠體會到這個問題！

最後對於C++Builder用戶，應該經過如下方法添加:
#include "stlport\hash_map"
才能夠正確的使用hash_map

本文來自CSDN博客，轉載請標明出處：http://blog.csdn.net/skyremember/archive/2008/09/18/2941076.aspx