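Tair (Taobao Pair, where "Pair" means a key-value pair) backs the cache data of nearly all of Taobao's systems. It ships with three storage engines: mdb (the default, similar to Memcache), rdb (similar to Redis) and ldb (high-performance KV storage). The first two are meant for caching, while ldb is meant for persistent storage. Tair is a distributed system made up of one central control node (the Config Server) and a set of service nodes (the Data Servers). The Config Server manages and maintains the state of every Data Server. The Data Servers provide the data services and report their own status to the Config Server through heartbeats. The Config Server is a lightweight control point that can be deployed in a master-slave pair for reliability, and all Data Servers are peers. Persistent data is stored on disk; to guard against data loss from a failed disk, Tair lets you configure the number of copies of each piece of data and automatically places the different copies on different hosts.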
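The heartbeat relationship between the Config Server and the Data Servers can be pictured with a small sketch. The class below is hypothetical and is not Tair's implementation: it assumes a Data Server is treated as down once no heartbeat has arrived within a fixed timeout, and the server names and timeout value are made up for illustration.

```python
import time
from typing import Callable, Dict, List


class HeartbeatTracker:
    """Toy model of a Config Server tracking Data Server heartbeats (hypothetical)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.last_seen: Dict[str, float] = {}

    def on_heartbeat(self, server: str) -> None:
        # Record when this Data Server last reported in.
        self.last_seen[server] = self.clock()

    def alive_servers(self) -> List[str]:
        # Alive = heard from within the timeout window; everything else is presumed down.
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items() if now - t <= self.timeout_s)
```

For example, with a 2-second timeout, a server that last reported 3 seconds ago drops out of alive_servers() while one that reported 1.5 seconds ago stays in.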
1. System environment: CentOS 6.5 (64-bit)
2. Tair Server is written in C++; the steps below build it from source on Linux and install it with make install.
1) The wiki lists automake, autoconf and libtool as dependencies. CentOS 6.5 already includes them; if any is missing, install it with yum:
Command: yum install libtool
2) Install boost-devel
Command: yum install boost-devel
3) Install gcc-c++
Command: yum install gcc-c++
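Before building, it can help to confirm that the tools installed above are on PATH. A minimal sketch using Python's shutil.which; the binary names are assumptions about what the packages provide (plus make and svn, which later steps use), and boost-devel is not checked because it installs headers rather than a program.

```python
import shutil
from typing import Iterable, List

# Binaries assumed to come from the packages above, plus make and svn used in later steps.
REQUIRED_TOOLS = ["automake", "autoconf", "libtool", "g++", "make", "svn"]


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can be installed with the yum commands listed above.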
4. Check out the tb-common-utils and Tair sources from SVN
SVN URL for tb-common-utils: http://code.taobao.org/svn/tb-common-utils/trunk
SVN URL for Tair: http://code.taobao.org/svn/tair/trunk
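2) Make build.sh (in the tb-common-utils directory) executable, otherwise you may get a "Permission denied" error:
Command: chmod +x build.sh
Set TBLIB_ROOT to the directory tb-common-utils should be installed into, then run build.sh. The Tair build later locates tb-common-utils through the same variable, so keep it set in that shell:
Command: export TBLIB_ROOT="/home/glf/tairlib"
Command: ./build.sh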
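2) Make bootstrap.sh (in the Tair source directory) executable, otherwise you may get a "Permission denied" error:
Command: chmod +x bootstrap.sh
3) Run ./bootstrap.sh
4) Run ./configure
5) Run make
6) Run make install. On success Tair is installed to /root/tair_bin, which completes the Tair server installation.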
1. Server plan:
Name | IP | Port
Config Server (Master) | | 5198
Config Server (Slave) | | 5200
Data Server A | | 5191
Data Server B | | 5192
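See the conf files below for how the IPs and ports are configured. Note that a Config Server's heartbeat port is Port+1: with Port=5198 the heartbeat port defaults to 5199, so leave these ports free when assigning the others and make sure no port is used twice.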
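The port rule above is easy to get wrong once everything runs on one machine. The sketch below lists each server's service port and heartbeat port and reports any port claimed twice. It assumes all four servers share one host. The Config Server heartbeat ports follow the Port+1 rule from the text, 6191 is the heartbeat_port from the sample dataserver.conf further down, and 6192 for Data Server B is a hypothetical choice.

```python
from typing import Dict, List, Tuple

# (name, service port, heartbeat port). Config Server heartbeat = port + 1 per the text;
# 6191 comes from the sample dataserver.conf; 6192 for Data Server B is a hypothetical choice.
PORT_PLAN: List[Tuple[str, int, int]] = [
    ("Config Server (Master)", 5198, 5198 + 1),
    ("Config Server (Slave)", 5200, 5200 + 1),
    ("Data Server A", 5191, 6191),
    ("Data Server B", 5192, 6192),
]


def find_port_clashes(plan: List[Tuple[str, int, int]]) -> Dict[int, List[str]]:
    """Return {port: [owners]} for every port used more than once on the (single) host."""
    owners: Dict[int, List[str]] = {}
    for name, port, heartbeat_port in plan:
        owners.setdefault(port, []).append(name)
        owners.setdefault(heartbeat_port, []).append(f"{name} heartbeat")
    return {port: who for port, who in owners.items() if len(who) > 1}
```

For this plan find_port_clashes(PORT_PLAN) returns an empty dict. Moving the slave to 5199 would make it collide with the master's heartbeat port, which is exactly the mistake the text warns about.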
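2. Make 4 copies of the tair_bin folder and rename them as follows. All later configuration is done in these 4 copies; the original tair_bin is kept unmodified (it took enough effort to get installed :) ).
tair_bin_cs1: directory for the Config Server (Master)
tair_bin_cs2: directory for the Config Server (Slave)
tair_bin_ds1: directory for Data Server A
tair_bin_ds2: directory for Data Server B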
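3. In each of the 4 copies, create (mkdir) a data folder and a logs folder for the paths set in the configuration files. (It is unclear whether this is required; the service may create the paths set in the conf automatically at startup.)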
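Steps 2 and 3 can be scripted. A minimal sketch, assuming the installed tair_bin sits directly under a base directory (/root in this walkthrough) and that "make 4 copies" means copying the whole tree; the function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

# Directory names from the steps above.
COPIES = ["tair_bin_cs1", "tair_bin_cs2", "tair_bin_ds1", "tair_bin_ds2"]


def prepare_copies(base: Path, source: str = "tair_bin") -> List[Path]:
    """Copy base/source into each COPIES directory and create data/ and logs/ inside it."""
    created = []
    for name in COPIES:
        target = base / name
        if not target.exists():
            # copytree reads from the original tair_bin and never modifies it
            shutil.copytree(base / source, target)
        for sub in ("data", "logs"):
            (target / sub).mkdir(exist_ok=True)
        created.append(target)
    return created
```

On the server this would be called as prepare_copies(Path("/root")); rerunning it is harmless because existing copies are left alone.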
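4. Each server's etc directory contains the following sample files, created at install time; each file already documents its configuration items:
"configserver.conf.default" (used by the Config Server)
"dataserver.conf.default" (used by the Data Server)
"group.conf.default" (used by the Config Server)
"invalserver.conf.default" (not used here)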
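5. Configure the Config Servers
1) In the etc directories of tair_bin_cs1 and tair_bin_cs2, rename "configserver.conf.default" to "configserver.conf" and "group.conf.default" to "group.conf"; these become the servers' actual configuration files.
2) Open tair_bin_cs1/etc/configserver.conf and configure it as below. The first config_server line is the master server and the second is the slave server.
Use absolute paths for log_file, pid_file, group_file and data_dir, and run ifconfig to find the current network interface's dev_name and IP.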
#
# tair 2.3 --- configserver config
#
[public]
config_server=
config_server=

[configserver]
port=5198
log_file=/root/tair_bin_cs1/logs/config.log
pid_file=/root/tair_bin_cs1/logs/config.pid
log_level=warn
group_file=/root/tair_bin_cs1/etc/group.conf
data_dir=/root/tair_bin_cs1/data/data
dev_name=eth1
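Python's configparser rejects the repeated config_server key by default (strict mode), so a tiny hand-rolled reader is handy for checking these files. The sketch below is a hypothetical helper, not part of Tair. It treats everything after # as a comment and keeps repeated keys in order, so the first config_server is the master and the second the slave. The addresses in the test data are placeholders.

```python
from typing import Dict, List, Optional


def parse_tair_conf(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse a Tair-style .conf into {section: {key: [values in file order]}}."""
    sections: Dict[str, Dict[str, List[str]]] = {}
    current: Optional[Dict[str, List[str]]] = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, including inline ones
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            # Repeated keys (e.g. config_server) are appended, not overwritten.
            current.setdefault(key.strip(), []).append(value.strip())
    return sections
```

The same helper can check the Data Server requirement that dataserver.conf lists the same [public] config_server lines as configserver.conf: parse both files and compare their ["public"]["config_server"] lists.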
3) Configure tair_bin_cs1/etc/group.conf as below; its main job is to register the Data Servers' IPs and ports.
#group name
[group_1]
# data move is 1 means when some data serve down, the migrating will be start.
# default value is 0
_data_move=0
#_min_data_server_count: when data servers left in a group less than this value, config server will stop serve for this group
#default value is copy count.
_min_data_server_count=1
#_plugIns_list=libStaticPlugIn.so
_build_strategy=1 #1 normal 2 rack
_build_diff_ratio=0.6 #how much difference is allowd between different rack
# diff_ratio = |data_sever_count_in_rack1 - data_server_count_in_rack2| / max (data_sever_count_in_rack1, data_server_count_in_rack2)
# diff_ration must less than _build_diff_ratio
_pos_mask=65535 # 65535 is 0xffff this will be used to gernerate rack info. 64 bit serverId & _pos_mask is the rack info,
_copy_count=1
_bucket_number=1023
# accept ds strategy. 1 means accept ds automatically
_accept_strategy=1

# data center A
_server_list=
_server_list=

#quota info
_areaCapacity_list=0,1124000;
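The two rack-related formulas in the comments above can be checked by hand. A minimal sketch that evaluates them exactly as the comments state them; they only come into play with _build_strategy=2 (rack), not the normal strategy used here. How Tair packs a server's address into the 64-bit serverId is not covered in this walkthrough, so the serverId used below is an arbitrary number.

```python
def rack_diff_ratio(rack1_count: int, rack2_count: int) -> float:
    """diff_ratio = |n1 - n2| / max(n1, n2), as written in the group.conf comment."""
    biggest = max(rack1_count, rack2_count)
    return 0.0 if biggest == 0 else abs(rack1_count - rack2_count) / biggest


def racks_balanced(rack1_count: int, rack2_count: int, build_diff_ratio: float = 0.6) -> bool:
    # "diff_ration must less than _build_diff_ratio"
    return rack_diff_ratio(rack1_count, rack2_count) < build_diff_ratio


def rack_info(server_id: int, pos_mask: int = 65535) -> int:
    # "64 bit serverId & _pos_mask is the rack info"; 65535 == 0xffff keeps the low 16 bits.
    return server_id & pos_mask
```

With 3 and 2 Data Servers in two racks the ratio is 1/3, which is below 0.6; with 4 and 1 it is 0.75 and the build would be rejected. rack_info(0x00001F470A000005) keeps only the low 16 bits and yields 5.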
4) The Config Server (Slave) configuration is essentially the same; just change the paths and IP in its conf files.
configserver.conf:
#
# tair 2.3 --- configserver config
#
[public]
config_server=
config_server=

[configserver]
port=5200
log_file=/root/tair_bin_cs2/logs/config.log
pid_file=/root/tair_bin_cs2/logs/config.pid
log_level=warn
group_file=/root/tair_bin_cs2/etc/group.conf
data_dir=/root/tair_bin_cs2/data/data
dev_name=eth1
group.conf:
#group name
[group_1]
# data move is 1 means when some data serve down, the migrating will be start.
# default value is 0
_data_move=0
#_min_data_server_count: when data servers left in a group less than this value, config server will stop serve for this group
#default value is copy count.
_min_data_server_count=1
#_plugIns_list=libStaticPlugIn.so
_build_strategy=1 #1 normal 2 rack
_build_diff_ratio=0.6 #how much difference is allowd between different rack
# diff_ratio = |data_sever_count_in_rack1 - data_server_count_in_rack2| / max (data_sever_count_in_rack1, data_server_count_in_rack2)
# diff_ration must less than _build_diff_ratio
_pos_mask=65535 # 65535 is 0xffff this will be used to gernerate rack info. 64 bit serverId & _pos_mask is the rack info,
_copy_count=1
_bucket_number=1023
# accept ds strategy. 1 means accept ds automatically
_accept_strategy=1

# data center A
_server_list=
_server_list=

#quota info
_areaCapacity_list=0,1124000;
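5) Both Config Servers are now configured.
6. Configure the Data Servers (mdb engine by default)
1) In the etc directories of both Data Servers, rename "dataserver.conf.default" to "dataserver.conf"; this becomes the server's actual configuration file.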
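The renaming in the Config Server and Data Server steps is mechanical. A minimal sketch, assuming the directory layout created earlier; the role codes "cs" and "ds" and the function name are hypothetical. It skips a file when the target already exists, so it is safe to run twice.

```python
from pathlib import Path
from typing import Dict, List

# Sample files each role turns into its live configuration, per the renaming steps above.
ACTIVE_CONFIGS: Dict[str, List[str]] = {
    "cs": ["configserver.conf", "group.conf"],
    "ds": ["dataserver.conf"],
}


def activate_configs(server_dir: Path, role: str) -> List[Path]:
    """Rename etc/<name>.default to etc/<name>; return the files that were renamed."""
    etc = server_dir / "etc"
    renamed = []
    for name in ACTIVE_CONFIGS[role]:
        src, dst = etc / f"{name}.default", etc / name
        # Skip if the sample is gone or the live file already exists (safe to rerun).
        if src.exists() and not dst.exists():
            src.rename(dst)
            renamed.append(dst)
    return renamed
```

For example, activate_configs(Path("/root/tair_bin_cs1"), "cs") and activate_configs(Path("/root/tair_bin_ds1"), "ds").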
2) Open tair_bin_ds1/etc/dataserver.conf and configure it as below. Note that the two config_server lines in the [public] section must match the Config Server configuration.
#
# tair 2.3 --- tairserver config
#
[public]
config_server=
config_server=

[tairserver]
#
#storage_engine:
#
# mdb
# kdb
# ldb
#
storage_engine=mdb
local_mode=0
#
#mdb_type:
# mdb
# mdb_shm
#
mdb_type=mdb_shm
#
# if you just run 1 tairserver on a computer, you may ignore this option.
# if you want to run more than 1 tairserver on a computer, each tairserver must have their own "mdb_shm_path"
#
#
mdb_shm_path=/mdb_shm_path01
#tairserver listen port
port=5191
heartbeat_port=6191
process_thread_num=16
#
#mdb size in MB
#
slab_mem_size=1024
log_file=/root/tair_bin_ds1/logs/server.log
pid_file=/root/tair_bin_ds1/logs/server.pid
log_level=warn
dev_name=eth1
ulog_dir=/root/tair_bin_ds1/data/ulog
ulog_file_number=3
ulog_file_size=64
check_expired_hour_range=2-4
check_slab_hour_range=5-7
dup_sync=1
do_rsync=0
# much resemble json format
# one local cluster config and one or multi remote cluster config.
# {local:[master_cs_addr,slave_cs_addr,group_name,timeout_ms,queue_limit],remote:[...],remote:[...]}
rsync_conf={local:[,,group_local,2000,1000],remote:[,,group_remote,2000,3000]}
# if same data can be updated in local and remote cluster, then we need care modify time to
# reserve latest update when do rsync to each other.
rsync_mtime_care=0
# rsync data directory(retry_log/fail_log..)
rsync_data_dir=/root/tair_bin_ds1/data/remote
# max log file size to record failed rsync data, rotate to a new file when over the limit
rsync_fail_log_size=30000000
# whether do retry when rsync failed at first time
rsync_do_retry=0
# when doing retry, size limit of retry log's memory use
rsync_retry_log_mem_size=100000000

[fdb]
# in MB
index_mmap_size=30
cache_size=256
bucket_size=10223
free_block_pool_size=8
data_dir=/root/tair_bin_ds1/data/fdb
fdb_name=tair_fdb

[kdb]
# in byte
map_size=10485760 # the size of the internal memory-mapped region
bucket_size=1048583 # the number of buckets of the hash table
record_align=128 # the power of the alignment of record size
data_dir=/root/tair_bin_ds1/data/kdb # the directory of kdb's data

[ldb]
#### ldb manager config
## data dir prefix, db path will be data/ldbxx, "xx" means db instance index.
## so if ldb_db_instance_count = 2, then leveldb will init in
## /data/ldb1/ldb/, /data/ldb2/ldb/. We can mount each disk to
## data/ldb1, data/ldb2, so we can init each instance on each disk.
data_dir=/root/tair_bin_ds1/data/ldb
## leveldb instance count, buckets will be well-distributed to instances
ldb_db_instance_count=1
## whether load backup version when startup.
## backup version may be created to maintain some db data of specifid version.
ldb_load_backup_version=0
## whether support version strategy.
## if yes, put will do get operation to update existed items's meta info(version .etc),
## get unexist item is expensive for leveldb. set 0 to disable if nobody even care version stuff.
ldb_db_version_care=1
## time range to compact for gc, 1-1 means do no compaction at all
ldb_compact_gc_range = 3-6
## backgroud task check compact interval (s)
ldb_check_compact_interval = 120
## use cache count, 0 means NOT use cache,`ldb_use_cache_count should NOT be larger
## than `ldb_db_instance_count, and better to be a factor of `ldb_db_instance_count.
## each cache mdb's config depends on mdb's config item(mdb_type, slab_mem_size, etc)
ldb_use_cache_count=1
## cache stat can't report configserver, record stat locally, stat file size.
## file will be rotate when file size is over this.
ldb_cache_stat_file_size=20971520
## migrate item batch size one time (1M)
ldb_migrate_batch_size = 3145728
## migrate item batch count.
## real batch migrate items depends on the smaller size/count
ldb_migrate_batch_count = 5000
## comparator_type bitcmp by default
# ldb_comparator_type=numeric
## numeric comparator: special compare method for user_key sorting in order to reducing compact
## parameters for numeric compare. format: [meta][prefix][delimiter][number][suffix]
## skip meta size in compare
# ldb_userkey_skip_meta_size=2
## delimiter between prefix and number
# ldb_userkey_num_delimiter=:
####
## use blommfilter
ldb_use_bloomfilter=1
## use mmap to speed up random acess file(sstable),may cost much memory
ldb_use_mmap_random_access=0
## how many highest levels to limit compaction
ldb_limit_compact_level_count=0
## limit compaction ratio: allow doing one compaction every ldb_limit_compact_interval
## 0 means limit all compaction
ldb_limit_compact_count_interval=0
## limit compaction time interval
## 0 means limit all compaction
ldb_limit_compact_time_interval=0
## limit compaction time range, start == end means doing limit the whole day.
ldb_limit_compact_time_range=6-1
## limit delete obsolete files when finishing one compaction
ldb_limit_delete_obsolete_file_interval=5
## whether trigger compaction by seek
ldb_do_seek_compaction=0
## whether split mmt when compaction with user-define logic(bucket range, eg)
ldb_do_split_mmt_compaction=0

#### following config effects on FastDump ####
## when ldb_db_instance_count > 1, bucket will be sharded to instance base on config strategy.
## current supported:
## hash : just do integer hash to bucket number then module to instance, instance's balance may be
## not perfect in small buckets set. same bucket will be sharded to same instance
## all the time, so data will be reused even if buckets owned by server changed(maybe cluster has changed),
## map : handle to get better balance among all instances. same bucket may be sharded to different instance based
## on different buckets set(data will be migrated among instances).
ldb_bucket_index_to_instance_strategy=map
## bucket index can be updated. this is useful if the cluster wouldn't change once started
## even server down/up accidently.
ldb_bucket_index_can_update=1
## strategy map will save bucket index statistics into file, this is the file's directory
ldb_bucket_index_file_dir=/root/tair_bin_ds1/data/bindex
## memory usage for memtable sharded by bucket when batch-put(especially for FastDump)
ldb_max_mem_usage_for_memtable=3221225472
####

#### leveldb config (Warning: you should know what you're doing.)
## one leveldb instance max open files(actually table_cache_ capacity, consider as working set, see `ldb_table_cache_size)
ldb_max_open_files=65535
## whether return fail when occure fail when init/load db, and
## if true, read data when compactiong will verify checksum
ldb_paranoid_check=0
## memtable size
ldb_write_buffer_size=67108864
## sstable size
ldb_target_file_size=8388608
## max file size in each level. level-n (n > 0): (n - 1) * 10 * ldb_base_level_size
ldb_base_level_size=134217728
## sstable's block size
# ldb_block_size=4096
## sstable cache size (override `ldb_max_open_files)
ldb_table_cache_size=1073741824
##block cache size
ldb_block_cache_size=16777216
## arena used by memtable, arena block size
#ldb_arenablock_size=4096
## key is prefix-compressed period in block,
## this is period length(how many keys will be prefix-compressed period)
# ldb_block_restart_interval=16
## specifid compression method (snappy only now)
# ldb_compression=1
## compact when sstables count in level-0 is over this trigger
ldb_l0_compaction_trigger=1
## write will slow down when sstables count in level-0 is over this trigger
## or sstables' filesize in level-0 is over trigger * ldb_write_buffer_size if ldb_l0_limit_write_with_count=0
ldb_l0_slowdown_write_trigger=32
## write will stop(wait until trigger down)
ldb_l0_stop_write_trigger=64
## when write memtable, max level to below maybe
ldb_max_memcompact_level=3
## read verify checksum
ldb_read_verify_checksums=0
## write sync log. (one write will sync log once, expensive)
ldb_write_sync=0
## bits per key when use bloom filter
#ldb_bloomfilter_bits_per_key=10
## filter data base logarithm. filterbasesize=1<<ldb_filter_base_logarithm
#ldb_filter_base_logarithm=12

3) Configure tair_bin_ds2/etc/dataserver.conf the same way, with port=5192 and every tair_bin_ds1 path changed to tair_bin_ds2. If both Data Servers run on the same machine, also give Data Server B its own heartbeat_port and, as the comment in the file says, its own mdb_shm_path.
4) Both Data Servers are now configured as well.
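Two of the [ldb] comments above describe rules that are easy to express in code: ldb_use_cache_count must not exceed ldb_db_instance_count and should divide it, and the "hash" bucket-to-instance strategy hashes the bucket number and takes it modulo the instance count. A minimal sketch of both follows. The multiplicative integer hash is a stand-in chosen for illustration, not Tair's actual hash. These settings only matter with storage_engine=ldb; the mdb setup in this walkthrough does not use them.

```python
from typing import List


def check_ldb_cache_count(instance_count: int, use_cache_count: int) -> List[str]:
    """Check ldb_use_cache_count against the rules written in the [ldb] comments."""
    problems = []
    if use_cache_count > instance_count:
        problems.append("ldb_use_cache_count should NOT be larger than ldb_db_instance_count")
    elif use_cache_count > 0 and instance_count % use_cache_count != 0:
        problems.append("ldb_use_cache_count is better a factor of ldb_db_instance_count")
    return problems  # 0 means "no cache", which is always allowed


def hash_bucket_to_instance(bucket: int, instance_count: int) -> int:
    """'hash' strategy as the comment describes it: integer-hash the bucket number,
    then take it modulo the instance count. The hash itself is a hypothetical stand-in."""
    h = (bucket * 2654435761) & 0xFFFFFFFF  # Knuth-style multiplicative hash, 32-bit
    return h % instance_count
```

The sample values (1 instance, 1 cache) pass the check. Because the hash depends only on the bucket number, a bucket always lands on the same instance no matter which other buckets the server owns, which is why the comment says data can be reused after the cluster changes.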
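1. In a terminal on any one of the servers, run set_shm.sh (requires root) to change the system's memory allocation policy so that the programs can use enough shared memory.
Command: ./set_shm.sh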
2. In a terminal, enter each of the 2 Data Server directories and run the tair.sh script to start the server. Note: start the Data Servers first and the Config Servers afterwards; the wiki explains why.
Command: ./tair.sh start_ds
Because the Config Servers are not running yet, the log shows heartbeat errors until they come up, followed by "config server HOST UP" messages:
[2014-12-19 19:28:42.336703] ERROR handlePacket (heartbeat_thread.cpp:141) [140335215126272] ControlPacket, cmd:3
[2014-12-19 19:28:43.341952] ERROR handlePacket (heartbeat_thread.cpp:141) [140335215126272] ControlPacket, cmd:2
[2014-12-19 19:28:43.341982] ERROR handlePacket (heartbeat_thread.cpp:141) [140335215126272] ControlPacket, cmd:3
[2014-12-19 19:28:44.345308] WARN update_server_table (tair_manager.cpp:1397) [140334767929088] updateServerTable, size: 2046
[2014-12-19 19:28:44.345312] WARN handlePacket (heartbeat_thread.cpp:212) [140335215126272] config server HOST UP:
[2014-12-19 19:28:44.345350] WARN handlePacket (heartbeat_thread.cpp:212) [140335215126272] config server HOST UP:
3. In a terminal, enter each of the 2 Config Server directories and start the server by running the tair.sh script.
Command: ./tair.sh start_cs
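The start order (Data Servers before Config Servers) can be captured in a small script. A minimal sketch, assuming the four directories live under /root as in this walkthrough, that set_shm.sh has already been run, and that tair.sh is invoked from each server's top-level directory. startup_plan only builds the ordered command list; run_plan actually executes it.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

DATA_SERVER_DIRS = ["tair_bin_ds1", "tair_bin_ds2"]
CONFIG_SERVER_DIRS = ["tair_bin_cs1", "tair_bin_cs2"]


def startup_plan(base: Path = Path("/root")) -> List[Tuple[Path, List[str]]]:
    """Ordered (working directory, command) pairs: Data Servers first, then Config Servers."""
    plan = [(base / d, ["./tair.sh", "start_ds"]) for d in DATA_SERVER_DIRS]
    plan += [(base / c, ["./tair.sh", "start_cs"]) for c in CONFIG_SERVER_DIRS]
    return plan


def run_plan(plan: List[Tuple[Path, List[str]]]) -> None:
    for cwd, argv in plan:
        # check=True raises CalledProcessError and stops at the first failing start.
        subprocess.run(argv, cwd=cwd, check=True)
```

On the server, run_plan(startup_plan()) would start all four processes in the required order.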
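2. In a terminal, enter the sbin folder of any server directory and run the tairclient command to connect to the Tair servers: the -c option connects to a configserver and the -s option connects to a dataserver.
Connect with the command shown in the figure below; "group_1" is the default value from the group.conf configuration file.
After connecting to the tair configserver you can add a key-value pair with the put command, e.g. put key1 helloworld, and then read the value back with the get command, as in the figure below:
You can also connect to a single tair dataserver to see only the data stored on that DS. The figure below shows that the DataServer on Port=5191 returns the value of key1 while the DS on 5192 does not: with _copy_count=1 every key is stored on exactly one Data Server.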
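Why only one Data Server returns key1 can be illustrated with a bucket table. Assume each key is hashed into one of _bucket_number buckets and each bucket has a single owning Data Server when _copy_count=1. Under those assumptions, a key lives on exactly one DS. The sketch uses a CRC32 stand-in hash and a round-robin bucket table; both are hypothetical simplifications, not Tair's real hash or table-building algorithm.

```python
import zlib
from typing import Dict, List

BUCKET_NUMBER = 1023  # _bucket_number from group.conf


def bucket_of(key: bytes, bucket_number: int = BUCKET_NUMBER) -> int:
    # Stand-in hash (CRC32); Tair's real key hash is not described in this walkthrough.
    return zlib.crc32(key) % bucket_number


def build_table(servers: List[str], bucket_number: int = BUCKET_NUMBER) -> Dict[int, str]:
    # Hypothetical round-robin bucket -> owner table; with _copy_count=1 each bucket has one owner.
    return {b: servers[b % len(servers)] for b in range(bucket_number)}


def owner_of(key: bytes, table: Dict[int, str]) -> str:
    return table[bucket_of(key, len(table))]


if __name__ == "__main__":
    table = build_table(["ds_a:5191", "ds_b:5192"])
    print(bucket_of(b"key1"), owner_of(b"key1", table))
```

Which of the two servers owns key1 here depends entirely on the stand-in hash, so the result will not necessarily match the 5191 result in the figure.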
For load balancing, the Tair ConfigServer distributes data across the Data Servers with a consistent hashing algorithm; see the wiki for details.
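For readers new to consistent hashing, a textbook hash ring with virtual nodes is sketched below. It is a generic illustration of the technique, not Tair's implementation; the node names, the MD5-based ring position and the virtual-node count are assumptions. Its key property is that adding a node only moves keys onto the new node, and every other key keeps its owner.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative, not Tair's code)."""

    def __init__(self, nodes: List[str], vnodes: int = 100):
        points: List[Tuple[int, str]] = []
        for node in nodes:
            # Each physical node gets `vnodes` points on the ring to smooth the distribution.
            for i in range(vnodes):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._points = points
        self._hashes = [h for h, _ in points]

    @staticmethod
    def _hash(s: str) -> int:
        # First 8 bytes of MD5 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # Walk clockwise: the first point at or after the key's hash owns it (wrapping at the end).
        idx = bisect.bisect_left(self._hashes, self._hash(key)) % len(self._hashes)
        return self._points[idx][1]
```

For example, HashRing(["ds_a:5191", "ds_b:5192"]).node_for("key1") names one of the two servers. Adding "ds_c:5193" later reassigns only the keys whose ring positions now fall just before one of ds_c's points.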