A persistent, network resilient, full text search library for the browser and Node.js
https://github.com/fergiemcdowall/norch
https://github.com/fergiemcdowall/search-index
It uses LevelDB to store the index, though I am not yet sure whether LevelDB is a good fit for an inverted index. Projects with a similar approach:
https://github.com/patrickfrey/strus
Library implementing the storage and the query evaluation for a text search engine. It relies on a key-value store database interface to store its data. Currently there exists an implementation based on the Google LevelDB library. http://www.project-strus.net
Usage: https://www.codeproject.com/articles/1059766/building-a-search-engine-with-python-tornado-and-s
Written in Go and highly rated, also with LevelDB as the underlying storage:
https://github.com/blevesearch/bleve Powerful; supports facets and more
https://github.com/huichen/wukong 2000+ stars. Appears to use BoltDB for storage and supports distributed deployment. A source-code analysis is at https://ayende.com/blog/171745/code-reading-wukong-full-text-search-engine, which shows that it indexes in memory and supports persistence to a KV database
https://github.com/basho/riak_search/ Probably uses sets in Riak (a KV store) to hold the inverted index. Unfortunately it is written in Erlang
https://github.com/victorparmar/zsearch Also built on LevelDB, and it looks impressive: Low Data fragmentation and good random write performance by using levelDB Log Structured Merge Trees. High performance query speed by using CompressedBitmap to store DocumentIds in an InvertedIndex interface provided by a simple libEvent2 http server.
https://github.com/daviddengcn/gcse Apparently also LevelDB
https://github.com/eugeneware/fulltext-engine
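On the question of whether an ordered key-value store like LevelDB suits an inverted index, one common layout is to make each posting its own key, `term + separator + doc_id`, so that looking up a term becomes a prefix range scan. The following is a minimal sketch of that idea only. `SortedKV` is a hypothetical in-memory stand-in for a sorted store (it is not the LevelDB API), and the key layout is an assumption, not taken from any of the projects listed above.

```python
import bisect

SEP = b"\x00"  # assumed separator; terms are assumed never to contain a NUL byte


class SortedKV:
    """Hypothetical in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []   # keys kept in sorted byte order
        self._vals = {}

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan_prefix(self, prefix):
        # Seek to the first key >= prefix, then walk forward while it matches.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


def index_document(kv, doc_id, text):
    # One key per (term, doc_id); big-endian ids keep postings sorted by doc id.
    for term in set(text.lower().split()):
        kv.put(term.encode("utf-8") + SEP + doc_id.to_bytes(8, "big"), b"")


def search(kv, term):
    prefix = term.lower().encode("utf-8") + SEP
    return [int.from_bytes(k[len(prefix):], "big") for k, _ in kv.scan_prefix(prefix)]


def search_all(kv, terms):
    # AND query: intersect the posting lists of every term.
    sets = [set(search(kv, t)) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []
```

With this layout, adding a document is a batch of small writes and a term lookup is a single sequential scan, which is the access pattern a log-structured store handles well. Very long posting lists would still need chunking or compression, which this sketch leaves out.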
Projects that use MongoDB for inverted-index storage:
https://github.com/c-bata/gosearch Underneath it is essentially addToSet; see https://github.com/c-bata/gosearch/blob/master/models/index.go
https://github.com/hemslo/poky-engine A simple search engine in python using Tornado, Scrapy, Redis and MongoDB
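To make the addToSet idea concrete: MongoDB's `$addToSet` update operator appends a value to an array only if it is not already there, which is exactly what a posting list needs when a page is re-indexed. The sketch below emulates that behaviour with a plain dict instead of a MongoDB driver. The document shape (`term`, `doc_ids`) and the function names are assumptions for illustration and are not copied from gosearch.

```python
def add_to_set(collection, term, doc_id):
    """Emulate an upsert with $addToSet: append doc_id only if it is absent."""
    posting = collection.setdefault(term, {"term": term, "doc_ids": []})
    if doc_id not in posting["doc_ids"]:
        posting["doc_ids"].append(doc_id)


def index_page(collection, doc_id, text):
    # One posting document per term; repeated terms and re-indexing are harmless.
    for term in text.lower().split():
        add_to_set(collection, term, doc_id)


def lookup(collection, term):
    return list(collection.get(term.lower(), {}).get("doc_ids", []))
```

The appeal of this design is that indexing stays idempotent without any read-modify-write logic in the application. The cost is that each term's posting list lives in one document, so very frequent terms produce very large documents.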
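From 360:
From LinkedIn:
https://github.com/linkedin/indextank-engine Quite powerful; supports facets and more. Indexes both in memory and on files. Worth studying closely when time permits. The underlying files probably support compression
https://github.com/gigablast/open-source-search-engine The search engine used by http://www.gigablast.com/. Written in C++, though the code looks a little messy. Also supports persisting the index to a database
Distributed:
https://github.com/izenecloud/sf1r-lite Uses nginx for load balancing; the underlying inverted index appears to be built with Hadoop. Architecture diagram: https://raw.githubusercontent.com/izenecloud/sf1r/master/docs/source/images/sf1r.png
Worth a look: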
https://github.com/aaalgo/donkey
https://github.com/groonga/groonga Groonga is an open-source fulltext search engine and column store. It lets you write high-performance applications that require fulltext search. Column store?
Ruby:
https://github.com/mrkamel/search_cop Uses a SQL database; supports SQL statements plus full-text search. Search engine like fulltext query support for ActiveRecord
Projects that use Lucene for full-text search:
The search engine behind the Go Search site (http://go-search.org/search?q=hello):
https://github.com/daviddengcn/gcse Uses https://github.com/daviddengcn/go-index for indexing; the latter is worth studying when time permits
Projects that build the inverted index on plain files:
https://github.com/bradleypeabody/fulltext Pure-Go full text indexer and search library
Go port of Lucene:
https://github.com/philipsoutham/golucy
https://github.com/ipfs-search/ipfs-search Uses Elasticsearch 5 (ES5) for search
Projects that store the inverted index in SQLite:
https://github.com/gansidui/gose
https://github.com/rsesek/usda-ndb Searches food composition data
https://github.com/devict/magopie Searches BT torrents
https://github.com/yieldbot/ferret Ferret is a search engine that unifies search results from Github, Slack, Trello and more
https://github.com/ndmitchell/hoogle Written in Haskell
https://github.com/BitFunnel/BitFunnel
https://github.com/Maxime2/dataparksearch
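For the SQLite-backed approach that opens this group, a rough sketch of what "inverted index in SQLite" can look like is a single postings table keyed on `(term, doc_id)`, with an AND query expressed as `GROUP BY ... HAVING`. The schema, table name, and function names below are assumptions made for illustration. They are not taken from gose or any other project in the list.

```python
import sqlite3


def open_index(path=":memory:"):
    conn = sqlite3.connect(path)
    # The composite primary key doubles as the term -> doc_id lookup index.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        " term TEXT NOT NULL,"
        " doc_id INTEGER NOT NULL,"
        " PRIMARY KEY (term, doc_id))"
    )
    return conn


def add_document(conn, doc_id, text):
    rows = [(term, doc_id) for term in set(text.lower().split())]
    with conn:  # commit all postings of one document in a single transaction
        conn.executemany(
            "INSERT OR IGNORE INTO postings (term, doc_id) VALUES (?, ?)", rows
        )


def query_all(conn, terms):
    """Return ids of documents that contain every term, in ascending order."""
    terms = sorted({t.lower() for t in terms})
    if not terms:
        return []
    marks = ",".join("?" for _ in terms)
    sql = (
        "SELECT doc_id FROM postings WHERE term IN (%s) "
        "GROUP BY doc_id HAVING COUNT(*) = ? ORDER BY doc_id" % marks
    )
    return [row[0] for row in conn.execute(sql, terms + [len(terms)])]
```

Because each `(term, doc_id)` pair is unique, counting matching rows per document is enough to tell whether all query terms are present.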
Others:
https://github.com/KunBetter/GridSearch real-time grid search engine; I do not know how it works
https://github.com/kanatohodets/carbonsearch search engine for graphite metrics
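https://github.com/carrot2/carrot2 May make use of ES and Solr
https://github.com/reyesr/fullproof A JavaScript search engine that stores the inverted index in WebSQL or another browser database
https://github.com/nolanlawson/pouchdb-quick-search Offline search on a small database, e.g. for PhoneGap and other apps
https://github.com/legendary001/SearchEngine A search engine built on Hadoop + Lucene
In my own view, though, a search engine is essentially a column store holding, for each field, the specific words that can be searched. Its underlying implementation would therefore be better served by a tree structure like the fractal tree index TokuDB uses, and for logs, time-series storage would suit search better.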
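The "column store per field" view can be sketched as a map from `(field, word)` to a sorted list of document ids, so that each field's vocabulary forms its own column of postings. This is only an illustration of that way of looking at it, kept in memory with plain Python structures. The class and method names are made up, and it says nothing about how TokuDB or any listed project lays out data.

```python
import bisect
from collections import defaultdict


class FieldIndex:
    """Hypothetical per-field index: (field, word) -> sorted list of doc ids."""

    def __init__(self):
        self.columns = defaultdict(list)

    def add(self, doc_id, fields):
        # fields maps a field name to its text, e.g. {"title": "...", "body": "..."}
        for field, text in fields.items():
            for word in set(text.lower().split()):
                ids = self.columns[(field, word)]
                pos = bisect.bisect_left(ids, doc_id)
                if pos == len(ids) or ids[pos] != doc_id:
                    ids.insert(pos, doc_id)  # keep ids sorted and unique

    def lookup(self, field, word):
        return list(self.columns.get((field, word.lower()), []))
```

Keeping the field in the key means a query such as "title contains X" touches only that field's postings, which is the column-like access pattern the paragraph above describes.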