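Some open-source search engine implementations: inverted indexes stored in raw files, in column stores (HBase), in KV stores such as LevelDB, MongoDB and Redis, and in SQL databases such as SQLite or xxSQL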

Date: 2019-11-08



About this post: it collects open-source search engines other than those in the ES, Solr and Sphinx families.

https://github.com/fergiemcdowall/norch A search engine based on Node.js and LevelDB

https://github.com/fergiemcdowall/search-index A persistent, network resilient, full text search library for the browser and Node.js

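Both use LevelDB to store the index. It is not yet clear to me whether LevelDB is a good fit for an inverted index. Another project with a similar approach: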

https://github.com/patrickfrey/strus

Library implementing the storage and the query evaluation for a text search engine. It relies on a key value store database interface to store its data. Currently there exists an implementation based on the google LevelDB library. http://www.project-strus.net

Usage: https://www.codeproject.com/articles/1059766/building-a-search-engine-with-python-tornado-and-s
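The open question above is whether LevelDB suits an inverted index. One common way to make an ordered KV store work is to give each posting its own key. The keys are laid out so that all postings of a term sort next to each other, and a term lookup then becomes one seek plus a short forward scan. The sketch below illustrates that idea under stated assumptions. `OrderedKV` is a hypothetical in-memory stand-in that models only LevelDB's sorted keys and seek/iterate behaviour; it is not the LevelDB or strus API. The key layout `term + 0x00 + 8-byte big-endian doc ID` is also an assumption, not strus's actual format.

```python
import bisect
from collections import Counter


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered KV store like LevelDB.

    Only what this sketch relies on is modelled: keys are kept in byte
    order, and you can seek to a key and iterate forward from it.
    """

    def __init__(self):
        self._keys = []   # byte keys, always kept sorted
        self._vals = {}   # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def prefix_scan(self, prefix: bytes):
        # Seek to the first key >= prefix, then walk forward while it matches.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


def posting_key(term: str, doc_id: int) -> bytes:
    # Assumed layout: term bytes, a 0x00 separator, then an 8-byte big-endian
    # doc ID. Byte order then equals (term, doc_id) order, so one term's
    # postings are contiguous and come back sorted by doc ID.
    return term.encode("utf-8") + b"\x00" + doc_id.to_bytes(8, "big")


def index_document(kv: OrderedKV, doc_id: int, text: str) -> None:
    # One key per (term, doc) pair; the value is the term frequency.
    for term, tf in Counter(text.lower().split()).items():
        kv.put(posting_key(term, doc_id), str(tf).encode("ascii"))


def postings(kv: OrderedKV, term: str) -> list[tuple[int, int]]:
    # The trailing separator stops "key" from matching the postings of "keys".
    prefix = term.encode("utf-8") + b"\x00"
    return [
        (int.from_bytes(key[len(prefix):], "big"), int(value))
        for key, value in kv.prefix_scan(prefix)
    ]


if __name__ == "__main__":
    kv = OrderedKV()
    index_document(kv, 2, "leveldb stores sorted keys")
    index_document(kv, 1, "sorted keys make prefix scans cheap keys")
    print(postings(kv, "keys"))  # [(1, 2), (2, 1)]
```

With this layout, indexing a new document only adds new keys and never rewrites an existing posting list. That plausibly suits LevelDB's log-structured merge tree, which is optimised for writes of that kind.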

Written in Go, fairly highly rated, and also built on LevelDB for its underlying storage:

https://github.com/blevesearch/bleve powerful, supports facets and more

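https://github.com/huichen/wukong 2000+ stars. It appears to use BoltDB for storage and supports distributed deployment. A source-code walkthrough at https://ayende.com/blog/171745/code-reading-wukong-full-text-search-engine shows that it keeps the index in memory and supports persistence through a KV database.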

https://github.com/basho/riak_search/ probably stores the inverted index as sets in Riak (a KV store), but it is written in Erlang, which is a bit of a letdown.

https://github.com/victorparmar/zsearch also built on LevelDB, and looks very impressive: "Low data fragmentation and good random write performance by using LevelDB Log Structured Merge Trees. High performance query speed by using CompressedBitmap to store DocumentIds in an InvertedIndex. Interface provided by a simple libEvent2 http server."
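zsearch's description mentions storing DocumentIds as a CompressedBitmap. The underlying idea can be sketched with one bitmap per term, where bit n is set when document n contains the term. An AND query then becomes a bitwise intersection and an OR query a union. This is an illustration only, not zsearch's code. It uses Python's arbitrary-precision `int` as an uncompressed bitmap, whereas a real engine would use a compressed format (Roaring bitmaps are one well-known example).

```python
from functools import reduce


def bitmap_to_ids(bitmap: int) -> list[int]:
    """Return the positions of the set bits, lowest first."""
    ids = []
    while bitmap:
        low = bitmap & -bitmap          # isolate the lowest set bit
        ids.append(low.bit_length() - 1)
        bitmap ^= low                   # clear it and continue
    return ids


class BitmapIndex:
    """One bitmap per term: bit n is set when document n contains the term.

    A Python int serves as an *uncompressed* bitmap here; it is a stand-in
    for a compressed bitmap structure, not zsearch's CompressedBitmap.
    """

    def __init__(self):
        self.bitmaps: dict[str, int] = {}

    def add(self, doc_id: int, text: str) -> None:
        for term in set(text.lower().split()):
            self.bitmaps[term] = self.bitmaps.get(term, 0) | (1 << doc_id)

    def search_all(self, *terms: str) -> list[int]:
        # AND query: intersect the per-term bitmaps.
        # A term that was never indexed contributes an empty bitmap (0).
        if not terms:
            return []
        maps = [self.bitmaps.get(t.lower(), 0) for t in terms]
        return bitmap_to_ids(reduce(lambda a, b: a & b, maps))

    def search_any(self, *terms: str) -> list[int]:
        # OR query: union of the per-term bitmaps.
        maps = (self.bitmaps.get(t.lower(), 0) for t in terms)
        return bitmap_to_ids(reduce(lambda a, b: a | b, maps, 0))
```

For example, after `idx.add(0, "fast log search")`, `idx.add(3, "fast bitmap search")` and `idx.add(5, "slow log scan")`, `idx.search_all("fast", "search")` returns `[0, 3]`. Each intersection step is a single AND over whole machine words, which is why bitmaps make multi-term queries fast.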

https://github.com/daviddengcn/gcse also appears to use LevelDB

https://github.com/eugeneware/fulltext-engine

Using MongoDB to store the inverted index:

https://github.com/c-bata/gosearch essentially relies on MongoDB's addToSet under the hood, as https://github.com/c-bata/gosearch/blob/master/models/index.go makes clear

https://github.com/hemslo/poky-engine A simple search engine in python using Tornado, Scrapy, Redis and MongoDB
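The addToSet approach used by gosearch rests on MongoDB's `$addToSet` update operator. That operator adds a value to an array only if it is not already present, so re-indexing the same document does not duplicate postings. Below is a minimal sketch. The one-document-per-term schema `{"_id": term, "docs": [...]}` is an assumption, not gosearch's actual model (its index.go has that). `apply_add_to_set` is a hypothetical in-memory emulation so the example runs without a MongoDB server. With pymongo, the same filter/update pair would be passed to `collection.update_one(filt, update, upsert=True)`.

```python
def index_update(term: str, doc_id: int) -> tuple[dict, dict]:
    """Build a (filter, update) pair in MongoDB update syntax.

    Assumed schema: one document per term, {"_id": term, "docs": [doc_id, ...]}.
    With pymongo: collection.update_one(filt, update, upsert=True).
    """
    return {"_id": term}, {"$addToSet": {"docs": doc_id}}


def apply_add_to_set(store: dict, filt: dict, update: dict) -> None:
    """Hypothetical in-memory emulation of an upserting $addToSet update.

    Only handles the shapes produced by index_update(); it exists so the
    example runs without a MongoDB server.
    """
    doc = store.setdefault(filt["_id"], {"_id": filt["_id"]})  # upsert
    for field, value in update["$addToSet"].items():
        values = doc.setdefault(field, [])
        if value not in values:  # $addToSet skips values already present
            values.append(value)


def index_document(store: dict, doc_id: int, text: str) -> None:
    for term in set(text.lower().split()):
        apply_add_to_set(store, *index_update(term, doc_id))


def search_all(store: dict, *terms: str) -> list[int]:
    # AND query: intersect the "docs" arrays of every query term.
    sets = [set(store.get(t.lower(), {}).get("docs", [])) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []
```

Indexing document 1 twice leaves each term's `docs` array unchanged the second time. This idempotence is what makes `$addToSet` convenient as a posting-list primitive.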

From Qihoo 360:

A search engine which can hold 100 trillion lines of log data.

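It uses HDFS for storage and MapReduce for parallelism, and is pitched at log search. Its metadata lives in Redis (NoSQL) and its inverted index in HBase, so at heart it is column storage. Still to be studied.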

https://github.com/Qihoo360/poseidon

From LinkedIn:

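https://github.com/linkedin/indextank-engine fairly powerful, supports facets and more, and can build indexes either in memory or in files. Worth studying properly when time allows; the underlying files probably support compression.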
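The compression guess above can be illustrated with a common posting-list technique, shown here generically and not as indextank's actual file format. Instead of storing sorted doc IDs directly, store the gaps between them. Then write each gap as a variable-length integer: 7 data bits per byte, with the high bit set on every byte except the last. Frequent terms have small gaps, so most entries shrink to a single byte.

```python
def encode_postings(doc_ids: list[int]) -> bytes:
    """Delta + varint encode a strictly increasing list of doc IDs."""
    out = bytearray()
    prev = 0
    for i, doc_id in enumerate(doc_ids):
        if doc_id < 0 or (i > 0 and doc_id <= prev):
            raise ValueError("doc IDs must be non-negative and strictly increasing")
        gap = doc_id - prev      # the first entry stores the ID itself (gap from 0)
        prev = doc_id
        while gap >= 0x80:       # emit 7 bits at a time, low bits first
            out.append((gap & 0x7F) | 0x80)  # high bit set: more bytes follow
            gap >>= 7
        out.append(gap)          # final byte of this gap has the high bit clear
    return bytes(out)


def decode_postings(data: bytes) -> list[int]:
    """Inverse of encode_postings: rebuild doc IDs from varint-coded gaps."""
    doc_ids = []
    current = gap = shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:                    # gap complete: add it to the running total
            current += gap
            doc_ids.append(current)
            gap = shift = 0
    return doc_ids
```

For instance, `encode_postings([5, 300])` stores the gaps 5 and 295 as the three bytes `05 A7 02`. The 1,000 consecutive IDs 0 to 999 take 1,000 bytes instead of 4,000 as raw 32-bit integers.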

https://github.com/gigablast/open-source-search-engine the search engine code behind http://www.gigablast.com/, written in C++. It looks a bit messy, but it also supports persisting the index to a database.

Distributed:

https://github.com/izenecloud/sf1r-lite uses nginx for load balancing; the underlying inverted index seems to be built on Hadoop. Architecture diagram: https://raw.githubusercontent.com/izenecloud/sf1r/master/docs/source/images/sf1r.png

Worth a look:

https://github.com/aaalgo/donkey

https://github.com/groonga/groonga Groonga is an open-source fulltext search engine and column store. It lets you write high-performance applications that require fulltext search. So, a column store?

Ruby:

https://github.com/mrkamel/search_cop uses an SQL database and supports SQL queries plus full-text search: Search engine like fulltext query support for ActiveRecord

Using Lucene for full-text search:

CrateDB: The fast, scalable, easy to use SQL database with native full text search https://crate.io

http://www.opensearchserver.com/

yacy

The search engine behind the Go Search site (http://go-search.org/search?q=hello):

https://github.com/daviddengcn/gcse uses https://github.com/daviddengcn/go-index for indexing; the latter is worth studying when time allows.

Using raw files for the inverted index:

https://github.com/bradleypeabody/fulltext Pure-Go full text indexer and search library

https://github.com/dchest/static-search searches local files

https://github.com/getwe/cse uses https://github.com/getwe/goose, which is essentially a raw-file inverted index written by a Baidu engineer: http://getwe.cn/%E6%8A%80%E6%9C%AF/%E6%90%9C%E7%B4%A2%E5%BC%95%E6%93%8E/goose/database-diskindex/
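A raw-file inverted index is commonly split into two parts. One is a small term dictionary that can stay in memory. The other is a large postings file that is read with a single seek per term. The sketch below illustrates that split. The file names `postings.bin` and `terms.json` and the fixed-width little-endian uint32 encoding are assumptions made for illustration; they are not goose's on-disk format, for which its source is the reference.

```python
import json
import os
import struct
import tempfile


def write_index(dir_path: str, index: dict[str, list[int]]) -> None:
    """Write an inverted index as two plain files (the layout is an assumption).

    postings.bin: every posting list back to back, as little-endian uint32s.
    terms.json:   term -> [byte offset into postings.bin, number of doc IDs].
    """
    lexicon = {}
    with open(os.path.join(dir_path, "postings.bin"), "wb") as f:
        for term in sorted(index):
            doc_ids = index[term]
            lexicon[term] = [f.tell(), len(doc_ids)]
            f.write(struct.pack(f"<{len(doc_ids)}I", *doc_ids))
    with open(os.path.join(dir_path, "terms.json"), "w", encoding="utf-8") as f:
        json.dump(lexicon, f)


class DiskIndex:
    """Keeps the small term dictionary in memory; reads postings on demand."""

    def __init__(self, dir_path: str):
        with open(os.path.join(dir_path, "terms.json"), encoding="utf-8") as f:
            self.lexicon = json.load(f)
        self.postings_path = os.path.join(dir_path, "postings.bin")

    def lookup(self, term: str) -> list[int]:
        entry = self.lexicon.get(term)
        if entry is None:
            return []
        offset, count = entry
        with open(self.postings_path, "rb") as f:
            f.seek(offset)            # one seek to this term's posting list
            data = f.read(4 * count)  # 4 bytes per uint32 doc ID
        return list(struct.unpack(f"<{count}I", data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_index(d, {"go": [1, 4, 9], "search": [4], "engine": [2, 4]})
        print(DiskIndex(d).lookup("go"))  # [1, 4, 9]
```

Writing the whole index in one pass and then only reading it keeps the format trivial. The trade-off is that adding documents means rebuilding or appending segment files, which is typical of raw-file designs.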

Using Redis for the inverted index:

https://github.com/hymloth/pyredise/

A Go port of Lucene:

https://github.com/philipsoutham/golucy

https://github.com/ipfs-search/ipfs-search uses Elasticsearch 5 (ES5) for search

Using SQLite to store the inverted index:

https://github.com/gansidui/gose
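Storing an inverted index in SQLite can be as simple as one row per (term, document) pair. The primary key then doubles as the lookup index. An AND query groups the matching rows by document and keeps the documents that matched every term. The sketch uses Python's built-in `sqlite3` module. The `postings` schema and the sum-of-term-frequency ranking are assumptions for illustration, not gose's actual design.

```python
import sqlite3
from collections import Counter


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Assumed schema: one row per (term, document) pair. SQLite builds a
    # unique index for the composite primary key, which also serves lookups
    # of all postings for a term.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        " term TEXT NOT NULL,"
        " doc_id INTEGER NOT NULL,"
        " tf INTEGER NOT NULL,"
        " PRIMARY KEY (term, doc_id))"
    )
    return conn


def add_document(conn: sqlite3.Connection, doc_id: int, text: str) -> None:
    # Note: re-indexing a *changed* document would also need to delete its
    # old rows first; that is omitted to keep the sketch short.
    counts = Counter(text.lower().split())
    with conn:  # one transaction, committed on success
        conn.executemany(
            "INSERT OR REPLACE INTO postings (term, doc_id, tf) VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in counts.items()],
        )


def search_all(conn: sqlite3.Connection, terms: list[str]) -> list[int]:
    """AND query: documents containing every term, ranked by summed tf."""
    unique = sorted({t.lower() for t in terms})
    if not unique:
        return []
    placeholders = ",".join("?" * len(unique))
    rows = conn.execute(
        f"SELECT doc_id FROM postings WHERE term IN ({placeholders}) "
        "GROUP BY doc_id HAVING COUNT(*) = ? "
        "ORDER BY SUM(tf) DESC, doc_id",
        (*unique, len(unique)),
    ).fetchall()
    return [doc_id for (doc_id,) in rows]
```

Because `(term, doc_id)` is unique, `COUNT(*)` per document equals the number of distinct query terms it contains. The `HAVING` clause therefore implements AND without any application-side intersection.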

Internals not yet understood:

https://github.com/gigablast/open-source-search-engine

https://github.com/sourcegraph/thesrc source-code search, but I can't yet tell which search engine it uses

https://github.com/rsesek/usda-ndb searches food composition data

https://github.com/devict/magopie searches BitTorrent torrents