Our company's systems have recently accumulated a lot of full fuzzy queries (wildcards on both sides of the keyword). The data volume is large, and the multi-table join queries behind them hurt performance badly, so we considered handing these fuzzy queries off to a search engine. The idea:
Sync the MySQL data into an Elasticsearch type, using one full sync plus scheduled incremental syncs, and have the application query ES directly for the results it needs.
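For context, the kind of query the application would run against ES instead of a SQL LIKE '%...%' over a join looks roughly like the sketch below. It is a minimal example under assumptions not in the original post: an index named order_info holding the flattened join result, a customer_name field, and the Elasticsearch 2.x TransportClient (the version line elasticsearch-jdbc targets); adjust the names and client to your own setup.

import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class FuzzySearchDemo {
    public static void main(String[] args) throws Exception {
        // Connect to the ES cluster (host and port are placeholders).
        Client client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("es-host"), 9300));

        // Rough equivalent of "... WHERE customer_name LIKE '%keyword%'",
        // served from the pre-joined documents in the index.
        SearchResponse resp = client.prepareSearch("order_info")
                .setQuery(QueryBuilders.wildcardQuery("customer_name", "*keyword*"))
                .setSize(50)
                .get();

        for (SearchHit hit : resp.getHits()) {
            System.out.println(hit.getSourceAsString());
        }
        client.close();
    }
}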
After some research we decided to use elasticsearch-jdbc for the sync. The source query joins five or six tables; it ran fine in the development and test environments, where data volumes are small, but in the performance-test environment, where the volume is much larger, the sync breaks. Below are some of the errors it reported. I could not find anything useful on GitHub, and nobody in the chat groups I asked had used it this way.
One kind of failure is an error when the connection commits; below is an excerpt of the error output:
[13:17:17,678][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 7 minutes 59 seconds = 479982 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:17:17,678][INFO ][metrics.sink.plain ][pool-3-thread-1] 7 minutes 59 seconds = 479335 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,589][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 8 minutes 44 seconds = 524264 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,589][INFO ][metrics.sink.plain ][pool-3-thread-1] 10 minutes 22 seconds = 622247 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,590][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 10 minutes 22 seconds = 622895 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,590][INFO ][metrics.sink.plain ][pool-3-thread-1] 10 minutes 22 seconds = 622247 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,595][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 10 minutes 22 seconds = 622900 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,598][INFO ][metrics.sink.plain ][pool-3-thread-1] 10 minutes 22 seconds = 622256 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,599][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 10 minutes 22 seconds = 622904 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,599][INFO ][metrics.sink.plain ][pool-3-thread-1] 10 minutes 22 seconds = 622257 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,618][WARN ][importer.jdbc.source.standard][pool-2-thread-1] while closing read connection: Communications link failure during commit(). Transaction resolution unknown.
The other is an out-of-memory error in the worker thread:
[17:42:34,243][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 0, 8 minutes 30 seconds = 510305 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[17:43:33,523][INFO ][metrics.sink.plain ][pool-5-thread-1] 8 minutes 36 seconds = 516618 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[17:46:00,561][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 0, 11 minutes 19 seconds = 679116 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[17:47:27,876][INFO ][metrics.sink.plain ][pool-5-thread-1] 12 minutes 37 seconds = 757511 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[17:48:23,974][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 0, 13 minutes 37 seconds = 817186 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
Exception in thread "pool-5-thread-1" Exception in thread "elasticsearch[importer][generic][T#3]" java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1855)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2035)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.SynchronousQueue$TransferStack.snode(SynchronousQueue.java:318)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:361)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.io.IOException: pool did not terminate
at org.xbib.tools.JDBCImporter.shutdown(JDBCImporter.java:265)
at org.xbib.tools.JDBCImporter$2.run(JDBCImporter.java:322)
We still need a way to bulk-sync data from MySQL into ES. ES does have a bulk request API, but the examples I found all use it for batch-importing text or log files; I have not seen it used for near-real-time sync from a MySQL database.
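That said, the bulk API is not tied to log files: a hand-rolled sync could page through the join result with JDBC and push each page as one bulk request. The sketch below is only an illustration under assumed table, column, and index names (orders, customers, order_info) and the 2.x TransportClient bulk API; it is not the method used in this post.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

public class MysqlToEsBulkSync {
    // 'client' is an already-built TransportClient, as in the earlier sketch.
    static void syncPage(Client client, Connection conn, long offset, int pageSize) throws Exception {
        String sql = "SELECT o.id, o.customer_name, o.amount "
                   + "FROM orders o JOIN customers c ON o.customer_id = c.id "
                   + "ORDER BY o.id LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, pageSize);
            ps.setLong(2, offset);
            BulkRequestBuilder bulk = client.prepareBulk();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Use the primary key as the document _id so re-runs
                    // overwrite existing documents instead of duplicating them.
                    bulk.add(client.prepareIndex("order_info", "order", rs.getString("id"))
                            .setSource("customer_name", rs.getString("customer_name"),
                                       "amount", rs.getBigDecimal("amount")));
                }
            }
            if (bulk.numberOfActions() > 0) {
                BulkResponse resp = bulk.get();
                if (resp.hasFailures()) {
                    System.err.println(resp.buildFailureMessage());
                }
            }
        }
    }
}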
The out-of-memory problem was eventually worked around by adding memory to the operating system and lowering the number of records synced per run in the sync script.
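A related, general-purpose knob when a JDBC client blows the heap on a huge join result is MySQL's row-streaming mode, which keeps Connector/J from buffering the whole result set in memory. The sketch below shows that standard technique as background only; it is not necessarily what elasticsearch-jdbc does internally, and the connection URL and query are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingReadDemo {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://db-host:3306/app_db", "user", "password");
        // TYPE_FORWARD_ONLY + CONCUR_READ_ONLY + fetchSize = Integer.MIN_VALUE
        // tells MySQL Connector/J to stream rows one by one instead of
        // loading the entire result set into the JVM heap.
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        try (ResultSet rs = stmt.executeQuery("SELECT id, customer_name FROM orders")) {
            while (rs.next()) {
                // Process one row at a time, e.g. add it to a bulk request.
            }
        }
        stmt.close();
        conn.close();
    }
}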