There are currently several main ways to insert data into Neo4j (adapted from: How to import large-scale data into Neo4j):
Cypher CREATE statements: write one CREATE per record
Cypher LOAD CSV statements: convert the data to CSV and read it in with LOAD CSV
The official Java API: Batch Inserter
The Batch Import tool, written by a community expert
The official neo4j-import tool
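This post focuses on neo4j-import, the fastest of the official options. Prerequisites and best use case:
Initial import only; it cannot apply incremental updates to an existing database.
Here is the official example: Use the Import tool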
bin\neo4j start
bin\neo4j stop
bin\neo4j restart
bin\neo4j status
neo4j-admin options: controlling memory. Source: 10.5. Memory recommendations
neo4j-admin memrec [--memory=<memory dedicated to Neo4j>] [--database=<name>]
Option | Default | Description |
---|---|---|
--memory | The memory capacity of the machine | The amount of memory to allocate to Neo4j. Valid units are: k, K, m, M, g, G. |
--database | graph.db | The name of the database. This option will generate numbers for Lucene indexes, and for data volume and native indexes in the database. These can be used as an input into more detailed memory analysis. |
There is also --pagecache, which sets the page cache for a single command: the cache setting applies only to the import command it is passed to.
neo4j-admin options: dump and load databases (offline backup). Both steps require the database to be shut down. Reference: 10.7. Dump and load databases
Dumping graph.db to a .dump file (the database must be shut down):
$neo4j-home> bin/neo4j-admin dump --database=graph.db --to=/backups/graph.db/2016-10-02.dump
$neo4j-home> ls /backups/graph.db
$neo4j-home> 2016-10-02.dump
Loading a .dump file back in also requires the database to be stopped first:
$neo4j-home> bin/neo4j stop
Stopping Neo4j.. stopped
$neo4j-home> bin/neo4j-admin load --from=/backups/graph.db/2016-10-02.dump --database=graph.db --force
With --force, the load overwrites any existing database of the same name.
neo4j-admin options: backup and restore (online backup)
$neo4j-home> export HEAP_SIZE=2G
$neo4j-home> mkdir /mnt/backup
$neo4j-home> bin/neo4j-admin backup --from=192.168.1.34 --backup-dir=/mnt/backup --name=graph.db-backup --pagecache=4G
The backup is written into the directory given by --backup-dir.
$neo4j-home> export HEAP_SIZE=2G
$neo4j-home> bin/neo4j-admin backup --from=192.168.1.34 --backup-dir=/mnt/backup --name=graph.db-backup --fallback-to-full=true --check-consistency=true --pagecache=4G
movies.csv.
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
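Here title is a property; its values are quoted in this example, although quotes are only strictly needed when a value contains the delimiter. year:int is also a property, typed as an integer.
The :ID column gives each node its unique import ID, and the :LABEL column assigns one or more labels to that node, separated by ; (for example Movie;Sequel). A label does not create a second node; one row always produces one node.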
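The header convention can be illustrated with a small sketch. This is not how the import tool is implemented; parse_header and labels_of are hypothetical helper names, and the sketch assumes the default , field delimiter and ; array delimiter and ignores ID spaces.

```python
def parse_header(header_line, delimiter=","):
    """Split an import header into (name, kind) pairs.

    A field looks like "name:kind". A field without a kind is treated as a
    plain string property; a field without a name (":LABEL") has an empty name.
    """
    fields = []
    for raw in header_line.strip().split(delimiter):
        name, _, kind = raw.partition(":")
        fields.append((name, kind or "string"))
    return fields


def labels_of(row, fields, array_delimiter=";"):
    """Return the labels that the :LABEL column assigns to one node row."""
    for value, (_, kind) in zip(row, fields):
        if kind == "LABEL":
            return value.split(array_delimiter)
    return []


fields = parse_header("movieId:ID,title,year:int,:LABEL")
row = ["tt0234215", "The Matrix Reloaded", "2003", "Movie;Sequel"]
print(fields)
print(labels_of(row, fields))
```

The row yields a single node carrying two labels, which is the behaviour described for Movie;Sequel.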
actors.csv.
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
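The :LABEL column is metadata attached to the node: personId:ID must be unique, whereas a :LABEL value may be shared by many nodes.
After loading, each distinct label appears only once in the database's list of labels; it is not stored as a separate node.
roles.csv.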
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
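Here role, which has no : prefix, is a property of the relationship; its values may be quoted or not. It is best to declare the type explicitly, for example :int for an integer or roles:string[] for a string array.
On Linux, run: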
neo4j_home$ bin/neo4j-admin import --nodes import/movies.csv --nodes import/actors.csv --relationships import/roles.csv
Older versions used neo4j-import for bulk import; current versions use neo4j-admin import.
On Windows, run:
neo4j-import.bat --into ../data/databases/graph.db --id-type string --nodes:attribute ../import/node_attribute.csv --relationships ../import/product_SecondLeaf.csv --relationships ../import/scene_isDemond.csv
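--into specifies the target database directory; the name can be changed for each attempt.
--nodes:attribute: the part after nodes: is the label applied to every node in that file.
--id-type string indicates that all :ID columns contain alphanumeric values (there is an optimization for numeric-only IDs), so node IDs may combine letters and digits instead of being purely numeric.
Finally, start Neo4j on Linux: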
./bin/neo4j start
Finally, start Neo4j on Windows:
neo4j.bat console
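1 Error messages are kept in bad.log:
\data\databases\graph.db\bad.log
An error that mentions global id space means a node is either undefined or duplicated.
2 If node IDs are not unique, the import fails immediately with a global id space error and the rest of the import is aborted. Delete data/databases/graph.db and run the import again.
Main source: B.2. Use the Import tool
If the input data looks like this: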
:START_ID;role;:END_ID;:TYPE
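keanu;'Neo';tt0133093;ACTED_IN
keanu;'Neo';tt0234215;ACTED_IN
then the field delimiter can be set with --delimiter, the array delimiter with --array-delimiter, and the quote character with --quote: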
neo4j_home$ bin/neo4j-admin import --nodes import/movies2.csv --nodes import/actors2.csv --relationships import/roles2.csv --delimiter ";" --array-delimiter "|" --quote "'"
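To see what these three options mean for the data, the sample can be read with Python's csv module using the same delimiter and quote character. This is only an illustration of the file format, not of the import tool itself; read_rows and split_array are hypothetical helper names, and the array value in the last line is made up for the example.

```python
import csv
import io

SAMPLE = (
    ":START_ID;role;:END_ID;:TYPE\n"
    "keanu;'Neo';tt0133093;ACTED_IN\n"
    "keanu;'Neo';tt0234215;ACTED_IN\n"
)


def read_rows(text, delimiter=";", quote="'"):
    """Parse delimited text the way --delimiter and --quote describe it."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    return list(reader)


def split_array(value, array_delimiter="|"):
    """Split one array-typed field the way --array-delimiter describes it."""
    return value.split(array_delimiter)


rows = read_rows(SAMPLE)
print(rows[0])  # header fields
print(rows[1])  # first relationship, with the quotes removed
print(split_array("Neo|The One"))  # illustrative array value
```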
movies5a.csv.
movieId:ID,title,year:int
tt0133093,"The Matrix",1999
sequels5a.csv.
movieId:ID,title,year:int
tt0234215,"The Matrix Reloaded",2003
tt0242653,"The Matrix Revolutions",2003
actors5a.csv.
personId:ID,name
keanu,"Keanu Reeves"
laurence,"Laurence Fishburne"
carrieanne,"Carrie-Anne Moss"
Run:
neo4j_home$ bin/neo4j-admin import --nodes:Movie import/movies5a.csv --nodes:Movie:Sequel import/sequels5a.csv --nodes:Actor import/actors5a.csv
At import time, --nodes:Movie gives every node in movies5a.csv the label Movie, while --nodes:Movie:Sequel gives every node in sequels5a.csv two labels, Movie and Sequel.
roles5b.csv.
:START_ID,role,:END_ID
keanu,"Neo",tt0133093
keanu,"Neo",tt0234215
keanu,"Neo",tt0242653
laurence,"Morpheus",tt0133093
laurence,"Morpheus",tt0234215
laurence,"Morpheus",tt0242653
carrieanne,"Trinity",tt0133093
Run:
neo4j_home$ bin/neo4j-admin import --relationships:ACTED_IN import/roles5b.csv
Here :ACTED_IN sets the relationship type to ACTED_IN for every row in the file, and role is loaded as a property of each relationship.
Node data, header: movies4-header.csv.
movieId:ID,title,year:int,:LABEL
Node data, part 1: movies4-part1.csv.
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
Node data, part 2: movies4-part2.csv.
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
Relationship data, header: roles4-header.csv.
:START_ID,role,:END_ID,:TYPE
Relationship data, part 1: roles4-part1.csv.
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
Relationship data, part 2: roles4-part2.csv.
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
Run:
neo4j_home$ bin/neo4j-admin import --nodes "import/movies4-header.csv,import/movies4-part1.csv,import/movies4-part2.csv" --relationships "import/roles4-header.csv,import/roles4-part1.csv,import/roles4-part2.csv"
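The header and the data are kept in separate files and passed as one comma-separated group in the order header, data part 1, data part 2.
The next situation comes up often: two node sets use the same ID values, and unless each set is given its own ID space the import reports an error.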
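The effect of a header file followed by several data files can be sketched as follows: the first file supplies the column names, and every later file contributes rows under that header. This is an assumption-labelled illustration using in-memory strings rather than real files; read_group is a hypothetical helper name.

```python
import csv
import io
import itertools

HEADER = "movieId:ID,title,year:int,:LABEL\n"
PART1 = (
    'tt0133093,"The Matrix",1999,Movie\n'
    'tt0234215,"The Matrix Reloaded",2003,Movie;Sequel\n'
)
PART2 = 'tt0242653,"The Matrix Revolutions",2003,Movie;Sequel\n'


def read_group(header_text, *part_texts):
    """Treat the first text as the header and the rest as data rows."""
    header = next(csv.reader(io.StringIO(header_text)))
    rows = itertools.chain.from_iterable(
        csv.reader(io.StringIO(part)) for part in part_texts
    )
    return [dict(zip(header, row)) for row in rows]


movies = read_group(HEADER, PART1, PART2)
for movie in movies:
    print(movie["movieId:ID"], movie["title"], movie[":LABEL"])
```

Splitting the header out this way means large data files can be produced or replaced without editing their first line.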
movies7.csv.
movieId:ID(Movie-ID),title,year:int,:LABEL
1,"The Matrix",1999,Movie
2,"The Matrix Reloaded",2003,Movie;Sequel
3,"The Matrix Revolutions",2003,Movie;Sequel
Here (Movie-ID) names the ID space that these IDs belong to.
actors7.csv.
personId:ID(Actor-ID),name,:LABEL
1,"Keanu Reeves",Actor
2,"Laurence Fishburne",Actor
3,"Carrie-Anne Moss",Actor
roles7.csv.
:START_ID(Actor-ID),role,:END_ID(Movie-ID)
1,"Neo",1
1,"Neo",2
1,"Neo",3
2,"Morpheus",1
2,"Morpheus",2
2,"Morpheus",3
3,"Trinity",1
3,"Trinity",2
3,"Trinity",3
Run:
neo4j_home$ bin/neo4j-admin import --nodes import/movies7.csv --nodes import/actors7.csv --relationships:ACTED_IN import/roles7.csv
In the relationship file, :START_ID(Actor-ID) and :END_ID(Movie-ID) state which ID space each end of the relationship refers to.
A bad relationship in the input:
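One way to picture ID spaces is as part of the lookup key: a node is identified by the pair (ID space, ID), so the value 1 in Actor-ID and the value 1 in Movie-ID never collide. The sketch below models that idea with plain dictionaries; it is an illustration under that assumption, and resolve is a hypothetical helper name.

```python
# Nodes keyed by (id_space, id), mirroring movies7.csv and actors7.csv.
movies = {
    ("Movie-ID", "1"): "The Matrix",
    ("Movie-ID", "2"): "The Matrix Reloaded",
    ("Movie-ID", "3"): "The Matrix Revolutions",
}
actors = {
    ("Actor-ID", "1"): "Keanu Reeves",
    ("Actor-ID", "2"): "Laurence Fishburne",
    ("Actor-ID", "3"): "Carrie-Anne Moss",
}
nodes = {**movies, **actors}  # six distinct keys, no clashes

# A few rows of roles7.csv: (start id, role, end id).
roles = [("1", "Neo", "1"), ("2", "Morpheus", "3"), ("3", "Trinity", "2")]


def resolve(roles, nodes, start_space="Actor-ID", end_space="Movie-ID"):
    """Look up both ends of each relationship in their own ID space."""
    return [
        (nodes[(start_space, start)], role, nodes[(end_space, end)])
        for start, role, end in roles
    ]


for actor, role, movie in resolve(roles, nodes):
    print(f"{actor} played {role} in {movie}")
```

Without the ID space in the key, the two dictionaries would overwrite each other when merged, which corresponds to the duplicate-ID error the import reports.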
roles8a.csv.
:START_ID,role,:END_ID,:TYPE
carrieanne,"Trinity",tt0242653,ACTED_IN
emil,"Emil",tt0133093,ACTED_IN
For example, the last row refers to a node, emil, that is not defined in any node file.
In that case, run:
neo4j_home$ bin/neo4j-admin import --nodes import/movies8a.csv --nodes import/actors8a.csv --relationships import/roles8a.csv --ignore-missing-nodes
--ignore-missing-nodes skips relationships that refer to missing nodes instead of aborting; the details are recorded in bad.log:
InputRelationship:
source: roles8a.csv:11
properties: [role, Emil]
startNode: emil (global id space)
endNode: tt0133093 (global id space)
type: ACTED_IN
referring to missing node emil
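Dangling references like this can also be caught before running the import. The following is a minimal pre-check sketch, not part of Neo4j; find_missing is a hypothetical helper, and it assumes a single global ID space and a relationship file whose first line is the header.

```python
def find_missing(node_ids, relationships):
    """Return (line_number, node_id) for every endpoint with no matching node.

    relationships is an iterable of (start_id, end_id) pairs in file order;
    line numbers start at 2 because line 1 is assumed to be the header.
    """
    known = set(node_ids)
    missing = []
    for line_no, (start, end) in enumerate(relationships, start=2):
        for node_id in (start, end):
            if node_id not in known:
                missing.append((line_no, node_id))
    return missing


node_ids = ["tt0133093", "tt0242653", "keanu", "laurence", "carrieanne"]
relationships = [
    ("carrieanne", "tt0242653"),
    ("emil", "tt0133093"),
]
print(find_missing(node_ids, relationships))
```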
actors8b.csv.
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
laurence,"Laurence Harvey",Actor
The node file actors8b.csv contains a duplicate ID, laurence, so run:
neo4j_home$ bin/neo4j-admin import --nodes import/actors8b.csv --ignore-duplicate-nodes
--ignore-duplicate-nodes tells the import to ignore the duplicate nodes, and the error is recorded in bad.log:
Id 'laurence' is defined more than once in global id space, at least at actors8b.csv:3 and actors8b.csv:5
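Duplicate IDs can be found ahead of time with a short scan of the ID column. This is a sketch of a pre-check, not Neo4j code; find_duplicate_ids is a hypothetical helper, and it assumes one global ID space and that line 1 of the file is the header.

```python
from collections import defaultdict


def find_duplicate_ids(ids):
    """Map each repeated ID to the file line numbers where it occurs.

    ids is the ID column in file order; numbering starts at 2 because
    line 1 is assumed to be the header.
    """
    seen = defaultdict(list)
    for line_no, node_id in enumerate(ids, start=2):
        seen[node_id].append(line_no)
    return {node_id: lines for node_id, lines in seen.items() if len(lines) > 1}


# The personId:ID column of actors8b.csv.
actor_ids = ["keanu", "laurence", "carrieanne", "laurence"]
print(find_duplicate_ids(actor_ids))
```

For actors8b.csv the scan reports laurence on lines 3 and 5, the same positions named in the bad.log message.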