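Knowledge graphs sound fancy and are widely used. As for graph databases, a quick web search turns up the usual names: neo4j, JanusGraph, HugeGraph...
But what if you are asked to build something graph-like yourself? Is it really a good idea to bring in a full graph database from day one? Early on you may have only two or three relationship chains, or the business may just be testing the waters. Standing up a real graph database for that seems wasteful.
Indeed. In the early stage it is usually better to maintain a few simple relationship chains ourselves.
Still, to cope with small changes in the relationships, it helps to borrow some concepts from graph databases. So the first question is: how do we describe a chain of graph relationships in text?
The three basic elements of a graph model: entity (subject), relationship, and object.
Solving this is not hard in itself: invent any notation you can read yourself and ignore everyone else. For example, '1,2,3' could mean 1 first, then 2, then 3... But expressing anything slightly more complex this way quickly gets awkward. And if we might switch to a real graph database later, why not follow an existing standard?
Two widely used options are Cypher and Gremlin (search online for details). Of the two, Cypher seems the more visual; its arrow syntax is especially convenient.
For example, the relationship from A to B can be written as: (:A)-[:RELATION]->(:B)
More complex structures are expressed by chaining or listing several such relationships.
Looks good. With the notation settled, how do we actually process the relationships?
Suppose we have the relationships shown in the figure below. How should we write them as a string so that they can be used as configuration?
Following the notation chosen above, we can express them with the following string.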
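(:PEOPLE)-[:養寵物]->(:CAT)-[:吃]->(:RICE)
,(:PEOPLE)-[:吃]->(:RICE)
,(:PEOPLE)-[:養寵物]->(:DOG)
,(:PEOPLE)-[:擁有]->(:HOUSE)
,(:PEOPLE)-[:幹活]->(:JOB)
,(:CAT)-[:朋友]->(:DOG)
,(:DOG)-[:吃]->(:RICE)
,(:JOB)-[:產出]->(:BRICK)
,(:HOUSE)<-[:構件]-(:BRICK)
,(:HOUSE)<-[:構件]-(:GLASS)
(The relation labels are left in the original Chinese because the test code and sample output use them verbatim: 養寵物 = keeps as pet, 吃 = eats, 擁有 = owns, 幹活 = works at, 朋友 = friend of, 產出 = produces, 構件 = is a component of.)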
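That is fairly intuitive: we just follow the figure and write down each edge with its direction and relation name. And because the notation follows the published Cypher syntax, there is no need to write our own documentation; people can pick it up easily.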
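Before parsing, it can help to reject strings that do not fit the small subset of Cypher used here. The following is a minimal sketch, not part of the original code: the class name `SchemaShapeCheck` and the method `isValidSchema` are made up for illustration, and the regular expression assumes labels contain none of `( ) [ ] :`.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not from the post): checks that every comma-separated
// chain looks like (:V) followed by zero or more "-[:R]->(:V)" or "<-[:R]-(:V)".
class SchemaShapeCheck {
    // A vertex such as (:PEOPLE); assumes labels contain none of ( ) [ ] :
    private static final String VERTEX = "\\(:[^():\\[\\]]+\\)";
    // An outgoing edge -[:R]-> or an incoming edge <-[:R]-
    private static final String EDGE =
            "(?:-\\[:[^:\\[\\]]+\\]->|<-\\[:[^:\\[\\]]+\\]-)";
    private static final Pattern CHAIN =
            Pattern.compile(VERTEX + "(?:" + EDGE + VERTEX + ")*");

    public static boolean isValidSchema(String schema) {
        for (String chain : schema.split(",")) {
            if (!CHAIN.matcher(chain.trim()).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Well-formed: one outgoing chain and one incoming edge
        System.out.println(isValidSchema(
                "(:PEOPLE)-[:吃]->(:RICE) ,(:HOUSE)<-[:構件]-(:BRICK)"));
        // Malformed: the arrow head is missing
        System.out.println(isValidSchema("(:PEOPLE)-[:吃]-(:RICE)"));
    }
}
```

A check like this only validates the shape of the string; it does not replace the tokenizer, which still has to extract labels and directions.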
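So far we have expressed the relationships as a string. But a string alone means nothing to an application. We need to parse it into a concrete data structure, and only then can we derive lineage dependencies from the relationships. That is the focus of this post.
It is not complicated either. We use only a handful of Cypher's notational elements, so we only need to recognize those few characters and build the corresponding relationships in memory.
The implementation follows:
The "framework" here is the code that controls the overall flow; it shows how the whole thing works.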
package com.my.mvc.app.common.helper;

import com.my.mvc.app.common.helper.graph.GraphNodeEntityTree;
import com.my.mvc.app.common.helper.graph.NodeDiscoveryDirection;
import com.my.mvc.app.common.helper.graph.VertexEdgeSchemaDescriptor;
import com.my.mvc.app.common.helper.graph.VertexOrEdgeType;
import com.my.mvc.app.common.util.CommonUtil;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simple graph-schema syntax parser (Cypher-like syntax).
 *
 * Only a small subset of Cypher is supported; see the Cypher documentation
 * for the full language.
 */
public class SimpleGraphSchemaSyntaxParser {

    /**
     * Parses a graph relationship configuration into a tree structure.
     *
     * @param cypherGraphSchema relationship statement in Cypher-like syntax
     * @return the parsed tree structure
     */
    public static GraphNodeEntityTree parseGraphSchemaAsTree(String cypherGraphSchema) {
        List<VertexEdgeSchemaDescriptor> flatNodeList = tokenize(cypherGraphSchema);
        return buildGraphAstTree(flatNodeList);
    }

    /**
     * Builds the abstract syntax tree of the graph relationships.
     *
     * @param flatNodeList flat list of vertex and edge tokens
     * @return the root node, which also holds the label-to-node index
     */
    private static GraphNodeEntityTree buildGraphAstTree(
            List<VertexEdgeSchemaDescriptor> flatNodeList) {
        if (flatNodeList.isEmpty()) {
            throw new IllegalArgumentException("Graph schema contains no vertex");
        }
        Map<String, GraphNodeEntityTree> uniqVertexContainer = new HashMap<>();
        GraphNodeEntityTree root = new GraphNodeEntityTree(flatNodeList.get(0));
        uniqVertexContainer.put(flatNodeList.get(0).getVertexLabelType(), root);
        GraphNodeEntityTree parent;
        GraphNodeEntityTree afterNode;
        for (int i = 1; i < flatNodeList.size(); i++) {
            VertexEdgeSchemaDescriptor vertexOrEdge1 = flatNodeList.get(i);
            if (vertexOrEdge1.getNodeType() == VertexOrEdgeType.EDGE) {
                VertexEdgeSchemaDescriptor vertexPrev = flatNodeList.get(i - 1);
                if (vertexPrev.getNodeType() != VertexOrEdgeType.VERTEX) {
                    continue;
                }
                if (++i >= flatNodeList.size()) {
                    throw new RuntimeException("Missing object vertex, near edge ["
                            + vertexOrEdge1.getRawWord() + "]");
                }
                VertexEdgeSchemaDescriptor relation = vertexOrEdge1;
                VertexEdgeSchemaDescriptor vertexAfter = flatNodeList.get(i);
                // The same vertex may appear in several chains; reuse one node per label
                parent = uniqVertexContainer.get(vertexPrev.getVertexLabelType());
                afterNode = uniqVertexContainer.get(vertexAfter.getVertexLabelType());
                if (parent == null) {
                    // Fix: the original fell back to root and registered root under the
                    // object's label; create and register the subject vertex instead
                    parent = new GraphNodeEntityTree(vertexPrev);
                    uniqVertexContainer.put(vertexPrev.getVertexLabelType(), parent);
                }
                if (afterNode == null) {
                    afterNode = new GraphNodeEntityTree(vertexAfter);
                    uniqVertexContainer.put(vertexAfter.getVertexLabelType(), afterNode);
                }
                if (relation.getDirection() == NodeDiscoveryDirection.OUT) {
                    parent.addOutVertex(afterNode, relation);
                } else {
                    parent.addInVertex(afterNode, relation);
                }
            }
        }
        root.setUniqVertexTypes(uniqVertexContainer);
        return root;
    }

    /**
     * Splits the graph schema into edge and vertex tokens.
     *
     * @param cypherGraphSchema relationship statement, e.g. (:BASE_LABEL)-[:被組合引用]->(:COMPOSE_LABEL)
     * @return the list of tokens
     */
    private static List<VertexEdgeSchemaDescriptor> tokenize(String cypherGraphSchema) {
        String[] relationArr = cypherGraphSchema.split(",");
        List<VertexEdgeSchemaDescriptor> flatNodeList = new ArrayList<>();
        for (String relation1 : relationArr) {
            char[] src = relation1.trim().toCharArray();
            for (int i = 0; i < src.length; i++) {
                char ch = src[i];
                // Vertex: (:LABEL)
                if (ch == '(') {
                    StringBuilder specNameBuilder = new StringBuilder();
                    while (i + 1 < src.length) {
                        char nextCh = src[i + 1];
                        if (nextCh == ':') {
                            String vertexLabel = CommonUtil.readSplitWord(
                                    src, i, ':', ')', false);
                            flatNodeList.add(VertexEdgeSchemaDescriptor.newVertex(
                                    specNameBuilder.toString() + ":" + vertexLabel,
                                    vertexLabel));
                            // Jump to the closing ')'
                            i += vertexLabel.length() + 2;
                            break;
                        }
                        specNameBuilder.append(nextCh);
                        ++i;
                    }
                    continue;
                }
                // Outgoing edge: (:SRC)-[:RELATION]->(:DST)
                if (ch == '-' && i + 1 < src.length && src[i + 1] == '[') {
                    ++i;
                    StringBuilder specNameBuilder = new StringBuilder();
                    while (i + 1 < src.length) {
                        char nextCh = src[i + 1];
                        if (nextCh == ':') {
                            String edgeLabel = CommonUtil.readSplitWord(
                                    src, i, ':', ']', false);
                            // Index of the closing ']'
                            int nextVertexStart = i + edgeLabel.length() + 2;
                            if (nextVertexStart + 2 >= src.length) {
                                throw new RuntimeException(
                                        "Lineage graph config error: missing object vertex"
                                        + ", near '" + new String(src, nextVertexStart,
                                                src.length - nextVertexStart));
                            }
                            if (src[++nextVertexStart] != '-'
                                    || src[++nextVertexStart] != '>') {
                                throw new RuntimeException(
                                        "Lineage graph config error: the edge must be followed by ->"
                                        + ", near '" + new String(src, nextVertexStart,
                                                src.length - nextVertexStart));
                            }
                            flatNodeList.add(VertexEdgeSchemaDescriptor.newEdge(
                                    specNameBuilder.toString() + ":" + edgeLabel,
                                    edgeLabel, NodeDiscoveryDirection.OUT));
                            i = nextVertexStart;
                            break;
                        }
                        specNameBuilder.append(nextCh);
                        ++i;
                    }
                    continue;
                }
                // Incoming edge: (:SRC)<-[:RELATION]-(:DST)
                if (ch == '<') {
                    // Fix: was "i + 2 > src.length", which let src[i + 2] run out of bounds
                    if (i + 2 >= src.length) {
                        throw new RuntimeException(
                                "Lineage config error: length mismatch, near '"
                                + new String(src, i, src.length - i));
                    }
                    if (src[++i] != '-' || src[++i] != '[') {
                        throw new RuntimeException(
                                "Lineage config error: malformed edge, near '"
                                + new String(src, i, src.length - i));
                    }
                    StringBuilder specNameBuilder = new StringBuilder();
                    while (i + 1 < src.length) {
                        char nextCh = src[i + 1];
                        if (nextCh == ':') {
                            String edgeLabel = CommonUtil.readSplitWord(
                                    src, i, ':', ']', false);
                            int nextVertexStart = i + edgeLabel.length() + 2;
                            if (nextVertexStart + 2 >= src.length) {
                                throw new RuntimeException(
                                        "Lineage graph config error: missing object vertex"
                                        + ", near '" + new String(src, nextVertexStart,
                                                src.length - nextVertexStart));
                            }
                            if (src[++nextVertexStart] != '-'
                                    || src[nextVertexStart + 1] != '(') {
                                throw new RuntimeException(
                                        "Lineage graph config error: the edge must be followed by -("
                                        + ", near '" + new String(src, nextVertexStart,
                                                src.length - nextVertexStart));
                            }
                            flatNodeList.add(VertexEdgeSchemaDescriptor.newEdge(
                                    specNameBuilder.toString() + ":" + edgeLabel,
                                    edgeLabel, NodeDiscoveryDirection.IN));
                            i = nextVertexStart;
                            break;
                        }
                        specNameBuilder.append(nextCh);
                        ++i;
                    }
                }
            }
        }
        return flatNodeList;
    }
}
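Not too complicated, right? There are just two steps: 1. parse each individual element (tokenize); 2. build the parent/child relationships from those elements.
IN denotes an incoming relationship and OUT an outgoing one, and every pair of adjacent vertices is connected by an edge. That is the gist of it. There are clearly still details to settle, though: where are the edges stored, and how are related nodes attached? These need dedicated data structures, described next.
A single node is a view of the whole graph from one vertex. If the graph is connected, then in principle any other node can be reached from this one by following its incoming and outgoing edges. That makes this class very important.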
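The parser relies on `CommonUtil.readSplitWord`, which is not listed in this post. The sketch below is a guess at its behavior, inferred only from how the tokenizer calls it: scan from the given index, find the first `begin` character, and return the text up to the next `end` character. The class name and the meaning of the final boolean (whether to keep the delimiters) are assumptions.

```java
// Hypothetical stand-in for CommonUtil.readSplitWord; the real helper is not shown
// in the post, so the semantics here are inferred from the call sites only.
class ReadSplitWordSketch {

    /**
     * Returns the text between the first {@code begin} at or after {@code from}
     * and the next {@code end} after it.
     *
     * @param keepDelimiters assumed meaning of the boolean flag: when true,
     *                       the begin/end characters are included in the result
     */
    static String readSplitWord(char[] src, int from, char begin, char end,
                                boolean keepDelimiters) {
        int start = -1;
        for (int i = from; i < src.length; i++) {
            if (start < 0) {
                if (src[i] == begin) {
                    start = i;
                }
            } else if (src[i] == end) {
                return keepDelimiters
                        ? new String(src, start, i - start + 1)
                        : new String(src, start + 1, i - start - 1);
            }
        }
        throw new IllegalArgumentException(
                "No '" + begin + "..." + end + "' span found from index " + from);
    }

    public static void main(String[] args) {
        char[] src = "(:PEOPLE)-[:吃]->(:RICE)".toCharArray();
        // Starting at '(' (index 0): prints PEOPLE
        System.out.println(readSplitWord(src, 0, ':', ')', false));
        // Starting at '[' (index 10): prints 吃
        System.out.println(readSplitWord(src, 10, ':', ']', false));
    }
}
```

With this behavior, the tokenizer's index arithmetic (`i += label.length() + 2`) lands on the closing delimiter, because the `begin` character always sits directly after position `i` at the call sites.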
package com.my.mvc.app.common.helper.graph;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simple graph tree node: one vertex plus its incoming and outgoing neighbors.
 */
public class GraphNodeEntityTree {

    /**
     * Descriptor of the current vertex
     */
    private VertexEdgeSchemaDescriptor vertex;

    /**
     * Relationship (edge) container, keyed by direction
     */
    private Map<NodeDiscoveryDirection, List<RelationWithVertexDescriptor>> relations
            = new HashMap<>();

    /**
     * Nodes on incoming edges
     */
    private List<GraphNodeEntityTree> in = new ArrayList<>();

    /**
     * Nodes on outgoing edges
     */
    private List<GraphNodeEntityTree> out = new ArrayList<>();

    /**
     * Index of all vertex instances, keyed by label
     */
    private Map<String, GraphNodeEntityTree> uniqVertexTypes;

    public GraphNodeEntityTree(VertexEdgeSchemaDescriptor vertex) {
        this.vertex = vertex;
        uniqVertexTypes = new HashMap<>();
    }

    public void setUniqVertexTypes(Map<String, GraphNodeEntityTree> uniqVertexTypes) {
        this.uniqVertexTypes = uniqVertexTypes;
    }

    public void addRelation(VertexEdgeSchemaDescriptor srcEntity,
                            VertexEdgeSchemaDescriptor relation,
                            VertexEdgeSchemaDescriptor dstEntity) {
        List<RelationWithVertexDescriptor> list = relations.computeIfAbsent(
                relation.getDirection(), k -> new ArrayList<>());
        list.add(new RelationWithVertexDescriptor(srcEntity, relation, dstEntity));
    }

    // Adds an incoming neighbor and records the reversed edge on that neighbor
    public GraphNodeEntityTree addInVertex(GraphNodeEntityTree embeddedEntity,
                                           VertexEdgeSchemaDescriptor relation) {
        embeddedEntity.addOutVertexInner(this, relation.reverseDirection());
        addInVertexInner(embeddedEntity, relation);
        return embeddedEntity;
    }

    private GraphNodeEntityTree addInVertexInner(GraphNodeEntityTree embeddedEntity,
                                                 VertexEdgeSchemaDescriptor relation) {
        in.add(embeddedEntity);
        addRelation(vertex, relation, embeddedEntity.getVertex());
        return embeddedEntity;
    }

    // Adds an outgoing neighbor and records the reversed edge on that neighbor
    public GraphNodeEntityTree addOutVertex(GraphNodeEntityTree embeddedEntity,
                                            VertexEdgeSchemaDescriptor relation) {
        embeddedEntity.addInVertexInner(this, relation.reverseDirection());
        addOutVertexInner(embeddedEntity, relation);
        return embeddedEntity;
    }

    private GraphNodeEntityTree addOutVertexInner(GraphNodeEntityTree embeddedEntity,
                                                  VertexEdgeSchemaDescriptor relation) {
        out.add(embeddedEntity);
        addRelation(this.getVertex(), relation, embeddedEntity.getVertex());
        return embeddedEntity;
    }

    public VertexEdgeSchemaDescriptor getVertex() {
        return vertex;
    }

    /**
     * Gets the relation name of the edge to a neighbor.
     *
     * @param nodeIndex index of the neighbor in the in/out list for that direction
     * @param direction direction
     * @return the relation name, or null if there is no edge in that direction
     */
    public String getRelationName(int nodeIndex, NodeDiscoveryDirection direction) {
        List<RelationWithVertexDescriptor> list = relations.get(direction);
        if (list == null || list.isEmpty()) {
            return null;
        }
        return list.get(nodeIndex).getRelationName();
    }

    public List<GraphNodeEntityTree> getIn() {
        return in;
    }

    public List<GraphNodeEntityTree> getOut() {
        return out;
    }

    /**
     * Quickly looks up a graph node by vertex label.
     *
     * @param vertexLabel vertex label
     * @return the node instance, or null if no such vertex exists
     */
    public GraphNodeEntityTree getNodeEntityTreeByVertexLabel(String vertexLabel) {
        return uniqVertexTypes.get(vertexLabel);
    }
}
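Practically every later operation starts from this class, so it deserves close attention.
Parsing begins with tokenization, so how the resulting tokens are represented also matters. We use a single descriptor class for both kinds of token: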
package com.my.mvc.app.common.helper.graph;

/**
 * Descriptor for a graph vertex or edge token.
 */
public class VertexEdgeSchemaDescriptor {

    private String rawWord;
    private VertexOrEdgeType nodeType;
    private String vertexLabelType;
    private String relationName;
    private NodeDiscoveryDirection direction;

    private VertexEdgeSchemaDescriptor(String rawWord,
                                       VertexOrEdgeType nodeType,
                                       String vertexLabelType,
                                       String relationName,
                                       NodeDiscoveryDirection direction) {
        this.rawWord = rawWord;
        this.nodeType = nodeType;
        this.vertexLabelType = vertexLabelType;
        this.relationName = relationName;
        this.direction = direction;
    }

    /**
     * Creates a vertex instance.
     *
     * @param rawWord raw text of the token
     * @param vertexLabelType parsed vertex label (one value per vertex type)
     * @return the vertex instance
     */
    public static VertexEdgeSchemaDescriptor newVertex(String rawWord,
                                                       String vertexLabelType) {
        return new VertexEdgeSchemaDescriptor(rawWord, VertexOrEdgeType.VERTEX,
                vertexLabelType, null, null);
    }

    /**
     * Creates an edge instance.
     *
     * @param rawWord raw text of the token
     * @param relationName relation name (used as the id)
     * @param direction relation direction ( -> is OUT, <- is IN )
     * @return the edge instance
     */
    public static VertexEdgeSchemaDescriptor newEdge(String rawWord,
                                                     String relationName,
                                                     NodeDiscoveryDirection direction) {
        return new VertexEdgeSchemaDescriptor(rawWord, VertexOrEdgeType.EDGE,
                null, relationName, direction);
    }

    public String getRawWord() {
        return rawWord;
    }

    public VertexOrEdgeType getNodeType() {
        return nodeType;
    }

    public String getVertexLabelType() {
        return vertexLabelType;
    }

    public String getRelationName() {
        return relationName;
    }

    public NodeDiscoveryDirection getDirection() {
        return direction;
    }

    // Returns a copy with the opposite direction; the name gets a "-" prefix
    public VertexEdgeSchemaDescriptor reverseDirection() {
        return new VertexEdgeSchemaDescriptor(rawWord, nodeType,
                vertexLabelType, "-" + relationName, direction.reverse());
    }

    @Override
    public String toString() {
        // Vertex description
        if (nodeType == VertexOrEdgeType.VERTEX) {
            return nodeType + "{" +
                    "rawWord='" + rawWord + '\'' +
                    ", vertexLabelType=" + vertexLabelType +
                    '}';
        }
        // Edge description
        return nodeType + "{" +
                "rawWord='" + rawWord + '\'' +
                ", relationName='" + relationName + '\'' +
                ", direction=" + direction +
                '}';
    }
}
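It mainly holds the raw string, plus factory methods for vertices and edges. Think of it as the token type of a lexer.
We need to know exactly how each vertex relates to the others, so there is a relation descriptor class to show that. (The core logic barely uses it: only to look up relation names.)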
package com.my.mvc.app.common.helper.graph;

/**
 * A relation instance: subject -> relation -> object.
 */
public class RelationWithVertexDescriptor {

    /**
     * Source (start) vertex
     */
    private final VertexEdgeSchemaDescriptor srcVertex;

    /**
     * Destination vertex
     */
    private final VertexEdgeSchemaDescriptor dstVertex;

    /**
     * Relation (name)
     */
    private final VertexEdgeSchemaDescriptor relation;

    public RelationWithVertexDescriptor(VertexEdgeSchemaDescriptor srcVertex,
                                        VertexEdgeSchemaDescriptor relation,
                                        VertexEdgeSchemaDescriptor dstVertex) {
        this.srcVertex = srcVertex;
        this.dstVertex = dstVertex;
        this.relation = relation;
    }

    public VertexEdgeSchemaDescriptor getSrcVertex() {
        return srcVertex;
    }

    public VertexEdgeSchemaDescriptor getDstVertex() {
        return dstVertex;
    }

    /**
     * Gets the name of this relation
     */
    public String getRelationName() {
        return relation.getRelationName();
    }

    @Override
    public String toString() {
        if (relation.getDirection() == NodeDiscoveryDirection.OUT) {
            return srcVertex.getRawWord()
                    + "(" + srcVertex.getVertexLabelType() + ")"
                    + " -> " + relation.getRelationName()
                    + " -> " + dstVertex.getRawWord()
                    + "(" + dstVertex.getVertexLabelType() + ")";
        }
        return srcVertex.getRawWord()
                + "(" + srcVertex.getVertexLabelType() + ")"
                + " <- " + relation.getRelationName()
                + " <- " + dstVertex.getRawWord()
                + "(" + dstVertex.getVertexLabelType() + ")";
    }
}
It does little at runtime, but when debugging, this descriptor makes it easy to see whether the parse came out right.
1. Direction definition
package com.my.mvc.app.common.helper.graph;

/**
 * Traversal direction.
 *
 * @since 2020/10/12
 */
public enum NodeDiscoveryDirection {
    /**
     * Incoming direction, upstream
     */
    IN,

    /**
     * Outgoing direction, downstream
     */
    OUT,
    ;

    public NodeDiscoveryDirection reverse() {
        if (this == OUT) {
            return IN;
        }
        return OUT;
    }
}
2. Edge-or-vertex type definition
package com.my.mvc.app.common.helper.graph;

/**
 * Token type: vertex or edge.
 */
public enum VertexOrEdgeType {
    VERTEX,
    EDGE,
    ;
}
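With that, the parsing module is complete: the string shown earlier can now be parsed into entity relationships.
It only counts as usable once it has been tested.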
package com.my.test.common.parser;

import com.my.mvc.app.common.helper.SimpleGraphSchemaSyntaxParser;
import com.my.mvc.app.common.helper.graph.GraphNodeEntityTree;
import com.my.mvc.app.common.helper.graph.NodeDiscoveryDirection;
import org.junit.Test;

import java.util.List;

public class SimpleGraphSchemaSyntaxParserTest {

    // Test script
    @Test
    public void testParseGraphSchema() throws InterruptedException {
        String graphSchema = "(:PEOPLE)-[:養寵物]->(:CAT)-[:吃]->(:RICE)\n" +
                ",(:PEOPLE)-[:吃]->(:RICE)\n" +
                ",(:PEOPLE)-[:養寵物]->(:DOG)\n" +
                ",(:PEOPLE)-[:擁有]->(:HOUSE)" +
                ",(:PEOPLE)-[:幹活]->(:JOB)" +
                ",(:CAT)-[:朋友]->(:DOG)" +
                ",(:DOG)-[:吃]->(:RICE)" +
                ",(:JOB)-[:產出]->(:BRICK)" +
                ",(:HOUSE)<-[:構件]-(:BRICK)" +
                ",(:HOUSE)<-[:構件]-(:GLASS)";
        GraphNodeEntityTree tree = SimpleGraphSchemaSyntaxParser
                .parseGraphSchemaAsTree(graphSchema);
        String searchFromLabel = "PEOPLE";
        NodeDiscoveryDirection direction = NodeDiscoveryDirection.OUT;
        int maxDepth = 10;
        System.out.println("->" + searchFromLabel
                + ", direction:" + direction + ", depth:" + maxDepth);
        GraphNodeEntityTree searchRootFrom
                = tree.getNodeEntityTreeByVertexLabel(searchFromLabel);
        int allNodes = traversalNodesWithDirection(searchRootFrom,
                direction, maxDepth, maxDepth);
        System.out.println("allNodes: " + allNodes);
        Thread.sleep(5);
    }

    /**
     * Traverses all nodes in one direction (depth-first, depth-limited).
     * There is no visited set, so a vertex is counted once per path that reaches it.
     *
     * @param root search start node
     * @param direction direction, IN or OUT
     * @param maxDepth maximum search depth
     * @param remainSearchDepth remaining search depth
     * @return number of nodes visited
     */
    private static int traversalNodesWithDirection(GraphNodeEntityTree root,
                                                   NodeDiscoveryDirection direction,
                                                   int maxDepth,
                                                   int remainSearchDepth) {
        if (remainSearchDepth <= 0) {
            return 0;
        }
        List<GraphNodeEntityTree> subBranches;
        if (direction == NodeDiscoveryDirection.OUT) {
            subBranches = root.getOut();
        } else {
            subBranches = root.getIn();
        }
        if (subBranches == null || subBranches.isEmpty()) {
            return 0;
        }
        // One indentation unit per level of depth
        String whitespaceUnit = " ";
        StringBuilder preWhitespaceBuilder = new StringBuilder(whitespaceUnit);
        for (int i = 1; i < maxDepth - remainSearchDepth + 1; i++) {
            preWhitespaceBuilder.append(whitespaceUnit);
        }
        int allNodes = 0;
        String preWhitespace = preWhitespaceBuilder.toString();
        for (int i = 0; i < subBranches.size(); i++) {
            GraphNodeEntityTree br1 = subBranches.get(i);
            String relationName = root.getRelationName(i, direction);
            allNodes++;
            System.out.println(preWhitespace
                    + "->" + relationName + "->" + br1.getVertex().getRawWord());
            allNodes += traversalNodesWithDirection(br1,
                    direction, maxDepth, remainSearchDepth - 1);
        }
        return allNodes;
    }
}
Sample output:
->PEOPLE, direction:OUT, depth:10
 ->養寵物->:CAT
  ->吃->:RICE
  ->朋友->:DOG
   ->吃->:RICE
 ->吃->:RICE
 ->養寵物->:DOG
  ->吃->:RICE
 ->擁有->:HOUSE
 ->幹活->:JOB
  ->產出->:BRICK
   ->-構件->:HOUSE
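The depth-limited traversal prints a vertex every time a path reaches it (RICE shows up three times), and it relies on the depth limit alone to stop on cyclic graphs. When only the set of downstream vertices is needed, a breadth-first search with a visited set is one alternative. The sketch below is illustrative only: it uses a plain label-to-labels adjacency map rather than the post's `GraphNodeEntityTree`, and the class and method names are made up.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: BFS over outgoing edges with a visited set,
// so each vertex is reported once and cycles cannot loop forever.
class ReachableLabelsSketch {

    static Set<String> reachable(Map<String, List<String>> outEdges, String start) {
        Set<String> visited = new LinkedHashSet<>(); // keeps discovery order
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : outEdges.getOrDefault(current, Collections.emptyList())) {
                if (visited.add(next)) { // add() returns false if already seen
                    queue.add(next);
                }
            }
        }
        visited.remove(start); // report only the downstream vertices
        return visited;
    }

    // Outgoing edges of the sample schema, written by hand for this example
    static Map<String, List<String>> sampleOutEdges() {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("PEOPLE", Arrays.asList("CAT", "RICE", "DOG", "HOUSE", "JOB"));
        edges.put("CAT", Arrays.asList("RICE", "DOG"));
        edges.put("DOG", Arrays.asList("RICE"));
        edges.put("JOB", Arrays.asList("BRICK"));
        edges.put("BRICK", Arrays.asList("HOUSE"));
        edges.put("GLASS", Arrays.asList("HOUSE"));
        return edges;
    }

    public static void main(String[] args) {
        // prints [CAT, RICE, DOG, HOUSE, JOB, BRICK]
        System.out.println(reachable(sampleOutEdges(), "PEOPLE"));
    }
}
```

The same idea would apply to the parsed tree by walking `getOut()` (or `getIn()` for upstream lineage) and keying the visited set on the vertex label.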