How the MATCH clause find the relations from connected nodes works


(Ardan7779) #1

Hellow there..

I'am student and I need to learn more about Neo4j, especially the Algorithm about how the MATCH clause in cypher works.
How it works for finding the relations about two nodes that connected through several other nodes.
Maybe the pseudocode, or the flowchart, or the algorithm about it..?

Thanks for answering, and sorry if my English is not so good.


(Bratanic Tomaz) #2

Probably best to start with the documentation : https://neo4j.com/docs/developer-manual/current/cypher/clauses/match/


(Ardan7779) #3

Thanks for answering. But there explain the "using" of the clause, not how the clause works.

For example in finding the shortest path, in the documentation explained how to use the cypher, not how the logic of the cypher work, that maybe use A* or depth first or djikstra or other.


(Andrew Bowman) #4

You may want to EXPLAIN or PROFILE your query to see how the query is planned and executed. In general, you're going to see:

  1. Lookup of starting node(s) in the graph. This will generally attempt (from most to least efficient) index lookup of specific nodes, then label scans, then all nodes scans. Which is chosen depends upon what properties or parameters are provided and present in the query, and the statistics and metadata about the graph data.

  2. Filtering and expansion. From the starting node(s) relationships are traversed from those nodes, and filtering performed according to the query. Neo4j uses index-free adjacency for expansion, meaning each node knows the relationships attached to it. No table joins or index lookups for relationship expansion are needed. This is basically pointer chasing in memory (node to relationship to node etc), and the main reason why native graph dbs outperform rdbms or table-based databases.

  3. For variable-length relationships, Cypher should be using dfs for expansion. Depending on the WHERE clause, some predicates can be evaluated during expansion (notably when using all() and none() to apply predicates to nodes or relationships for the entirety of the expansion). When using shortestPath(), bfs is used instead.


(Ardan7779) #5

Thanks for the answer. It really help me for understanding the logic. Now, i still learn about DFS and BFS.


(Ardan7779) #6

So, in Graph Databases, like NEO4j, there is no binary searching or binary operation for looking the nodes, relations, or the path, right..?


(Andrew Bowman) #7

When we're looking up nodes in the graph (as opposed to traversing relationships to connected nodes from nodes already in scope), such as when looking up starting nodes, we utilize some form of lookup depending on what indexes are available and what useful information is present in the query (labels on nodes, predicates on properties, etc).

When we have what we need to perform an index lookup then we'll use that, and that will either be via lucene or our native indexing. If it's lucene, then it's an inverted index of some sort. For our native indexing we're using B-Trees.

Again, this is only for lookup of starting nodes in the graph, or where the planner determines that such a lookup can be used on other nodes in the query and may be more efficient than expansion + filtering. These lookups generally make up a small portion of the query.

The majority of the work when executing a Cypher query is traversing from these nodes via their connected relationships and applying predicates and filtering to find all possible paths matching the given MATCH pattern and WHERE clauses.

It wouldn't make sense to use binary searching when expanding, as the graph itself generally isn't a binary tree, and you typically wouldn't use an entire graph database simply to implement a binary tree.