My graph is a tree consisting of nodes with the label of "Entity". Entities are related via a CHILD_OF relationship. So each Entity has an outgoing CHILD_OF relationship with it's parent Entity. Furthermore, each Entity has one or more nodes with the label of "Component". Components are attached to Entities via an outgoing ATTACHED_TO relationship. The tree is rooted by an Entity with the id of "#root".
The following illustrates the structure of the graph by showing a small section.
One of the common queries that on this graph is to find a subtree of the graph, starting with an Entity with a given ID. Where the ID is an application assigned ID and not the internal Neo4j ID. So for example, find the subtree whose root node has the ID '#root', or perhaps '#resources'. etc.
Here are two cypher queries that perform this operation:
MATCH (n:Entity {id: '#root'})
WITH n
CALL apoc.path.subgraphAll(n, {
relationshipFilter : '<CHILD_OF|<ATTACHED_TO',
optional: true
}) yield nodes, relationships
WITH nodes, relationships
RETURN nodes, relationships
and
MATCH (n:Entity {id: '#root'})
WITH n
CALL apoc.path.subgraphAll(n, {
relationshipFilter : '<CHILD_OF',
optional: true
}) yield nodes, relationships
WITH nodes
UNWIND nodes as e
OPTIONAL MATCH (e)<-[:ATTACHED_TO]-(c:Component)
RETURN e, c
The first one returns a list of nodes and a list of relationships. The second one returns Entity, Component pairs.
Based on my understanding of Cypher I assume the first one is a much more efficient query. But the second one returns the results in a much easier format for our application to consume. It is exactly what we need, which is the list of Entity, Component pairs. Obviously those pairs can be constructed from the list of nodes and relationships returned by the first query.
Performance is of utmost importance in our use case, so I have four questions regarding these two queries.
-
Is my assumption correct, in that the first query is much more efficient and performant that the second query?
-
If I go with the first query, is it true that the nodes and relationship lists have the same ordering. i.e. If node x occurs first in the nodes list and I can expect that its relationships occur first in the relationships list. This is what I observe to be true. But is that guaranteed by Cypher semantics?
-
Is there an efficient way to turn the results of the first query into Entity, Components pairs, so that the same results are returned as the second query?
-
Are the results from apoc.path.subgraphAll guaranteed to be in breadth first order?
Thanks,
Michael-