Which query is more expensive?

My graph is a tree consisting of nodes with the label of "Entity". Entities are related via a CHILD_OF relationship. So each Entity has an outgoing CHILD_OF relationship with it's parent Entity. Furthermore, each Entity has one or more nodes with the label of "Component". Components are attached to Entities via an outgoing ATTACHED_TO relationship. The tree is rooted by an Entity with the id of "#root".

The following illustrates the structure of the graph by showing a small section.


One of the common queries that on this graph is to find a subtree of the graph, starting with an Entity with a given ID. Where the ID is an application assigned ID and not the internal Neo4j ID. So for example, find the subtree whose root node has the ID '#root', or perhaps '#resources'. etc.

Here are two cypher queries that perform this operation:

MATCH (n:Entity {id: '#root'})
WITH n
CALL apoc.path.subgraphAll(n, {
    relationshipFilter : '<CHILD_OF|<ATTACHED_TO',
    optional: true
}) yield nodes, relationships
WITH nodes, relationships
RETURN nodes, relationships

and

MATCH (n:Entity {id: '#root'})
WITH n
CALL apoc.path.subgraphAll(n, {
    relationshipFilter : '<CHILD_OF',
    optional: true
}) yield nodes, relationships
WITH nodes
UNWIND nodes as e
OPTIONAL MATCH (e)<-[:ATTACHED_TO]-(c:Component)
RETURN e, c

The first one returns a list of nodes and a list of relationships. The second one returns Entity, Component pairs.

Based on my understanding of Cypher I assume the first one is a much more efficient query. But the second one returns the results in a much easier format for our application to consume. It is exactly what we need, which is the list of Entity, Component pairs. Obviously those pairs can be constructed from the list of nodes and relationships returned by the first query.

Performance is of utmost importance in our use case, so I have four questions regarding these two queries.

  1. Is my assumption correct, in that the first query is much more efficient and performant that the second query?

  2. If I go with the first query, is it true that the nodes and relationship lists have the same ordering. i.e. If node x occurs first in the nodes list and I can expect that its relationships occur first in the relationships list. This is what I observe to be true. But is that guaranteed by Cypher semantics?

  3. Is there an efficient way to turn the results of the first query into Entity, Components pairs, so that the same results are returned as the second query?

  4. Are the results from apoc.path.subgraphAll guaranteed to be in breadth first order?

Thanks,

Michael-

Responses:

  1. I think your intuition is correct, the first query will be more efficient compared to the second. This would be due to the extra match statements needed to get the components nodes in the second query, where these are easily retrieved while traversing the sub graph with the apoc method.

  2. not sure, would have to look at the apoc code.

  3. see query below. Relationships entities have the following: 1) 'id' retrieved with id(r), a 'type' retrieved with type(r), its properties retrieved by properties(r), its start node retrieved with startNode(r), and its end node retrieved by endNode(r). I used these fact to extract the 'ATTACHED_TO' relationships from the list of relationships and then returned the start and end nodes, which represent 'c' and 'e' respectively.

  4. don't know, again would have to look at the code

MATCH (n:Entity {id: '#root'})
CALL apoc.path.subgraphAll(n, {
    relationshipFilter : '<CHILD_OF|<ATTACHED_TO',
    optional: true
}) yield relationships
UNWIND [i in relationships where type(i) = 'ATTACHED_TO'] as result
RETURN endNode(result) as e, startNode(result) as c

Note: I could not find a reference to the 'optional' config parameter you are using.

Wow thanks. That is a good trick to know. I tried it and it works great. Thank you!

I noticed there is a configuration for the apoc method to specify breadth vs depth searching, so you can chose. The config parameter is 'bfs'.

Thanks I missed that!