
Variable length path - reduce duplication in results in code

ChrisM

Hi, I have a fairly simple data structure with two types of node, 'Stock' and 'Recipe'. The following two relationships are possible:

 

(:Stock)-[:HAS_ASSIGNEE_OF]->(:Recipe)
(:Recipe)-[:CONTAINS]->(:Stock)

 

As such you could have a chain of these relationships that is arbitrarily deep/long (note that my API does not allow a path to be created with a node being repeated along the chain, so the path will never be circular).

Also, while a Stock node will only ever have one HAS_ASSIGNEE_OF relationship to a single Recipe node, a Recipe node can have multiple CONTAINS relationships to other Stock nodes.

I am trying to write a query to retrieve all of the nodes and relationships starting from a single Stock node and traversing the HAS_ASSIGNEE_OF or CONTAINS relationships as far as they go. I have written this query:

 

MATCH p = (:Stock { id: 'some_id'})-[:HAS_ASSIGNEE_OF|CONTAINS*]->() RETURN nodes(p), relationships(p)

 

This query 'works' in the sense that it does return the whole tree of nodes and relationships for the parent Stock node. However, the result in code contains a lot of duplicate data.

For example if the following path exists:

 

(a1:Stock { name: 'Grandparent' })-[:HAS_ASSIGNEE_OF]->(a2:Recipe)-[:CONTAINS]->(b1:Stock { name: 'Parent' })-[:HAS_ASSIGNEE_OF]->(b2:Recipe)-[:CONTAINS]->(c1:Stock { name: 'Child' })-[:HAS_ASSIGNEE_OF]->(c2:Recipe)

 

In the result in code I get 5 sets of results, summarized as follows:

  1. a1, a2; HAS_ASSIGNEE_OF
  2. a1, a2, b1; HAS_ASSIGNEE_OF, CONTAINS
  3. a1, a2, b1, b2; HAS_ASSIGNEE_OF, CONTAINS, HAS_ASSIGNEE_OF
  4. a1, a2, b1, b2, c1; HAS_ASSIGNEE_OF, CONTAINS, HAS_ASSIGNEE_OF, CONTAINS
  5. a1, a2, b1, b2, c1, c2; HAS_ASSIGNEE_OF, CONTAINS, HAS_ASSIGNEE_OF, CONTAINS, HAS_ASSIGNEE_OF

Is there a way I can rewrite my query so that I only get the set of results in bullet point 5? Bearing in mind that a2, b2 and c2 could all have multiple CONTAINS relationships which would also need to be included. In other words, can I restrict/limit the results to only contain the nodes and relationships for "complete" paths, i.e. paths where the node at the end of the path has no further matching relationships?

1 REPLY

Is that basically the same relationship, just inverted? You usually don't need to store the inverse direction in Neo4j, as you can always traverse in both directions.

In plain Cypher you could use:

 

MATCH p = ...
UNWIND nodes(p) AS n
UNWIND relationships(p) AS r
RETURN collect(DISTINCT n) AS nodes, collect(DISTINCT r) AS rels
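
If only the "complete" paths from the question are wanted, one option is to require that the last node of the path has no further outgoing relationship of either type. This is a minimal sketch, assuming the labels, relationship types and `id` property from the question; `'some_id'` is a placeholder and `leaf` is just an illustrative variable name:

```cypher
// Keep only paths whose end node has no outgoing
// HAS_ASSIGNEE_OF or CONTAINS relationship (i.e. leaf paths).
MATCH p = (:Stock {id: 'some_id'})-[:HAS_ASSIGNEE_OF|CONTAINS*]->(leaf)
WHERE NOT (leaf)-[:HAS_ASSIGNEE_OF|CONTAINS]->()
RETURN nodes(p) AS nodes, relationships(p) AS rels
```

For the single chain in the question this should return only the row from bullet point 5. When a Recipe has several CONTAINS relationships there is one row per leaf, and the shared start of each path still repeats across rows, so collecting distinct nodes and relationships may still be needed if a single deduplicated set is wanted.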

 

There are also some APOC procedures, like apoc.graph.fromPaths (called with the collected paths, e.g. collect(p)),
or even better you could use the subgraph expansion procedures (apoc.path.subgraphAll or apoc.path.subgraphNodes)

or apoc.convert.toTree
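
As far as I know, the APOC subgraph expansion mentioned above is exposed as `apoc.path.subgraphAll`, which returns each reachable node and relationship once. A rough sketch, assuming the APOC plugin is installed and reusing the labels and relationship types from the question (`'some_id'` is a placeholder):

```cypher
// Assumes APOC is available on the server.
// 'TYPE>' in relationshipFilter means "follow TYPE in the outgoing direction".
MATCH (s:Stock {id: 'some_id'})
CALL apoc.path.subgraphAll(s, {
  relationshipFilter: 'HAS_ASSIGNEE_OF>|CONTAINS>'
})
YIELD nodes, relationships
RETURN nodes, relationships
```

This gives one row containing two lists rather than one row per path, so no deduplication is needed on the client side.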