Cypher query optimisation


I have a graph made up of the nodes 'recipe' and 'ingredient'. The links connect the ingredients to the recipes.
I want to be able to retrieve recipes based on the ingredients I have in my fridge. I have noticed that the time of retrieval grows exponentially as I add more ingredients (in fact, it is not possible to search for more than 4 ingredients at a time).
Suppose I want to retrieve all recipes that contain lemon and peach. This is the type of query I have come up with in Cypher.

MATCH (r:Recipe)-[:CONTAINS]-(i:Ingredient {name:"honey"}) WITH collect(r) AS hRep MATCH (r)-[:CONTAINS]-(i:Ingredient {name:"peach"}) WHERE r IN hRep RETURN r LIMIT 5

Is there a better/more efficient way to write it?
Any advice would be more than welcome.
Thanks a lot

Hi Barbara!

A couple questions:

  • What's the size of your database? Depending on the size, a common ingredient can become a supernode and that might cause you some trouble.
  • Do you have any indexes in it (specially in the name attribute of the Ingredient Type)?

Are you building your query dynamically? It seems you could run your initial query (with two ingredients) as follow: MATCH (i:Ingredient {name:"peach"})-[:CONTAINS]-(r:Recipe)-[:CONTAINS]-(i:Ingredient {name:"honey"}) by doing so you remove one less NodeByLabelScan.

1 Like

Try this:
With four ingredients:

MATCH (r:Recipe)-[:CONTAINS]-(i:Ingredient {name:"honey"}) WITH collect(r) AS hRep, i
UNWIND hRep as r1

MATCH (r1)-[:CONTAINS]-(i1:Ingredient {name:"peach"}) WITH collect(r1) AS hRep1, i, i1
UNWIND hRep1 as r2

MATCH (r2)-[:CONTAINS]-(i2:Ingredient {name:"banana"}) WITH collect(r2) AS hRep2, i, i1, i2
UNWIND hRep2 as r3

MATCH (r3)-[:CONTAINS]-(i3:Ingredient {name:"guava"}) WITH collect(r3) AS hRep3, i, i1, i2, i3
UNWIND hRep3 as r4

RETURN as Recipe, as Ingr1, as Ingr2, as Ingr3, as Ingr4

1 Like

Thanks a lot, folks. I'll try both solutions.

Actually, I realised that the last version of my query is the following:
CALL db.index.fulltext.queryNodes('ingredients', 'lemons') YIELD node WITH node AS ing0
CALL db.index.fulltext.queryNodes('ingredients', 'carrot') YIELD node WITH node AS ing1, ing0
MATCH (r:Recipe)-[:CONTAINS]-(ing0), (r:Recipe)-[:CONTAINS]-(ing1)

Any idea why it takes forever?
Graph has 80.000 nodes (60.000 recipes + 20.000 ingredients) and 600.000 links.
Thanks again!