Cypher query optimisation


I have a graph made up of the nodes 'recipe' and 'ingredient'. The links connect the ingredients to the recipes.
I want to be able to retrieve recipes based on the ingredients I have in my fridge. I have noticed that the time of retrieval grows exponentially as I add more ingredients (in fact, it is not possible to search for more than 4 ingredients at a time).
Suppose I want to retrieve all recipes that contain lemon and peach. This is the type of query I have come up with in Cypher.

MATCH (r:Recipe)-[:CONTAINS]-(i:Ingredient {name:"honey"}) WITH collect(r) AS hRep MATCH (r)-[:CONTAINS]-(i:Ingredient {name:"peach"}) WHERE r IN hRep RETURN r LIMIT 5

Is there a better/more efficient way to write it?
Any advice would be more than welcome.
Thanks a lot

Hi Barbara!

A couple questions:

  • What's the size of your database? Depending on the size, a common ingredient can become a supernode and that might cause you some trouble.
  • Do you have any indexes in it (specially in the name attribute of the Ingredient Type)?

Are you building your query dynamically? It seems you could run your initial query (with two ingredients) as follow: MATCH (i:Ingredient {name:"peach"})-[:CONTAINS]-(r:Recipe)-[:CONTAINS]-(i:Ingredient {name:"honey"}) by doing so you remove one less NodeByLabelScan.

1 Like

Try this:
With four ingredients:

MATCH (r:Recipe)-[:CONTAINS]-(i:Ingredient {name:"honey"}) WITH collect(r) AS hRep, i
UNWIND hRep as r1

MATCH (r1)-[:CONTAINS]-(i1:Ingredient {name:"peach"}) WITH collect(r1) AS hRep1, i, i1
UNWIND hRep1 as r2

MATCH (r2)-[:CONTAINS]-(i2:Ingredient {name:"banana"}) WITH collect(r2) AS hRep2, i, i1, i2
UNWIND hRep2 as r3

MATCH (r3)-[:CONTAINS]-(i3:Ingredient {name:"guava"}) WITH collect(r3) AS hRep3, i, i1, i2, i3
UNWIND hRep3 as r4

RETURN as Recipe, as Ingr1, as Ingr2, as Ingr3, as Ingr4

1 Like

Thanks a lot, folks. I'll try both solutions.

Actually, I realised that the last version of my query is the following:
CALL db.index.fulltext.queryNodes('ingredients', 'lemons') YIELD node WITH node AS ing0
CALL db.index.fulltext.queryNodes('ingredients', 'carrot') YIELD node WITH node AS ing1, ing0
MATCH (r:Recipe)-[:CONTAINS]-(ing0), (r:Recipe)-[:CONTAINS]-(ing1)

Any idea why it takes forever?
Graph has 80.000 nodes (60.000 recipes + 20.000 ingredients) and 600.000 links.
Thanks again!


So you may find this article on performing match intersection useful for what you're trying to do.

As for why you're seeing your latest query taking forever, this is due to cardinality issues you're introducing from these back-to-back lookups. Please review this article on understanding Cypher cardinality for more detail, but the gist of it is that operations produce records/rows as results, and operations execute per input record/row. Your second index call is being executed for every result that was generated from your first index call, and the result leads to a multiplicative number of rows. You could better visualize this from a PROFILE of the query, but with the cardinality issue in play here it may not finish.