Multiple matches performance drop

(Alex Mandalios) #1

Hello everyone,

I am facing a weird performance issue with Neo4j. I have a relatively large graph with about 2 million nodes, and I would like to run personalized PageRank on some lists of nodes. I use the following syntax to grab the nodes I need:
MATCH (a:type {id:value})
MATCH (b:type {id:value2})
MATCH (c:type {id:value3})
but it does not perform well.

More specifically, fetching 500 nodes, even without feeding them to PageRank, takes about a minute, and fetching 1000 takes about 10 minutes, which is far from the linear increase I expected.
PROFILE reveals that cartesian products are formed, first for a and b, then for a, b and c, and so on. Given that id has a unique index and that I provide the right node type, is this performance drop for multiple matches expected? If not, what could be the culprit?

Thanks in advance,

(Andrew Bowman) #2

In this kind of case, a cartesian product is expected and correct, and since these are unique indexes your result should only be a single row.

It would help to confirm the existence of an index on :type(id), and to see the PROFILE query plan with all elements expanded.

(Alex Mandalios) #3

Thanks for the feedback.

One thing I should note, even though it may be clear from the first post, is that when I need to find N nodes, I perform N matches, so the final query has about N lines. From what I have read online, this is considered bad practice. Maybe I should use some form of batching instead? Or would that not be relevant?

(Andrew Bowman) #4

I think we would need to see the full query, with some description of what it's supposed to do, before we can make that call.

(Michael Hunger) #5

You should:

  1. have an index or constraint
  2. use parameters
  3. use IN

MATCH (n:Type) WHERE n.id IN $params
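For anyone building these queries from application code, here is a minimal Python sketch of the three points above. It is not taken from the thread: the helper names `build_lookup` and `fetch_nodes`, the parameter name `$ids`, and the `type`/`id` defaults (borrowed from the original question) are all assumptions for illustration. It also assumes `session` is an open session from the official `neo4j` Python driver. The idea is that one query with a single list parameter replaces the N separate MATCH clauses, so the planner has no cartesian product to build and can reuse the cached plan for any list length.

```python
def build_lookup(ids, label="type", prop="id"):
    """Return (query, params) that fetch every node whose `prop` is in `ids`.

    Hypothetical helper: label and property default to the names used in
    the original question. Labels and property keys cannot be passed as
    Cypher parameters, so they are interpolated into the query text and
    must come from trusted code, never from user input.
    """
    query = f"MATCH (n:`{label}`) WHERE n.`{prop}` IN $ids RETURN n"
    # The id list travels as a single parameter instead of N literals.
    params = {"ids": list(ids)}
    return query, params


def fetch_nodes(session, ids):
    """Run the lookup on an open session (assumed: neo4j driver session)."""
    query, params = build_lookup(ids)
    return [record["n"] for record in session.run(query, params)]
```

With a unique constraint or index on the property, a lookup like this should resolve through the index once per id rather than scanning by label; running it under PROFILE is the way to confirm that on a given database.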