Cross Product Cypher queries will not perform well

performance
cypher
knowledge-base

(David Gordon) #1

Just like SQL, if you do not properly connect the parts of your query, it will result in a cross (cartesian) product, which is seldom what you want.
Take the following example:

MATCH (p:Person), (m:Movie)
RETURN p, m;

In Cypher, what happens is that p contains all of the nodes in the graph with the :Person label, and m contains all of the nodes in the graph with the :Movie label.
Returning both of these results in a combination of each node p being returned with each node m, like so:

If there are three nodes with label Person:

  • Neo,
  • Trinity, and
  • Morpheus

and three nodes with label Movie:

  • The Matrix,
  • The Matrix Reloaded, and
  • The Matrix Revolutions

The result of the above Cypher would be:

[options=header]
|===
| p | m
|Neo | The Matrix
|Neo | The Matrix Reloaded
|Neo | The Matrix Revolutions
|Trinity | The Matrix
|Trinity | The Matrix Reloaded
|Trinity | The Matrix Revolutions
|Morpheus | The Matrix
|Morpheus | The Matrix Reloaded
|Morpheus | The Matrix Revolutions
|===

Keep in mind, this is a simple example, so the result set is small.
With a production size graph, this would be a very large, potentially memory intensive query.

In general, inadvertent cross products happen in more complex queries.
They are common in queries with many WITH clauses, and a close look at the query is needed to flush out the issue.
By following general performance best practices, this can easily be avoided.
Be as specific with your query as possible, make sure to use identifiers to properly tie parts of the query together, and only return the data you need.
And profile your slow queries so that you can see where the time and effort is spent.

From Neo4j 2.3 on there is a warning issued in Neo4j browser or if you run your query with EXPLAIN that highlights this issue.