Just like SQL, if you do not properly connect the parts of your query, it will result in a cross (cartesian) product, which is seldom what you want.
Take the following example:
MATCH (p:Person), (m:Movie)
RETURN p, m;
In Cypher, what happens is that p
contains all of the nodes in the graph with the :Person
label, and m
contains all of the nodes in the graph with the :Movie
label.
Returning both of these results in a combination of each node p
being returned with each node m
, like so:
If there are three nodes with label Person
:
- Neo,
- Trinity, and
- Morpheus
and three nodes with label Movie:
- The Matrix,
- The Matrix Reloaded, and
- The Matrix Revolutions
The result of the above Cypher would be:
[options=header]
|===
| p | m
|Neo | The Matrix
|Neo | The Matrix Reloaded
|Neo | The Matrix Revolutions
|Trinity | The Matrix
|Trinity | The Matrix Reloaded
|Trinity | The Matrix Revolutions
|Morpheus | The Matrix
|Morpheus | The Matrix Reloaded
|Morpheus | The Matrix Revolutions
|===
Keep in mind, this is a simple example, so the result set is small.
With a production size graph, this would be a very large, potentially memory intensive query.
In general, inadvertent cross products happen in more complex queries.
They are common in queries with many WITH
clauses, and a close look at the query is needed to flush out the issue.
By following general performance best practices, this can easily be avoided.
Be as specific with your query as possible, make sure to use identifiers to properly tie parts of the query together, and only return the data you need.
And profile your slow queries so that you can see where the time and effort is spent.
From Neo4j 2.3 on there is a warning issued in Neo4j browser or if you run your query with EXPLAIN
that highlights this issue.