
Bug with MATCH and COUNT when matching multiple variables?

florian_schumme
Node Link

Hi,
I found the following interesting behaviour today when running some basic statistics on my dataset.
What the relevant part of the dataset looks like:

  • There are multiple nodes for each of the labels :LABEL1, :LABEL2 and :LABEL3.

What I did:

  • MATCH (p:LABEL1), (pr:LABEL2), (fp:LABEL3) RETURN COUNT(pr), COUNT(p), COUNT(fp)

  • What I expected as a result: the number of nodes for each label:

    • count(pr): 12
    • count(p): 206
    • count(fp): 126
  • The actual result: all three counts showed the same number, 311472 (which, by the way, is the product of the three expected results ^^)

Now that was an interesting result.

I tried again, splitting the query into separate MATCH clauses connected with WITH statements.

  • MATCH (p:LABEL1) WITH p MATCH(pr:LABEL2) WITH p, pr MATCH (fp:LABEL3) RETURN count(pr), count(p), count(fp)

The result stayed the same.
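Why the WITH version changes nothing can be seen in a toy Python model (an illustration only, not how Neo4j actually executes queries): treat each clause as a function from a list of rows to a new list of rows. In this model, a MATCH on a pattern not connected to the existing variables pairs every incoming row with every matching node, and a WITH that only passes variables through keeps every row. The helper names `match`, `with_vars` and `count` and the node ids are hypothetical; only the sizes 206, 12 and 126 come from the question.

```python
from typing import Dict, List

# A "row" maps variable names to (hypothetical) node ids.
Row = Dict[str, str]


def match(rows: List[Row], var: str, nodes: List[str]) -> List[Row]:
    """Model a MATCH on a pattern not connected to earlier variables:
    every incoming row is combined with every matching node."""
    return [{**row, var: node} for row in rows for node in nodes]


def with_vars(rows: List[Row], *names: str) -> List[Row]:
    """Model a non-aggregating WITH: keep the named variables, one output row per input row."""
    return [{name: row[name] for name in names} for row in rows]


def count(rows: List[Row], var: str) -> int:
    """Model count(var): the number of rows in which var is not null."""
    return sum(1 for row in rows if row.get(var) is not None)


# Hypothetical node ids; only the sizes (206, 12, 126) come from the question.
label1 = [f"p{i}" for i in range(206)]
label2 = [f"pr{i}" for i in range(12)]
label3 = [f"fp{i}" for i in range(126)]

# MATCH (p:LABEL1) WITH p MATCH (pr:LABEL2) WITH p, pr MATCH (fp:LABEL3)
rows: List[Row] = [{}]             # in this model, a query starts from one empty row
rows = match(rows, "p", label1)    # 206 rows
rows = with_vars(rows, "p")        # still 206 rows
rows = match(rows, "pr", label2)   # 206 * 12 = 2472 rows
rows = with_vars(rows, "p", "pr")  # still 2472 rows
rows = match(rows, "fp", label3)   # 2472 * 126 = 311472 rows

# RETURN count(pr), count(p), count(fp)
print(count(rows, "pr"), count(rows, "p"), count(rows, "fp"))  # 311472 311472 311472
```

In this model, WITH only decides which variables survive; it never merges rows, so the row count keeps multiplying at each disconnected MATCH.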
I then tried counting each variable before matching the next one, and this time I got the result I expected.

match(p:LABEL1) with count(p) as pcount match (pr:LABEL2) with pcount, count(pr) as prcount match (fp:LABEL3) RETURN count(fp), pcount, prcount

It's of course possible that 311472 is simply the correct result for the query I posted and that I don't understand well enough how the count() function works.
In that case I'd be very happy if someone could explain where my line of thinking is flawed. If it's a real bug and you're able to reproduce it, I'm happy to have been of help.

Neo4j Desktop, database version: 4.1.4

1 ACCEPTED SOLUTION

dana_canzano
Neo4j

Your initial MATCH is performing a Cartesian join (a Cartesian product of the three disconnected patterns), so the result of 311472 is expected: it is the same as 12*206*126.
And if you are using the Neo4j Browser, when entering that MATCH statement there should have been a yellow warning icon to the left of the text which, when hovered over, reports that a Cartesian product is being built.
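A toy Python model (illustrative only, not Neo4j's implementation) shows what count() is measuring here: Cypher's count(expr) counts the rows in which expr is not null, not the number of distinct nodes bound to the variable. After the Cartesian product, each :LABEL1 node appears once per (:LABEL2, :LABEL3) combination, so every count equals the total row count; deduplicating first, as count(DISTINCT p) does, recovers the per-label numbers. The node ids and the helpers `count` and `count_distinct` are hypothetical; only the sizes 206, 12 and 126 come from the question.

```python
from itertools import product

# Hypothetical node ids; only the sizes (206, 12, 126) come from the question.
label1 = [f"p{i}" for i in range(206)]
label2 = [f"pr{i}" for i in range(12)]
label3 = [f"fp{i}" for i in range(126)]

# MATCH (p:LABEL1), (pr:LABEL2), (fp:LABEL3): disconnected patterns give one row per combination.
rows = list(product(label1, label2, label3))


def count(values):
    """Model count(x): number of non-null values, duplicates included."""
    return sum(1 for v in values if v is not None)


def count_distinct(values):
    """Model count(DISTINCT x): number of distinct non-null values."""
    return len({v for v in values if v is not None})


# Column of each variable across all rows.
p_col = [p for p, _, _ in rows]
pr_col = [pr for _, pr, _ in rows]
fp_col = [fp for _, _, fp in rows]

print(len(rows))                                                              # 311472
print(count(pr_col), count(p_col), count(fp_col))                             # 311472 311472 311472
print(count_distinct(pr_col), count_distinct(p_col), count_distinct(fp_col))  # 12 206 126
```

Translated back to Cypher, this suggests that RETURN count(DISTINCT pr), count(DISTINCT p), count(DISTINCT fp) on the original MATCH would return 12, 206 and 126, but the 311472 product rows are likely still built first, so counting each label on its own is the cheaper option.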

If you simply want to return the count for each label, you could do something like:

MATCH (p:LABEL1) WITH count(p) as countp
MATCH (pr:LABEL2) WITH count(pr) as countpr, countp
MATCH (fp:LABEL3) WITH count(fp) as countfp, countpr, countp
return countp, countpr, countfp;
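The same kind of toy Python model (illustrative only, not Neo4j's implementation) suggests why this version works: an aggregating WITH collapses all incoming rows into one row per group before the next MATCH runs, and here every group key is a single count, so each later MATCH multiplies by 1 instead of by the previous label's node count. The helpers `match` and `with_count` and the node ids are hypothetical; `with_count` groups by the non-aggregated columns, which is how Cypher treats the extra variables listed after an aggregate in WITH.

```python
from typing import Dict, List, Tuple

# A "row" maps variable names to (hypothetical) node ids or computed counts.
Row = Dict[str, object]


def match(rows: List[Row], var: str, nodes: List[str]) -> List[Row]:
    """Model an unconnected MATCH: combine every incoming row with every node."""
    return [{**row, var: node} for row in rows for node in nodes]


def with_count(rows: List[Row], var: str, alias: str, keys: List[str]) -> List[Row]:
    """Model `WITH count(var) AS alias, key1, key2, ...`:
    group rows by the key columns and count non-null values of var per group."""
    groups: Dict[Tuple[object, ...], int] = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        groups[group] = groups.get(group, 0) + (1 if row.get(var) is not None else 0)
    if not keys and not groups:
        # With no grouping keys, an aggregation returns one row even for empty input.
        groups[()] = 0
    return [{**dict(zip(keys, group)), alias: n} for group, n in groups.items()]


# Hypothetical node ids; only the sizes (206, 12, 126) come from the question.
label1 = [f"p{i}" for i in range(206)]
label2 = [f"pr{i}" for i in range(12)]
label3 = [f"fp{i}" for i in range(126)]

rows: List[Row] = [{}]                                           # start from one empty row
rows = match(rows, "p", label1)                                  # 206 rows
rows = with_count(rows, "p", "countp", [])                       # 1 row
rows = match(rows, "pr", label2)                                 # 1 * 12 = 12 rows
rows = with_count(rows, "pr", "countpr", ["countp"])             # 1 row
rows = match(rows, "fp", label3)                                 # 1 * 126 = 126 rows
rows = with_count(rows, "fp", "countfp", ["countpr", "countp"])  # 1 row

final = rows[0]
print(len(rows), final["countp"], final["countpr"], final["countfp"])  # 1 206 12 126
```

Compared with the model of the first query, the only change is that the count happens before the next MATCH, which keeps the intermediate result at one row instead of letting it grow to 311472.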


5 REPLIES


Top, thanks a lot for the explanation!

tard_gabriel
Ninja

I'm seriously thinking about writing a 2-to-6-page book of examples on how the MATCH clause works. There are so many people here who still use MATCH as if it were an SQL clause.

Graph VS Projected Data
The underlying truth behind the weird and unexpected behaviour of your Cypher queries

For beginners to intermediate users

Please do!

The Cypher language has a lot of subtleties to it.

You might add a technique I use:

Before creating nodes or relationships, RETURN the nodes and relationships (perhaps their names) so that you can check you got what you think you got before making a mess of your DB.

That'd indeed be worth a read!
My problem here initially was not that I didn't understand that Cypher would create a Cartesian product of the nodes I matched, but the wrong assumption that by querying the count of each individual variable I'd get the size of each side and not the whole product...