Hi,
I found the following interesting behaviour today when running some basic statistics on my dataset.
What the relevant part of the dataset looks like:
There are multiple nodes for each of the labels :LABEL1, :LABEL2 and :LABEL3.
What I did:
MATCH (p:LABEL1), (pr:LABEL2), (fp:LABEL3) RETURN COUNT(pr), COUNT(p), COUNT(fp)
What I expected as a result: the number of nodes for each label:
count(pr): 12
count(p): 206
count(fp): 126
The actual result: all three counts showed the same number, 311472 (which, by the way, is the product of the three expected results ^^).
Now that was an interesting result.
I tried again, splitting the query into separate MATCH clauses connected with WITH.
MATCH (p:LABEL1) WITH p MATCH (pr:LABEL2) WITH p, pr MATCH (fp:LABEL3) RETURN count(pr), count(p), count(fp)
The result stayed the same.
Then I tried counting each variable before matching the next one, and got the result I expected.
MATCH (p:LABEL1) WITH count(p) AS pcount MATCH (pr:LABEL2) WITH pcount, count(pr) AS prcount MATCH (fp:LABEL3) RETURN count(fp), pcount, prcount
It's of course possible that 311472 is simply the correct result for the query I posted, and that I don't understand well enough how the count() function works.
In that case I'd be very happy if someone could explain where my line of thinking is flawed. If it's a real bug and you're able to reproduce it, I'm happy to have been of help.
Your initial MATCH is performing a cartesian join, so the result of 311472 is expected: it equals 12 * 206 * 126. Every combination of a :LABEL1, a :LABEL2 and a :LABEL3 node becomes one row. count(pr) counts the rows in which pr is non-null, not the distinct nodes bound to pr, so all three counts return the total number of rows.
If you are using the Neo4j Browser, entering that MATCH statement should have shown a yellow warning icon to the left of the query. Hovering over it reports that a cartesian join is in effect.
If you simply want to return the count for each label, you could do something like:
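To make the row multiplication concrete, here is a small Python simulation of what the first two queries produce and what count() sees. This is not Neo4j code. The node lists are placeholders that only reproduce the counts from the question. The helpers `match_rows` and `cypher_count` are hypothetical names for a simplified model of MATCH on disconnected patterns and of count(x).

```python
from itertools import product

# Placeholder "nodes": only the counts from the question matter here.
label1 = [f"p{i}" for i in range(206)]   # :LABEL1
label2 = [f"pr{i}" for i in range(12)]   # :LABEL2
label3 = [f"fp{i}" for i in range(126)]  # :LABEL3

def match_rows(*node_lists):
    """Hypothetical model of MATCH with disconnected patterns:
    every combination of bindings becomes one row (a cartesian product)."""
    return list(product(*node_lists))

def cypher_count(rows, column):
    """Simplified model of count(x): counts rows where x is non-null,
    not the number of distinct nodes bound to x."""
    return sum(1 for row in rows if row[column] is not None)

# MATCH (p:LABEL1), (pr:LABEL2), (fp:LABEL3) -> columns p, pr, fp
rows = match_rows(label1, label2, label3)
counts = (cypher_count(rows, 1), cypher_count(rows, 0), cypher_count(rows, 2))
print(counts)  # (311472, 311472, 311472)

# The WITH-chained version builds the same rows step by step:
# each later MATCH pairs every incoming row with every new node.
step = [(p,) for p in label1]
step = [row + (pr,) for row in step for pr in label2]
step = [row + (fp,) for row in step for fp in label3]
print(len(step))  # 311472
```

Passing `p, pr` through WITH only carries the existing rows forward. It does not reduce them, which is why the second query returned the same numbers.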
MATCH (p:LABEL1) WITH count(p) AS countp
MATCH (pr:LABEL2) WITH count(pr) AS countpr, countp
MATCH (fp:LABEL3) WITH count(fp) AS countfp, countpr, countp
RETURN countp, countpr, countfp;
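This works because aggregating in WITH collapses all rows into one before the next MATCH. The next product is therefore 1 x n rather than m x n. Below is a minimal Python model of that query, again using placeholder node lists. The helpers `match` and `with_count` are hypothetical names for a simplified model of MATCH on a new unconnected pattern and of `WITH count(x) AS alias, <kept columns>`.

```python
# Placeholder node lists with the counts from the question.
label1 = [f"p{i}" for i in range(206)]
label2 = [f"pr{i}" for i in range(12)]
label3 = [f"fp{i}" for i in range(126)]

def match(rows, var, nodes):
    """Model of MATCH on a new, unconnected pattern: each incoming row
    is combined with every node of the label (a cartesian product)."""
    return [dict(row, **{var: node}) for row in rows for node in nodes]

def with_count(rows, keep, var, alias):
    """Simplified model of `WITH count(var) AS alias, <keep...>`:
    rows are grouped by the kept columns, and each group becomes one row
    holding the number of non-null `var` values in that group."""
    groups = {}
    for row in rows:
        key = tuple(row[k] for k in keep)
        groups.setdefault(key, 0)
        if row.get(var) is not None:
            groups[key] += 1
    return [dict(zip(keep, key), **{alias: n}) for key, n in groups.items()]

rows = [{}]                                          # single empty starting row
rows = match(rows, "p", label1)                      # 206 rows
rows = with_count(rows, [], "p", "countp")           # collapsed to 1 row
rows = match(rows, "pr", label2)                     # 1 x 12 rows
rows = with_count(rows, ["countp"], "pr", "countpr") # collapsed to 1 row
rows = match(rows, "fp", label3)                     # 1 x 126 rows
rows = with_count(rows, ["countp", "countpr"], "fp", "countfp")
print(rows)  # [{'countp': 206, 'countpr': 12, 'countfp': 126}]
```

The grouping keys (countp, then countp and countpr) have the same value in every row. Each WITH therefore leaves exactly one row, and the cartesian product never grows beyond the size of a single label.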
I'm seriously thinking about writing a 2-to-6-page book of examples on how the MATCH clause works. There are so many people here who still use MATCH as if it were an SQL clause.
Graph VS Projected Data
The underlying truth behind the weird and unexpected behaviour of your Cypher queries
The Cypher language has a lot of subtleties to it.
You might add a technique I use:
Before creating nodes or relationships, RETURN the nodes and relationships (perhaps their names), so that you can check you got what you think you got before making a mess of your DB.
That'd indeed be worth a read!
My initial problem wasn't that I didn't understand that Cypher would build a cartesian product of the nodes I matched. It was the wrong assumption that counting the individual variables would give me the size of each side rather than the size of the whole product...
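The "size of each side" described here corresponds to counting distinct bindings per variable (Cypher supports `count(DISTINCT p)`), rather than counting rows. Below is a small Python sketch of the difference, using placeholder node lists instead of a real database. In this model the full product is still built before de-duplicating, which is one reason the count-first query above is the better fit.

```python
from itertools import product

# Placeholder node lists with the counts from the question.
label1 = [f"p{i}" for i in range(206)]
label2 = [f"pr{i}" for i in range(12)]
label3 = [f"fp{i}" for i in range(126)]

# Rows produced by MATCH (p:LABEL1), (pr:LABEL2), (fp:LABEL3): columns p, pr, fp.
rows = list(product(label1, label2, label3))

# count(p), count(pr) and count(fp) each count rows, i.e. the whole product.
row_count = len(rows)

# count(DISTINCT x): the number of different nodes x was bound to,
# i.e. the size of one "side" of the product.
distinct_counts = {
    name: len({r[i] for r in rows if r[i] is not None})
    for i, name in enumerate(("p", "pr", "fp"))
}

print(row_count)        # 311472
print(distinct_counts)  # {'p': 206, 'pr': 12, 'fp': 126}
```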