Count wrong on OPTIONAL MATCH

Given that MATCH (product:Artefact)-[:ASSIGNED_SKU]->(assigned_sku:Sku) RETURN count(product) as count returns 141

Why does

MATCH (product:Artefact)-[:ASSIGNED_SKU]->(assigned_sku:Sku)
OPTIONAL MATCH (assigned_sku)-[IN_HIERARCHICAL_CATEGORY]->(assigned_categories:Category)
RETURN count(product) as count

return 150?

Even more concerning: if I change it back to RETURN product, the Neo4j browser shows 141 nodes

  • Neo4j Browser version: 3.2.18

  • Neo4j Server version: 3.5.4 (community)

  • what kind of API / driver do you use: Using the browser directly

  • screenshot of PROFILE: (image not preserved)

  • screenshot of EXPLAIN: (image not preserved)

  • which plugins / extensions / procedures do you use: Only APOC

  • neo4j.log does not show anything when the query is run. Will enable debug logging if requested.

Looks like you have at least one assigned_sku that has multiple relationships, so a distinct product node will occur on multiple rows and thus be counted multiple times in that count.

Note that the OPTIONAL MATCH in your query as written will match ANY outgoing relationship: because IN_HIERARCHICAL_CATEGORY is not prefixed with a :, it is treated as a variable, not a relationship type. Fix that up, since I think you mean for this to be a type.

You can use count(DISTINCT product) to count each distinct product only once.
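
Putting both points together, a corrected query might look like the sketch below. It assumes IN_HIERARCHICAL_CATEGORY is meant as a relationship type; the aliases row_count and product_count are only illustrative names, not anything from your data.

```cypher
MATCH (product:Artefact)-[:ASSIGNED_SKU]->(assigned_sku:Sku)
// The leading ":" makes IN_HIERARCHICAL_CATEGORY a relationship type, not a variable
OPTIONAL MATCH (assigned_sku)-[:IN_HIERARCHICAL_CATEGORY]->(assigned_categories:Category)
RETURN count(product) AS row_count,               // counts every row, including repeats of a product
       count(DISTINCT product) AS product_count   // counts each distinct product node once
```

Returning both columns side by side makes the difference visible: row_count is the figure that gets inflated whenever a SKU has more than one category, while product_count should not be affected by that.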

(The missing : is a typo in my post, good catch)

It looks like you’re right: using count(DISTINCT product) does return the right number, and it makes sense.

Thank you!

P.S. Not sure if there’s a desire to change the documentation, but a mention of “duplicates” in the section for OPTIONAL MATCH might save the next person some time.

It's not necessarily an OPTIONAL MATCH thing. The same thing can happen with MATCH. Remember that Cypher is all about finding all possible paths that match a pattern. That may (and often does) include multiple paths where the same node is bound to a variable across those paths while other elements of those paths differ. By default, count() counts every row rather than distinct elements, so you need count(DISTINCT ...) in those cases.
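
To see this with a plain MATCH, here is a minimal sketch on toy data. The labels and relationship types are the ones from this thread; the name and code properties and their values are made up for illustration. It assumes an otherwise empty scratch database, and the two statements should be run one after the other.

```cypher
// Hypothetical toy data: one product, one SKU, two categories on that SKU
CREATE (p:Artefact {name: 'p1'})-[:ASSIGNED_SKU]->(s:Sku {code: 's1'}),
       (s)-[:IN_HIERARCHICAL_CATEGORY]->(:Category {name: 'c1'}),
       (s)-[:IN_HIERARCHICAL_CATEGORY]->(:Category {name: 'c2'});

// Plain MATCH, no OPTIONAL: there are two paths, so the single product appears on two rows
MATCH (product:Artefact)-[:ASSIGNED_SKU]->(:Sku)-[:IN_HIERARCHICAL_CATEGORY]->(:Category)
RETURN count(product) AS row_count,               // 2 on this toy data
       count(DISTINCT product) AS product_count;  // 1 on this toy data
```

The single product is reached through two different paths (one per category), so it is bound on two rows even though it is only one node.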