Inconsistent Query Behavior with WITH Clause - Potential Bug Report

Hello

I've encountered an inconsistent behavior in your system when running two variations of a similar query. Below are the details:

  1. Query 1:
MATCH (m:Movie {id: 1})
RETURN collect(m.title) as movies

Actual Result: [{'movies': []}] (as expected)

  1. Query 2:
WITH 'a' as a
MATCH (m:Movie {id: 1})
RETURN a, collect(m.title) as movies

Expected Result: [{'a': 'a', 'movies': []}] Actual Result: [] (unexpected)

It appears that including a WITH clause changes the behavior unexpectedly, leading to a completely empty result set. This inconsistency could signify a bug in the query processing mechanism.

@artem.gurskiy

any details specific to Neo4j version?

If you change

WITH 'a' as a
MATCH (m:Movie {id: 1})
RETURN a, collect(m.title) as movies

to

WITH 'a' as a
OPTIONAL MATCH (m:Movie {id: 1})
RETURN a, collect(m.title) as movies

do you then get 'expected' results?

The behavior is not particular to the 'with' clause. It seems to occur when you have at least one grouping term with the aggregation.

I agree that this behavior seems inconsistent on how the aggregation works.

When there is not a grouping term, then the aggregation method returns a value when there are no matched records.

When there is a grouping term, then the query returns no results when there are no matched records.

I used 5.11.0 for the above queries.

Hello, this is expected behavior, we have a knowledge base article describing this in more detail.

Thank you very much for the reference. It is good to know it is expected behavior and that the optional match avoids the issue.

I still could argue that the behavior is not consistent. In one case you don't have any rows and you return a result and in the other you also don't have any rows and you do return a result.

I read the explanation in the article about why you can't get a zero count value when there is a grouping factor. I think it is not intuitive. I think if you have an aggregate function, which will be an eager operation, that no rows should be thrown away so you can calculate the aggregation for those grouping values that have no match.

Here is an example:

create(:Person{name:"steve"}), (:Person{name:"bob"}), (:Email{recipient: "steve"}), (:Email{recipient: "steve"})

Now I would like to count how many emails each user received.

match(n:Person)
with n
match(m:Email{recipient:n.name})
return n.name, count(m) as noOfEmails

I would have liked to get a row with bob showing zero emails. This does not seem unreasonable since a query without a grouping value and zero rows returns zero instead of "no changes, no records".

I understand this is a matter of design preference and that is fine. Fortunately, there is a work around using optional match to get the desired result.

That all being said, the behavior is actually the same in sql. If I have a query that counts records and there are zero matches and there is no "group by" clause, then I get a row returned with zero count. In contrast, if I add a group by clause I get zero records back. In my example above, using an optional match is kinda equivalent to a left join, which would return a zero value for persons that don't have any emails.

Ok...I succumb.

Thanks for the clarification.

Hi Gary,

There are a few different approaches you could use to get the kind of behavior you want.

As you saw and mentioned, OPTIONAL MATCH (which behaves like a left join as you pointed out) can be used when we are not sure if such a pattern / node exists, and and rather than removing (or not emitting) a row for the lack of result, it will instead emit a null for the variable value, so that's an important tool.

An alternate approach may be to use a subquery, as subqueries execute per row, and you do not have to keep an imported variable around within the subquery when you no longer need it, which can allow you to perform an aggregation without a grouping key:

MATCH (n:Person)

CALL {
  WITH n
  MATCH (m:Email {recipient:n.name})
  RETURN count(m) as noOfEmails
}
RETURN n.name, noOfEmails

That is a good one...I love the subquery.

You could also use the COUNT subquery.

MATCH (n:Person)
RETURN n.name, COUNT {
  MATCH (m:Email {recipient:n.name})
} as noOfEmails

@andrew_bowman, thanks for your insights