cancel
Showing results for 
Search instead for 
Did you mean: 

Getting a warning when using collect

ferdz_steenkamp
Node Link

Hi, I'm relatively new to Cypher, I'm getting a warning when using collect in the return statement of the following query.

MATCH	(user:User)-[:HAS]->(session:Session),
        (user)-[:HAS]->(doc:Doc)

WHERE 	id(user)	=	61

WITH
    id(session)             AS SessionID,
    properties(session)     AS SessionProps,
    count(doc)              AS NoteCount

WITH
    *,
    SessionProps{.*, ID: SessionID} AS Session
    
RETURN {
    Sessions:       collect(Session),
    NoteCount:      NoteCount
}

I get the result that I'm looking for (with Sessions returning an array), but I get a deprecation warning, full warning:

This feature is deprecated and will be removed in future versions.

Aggregation column contains implicit grouping expressions. Aggregation expressions with implicit grouping keys are deprecated and will be removed in a future version. For example, in 'RETURN n.a, n.a + n.b + count(*)' the aggregation expression 'n.a + n.b + count(*)' includes the implicit grouping key 'n.b', and this expression is now deprecated. It may be possible to rewrite the query by extracting these grouping/aggregation expressions into a preceding WITH clause.

I'm not really sure what the 'implicit grouping expression' is. Any ideas on how to restructure this query? Raising the collect up to the WITH clause doesn't return the Sessions as an array like I want.

Thanks in advance!

1 ACCEPTED SOLUTION

glilienfield
Ninja
Ninja

The above solution does work to avoid the implicit group warning, but there may be a more efficient way to write the query that eliminates the problem. What is occurring is that your match for one user is 'joining' two path results, one for sessions and one for documents. Let's assume your data has 5 sessions and 6 doc for this user. The result will be 30 rows, where the same doc is repeated for each set of 5 sessions, until all 6 docs are iterated through. The first 'with' clause then explicitly groups the doc results by each session, since the with has session in it. This reduces the 30 rows down to 5, with each row by session having the same doc count of 6. The second 'with' clause just rearranges the data, but there are still 5 rows, one per session, all with an identical count of 6. The 'return' statement is where the implicit group occurs, as the collect statement needs to group on something, which is implicitly specified as the NoteCount column of each row since it is not part of an aggregate function. Since the NoteCount column has the same value of 6 for each row, the result is one row with a count of 6 and the collection of sessions for that single count of 6. Moving the collect(Session) operation out of the 'return' operation and into the previous 'with' clause, as suggested by @giuseppe.villani, makes the grouping explicit and removes the warning.

You can avoid all of this if you refactor your query to first get your collection of session data, then add to it the count of the docs. This removes the 'join' operation that resulted in an initial 30 rows to process. In this case, the first match results in 5 rows, which is reduced to one. Then, the second match gets the 6 doc rows and reduces it to one with the count.

MATCH (user:User)-[:HAS]->(session:Session)
WHERE id(user) = 61
with user, id(session) as sessionId, properties(session) AS sessionProps
with user, collect(sessionProps{.*, ID: sessionId}) AS sessions
match (user)-[:HAS]->(doc:Doc)
with user, sessions, count(doc) as docCount
return {NoteCount: docCount, Sessions: sessions}

View solution in original post

4 REPLIES 4

giuseppe_villan
Graph Fellow

@ferdz.steenkamp

In short words, the warning says that you should not execute aggregation functions, in your case collect(Session), with other operation, in your case the mapping {Sessions: collect(Session), NoteCount: NoteCount}`

You could split aggregation functions, collect(Session) in your case from other stuff,
to explicitly say what do you want to group, that is, instead of:

WITH *, SessionProps{.*, ID: SessionID} AS Session
RETURN {
    Sessions:       collect(Session),
    NoteCount:      NoteCount
}

you can write:

WITH *, SessionProps{.*, ID: SessionID} AS Session // same as above
WITH NoteCount, collect(Session) as collectSession // here we collect Session
RETURN { 
    Sessions: collectSession, 
    NoteCount: NoteCount
}

See here for more deprecation details: Deprecations, additions and compatibility - Neo4j Cypher Manual

glilienfield
Ninja
Ninja

The above solution does work to avoid the implicit group warning, but there may be a more efficient way to write the query that eliminates the problem. What is occurring is that your match for one user is 'joining' two path results, one for sessions and one for documents. Let's assume your data has 5 sessions and 6 doc for this user. The result will be 30 rows, where the same doc is repeated for each set of 5 sessions, until all 6 docs are iterated through. The first 'with' clause then explicitly groups the doc results by each session, since the with has session in it. This reduces the 30 rows down to 5, with each row by session having the same doc count of 6. The second 'with' clause just rearranges the data, but there are still 5 rows, one per session, all with an identical count of 6. The 'return' statement is where the implicit group occurs, as the collect statement needs to group on something, which is implicitly specified as the NoteCount column of each row since it is not part of an aggregate function. Since the NoteCount column has the same value of 6 for each row, the result is one row with a count of 6 and the collection of sessions for that single count of 6. Moving the collect(Session) operation out of the 'return' operation and into the previous 'with' clause, as suggested by @giuseppe.villani, makes the grouping explicit and removes the warning.

You can avoid all of this if you refactor your query to first get your collection of session data, then add to it the count of the docs. This removes the 'join' operation that resulted in an initial 30 rows to process. In this case, the first match results in 5 rows, which is reduced to one. Then, the second match gets the 6 doc rows and reduces it to one with the count.

MATCH (user:User)-[:HAS]->(session:Session)
WHERE id(user) = 61
with user, id(session) as sessionId, properties(session) AS sessionProps
with user, collect(sessionProps{.*, ID: sessionId}) AS sessions
match (user)-[:HAS]->(doc:Doc)
with user, sessions, count(doc) as docCount
return {NoteCount: docCount, Sessions: sessions}

Thanks for the detailed answer! Think I'll be working through this example a couple of times before I understand it fully, really appreciate it

glilienfield
Ninja
Ninja

You are very welcome. If you want to help visualize it, run the following queries and look at the data. The third query should join the results of the first two queries, so that each session id is repeated once for each doc id. They are grouped to make is visible. The last query groups by session and counts the documents for each session. This is what the first part of your query does.

MATCH (user:User)-[:HAS]->(session:Session)
WHERE id(user) = 61
RETURN id(session)

MATCH (user)-[:HAS]->(doc:Doc)
WHERE id(user) = 61
RETURN id(doc)

MATCH (user:User)-[:HAS]->(session:Session),
(user)-[:HAS]->(doc:Doc)
WHERE id(user) = 61
RETURN id(session) as session_id, id(doc) as doc_id
ORDER BY session_id, doc_id

MATCH (user:User)-[:HAS]->(session:Session),
(user)-[:HAS]->(doc:Doc)
WHERE id(user) = 61
RETURN id(session) as session_id, count(doc) as docCount

Nodes 2022
Nodes
NODES 2022, Neo4j Online Education Summit

On November 16 and 17 for 24 hours across all timezones, you’ll learn about best practices for beginners and experts alike.