Cypher Query Warning::The query contains an aggregation function that skips null values

I am getting this warning and not sure how to proceed

Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.AggregationSkippedNull} {category: UNRECOGNIZED} {title: The query contains an aggregation function that skips null values.} {description: null value eliminated in set function.} {position: None} for query: "\n            UNWIND $updates AS update\n            MATCH (d:Domain)\n            WHERE elementId(d) = update.element_id AND d.checked_out_by = $process_id\n\n            // Unset the checked_out flags\n            SET d.checked_out = FALSE, d.checked_out_by = NULL, d.checked_out_at = NULL\n            WITH d, update\n\n            // Find the latest status event per domain\n            OPTIONAL MATCH (d)-[:HAS_STATUS]->(s:StatusEvent)\n            WITH d, s, update\n\n            OPTIONAL MATCH (d)-[:HAS_STATUS]->(s:StatusEvent)\n            ORDER BY s.checkedAt DESC\n            WITH d, collect(s) AS statuses, update\n\n            // Handle the case where statuses might be empty\n            WITH d, \n                 CASE WHEN size(statuses) > 0 THEN statuses[0] ELSE NULL END AS latestStatus, \n                 update\n\n            // Decide whether to update the existing status or create a new one\n            FOREACH (_ IN CASE \n                WHEN update.new_status = 'Unknown' AND latestStatus IS NOT NULL AND latestStatus.status = 'Unknown' THEN [1] \n                WHEN latestStatus IS NOT NULL AND latestStatus.status = update.new_status THEN [1] \n                ELSE [] \n                END | \n                SET latestStatus.checkedAt = date()\n            )\n\n            // Create a new 'StatusEvent' if necessary\n            FOREACH (_ IN CASE \n                WHEN latestStatus IS NULL OR (latestStatus IS NOT NULL AND latestStatus.status <> update.new_status AND update.new_status <> 'Unknown') THEN [1] \n                ELSE [] \n                END | \n                CREATE (newStatus:StatusEvent {status: update.new_status, checkedAt: date()})\n                CREATE (d)-[:HAS_STATUS]->(newStatus)\n            )\n\n            // Return the updated domain information\n            RETURN d.url AS domain, \n                   update.new_status AS new_status, \n                   date() AS last_checked\n\n            "

Here's the exact query

            UNWIND $updates AS update
            MATCH (d:Domain)
            WHERE elementId(d) = update.element_id AND d.checked_out_by = $process_id

            // Unset the checked_out flags
            SET d.checked_out = FALSE, d.checked_out_by = NULL, d.checked_out_at = NULL
            WITH d, update

            // Find the latest status event per domain
            OPTIONAL MATCH (d)-[:HAS_STATUS]->(s:StatusEvent)
            WITH d, s, update

            OPTIONAL MATCH (d)-[:HAS_STATUS]->(s:StatusEvent)
            ORDER BY s.checkedAt DESC
            WITH d, collect(s) AS statuses, update

            // Handle the case where statuses might be empty
            WITH d, 
                 CASE WHEN size(statuses) > 0 THEN statuses[0] ELSE NULL END AS latestStatus, 
                 update

            // Decide whether to update the existing status or create a new one
            FOREACH (_ IN CASE 
                WHEN update.new_status = 'Unknown' AND latestStatus IS NOT NULL AND latestStatus.status = 'Unknown' THEN [1] 
                WHEN latestStatus IS NOT NULL AND latestStatus.status = update.new_status THEN [1] 
                ELSE [] 
                END | 
                SET latestStatus.checkedAt = date()
            )

            // Create a new 'StatusEvent' if necessary
            FOREACH (_ IN CASE 
                WHEN latestStatus IS NULL OR (latestStatus IS NOT NULL AND latestStatus.status <> update.new_status AND update.new_status <> 'Unknown') THEN [1] 
                ELSE [] 
                END | 
                CREATE (newStatus:StatusEvent {status: update.new_status, checkedAt: date()})
                CREATE (d)-[:HAS_STATUS]->(newStatus)
            )

            // Return the updated domain information
            RETURN d.url AS domain, 
                   update.new_status AS new_status, 
                   date() AS last_checked

Please keep the following things in mind:

neo4j --version
5.24.0

I am using no plugins just the browsers and some scripts to communicate via python.

I am not sure if that warning is severe, is it possible to rework the query not give warnings?

Hi @gurenchan ,
The notification warns you that collect(s) encountered a NULL value and ignored it. This behaviour of ignoring NULL can be confusing to some users and that is why the notification is there. If that is what you expect you can safely just ignore it.

I think for this particular query since you do OPTIONAL MATCH followed by collect, I think you can simply remove the OPTIONAL and get the same behaviour (without the warning). That said you can also safely just ignore the warning and leave the query as is.

Best regards,
Pontus

You can use the COLLECT subquery to make the code more readable. Also, the HEAD to get the first element, which will return NULL it one does not exists. The CALL subquery may also be more readable then the FOREACH implementations.

I think this should give you the same result. I reduced some of the boolean logic, which seemed unnecessary.

Sorry I don't have any test data to validate the refactored query with.

UNWIND $updates AS update
MATCH (d:Domain)
WHERE elementId(d) = update.element_id AND d.checked_out_by = $process_id

// Unset the checked_out flags
SET d.checked_out = FALSE, d.checked_out_by = NULL, d.checked_out_at = NULL

// Find the latest status event per domain
WITH d, update, HEAD(COLLECT {
    MATCH (d)-[:HAS_STATUS]->(s:StatusEvent)
    RETURN s
    ORDER BY s.checkedAt DESC
    LIMIT 1
}) AS latestStatus

// Update the existing status if necessary
CALL (update, latestStatus) {
    WITH update, latestStatus
    WHERE latestStatus IS NOT NULL AND latestStatus.status = update.new_status
    SET latestStatus.checkedAt = date()
}

// Create a new 'StatusEvent' if necessary
CALL (update, latestStatus) {
    WITH update, latestStatus
    WHERE latestStatus IS NULL 
    OR (latestStatus.status <> update.new_status AND update.new_status <> 'Unknown')
    CREATE (newStatus:StatusEvent {status: update.new_status, checkedAt: date()})
    CREATE (d)-[:HAS_STATUS]->(newStatus)
}

// Return the updated domain information
RETURN 
    d.url AS domain, 
    update.new_status AS new_status, 
    date() AS last_checked
1 Like