How can I write a cypher query to return the node that has the biggest value of field2 for a given field1 . i.e.
1,4
2,5
3,11
4,4
My attempts so far have resulted in the following (which doesn't work):
MATCH(a:A)
WITH COLLECT({field1: a.field1, field2: MAX(a.field2)}) as rowInfos
UNWIND rowInfos as rowInfo
WITH rowInfo.field1 as field1, rowInfo.field2 as maxField2
MATCH (a)
WHERE a.field1 = field1 AND a.field2 = maxField2
return a
If these nodes were rows in a SQL database then the following query would be used:
SELECT
a.* from A
INNER JOIN (
SELECT
field1, max(field2) as field2
FROM A
GROUP BY
field1
) ut
ON
a.field1 = ut.field1
AND a.field2 = ut.field2
You don't need that first WHERE clause, the aggregation you're doing on line 2 is enough to get your groupings correct.
I've got a new aggregating APOC proc committed which should be able to more concisely handle a case like this, giving you all items that have the max (or min) of a certain value without having to do a rematch as in this query. That should go out with the next APOC release.
The function is apoc.agg.maxItems() (there's a minItems() variant as well).
You can use CALL apoc.help('maxItems()') to find the description and signature of the function. If it doesn't return, then your version of APOC doesn't contain it, and you might want to look for the latest compatible upgrade for your Neo4j version.
Usage would be like:
MATCH (A:Node1)-[r:by]-(c:NodeB)
WITH A, apoc.agg.maxItems(c, r.key) as result
RETURN A, result.value as max, result.items as topBNodes
In the params, note that we're treating the item (c) separate from the value for which we're calculating the max (r.key). In the resulting map, we can extract the value (this is the maximum value) and the items are the list of all the items with that max value (in case there were ties). If you just need 1, then you can use result.items[0] or head(result.items) to get the first one in the list.
Just for future reference for other newbies: In the
apoc.agg.maxItems(c, r.key) as result
Instead of just 'c' I used c.term where term is a property key that I am interested. Using just 'c' returned the whole node description and I am just interested in a property key value.
That's fine, but it will be more efficient to keep the function parameters as-is (keep using the variable for just the node) and wait to project the property in your RETURN, since you only need to return the property for the top item. During query execution, a node variable is a very lightweight structure, it will only access properties when you project them.