Cypher query for return node which <some_field> = max(<some_field>) value out of a group of nodes

Suppose I have the following data:

field1, field2
1,1
1,4
1,3
2,5
2,2
3,1
3,8
3,9
3,11
4,4

How can I write a cypher query to return the node that has the biggest value of field2 for a given field1 . i.e.

1,4
2,5
3,11
4,4

My attempts so far have resulted in the following (which doesn't work):

MATCH(a:A)
WITH COLLECT({field1: a.field1, field2: MAX(a.field2)}) as rowInfos
UNWIND rowInfos as rowInfo
WITH rowInfo.field1 as field1, rowInfo.field2 as maxField2
MATCH (a)
WHERE a.field1 = field1 AND a.field2 = maxField2
return a

If these nodes were rows in a SQL database then the following query would be used:

SELECT
a.* from A
INNER JOIN (
SELECT
field1, max(field2) as field2
FROM A
GROUP BY
field1
) ut
ON
a.field1 = ut.field1
AND a.field2 = ut.field2

Here is a solution,

MATCH (f:Fld1)-->(g:Fld2)
RETURN f.f1 as Fld1, max(g.f2) as Fld2;

Result:
lsmm2

MATCH (f:Fld1)-->(g:Fld2)
WITH f.f1 as Fld1, max(g.f2) as Fld2
MATCH (f1:Fld1)-->(g1:Fld2)
WHERE g1.f2 IN [Fld2]
RETURN f1, g1;

Result:
lsmm1

Turns out the query to achieve this is rather simple:

MATCH(b:A)
WITH b.field1 as field1, MAX(b.field2) as field2
WHERE b.field1 = field1
MATCH (b)
WHERE b.field1 = field1 AND b.field2 = field2
RETURN b

You don't need that first WHERE clause, the aggregation you're doing on line 2 is enough to get your groupings correct.

I've got a new aggregating APOC proc committed which should be able to more concisely handle a case like this, giving you all items that have the max (or min) of a certain value without having to do a rematch as in this query. That should go out with the next APOC release.

Hi,

How about a slight modification where the key value is in the relationship.
(A:Node1)-[r:by]-(c:NodeB)

Goal: for each A return B where r.key is maximum for the grouping of A to multiple B nodes.

Also which of the APOC procedures would avoid needing to do the rematch?
Andy

The function is apoc.agg.maxItems() (there's a minItems() variant as well).

You can use CALL apoc.help('maxItems()') to find the description and signature of the function. If it doesn't return, then your version of APOC doesn't contain it, and you might want to look for the latest compatible upgrade for your Neo4j version.

Usage would be like:

MATCH (A:Node1)-[r:by]-(c:NodeB)
WITH A, apoc.agg.maxItems(c, r.key) as result
RETURN A, result.value as max, result.items as topBNodes

In the params, note that we're treating the item (c) separate from the value for which we're calculating the max (r.key). In the resulting map, we can extract the value (this is the maximum value) and the items are the list of all the items with that max value (in case there were ties). If you just need 1, then you can use result.items[0] or head(result.items) to get the first one in the list.

Perfect,

Thank you very much.

Just for future reference for other newbies: In the
apoc.agg.maxItems(c, r.key) as result

Instead of just 'c' I used c.term where term is a property key that I am interested. Using just 'c' returned the whole node description and I am just interested in a property key value.

Andy

That's fine, but it will be more efficient to keep the function parameters as-is (keep using the variable for just the node) and wait to project the property in your RETURN, since you only need to return the property for the top item. During query execution, a node variable is a very lightweight structure, it will only access properties when you project them.

1 Like