Cypher query for return node which <some_field> = max(<some_field>) value out of a group of nodes

loudsight · July 31, 2019, 3:54pm

Suppose I have the following data:

field1, field2
1,1
1,4
1,3
2,5
2,2
3,1
3,8
3,9
3,11
4,4

How can I write a cypher query to return the node that has the biggest value of field2 for a given field1 . i.e.

1,4
2,5
3,11
4,4

My attempts so far have resulted in the following (which doesn't work):

MATCH(a:A)
WITH COLLECT({field1: a.field1, field2: MAX(a.field2)}) as rowInfos
UNWIND rowInfos as rowInfo
WITH rowInfo.field1 as field1, rowInfo.field2 as maxField2
MATCH (a)
WHERE a.field1 = field1 AND a.field2 = maxField2
return a

If these nodes were rows in a SQL database then the following query would be used:

SELECT
a.* from A
INNER JOIN (
SELECT
field1, max(field2) as field2
FROM A
GROUP BY
field1
) ut
ON
a.field1 = ut.field1
AND a.field2 = ut.field2

ameyasoft · July 31, 2019, 9:12pm

Here is a solution,

MATCH (f:Fld1)-->(g:Fld2)
RETURN f.f1 as Fld1, max(g.f2) as Fld2;

Result:
lsmm2

MATCH (f:Fld1)-->(g:Fld2)
WITH f.f1 as Fld1, max(g.f2) as Fld2
MATCH (f1:Fld1)-->(g1:Fld2)
WHERE g1.f2 IN [Fld2]
RETURN f1, g1;

Result:
lsmm1

loudsight · August 1, 2019, 3:58pm

Turns out the query to achieve this is rather simple:

MATCH(b:A)
WITH b.field1 as field1, MAX(b.field2) as field2
WHERE b.field1 = field1
MATCH (b)
WHERE b.field1 = field1 AND b.field2 = field2
RETURN b

andrew_bowman · August 1, 2019, 5:09pm

You don't need that first WHERE clause, the aggregation you're doing on line 2 is enough to get your groupings correct.

I've got a new aggregating APOC proc committed which should be able to more concisely handle a case like this, giving you all items that have the max (or min) of a certain value without having to do a rematch as in this query. That should go out with the next APOC release.

andy_hegedus · October 9, 2020, 8:06pm

Hi,

How about a slight modification where the key value is in the relationship.
(A:Node1)-[r:by]-(c:NodeB)

Goal: for each A return B where r.key is maximum for the grouping of A to multiple B nodes.

Also which of the APOC procedures would avoid needing to do the rematch?
Andy

andrew_bowman · October 9, 2020, 10:57pm

The function is apoc.agg.maxItems() (there's a minItems() variant as well).

You can use CALL apoc.help('maxItems()') to find the description and signature of the function. If it doesn't return, then your version of APOC doesn't contain it, and you might want to look for the latest compatible upgrade for your Neo4j version.

Usage would be like:

MATCH (A:Node1)-[r:by]-(c:NodeB)
WITH A, apoc.agg.maxItems(c, r.key) as result
RETURN A, result.value as max, result.items as topBNodes

In the params, note that we're treating the item (c) separate from the value for which we're calculating the max (r.key). In the resulting map, we can extract the value (this is the maximum value) and the items are the list of all the items with that max value (in case there were ties). If you just need 1, then you can use result.items[0] or head(result.items) to get the first one in the list.

andy_hegedus · October 9, 2020, 11:50pm

Perfect,

Thank you very much.

Just for future reference for other newbies: In the
apoc.agg.maxItems(c, r.key) as result

Instead of just 'c' I used c.term where term is a property key that I am interested. Using just 'c' returned the whole node description and I am just interested in a property key value.

Andy

andrew_bowman · October 10, 2020, 1:49am

That's fine, but it will be more efficient to keep the function parameters as-is (keep using the variable for just the node) and wait to project the property in your RETURN, since you only need to return the property for the top item. During query execution, a node variable is a very lightweight structure, it will only access properties when you project them.

Topic		Replies	Views
Grouping values with Cypher query Graph Algorithms/Graph Data Science cypher	6	355	October 25, 2021
How can I find the most recent nodes that match a query? Cypher	11	1060	February 28, 2021
Neo4j returning node + subquery results Newbie Questions	2	451	June 16, 2020
Find the largest clusters of a database Neo4j Graph Platform migrated	1	84	August 26, 2022
Problem on the UNWIND clause Cypher cypher	1	330	July 15, 2021

Get Certified in June!

Cypher query for return node which <some_field> = max(<some_field>) value out of a group of nodes

Related topics