Aggregate on distinct nodes

Keith_gamble · March 1, 2022, 7:12pm

I am currently trying to return some aggregated data on a set of communities, however due to the way that my graph is set up, I am getting a duplicate of some nodes.

MATCH (l:Order)-[]-(a:Equipment) 
WITH COLLECT(DISTINCT a) AS equipment, COLLECT(DISTINCT a.name) as equipmentNames, a.community_id AS cid
MATCH (l:Order)-[]-(b:Equipment)
WHERE b.name IN equipmentNames
MATCH (l)-[]-(o:OPCODE)
WIND equipment as equip
WITH DISTINCT b, o, l
WITH b.community_id as cid
	, SUM(DISTINCT o.valueToAggregate) + SUM(DISTINCT b.valueToAggregate) AS o.valueToAggregate
	, COLLECT(DISTINCT b.name) as equipment
	, COLLECT(DISTINCT b {.*}) AS equipmentProperties
RETURN equipment, o.valueToAggregate, equipmentProperties

If I include the SUM(DISTINCT o.valueToAggregate) and SUM(DISTINCT b.valueToAggregate) then if multiple equipment has the same valueToAggregate value, then I only see one of them. However I don't want the same piece of equipment (node) to get factored in twice. It is possible for the equipment a to be feeding itself b and then each one of those gets counted two times without the DISTINCT clause

Is it possible to do something to the affect of:
SUM(valueToAggregate FOR DISTINCT b) in this case?

glilienfield · March 1, 2022, 10:33pm

There are a few things here that I need help with. First is what is your intent with the first four lines. From my understanding, the collect statements in the second line will only collect over each value of a.community_id. It is like a GROUP BY clause in SQL. As such, the collects will be singleton lists. You can verify this by taking the first two lines and executing them after replacing the WITH with RETURN.

I believe the result of the first four lines is the same as if you execute 'MATCH (l:Order)--(b:Equipment)' instead.

Is it our requirement to sum up the aggregateValue properties on the Orders and OPCODEs per each piece of equipment? If so, the following will aggregate them:

MATCH (l:Order)--(b:Equipment)
MATCH (l)--(o:OPCODE)
WITH DISTINCT b, collect(distinct o) as co, collect(distinct l) as cl

You can then calculate the sum of the aggregateValue properties for a specific value of b, using the reduce operator on each collection co and cl:

sum of co elements for a specific b: reduce(s = 0, i in co | s + i.valueToAggregate)
sum of cl elements for a specific b: reduce(s = 0, i in cl | s + i.valueToAggregate)

What are you looking for with the WITH clause on line 8? Since you are specifying b.community_id, all the sums and collections should be restricted to that specific value of b. Again, a GROUP BY operation. As a result, I think those COLLECT statements are also singleton lists and the SUM(DISTINCT b.valueToAggregate) is the sum of one value, the b.valueToAggregate for the b value for this row.

Sorry for my confusion.

Keith_gamble · March 2, 2022, 2:02am

I think that rewriting it with that change to the first four and using the reduce operator simplified the query and its still the same data. I went through and tried to validate the data by hand, looks to be correct.

MATCH (l:Order)--(b:Equipment)
MATCH (l)--(o:OPCODE)
WITH DISTINCT b, collect(distinct o) as co, collect(distinct l) as cl
WITH b.community_id as cid
	, REDUCE(s = 0, o in co | s + o.activeDelay) + SUM(b.activeDelay) AS activeDelay
	, COLLECT(DISTINCT b.name) as equipment
	, COLLECT(DISTINCT b {.*}) AS equipmentProperties
RETURN equipment, activeDelay, equipmentProperties

glilienfield · March 2, 2022, 2:45am

Is your new query giving you want you want? I would think the collects are returning single values and the SUM(b.activeDelay) is contributing a single value.

Topic		Replies	Views
Query Syntax to avoid duplicate results Newbie Questions cypher	9	420	March 27, 2021
Aggregate related nodes across all rows Cypher apoc , browser	2	3273	December 17, 2018
Return distinct values from a merged collection Cypher cypher	1	286	December 4, 2021
Using Distinct in a standarized query Neo4j Graph Platform	2	372	January 25, 2021
How to show aggregate info in Graph Browser apoc , cypher , merge	8	4278	January 17, 2019

July Summer Fun!

Aggregate on distinct nodes

Related topics