What is the proper way to use aggregation functions when creating relationships?

lalquier · November 5, 2019, 4:49pm

I have been into a finer point of Cypher syntax and I keep running into dead ends when searching about this issue.

The goal is relatively simple: I have a social graph made of Persons following Groups.
I would like to simplify it by creating a relationship between groups that have members in common... but I would like to weigh these relationships by the number of members they have in common.

I came up with this query:

MATCH p=(g1:Group)< -[r1:IS_MEMBER_OF]-(u:Person)-[r2:IS_MEMBER_OF]- >(g2:Group) 
MERGE (g1)-[r:SHARES_MEMBERS_WITH]->(g2) 
RETURN g1.group_id,g2.group_id, count(u)

This works fine - it creates the relationships and gives me a count of members for each.

My question is: how do I inject that count in the relationship itself?

I tried using 'ON CREATE SET r.members_count = count(u)' but that gives me an error about count() being used in the wrong context.

I tried giving an alias to count() with something like 'WITH count(u) as members' after the MATCH, but that gives me the total count of all members in common across all paths.. not the count for a given path.

I feel I am very close to the answer but the syntax still eludes me.

andrew_bowman · November 5, 2019, 4:56pm

You are close, but you're trying to do everything at the same time. You need to aggregate first, then use the the result of the aggregation in the MERGE of the relationship.

Also you'll want to add some filtering to avoid mirrored results (where the same groups come up on at least two rows, just switching which one is g1 and which one is g2), and that's best done by adding an inequality filter on the ids of the nodes.

Also the MERGE should probably remove direction, if the direction doesn't matter to you, otherwise it's possible you could get results where you've created two relationships between a set of group nodes, going the opposite direction from each other.

MATCH (g1:Group)< -[:IS_MEMBER_OF]-(u:Person)-[:IS_MEMBER_OF]- >(g2:Group) 
WHERE id(g1) < id(g2)
WITH g1, g2, count(u) as personCount
MERGE (g1)-[r:SHARES_MEMBERS_WITH]-(g2) 
SET r.members_count = personCount
RETURN g1.group_id,g2.group_id, personCount // only if you actually need data returned.

lalquier · November 5, 2019, 5:26pm

Thanks for sending me on the right track!
Your query returns all relationships in duplicates even with the filter but at least I get the right count per relationship, so I can work with that. I will check if it is something in my data.

andrew_bowman · November 5, 2019, 6:01pm

If there are duplicates coming back, then one of two things is likely happening:

You might already have :SHARES_MEMBER_WITH relationships between nodes, and if there are multiple relationships between two nodes already present, the MERGE would match to those relationships, and your return would contain multiple rows with the same two groups and person count.
Multiple :Group nodes may exist with the same group_id, which could make it seem like duplicate results, but the nodes are actually different.

I'd wager the first case is what's going on, let us know what you find.

lalquier · November 5, 2019, 6:21pm

That was indeed the first case. Duplicates vanished after I deleted the past relationships.

Thanks again! That allowed be to turn a whole python script into a single line of Cypher

Topic		Replies	Views
Aggregation, Returns, & Functions Neo4j Website	0	1628	August 5, 2020
Count Relationships per Row? Cypher operations	1	344	July 20, 2021
How to aggregate all relationships with same parents into one Cypher cypher	2	363	November 9, 2020
Aggregating functions and coVariance analysis Cypher cypher	3	252	March 18, 2022
How to correctly create and count new relationships to avoid duplicates? Cypher cypher	1	57	February 11, 2025

Get Certified in June!

What is the proper way to use aggregation functions when creating relationships?

Related topics