What is the proper way to use aggregation functions when creating relationships?

I have been into a finer point of Cypher syntax and I keep running into dead ends when searching about this issue.

The goal is relatively simple: I have a social graph made of Persons following Groups.
I would like to simplify it by creating a relationship between groups that have members in common... but I would like to weigh these relationships by the number of members they have in common.

I came up with this query:

MATCH p=(g1:Group)< -[r1:IS_MEMBER_OF]-(u:Person)-[r2:IS_MEMBER_OF]- >(g2:Group) 
MERGE (g1)-[r:SHARES_MEMBERS_WITH]->(g2) 
RETURN g1.group_id,g2.group_id, count(u)

This works fine - it creates the relationships and gives me a count of members for each.

My question is: how do I inject that count in the relationship itself?

I tried using 'ON CREATE SET r.members_count = count(u)' but that gives me an error about count() being used in the wrong context.

I tried giving an alias to count() with something like 'WITH count(u) as members' after the MATCH, but that gives me the total count of all members in common across all paths.. not the count for a given path.

I feel I am very close to the answer but the syntax still eludes me.

You are close, but you're trying to do everything at the same time. You need to aggregate first, then use the the result of the aggregation in the MERGE of the relationship.

Also you'll want to add some filtering to avoid mirrored results (where the same groups come up on at least two rows, just switching which one is g1 and which one is g2), and that's best done by adding an inequality filter on the ids of the nodes.

Also the MERGE should probably remove direction, if the direction doesn't matter to you, otherwise it's possible you could get results where you've created two relationships between a set of group nodes, going the opposite direction from each other.

MATCH (g1:Group)< -[:IS_MEMBER_OF]-(u:Person)-[:IS_MEMBER_OF]- >(g2:Group) 
WHERE id(g1) < id(g2)
WITH g1, g2, count(u) as personCount
MERGE (g1)-[r:SHARES_MEMBERS_WITH]-(g2) 
SET r.members_count = personCount
RETURN g1.group_id,g2.group_id, personCount // only if you actually need data returned.

Thanks for sending me on the right track!
Your query returns all relationships in duplicates even with the filter but at least I get the right count per relationship, so I can work with that. I will check if it is something in my data.

If there are duplicates coming back, then one of two things is likely happening:

  1. You might already have :SHARES_MEMBER_WITH relationships between nodes, and if there are multiple relationships between two nodes already present, the MERGE would match to those relationships, and your return would contain multiple rows with the same two groups and person count.

  2. Multiple :Group nodes may exist with the same group_id, which could make it seem like duplicate results, but the nodes are actually different.

I'd wager the first case is what's going on, let us know what you find.

That was indeed the first case. Duplicates vanished after I deleted the past relationships.

Thanks again! That allowed be to turn a whole python script into a single line of Cypher :slight_smile: