Hello community!
I have User and Group nodes. A user can be a member of any number of groups (or of none) via a directed IN_GROUP relationship.
I want to find all users who are members of exactly the same set of groups, create a separate Cluster node for each such set, create an IN_CLUSTER relationship from each of those users to the Cluster node, and also create a RELATED relationship from the cluster to each of the groups those users belong to.
Below are some screenshots of what I need:
I have users, each of which is in a specific set of groups:
As you can see, User_1, User_2 and User_3 belong to the same set of groups (Group_1, Group_2 and Group_3): this is the first cluster. User_4 belongs to all groups: this is the second cluster. User_5 belongs to only one group, Group_5: this is the third cluster.
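To make "same set of groups" concrete, a read-only query along these lines should list the groupings. It is only a sketch: it assumes group names are unique and that apoc.coll.sort is available, and it creates nothing.

```cypher
// For each user, build the sorted list of group names they belong to,
// then group users by that list. Each returned row is one would-be cluster.
MATCH (u:User)
WITH u, apoc.coll.sort([(u)-[:IN_GROUP]->(g:Group) | g.name]) AS group_names
RETURN group_names, collect(u.name) AS users
ORDER BY size(users) DESC
```

For the example above, this should return three rows, one per cluster described.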
Here's what we get:
Now we connect the clusters with the users' groups:
This is what I want to end up with:
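Expressed as a query rather than a picture, the end state could be checked with something like the following sketch. It relies only on the Cluster, IN_CLUSTER and RELATED names described above and reads the graph without modifying it.

```cypher
// List every cluster with the users in it and the groups it is related to.
MATCH (c:Cluster)
OPTIONAL MATCH (u:User)-[:IN_CLUSTER]->(c)
WITH c, collect(DISTINCT u.name) AS users
OPTIONAL MATCH (c)-[:RELATED]->(g:Group)
RETURN c.hash AS cluster, users, collect(DISTINCT g.name) AS groups
```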
I have a query that does the job, but it is far too slow:
// Collect each user's groups
MATCH (u:User)
WITH [(u)-[:IN_GROUP]->(g:Group) | g] AS groups, u
// Sort by name so identical sets produce identical lists
WITH apoc.coll.sortNodes(groups, "name") AS groups, u
// Hash the sorted list to get one key per distinct group set
WITH apoc.util.md5(groups) AS cluster_hash, groups, u
MERGE (c:Cluster {hash: cluster_hash})
CREATE (u)-[:IN_CLUSTER]->(c)
FOREACH (group IN groups |
  MERGE (c)-[:RELATED]->(group))
On my dataset (several hundred thousand users and the same number of groups), this takes about 30 minutes to complete. I need a result in 5 seconds.
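One variation that might reduce the work, sketched here without any benchmarking: aggregate users by their sorted group list first, so each Cluster is merged once per distinct group set instead of once per user, and index Cluster.hash so MERGE can look the node up instead of scanning. The index name cluster_hash is made up for this sketch, and whether this gets anywhere near 5 seconds is unknown.

```cypher
// Assumed helper index (the name is hypothetical); run once, as its own statement.
CREATE INDEX cluster_hash IF NOT EXISTS FOR (c:Cluster) ON (c.hash);

// Group users by their sorted group list, then create each cluster once.
MATCH (u:User)
WITH u, apoc.coll.sortNodes([(u)-[:IN_GROUP]->(g:Group) | g], "name") AS groups
WITH groups, collect(u) AS users
WITH apoc.util.md5(groups) AS cluster_hash, groups, users
MERGE (c:Cluster {hash: cluster_hash})
// One IN_CLUSTER relationship per user, one RELATED relationship per group.
FOREACH (u IN users | CREATE (u)-[:IN_CLUSTER]->(c))
FOREACH (g IN groups | MERGE (c)-[:RELATED]->(g));
```

Like the original query, this puts users with no groups into a single cluster keyed by the hash of an empty list.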
I can use the APOC library.
Here's a Cypher script that creates the test data set from the example above:
CREATE (u1:User {name:"User_1"})
CREATE (u2:User {name:"User_2"})
CREATE (u3:User {name:"User_3"})
CREATE (u4:User {name:"User_4"})
CREATE (u5:User {name:"User_5"})
CREATE (g1:Group {name:"Group_1"})
CREATE (g2:Group {name:"Group_2"})
CREATE (g3:Group {name:"Group_3"})
CREATE (g4:Group {name:"Group_4"})
CREATE (g5:Group {name:"Group_5"})
MERGE (u1)-[:IN_GROUP]->(g1)
MERGE (u1)-[:IN_GROUP]->(g2)
MERGE (u1)-[:IN_GROUP]->(g3)
MERGE (u2)-[:IN_GROUP]->(g1)
MERGE (u2)-[:IN_GROUP]->(g2)
MERGE (u2)-[:IN_GROUP]->(g3)
MERGE (u3)-[:IN_GROUP]->(g1)
MERGE (u3)-[:IN_GROUP]->(g2)
MERGE (u3)-[:IN_GROUP]->(g3)
MERGE (u4)-[:IN_GROUP]->(g1)
MERGE (u4)-[:IN_GROUP]->(g2)
MERGE (u4)-[:IN_GROUP]->(g3)
MERGE (u4)-[:IN_GROUP]->(g4)
MERGE (u4)-[:IN_GROUP]->(g5)
MERGE (u5)-[:IN_GROUP]->(g5)
RETURN u1, u2, u3, u4, u5, g1, g2, g3, g4, g5
Neo4j version: 4.3.3