Merge/group nodes with the same property

Hi all,

I came across a merge/group problem. I have to build a "supernode" from nodes sharing the same property. I've been trying to use apoc.refactor.mergeNodes to get the desired result with no luck.
I need to merge the nodes and also sum up the weights of the edges (taking direction into account) that will be connected to the new supernode.

I also need this procedure to be "virtual": I don't need the result to be written back to the database. I only need these supernodes for visualization.
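To pin down what "sum up the weights, considering direction" means, here is a minimal in-memory sketch in plain Python (not Neo4j or APOC code). It assumes each node has a grouping value (the shared property) and each edge is a (source, target, weight) triple; the function name `supernode_graph` and that triple format are assumptions made for illustration. Nodes with the same value become one supernode, weights are summed per ordered pair of supernodes so A→B and B→A stay separate, and nothing is written back, which mirrors the "virtual" requirement.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical edge format: (source node, target node, weight)
Edge = Tuple[Hashable, Hashable, int]


def supernode_graph(
    node_key: Dict[Hashable, Hashable], edges: Iterable[Edge]
) -> Dict[Tuple[Hashable, Hashable], int]:
    """Collapse nodes sharing a key and sum edge weights per direction.

    node_key maps each node to its grouping value (the shared property).
    Returns {(source_group, target_group): total_weight}. Edges between
    members of the same group become a self-loop on that group. The inputs
    are only read, never modified, so the result is purely "virtual".
    """
    summed: Dict[Tuple[Hashable, Hashable], int] = defaultdict(int)
    for src, dst, weight in edges:
        # Direction is preserved: (A, B) and (B, A) are different keys.
        summed[(node_key[src], node_key[dst])] += weight
    return dict(summed)


if __name__ == "__main__":
    keys = {"a": "red", "b": "red", "c": "blue"}
    edges = [("a", "c", 2), ("b", "c", 5), ("c", "a", 1), ("a", "b", 3)]
    print(supernode_graph(keys, edges))
```

Here `a` and `b` share the value `"red"`, so their edges to `c` add up to 7 in the red→blue direction, while the single blue→red edge keeps its own weight of 1 and the a→b edge becomes a red→red self-loop.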

Is it possible? Can someone give me a hint?

Could you do something like this?

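MATCH (startingNode)-->(nodesToMerge)
  WHERE nodesToMerge.theProperty = "some criteria"
// avoid counting a node once per incoming relationship
WITH DISTINCT nodesToMerge
MATCH (nodesToMerge)-[theRel]->(childOfMergedNode)
// group by the child only, so each child gets one summed weight
WITH childOfMergedNode, sum(theRel.weight) AS weightToChild
// SuperNode and MERGED_TO are placeholder names; CREATE needs a relationship type,
// and MERGE makes every row reuse the same single supernode
MERGE (mergedNode:SuperNode {theProperty: "some criteria"})
CREATE (mergedNode)-[:MERGED_TO {weight: weightToChild}]->(childOfMergedNode)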

Obviously, this creates a new node rather than a virtual one. But if I'm understanding the situation correctly (and your drawings really helped there!!), maybe this can be a starting point toward the virtual node/relationship solution?
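In Cypher, the non-aggregated columns next to an aggregate in a `WITH` act as the grouping key, so which columns sit beside `sum(theRel.weight)` decides what gets added together. Grouping on both `nodesToMerge` and `childOfMergedNode` yields one sum per (member, child) pair; grouping on the child alone yields one total per child of the supernode. The plain-Python sketch below only models that grouping behavior on hypothetical data (`sum_by` and the member/child names are made up); it does not talk to Neo4j.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical relationship format: (member of the supernode, child, weight)
Rel = Tuple[Hashable, Hashable, int]


def sum_by(rels: List[Rel], per_member: bool) -> Dict[tuple, int]:
    """Mimic Cypher's grouping keys for sum(theRel.weight).

    per_member=True  ~ WITH nodesToMerge, childOfMergedNode, sum(...)
    per_member=False ~ WITH childOfMergedNode, sum(...)
    """
    totals: Dict[tuple, int] = defaultdict(int)
    for member, child, weight in rels:
        key = (member, child) if per_member else (child,)
        totals[key] += weight
    return dict(totals)


if __name__ == "__main__":
    # Two members of the supernode both point at child "x".
    rels = [("m1", "x", 2), ("m2", "x", 5), ("m2", "y", 1)]
    print(sum_by(rels, per_member=True))   # one row per (member, child)
    print(sum_by(rels, per_member=False))  # one row per child
```

Only the second grouping produces a single weight of 7 for child `x`, which is the total a supernode relationship should carry.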

Thanks for the help. I'm going to try your approach.
I didn't expect this kind of operation to be so difficult.
I really need the result to be virtual, because I'm not supposed to write to the database from a visualization application. Otherwise, I would have to build a separate graph just to represent the merged nodes, and I don't know how to do that yet either.

Making the virtual node/relationship shouldn't be a problem. It's just been a while since I've done that, and I'm sorry to say I can't do it off the top of my head.

Hi @guinametal!

I recreated your example with:

// create the six nodes in one clause instead of CREATE/SET/WITH per node
CREATE (n1:NODE {ID: 1}), (n2:NODE {ID: 2}), (n3:NODE {ID: 3}),
       (n4:NODE {ID: 4}), (n5:NODE {ID: 5}), (n6:NODE {ID: 6})
CREATE (n1)-[:CON {w: 1}]->(n2)
CREATE (n3)-[:CON {w: 1}]->(n1)
CREATE (n2)-[:CON {w: 4}]->(n3)
CREATE (n4)-[:CON {w: 2}]->(n2)
CREATE (n2)-[:CON {w: 3}]->(n5)
CREATE (n5)-[:CON {w: 2}]->(n4)
CREATE (n3)-[:CON {w: 7}]->(n5)
CREATE (n3)-[:CON {w: 3}]->(n6)
CREATE (n6)-[:CON {w: 6}]->(n5)
CREATE (n6)-[:CON {w: 4}]->(n6)

MATCH(n:NODE)
where n.ID = 2 or n.ID = 3
SET n.MERGE = true

Here I used a MERGE property to flag the nodes that are supposed to be merged.
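As a cross-check of what the summed result should contain for this example, the same graph can be computed in plain Python (no APOC involved, and not a model of what `apoc.nodes.collapse` does with `w`). The sketch assumes nodes 2 and 3 form one supernode, labelled `"S"` here purely for illustration, and sums `w` per direction; the helper name `collapse` is hypothetical. The 2→3 edge becomes an internal self-loop on S.

```python
from collections import defaultdict

# Edges from the CREATE script above: (source ID, target ID, w)
EDGES = [
    (1, 2, 1), (3, 1, 1), (2, 3, 4), (4, 2, 2), (2, 5, 3),
    (5, 4, 2), (3, 5, 7), (3, 6, 3), (6, 5, 6), (6, 6, 4),
]
MERGED = {2, 3}  # the nodes flagged with MERGE = true


def collapse(edges, merged, label="S"):
    """Replace every node in `merged` by `label`; sum w per (source, target)."""
    def rep(node):
        return label if node in merged else node

    totals = defaultdict(int)
    for src, dst, w in edges:
        totals[(rep(src), rep(dst))] += w
    return dict(totals)


if __name__ == "__main__":
    for (src, dst), w in collapse(EDGES, MERGED).items():
        print(f"{src} -> {dst}: {w}")
```

The S→5 relationship carries 10 (3 from 2→5 plus 7 from 3→5), while 1→S and S→1 stay separate relationships with weight 1 each because direction is kept.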

Using

MATCH (n1:NODE{MERGE: true})-[:CON]->(n2{MERGE: true})
with n1, n2
CALL apoc.nodes.collapse([n1, n2],{properties:'combine'})
yield from, rel, to
return from, rel, to

You will get the expected result. Keep in mind that this works only because every node expected in the final result is connected to one of the nodes being merged. It also works because there are exactly 2 nodes to merge; I'm not sure what happens with a 3-way supernode.

If you have some other use cases (with a script to create them) I will be happy to help.


Hi @Bennu ,

Thanks for the help.
In fact, I have to collapse several nodes with the same property. apoc.nodes.collapse can take a list of any size.
Now I'm stuck on how to "collect" each set of nodes that share a property value and then collapse them.
When I use, for example, MATCH (a:NODE), (b:NODE {property: a.property}), it returns all the matching nodes at once, across every property value. So I need to take one set of nodes, collapse it, and then do the same for each of the other sets.
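What's being asked for here is a partition: one list of nodes per distinct property value, with each list collapsed once. In Cypher, aggregating with the property value as the grouping key (`WITH n.property AS p, collect(n) AS nodes`) returns one row per distinct value. The plain-Python sketch below models the same idea on hypothetical node data; `groups_to_collapse` is a made-up name, and groups with a single node are skipped because there is nothing to merge.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def groups_to_collapse(props: Dict[int, Hashable]) -> List[List[int]]:
    """Partition node ids by property value.

    Models WITH n.property AS p, collect(n) AS nodes in Cypher: each node
    lands in exactly one group, so no group is produced twice. Groups with
    a single node are dropped (nothing to merge).
    """
    by_value: Dict[Hashable, List[int]] = defaultdict(list)
    for node_id in sorted(props):
        by_value[props[node_id]].append(node_id)
    return [ids for ids in by_value.values() if len(ids) > 1]


if __name__ == "__main__":
    props = {1: "x", 2: "y", 3: "y", 4: "z", 5: "y", 6: "x"}
    for group in groups_to_collapse(props):
        print(group)
```

Each resulting list corresponds to one call of the collapse step, which is the "collapse one set, then repeat for the others" loop described above.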

Hi @guinametal!

If you create a DB on Aura with your use case, I can try to help directly on your data. In any case, you may want to take a look at the UNWIND and WITH clauses to do so. Something like:

MATCH(a:NODE)
with a
MATCH(b:NODE {property : a.property})
with a, collect(b) as nodes
CALL apoc.nodes.collapse(nodes,{properties:'combine'})
yield from, rel, to
return from, rel, to

Well, it ended up without UNWIND, though...

Bennu

Thanks again @Bennu!

I kinda understand your approach; however, the collapsed nodes appear twice in the returned graph.

Additionally, as I mentioned in the first post, I need to sum up the integer property of the edges connecting the collapsed nodes to the others.

Hey @guinametal!

I hope my approach helps you find the right answer. If you find a way to share your specific use case, let me know :slight_smile: It gets kinda hard with particular problems like this one.

Bennu

PS: I love hard problems

Hi again @Bennu!

I've managed to solve the problem of duplicate nodes. It's a classic Neo4j cardinality issue.

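MATCH (a:NODE)
// anchor each group on its lowest-id member, so every group is collapsed exactly once
WHERE NOT EXISTS { MATCH (c:NODE {property: a.property}) WHERE id(c) < id(a) }
MATCH (b:NODE {property: a.property})
WHERE id(b) > id(a)
// keep `a` as an explicit grouping key, then prepend it to the other members
WITH a, collect(b) AS others
WITH [a] + others AS nodes
CALL apoc.nodes.collapse(nodes, {properties: 'combine'})
YIELD from, rel, to
RETURN from, rel, to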
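The cardinality point can be checked without a database. The plain-Python sketch below models two anchoring strategies on hypothetical ids (both function names are made up): pairing every member `a` with the same-valued members whose id is larger (the `id(b) > id(a)` filter), versus additionally requiring `a` to be the lowest id in its group. With two members per group both give one list per group; with three or more, the first strategy still emits overlapping sub-lists, so some nodes would be collapsed twice.

```python
from typing import Dict, Hashable, List


def lists_from_every_anchor(props: Dict[int, Hashable]) -> List[List[int]]:
    """For each a, collect [a] + every b with the same value and id(b) > id(a)."""
    ids = sorted(props)
    result = []
    for a in ids:
        others = [b for b in ids if props[b] == props[a] and b > a]
        if others:  # a non-OPTIONAL MATCH drops rows where no b exists
            result.append([a] + others)
    return result


def lists_from_lowest_anchor(props: Dict[int, Hashable]) -> List[List[int]]:
    """Same, but skip any a that has a smaller-id member in its group."""
    ids = sorted(props)
    result = []
    for a in ids:
        if any(props[c] == props[a] and c < a for c in ids):
            continue  # models NOT EXISTS { ... WHERE id(c) < id(a) }
        others = [b for b in ids if props[b] == props[a] and b > a]
        if others:
            result.append([a] + others)
    return result


if __name__ == "__main__":
    props = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y"}
    print(lists_from_every_anchor(props))   # [2, 3] shows up a second time
    print(lists_from_lowest_anchor(props))  # each group exactly once
```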


However, when I applied the very same code to the real graph (the graph in this post is a very simplified version of it), I got an error:

Failed to invoke procedure `apoc.nodes.collapse`: Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')

In both graphs, the matched property is a string.

Do you have any clue what could cause this?