Searching for a query to draw GDS clusters results

Hi, I'm using Python and the neo4j driver to communicate with my graph database, and an external Python library to plot the results of my Cypher queries. The problem comes with the GDS library procedures. I'm working with the Louvain, WCC, and SCC algorithms, but Neo4j just gives me back a table of results. For example, using WCC I get these results:

call gds.wcc.stream('biotech-interactions') yield nodeId, componentId return nodeId,componentId limit 20

[Image: result table with nodeId and componentId columns]
They are correct, but I need to plot something that the user can inspect visually. This is what I tried:

call gds.wcc.stream('biotech-interactions') yield nodeId, componentId 
with gds.util.asNode(nodeId) as n, componentId
call apoc.create.vNode(['component'],{componentId:componentId}) yield node
call apoc.create.vRelationship(n,"Belongs",{},node) yield rel
return n,rel,node limit 200

I tried to create virtual centroids and then attach the real nodes to their respective centroids (componentIds) with virtual relationships, but every time a componentId appears, Neo4j creates a new centroid. If you look at the table above, componentId "2" appears three times, so I want a single virtual node marked "2" with the three real nodes attached to it. Instead, Neo4j creates the same virtual node three times, and this is what I get:
[Image: graph view with a separate virtual centroid for each node]
I tried to collect the componentIds into a distinct list, but they get duplicated anyway. So my question is: how can I create distinct centroids and attach nodes to them according to my clustering algorithm?


david_allen
Neo4j

I think the problem you're encountering with duplication is that when you get the results back from the stream, you're not grouping by / taking distinct componentIds. You're processing the records one node at a time, and telling APOC to create (not merge) a vNode for each record, so the result you're getting is expected.

I'm not going to get this exactly right, but directionally, it's going to be something like this:

call gds.wcc.stream('biotech-interactions') yield nodeId, componentId 
/* Group by componentId so each centroid only gets created once;
   collect() aggregates per distinct componentId, so no DISTINCT is needed */
WITH componentId, collect(nodeId) AS membersOfComponent
CALL apoc.create.vNode(['component'], {componentId: componentId}) yield node
with node, componentId, membersOfComponent
UNWIND membersOfComponent as member
with componentId, node, gds.util.asNode(member) as n
call apoc.create.vRelationship(n, "Belongs", {}, node) yield rel
RETURN n,rel,node limit 200
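
Since you are plotting from Python anyway, the same grouping can also be done on the client side after streaming the plain (nodeId, componentId) rows. The following is only a rough sketch of that idea: the function names and the sample rows are made up for illustration, and how you feed the centroids and edges into your plotting library depends on that library.

```python
from collections import defaultdict


def group_by_component(rows):
    """Group (nodeId, componentId) rows into {componentId: [nodeId, ...]}.

    This mirrors what collect(nodeId) does per componentId in Cypher.
    """
    members = defaultdict(list)
    for node_id, component_id in rows:
        members[component_id].append(node_id)
    return dict(members)


def centroid_edges(rows):
    """Return (centroids, edges).

    centroids: one entry per distinct componentId, sorted.
    edges: one (nodeId, "Belongs", componentId) tuple per member node.
    """
    grouped = group_by_component(rows)
    centroids = sorted(grouped)
    edges = [(n, "Belongs", c) for c in centroids for n in grouped[c]]
    return centroids, edges


# Hypothetical sample rows, shaped like the output of gds.wcc.stream
rows = [(0, 2), (1, 2), (2, 2), (3, 5), (4, 5), (6, 7)]
centroids, edges = centroid_edges(rows)
```

With these sample rows there is exactly one centroid per component, and the three nodes in component 2 all point at the same centroid instead of each getting their own.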

Do you need to be using the stream variant with virtual relationships and nodes though? This might be easier to work through if you use the write version of the algorithm, which stores the component ID as a node property in the database (the mutate version only updates the in-memory projection, so a later MATCH would not see it), and then materialize your centroids.

For example something like:

CALL gds.wcc.write('myGraph', { writeProperty: 'componentId' });

MATCH (m:MyNode)
WHERE m.componentId IS NOT NULL
WITH DISTINCT m.componentId AS componentId
CREATE (c:Centroid { id: componentId })
WITH c, componentId
MATCH (m:MyNode { componentId: componentId })
CREATE (c)-[:LINK]->(m)

(Dashed off in a hurry, might not be exactly right, just showing the concept)
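
If you want to drive the materialized approach from the Python driver, a rough sketch could look like the one below. It assumes the write mode of WCC, so that componentId is persisted to the database where MATCH can see it, and it swaps CREATE for MERGE so that re-running it does not duplicate centroids. The MyNode and Centroid labels and the LINK type are the placeholders from the example above, and materialize_centroids is a made-up helper name, not part of any library.

```python
# Step 1: run WCC in write mode so componentId is stored on the real nodes.
WRITE_COMPONENTS = """
CALL gds.wcc.write($graphName, {writeProperty: 'componentId'})
YIELD componentCount
RETURN componentCount
"""

# Step 2: one Centroid per distinct componentId, linked to every member node.
# MERGE (instead of CREATE) is an assumption made here to keep re-runs idempotent.
CREATE_CENTROIDS = """
MATCH (m:MyNode)
WHERE m.componentId IS NOT NULL
WITH m.componentId AS componentId, collect(m) AS members
MERGE (c:Centroid {id: componentId})
WITH c, members
UNWIND members AS m
MERGE (c)-[:LINK]->(m)
"""


def materialize_centroids(session, graph_name):
    """Run both steps on an open session and return the component count.

    `session` is expected to behave like a neo4j driver session, i.e. to
    offer run(query, **parameters) returning a result with single()/consume().
    """
    record = session.run(WRITE_COMPONENTS, graphName=graph_name).single()
    session.run(CREATE_CENTROIDS).consume()
    return record["componentCount"]
```

You would call it inside something like `with driver.session() as session:` and pass the name of your projected graph. After that, a plain `MATCH (c:Centroid)-[r:LINK]->(m) RETURN c, r, m` gives your plotting library real nodes and relationships to draw.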