Problems with clustering (GDS) and APOC queries

Hello everyone, I'm new to Neo4j and I'm working on a large graph with clustering algorithms, using Python with the Neo4j Python driver. I ran the Weakly Connected Components algorithm on my network (~6M nodes, 23M edges) with a label I chose, and I successfully get the results back in a table.

Now I want to build a virtual graph of virtual nodes, where each node carries the property {component: componentId} from the query above. Then I want to build virtual edges like

(node{nodeId}) ---[BELONGS_TO]---> (component{componentId}), so I can plot something like clusters, where several nodes are connected to one component node. The problem is that, with the query below, Neo4j creates duplicated virtual nodes, even though I put "distinct" everywhere. So instead of connecting n nodes to a single component node, it creates n component nodes, as in the screenshot below:
[Screenshot: resulting graph with duplicated component nodes]

I just want those 3 red nodes connected to the same node "0", but instead new ones get created. The same goes for the other components. How can I fix my query to avoid these duplicates and set the correct relationships?

Query:

call gds.wcc.stream('gene-interactions') yield nodeId, componentId 
with componentId,nodeId,collect(distinct componentId) as componentList 
unwind componentList as component 
with distinct component, componentId, gds.util.asNode(nodeId) as n
call apoc.create.vNode(['component'],{component:component}) yield node
call apoc.create.vRelationship(n,'BELONGS_TO',{},node) yield rel 
return n,rel,node limit 30

Hope you can help me with this, my head is gone, but I still need to plot the correct result for my project. Thank you.
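
One likely cause of the duplicates can be sketched in plain Python rather than Cypher. In a WITH clause that contains an aggregation such as collect(), every non-aggregated column acts as a grouping key. Since nodeId is one of those columns in the query above, the grouping produces one row per node, and apoc.create.vNode is then called once per row. The sample rows and the helper group_rows below are hypothetical and only mimic that grouping behaviour; they are not part of any Neo4j API.

```python
from collections import defaultdict

# Hypothetical sample of rows streamed by gds.wcc.stream: (nodeId, componentId)
rows = [(10, 0), (11, 0), (12, 0), (20, 1), (21, 1)]


def group_rows(rows, key):
    """Mimic Cypher's implicit grouping: one output row per distinct key."""
    groups = defaultdict(list)
    for node_id, component_id in rows:
        groups[key(node_id, component_id)].append(node_id)
    return dict(groups)


# WITH componentId, nodeId, collect(...) groups by BOTH columns,
# so one row per node survives and one component node is created per node.
per_node = group_rows(rows, key=lambda n, c: (c, n))

# Grouping by componentId alone leaves one row per component.
per_component = group_rows(rows, key=lambda n, c: c)

print(len(per_node))       # 5
print(len(per_component))  # 2
```

With five nodes in two components, the first grouping yields five rows and the second yields two, which matches the "n component nodes instead of one" symptom.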

6 REPLIES

sameer_gijare14
Graph Buddy

I see the WITH clause is invoked twice. Can you invoke it just once and modify the CQL accordingly? Please let me know if this solves your problem.
Many thanks
Mr Sameer S G

The problem still persists.

call gds.wcc.stream('gene-interactions') yield nodeId, componentId 
with componentId, gds.util.asNode(nodeId) as n,collect(distinct componentId) as componentList 
unwind componentList as component 
call apoc.create.vNode(['component'],{component:component}) yield node
call apoc.create.vRelationship(n,'BELONGS_TO',{},node) yield rel 
return n,rel,node limit 30

EDIT: I'm trying queries that use apoc.do.when to check whether the node {component: componentId} already exists, but the optional match always gives me null, even though the nodes are created correctly.

call gds.wcc.stream('gene-interactions') yield nodeId, componentId
optional match (c:component{component:componentId})
with componentId, gds.util.asNode(nodeId) as n, c
call apoc.do.when(c is null, "call apoc.create.vNode(['component'],{component:componentId}) yield node return node", "",{componentId:componentId}) yield value
return value limit 10
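
A probable reason the optional match always returns null is that virtual nodes exist only in the query result and are never written to the database, so there is nothing for MATCH to find. One alternative, shown here only as a sketch, is to do the "create if not seen yet" check on the client side after fetching the (nodeId, componentId) rows with the Python driver. The function link_nodes_to_components, the dictionary layout, and the sample rows are all hypothetical.

```python
def link_nodes_to_components(rows):
    """Return (component_nodes, edges) with exactly one node per componentId."""
    component_nodes = {}  # componentId -> the single node for that component
    edges = []
    for node_id, component_id in rows:
        # Create the component node only the first time its id is seen.
        if component_id not in component_nodes:
            component_nodes[component_id] = {
                "label": "component",
                "component": component_id,
            }
        edges.append((node_id, "BELONGS_TO", component_id))
    return component_nodes, edges


# Hypothetical rows as they might come back from gds.wcc.stream
rows = [(10, 0), (11, 0), (12, 0), (20, 1)]
component_nodes, edges = link_nodes_to_components(rows)

print(len(component_nodes))  # 2
print(len(edges))            # 4
```

The resulting nodes and edges can then be handed to whatever plotting library the project uses.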

You can simply create a uniqueness constraint on the component node (the "0" node) and then try the first query you showed me.
Fingers crossed!
Sameer

I already ran a constraint query yesterday, but I think the problem is that no component nodes exist when I run it:
create constraint componentConstraint if not exists for (c:component) require c.component is unique

The query runs with no problems, but when I create 2 nodes with the same {component} property, it gives no warning and creates them.

Add an id property when creating the node and add a unique constraint on that id, so it will not let you create a duplicate node with the same id. Then reformulate your query as per your needs. Maybe right now you have a fixed notion of the data model you want to create, but that will change as you go on adding new nodes.
Many thanks
Mr Sameer S G

Unfortunately, as I read today, constraints won't work on virtual nodes. That's a big problem.
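
Since constraints are not an option here, one way around the duplicates that needs no constraint is to aggregate before creating the virtual node, so that each component reaches apoc.create.vNode as exactly one row. The sketch below keeps the graph name 'gene-interactions' and the procedures from the queries in this thread; the constant CLUSTER_QUERY, the function fetch_clusters, and the connection details in the comments are hypothetical, and the query has not been tested against the original data.

```python
# Cypher sketch: group by componentId only, create one virtual node per
# component, then fan back out to the member nodes for the relationships.
CLUSTER_QUERY = """
CALL gds.wcc.stream('gene-interactions') YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId)) AS members
CALL apoc.create.vNode(['component'], {component: componentId}) YIELD node AS comp
UNWIND members AS n
CALL apoc.create.vRelationship(n, 'BELONGS_TO', {}, comp) YIELD rel
RETURN n, rel, comp
LIMIT 30
"""


def fetch_clusters(session):
    """Run the query on an open neo4j session and return the records as a list."""
    return list(session.run(CLUSTER_QUERY))


# Usage with the official driver (URI and credentials are placeholders):
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password")) as driver:
#     with driver.session() as session:
#         records = fetch_clusters(session)
```

Note that collect() gathers every member node of every component before the LIMIT applies, so on a graph of this size it may be worth restricting the components (for example with an earlier LIMIT on the aggregated rows) before unwinding.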