Hello,
I am pretty new to Neo4j and the Graph Data Science library and I have to say I am often very puzzled by some of the limitations and/or design choices. I am sure there are good reasons and it is my lack of experience and understanding which is creating this confusion, hence I am turning for help here.
I have a graph database that I am projecting into a GDS graph to perform network analysis. The graph contains one large component and a few isolated single-node components. I want to drop those isolated nodes from the graph so that this noise does not affect the graph analysis algorithms, but I can't find a way to do this. The GDS library has operations to drop relationships or remove node properties, but not to drop nodes. This seems like a pretty common use case, so I am surprised there is no operation for it, and I guess I am missing something obvious. What is the best way to achieve this?
Welcome to the world of GDS, and thanks for your question. One technique is to leverage Weakly Connected Components (WCC), a community detection algorithm. You can apply the WCC component ID to each node as a property. From there you can filter your projection to include only the nodes whose componentId property matches the ID of the largest component.
Note that gds.wcc.write will write the componentId to the database on disk.
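In case it helps, here is a minimal sketch of that write step through the Python client. It assumes a projection already exists under the name passed as graph_name; the helper name write_wcc_component_ids and the default property name are only illustrative, and gds is expected to be a connected GraphDataScience client.

```python
def write_wcc_component_ids(gds, graph_name, write_property="componentId"):
    """Run WCC in write mode on an existing projection.

    This persists `write_property` on the nodes in the database on disk
    and returns the summary row produced by the procedure.
    """
    query = (
        "CALL gds.wcc.write($graph, {writeProperty: $prop}) "
        "YIELD componentCount, nodePropertiesWritten "
        "RETURN componentCount, nodePropertiesWritten"
    )
    return gds.run_cypher(query, params={"graph": graph_name, "prop": write_property})
```

Calling write_wcc_component_ids(gds, "myGraph") would then leave a componentId property on every node that was part of the projection.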
The following Cypher query returns the componentId of the largest component along with its node count. Make a note of this componentId, as you will feed it into your filter.
gds.run_cypher("""MATCH (n:NodeOfChoice)
RETURN n.componentId AS componentId,
COUNT(*) AS nodeCount
ORDER BY count(*) DESC
LIMIT 1""")
Now that you have written the componentId to the database and know which component is the largest, you can drop the current projection and create a new one that contains only the nodes whose 'componentId' property matches that of the largest component.
Hello @alison.cossette, thank you for replying and reaching out to help. I understand Weakly Connected Components. The question is really this: once I have that WCC information, how do I filter those nodes out?
I don't want to write back to the main database, since this is just temporary information. I want to filter directly in the in-memory graph. It seems that filtering into a subgraph with gds.beta.graph.project.subgraph should do the trick.
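For anyone landing here later, a rough sketch of that in-memory route: run WCC in mutate mode so the component ID lives only on the projected graph, find the most common component ID, and filter into a subgraph. The helper names, graph names, and the default property name are hypothetical. The sketch also assumes a GDS version where gds.graph.nodeProperty.stream is available (older releases name this procedure differently, and newer ones expose the subgraph operation under a non-beta name, so check the docs for your version).

```python
from collections import Counter


def largest_component_filter(component_ids, prop="componentId"):
    """Build a GDS node-filter expression selecting the most frequent component id."""
    if not component_ids:
        raise ValueError("no component ids given")
    largest_id, _size = Counter(component_ids).most_common(1)[0]
    return f"n.{prop} = {int(largest_id)}"


def keep_largest_component(gds, graph_name, subgraph_name, prop="componentId"):
    """Create `subgraph_name` holding only the largest WCC component of `graph_name`.

    Nothing is written to the database; the property exists only in memory.
    """
    params = {"graph": graph_name, "prop": prop}

    # 1. Store the component id on the in-memory graph (mutate, not write).
    gds.run_cypher(
        "CALL gds.wcc.mutate($graph, {mutateProperty: $prop}) "
        "YIELD componentCount RETURN componentCount",
        params=params,
    )

    # 2. Stream the property back and pick the most common component id.
    rows = gds.run_cypher(
        "CALL gds.graph.nodeProperty.stream($graph, $prop) "
        "YIELD nodeId, propertyValue "
        "RETURN propertyValue AS componentId",
        params=params,
    )
    node_filter = largest_component_filter(rows["componentId"].tolist(), prop)

    # 3. Project a filtered subgraph; '*' keeps every relationship between kept nodes.
    gds.run_cypher(
        "CALL gds.beta.graph.project.subgraph($subgraph, $graph, $nodeFilter, '*') "
        "YIELD graphName, nodeCount RETURN graphName, nodeCount",
        params={"subgraph": subgraph_name, "graph": graph_name, "nodeFilter": node_filter},
    )
    return node_filter
```

With a connected client, keep_largest_component(gds, "myGraph", "myGraph_main") would leave the original projection untouched and add a second one without the isolated nodes, which can then be passed to the other algorithms.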