Graph Data Science: K-Nearest Neighbors

genealogy · October 22, 2020, 9:22pm

I'm running Neo4j v 4.1 and gds v1.4.

I'm trying to utilize ML tools to gain insights about a genetic genealogy graph database. It has CB_Match nodes with chromosome segment data on individuals matching one another; that is, they share segments.

I've created a virtual graph:

CALL gds.graph.create.cypher(
  'myGraph',
  "match (c:CB_Match)  return id(c) as id",
  "match (c1:CB_Match)-[r:match_by_segment{phased:'Y'}]-(c2:CB_Match)  return id(c1) as source,id(c2) as target,r.cm as weight" 
)

From this I can create an embedding variable for CB_Match nodes:

CALL gds.fastRP.stream('myGraph', {embeddingDimension: 4})
YIELD nodeId,embedding
with  gds.util.asNode(nodeId).RN as RN,gds.util.asNode(nodeId).fullname as Name, embedding
return RN,Name,embedding order by RN,Name

I have used the write procedure to add this property to the CB_Match nodes.

Now I am trying to use the embedding property as described in the recent GDS announcement, specifically for neighborhood detection and visualization.

Following the documentation for KNN and its default value of {} for the configuration map, I ran the following:

CALL gds.beta.knn.stream(
  'myGraph',
{ }
) 
YIELD  node1,  node2,  similarity
with  gds.util.asNode(node1).fullname as Match1, gds.util.asNode(node2).fullname as Match2, similarity
return Match1, Match2, similarity limit 50

This produced an error saying I had omitted the required nodeWeightProperty from the configuration, so I added it:

CALL gds.beta.knn.stream(
  'myGraph',
{nodeWeightProperty:'embedding' }
) 
YIELD  node1,  node2,  similarity
with  gds.util.asNode(node1).fullname as Match1, gds.util.asNode(node2).fullname as Match2, similarity
return Match1, Match2, similarity limit 50

and received an error that not every node had the embedding property ... which is not true.

Is this a bug or a problem with my logic?

mats.rydberg · October 23, 2020, 4:19pm

In order to feed in the properties computed by FastRP you will need to use the mutate mode to add them to the in-memory graph (the one you call 'myGraph'). The write mode will only write them to Neo4j. You can reload them from Neo4j as well, but then you will have to project a new in-memory graph where you also declare the properties, and this is less efficient compared to using mutate.

You can read more about the different execution modes here: Running algorithms - Neo4j Graph Data Science
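
The mutate-mode approach described above can be sketched roughly as follows. This is a minimal sketch, not a verified recipe: it assumes the in-memory graph 'myGraph' from the original post already exists, and the property name 'embedding' is chosen only to match the thread.

```cypher
// Sketch only: assumes the in-memory graph 'myGraph' has already been created.
// mutate adds the embedding to the in-memory graph, not to the Neo4j database.
CALL gds.fastRP.mutate('myGraph', {
  embeddingDimension: 4,
  mutateProperty: 'embedding'   // illustrative name, matching the thread
})
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten;

// KNN can then read 'embedding' directly from the in-memory graph.
CALL gds.beta.knn.stream('myGraph', {nodeWeightProperty: 'embedding'})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).fullname AS Match1,
       gds.util.asNode(node2).fullname AS Match2,
       similarity
LIMIT 50;
```

The two statements are meant to be run one after the other; the second depends on the property added by the first.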

genealogy · October 24, 2020, 4:35am

Thanks. The in-memory graph I created did have the "embedding" property. It was a two-step process, which, as you note, is less efficient. But the property was there in the second iteration of the in-memory graph, and I still got the error. So I am still puzzled why it isn't working. Is it a bug or my logic?
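
One way to narrow down this kind of error is to check the database and the in-memory graph separately. The queries below are a hedged sketch: the first counts stored nodes that lack the property, and the second assumes that gds.graph.list yields the nodeQuery and schema fields in the installed GDS version.

```cypher
// Count CB_Match nodes in the database that lack the property.
MATCH (c:CB_Match)
WHERE c.embedding IS NULL
RETURN count(c) AS nodesWithoutEmbedding;

// Inspect what the in-memory graph actually holds.
// Assumption: nodeQuery and schema are available as yield fields here.
CALL gds.graph.list('myGraph')
YIELD graphName, nodeQuery, schema
RETURN graphName, nodeQuery, schema;
```

If the count is zero but the schema does not list the property, the property exists in Neo4j and is simply missing from the projection.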

vnickolov · October 29, 2020, 3:50am

Hello, the error "that not every node had the embedding property" occurs because the nodeQuery doesn't return the embedding property, so it is absent from the in-memory graph even though it is present in the Neo4j DB. You can check the documentation on how to add the node property: https://neo4j.com/docs/graph-data-science/current/management-ops/cypher-projection/#cypher-projection-properties.

I hope this helps.
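
Applied to the projection from the original post, the suggestion might look like the sketch below. It assumes the embedding has already been written to every CB_Match node under the property name 'embedding'; a node without the property would presumably trigger the same error again.

```cypher
// If the name 'myGraph' is already taken, drop the earlier projection first:
// CALL gds.graph.drop('myGraph');
CALL gds.graph.create.cypher(
  'myGraph',
  // Extra columns returned by the node query are loaded as node properties.
  "MATCH (c:CB_Match) RETURN id(c) AS id, c.embedding AS embedding",
  "MATCH (c1:CB_Match)-[r:match_by_segment {phased:'Y'}]-(c2:CB_Match) RETURN id(c1) AS source, id(c2) AS target, r.cm AS weight"
)
```

The only change from the original projection is the extra `c.embedding AS embedding` column in the node query; the relationship query is unchanged.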

genealogy · December 5, 2020, 6:16pm

Your suggestion solved the initial problem. That is, with the embedding property in the virtual graph, the KNN algorithm now runs. Now I need to optimize the parameters!
