I'm running Neo4j 4.1 with GDS 1.4.
I'm trying to use ML tools to gain insight into a genetic genealogy graph database. It has CB_Match nodes holding chromosome segment data for individuals who match one another; that is, they share segments.
I've created an in-memory (named) graph using a Cypher projection:
CALL gds.graph.create.cypher(
'myGraph',
"match (c:CB_Match) return id(c) as id",
"match (c1:CB_Match)-[r:match_by_segment{phased:'Y'}]-(c2:CB_Match) return id(c1) as source,id(c2) as target,r.cm as weight"
)
From this I can stream a FastRP embedding for the CB_Match nodes:
CALL gds.fastRP.stream('myGraph', {embeddingDimension: 4})
YIELD nodeId,embedding
with gds.util.asNode(nodeId).RN as RN,gds.util.asNode(nodeId).fullname as Name, embedding
return RN,Name,embedding order by RN,Name
I have used the write procedure to add the embedding as a property on the CB_Match nodes.
Now I am trying to use the embedding property as described in the recent GDS announcement, specifically for neighborhood detection and visualization.
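For reference, the write step would look roughly like the sketch below. It is not the exact call that was run: the property name 'embedding' is assumed from the KNN configuration further down, and embeddingDimension simply repeats the value used in the stream call.

```cypher
// Sketch of the write step. Assumption: the property is written as 'embedding'.
CALL gds.fastRP.write('myGraph', {
  embeddingDimension: 4,
  writeProperty: 'embedding'
})
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten
```

Write mode stores the result on the nodes in the Neo4j database.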
Following the documentation for KNN and its default value of {} for the configuration map, I ran the following:
CALL gds.beta.knn.stream(
'myGraph',
{ }
)
YIELD node1, node2, similarity
with gds.util.asNode(node1).fullname as Match1, gds.util.asNode(node2).fullname as Match2, similarity
return Match1, Match2, similarity limit 50
This produced an error saying that I had omitted the required nodeWeightProperty from the configuration, so I added it:
CALL gds.beta.knn.stream(
'myGraph',
{nodeWeightProperty:'embedding' }
)
YIELD node1, node2, similarity
with gds.util.asNode(node1).fullname as Match1, gds.util.asNode(node2).fullname as Match2, similarity
return Match1, Match2, similarity limit 50
and received an error saying that not every node has the embedding property, which is not true.
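A query along the following lines can confirm whether any CB_Match node in the stored database lacks the property. It is only a sketch and assumes the property is named 'embedding'; it checks the database, not the in-memory graph.

```cypher
// Count CB_Match nodes in the database that have no 'embedding' property.
// Assumption: the write step stored the embedding under the name 'embedding'.
MATCH (c:CB_Match)
WHERE c.embedding IS NULL
RETURN count(c) AS nodesWithoutEmbedding
```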
Is this a bug or a problem with my logic?
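One possible explanation, offered as an assumption rather than a confirmed diagnosis: the node query used to create 'myGraph' returns only id, so the in-memory graph may not contain the embedding even though the write procedure stored it in the database. If that is the cause, a sketch like the following might avoid the error. It uses mutate mode to add the embedding to the in-memory graph before running KNN.

```cypher
// Assumption: 'myGraph' has no 'embedding' node property because the
// projection loaded only node ids. Mutate mode adds it to the in-memory graph.
CALL gds.fastRP.mutate('myGraph', {
  embeddingDimension: 4,
  mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten;

// KNN reads 'embedding' from the in-memory graph, not from the database.
CALL gds.beta.knn.stream('myGraph', {nodeWeightProperty: 'embedding'})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).fullname AS Match1,
       gds.util.asNode(node2).fullname AS Match2,
       similarity
LIMIT 50;
```

The two statements are meant to be run one after the other. If the in-memory graph already has a property named 'embedding', the mutate call is expected to fail, which would rule this explanation out.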