Hi everyone, hope everything is going well.
I'm trying to run a classification process for potential new nodes, based on a semantic search approach. I have created a vector index on the embeddings of a given category of nodes, and then I bulk upload the new nodes that I'd like to add to the network according to their semantic similarity (computed from the embeddings). The problem appears when I run the following query from the Python client:
def cypher_query(tagname: str):
    # Look up the tag by name, then ask the vector index for its 10 nearest neighbours
    query = f"""
    MATCH (t:Tags {{tagName: "{tagname}"}})
    CALL db.index.vector.queryNodes('tags-embeddings', 10, t.embedding)
    YIELD node AS Tags, score
    MATCH (Tags)<-[:Maps_to]-(s:Sectors)
    RETURN Tags.tagName AS tagName, s.sectorName AS sectorName, score
    """
    result = gds.run_cypher(query)
    return result
For many of the new nodes I get no results at all. What I'm trying to do here is get the 10 nodes in the graph that are most similar to each new node.
The only way I found to get results for all the new candidate nodes was to increase k to 120, so the Cypher query became:
def cypher_query(tagname: str):
    # Same query as above, with k raised from 10 to 120
    query = f"""
    MATCH (t:Tags {{tagName: "{tagname}"}})
    CALL db.index.vector.queryNodes('tags-embeddings', 120, t.embedding)
    YIELD node AS Tags, score
    MATCH (Tags)<-[:Maps_to]-(s:Sectors)
    RETURN Tags.tagName AS tagName, s.sectorName AS sectorName, score
    """
    result = gds.run_cypher(query)
    return result
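To illustrate what I suspect might be going on, here is a toy sketch in plain Python (not Neo4j code; the node names, embeddings, and sectors are all made up). It assumes that the k nearest neighbours are selected first and that the Maps_to match only filters afterwards, so neighbours without a Maps_to relationship (for instance, other newly uploaded nodes) would use up the k slots and then be dropped. I have not confirmed that this is what actually happens in my database.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


# Hypothetical toy data: name -> (embedding, sector or None).
# A sector of None stands in for a new tag with no Maps_to relationship yet.
NODES = {
    "new_a": ([1.0, 0.0], None),
    "new_b": ([0.99, 0.1], None),
    "new_c": ([0.98, 0.2], None),
    "old_x": ([0.5, 0.5], "Energy"),
    "old_y": ([0.0, 1.0], "Health"),
}


def top_k_then_filter(name, k):
    """Take the k most similar nodes first, then keep only those with a sector."""
    query_vec = NODES[name][0]
    ranked = sorted(NODES, key=lambda n: cosine(query_vec, NODES[n][0]), reverse=True)
    top_k = ranked[:k]
    return [(n, NODES[n][1]) for n in top_k if NODES[n][1] is not None]


print(top_k_then_filter("new_a", 3))  # small k: every slot goes to unmapped nodes
print(top_k_then_filter("new_a", 5))  # larger k: mapped nodes finally show up
```

In this toy setup a small k returns nothing because the closest neighbours are all unmapped, while a larger k reaches the mapped nodes, which looks similar to what I see when going from 10 to 120.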
- Is this behaviour somehow expected?
- If so, what can I do to make sure I'm retrieving results for each node?
- I'm assuming that each newly created node is automatically added to the vector index (I read that somewhere), but maybe I have to do something different?
Thanks in advance!!