Help needed with Vector indexes and the Python Client

Hi everyone, hope everything is going well.

I'm trying to run a classification process for potential new nodes, based on a semantic search approach. I have created a vector index on the embeddings of a given category of nodes, and then I bulk upload new nodes that I'd like to add to the network based on their semantic similarity (from the embeddings). The problem is that when I run the following query in the Python client:

def cypher_query(tagname: str):
    # Look up the new tag, then ask the vector index for its 10 nearest neighbours
    query = f"""
        MATCH (t:Tags {{tagName: "{tagname}"}})
        CALL db.index.vector.queryNodes('tags-embeddings', 10, t.embedding)
        YIELD node AS Tags, score
        MATCH (Tags)<-[:Maps_to]-(s:Sectors)
        WHERE Tags.isNew IS NULL
        RETURN Tags.tagName AS tagName, s.sectorName AS sectorName, score
        """
    result = gds.run_cypher(query)
    return result

For many new nodes I don't get results at all. What I am trying to do here is to get the 10 most similar nodes in the graph to each new node.
The only way I found to get results for all the new candidate nodes was to raise K to 120, so the Cypher query looked like this:

def cypher_query(tagname: str):
     query = f"""
        MATCH (t:Tags {{tagName: "{tagname}"}})
        CALL db.index.vector.queryNodes('tags-embeddings', 120, t.embedding)
        YIELD node AS Tags, score
        MATCH (Tags)<-[:Maps_to]-(s:Sectors)
        WHERE Tags.isNew IS NULL 
        RETURN Tags.tagName AS tagName, s.sectorName as sectorName, score
        """
     result = gds.run_cypher(query)
     return result

Questions.

  1. Is this behaviour somehow expected?
  2. If so, what can I do to make sure I'm retrieving results for each node?
  3. I'm assuming that each newly created node is added to the vector index automatically (I read that somewhere). Is that right, or do I have to do something different?

Thanks in advance!!

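Yes, that's expected with post-filtering on any vector index.
The index returns only the top N results, and your filters (the `isNew IS NULL` check and the `Maps_to` match) are applied afterwards, so they can remove some or all of those N.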

In these cases (and I guess you also don't have that many tags), pre-filtering is actually better: filter the candidate nodes first, then rank them with the cosine similarity function.
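
To make the difference concrete, here is a minimal pure-Python sketch of the two strategies. It does not touch Neo4j at all; the toy tags, their embeddings and the helper names (`cosine`, `post_filter`, `pre_filter`) are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical toy data: (tag name, embedding, is_new)
tags = [
    ("new_a", [1.0, 0.0], True),
    ("new_b", [0.9, 0.1], True),
    ("new_c", [0.8, 0.2], True),
    ("old_x", [0.5, 0.5], False),
    ("old_y", [0.0, 1.0], False),
]

def post_filter(query, k):
    """Take the top-k first, then drop new tags (what the index query does)."""
    ranked = sorted(tags, key=lambda t: cosine(query, t[1]), reverse=True)
    return [name for name, _, is_new in ranked[:k] if not is_new]

def pre_filter(query, k):
    """Drop new tags first, then take the top-k of what is left."""
    candidates = [t for t in tags if not t[2]]
    ranked = sorted(candidates, key=lambda t: cosine(query, t[1]), reverse=True)
    return [name for name, _, _ in ranked[:k]]

query = [1.0, 0.0]
print(post_filter(query, 2))  # [] because both top-2 hits are new tags
print(pre_filter(query, 2))   # ['old_x', 'old_y']
```

With post-filtering, the nearest neighbours of a new tag can all be other new tags, which is why raising K eventually "fixed" it.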

Please also use named parameters instead of string formatting for the tagname. You can add an index for Tags(isNew), or you could add a "NewTag" label for faster filtering.

    MATCH (t:Tags {tagName: $tagname})
    MATCH (otherTag:Tags)
    // pre-filter: only consider existing (non-new) tags
    WHERE otherTag.isNew IS NULL
    // similarity comparison
    WITH t, otherTag, vector.similarity.cosine(t.embedding, otherTag.embedding) AS score
    ORDER BY score DESC LIMIT 10 // top-K
    MATCH (otherTag)<-[:Maps_to]-(s:Sectors)
    RETURN otherTag.tagName AS tagName, s.sectorName AS sectorName, score
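
On the Python side, a rough sketch of the same query with named parameters could look like the following. It assumes `gds` is the client object from the question and that its `run_cypher` accepts a `params` dictionary (I believe the graphdatascience client does, but please check your version). It uses the `Tags` label from the question; the function name `similar_tags` and the `$k` parameter are my own additions.

```python
# The query is a plain string (no f-string), so single braces are correct here.
SIMILAR_TAGS_QUERY = """
MATCH (t:Tags {tagName: $tagname})
MATCH (otherTag:Tags)
WHERE otherTag.isNew IS NULL
WITH t, otherTag,
     vector.similarity.cosine(t.embedding, otherTag.embedding) AS score
ORDER BY score DESC LIMIT $k
MATCH (otherTag)<-[:Maps_to]-(s:Sectors)
RETURN otherTag.tagName AS tagName, s.sectorName AS sectorName, score
"""

def similar_tags(gds, tagname: str, k: int = 10):
    """Run the pre-filtered similarity query for one tag.

    `gds` is assumed to expose run_cypher(query, params=...), as the
    client in the question appears to.
    """
    return gds.run_cypher(SIMILAR_TAGS_QUERY, params={"tagname": tagname, "k": k})
```

Passing the tag name as a parameter avoids quoting problems with names that contain `"` and lets the database reuse the query plan across calls.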