Help needed with Vector indexes and the Python Client

julio1 · June 25, 2024, 1:18am

Hi everyone, hope everything is going well.

I'm trying to run a classification process for potential new nodes, based on a semantic search approach. I have created a vector index on the embeddings of a given category of nodes, and then I bulk upload new nodes that I'd like to add to the network based on their semantic similarity (from the embeddings). The problem is that when I run the following query in the Python client:

def cypher_query(tagname: str):
    # gds is an existing GraphDataScience client instance
    query = f"""
        MATCH (t:Tags {{tagName: "{tagname}"}})
        CALL db.index.vector.queryNodes('tags-embeddings', 10, t.embedding)
        YIELD node AS Tags, score
        MATCH (Tags)<-[:Maps_to]-(s:Sectors)
        WHERE Tags.isNew IS NULL
        RETURN Tags.tagName AS tagName, s.sectorName AS sectorName, score
        """
    result = gds.run_cypher(query)
    return result

For many new nodes I don't get results at all. What I am trying to do here is to get the 10 most similar nodes in the graph to each new node.
What I did to get results for all the new candidate nodes was to change K to 120, so the Cypher query looked like this:

def cypher_query(tagname: str):
    query = f"""
        MATCH (t:Tags {{tagName: "{tagname}"}})
        CALL db.index.vector.queryNodes('tags-embeddings', 120, t.embedding)
        YIELD node AS Tags, score
        MATCH (Tags)<-[:Maps_to]-(s:Sectors)
        WHERE Tags.isNew IS NULL
        RETURN Tags.tagName AS tagName, s.sectorName AS sectorName, score
        """
    result = gds.run_cypher(query)
    return result
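Raising K works because the index hands back more candidates before the WHERE filter runs, so more of them survive. A minimal Python sketch of a more general version of that workaround, which doubles K until enough rows pass the filter or the index runs out of candidates. `query_index` and `keep` are hypothetical callables (for example, a wrapper around the query above that takes K as an argument, and a predicate mirroring `isNew IS NULL`); the doubling schedule and `k_max` cap are arbitrary assumptions, not something the Neo4j index does for you.

```python
from typing import Any, Callable, List, Sequence


def topk_with_oversampling(
    query_index: Callable[[int], Sequence[Any]],  # hypothetical: returns the top-n candidates, best first
    keep: Callable[[Any], bool],                  # hypothetical: the post-filter predicate
    k: int = 10,
    k_start: int = 10,
    k_max: int = 1000,
) -> List[Any]:
    """Re-query with a growing K until k candidates survive the post-filter."""
    if k_start < 1:
        raise ValueError("k_start must be at least 1")
    k_query = k_start
    while True:
        candidates = query_index(k_query)
        survivors = [c for c in candidates if keep(c)]
        index_exhausted = len(candidates) < k_query  # fewer rows than asked for: nothing left to fetch
        if len(survivors) >= k or index_exhausted or k_query >= k_max:
            return survivors[:k]
        k_query = min(k_query * 2, k_max)  # oversample more aggressively next round
```

Because each round re-runs the query, this trades extra round trips for not having to guess one large K up front; the pre-filtering approach in the reply avoids the problem altogether.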

Questions.

Is this behaviour somehow expected?
If so, what can I do to make sure I'm retrieving results for each node?
I'm assuming that each newly created node is automatically added to the vector index (I read that somewhere), but maybe I have to do something different.

Thanks in advance!!

michael.hunger · July 11, 2024, 6:46am

Yes, that's expected with post-filtering on any vector index.
You get only N results back from the index, and then your WHERE clause filters some or all of them out.

In these cases (and I guess you also don't have that many tags), pre-filtering is actually better: filter first, then compute the score yourself with the cosine similarity function.
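To see why results can vanish entirely, here is a small pure-Python sketch that contrasts the two orders of operations. It uses exact brute-force cosine similarity instead of the approximate index, and the tag names, 2-dimensional embeddings, and `is_new` flags are made-up illustration data. When the new tags are the nearest neighbours of the query, taking the top K first and filtering afterwards leaves nothing, while filtering first still returns K existing tags.

```python
import math
from typing import List, Sequence, Tuple

Tag = Tuple[str, List[float], bool]  # (name, embedding, is_new), hypothetical shape


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def post_filter(query: Sequence[float], tags: List[Tag], k: int) -> List[str]:
    # Top-k over all tags (what the vector index call does), THEN drop new tags
    ranked = sorted(tags, key=lambda t: cosine(query, t[1]), reverse=True)[:k]
    return [name for name, _, is_new in ranked if not is_new]


def pre_filter(query: Sequence[float], tags: List[Tag], k: int) -> List[str]:
    # Drop new tags FIRST, then take the top-k among the remaining ones
    existing = [t for t in tags if not t[2]]
    ranked = sorted(existing, key=lambda t: cosine(query, t[1]), reverse=True)[:k]
    return [name for name, _, _ in ranked]


QUERY = [1.0, 0.0]
TAGS: List[Tag] = [
    ("new_a", [1.0, 0.05], True),   # closest to the query, but new
    ("new_b", [1.0, 0.1], True),    # second closest, also new
    ("old_a", [0.5, 0.5], False),
    ("old_b", [0.0, 1.0], False),
]

print(post_filter(QUERY, TAGS, 2))  # []
print(pre_filter(QUERY, TAGS, 2))   # ['old_a', 'old_b']
```

Pre-filtering scans every remaining tag, which is why it fits best when the number of tags is modest.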

Please also use named parameters instead of string formatting for the tag name. You can add an index on Tags(isNew), or you could add a "NewTag" label for faster filtering.

    MATCH (t:Tags {tagName: $tagname})
    MATCH (otherTag:Tags)
    WHERE otherTag.isNew IS NULL
    // similarity comparison
    WITH t, otherTag, vector.similarity.cosine(t.embedding, otherTag.embedding) AS score
    ORDER BY score DESC LIMIT 10 // top-K
    MATCH (otherTag)<-[:Maps_to]-(s:Sectors)
    RETURN otherTag.tagName AS tagName, s.sectorName AS sectorName, score
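A sketch of calling the pre-filtering query from Python with named parameters instead of an f-string. It assumes the graphdatascience client's `gds.run_cypher(query, params=...)` keyword; `similar_tags` and `SIMILAR_TAGS_QUERY` are hypothetical names, and passing K as a `$k` parameter is an added convenience. The runner is injected as a callable so the function can be exercised without a database.

```python
from typing import Any, Callable

# Pre-filtering query from the reply; $tagname and $k are bound by the driver,
# so there is no string formatting and no {{ }} brace escaping.
SIMILAR_TAGS_QUERY = """
MATCH (t:Tags {tagName: $tagname})
MATCH (otherTag:Tags)
WHERE otherTag.isNew IS NULL
WITH t, otherTag, vector.similarity.cosine(t.embedding, otherTag.embedding) AS score
ORDER BY score DESC LIMIT $k
MATCH (otherTag)<-[:Maps_to]-(s:Sectors)
RETURN otherTag.tagName AS tagName, s.sectorName AS sectorName, score
"""


def similar_tags(run_cypher: Callable[..., Any], tagname: str, k: int = 10) -> Any:
    """Return the top-k existing tags (and their sectors) most similar to `tagname`."""
    # Values travel as query parameters, which also protects against Cypher injection
    return run_cypher(SIMILAR_TAGS_QUERY, params={"tagname": tagname, "k": k})
```

With a connected client this would be called as `similar_tags(gds.run_cypher, "some tag")`, which should return a pandas DataFrame.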
