Vector store and local LLM

Hi,

Has anyone done a RAG example with local LLMs and Neo4j as the vector store? I generate my embeddings and load them as a vector index into Neo4j, and then use similarity_search with a query, but it doesn't work. I think it is missing the embedding model, so I then used CALL db.index.vector.queryNodes('{index_name}', n, {query_vec}) to pass the query embedding via Python, but that still doesn't work. Do you know of any other workaround?
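
For reference, the workaround I'm attempting looks roughly like this (connection details, index name, k, the embedding_model object, and the node's text property are placeholders for my actual setup):

from neo4j import GraphDatabase

# placeholders: adjust to your own connection details
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# query embedding produced by my local embedding model
query_vec = embedding_model.embed_query("my question")

cypher = """
CALL db.index.vector.queryNodes($index_name, $k, $query_vec)
YIELD node, score
RETURN node.text AS text, score
"""
with driver.session() as session:
    for record in session.run(cypher, index_name="vector", k=5, query_vec=query_vec):
        print(record["score"], record["text"])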

Hi,

I don't have an example of using a local LLM with Neo4j as a vector store, but I have used several different embedding models with Neo4j vector indexes.

"and then use similarity_search with a query, but it doesn't work."

Do you mean you are using Python + LangChain + Neo4jVector? e.g. Neo4j Vector Index | 🦜️🔗 LangChain

To confirm - are you using the same embedding model to generate the embeddings in Neo4j as you are using to generate the embedding for the query you pass to queryNodes?

Thanks for your follow up,

Using similarity_search requires an embedding model, so with a local LLM the question is: how do I tell Neo4jVector to use the local embedding model?

Regarding your question: yes, I am using Python, LangChain, and Neo4jVector.

Yes, it is the same embedding model. I use the same process with vector.similarity.cosine(), which is working.

OK, do you mean:

How can I use my local embedding model with LangChain and Neo4jVector?

e.g.

When using the OpenAI embedding model you would use something like this:

from langchain_community.vectorstores.neo4j_vector import Neo4jVector
from langchain_openai import OpenAIEmbeddings

index_name = "vector"  # default index name

store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=index_name,
)

To use your own embedding model, you would pass an instance of a LangChain Embeddings class instead.

There is a list of embedding models supported by LangChain in the documentation.
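
For example, with a local llama.cpp embedding model it would look something like this (untested sketch; the model path and connection details are placeholders):

from langchain_community.embeddings import LlamaCppEmbeddings
from langchain_community.vectorstores.neo4j_vector import Neo4jVector

# local embedding model instead of OpenAIEmbeddings; the model path is a placeholder
embeddings = LlamaCppEmbeddings(model_path="/path/to/embedding-model.gguf")

store = Neo4jVector.from_existing_index(
    embeddings,
    url=url,
    username=username,
    password=password,
    index_name="vector",
)

docs = store.similarity_search("your question", k=4)

The key point is that the model you pass here must be the same one (same vector dimensions) that produced the embeddings stored in the index.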

Yes, exactly.

I followed your instructions and used LlamaCppEmbeddings(model_path=...). It seems to be working, as it runs with no errors, but when I query the same question in two ways:

  1. similarity_search_with_score
  2. a workaround that embeds the question and uses vector.similarity.cosine()

The first approach returns nothing, while the second approach provides an answer.
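
Concretely, the two calls look roughly like this, where store is the Neo4jVector instance and embeddings is the LlamaCppEmbeddings model from above (the question text, node label, and property names are placeholders; I'm also assuming the store exposes a query() helper for raw Cypher, otherwise I run it through the driver directly):

# 1. via the LangChain store: returns nothing
results = store.similarity_search_with_score("my question", k=4)

# 2. workaround: embed the question myself and rank by cosine similarity in Cypher
q_vec = embeddings.embed_query("my question")
rows = store.query(
    """
    MATCH (n:Chunk)
    RETURN n.text AS text,
           vector.similarity.cosine(n.embedding, $q_vec) AS score
    ORDER BY score DESC LIMIT 4
    """,
    params={"q_vec": q_vec},
)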

I am trying something similar (graph DB and local LLM). However, I am using the new pipeline API (SimpleKGPipeline), and it hasn't been working.

When I use a local LLM running mistral-7b-instruct, deployed with a llama-cpp-python based server, it does not work (I get an LLMGenerationError exception, as shown at the end of this message). But when I use the real OpenAI API endpoint as the LLM, it works at least partly and I don't get the errors, even though the graph created does not have relationships; it mostly looks like it is just chunking. I do not have any predefined schema for entities, since the input document can be anything.

I am using the OpenAILLM class from neo4j_graphrag.llm, as suggested at this link, for both the local LLM and the OpenAI LLM (with a different URL and API key, etc., of course).
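
Roughly, my setup looks like this (URLs, keys, and the OpenAI model name are placeholders; as far as I understand, extra keyword arguments such as base_url and api_key are passed through to the underlying OpenAI client):

from neo4j_graphrag.llm import OpenAILLM

# local mistral-7b-instruct served by llama-cpp-python (placeholder URL, dummy key)
local_llm = OpenAILLM(
    model_name="mistral-7b-instruct",
    model_params={"temperature": 0.0, "max_tokens": 2000,
                  "response_format": {"type": "json_object"}, "seed": 123},
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
)

# real OpenAI endpoint: same class, default base URL, key taken from OPENAI_API_KEY
openai_llm = OpenAILLM(model_name="gpt-4o")  # placeholder model name

# each of these is then passed as the llm for SimpleKGPipeline
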
Questions:

  1. Is this the right way to access both the local LLM and the OpenAI LLM? If so, why does it not work with the local Mistral but works (to some extent) with OpenAI?
  2. Has SimpleKGPipeline() been tested with local LLMs? Is this supported? Which local LLMs has it been tested with, or which would be recommended to give better results for generating the KG?
  3. Also, how can I improve the quality of the graph relationships? Are there token limits or other model/chunking parameters that influence the quality of the KG relationship building?
  4. Is there any documentation or info on what sort of prompt is created and submitted to the LLM in order to generate the nodes/relationships? Or any best practices, such as setting the LLM model parameters for best results (and how these parameters differ between specific local LLMs and the OpenAI API)? I am currently using the following model parameters:
    model_params={
        "temperature": 0.0,
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
        "seed": 123,
    }
  5. Is there any possibility that the LLM will retain memory of prior KG-creation queries when responding to newer prompts and queries? We wouldn't want any such stickiness, so how do we ensure this doesn't happen?

File "/opt/app-root/lib64/python3.11/site-packages/neo4j_graphrag/llm/openai_llm.py", line 109, in ainvoke
raise LLMGenerationError(e)
neo4j_graphrag.exceptions.LLMGenerationError: Error code: 404 - {'detail': 'Not Found'}

Hey @sr.2357, a 404 suggests that the URL the package is trying to request does not exist.

Is the server running? The llama.cpp README file says to run the ./llama-server script.

Once it is running, try a curl request to the host/port. From the server README it looks like the endpoint is /completion, and I think OpenAILLM uses the OpenAI-style /v1/chat/completions route.
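
If it's easier from Python, a quick sanity check along these lines (host and port are whatever you started the server with; the routes are the ones I'd expect from the llama.cpp server, so treat them as a guess) should return something other than a 404:

import urllib.request

base = "http://localhost:8080"  # placeholder: host/port you started llama-server with

# /health is llama.cpp's liveness route; /v1/models is the OpenAI-compatible model listing
for path in ("/health", "/v1/models"):
    try:
        with urllib.request.urlopen(base + path) as resp:
            print(path, resp.status, resp.read()[:200])
    except Exception as e:
        print(path, "failed:", e)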

As you are running Mistral, it may also be worth trying the MistralAILLM class rather than OpenAILLM.

If you can post some code to replicate the issue, I'd be happy to try it here.
