Vector Search with Traverse Functionality using Python Libraries

Hello Neo4j Community,

I am learning the basics and looking forward to your expertise.

Working on a project, I am leveraging RAG with LangChain to query nodes in Neo4j.

Please see the Python code at the end of this message.

Here’s the scenario I am dealing with:

I have a Node B (Income Statement) with a text property and an embedding property populated. Using LangChain in a Python script, I can easily query this node and retrieve relevant information.

Now I have added Node A, which is connected to Node B. Node A has properties, such as a year property with a value (e.g., 2000). This node does not have an embedding property.

My goal is to ask a question that queries both Node B and Node A, retrieving relevant information, including the properties of both nodes (e.g., the value of year from Node A and text-related content from Node B).

What kind of libraries or LangChain functions should I use to achieve this functionality effectively? Specifically:

How can I ensure that queries traverse the connection between Node A and Node B using Python scripts?

How can I retrieve both embeddings-based content from Node B and property-based data from Node A in the same query?
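While digging, I came across the retrieval_query parameter on Neo4jVector, which looks like it could do exactly this traversal: it runs after the vector search, with `node` and `score` bound for each match, and must return `text`, `score`, and `metadata`. A minimal sketch, where the HAS_STATEMENT relationship type and the Year label/property are placeholders for whatever the real schema uses:

```python
# Sketch: traverse from the indexed node (Node B) to a connected node (Node A)
# during retrieval. HAS_STATEMENT and Year are assumptions; adjust to your schema.
RETRIEVAL_QUERY = """
MATCH (node)<-[:HAS_STATEMENT]-(a:Year)
RETURN node.text AS text,
       score,
       {year: a.year} AS metadata
"""

def get_traversal_index():
    # Imports are kept local so the sketch reads without the packages installed.
    import os
    from langchain_openai import OpenAIEmbeddings
    from langchain_community.vectorstores import Neo4jVector

    # from_existing_index reuses the vector index; retrieval_query customizes
    # what each hit returns, so Node A's properties land in the metadata.
    return Neo4jVector.from_existing_index(
        OpenAIEmbeddings(),
        url=os.getenv('NEO4J_URI'),
        username=os.getenv('NEO4J_USERNAME'),
        password=os.getenv('NEO4J_PASSWORD'),
        index_name='Vi',
        retrieval_query=RETRIEVAL_QUERY,
    )
```

The retriever built from this index would then surface both the embedding-matched text and the connected year value in one query.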

Any guidance or best practices would be greatly appreciated. Thank you!

Hossein J.

My Python script currently looks like this:

import dotenv
import os
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI

# Load environment variables
dotenv.load_dotenv()
os.environ['OPENAI_API_KEY'] = os.getenv("OPENAI_API_KEY")

# Function to initialize the Neo4j vector index
def get_vector_index():
    return Neo4jVector.from_existing_graph(
        OpenAIEmbeddings(),
        url=os.getenv('NEO4J_URI'),
        username=os.getenv('NEO4J_USERNAME'),
        password=os.getenv('NEO4J_PASSWORD'),
        index_name='Vi',
        node_label="Income Statement",
        text_node_properties=['title', 'drafting', 'engineering_consulting', 'total_revenue'],
        embedding_node_property='embedding'
    )

# Create the vector index retriever
vector_index = get_vector_index()

# Initialize the RAG QA chain
vector_qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    chain_type="stuff",
    retriever=vector_index.as_retriever()
)

# Manually input your query
query = input("Enter your question here: ")

# Run the query and print the result
try:
    response = vector_qa.invoke({"query": query})
    print("Query Result:")
    print(response['result'])
except Exception as e:
    print(f"An error occurred: {e}")

I spent a bit more time on this and realized that I can use CypherQAChain (the full class name in LangChain is GraphCypherQAChain).

It does what I was initially looking for, except the response is choppy rather than human-like.

I imagine the choppy replies could be fed to an LLM as context to produce a human-like reply.
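From what I can tell, GraphCypherQAChain can already do that second pass: with return_direct=False (the default), a qa_llm rephrases the raw Cypher records into natural language. A minimal sketch, assuming the same environment variables as my script above:

```python
def get_cypher_qa_chain():
    # Imports kept local so the sketch reads without the packages installed.
    import os
    from langchain.chains import GraphCypherQAChain
    from langchain_community.graphs import Neo4jGraph
    from langchain_openai import ChatOpenAI

    graph = Neo4jGraph(
        url=os.getenv('NEO4J_URI'),
        username=os.getenv('NEO4J_USERNAME'),
        password=os.getenv('NEO4J_PASSWORD'),
    )
    # Two LLMs: one generates the Cypher, the other phrases the final answer
    # from the query results, so the reply is not just raw records.
    # (Newer LangChain versions may also require allow_dangerous_requests=True.)
    return GraphCypherQAChain.from_llm(
        cypher_llm=ChatOpenAI(temperature=0),  # writes the Cypher query
        qa_llm=ChatOpenAI(),                   # turns results into prose
        graph=graph,
        verbose=True,
    )
```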

I am hoping there is an easier way to retrieve information from nodes that do not have an embedding property.

Any hints would be appreciated.

Thank you.
Hossein

As I learn more, I can see that there are various ways to do the search: Vector, Graph, and FullText.
The next step is to figure out how to put the results of the three searches together and generate a human-like response.
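One simple way to combine them is to merge the three result sets into a single context block and hand that, with the question, to the LLM. A plain-Python sketch (the section names and prompt wording are just my choices):

```python
def build_prompt(question, vector_hits, graph_hits, fulltext_hits):
    """Merge vector, graph, and full-text results into one context prompt."""
    sections = [
        ("Vector search results", vector_hits),
        ("Graph search results", graph_hits),
        ("Full-text search results", fulltext_hits),
    ]
    parts = []
    for title, hits in sections:
        if hits:  # skip empty result sets entirely
            parts.append(title + ":\n" + "\n".join(f"- {h}" for h in hits))
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The returned string can then be passed to any chat model as a single message, which is where the human-like phrasing comes from.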

I found more information related to my original question at:
GraphRAG Github

Let me explain more:
I let the Knowledge Graph Builder do the chunking and embedding.
Then I wanted to put together a Python script to search through the nodes.
The link above provides directions.
In the README file of that repository, scroll almost to the end and find the heading "Performing a Similarity Search".
That provided the answer I was looking for.
Now I still need improvements, like adding some prompt engineering so the query doesn't return information from external sources, and chat memory features.
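For anyone following along, the shape of that similarity search from the neo4j-graphrag package looks roughly like this; 'Vi' is my index name and "gpt-4o" is just an example model:

```python
def ask_graphrag(question):
    # Imports kept local so the sketch reads without the packages installed.
    import os
    import neo4j
    from neo4j_graphrag.embeddings import OpenAIEmbeddings
    from neo4j_graphrag.retrievers import VectorRetriever
    from neo4j_graphrag.llm import OpenAILLM
    from neo4j_graphrag.generation import GraphRAG

    driver = neo4j.GraphDatabase.driver(
        os.getenv('NEO4J_URI'),
        auth=(os.getenv('NEO4J_USERNAME'), os.getenv('NEO4J_PASSWORD')),
    )
    # VectorRetriever embeds the question and runs the similarity search
    # against the existing index that Knowledge Graph Builder populated.
    retriever = VectorRetriever(driver, index_name='Vi', embedder=OpenAIEmbeddings())
    rag = GraphRAG(retriever=retriever, llm=OpenAILLM(model_name="gpt-4o"))
    return rag.search(query_text=question, retriever_config={"top_k": 5})
```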

Happy coding,
Hossein

Ok, more lessons learned:
It seems that with GraphRAG, I have to use OpenAILLM.

That is all right, but how do I use prompts? After some trial and error, I realized that the answer is RagTemplate().
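Here is roughly how I understand it plugging in; the template text is my own, and I'm assuming RagTemplate accepts a custom template with the {context} and {query_text} inputs:

```python
# A custom prompt that keeps the model from answering with outside knowledge.
MY_TEMPLATE = """Answer the question using only the context below.
If the answer is not in the context, say you don't know; do not use external sources.

Context:
{context}

Question:
{query_text}

Answer:"""

def build_rag(retriever, llm):
    # Import kept local so the sketch reads without the package installed.
    from neo4j_graphrag.generation import RagTemplate, GraphRAG

    template = RagTemplate(
        template=MY_TEMPLATE,
        expected_inputs=["context", "query_text"],
    )
    # GraphRAG fills {context} with the retriever hits and {query_text}
    # with the user's question before calling the LLM.
    return GraphRAG(retriever=retriever, llm=llm, prompt_template=template)
```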

The next question is how to keep the chat history. With ChatOpenAI it is relatively easy, but with GraphRAG I have to use OpenAILLM. I need to read more to find out how to implement chat history with this wrapper (OpenAILLM).
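Until I find a built-in option, one workaround is to keep the history manually and fold recent turns into the next query_text before calling search. A plain-Python sketch of that idea:

```python
class ChatSession:
    """Minimal manual chat memory: fold prior turns into the next query."""

    def __init__(self, max_turns=5):
        self.history = []           # list of (question, answer) pairs
        self.max_turns = max_turns  # keeps the prompt from growing unbounded

    def build_query(self, question):
        # First turn: no history to include.
        if not self.history:
            return question
        turns = "\n".join(
            f"User: {q}\nAssistant: {a}" for q, a in self.history[-self.max_turns:]
        )
        return f"Conversation so far:\n{turns}\n\nNew question: {question}"

    def record(self, question, answer):
        self.history.append((question, answer))
```

The string from build_query goes in as query_text, and record is called with whatever answer comes back, so the next turn sees the context.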