Graph Enhancement Feature and Custom LLM's for generating knowledge graphs

Hi. I’m reaching out to get clarity on three issues:

  1. Model Compatibility: The Graphbuilder app primarily utilizes OpenAI models and Diffbot. Can other models be used as well? Our knowledge base consists of medical guidelines, drug formulary, and other biomedical information. Specialized models, such as SciSpacy (scispacy | SpaCy models for biomedical text processing), which are trained on biomedical data, could potentially do a better job recognizing and linking medical entities in our knowledge base. Understanding if using other specialized models is possible would significantly improve the quality of our results.
  2. Starting with Existing Biomedical Knowledge Graphs: We are considering starting with existing biomedical knowledge graphs, such as RTX-KG2 (GitHub - RTXteam/RTX-KG2: Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)), instead of building a graph from our knowledge base. After reviewing the information on the Neo4J Graphbuilder app, it is unclear how starting with an existing knowledge graph integrates with the app, as it seems to expect to begin from a knowledge base starting point. Could you provide clarity on how we could use an existing knowledge graph with the Graphbuilder app?
  3. Graph Enhancement Feature: While testing the Graphbuilder app, I noted a Graph Enhancement feature allowing me to select a predefined schema. When I selected the healthcare schema and the associated Node Labels and Relationship Types, the accuracy of the resultant knowledge graph improved significantly, and the accuracy of the results approached nearly 100%. I would like to understand more clearly how this feature works to build an efficient RAG system. Also, is there a way to programmatically access this feature using Langchain?

Hi Victor,

  1. yes other models can be used, if you run the graph builder yourself via Docker (we didn't put all the models on the publicly hosted version, because of incurred cost). Including all models that support an OpenAI API
    see: https://neo4j.com/labs/genai-ecosystem/llm-graph-builder-features/#_llm_models
    and Documentation for local deployments - Neo4j Labs

  2. The graph builder currently works by adding data to an existing database, so if your KG already contains entities and relationships, newly added entities can be combined/merged if they have the same label + id as in the original graph. Or you could do that with manual post-processing (merging) we're also launching a new feature that will allow merging entities with the same type and similar embeddings + text distance soon.

While the KG builder uses specific retrievers for the Q&A you could adjust them to integrate your KG more or build your own chatbot on your KG, by using something like NeoConverse or follow our GraphAcademy Courses for building KG chatbots

  1. The "graph schema" feature is actually pretty straightforward - we pass the schema information as part of the extraction prompt to the LLM, and guide the structured output to follow it, it uses our integration in LangChain to achieve that, which is also open source:

see: Constructing knowledge graphs | 🦜️🔗 LangChain

Hi Micheal

Thank you so much for your detailed and helpful response—it’s greatly appreciated!

I think it’s great that we can use other models with the Graphbuilder app, especially via Docker for local deployment. Would be great to know when support would be added for openly available models e.g. llama3.1. (to mitigate costs).

Your explanation regarding integrating our existing biomedical knowledge graph and the upcoming feature for merging entities with the same type and similar embeddings is particularly exciting. This will definitely help us streamline our process and improve the accuracy of our knowledge graph.

I also appreciate the insight into how the "graph schema" feature works and its integration with LangChain. We’ll certainly be looking into NeoConverse and the GraphAcademy courses.

Again, thank you for your guidance. I’m hopeful that with these tools and resources, we’ll be able to significantly enhance our clinical decision support systems.