Help with Building a Knowledge Graph for Unstructured Data using the Neo4j API

Hi,

I'm trying to use the Neo4j API to generate a knowledge graph out of unstructured data. I tried to follow the tutorial and the example listed in the developer guide. I'm using the SimpleKGPipeline as follows:

    pipeline = SimpleKGPipeline(
        driver=driver,
        llm=llm,
        prompt_template=ERExtractionTemplate(system_instructions=system_instr),
        schema={
            "node_types": entities,
            "relationship_types": relations,
            "patterns": patterns,
            "additional_node_types": False,
        },
        from_pdf=False,
        embedder=embedder,
    )

For the entity properties, I did not define any, because I would like to get all the properties the LLM finds for a given entity. I know this can be done because I was able to do it through the LangChain Neo4j wrapper by defining the LLMGraphTransformer as follows:

    llm_transformer = LLMGraphTransformer(
        llm=llm,
        # Example node def: [{'label': 'EQUIPMENT', 'description': '...', 'additional_properties': True}
        allowed_nodes=allowed_nodes,
        allowed_relationships=allowed_relationships,
        node_properties=True,  # <=== Captures all properties
        strict_mode=True,  # Set to True if you want ONLY these types
        additional_instructions=additional_instr,
    )

Another thing I've noticed is that with LangChain my entity resolution works much better and my KG looks more connected than what I get with SimpleKGPipeline, where I end up with more isolated clusters.

What am I missing? Can someone point me in the right direction, please?
Thanks

I tried this for both text (from_pdf=False) and PDF (from_pdf=True), and it worked with or without a schema:

    if SCHEMA is None:
        kg_builder = SimpleKGPipeline(
            llm=llm,
            driver=NEO4J_DRIVER,
            embedder=EMBEDDINGS,
            from_pdf=True,  # False if text
            neo4j_database=DATABASE,
        )
    else:
        kg_builder = SimpleKGPipeline(
            llm=llm,
            driver=NEO4J_DRIVER,
            embedder=EMBEDDINGS,
            schema=SCHEMA,
            from_pdf=True,  # False if text
            neo4j_database=DATABASE,
        )
    # Text input:
    return await kg_builder.run_async(text=content)
    # Or, for a PDF:
    # return await kg_builder.run_async(file_path=file_path)
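
In case it helps, here is a rough sketch of what a SCHEMA dictionary of the shape used in the question could look like. The labels (EQUIPMENT, LOCATION, LOCATED_AT) and the helper undefined_pattern_parts are made up for illustration, and the additional_properties flag is assumed to be supported by your installed version, as the node definition quoted in the question suggests; please check it against your version's documentation. The helper only checks that every pattern refers to labels declared in the schema, which is an easy thing to get wrong.

```python
# Hypothetical labels, for illustration only.
node_types = [
    # "additional_properties" is assumed to be accepted by your library version.
    {"label": "EQUIPMENT", "description": "A physical asset", "additional_properties": True},
    {"label": "LOCATION", "description": "Where equipment is installed", "additional_properties": True},
]
relationship_types = ["LOCATED_AT"]
patterns = [("EQUIPMENT", "LOCATED_AT", "LOCATION")]

SCHEMA = {
    "node_types": node_types,
    "relationship_types": relationship_types,
    "patterns": patterns,
    "additional_node_types": False,
}


def undefined_pattern_parts(schema):
    """Return the patterns that mention a node label or relationship type
    that is not declared in the schema (hypothetical helper)."""
    labels = {n["label"] if isinstance(n, dict) else n for n in schema["node_types"]}
    rels = {r["label"] if isinstance(r, dict) else r for r in schema["relationship_types"]}
    return [
        (src, rel, dst)
        for src, rel, dst in schema["patterns"]
        if src not in labels or dst not in labels or rel not in rels
    ]
```

Running undefined_pattern_parts(SCHEMA) before building the pipeline gives an empty list when the patterns and the declared types agree.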


For the entity resolution part, we have a couple of resolvers that you can try out. SimpleKGPipeline uses the default one, which merges nodes that have the same label and exactly the same name property. Unfortunately, you cannot yet customise the resolver component of SimpleKGPipeline. Instead, you can skip resolution when you run the pipeline and run one of the more advanced resolver components afterwards:
    pipeline = SimpleKGPipeline(
        # ...
        perform_entity_resolution=False,
        # ...
    )
Then you can test the different resolvers separately once the KG is written to the database:
    # run fuzzy match for entity resolution
    # resolver = FuzzyMatchResolver(driver)
    # run semantic match for entity resolution
    resolver = SpaCySemanticMatchResolver(driver)
    res = await resolver.run()
If needed, you can also configure similarity_threshold (for the advanced resolvers), resolve_properties (the list of properties to consider for the resolution), and filter_query (to run the resolution on a specific part of the graph only).
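
To see why exact-name matching can leave isolated clusters, here is a small standalone sketch that compares exact grouping with a threshold-based fuzzy grouping. It is not how the library implements its resolvers: it uses Python's difflib purely for illustration, and the function names and sample entity names are made up.

```python
from difflib import SequenceMatcher


def exact_groups(names):
    """Group names that are character-for-character identical."""
    groups = {}
    for name in names:
        groups.setdefault(name, []).append(name)
    return list(groups.values())


def fuzzy_groups(names, similarity_threshold=0.8):
    """Greedily group names whose case-insensitive similarity to the first
    member of an existing group reaches the threshold."""
    groups = []
    for name in names:
        for group in groups:
            ratio = SequenceMatcher(None, name.lower(), group[0].lower()).ratio()
            if ratio >= similarity_threshold:
                group.append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([name])
    return groups


# Hypothetical entity names as an LLM might extract them from different chunks.
names = ["Pump P-101", "pump P-101", "Pump P101", "Valve V-7"]
print(len(exact_groups(names)))  # every spelling variant stays a separate node
print(len(fuzzy_groups(names)))  # the three pump variants collapse into one group
```

With exact matching, each spelling variant becomes its own node with its own relationships, which is one plausible cause of a fragmented graph. Lowering similarity_threshold merges more aggressively, at the risk of merging entities that are genuinely different.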
