Inspired by the many Retrieval-Augmented Generation (RAG) techniques available these days, I thought of chatting with a process P&ID using an LLM (like GPT-3.5).
Because of the limitations of vector similarity search, the usual “Chat-with-your-PDF” RAG approach (vector DB) is not robust enough to handle this use case.
Piping & Instrumentation Diagrams (P&IDs) have relationships and dependencies (between process, piping and instrumentation). Knowledge Graphs (built in a graph database like Neo4j) are arguably a better approach, as they represent data in a more structured way.
Knowledge Graph of a P&ID
Using Neo4j, I developed a Knowledge Graph of a typical Evaporator P&ID (see below).
Knowledge Graph Representation of a P&ID
In the graph above, process equipment, pumps, instrumentation and controllers are represented as nodes, while the connections between these entities are represented as relationships.
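As a rough illustration of this node/relationship modelling, the sketch below turns plain Python data into Cypher `MERGE` statements. The `LIT` and `LIC` labels, the `GIVES_INPUT` relationship and the P001 values come from the prompt and query results later in this post; the tag `LIT001`, the `Pump` label, the property names (`tag`, `max_flow_m3h`, and so on) and all helper names are assumptions made for illustration, not the schema I actually used.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # node label, e.g. "LIT" or "LIC"
    tag: str                        # unique equipment/instrument tag
    props: dict = field(default_factory=dict)


@dataclass
class Rel:
    start: str                      # tag of the source node
    rel_type: str                   # relationship type, e.g. "GIVES_INPUT"
    end: str                        # tag of the target node


def cypher_literal(value):
    """Render a string or number as a Cypher literal."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    return str(value)


def node_to_cypher(node):
    """One MERGE statement per node, with the tag stored as a property."""
    props = {"tag": node.tag, **node.props}
    body = ", ".join(f"{key}: {cypher_literal(val)}" for key, val in props.items())
    return f"MERGE (:{node.label} {{{body}}})"


def rel_to_cypher(rel):
    """Look up both end nodes by tag, then MERGE the relationship."""
    start, end = cypher_literal(rel.start), cypher_literal(rel.end)
    return (
        f"MATCH (a {{tag: {start}}}), (b {{tag: {end}}}) "
        f"MERGE (a)-[:{rel.rel_type}]->(b)"
    )


def build_statements(nodes, rels):
    return [node_to_cypher(n) for n in nodes] + [rel_to_cypher(r) for r in rels]


# Hypothetical fragment of the evaporator level-control loop.
NODES = [
    Node("LIT", "LIT001"),
    Node("LIC", "LIC001"),
    Node("Pump", "P001", {"max_flow_m3h": 1000, "pressure_bar": 10,
                          "differential_head_mwc": 90}),
]
RELS = [Rel("LIT001", "GIVES_INPUT", "LIC001")]

if __name__ == "__main__":
    for statement in build_statements(NODES, RELS):
        print(statement)
```

Building Cypher by string concatenation keeps the sketch dependency-free; against a live database, passing values as query parameters would be the safer choice.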
Querying the Graph
I used the LangChain Cypher QA chain with an OpenAI API key to explore and query the above Knowledge Graph. (The cool thing is that I didn’t need to write Cypher queries!)
My queries were simple, such as:
- What is the relation between Evaporator and Discharge Pump?
- What is the relation between LIT and LIC?
- What is the Evaporator type?
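Conceptually, a Cypher QA chain wraps one database call between two LLM calls: the question is translated into Cypher, the Cypher is run against the graph, and the returned rows are turned into a natural-language answer. The sketch below mimics that flow with plain callables. The three `fake_*` functions are hypothetical stand-ins for the LLM and Neo4j, and `cypher_qa` is not LangChain's actual implementation.

```python
def cypher_qa(question, generate_cypher, run_cypher, summarize):
    """Question -> Cypher -> rows -> answer, with each step injected."""
    cypher = generate_cypher(question)      # step 1: LLM writes the query
    rows = run_cypher(cypher)               # step 2: graph database executes it
    answer = summarize(question, rows)      # step 3: LLM phrases the answer
    return {"query": question, "cypher": cypher, "result": answer}


def fake_generate_cypher(question):
    # A real chain would prompt the LLM with the graph schema and the question.
    return "MATCH (l:LIT)-[r:GIVES_INPUT]->(i:LIC) RETURN type(r) AS relation"


def fake_run_cypher(cypher):
    # A real chain would send the statement to Neo4j and return its rows.
    return [{"relation": "GIVES_INPUT"}]


def fake_summarize(question, rows):
    # A real chain would ask the LLM to answer using only these rows.
    relations = ", ".join(str(row["relation"]) for row in rows)
    return f"Relation(s) found: {relations}"


if __name__ == "__main__":
    out = cypher_qa(
        "What is the relation between LIT and LIC?",
        fake_generate_cypher,
        fake_run_cypher,
        fake_summarize,
    )
    print(out["result"])
```

Seeing the steps separately makes it clearer where an answer can go wrong: in the generated Cypher, or in the final summarisation of the rows.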
Query Results
{'query': 'What is the relation between Vessel and P001?',
'result': 'The relation between Vessel and P001 is that P001 is a Level Control Pump with a maximum flow of 1000 m3/h, operating at a pressure of 10 bar, and having a differential head of 90 mwc.'}
{'query': 'What is the relation between LIT and LIC?',
'result': 'The Level Transmitter (LIT) provides readings of the liquid level, while the Level Controller (LIC) uses these readings to adjust its outputs accordingly. The LIT measures the level within a range of 15 meters, and the LIC operates within a range of 4-20 mA, with specific output values corresponding to different levels of liquid.'}
{'query': 'What is the relation between LY001 and LCV001?',
'result': 'LY001 converts current into pneumatic and LCV001 is a Level Control Valve. The relation between LY001 and LCV001 is that LY001 is responsible for converting current into pneumatic, which is then used by LCV001 for its operation.'}
{'query': 'What is the Evaporator type?',
'result': 'The Evaporator type is Direct.'}
Knowledge Graph Agent
I used the following Cypher generation prompt, which includes an example Cypher statement for a particular question:
# Import paths depend on the LangChain version; these follow the older single-package layout.
from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

CYPHER_GENERATION_TEMPLATE = """Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}
Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.
Examples: Here are a few examples of generated Cypher statements for particular questions:
What is the relation between LIT and LIC?
MATCH (l:LIT)-[:GIVES_INPUT]->(i:LIC)
RETURN l, i
The question is:
{question}"""

CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"], template=CYPHER_GENERATION_TEMPLATE
)

# `graph` is the Neo4j graph object holding the P&ID Knowledge Graph (created earlier, not shown).
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    verbose=True,
    cypher_prompt=CYPHER_GENERATION_PROMPT,
)
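To see what the LLM actually receives, and how more few-shot examples could be added, here is a minimal sketch that assembles a shortened version of the prompt with plain `str.format`, which uses the same `{placeholder}` syntax as the template above. The second and third examples, the schema string and the helper names are hypothetical. One practical detail it shows: literal braces in an example Cypher statement (such as a property map) have to be doubled in the template, otherwise they are read as placeholders.

```python
HEADER = (
    "Instructions:\n"
    "Use only the provided relationship types and properties in the schema.\n"
    "Schema:\n"
    "{schema}\n"
)
FOOTER = "The question is:\n{question}"


def escape_braces(text):
    """Double literal braces so str.format leaves them in the output."""
    return text.replace("{", "{{").replace("}", "}}")


def build_template(examples):
    """Embed (question, cypher) pairs between the header and the footer."""
    lines = ["Examples:"]
    for question, cypher in examples:
        lines.append(escape_braces(question))
        lines.append(escape_braces(cypher))
    return HEADER + "\n".join(lines) + "\n" + FOOTER


# First pair is from the prompt above; the other two are hypothetical.
EXAMPLES = [
    ("What is the relation between LIT and LIC?",
     "MATCH (l:LIT)-[:GIVES_INPUT]->(i:LIC)\nRETURN l, i"),
    ("What is the Evaporator type?",
     "MATCH (e:Evaporator)\nRETURN e.type"),
    ("What are the properties of P001?",
     "MATCH (p {tag: 'P001'}) RETURN p"),
]

if __name__ == "__main__":
    template = build_template(EXAMPLES)
    rendered = template.format(
        schema="(:LIT)-[:GIVES_INPUT]->(:LIC)",   # hypothetical schema text
        question="What does P001 do?",
    )
    print(rendered)
```

The resulting template string still contains only the `{schema}` and `{question}` placeholders, so it could be passed to `PromptTemplate` in place of `CYPHER_GENERATION_TEMPLATE`.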
Knowledge Graph Agent Query Results
The following are a few of the prompts given to the agent, with results. The results were more or less the same as before. (I think I need to learn good prompt engineering.)
{'query': 'What is the relation between LY and LCV?',
'result': 'LY converts current into pneumatic and is responsible for controlling the position, while LCV is a level control valve that regulates the output percentage based on the current received from LT.'}
{'query': 'Explain what job LIC001 is performing?',
'result': 'LIC001 is performing the job of a Level Controller. Its role is to control the level of a certain parameter, and it operates within a range of 4-20 mA. It has different output values depending on the level, such as 4 mA when the level is at the low alarm level (LAL), 8 mA when the level is at 2.5 m, 12 mA when the level is at 5 m, 16 mA when the level is at 7.5 m, and 20 mA when the level is greater than 8 m.'}
Conclusion
- The above only scratches the surface! My queries to the model were very simple and do not demonstrate the real power of Knowledge Graphs combined with LLMs. However, imagine a similar chatbot (with process, piping and instrumentation data carefully captured in node labels, relationship types and properties) made available to participants during safety workshops (HAZOP, SIL, LOPA etc.) and design reviews. Participants could ask it questions such as what happens if a pump fails, or how many PID controllers there are in the system and what their set points are, and the chatbot could respond with insight into the document being reviewed.
- I didn’t include the results where the model hallucinated. I feel model hallucination can be tackled by employing few-shot learning techniques with the LLM. (Work in progress!)
- There could be other, better techniques for this use case, such as OCR or RAG over the PDF file. However, the difficulty of handling tabular data and the inability to read relations between entities in a PDF may not give the desired results (my opinion only).
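The “what if” style of question mentioned in the first point is essentially a graph traversal: start at the failed item and follow relationships downstream. Below is a minimal in-memory sketch of that idea using a breadth-first search. The loop order (transmitter, controller, converter, valve) follows the query results above, but the tag `LIT001` and the relationship names other than `GIVES_INPUT` are assumptions, and a real implementation would run an equivalent variable-length path query in Neo4j instead.

```python
from collections import deque

# (source tag, relationship type, target tag); partly hypothetical, see above.
EDGES = [
    ("LIT001", "GIVES_INPUT", "LIC001"),
    ("LIC001", "SENDS_SIGNAL_TO", "LY001"),
    ("LY001", "ACTUATES", "LCV001"),
]


def downstream(edges, start):
    """Return every tag reachable from `start`, in breadth-first order."""
    adjacency = {}
    for src, _rel_type, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in adjacency.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    # Which items are affected if the level transmitter fails?
    print(downstream(EDGES, "LIT001"))
```

This kind of multi-hop dependency is what a plain vector similarity search over PDF text cannot follow, which is the main argument for the graph approach here.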