Hi @Ari_Neo4j & team, here’s my submission.
PagerDruid — Blameless On-Call Graph Agent for Microservices (Neo4j Aura Agent)
PagerDruid is a sarcastic-but-blameless on-call assistant that diagnoses incidents using graph reasoning over a microservices dependency graph and evidence retrieval from runbooks + incident history.
Data note: The schema is purpose-built and the dataset is synthetic, created to safely demonstrate realistic on-call triage patterns (services, dependencies, ownership, runbooks, incidents).
It answers questions like:
- “Checkout is timing out with 502s — what should I check first and who should I page?”
- “If redis-cache is unhealthy, what’s the blast radius within 2 hops?”
- “What’s the shortest dependency path from checkout to core-db?”
What it does
PagerDruid combines:
- Keyword retrieval (RAG without embeddings)
  - Full-text search over runbooks, incidents, and services
- Graph reasoning
  - Dependency paths (`DEPENDS_ON`)
  - Blast radius within N hops (upstream + downstream)
  - Ownership routing (`OWNED_BY`)
- Actionability first
  - “First checks” from runbooks
  - “Who to page” with on-call contacts
  - Similar historical incidents + what resolved them
  - Evidence list (service names, runbook IDs, incident IDs)
Graph schema (core model)
Nodes
- `Service(name, tier, description)`
- `Team(name, timezone, contact)`
- `Runbook(id, title, tags, steps, serviceName)`
- `Incident(id, time, severity, summary, resolution)`
Relationships
- `(Service)-[:DEPENDS_ON]->(Service)`
- `(Service)-[:OWNED_BY]->(Team)`
- `(Runbook)-[:FOR_SERVICE]->(Service)`
- `(Incident)-[:AFFECTED]->(Service)`
- `(Incident)-[:RESOLVED_BY]->(Runbook)`
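To make the model concrete, here is a minimal Cypher sketch of the schema with a tiny sample. It is not the actual import used for the submission. Only the labels, property keys, relationship types, and the service names `checkout` and `redis-cache` come from the post. The constraint names, the team, the runbook and incident IDs, and every property value are hypothetical, and `tags` and `steps` are assumed to be plain strings.

```cypher
// Assumed uniqueness constraints (constraint names are hypothetical).
CREATE CONSTRAINT service_name IF NOT EXISTS
FOR (s:Service) REQUIRE s.name IS UNIQUE;

CREATE CONSTRAINT runbook_id IF NOT EXISTS
FOR (r:Runbook) REQUIRE r.id IS UNIQUE;

CREATE CONSTRAINT incident_id IF NOT EXISTS
FOR (i:Incident) REQUIRE i.id IS UNIQUE;

// Tiny synthetic sample; every value below is illustrative only.
MERGE (checkout:Service {name: "checkout"})
  SET checkout.tier = 1, checkout.description = "Order checkout API"
MERGE (cache:Service {name: "redis-cache"})
  SET cache.tier = 2, cache.description = "Shared cache"
MERGE (team:Team {name: "example-team"})
  SET team.timezone = "UTC", team.contact = "oncall@example.com"
MERGE (rb:Runbook {id: "RB-001"})
  SET rb.title = "Checkout 502 triage",
      rb.tags = "checkout 502 timeout",
      rb.steps = "1. Check cache latency. 2. Check upstream error rate.",
      rb.serviceName = "checkout"
MERGE (inc:Incident {id: "INC-001"})
  SET inc.time = datetime("2024-01-01T00:00:00Z"),
      inc.severity = "SEV2",
      inc.summary = "Checkout timeouts",
      inc.resolution = "Restarted cache nodes"
// Wire up one relationship of each type from the model.
MERGE (checkout)-[:DEPENDS_ON]->(cache)
MERGE (checkout)-[:OWNED_BY]->(team)
MERGE (rb)-[:FOR_SERVICE]->(checkout)
MERGE (inc)-[:AFFECTED]->(checkout)
MERGE (inc)-[:RESOLVED_BY]->(rb);
```

Using `MERGE` keeps the script safe to re-run against the same database without creating duplicates.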
Screenshot 1 — Schema / model diagram
Description: a screenshot from Data Importer model view (preferred) or a clear schema graph showing labels + relationships.
Agent tools (deterministic + safe)
PagerDruid uses Cypher Template tools (no hallucinated facts) to retrieve and reason deterministically:
Retrieval (full-text)
- `SearchServices` → searches `service_ft`
- `SearchRunbooks` → searches `runbook_ft`
- `SearchIncidents` → searches `incident_ft`
Graph reasoning
- `ImpactRadius` → upstream + downstream within 2 hops (+ owners)
- `ShortestDependencyPath` → path between services (+ owners per node)
- `OwnersToPage` → maps service(s) to owning team + contact
- `RecentIncidentsForService` → last incidents for a service
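As an illustration of the graph-reasoning tools, a template in the spirit of `ImpactRadius` might look roughly like the sketch below. The parameter name `$service` and the `direction` labels are assumptions, not the tool's actual definition. Which side counts as "upstream" or "downstream" is not stated in the post, so the sketch labels rows by dependency direction instead.

```cypher
// Services that depend on the target within 2 hops (likely to be affected if it is unhealthy).
MATCH (s:Service {name: $service})<-[:DEPENDS_ON*1..2]-(other:Service)
WHERE other <> s
OPTIONAL MATCH (other)-[:OWNED_BY]->(t:Team)
RETURN "dependent" AS direction, other.name AS service,
       t.name AS team, t.contact AS contact
UNION
// Services the target itself depends on within 2 hops (possible root causes).
MATCH (s:Service {name: $service})-[:DEPENDS_ON*1..2]->(other:Service)
WHERE other <> s
OPTIONAL MATCH (other)-[:OWNED_BY]->(t:Team)
RETURN "dependency" AS direction, other.name AS service,
       t.name AS team, t.contact AS contact;
```

`UNION` removes duplicate rows, so a service reachable by more than one path appears once per direction.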
Detail extraction
- `RunbookSteps` → returns runbook steps for the top 1–2 matches
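A template along the lines of `RunbookSteps` could be sketched as follows. This assumes a `$query` parameter holding the user's keywords and is not the tool's actual definition. It relies on the built-in `db.index.fulltext.queryNodes` procedure and the `runbook_ft` index named above.

```cypher
// $query is an assumed parameter, e.g. "checkout timeout 502".
// The query string uses Lucene syntax, so special characters may need escaping.
CALL db.index.fulltext.queryNodes("runbook_ft", $query)
YIELD node AS rb, score
RETURN rb.id AS runbookId, rb.title AS title, rb.steps AS steps, score
ORDER BY score DESC
LIMIT 2;
```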
Screenshot 2 — Tools configured in Aura Agent
Desc: screenshot from the Aura Agent builder showing the tool list (names visible).
Retrieval layer (no embeddings)
Embeddings were intentionally skipped to reduce setup friction and keep the solution highly reproducible.
Instead, PagerDruid uses Neo4j full-text indexes:
- `runbook_ft` over `Runbook.title`, `Runbook.tags`, `Runbook.steps`
- `incident_ft` over `Incident.summary`, `Incident.resolution`
- `service_ft` over `Service.name`, `Service.description`
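For anyone reproducing the setup, the three indexes listed above could be created roughly like this. The index names and properties are taken from the list. The `IF NOT EXISTS` guard and the final verification statement are additions, and the sketch assumes Neo4j 5 syntax.

```cypher
// Full-text index over runbook title, tags, and steps.
CREATE FULLTEXT INDEX runbook_ft IF NOT EXISTS
FOR (r:Runbook) ON EACH [r.title, r.tags, r.steps];

// Full-text index over incident summary and resolution.
CREATE FULLTEXT INDEX incident_ft IF NOT EXISTS
FOR (i:Incident) ON EACH [i.summary, i.resolution];

// Full-text index over service name and description.
CREATE FULLTEXT INDEX service_ft IF NOT EXISTS
FOR (s:Service) ON EACH [s.name, s.description];

// List the full-text indexes to confirm they exist.
SHOW FULLTEXT INDEXES;
```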
Screenshot 3 — Indexes / constraints proof
Desc: output of SHOW INDEXES; showing runbook_ft, incident_ft, service_ft (and any uniqueness constraints).
Graph proof (visual reasoning)
PagerDruid’s “graph brain” is visible and testable through direct graph queries:
1) Shortest path (visual)
Example shortest path query (rendered as a graph):
```cypher
MATCH (a:Service {name:"checkout"})
MATCH (b:Service {name:"payment"})
MATCH p = shortestPath((a)-[:DEPENDS_ON*..10]->(b))
RETURN p;
```
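The `ShortestDependencyPath` tool is described as also returning owners per node. A parameterized variant of the query above might look like the sketch below. The parameter names `$source` and `$target` are assumptions, and this is not the tool's actual definition.

```cypher
// $source and $target are assumed parameter names for the two service names.
MATCH (a:Service {name: $source}), (b:Service {name: $target})
MATCH p = shortestPath((a)-[:DEPENDS_ON*..10]->(b))
// Walk the path node by node so each hop can be joined to its owning team.
UNWIND range(0, length(p)) AS idx
WITH idx, nodes(p)[idx] AS svc
OPTIONAL MATCH (svc)-[:OWNED_BY]->(t:Team)
RETURN idx AS hop, svc.name AS service, t.name AS team, t.contact AS contact
ORDER BY hop;
```

`OPTIONAL MATCH` keeps services without an owner in the result, with `null` for team and contact.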
Screenshot 4 — Shortest dependency path (Graph view)
Desc: screenshot of the Graph view showing the path nodes and DEPENDS_ON arrows.
Demo
Prompt
Checkout is timing out with 502s — what should I check first and who should I page?
What PagerDruid returns (always)
- Most likely culprit services (with reasons)
- Blast radius (upstream + downstream within 2 hops)
- Dependency chain (shortest path to top suspect)
- First 5 checks (from runbooks; each ends with `[RB-xxx]`)
- Who to page (service → team → contact)
- Similar historical incidents + what fixed them
- Evidence used (IDs + service names)
Screenshot 6 — Agent response (with evidence)
Desc: screenshot of the agent response showing: culprits + blast radius + runbook checks + paging + incidents + evidence list.
Toolchain used in this demo:
SearchRunbooks + SearchIncidents + ImpactRadius + OwnersToPage + RunbookSteps (+ ShortestDependencyPath when needed)
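The "similar historical incidents + what fixed them" part of the response could be backed by a query like this sketch, in the spirit of `RecentIncidentsForService`. The `$service` parameter name and the limit of 5 are assumptions.

```cypher
// $service is an assumed parameter name; LIMIT 5 is an arbitrary choice.
MATCH (i:Incident)-[:AFFECTED]->(s:Service {name: $service})
// Include the runbook that resolved the incident, when one is linked.
OPTIONAL MATCH (i)-[:RESOLVED_BY]->(rb:Runbook)
RETURN i.id AS incidentId, i.time AS occurredAt, i.severity AS severity,
       i.summary AS summary, i.resolution AS resolution, rb.id AS runbookId
ORDER BY occurredAt DESC
LIMIT 5;
```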
Why this is useful (and fun)
PagerDruid turns a dependency graph into a decision engine:
- Faster triage: “Which team do I page?”
- Safer diagnosis: answers are evidence-backed (runbooks + incidents)
- Better communication: blast radius + dependency path makes impact obvious
And yes, it stays blameless. The systems misbehaved. Humans are innocent.