🏆 Build a Tiny Aura Agent Community Challenge

Hi @Ari_Neo4j & team, here’s my submission :slight_smile:
PagerDruid — Blameless On-Call Graph Agent for Microservices (Neo4j Aura Agent)

PagerDruid is a sarcastic-but-blameless on-call assistant that diagnoses incidents using graph reasoning over a microservices dependency graph and evidence retrieval from runbooks + incident history.

:white_check_mark: Data note: The schema is purpose-built and the dataset is synthetic, created to safely demonstrate realistic on-call triage patterns (services, dependencies, ownership, runbooks, incidents).

It answers questions like:

  • “Checkout is timing out with 502s — what should I check first and who should I page?”
  • “If redis-cache is unhealthy, what’s the blast radius within 2 hops?”
  • “What’s the shortest dependency path from checkout to core-db?”

:brain: What it does

PagerDruid combines:

  1. Keyword retrieval (RAG without embeddings)
    • Full-text search over runbooks, incidents, and services
  2. Graph reasoning
    • Dependency paths (DEPENDS_ON)
    • Blast radius within N hops (upstream + downstream)
    • Ownership routing (OWNED_BY)
  3. Actionability first
    • “First checks” from runbooks
    • “Who to page” with on-call contacts
    • Similar historical incidents + what resolved them
    • Evidence list (service names, runbook IDs, incident IDs)

:spider_web: Graph schema (core model)

Nodes

  • Service(name, tier, description)
  • Team(name, timezone, contact)
  • Runbook(id, title, tags, steps, serviceName)
  • Incident(id, time, severity, summary, resolution)

Relationships

  • (Service)-[:DEPENDS_ON]->(Service)
  • (Service)-[:OWNED_BY]->(Team)
  • (Runbook)-[:FOR_SERVICE]->(Service)
  • (Incident)-[:AFFECTED]->(Service)
  • (Incident)-[:RESOLVED_BY]->(Runbook)
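
For anyone reproducing the model, here is a minimal sketch of constraints and seed data. Only the labels, properties, and relationship types come from the schema above. The constraint names, the team name, the contact value, and the specific dependency edges are illustrative assumptions, not the actual synthetic dataset.

```cypher
// Assumed uniqueness constraints on the natural keys (constraint names are hypothetical).
CREATE CONSTRAINT service_name IF NOT EXISTS
FOR (s:Service) REQUIRE s.name IS UNIQUE;

CREATE CONSTRAINT runbook_id IF NOT EXISTS
FOR (r:Runbook) REQUIRE r.id IS UNIQUE;

CREATE CONSTRAINT incident_id IF NOT EXISTS
FOR (i:Incident) REQUIRE i.id IS UNIQUE;

// Illustrative seed data: the service names appear in the example questions,
// but the edges, team name, and contact below are made up for this sketch.
MERGE (checkout:Service {name: "checkout"})
MERGE (cache:Service {name: "redis-cache"})
MERGE (db:Service {name: "core-db"})
MERGE (team:Team {name: "payments-platform"})
  ON CREATE SET team.contact = "#payments-oncall"
MERGE (checkout)-[:DEPENDS_ON]->(cache)
MERGE (checkout)-[:DEPENDS_ON]->(db)
MERGE (checkout)-[:OWNED_BY]->(team);
```

Using MERGE rather than CREATE keeps the script safe to re-run while iterating on the dataset.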

:camera: Screenshot 1 — Schema / model diagram

Description: a screenshot from the Data Importer model view (preferred) or a clear schema graph showing labels and relationships.


:wrench: Agent tools (deterministic + safe)

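PagerDruid uses Cypher Template tools: each tool runs a fixed, parameterized Cypher query, so the facts in a response come from the graph rather than from model-generated queries (no hallucinated facts):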

Retrieval (full-text)

  • SearchServices → searches service_ft
  • SearchRunbooks → searches runbook_ft
  • SearchIncidents → searches incident_ft
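
As a rough illustration of what a retrieval tool like SearchIncidents could run, here is a sketch built on Neo4j's built-in full-text procedure. The parameter name `$q`, the returned columns, and the limit of 5 are assumptions. The actual template is not shown in this post.

```cypher
// Full-text lookup against the incident_ft index.
// $q is a hypothetical parameter holding a Lucene query string, e.g. "checkout AND 502".
CALL db.index.fulltext.queryNodes("incident_ft", $q)
YIELD node AS incident, score
RETURN incident.id         AS incidentId,
       incident.severity   AS severity,
       incident.summary    AS summary,
       incident.resolution AS resolution,
       score
ORDER BY score DESC
LIMIT 5;
```

SearchServices and SearchRunbooks would presumably follow the same shape with service_ft and runbook_ft swapped in.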

Graph reasoning

  • ImpactRadius → upstream + downstream within 2 hops (+ owners)
  • ShortestDependencyPath → path between services (+ owners per node)
  • OwnersToPage → maps service(s) to owning team + contact
  • RecentIncidentsForService → most recent incidents for a service
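
To make the graph-reasoning tools concrete, here is a sketch of how ImpactRadius and OwnersToPage might be written as templates. The parameter names `$serviceName` and `$serviceNames`, the `direction` labels, and the column names are assumptions for illustration only.

```cypher
// ImpactRadius (sketch): services within 2 DEPENDS_ON hops in either direction, plus owners.
// "depends_on" = things the service relies on; "depended_on_by" = things that rely on it.
MATCH (s:Service {name: $serviceName})
MATCH p = (s)-[:DEPENDS_ON*1..2]->(dep:Service)
WHERE dep <> s
OPTIONAL MATCH (dep)-[:OWNED_BY]->(t:Team)
RETURN "depends_on" AS direction, dep.name AS service,
       min(length(p)) AS hops, t.name AS team, t.contact AS contact
UNION
MATCH (s:Service {name: $serviceName})
MATCH p = (caller:Service)-[:DEPENDS_ON*1..2]->(s)
WHERE caller <> s
OPTIONAL MATCH (caller)-[:OWNED_BY]->(t:Team)
RETURN "depended_on_by" AS direction, caller.name AS service,
       min(length(p)) AS hops, t.name AS team, t.contact AS contact;
```

```cypher
// OwnersToPage (sketch): map a list of service names to owning team and contact.
MATCH (s:Service)-[:OWNED_BY]->(t:Team)
WHERE s.name IN $serviceNames
RETURN s.name AS service, t.name AS team,
       t.timezone AS timezone, t.contact AS contact
ORDER BY service;
```

Taking `min(length(p))` reports each affected service once per owner at its closest hop distance, even when several paths reach it.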

Detail extraction

  • RunbookSteps → returns runbook steps for top 1–2 matches
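
One possible shape for RunbookSteps, assuming it reuses the runbook_ft index and simply caps the result at two matches. The parameter `$q` and the column names are hypothetical.

```cypher
// RunbookSteps (sketch): return the steps of the top 2 runbooks matching the query.
CALL db.index.fulltext.queryNodes("runbook_ft", $q)
YIELD node AS rb, score
RETURN rb.id          AS runbookId,
       rb.title       AS title,
       rb.serviceName AS serviceName,
       rb.steps       AS steps,
       score
ORDER BY score DESC
LIMIT 2;
```

Returning the runbook ID alongside the steps is what lets the agent cite each check back to its source runbook.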

:camera: Screenshot 2 — Tools configured in Aura Agent

Description: screenshot from the Aura Agent builder showing the tool list (names visible).


:magnifying_glass_tilted_left: Retrieval layer (no embeddings)

Embeddings were intentionally skipped to reduce setup friction and keep the solution highly reproducible.

Instead, PagerDruid uses Neo4j full-text indexes:

  • runbook_ft over Runbook.title, Runbook.tags, Runbook.steps
  • incident_ft over Incident.summary, Incident.resolution
  • service_ft over Service.name, Service.description
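
The three indexes listed above could be created along these lines. The index names and properties come from the list. The `IF NOT EXISTS` guards and the choice to run them as separate statements are assumptions about setup.

```cypher
// Full-text index over runbook title, tags, and steps.
CREATE FULLTEXT INDEX runbook_ft IF NOT EXISTS
FOR (r:Runbook) ON EACH [r.title, r.tags, r.steps];

// Full-text index over incident summary and resolution.
CREATE FULLTEXT INDEX incident_ft IF NOT EXISTS
FOR (i:Incident) ON EACH [i.summary, i.resolution];

// Full-text index over service name and description.
CREATE FULLTEXT INDEX service_ft IF NOT EXISTS
FOR (s:Service) ON EACH [s.name, s.description];
```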

:camera: Screenshot 3 — Indexes / constraints proof

Description: output of SHOW INDEXES showing runbook_ft, incident_ft, and service_ft (plus any uniqueness constraints).
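
To narrow that output to just the full-text indexes, a filtered variant such as the following should work on Neo4j 5. The selection of columns is a suggestion, not what the screenshot shows.

```cypher
// List only full-text indexes with their state and indexed properties.
SHOW FULLTEXT INDEXES
YIELD name, state, labelsOrTypes, properties
RETURN name, state, labelsOrTypes, properties
ORDER BY name;
```

An index is ready to query once its `state` column reads ONLINE.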

:receipt: Graph proof (visual reasoning)

PagerDruid’s “graph brain” is visible and testable through direct graph queries:

1) Shortest path (visual)

Example shortest path query (rendered as a graph):

MATCH (a:Service {name:"checkout"})
MATCH (b:Service {name:"payment"})
MATCH p = shortestPath((a)-[:DEPENDS_ON*..10]->(b))
RETURN p;
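
The ShortestDependencyPath tool is described as returning owners per node. Here is a sketch of how the query above might be extended to do that in tabular form, assuming hypothetical parameters `$fromService` and `$toService`:

```cypher
// Shortest dependency path with the owning team for each service along the way.
MATCH (a:Service {name: $fromService})
MATCH (b:Service {name: $toService})
MATCH p = shortestPath((a)-[:DEPENDS_ON*..10]->(b))
// Walk the path in order so each row carries its position (hop 0 is the start).
UNWIND range(0, size(nodes(p)) - 1) AS i
WITH i, nodes(p)[i] AS svc
OPTIONAL MATCH (svc)-[:OWNED_BY]->(t:Team)
RETURN i AS hop, svc.name AS service, t.name AS team, t.contact AS contact
ORDER BY hop;
```

The OPTIONAL MATCH keeps services on the path that have no owner recorded, which is itself a useful finding during triage.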

:camera: Screenshot 4 — Shortest dependency path (Graph view)

Description: screenshot of the Graph view showing the path nodes and DEPENDS_ON arrows.

:clapper_board: Demo

Prompt

Checkout is timing out with 502s — what should I check first and who should I page?

What PagerDruid returns (always)

  • Most likely culprit services (with reasons)
  • Blast radius (upstream + downstream within 2 hops)
  • Dependency chain (shortest path to top suspect)
  • First 5 checks (from runbooks; each ends with [RB-xxx])
  • Who to page (service → team → contact)
  • Similar historical incidents + what fixed them
  • Evidence used (IDs + service names)

:camera: Screenshot 6 — Agent response (with evidence)

Description: screenshot of the agent response showing culprits, blast radius, runbook checks, paging, incidents, and the evidence list.

Toolchain used in this demo:
SearchRunbooks + SearchIncidents + ImpactRadius + OwnersToPage + RunbookSteps (+ ShortestDependencyPath when needed)


Why this is useful (and fun)

PagerDruid turns a dependency graph into a decision engine:

  • Faster triage: “Which team do I page?”
  • Safer diagnosis: answers are evidence-backed (runbooks + incidents)
  • Better communication: blast radius + dependency path makes impact obvious

And yes — it stays blameless. The systems misbehaved. Humans are innocent. :grinning_face_with_smiling_eyes:
