πŸ† Start Here: Register & Get Aura Credits: Aura Agent Hackathon

Earnings Call Graph Analyst

Agent name

Earnings Call Graph Analyst
GitHub repository: mu-jeong/earnings-call-graph on GitHub

What it does

Earnings Call Graph Analyst is a source-grounded financial and industry analysis agent for public earnings-call materials. It converts earnings-call transcripts and company-published earnings materials into a Neo4j graph of companies, source documents, chunks, entities, relation facts, and ontology concepts.

The current demo is intentionally specialized for AI infrastructure research. It helps users compare how companies describe demand, product exposure, revenue momentum, storage/networking/compute capacity, and related business outcomes across the loaded FY2026 Q2 graph.

Instead of only retrieving transcript snippets, the agent answers with graph reasoning paths such as:


AI Demand --DRIVES--> Cloud growth

Blackwell Ultra --DRIVES--> AI Demand

AI Demand --DRIVES--> Custom silicon

AI Infrastructure Solutions --DRIVES--> Product Revenue

These paths make the answer explainable: the user can see which source entity, relationship, and target entity support each conclusion, then inspect the supporting chunk or source document.

The agent is designed for analyst-style questions such as:

  • Which companies are currently loaded in the graph?

  • What positive AI infrastructure demand signals are companies reporting?

  • Which company is directly connected to a specific AI infrastructure product category?

  • What does the graph show about NVIDIA, Blackwell, and data center revenue growth?

  • What are the most connected entities, relation paths, and source-backed evidence chunks in the graph?

Current dataset and demo scope

The current demo focuses on a curated FY2026 Q2 set of AI infrastructure-related earnings-call materials from:

  • Cisco (CSCO)

  • Marvell Technology (MRVL)

  • Microsoft (MSFT)

  • NetApp (NTAP)

  • NVIDIA (NVDA)

  • Seagate Technology (STX)

  • Super Micro Computer (SMCI)

This is a demo dataset, not a hard-coded product limit. The app reads companies from source manifests and from loaded Neo4j Company nodes, so additional companies can be added by providing new Markdown files or private source-manifest entries and regenerating/loading the graph.

Why a graph fits

Earnings calls are narrative-heavy and inconsistent across companies. The same industry theme can appear under different company-specific language: Blackwell, custom silicon, Microsoft Cloud, data-center switching, enterprise storage, AI servers, cloud AI growth, or AI infrastructure revenue.

The graph preserves those company-specific terms as entities while connecting them to structured relation facts and ontology concepts. This lets the agent compare companies without flattening away the original evidence.

A simplified graph pattern is:


Company -> EarningsCall -> FiscalPeriod

SourceDocument -> EarningsCall

SourceDocument -> MarkdownChunk

RelationFact -> SUPPORTED_BY -> MarkdownChunk

RelationFact -> FROM_ENTITY -> Entity

RelationFact -> TO_ENTITY -> Entity

Entity -> OntologyConcept

This structure lets the agent explain why something is true, not just return a keyword match.
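The sketch below makes the pattern concrete. It shows, in memory, how one RelationFact could be turned into the "Source --RELATION--> Target" strings shown earlier while keeping a pointer to its supporting chunk. The class and field names (Entity.concept, RelationFact.supported_by, render_path, concept_view) are hypothetical stand-ins for the node labels above. They are not the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    concept: str  # hypothetical field: the OntologyConcept this entity maps to


@dataclass(frozen=True)
class MarkdownChunk:
    chunk_id: str
    text: str


@dataclass(frozen=True)
class RelationFact:
    source: Entity               # RelationFact -> FROM_ENTITY -> Entity
    relation: str                # relation type, e.g. "DRIVES"
    target: Entity               # RelationFact -> TO_ENTITY -> Entity
    supported_by: MarkdownChunk  # RelationFact -> SUPPORTED_BY -> MarkdownChunk


def render_path(fact: RelationFact) -> str:
    """Format a fact as 'Source --REL--> Target [chunk id]' so every path cites its evidence."""
    return (
        f"{fact.source.name} --{fact.relation}--> {fact.target.name} "
        f"[{fact.supported_by.chunk_id}]"
    )


def concept_view(fact: RelationFact) -> str:
    """The same path lifted to ontology concepts, for cross-company comparison."""
    return f"{fact.source.concept} --{fact.relation}--> {fact.target.concept}"
```

Take a fact linking Blackwell Ultra to AI Demand through a chunk with the illustrative id example-chunk-001. It renders as "Blackwell Ultra --DRIVES--> AI Demand [example-chunk-001]".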

Document parsing and chunking

Source materials can be provided in two ways:

  1. Place normalized local Markdown transcripts under data/source_cache/markdown/.

  2. Provide official source URLs in a private manifest and let the pipeline materialize the source text.

The source materializer supports PDF, DOCX, and HTML earnings-call materials. It downloads official source material, extracts text, normalizes paragraphs, and writes cached Markdown for repeatable local ingest.

During ingest, each Markdown document is split into paragraph-scoped MarkdownChunk nodes. The parser tracks headings and speaker-style lines, removes page markers and source boilerplate, and groups non-empty lines into paragraphs. Long paragraphs are split at sentence boundaries with a target maximum of 900 characters, so each chunk remains small enough for source-grounded extraction while preserving local context.
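The paragraph and sentence splitting described above fits in a few lines of Python. This is an illustration under assumptions. The page-marker regex and the naive sentence-boundary rule are placeholders, and heading and speaker tracking are omitted. Only the 900-character target comes from the write-up. A single sentence longer than the target is kept whole rather than cut mid-sentence.

```python
import re

MAX_CHARS = 900  # target maximum chunk length from the write-up

# Placeholder patterns; the project's real boilerplate and sentence rules are not published.
PAGE_MARKER = re.compile(r"^\s*(page \d+( of \d+)?|-\s*\d+\s*-)\s*$", re.IGNORECASE)
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def paragraphs(markdown: str) -> list[str]:
    """Group non-empty lines into paragraphs, dropping page-marker lines."""
    result, current = [], []
    for line in markdown.splitlines():
        if PAGE_MARKER.match(line):
            continue  # boilerplate: drop the line but keep the paragraph open
        if not line.strip():
            if current:
                result.append(" ".join(current))
                current = []
            continue
        current.append(line.strip())
    if current:
        result.append(" ".join(current))
    return result


def split_long(paragraph: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Pack whole sentences into pieces of at most max_chars where possible."""
    if len(paragraph) <= max_chars:
        return [paragraph]
    pieces, current = [], ""
    for sentence in SENTENCE_BREAK.split(paragraph):
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            pieces.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_markdown(markdown: str) -> list[str]:
    """Paragraph-scoped chunks, with long paragraphs split at sentence boundaries."""
    return [piece for para in paragraphs(markdown) for piece in split_long(para)]
```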

Each chunk receives a stable document-scoped id such as company-quarter-document-chunk-001, along with metadata including heading, chunk type, source line range, text hash, and document id. The chunk type can distinguish prepared remarks, Q&A sections, analyst questions, management answers, document overviews, and general speaker statements when the source exposes that structure.
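The id depends only on the document and the chunk's position, and the hash depends only on its text. Re-running ingest therefore reproduces the same ids, and a changed hash flags a chunk whose text changed. This is a hypothetical sketch. The lowercasing, the zero-padded three-digit index, and SHA-256 are assumptions that go beyond the example id shape given above.

```python
import hashlib


def chunk_id(company: str, quarter: str, document: str, index: int) -> str:
    """Stable, document-scoped id in the company-quarter-document-chunk-NNN shape."""
    return f"{company}-{quarter}-{document}-chunk-{index:03d}".lower()


def chunk_metadata(company: str, quarter: str, document: str, index: int,
                   text: str, heading: str, chunk_type: str,
                   start_line: int, end_line: int) -> dict:
    """Metadata fields listed in the write-up; SHA-256 is an assumed hash choice."""
    return {
        "id": chunk_id(company, quarter, document, index),
        "document_id": f"{company}-{quarter}-{document}".lower(),
        "heading": heading,
        "chunk_type": chunk_type,
        "source_lines": (start_line, end_line),
        "text_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
```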

Graph extraction is chunk-scoped. The pipeline first builds a document-level ontology or canonical vocabulary from the full Markdown document, then asks Gemini to extract entities and relations from each paragraph chunk independently, optionally in batches. Relations are accepted only when supported by the current chunk, and each RelationFact is linked back to its supporting MarkdownChunk. This is what lets the agent show graph paths and source evidence instead of unsupported transcript summaries.
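One simple way to enforce the "supported by the current chunk" rule is to make each extracted relation quote its evidence. The relation is then kept only if that quote actually appears in the chunk. The sketch below assumes the model returns dicts with an evidence field. The project's real acceptance check is not described in detail and may differ.

```python
import re


def _normalize(text: str) -> str:
    """Lower-case and collapse whitespace so line wrapping does not break matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def accept_relations(chunk_id: str, chunk_text: str, candidates: list[dict]) -> list[dict]:
    """Keep candidate relations whose evidence quote occurs in this chunk, tagged with the chunk id."""
    haystack = _normalize(chunk_text)
    accepted = []
    for relation in candidates:
        quote = _normalize(relation.get("evidence", ""))
        if quote and quote in haystack:
            accepted.append({**relation, "supported_by": chunk_id})
    return accepted
```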

Streamlit application

The web application is implemented with Streamlit and has four main tabs:

  1. Graph - Main interactive graph exploration surface for the loaded earnings-call corpus. Users can switch between company-level entity paths and ontology-grouped concept views, filter by search term, company, ontology concept, and minimum node connection count, and inspect visible source-backed relation paths. The tab also provides an LLM-generated graph overview that summarizes key points and company differences from connected referenced chunks.

  2. Ask - Deterministic graph question answering. Ask matches the question against entity names, relation types, evidence snippets, entity properties, and ontology concepts, then builds an answer from matched RelationFact -> MarkdownChunk evidence. The "LLM summary from referenced chunks" action appears directly under the Question input; when run, the matched referenced chunks become the LLM input for a source-grounded synthesis. Matched relations are shown as cards, with Support / upside and Risk / pressure separated into two columns, and ontology mappings shown on their own Ontology: line inside each card.

  3. Ask (Aura) - Local tester for a Neo4j Aura Agent-like graph-tool workflow. Neo4j Aura supports agent-style graph tools; this project recreates a similar router -> tool execution -> answer flow in Streamlit with a LangGraph-style orchestration pattern. The user enters only a question, the router selects the tool, and the answer emphasizes Graph reasoning path and Referenced chunk instead of separate Evidence, Positive signal, or Why it matters columns.

  4. Key Nodes - High-signal entity exploration. Key Nodes ranks important entities from the full loaded graph and shows each selected node's source-backed relation table plus referenced transcript chunks. It also provides an LLM summary action that uses those connected referenced chunks as input, so users can get a synthesized explanation of the selected node's evidence without writing Cypher.
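The sketch below illustrates the deterministic Ask flow. It ranks relation facts by word overlap with the question, then splits the matches into the two card columns. Everything here is an assumption: the stop-word list, the overlap scoring, and especially the rule that certain relation types (CONSTRAINS, LIMITS, and so on) count as risk. The app's actual matching logic is not published.

```python
import re

# Assumed stop words and relation buckets; the app's actual lists are not published.
STOP_WORDS = {"what", "which", "the", "are", "is", "a", "an", "of", "for", "about", "does", "show"}
RISK_RELATIONS = {"CONSTRAINS", "LIMITS", "PRESSURES", "THREATENS"}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def match_facts(question: str, facts: list[dict]) -> list[dict]:
    """Rank facts by word overlap between the question and the fact's searchable fields."""
    wanted = tokens(question)
    scored = []
    for fact in facts:
        fields = " ".join(fact.get(key, "") for key in ("source", "relation", "target", "evidence", "concept"))
        overlap = len(wanted & tokens(fields))
        if overlap:
            scored.append((overlap, fact))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep input order
    return [fact for _, fact in scored]


def split_columns(facts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split matched facts into Support / upside and Risk / pressure columns."""
    support = [f for f in facts if f["relation"] not in RISK_RELATIONS]
    risk = [f for f in facts if f["relation"] in RISK_RELATIONS]
    return support, risk
```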

Agent tools

The project now supports an Aura-style tool workflow in two places:

  • AuraDB demo configuration: the actual AuraDB agent tool setup and screenshots are preserved in docs/auraDB/.

  • Local Streamlit tester: the Ask (Aura) tab mirrors the same tool behavior so the workflows can be tested locally in the web app.

The local tester works like an agent: the user enters only a question, a router chooses the most appropriate tool, and the chosen tool runs with the normalized question. The Ask (Aura) tab has no manual tool override; tool selection is always left to the router.
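The routing step can be pictured with a keyword-based stand-in. The tool names are the ones listed in this post. The regular expressions, their order, and the fallback to frequent_entities are hypothetical, and the real router may well be LLM-based.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
ROUTES = [
    ("loaded_company_universe", r"\bwhich companies\b|\bcompanies (are )?(currently )?loaded\b"),
    ("ai_risks_constraints_by_company", r"\brisks?\b|\bconstraints?\b|\bbottlenecks?\b"),
    ("ai_positive_demand_by_company", r"\bpositive\b|\bdemand signals?\b"),
    ("product_category_evidence_map", r"\bproduct categor(y|ies)\b"),
    ("frequent_entities", r"\bmost (connected|frequent)\b"),
]
# Company names from the demo dataset, used to detect single-company deep dives.
COMPANIES = ("cisco", "marvell", "microsoft", "netapp", "nvidia", "seagate", "super micro", "supermicro")


def route(question: str) -> str:
    """Pick a tool name for a question using simple keyword rules."""
    q = " ".join(question.lower().split())  # normalize case and whitespace
    for tool, pattern in ROUTES:
        if re.search(pattern, q):
            return tool
    if any(name in q for name in COMPANIES):
        return "company_ai_deep_dive"
    return "frequent_entities"  # assumed fallback for open-ended questions
```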

Tool type split

  • loaded_company_universe and frequent_entities are fixed Cypher-template tools.

  • ai_positive_demand_by_company, ai_risks_constraints_by_company, company_ai_deep_dive, and product_category_evidence_map are Text2Cypher tools.

For Text2Cypher tools, the app generates or selects a tool-specific read-only Cypher query, validates it, runs it against Neo4j, and sends the returned rows into a tool-specific answer prompt or deterministic local renderer. The answer format follows the AuraDB examples at a high level: executive summary, source-backed table, and graph reasoning or cross-company takeaway. Aura tool answers intentionally avoid separate evidence-gap/caveat sections so the visible output stays focused on graph reasoning paths and referenced chunks.

In the current local implementation, Ask (Aura) is intentionally positioned as a Neo4j Aura Agent-like workflow that can be exercised before recreating the tools in Aura. Neo4j Aura's agent-style graph-tool behavior is mimicked in Streamlit with a LangGraph-style orchestration pattern: a router chooses the appropriate tool from the user question, the selected tool executes either a fixed Cypher template or constrained read-only Text2Cypher query, and the answer writer renders a user-facing response. This mirrors the Aura tool workflow while keeping internal Cypher diagnostics hidden by default. For Text2Cypher answers, the visible table focuses on Company when applicable, Graph reasoning path, and Referenced chunk; redundant generated columns such as Positive signal, Evidence, and Why it matters are stripped so the user sees the source chunk supporting the graph path.
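The column clean-up described above amounts to dropping a fixed set of generated columns and putting the user-facing ones first. A minimal sketch follows, assuming each result row is a dict keyed by display column name. The three redundant column names come from this post; the rest is illustrative.

```python
# Column names from the write-up; rows as plain dicts are an assumption.
REDUNDANT_COLUMNS = {"Positive signal", "Evidence", "Why it matters"}
LEADING_COLUMNS = ["Company", "Graph reasoning path", "Referenced chunk"]


def visible_rows(rows: list[dict]) -> list[dict]:
    """Drop redundant generated columns and put the user-facing columns first."""
    cleaned = []
    for row in rows:
        kept = {key: value for key, value in row.items() if key not in REDUNDANT_COLUMNS}
        ordered = {key: kept[key] for key in LEADING_COLUMNS if key in kept}
        ordered.update({key: value for key, value in kept.items() if key not in ordered})
        cleaned.append(ordered)
    return cleaned
```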

The Text2Cypher path includes guardrails:

  • generated Cypher must be read-only,

  • generated Cypher must include RETURN and LIMIT,

  • write/admin/procedure clauses are rejected,

  • overly large limits are capped to the available graph size,

  • cross-company tools rebalance rows by company when a first pass over-focuses on one company,

  • and Neo4j syntax errors can trigger an automatic repair pass.
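The sketch below covers the validation and rebalancing steps. It assumes a regex deny-list and a post-hoc round-robin over companies. The keyword list, the requirement that LIMIT be the final clause, and the interleaving strategy are all assumptions, and the project may instead re-query per company. A regex check is weaker than a parser-based one: it rejects every CALL, and it can also reject harmless queries that mention a keyword inside a string literal. The syntax-error repair pass is not shown.

```python
import re
from collections import defaultdict
from itertools import zip_longest

# Assumed deny-list of write, admin, and procedure keywords.
FORBIDDEN = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV|CALL|"
    r"ALTER|GRANT|REVOKE|DENY|SHOW|START|STOP|TERMINATE)\b",
    re.IGNORECASE,
)
RETURN_RE = re.compile(r"\bRETURN\b", re.IGNORECASE)
FINAL_LIMIT = re.compile(r"\bLIMIT\s+(\d+)\s*;?\s*$", re.IGNORECASE)


def validate_cypher(query: str, max_rows: int) -> str:
    """Reject unsafe queries and cap an oversized final LIMIT; returns the possibly edited query."""
    q = query.strip()
    if FORBIDDEN.search(q):
        raise ValueError("write/admin/procedure clause rejected")
    if not RETURN_RE.search(q):
        raise ValueError("query must include RETURN")
    match = FINAL_LIMIT.search(q)
    if match is None:
        raise ValueError("query must end with LIMIT <n>")
    if int(match.group(1)) > max_rows:
        q = q[:match.start(1)] + str(max_rows) + q[match.end(1):]
    return q


def rebalance_by_company(rows: list[dict], limit: int) -> list[dict]:
    """Interleave rows round-robin by company so one company cannot fill the whole result."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[row["company"]].append(row)
    interleaved = [
        row
        for batch in zip_longest(*groups.values())
        for row in batch
        if row is not None
    ]
    return interleaved[:limit]
```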

1. loaded_company_universe

Description:


List companies currently loaded in the graph before cross-company analysis.

What it does:

  • Returns the graph's currently loaded company universe.

  • Helps the agent avoid guessing which companies are available.

  • Establishes the scope for follow-up cross-company analysis.

Example question:


Which companies are currently loaded in the earnings-call graph?

2. ai_positive_demand_by_company

Description:


Use Text2Cypher to find company-by-company positive demand signals for AI infrastructure.

What it does:

  • Compares demand signals across the loaded company set.

  • Prioritizes graph relationships that connect AI infrastructure themes to growth, revenue, product demand, orders, adoption, capacity, or production signals.

  • Returns company, graph reasoning path, evidence, source entity, relation, target entity, and concept context so the answer remains auditable.

Example question:


Across the loaded earnings-call graph, what positive demand signals are companies reporting for the AI infrastructure industry?

3. ai_risks_constraints_by_company

Description:


Use Text2Cypher to find company-by-company risks, bottlenecks, or constraints for AI infrastructure growth.

What it does:

  • Compares negative or limiting AI infrastructure signals across the loaded company set.

  • Prioritizes source-backed relation paths connected to risk, constraints, supply, capacity, cost, margin, or regulatory pressure.

  • Keeps risk/constraint answers grounded in referenced chunks rather than unsupported market commentary.

Example question:


Across the loaded earnings-call graph, what risks or constraints are companies reporting for AI infrastructure growth?

4. company_ai_deep_dive

Description:


Use Text2Cypher to retrieve one company's AI, product, and data-center source chunks.

What it does:

  • Focuses the analysis on a single company.

  • Pulls together relevant AI infrastructure entities, relationships, and supporting transcript evidence for that company.

  • Supports follow-up questions about product families, demand drivers, and growth commentary.

Example question:


For NVIDIA, what does the earnings-call graph show about AI demand, Blackwell, and data center revenue growth?

5. product_category_evidence_map

Description:


Use Text2Cypher to map a requested product category to company chunks and graph paths.

What it does:

  • Starts from a product or category term and finds company-specific graph evidence connected to it.

  • Helps compare which companies are exposed to the same infrastructure category.

  • Preserves company-specific terms while mapping them back to broader product or ontology concepts.

Example question:


Across the loaded graph, summarize company evidence for the AI accelerator product category.

AuraDB screenshots / demo evidence

The following AuraDB-related materials are intentionally preserved even though the AuraDB agent configuration itself is external to the Python source tree. They document the actual AuraDB tool behavior and expected answer style.

Loaded company universe

This screenshot shows the agent identifying the companies currently available in the loaded earnings-call graph.

Positive AI infrastructure demand signals

This screenshot shows the agent summarizing positive demand signals across loaded companies with company-by-company evidence and graph reasoning paths.

Company deep dive

This screenshot shows the company-specific deep dive workflow for NVIDIA and AI/Blackwell-related evidence.

Product category evidence map

This screenshot shows the product-category workflow, mapping AI accelerator evidence across companies.

Web UI screenshots

The local Streamlit screenshots in docs/web/ document the user-facing workflow, not only the underlying graph data model. Together, they show how a user moves from graph exploration to source-grounded analysis without writing Cypher.

Graph exploration

This screenshot shows the main graph workspace: search/company/ontology filters, graph scope controls, result-limit controls, a graph overview generated from connected referenced chunks, and the interactive relation graph. It demonstrates that the application uses relationships as the analysis surface rather than treating the data as a flat document search index.

Ask: source-grounded answer workflow

This screenshot shows deterministic graph question answering. The "LLM summary from referenced chunks" action appears directly below the question and uses matched referenced chunks as its input. Below the summary, matched relations are split into Support / upside and Risk / pressure columns, with each card showing the graph path, a separate Ontology: line, evidence text, confidence, and chunk id. The lower section keeps matched evidence and referenced chunks inspectable.

Ask (Aura): routed graph-tool workflow

This screenshot shows the local Aura-style agent workflow. The user enters only a question; the app routes it to the best tool, runs the selected Cypher-template or Text2Cypher tool, and writes the answer from returned graph rows. The visible output emphasizes Graph reasoning path and Referenced chunk, mirroring the Aura tool workflow while keeping the analysis source-grounded.

Key Nodes: selected-node evidence summary

This screenshot shows high-signal node exploration. After selecting a key node, the UI displays source-backed relation rows, provides an "LLM summary from referenced chunks" action that uses the selected node's connected chunks, and keeps the underlying referenced chunks available for inspection.

What makes it useful

Earnings Call Graph Analyst is useful because earnings season creates a large volume of unstructured company commentary. Analysts often need to know not just what one company said, but how multiple companies across the AI infrastructure value chain describe the same demand environment.

The agent helps answer questions such as:

  • Is AI infrastructure demand broad-based or concentrated in a few companies?

  • Which layer of the stack is benefiting: cloud, networking, accelerators, storage, servers, or custom silicon?

  • Which statements are supported by explicit graph paths and source evidence?

  • Which company-specific product terms map to broader industry themes?

  • Where do companies differ in the way they describe demand, capacity, or product exposure?

Limitations and caveats

  • The current graph covers a curated company set, not the entire market.

  • The current demo scope is FY2026 Q2-oriented and should not be interpreted as a complete multi-period history.

  • The agent answers only from the loaded graph data and should not be treated as investment advice.

  • Extracted graph paths are only as complete as the loaded source materials and chunk-level extraction results.

  • Text2Cypher is validated and repaired, but generated queries can still miss relevant evidence or require reruns for broad questions.

  • The current evidence map shows the strongest extracted graph paths, not every transcript mention.

Final submission blurb

Earnings Call Graph Analyst turns public AI-infrastructure earnings-call materials into a Neo4j graph-backed analyst agent. It helps users ask source-grounded questions about AI demand, Blackwell, custom silicon, cloud growth, networking, storage, AI servers, and company-specific exposure across the AI infrastructure value chain. The graph matters because companies use different language for similar industry themes; the agent preserves company-specific terms while exposing explainable paths such as Blackwell Ultra --DRIVES--> AI Demand and AI Demand --DRIVES--> Cloud growth. This makes the result more than transcript search: it is a relationship-driven evidence map for understanding AI infrastructure trends during earnings season.

M-Pesa Insight Agent

M-Pesa is Kenya’s most widely used mobile money platform, generating statements filled with PayBill payments, Till purchases, transfers, airtime purchases, withdrawals, deposits, subscriptions, and recurring transactions. While the data exists, most users struggle to analyze long PDF or CSV statements to answer simple questions like:
“What am I spending most on?”, “Who receives most of my money?”, or “How much goes to bills monthly?”

I built an AI-powered financial insight agent that transforms raw M-Pesa statements into a context-rich knowledge graph using Neo4j Aura.

The agent allows users to query their statements in natural language:

  • “How much did I spend on bills last month?”

  • “Who do I send money to most?”

  • “Show my highest spending categories.”

  • “Which transactions repeat frequently?”

Using Neo4j graphs, the agent connects users, merchants, transaction types, categories, and spending patterns to uncover meaningful financial insights instead of just displaying raw transactions.
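The questions above reduce to aggregations over transaction-to-category and transaction-to-recipient relationships. The following sketch shows that logic on plain Python records. The field names (direction, category, counterparty, amount) are hypothetical, as is the assumption that statements have already been categorized. In the described agent, the same aggregation would run as Cypher over the graph.

```python
from collections import Counter, defaultdict


def spend_by_category(transactions: list[dict]) -> list[tuple[str, float]]:
    """Total outgoing spend per category, highest first."""
    totals: dict[str, float] = defaultdict(float)
    for tx in transactions:
        if tx["direction"] == "out":
            totals[tx["category"]] += tx["amount"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def top_recipients(transactions: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Counterparties that receive money most often."""
    counts = Counter(tx["counterparty"] for tx in transactions if tx["direction"] == "out")
    return counts.most_common(n)


def recurring_payments(transactions: list[dict], min_count: int = 3) -> list[tuple[str, float]]:
    """(counterparty, amount) pairs repeated at least min_count times: a crude recurrence signal."""
    pairs = Counter((tx["counterparty"], tx["amount"]) for tx in transactions if tx["direction"] == "out")
    return [pair for pair, count in pairs.items() if count >= min_count]
```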

Why Graphs?

Financial data is naturally connected. Every transaction links people, businesses, categories, dates, and behaviors. Neo4j makes it possible to reason over these relationships and identify patterns traditional tables cannot easily reveal.

Tools Used

  • Text2Cypher for converting natural language into Cypher queries

  • Vector Search for semantic understanding of transaction descriptions and merchant similarity

  • Neo4j Aura Knowledge Graph

This project turns unstructured M-Pesa statements into conversational financial intelligence, helping users understand their money without manually searching through PDFs or spreadsheets.

Hi everyone! The following community members should now have received their Aura Credits for participating in the Aura Agent Hackathon initiative:

@longo.analytics
@sadashivb
@100shubhamtathe
@abimael.carrasquillo
@ayushdeosingh
@vinod.nageshwar
@wanga.15065
@kumar20051020shivam
@jithinjoji05

We’re now down to just 18 days remaining for project submissions, and the creativity coming from this community has been seriously impressive. We’ve already reviewed some incredibly innovative agents and graph-powered AI projects, and we can’t wait to see what else everyone builds before the deadline.

Keep the submissions coming :rocket:

:headphone: VibeGraph AI

The Audiophile's Acoustic Signal Linker

:rocket: Live Web App: vibe-graph-ai.vercel.app

:control_knobs: What it does:

VibeGraph AI is a graph-native recommendation engine built for audio enthusiasts. Instead of relying on mass-market collaborative filtering (suggesting what other people clicked), it uses pure digital signal processing and structural graph musicology to recommend tracks based on their actual physical sound waves.

If a user wants to test a new pair of IEMs or active bookshelf speakers with a specific gritty, raw track, VibeGraph AI finds other music that hits those same frequencies, even across entirely different genres. It does this by analyzing a 33-Dimensional Acoustic Fingerprint for every track:

  • Timbral Envelope (20 MFCCs): The raw texture of the instruments.

  • Harmonic Footprint (12 Chroma): The underlying pitch and key signatures.

  • Spectral Brightness (1 Centroid): The perceived treble energy.

:shield: The Zero-Hallucination Design:

The core of VibeGraph AI is its deterministic mathematical restraint.

  • If asked to find an Electronic match for a heavy Punk Rock track, the agent analyzes the pre-calculated acoustic neighbors.

  • If the physical math does not intersect with the requested genre, it refuses to hallucinate a fake recommendation. It respects the strict boundaries of the graph, explaining the anomaly to the user instead of lying to fulfill the prompt.
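The refusal behaviour can be expressed as a filter over precomputed neighbors. When nothing survives the filter, it returns an explanation instead of a recommendation. This is a hypothetical sketch: the dictionaries stand in for (Track)-[:SOUNDS_LIKE {score}]->(Track) and (Track)-[:IN_GENRE]->(Genre) edges, and the message wording is invented.

```python
def recommend_in_genre(track: str, genre: str,
                       sounds_like: dict[str, list[tuple[str, float]]],
                       in_genre: dict[str, set[str]]) -> dict:
    """Return genre-matching acoustic neighbors, or an explanation if there are none."""
    matches = [
        (neighbor, score)
        for neighbor, score in sounds_like.get(track, [])
        if genre in in_genre.get(neighbor, set())
    ]
    if not matches:
        # No acoustic neighbor carries the requested genre: explain rather than invent one.
        return {
            "recommendations": [],
            "explanation": (
                f"None of the precomputed SOUNDS_LIKE neighbors of {track!r} "
                f"is tagged {genre!r}, so no match is suggested."
            ),
        }
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return {"recommendations": matches, "explanation": None}
```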

:gear: The Tools Powering the Agent:

  1. Pre-Computed Acoustic Topology: Instead of relying purely on expensive runtime vector searches, we pre-calculated the top 5 nearest mathematical neighbors for all 25,000 tracks. The agent natively traverses explicit (Track)-[:SOUNDS_LIKE {score}]->(Track) edges for lightning-fast recommendations.

  2. Semantic Vector Search: For completely novel queries, the agent can fall back to performing a native cosine similarity search across the 33-dimensional embedding matrix.

  3. Metadata Graph Explorer (Text2Cypher): Dynamically filters acoustic matches by traversing the graph topology along (Track)-[:IN_GENRE]->(Genre) pathways.

  4. Acoustic Landscape Analyzer: A Cypher Template that calculates structural clusters and track catalog densities on demand.
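Precomputing the SOUNDS_LIKE edges is a k-nearest-neighbor problem under cosine similarity. The brute-force sketch below shows the idea on plain lists. It is quadratic in the number of tracks, so a real 25,000-track build would vectorize it or use an approximate index. It also assumes the 33 features were standardized beforehand, since raw MFCC and centroid values sit on very different scales. The post does not say how this was handled.

```python
import heapq
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k_neighbors(vectors: dict[str, list[float]], k: int = 5) -> list[tuple[str, str, float]]:
    """Return (source, target, score) SOUNDS_LIKE edge tuples for each track's k nearest tracks."""
    unit = {track: normalize(vec) for track, vec in vectors.items()}
    edges = []
    for track, u in unit.items():
        scores = (
            (sum(a * b for a, b in zip(u, w)), other)
            for other, w in unit.items()
            if other != track
        )
        for score, other in heapq.nlargest(k, scores):
            edges.append((track, other, score))
    return edges
```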

:bar_chart: Dataset and why a graph fits:

  • The Dataset: Free Music Archive (FMA) Medium, 25,000 tracks.

  • Why a graph is required: Acoustic similarity matching on its own is an isolated vector space challenge. However, music discovery is a multi-signal network challenge. A track's mathematical audio coordinates must intersect with historical eras, artist identifiers, and genre taxonomies.

By materializing the 33-D DSP nearest-neighbor results as [:SOUNDS_LIKE] edges directly between Track nodes, the Aura Agent can run multi-hop queries that cross genre, era, and artist boundaries to find hidden relational structures. A flat vector database cannot trace these relationships, and a standard SQL table cannot perform the vector math. The graph allows the agent to reason over both simultaneously.

:link: Links & Code:

:camera_with_flash: Screenshots of VibeGraph AI in Action:

DrugPath -> Biomedical Drug Reasoning Agent

Agent Name

DrugPath

What it does

DrugPath is an AI agent that navigates a biomedical knowledge graph to answer
questions about drugs, their mechanisms of action, interactions, and disease
connections. Unlike a drug database that returns flat yes/no answers, DrugPath
explains the biological mechanism behind every answer by traversing
multi-hop paths through the graph.

Three key capabilities:

1. Drug interaction checking with mechanism explanation
Instead of "warfarin + aspirin = dangerous," DrugPath explains: "These drugs
interact because warfarin inhibits VKORC1 (Vitamin K pathway) while aspirin
irreversibly blocks COX-1 - they impair hemostasis at different steps (clotting-factor
synthesis and platelet aggregation), creating synergistic bleeding risk."
The answer traces the path: Drug → shared
Gene targets → Pathway → clinical outcome.

2. Drug repurposing discovery
"What existing approved drugs could work for Alzheimer's?" - This requires a
3-hop traversal: find drugs → whose molecular targets (Genes) → are associated
with Alzheimer's Disease → but the drug isn't currently indicated for it. In SQL
this takes several joins plus an anti-join; in the graph it's one Cypher query.
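Reduced to sets, the repurposing logic is three steps: collect each drug's gene targets, keep the genes associated with the disease, and skip drugs that already treat it. The sketch runs on hypothetical in-memory data. The dictionaries stand in for BINDS_GENE, ASSOCIATES_WITH, and TREATS edges and do not reproduce the project's Cypher template.

```python
def repurposing_candidates(disease: str,
                           binds_gene: dict[str, set[str]],
                           associates_with: dict[str, set[str]],
                           treats: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each candidate drug to the targeted genes that link it to the disease."""
    results = {}
    for drug, genes in binds_gene.items():
        if disease in treats.get(drug, set()):
            continue  # already indicated for this disease, so not a repurposing candidate
        linking_genes = sorted(g for g in genes if disease in associates_with.get(g, set()))
        if linking_genes:
            results[drug] = linking_genes
    return results
```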

3. Comprehensive drug profiles
Full pharmacological context for any drug: indications, molecular targets,
biological pathways, side effects - all pulled from graph relationships rather
than flat attributes.

Dataset and Why a Graph Fits

Dataset: Hetionet v1.0 (CC0 license; the link is in my GitHub repository because link posting is limited here)

  • 47,031 nodes of 11 types
  • 2,250,197 relationships of 24 types in the source; we load all nodes and the
    ~293k relationships across the ten metaedges the agent traverses (AuraDB Free
    caps at 400k relationships)
  • Integrates 29 public biomedical databases: DrugBank, OMIM, DisGeNET, Reactome,
    Gene Ontology, SIDER, and others

Why a graph is the only right structure for this data:

A relational database can tell you: "Metformin treats Type 2 Diabetes."

A knowledge graph can answer: "Why might Metformin work against cancer?"

(Metformin:Compound)
  -[:BINDS_GENE]-> (SLC22A1:Gene)
  -[:ASSOCIATES_WITH]-> (prostate cancer:Disease)

This is a real path returned by the live graph. It reveals a biologically
plausible repurposing hypothesis that is not in any flat table - it emerges from
the structure of the graph itself.

Node types:

Node Count Role
Compound 1,552 Drugs and chemical compounds
Gene 20,945 Molecular targets and enzymes
Disease 137 Therapeutic indications
Pathway 1,822 Biological pathways (mTOR, CYP3A4...)
SideEffect 5,734 Adverse effects
PharmacologicClass 345 Drug classes
Anatomy 402 Where diseases manifest

Key relationships: TREATS, PALLIATES, BINDS_GENE, DOWNREGULATES_GENE,
UPREGULATES_GENE, CAUSES_SIDE_EFFECT, PARTICIPATES_IN, ASSOCIATES_WITH,
LOCALIZES_TO, INCLUDES

Agent Tools

  • drug_interaction_checker (Cypher Template): Given two drug names, traverses shared gene targets and pathways to explain the interaction mechanism - not just flag it
  • drug_repurposing_explorer (Cypher Template): Given a disease, finds drugs approved for OTHER conditions whose molecular targets associate with this disease - pure graph reasoning
  • drug_profile_lookup (Cypher Template): Full pharmacological profile: targets, indications, side effects, pathways, drug class
  • find_similar_drugs (Similarity Search): Vector search on drug name + mechanism embeddings to find pharmacologically similar compounds
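For drug_interaction_checker, the core of the traversal is an intersection: the genes both drugs bind, plus the pathways reached from each drug's targets. The sketch below assumes adjacency dictionaries for BINDS_GENE and PARTICIPATES_IN. It is an illustration, not the published template, and overlap alone does not establish a clinically relevant interaction.

```python
def pathways_of(drug: str, binds_gene: dict[str, set[str]],
                participates_in: dict[str, set[str]]) -> set[str]:
    """Pathways that any of the drug's gene targets participate in."""
    return {p for gene in binds_gene.get(drug, set()) for p in participates_in.get(gene, set())}


def interaction_evidence(drug_a: str, drug_b: str,
                         binds_gene: dict[str, set[str]],
                         participates_in: dict[str, set[str]]) -> dict[str, list[str]]:
    """Shared gene targets and overlapping pathways for a pair of drugs."""
    shared_genes = sorted(binds_gene.get(drug_a, set()) & binds_gene.get(drug_b, set()))
    shared_pathways = sorted(
        pathways_of(drug_a, binds_gene, participates_in)
        & pathways_of(drug_b, binds_gene, participates_in)
    )
    return {"shared_genes": shared_genes, "shared_pathways": shared_pathways}
```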

Example Interactions

User: "Can I take warfarin and aspirin together?"

DrugPath: "These two drugs share several molecular targets in the graph -
CYP2C9, CYP2C8, CYP2C19 (the enzymes that metabolise warfarin) and ALB
(plasma albumin) - and overlap in the arachidonic-acid metabolism pathway.
That means aspirin can compete for warfarin's metabolism and displace it from
albumin, raising free warfarin levels, while also acting on the clotting cascade
itself - a combination associated with increased bleeding risk. Path traversed:
Warfarin → shared Genes (CYP2C9, ALB) ← Aspirin. This information is for
educational purposes only - always consult a healthcare professional.
"


User: "What approved drugs might work for Alzheimer's disease?"

DrugPath: "Here are compounds whose molecular targets associate with
Alzheimer's disease in the graph but which are not currently indicated for it:

  1. Dextromethorphan - via the nicotinic receptors CHRNA4 / CHRNA7 / CHRNB2
  2. Nicotine - via the cholinergic targets CHAT, CHRNA4, CHRNA7
  3. Amitriptyline - via the serotonergic targets HTR2A, HTR6, SLC6A4

The cholinergic and serotonergic targets are biologically on-point for
Alzheimer's. Path: Drug → [BINDS_GENE] → Gene → [ASSOCIATES_WITH] → Alzheimer's
disease. These are research hypotheses, not proven treatments. Educational
purposes only.
"

Screenshots

Graph schema (Neo4j Browser → CALL db.schema.visualization()):

All 47,031 nodes by type (Neo4j Browser results overview):

Agent in action - full drug profile (Aura Console):

See github readme

Agent in action - similarity search (Aura Console):

See github readme

Aura dashboards (graph statistics and the compound · disease · gene network):

Also see github readme

Live Agent

Landing page: DrugPath - the drug agent that explains why

Source code: GitHub - qualv13/neo4j-agent: DrugPath - an AI agent over a biomedical knowledge graph (Hetionet on Neo4j Aura) that explains drug mechanisms, interactions and repurposing by traversing the graph. Neo4j Aura Agent Hackathon 2026.

Published agent endpoint: [will be added after receiving credits]

What Makes DrugPath Different

1. The graph drives every answer, not just stores data.
Every response cites the traversal path: Compound → Gene → Pathway → Disease.
Users see why the graph gives this answer, not just what it returns.

2. Drug repurposing is impossible without the graph.
Finding drugs whose targets associate with a disease they're not indicated for
requires crossing node types in a single query. This is where
graph databases outperform anything else.

3. 29 databases, one coherent graph.
Hetionet integrates DrugBank, OMIM, DisGeNET, Reactome, Gene Ontology, and SIDER
into a single traversable structure. The agent reasons across all of them
simultaneously.

4. Real educational value.
The agent may be genuinely useful for medical students, researchers, and anyone
trying to understand pharmacology beyond "take this pill for that disease."

Technical Stack

  • Graph database: Neo4j Aura (AuraDB)
  • Dataset: Hetionet v1.0 (CC0)
  • ETL: Python (neo4j driver, pandas)
  • Embeddings: OpenAI text-embedding-3-small, matched to Aura Agent's managed similarity search
  • Agent: Neo4j Aura Agent with 3 Cypher Templates + 1 Similarity Search tool
  • Data loading: ~47k nodes, ~293k relationships (filtered to the ten
    metaedges the agent traverses, within free limits)

:warning: DrugPath is an educational and research tool. It does not provide
medical advice. You should always consult a qualified healthcare professional for medical
decisions.

Agent Name: Geospatial Supply Chain Agent
What it does:

The Geospatial Supply Chain Intelligence Agent leverages a robust Supply Chain Knowledge Graph to deliver context-aware recommendations across suppliers, products and locations.
By connecting and analyzing relationships between suppliers, products and locations, the agent enables faster and smarter supply chain decisions.
The agent helps to identify alternate suppliers based on product availability, pricing, and regional proximity.

One of the most challenging parts of supply chain management is finding the nearest location with an alternative supplier. This agent addresses that by using geospatial coordinates to find suppliers.

Key capabilities include:
  • Alternate supplier recommendation based on availability, location, and supplier relationships
  • Nearest supplier discovery using geospatial graph queries
  • Alternate suppliers for similar products

Dataset and why a graph fits:
I created a supply chain dataset with 8,000 rows and the following columns:

  • Supplier_ID: Unique codes like SUP83421
  • Supplier_Name: Tech distributor names (e.g. Nova Components, Apex Distributors)
  • Product: 48 distinct computer spare-part types (RAM, SSDs, CPUs, GPUs, laptop parts, peripherals, etc.)
  • Description: Detailed product specs per item
  • Price: Price per product category
  • City: 30 major Indian cities (Mumbai, Bengaluru, Delhi, Hyderabad, Chennai, etc.)
  • Latitude: Latitude coordinate
  • Longitude: Longitude coordinate

A graph database is an ideal fit for supply chain systems because supply chains are naturally built around interconnected relationships between suppliers, products and locations.
Supply Chains Are Highly Connected Networks
A single product may involve:
Multiple suppliers
Different transportation routes
Graph databases are designed specifically to manage and analyze these interconnected ecosystems efficiently.
Faster Alternate Supplier Discovery
When a supplier disruption occurs, graph queries can instantly identify:
Alternate suppliers
Nearby suppliers
Suppliers providing similar products
Suppliers connected through existing logistics routes
Instead of performing multiple complex table joins, the graph traverses relationships directly.
Geospatial and Nearest Supplier Intelligence
Graph databases can combine relationship intelligence with spatial data to:
Find the nearest supplier
Optimize delivery routes
Loaded the supply_chain_dataset.csv file from GitHub into Neo4j Aura using the following script:

```cypher
// $csvUrl is a placeholder parameter for the raw GitHub URL of supply_chain_dataset.csv
LOAD CSV WITH HEADERS FROM $csvUrl AS row
CALL (row) {
  // MERGE keeps one node per key; the SETs run on every row, so the last row read wins
  MERGE (sup:Supplier {supplierId: trim(row.Supplier_ID)})
  SET sup.name = row.Supplier_Name
  MERGE (prd:Product {name: trim(row.Product)})
  SET prd.description = row.Description, prd.price = toFloat(row.Price)
  MERGE (cy:City {name: trim(row.City)})
  SET cy.location = point({latitude: toFloat(row.Latitude), longitude: toFloat(row.Longitude)})
  MERGE (sup)-[:SUPPLIES]->(prd)
  MERGE (sup)-[:LOCATED_IN]->(cy)
} IN TRANSACTIONS OF 1000 ROWS
```

Neo4j Schema:

Node Count:

Relationship Count:

Agent Tools Used:

| Type | Name | Description | Parameters |
|---|---|---|---|
| Cypher Template | Suppliers in City | Finds all suppliers explicitly located in a specified city. | cityName |
| Cypher Template | Suppliers of Product | Finds all suppliers that supply a specified product. | productName |
| Cypher Template | Alternate Suppliers for Product | Finds suppliers for a specific product, optionally excluding a known supplier by name to find alternative options. | productName, excludeSupplierName |
| Cypher Template | Suppliers Nearest to City | Finds suppliers located in the cities geographically closest to a specified reference city. Returns the supplier name, the city it is located in, and the distance in kilometers. | referenceCityName |
| Text2Cypher | Natural Language to Cypher Tool | A general-purpose, free-form Text-to-Cypher tool that converts natural-language questions into Cypher queries, executes them, and returns the results. Use this option when no specialized tool (e.g. for aggregation or domain-specific logic) is more appropriate. | |
| Cypher Template | Alternate Suppliers for Similar Product | Finds alternate suppliers for products similar to those supplied by the given supplier. | supplierName |

Demo of Agents in Action:
Finding suppliers in a specific city:

```cypher
MATCH (s:Supplier)-[:LOCATED_IN]->(c:City {name: 'Bengaluru'})
RETURN s.name AS SupplierName,
       s.supplierId AS SupplierId,
       c.name AS City,
       c.location AS Location
ORDER BY s.name
```

Finding the number of suppliers in a specific city and listing them:

Finding the suppliers of a product and their locations:

Finding the nearest alternate suppliers when a supplier in a specific location has closed:

```cypher
// Distance from Kochi to every other city, using the point stored in City.location at load time
MATCH (c1:City {name: 'Kochi'})
MATCH (c2:City)
WHERE c1 <> c2
WITH c2, point.distance(c1.location, c2.location) / 1000 AS distanceKm
MATCH (s:Supplier)-[:LOCATED_IN]->(c2)
RETURN s.name AS supplierName, c2.name AS cityLocatedIn, distanceKm
// Sort last: an ORDER BY placed before a later MATCH does not guarantee the final row order
ORDER BY distanceKm ASC
```

What are the top 5 most expensive products supplied by a supplier?

Which suppliers are closest to a specific location, and at what distance:

```cypher
MATCH (targetCity:City {name: 'Bhubaneswar'})
WITH targetCity
MATCH (s:Supplier)-[:LOCATED_IN]->(c:City)
WHERE c <> targetCity
RETURN s.name AS supplierName,
       c.name AS cityLocatedIn,
       round(point.distance(c.location, targetCity.location) / 1000) AS distanceKm
ORDER BY distanceKm
```
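To sanity-check the distances these queries return, the same ranking can be reproduced outside the database. This minimal Python sketch assumes the haversine great-circle formula with a mean Earth radius of 6,371 km. Neo4j documents `point.distance` on WGS-84 points as a spherical-earth approximation too, but its radius constant may differ, so expect small discrepancies. The function names, the `exclude_supplier` parameter and the toy coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; Neo4j's constant may differ slightly


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_suppliers(cities, suppliers, reference_city, exclude_supplier=None):
    """Rank suppliers outside reference_city by distance from it.

    cities: city name -> (latitude, longitude)
    suppliers: (supplier name, city name) pairs, i.e. LOCATED_IN edges
    exclude_supplier: optional supplier to skip (for example, one that has closed)
    """
    ref_lat, ref_lon = cities[reference_city]
    ranked = []
    for name, city in suppliers:
        if city == reference_city or name == exclude_supplier:
            continue  # mirrors WHERE c <> targetCity in the Cypher above
        lat, lon = cities[city]
        ranked.append((name, city, round(haversine_km(ref_lat, ref_lon, lat, lon))))
    ranked.sort(key=lambda row: row[2])
    return ranked


cities = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 2.0)}
suppliers = [("s1", "C"), ("s2", "B"), ("s3", "A")]
print(nearest_suppliers(cities, suppliers, "A"))  # [('s2', 'B', 111), ('s1', 'C', 222)]
```

One degree of longitude along the equator is about 111 km, which is a quick way to check any distance function of this kind.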

Dashboard in Aura:

Link to Agent:

CPF Customer Agent (Powered by Neo4j & GraphRAG)

1. What It Does

The CPF Customer Agent is an advanced reasoning assistant built for customer service executives handling Central Provident Fund (CPF) inquiries. It uses GraphRAG to merge semantic vector search with structural graph relationships.

When an executive receives an inquiry, the agent simultaneously retrieves verified policy documents and, crucially, layers in the member's specific profile (e.g. age, account balance) and past question history. By executing multi-hop reasoning over this connected data, the agent provides customer service executives with hyper-personalised, accurate and factually grounded responses in real time.

For example, if a member asks: "How much do I need to top up to reach the FRS (Full Retirement Sum) in 2027?", the agent doesn't just return a generic definition of FRS. It automatically retrieves the member's current account balance via the graph, calculates their specific profile constraints, and produces a contextual answer.

2. Dataset

The agent utilises GraphRAG to structurally unify three distinct data dimensions into a single, cohesive knowledge network:

  • Customer Profiles (Structured): synthetically generated customer data containing attributes like customerId, gender, age, accountBalance and archetype
  • FAQ Knowledge Base (Unstructured): question-answer pairs extracted from the CPF website. The knowledge base contains questionId, questionText, answerText, questionEmbedding (generated via gemini-embedding-001)
  • Interaction History (Relational): direct relationships tracking what customers have asked in the past

3. Why a Graph Fits

In public pension systems, policy answers are a dynamic function of a citizen's identity, not a static document. A graph addresses this by providing the ultimate contextual scaffolding:

  • Contextual Convergence Without Joins: A flat vector search can find a document about "CPF LIFE eligibility," but it cannot simultaneously evaluate if Customer X is eligible. GraphRAG allows the agent to pull the semantically relevant policy node and instantly traverse to the specific customer node in a single operation.

  • Hyper-Focused Grounding: GraphRAG extracts a precise, pre-filtered subgraph (exact policy criteria + customer profile) to inject into the LLM prompt, mitigating hallucinations

  • Temporal Awareness: By traversing the [:ASKED] relationships, the agent can instantly see if a customer's current question is a follow-up to a previous inquiry (e.g., a member asking about housing policy right after asking about retirement funds). This prevents the executive from repeating information and provides an immediate, continuous narrative of the member's journey.

4. Agent Tools

The agent pipeline is driven by an orchestrated suite of specialised tools that blend deterministic graph querying and semantic reasoning using three core tools:

  • Get Customer Profile (Cypher Template) - retrieves structured demographic variables

  • Get Customer Question History (Cypher Template) - traverses the [:ASKED] relationships for a specific Customer ID to return a chronological list of prior questions

  • Get semantically similar question (Similarity Search) - embeds the incoming inquiry using gemini-embedding-001 and executes a cosine similarity search against the knowledge base to find the closest verified CPF FAQ
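The similarity step can be sketched in plain Python to show what "closest verified FAQ" means. This is not how Aura runs it (there, the vector index does the ranking); it only illustrates cosine-similarity ranking. The three-dimensional vectors and the question IDs are toy stand-ins for real gemini-embedding-001 embeddings, and `closest_faq` is a hypothetical helper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_faq(query_embedding, faqs, k=1):
    """Return the k (questionId, score) pairs most similar to the query.

    faqs: list of (questionId, questionEmbedding) pairs.
    """
    scored = [(qid, cosine_similarity(query_embedding, emb)) for qid, emb in faqs]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Toy 3-dimensional vectors; real embeddings are far higher-dimensional
faqs = [
    ("cpf_life_eligibility", [0.9, 0.1, 0.0]),
    ("housing_grant", [0.0, 1.0, 0.0]),
    ("frs_2027", [0.6, 0.0, 0.8]),
]
print(closest_faq([1.0, 0.0, 0.0], faqs, k=2))
```

Cosine similarity ignores vector length and compares direction only, which is why it is the usual choice for comparing text embeddings.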

5. Agent & Multi-Hop Reasoning in Action

  • Scenario A: Eligibility Check

    • Query: Customer X asks: 'Am I eligible to join CPF LIFE?'

    • Similarity Search Tool: The agent embeds the question text and searches the vector index. It returns the CPF LIFE eligibility node, which states that eligibility depends on an age threshold of 65 to 80 years old

    • Get Customer Profile Tool: The agent calls the Cypher template and fetches Customer X’s age property

    • Reasoning & Output: The agent combines these signals. Instead of a generic eligibility criterion, the agent generates a definitive, profile-verified "Yes/No" eligibility draft for the executive

  • Scenario B: Dynamic Calculations

    • Query: Customer X asks: 'How much do I have to top up to hit FRS in 2027?'

    • Similarity Search Tool: The agent embeds the question text and searches the vector index. It returns the node detailing the 2027 FRS requirements

    • Get Customer Profile Tool: The agent calls the Cypher template for Customer ID X and pulls the customer's accountBalance

    • Reasoning & Output: The agent merges the policy and the member's current balance into a unified context window. The LLM mathematically processes this highly localised context window to generate a step-by-step calculation, explaining exactly how much the customer needs to top up based on their profile
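Once the graph has supplied the inputs, both scenarios end in simple, deterministic logic. The sketch below makes that arithmetic explicit. It assumes the 65-to-80 age band quoted in Scenario A and treats the FRS top-up as a plain shortfall (FRS amount minus current balance, floored at zero). Actual CPF rules, such as which accounts count towards the FRS or top-up limits, are not modelled, the helper names are hypothetical, and the FRS figure in the usage example is made up rather than the real 2027 amount.

```python
def cpf_life_eligible(age, min_age=65, max_age=80):
    """Scenario A: age band as quoted from the retrieved CPF LIFE node."""
    return min_age <= age <= max_age


def frs_top_up(account_balance, frs_amount):
    """Scenario B: simplified shortfall to reach the FRS, never negative."""
    return max(0.0, frs_amount - account_balance)


def frs_explanation(customer_id, account_balance, frs_amount):
    """Build the step-by-step line a draft reply could contain."""
    gap = frs_top_up(account_balance, frs_amount)
    return (f"Customer {customer_id}: FRS {frs_amount:,.2f} - balance "
            f"{account_balance:,.2f} = top-up {gap:,.2f}")


# Hypothetical figures for illustration only
print(cpf_life_eligible(67))                    # True
print(frs_explanation("X", 150_000, 220_000))   # Customer X: FRS 220,000.00 - balance 150,000.00 = top-up 70,000.00
```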

6. What is the Impact?

  • From "Search" to "Reasoning": Most AI bots perform a vector lookup and dump a block of text into a prompt. The CPF Customer Agent reasons across a connected knowledge network, providing the LLM with the multi-dimensional context required for complex public policy
  • Enterprise-Grade Grounding: Public sector applications require absolute precision. By bounding the LLM strictly to factual subgraphs retrieved via Cypher templates, hallucinations are completely mitigated.
  • Operational Efficiency: By unifying policies, customer profiles, and conversation history into a single, cohesive graph canvas, we eliminate the need for customer service executives to swivel-chair between different CRM tabs and policy manuals, drastically slashing Average Handle Time (AHT) while driving up response accuracy.

https://github.com/winleexz/CPF-Customer-Agent

FraudGraph Sentinel

Agent Name
FraudGraph Sentinel

What It Does
FraudGraph Sentinel is a Neo4j Aura Agent for investigating synthetic cyber-fraud transaction patterns. It turns row-level fraud labels into a graph investigation workflow: the agent first identifies repeated fraudulent destination accounts, then profiles one suspicious destination account as a case brief with source accounts, transaction types, amounts, and risk indicators.

Instead of only answering "is this transaction fraud?", the agent explains connected fraud context: who sent money, which destination received repeated fraudulent activity, what transaction types were involved, where high-value paths appear, and which risk indicators explain the suspicious pattern.

Dataset And Why A Graph Fits
The project uses a synthetic financial transaction fraud dataset with 6,362,620 source rows. The AuraDB Free graph keeps all 8,213 fraud rows plus a deterministic 5,000-row non-fraud sample, so the graph stays within a conservative free-tier target.

A graph fits this dataset because fraud is relational. A flat table can show a suspicious row, but a graph can reveal repeated recipients, source-to-transaction-to-destination paths, concentrated fraud exposure, fraud type patterns, and shared risk signals across related entities.

Verified Graph Statistics

  • Enhanced nodes: 38,797
  • Enhanced relationships: 74,262
  • Transaction nodes: 13,213
  • Fraud transactions: 8,213
  • Conservative AuraDB Free target: below 50,000 nodes and 175,000 relationships
  • Cost path: existing AuraDB Free instance and Internal Aura Agent only

Graph Model

Core graph:

  • (:Account)-[:SENT]->(:Transaction)
  • (:Transaction)-[:TO]->(:Account)
  • (:Transaction)-[:HAS_TYPE]->(:TransactionType)
  • (:Transaction)-[:HAS_LABEL]->(:FraudLabel)

Optional cyber-risk layer:

  • (:Transaction)-[:HAS_RISK_INDICATOR]->(:RiskIndicator)
  • (:EmailSample)-[:HAS_RISK_INDICATOR]->(:RiskIndicator)
  • (:URLSample)-[:HAS_RISK_INDICATOR]->(:RiskIndicator)

The optional risk layer connects separate synthetic email, URL, and transaction samples through shared risk indicators only. It does not claim that a phishing email caused a specific transaction.

Agent Tools Used

  • Cypher Template
  • Text2Cypher

Prepared Cypher Template Tools

  • fraud_overview
  • repeated_fraud_destinations
  • high_value_fraud_paths
  • account_fraud_neighborhood
  • destination_fraud_profile
  • fraud_type_comparison
  • fraud_concentration
  • risk_indicator_overview
  • shared_risk_indicator_context

Demo Flow

  1. Ask: Which destination accounts received multiple fraudulent transfers?
    The agent uses a Cypher Template to find repeated fraudulent destination accounts, total fraud amount, fraud count, and sample origin accounts.

  2. Ask: Create a fraud investigation brief for destination account C668046170.
    The agent profiles one destination account with fraud count, total amount, source accounts, transaction types, risk indicators, and a suspiciousness explanation.

  3. Ask: Show the highest-value suspicious fraud paths.
    The agent returns high-value source-account -> transaction -> destination-account paths with amount, fraud type, and risk context.

Example Finding From The Demo
Destination account C668046170 received 2 fraudulent transactions totaling 10,160,088.68 in the synthetic graph. One transaction was a TRANSFER of 10,000,000.00 and another was a CASH_OUT of 160,088.68. The profile shows different source accounts and risk indicators such as FraudTransaction, HighAmountTransfer, RepeatedFraudDestination, and ZeroedOriginBalance.
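The repeated-destination view is, at its core, a group-by over fraudulent source-to-destination paths. The repeated_fraud_destinations template's Cypher is not shown here, so this is a hypothetical in-memory equivalent rather than the project's query. The two fraud amounts come from the example finding above; the origin account IDs and the extra rows are made up.

```python
from collections import defaultdict


def repeated_fraud_destinations(transactions, min_count=2):
    """Group fraudulent transactions by destination account.

    transactions: dicts with 'origin', 'dest', 'type', 'amount', 'is_fraud',
    one per (:Account)-[:SENT]->(:Transaction)-[:TO]->(:Account) path.
    Returns (dest, profile) pairs with at least min_count frauds, largest total first.
    """
    profiles = defaultdict(lambda: {"count": 0, "total": 0.0, "origins": set(), "types": set()})
    for tx in transactions:
        if not tx["is_fraud"]:
            continue  # only fraud-labelled transactions feed this view
        p = profiles[tx["dest"]]
        p["count"] += 1
        p["total"] += tx["amount"]
        p["origins"].add(tx["origin"])
        p["types"].add(tx["type"])
    repeated = [(dest, p) for dest, p in profiles.items() if p["count"] >= min_count]
    return sorted(repeated, key=lambda item: item[1]["total"], reverse=True)


txs = [
    {"origin": "C_A", "dest": "C668046170", "type": "TRANSFER", "amount": 10_000_000.00, "is_fraud": True},
    {"origin": "C_B", "dest": "C668046170", "type": "CASH_OUT", "amount": 160_088.68, "is_fraud": True},
    {"origin": "C_C", "dest": "C668046170", "type": "PAYMENT", "amount": 50.00, "is_fraud": False},
    {"origin": "C_D", "dest": "C999", "type": "TRANSFER", "amount": 500.00, "is_fraud": True},
]
for dest, p in repeated_fraud_destinations(txs):
    print(dest, p["count"], f"{p['total']:,.2f}", sorted(p["types"]))
# C668046170 2 10,160,088.68 ['CASH_OUT', 'TRANSFER']
```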

Important Safety Note
This project uses synthetic data. The agent does not claim that any account, email, URL, or transaction represents a real person, real victim, real criminal, company, or real-world crime.

GitHub Repository

Screenshots

Aura Console graph/database screenshot

Repeated fraudulent destinations agent screenshot

Destination fraud profile case-brief screenshot

high-value fraud paths screenshot

:rocket: Aura Agent Hackathon Update

We just added 5 new projects to the community ticker banner!

:tada: Welcome:

That brings us to 20 community submissions and counting!

The variety of projects being built is incredible, spanning healthcare, finance, cybersecurity, geospatial intelligence, misinformation analysis, customer insights, supply chain optimization, tax intelligence, earnings analysis, and more.

:alarm_clock: There are still 15 days left to submit your project.

If you've been thinking about participating, now is a great time to jump in. Complete the course, claim your Aura credits, build something interesting, and share it with the community.

We can't wait to see what the next 15 days bring. :rocket:

HostLens AI - A GraphRAG assistant giving real-time market insights on rental investments using Airbnb data

Agent Name

HostLens AI

What It Does

This agent empowers prospective renters, property managers or even real estate analysts to ask general questions about rental markets, earning potential (ROI & Net Revenue) and investment viability in today's US market. It has access to a Neo4j knowledge graph representing current Airbnb-friendly residential listing data across the US, Cypher analytics, semantic similarity vector search, and a Text2Cypher tool equipped with domain-specific info on regional attributes to give informed responses with smart reasoning.

Motivation

As an Airbnb host myself as well as an avid user of Airbnb when looking for rentals during travel stays, I'm always on the lookout for the best prices. I often monitor market trends in the same area I host my timeshare to give competitive rates to renters, but I also hope to expand my host properties around the country.

My aim was to find a listings site showcasing trends in property listings. Fortunately, Airbnb has a page similar to what I was looking for. However, I had to search the listings manually, and I couldn't easily get the data in one central place through a public data API or a purchasable dataset.

Dataset - Bring Your Own Data (BYOD)

To automate the daily extraction of listings from Airbnb's Friendly webpage & write the data model to Neo4j using an optimized schema, the following ETL process was followed:

   Airflow --> Selenium --> Kafka --> Spark Structured Streaming --> Neo4j

This streaming pipeline ensures the knowledge graph always has the most recent information. The data is never stale and accommodates upserts, inserts & deletes!

It is important to note that one can use any data orchestrator of their choice. I used some Big Data technologies to speed up the data processing & also because I have plans to expand this project in the future to a larger scope, but the official Python-Neo4j connector could replace Kafka & Spark Connect here!

Below is a depiction of the Selenium WebDriver in action scraping listing data from a few cities:


In Airflow's Web UI, we can see our DAG with all of its tasks running successfully on a daily basis:

We now have our data loaded into Neo4j. Below is a subset of the graph with all of its nodes, relationships and properties:

As well as the entire schema visualization and an example subgraph:

A summary of the graph database model the above images illustrate:

Node Structures

| Node Type | Properties | Description |
|---|---|---|
| Country | name: String | Represents a country (e.g., "United States") |
| State | code: String | Represents a state (e.g., "CA", "NY") |
| City | name: String, state: String | Represents a city with its state |
| Listing | name: String, address: String, monthly_rent: Integer, weekly_earnings: Integer, net_revenue: Integer, roi: Float, description: String | Represents an Airbnb listing with financial metrics |

Relationships

| Relationship | Direction (From → To) | Description |
|---|---|---|
| HAS_STATE | Country → State | Country contains states |
| HAS_CITY | State → City | State contains cities |
| HAS_LISTING | City → Listing | City contains listings |

We also can visualize our data by other key metrics and comparisons in a Neo4j dashboard:

Why a Graph Fits

Sure, it's possible to model this same data in an RDBMS table as disconnected entities. The question then is: what do we lose by doing that, and what could be gained by structuring relationships across derived properties in a knowledge graph?

The graph model is compelling because it doesn't treat listings as isolated rows. It models listings through geographic containment. It lets the agent traverse from broad geography to specific properties, then aggregate back upward. That is stronger than a flat table because the agent can reason across various levels:

Listing economics --> City markets --> State trends --> Regional opportunity

The ROI turns rent and earnings into a normalized investment signal, the Net Revenue gives a simple monthly profitability estimate, and the Description supports semantic search as the vector indexes & constraints allow the agent to retrieve listings/cities by meaning. It supports hierarchical market analysis, region-based aggregation, relationship traversal and agentic tool use.
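Aggregating "back upward" can be sketched as a two-level roll-up from listings to cities to states. This is a minimal Python illustration, not the agent's Cypher: the function name and sample listings are hypothetical, and each listing is assumed to already carry the state and city reached through HAS_CITY and HAS_LISTING.

```python
from collections import defaultdict
from statistics import mean


def roi_rollup(listings):
    """Aggregate listing ROI upward: Listing -> City -> State.

    listings: dicts with 'state', 'city', 'roi'.
    Returns (state_avg, city_avg); each city counts once in its state's average.
    """
    by_city = defaultdict(list)
    for listing in listings:
        by_city[(listing["state"], listing["city"])].append(listing["roi"])
    city_avg = {key: mean(rois) for key, rois in by_city.items()}

    by_state = defaultdict(list)
    for (state, _city), avg in city_avg.items():
        by_state[state].append(avg)
    state_avg = {state: mean(avgs) for state, avgs in by_state.items()}
    return state_avg, city_avg


listings = [
    {"state": "CA", "city": "Los Angeles", "roi": 1.0},
    {"state": "CA", "city": "Los Angeles", "roi": 0.5},
    {"state": "CA", "city": "San Diego", "roi": 0.25},
    {"state": "NY", "city": "Buffalo", "roi": 0.4},
]
state_avg, city_avg = roi_rollup(listings)
print(state_avg)  # {'CA': 0.5, 'NY': 0.4}
```

Averaging city averages gives every city equal weight in its state regardless of listing count. Averaging all of a state's listings directly is the other reasonable choice, and it would give CA about 0.58 here.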

Agent Toolset

For the Aura Agent to handle specific types of user requests, we set up three types of data retrieval tools to query the graph.

Cypher Template

This type of tool can be used by the agent to answer inquiries with repeatable queries against the graph. Those questions can revolve around:

  • Searching for listings by rent, earnings, location, ROI & revenue potential
  • Comparing cities, states and regions using graph relationships
  • Identifying high-yield opportunities using monthly rent, weekly earnings, ROI & net revenue

Let's portray the Cypher Template tools we have equipped our agent with.

| Tool | Parameters | Description |
|---|---|---|
| Get Listings by City & State | cityName, stateCode | Use when a user asks for listings in a specific city or city and state combination. |
| Get City Market Overview | cityName | Use for comparing markets and understanding local trends. |
| Get Listing Investment Details | listingIdentifier | Use when a user asks for financial details of a specific property. |
| Get Listings by Monthly Rent Range | minRent, maxRent | Use to return all of a listing's properties when a rent range is given as input. |
| Get Listings by Weekly Earnings Range | minEarnings, maxEarnings | Use to return all of a listing's properties when a weekly earnings range is given as input. |
| Get Listings with Best Earnings Potential | limit | Use to return a listing's info based on high weekly earnings and reasonable rent. |
| Get Listings with Best ROI | limit | Use to return listings with the best ROI, based on the ratio of weekly earnings to monthly rent. |
| Get Listings with Best Revenue | limit | Use to return a listing's info with the best revenue, based on weekly earnings minus monthly rent. |
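Several of these tools reduce to a filter plus a sort. This minimal Python sketch reads the ROI definition literally from the "Get Listings with Best ROI" description (weekly earnings divided by monthly rent) and assumes inclusive rent bounds. The real templates run in Cypher over the Listing properties; the helper names and sample listings here are hypothetical.

```python
def roi(weekly_earnings, monthly_rent):
    """ROI as the tool description words it: weekly earnings versus monthly rent."""
    return weekly_earnings / monthly_rent


def listings_in_rent_range(listings, min_rent, max_rent):
    """Like 'Get Listings by Monthly Rent Range' (inclusive bounds assumed)."""
    return [row for row in listings if min_rent <= row["monthly_rent"] <= max_rent]


def best_roi(listings, limit):
    """Like 'Get Listings with Best ROI': highest ratio first, at most `limit` rows."""
    ranked = sorted(listings, key=lambda row: roi(row["weekly_earnings"], row["monthly_rent"]), reverse=True)
    return ranked[:limit]


listings = [
    {"name": "Loft A", "monthly_rent": 2000, "weekly_earnings": 1000},
    {"name": "Condo B", "monthly_rent": 1000, "weekly_earnings": 800},
    {"name": "House C", "monthly_rent": 3000, "weekly_earnings": 900},
]
print([row["name"] for row in best_roi(listings_in_rent_range(listings, 1000, 2500), limit=2)])
# ['Condo B', 'Loft A']
```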

Example (Multi-Hop):

We see from the agent's thought process & reasoning that multiple tools could be called upon, showing graph traversal from State -> City -> Listing.

Similarity Search

As an earlier screenshot shows, the Listing node has an embedding property. While writing to the graph with Structured Streaming, we created a vector index and then generated text embeddings using OpenAI. This enables our agent to find listings that match natural-language investment goals.

Secondly, let's also portray the Similarity Search tools we have equipped our agent with.

| Tool | Description |
|---|---|
| Search Listings by Semantic Similarity | Find rental listings based on semantic similarity to a given description or query. This is useful for finding properties matching specific criteria, preferences, or amenities not explicitly modeled in the graph. |

Example:

The agent response demonstrates similarity search over listing descriptions/embeddings while still tying the answer to concrete investment metrics.

Text2Cypher

Our previous two agent tools depend heavily on relationships and properties existing in the graph, but there may be scenarios where we want to give the agent additional, related context to answer a question via an ad-hoc query generated from natural language. This is especially true for inquiries about regional comparisons, weather-impact analysis, and description pattern recognition across states/regions.

Lastly, here is a snippet of how we set up this tool with instructions.

| Tool | Description |
|---|---|
| Natural Language to Cypher Tool | Use this ONLY when other tools cannot answer the question. Perfect for regional comparisons, weather-impact analysis, and description pattern recognition across states/regions. WHEN TO USE: Questions requiring analysis of multiple states grouped by region, seasonal patterns, climate impact on listings, geographic trends, or correlation between weather and rental characteristics. |

Example:

The user input doesn't just ask for lookups but also graph aggregation, interpretation & domain reasoning. The agent is able to give a detailed response broken down into organized sections by region.

Reflection

We have now seen how the Aura Agent grounds responses in graph traversals and listing metrics. Previously to implement this type of agent, I would've thought to first create a GraphQL API server with different resolver functions to be used as tools similar to the ones just above. Then I would've built a LangChain, Streamlit and MLflow integration to initialize an LLM, create a chat UI and trace & evaluate the LLM, respectively.

With the Aura Agent, I can (to some extent) leverage this managed platform to replace most of that additional overhead & operational complexity. I have been waiting for this type of service for a while & truly see the power it comes with!

HostLens AI GitHub Repo

MarketMind

The news names one company. Who else just moved, and who quietly won?

A US export ban hit the chip sector. Every name went red. One went green. Nobody typed it.

The green node is SMIC, a Chinese foundry that the same sanction protects. MarketMind flagged it the moment the news landed, and the next day it really closed up +3.45%. I never told it to hunt for a winner. The graph found one on its own.

That's the whole idea. A headline names one company; a curated dependency graph (suppliers, customers, partners, owners, rivals, and the standing sanctions sitting over all of them) works out everyone else in the blast radius: how hard they're hit, along what path, and which way they move. It reasons about exposure, not price.

What is MarketMind?

A reasoning engine on a Neo4j graph, fronted by an Aura agent. You ask in plain English ("what did the 2023 chip export controls do?"), and it finds the event, runs the cascade, shows the path (NVIDIA → TSMC → …, fading each hop), names the regime, and points at the non-obvious name the headline skipped. Every number it hands back comes from a tool query against the graph. It doesn't freestyle.

The graph

Small on purpose: 104 companies, 257 dependency edges, 27 events, and every edge hand-vetted. The point isn't size; it's one rule.

Only four relationship types ever propagate company-to-company: SUPPLIES_TO, COMPETES_WITH, OWNS, PARTNERS_WITH. That's the cascade. Everything else (countries, commodities, themes, sanctions) only seeds the shock at the right doorstep, then gets out of the way. It's never walked. (Let a Sector node into the path and every chipmaker becomes two hops from every other chipmaker, and the whole thing turns to soup. So I don't.)

Two more pieces make SMIC turn green:

  • A standing Condition node. A sanction isn't a one-off event; it's a state of the world that sits there until it's lifted. It CONSTRAINS companies with a sign: minus for the firms it hurts, plus for the domestic one it protects. Tighten it, and the graph re-prices the whole set, beneficiary included, with no new modelling.
  • Validation lives in the graph. Every real next-day move is an ImpactRecord node (2,565 of them), so "did this actually happen?" is a traversal, not a footnote.
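The propagation rule can be sketched as a breadth-first walk that follows only the four company-to-company types and fades the shock at every hop. This is a minimal illustration, not MarketMind's calibrated engine: the per-hop `decay` of 0.5, the per-edge `sign` multiplier, the hop limit and the first-path-wins rule are all assumptions, a standing Condition is modelled simply as extra signed seeds, and the company names and `IN_SECTOR` type are hypothetical.

```python
from collections import deque

# The four company-to-company types that carry the shock
PROPAGATING = {"SUPPLIES_TO", "COMPETES_WITH", "OWNS", "PARTNERS_WITH"}


def impact_cascade(seeds, edges, decay=0.5, max_hops=3):
    """Propagate signed shocks outward from the seed companies.

    seeds: company -> initial signed shock (the company the headline names,
           plus any company a standing Condition constrains, e.g. +0.5 for
           the one it protects).
    edges: (src, rel_type, dst, sign) tuples walked src -> dst; sign is a
           +1/-1 multiplier for how the shock carries over that edge.
    Returns company -> signed impact; the first (shortest) path to a company wins.
    """
    adjacency = {}
    for src, rel, dst, sign in edges:
        if rel in PROPAGATING:  # any other type is never walked
            adjacency.setdefault(src, []).append((dst, sign))

    impact = dict(seeds)
    frontier = deque((company, value, 0) for company, value in seeds.items())
    while frontier:
        node, value, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for nxt, sign in adjacency.get(node, []):
            if nxt not in impact:
                impact[nxt] = value * decay * sign  # fade by `decay` at every hop
                frontier.append((nxt, impact[nxt], hops + 1))
    return impact


edges = [
    ("NamedCo", "SUPPLIES_TO", "CustomerCo", 1),
    ("CustomerCo", "PARTNERS_WITH", "PartnerCo", 1),
    ("NamedCo", "IN_SECTOR", "PeerCo", 1),  # non-propagating type: skipped
]
seeds = {"NamedCo": -1.0, "ProtectedCo": 0.5}
print(impact_cascade(seeds, edges))
# {'NamedCo': -1.0, 'ProtectedCo': 0.5, 'CustomerCo': -0.5, 'PartnerCo': -0.25}
```

Dropping the IN_SECTOR edge from the walk is the point of the rule above: if it were followed, PeerCo would pick up an impact purely from sharing a sector.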

The agent

Seven tools, each doing one honest job.


Agent walkthrough video:

| You ask | It calls | You get |
|---|---|---|
| "what did the 2023 chip controls do?" | find_event → impact_cascade | NVDA −0.66 · AMD −0.51 · TSMC −0.48 · … SMIC +0.35 (up) |
| "did that really happen?" | validate_history | NVDA −7.12% · SMIC +3.45%, misses kept in |
| "why is NVDA exposed?" | company_dependency_map | its suppliers, customers, rivals, owners |
| "what constrains SMIC?" | active_conditions | the export-control rule, and which way it cuts |
| "will NVDA go up tomorrow?" | (no tool) | it declines, politely |

The language model only names the company the news actually hit. The graph does everything past the headline.

And another one, to follow up:

The honest part

This is where most "AI + markets" demos promise you tomorrow's close. I won't, and the backtest is why. (If I could reliably call tomorrow's close, I'd be doing that for a living, not entering a hackathon.) Live data is the next experiment, and it'll face the same honest backtest before I claim a thing. We'll see what happens :upside_down_face:

On 27 held-out events, against real prices, next-day direction lands around 47%: a coin flip. A dumb whole-sector average actually beats me on raw direction, 48.3% to 46.6%. I left that in the README; cropping it felt wrong.

What it does do, measurably: it surfaces about 20% of the movers the headline never named (a headline-only reader catches none of them) and explains each with a path you can audit. The app overlays the real returns with ✓ and ✗, misses and all.
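Both headline numbers are simple ratios. This minimal sketch shows one way to compute them. It assumes a hit means the predicted and actual signs agree and that zero or missing actual moves are skipped; the README's exact definitions (for example, what counts as a "mover") may differ, and every company and value here is hypothetical.

```python
def direction_hit_rate(predicted, actual):
    """Share of companies whose predicted sign matches the real next-day move.

    predicted: company -> signed impact score; actual: company -> next-day return.
    Companies with a zero or missing actual move are skipped.
    """
    pairs = [(predicted[c], actual[c]) for c in predicted if actual.get(c, 0.0) != 0.0]
    if not pairs:
        return 0.0
    hits = sum(1 for p, a in pairs if (p > 0) == (a > 0))
    return hits / len(pairs)


def unnamed_mover_coverage(flagged, movers, named):
    """Share of real movers the headline did not name that the cascade flagged."""
    unnamed = set(movers) - set(named)
    if not unnamed:
        return 0.0
    return len(unnamed & set(flagged)) / len(unnamed)


predicted = {"A": -0.6, "B": 0.3, "C": -0.5}
actual = {"A": -2.0, "B": 1.5, "C": 0.8}
print(direction_hit_rate(predicted, actual))                                  # 2 of 3 signs agree
print(unnamed_mover_coverage({"B", "C"}, {"A", "B", "D", "E"}, named={"A"}))  # 1 of 3 unnamed movers
```

A headline-only reader flags nothing beyond the named company, so its coverage is zero by construction; that is the baseline the cascade is compared against.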

So: not a price oracle. A way to see who's in the blast radius and why, with the receipts. A sharp true claim beats a suspicious one.

There's an app, too

Same graph, same engine, made cinematic. The news lands and a shockwave ripples out in red, fading by hop, SMIC green in the corner. Flip to a world map and the same shock arcs from Washington to Shanghai. Hit the Time Machine and the real next-day numbers drop in with their ✓/✗. Click any node for the per-hop math.

Note: I may have gotten a little carried away making this look like a trading-floor command center, but it's still genuinely useful under the hood. Still iterating; if you'd have built it differently, I'd love to hear how.

App video, with the particle effects, because watching the shock actually ripple outward is half the point :blush:: https://youtu.be/2oStQKmEpBA?si=ffTew9XaNpMigvtc

One footnote, kept honest

The graph and all seven tools are live and verified on Neo4j Aura (Free tier); everything above is real output from the running instance, driven from the Aura console.

Taking the agent external (the public, shareable REST endpoint) is currently held up by an unresolved credit-redemption issue I'm still working through. (UPDATE - it is fixed, see the note below) So rather than wait around, I run the same thing locally, two ways, both checked against the hosted agent:

  • the same graph in a local Neo4j (Docker), with identical seed and records, where the same Cypher-Template tools return the same rows as the hosted agent, and a parity script confirms it agrees to the sign on every one of the 27 events.
  • a local agent on Microsoft's Agent Framework (with Azure OpenAI as the model). It does the same job: the model reads the headline and names the company; the cascade engine (in-process, and at parity with the Cypher) propagates it through the graph.

Same graph, same engine, same answers, but most likely with much more latency, since the Aura Agent is native to and co-located with the Aura DB. The hosted endpoint only adds the public door; the moment the credit-redemption issue is sorted, it goes up with nothing else to change.

Update! - Now it works with the Aura Agent! I left the workaround under a flag - documented in the source code - see below so you can still play with it fully locally :slight_smile:

I have published the app as open source, as the issue with the credit redeeming is gone - thanks @Ari_Neo4j for the super support and incredibly fast reaction time!!!

Check it out here: GitHub - joslat/MarketMind: When news breaks, the damage travels. MarketMind traces it across a Neo4j company-dependency graph: who's hit, how hard, and the non-obvious name the headline never mentioned. Reasoning + exposure, not prediction. · GitHub - and remember to star it!!

(it has all the necessary artifacts for you to recreate the graph database, seed the data, regenerate the golden dataset with the calibration needed and the Aura Agent)

---

Built on Neo4j Aura, a directional cascade engine in three languages kept at parity, and an unreasonable amount of re-checking the numbers.

Agent Name: IPL Cricket Intelligence Agent

What it does: IPL Cricket Intelligence Agent is a conversational AI powered by a Neo4j Aura knowledge graph containing real IPL data from 2008–2024. It answers complex cricket questions through multi-hop graph traversal (from player awards to team rivalries to venue stats), delivering real insights, not just raw numbers.

Dataset and why a graph fits: I used the IPL Complete Dataset (2008–2024) containing 1,095 matches and 260,920 ball-by-ball deliveries. Cricket is naturally a graph problem: Players connect to Teams, Teams connect to Matches, Matches connect to Venues and Seasons. A graph captures these relationships perfectly, enabling queries that flat tables simply cannot answer.

Graph Structure:

  • Nodes: Player, Team, Match, Venue, Season
  • Relationships: PLAYED_FOR, PLAYED_IN, HELD_AT, IN_SEASON, PLAYER_OF_MATCH

Agent in action:

  • "Who has the most Player of Match awards?" → AB de Villiers with 25 awards
  • "CSK vs MI head to head?" → MI leads with 20 wins, CSK 17
  • "Which teams has Virat Kohli played for?" → RCB exclusively
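The first two answers come down to counting over the graph. This minimal Python sketch does that counting over an in-memory list instead of Neo4j. The `team1`/`team2` fields, the helper names and the sample matches are hypothetical (in the graph, both teams reach a Match through PLAYED_IN), and no-result matches are left out of the head-to-head tally.

```python
from collections import Counter


def head_to_head(matches, team_a, team_b):
    """Count wins for each side in fixtures between team_a and team_b.

    matches: dicts with 'team1', 'team2', 'winner' (None for no result).
    """
    wins = Counter()
    for m in matches:
        if {m["team1"], m["team2"]} == {team_a, team_b} and m["winner"] in (team_a, team_b):
            wins[m["winner"]] += 1
    return wins[team_a], wins[team_b]


def most_player_of_match(matches):
    """(player, awards) for the player with the most Player of the Match awards."""
    awards = Counter(m["player_of_match"] for m in matches if m.get("player_of_match"))
    return awards.most_common(1)[0]


matches = [
    {"team1": "CSK", "team2": "MI", "winner": "MI", "player_of_match": "P1"},
    {"team1": "MI", "team2": "CSK", "winner": "CSK", "player_of_match": "P2"},
    {"team1": "CSK", "team2": "MI", "winner": "MI", "player_of_match": "P1"},
    {"team1": "CSK", "team2": "MI", "winner": None, "player_of_match": None},
    {"team1": "RCB", "team2": "MI", "winner": "RCB", "player_of_match": "P3"},
]
print(head_to_head(matches, "CSK", "MI"))  # (1, 2)
print(most_player_of_match(matches))       # ('P1', 2)
```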

Technology Stack:

  • Database: Neo4j Aura Professional (Google Cloud Mumbai)

  • Agent: Neo4j Aura Built-in Agent

  • Query Language: Cypher

Here's the JSON for my agent:

{
  "name": "IPL Cricket Intelligence Agent",
  "description": "An intelligent agent powered by a Neo4j graph of IPL data from 2008-2024, capable of answering questions about players, teams, venues, player-of-match awards, and season performance.",
  "system_prompt": "You are an enthusiastic and insightful IPL Cricket Intelligence Agent. Always provide answers backed by real data from the IPL knowledge graph and share interesting analysis. Ensure your responses convey your passion for cricket.",
  "is_private": true,
  "is_mcp_enabled": false,
  "tools": [
    {
      "name": "Player's Teams",
      "description": "Lists all distinct teams a specific player has played for throughout their IPL career.",
      "enabled": true,
      "type": "cypherTemplate",
      "config": {
        "template": "MATCH (p:Player {name: $player_name})-[:PLAYED_FOR]->(t:Team) RETURN DISTINCT t.name AS Team",
        "parameters": [
          {
            "name": "player_name",
            "data_type": "string",
            "description": "The exact name of the cricket player."
          }
        ]
      }
    },
    {
      "name": "Matches Played by Team",
      "description": "Retrieves information about matches a specific team has played in.",
      "enabled": true,
      "type": "cypherTemplate",
      "config": {
        "template": "MATCH (t:Team {name: $team_name})-[:PLAYED_IN]->(m:Match) RETURN m.date AS Date, m.winner AS Winner, m.player_of_match AS PlayerOfMatch, m.city AS City, m.toss_winner AS TossWinner, m.toss_decision AS TossDecision LIMIT 50",
        "parameters": [
          {
            "name": "team_name",
            "data_type": "string",
            "description": "The exact name of the IPL team."
          }
        ]
      }
    },
    {
      "name": "Matches at Venue",
      "description": "Finds all matches that were held at a particular IPL venue.",
      "enabled": true,
      "type": "cypherTemplate",
      "config": {
        "template": "MATCH (v:Venue {name: $venue_name})<-[:HELD_AT]-(m:Match) RETURN m.date AS Date, m.winner AS Winner, m.player_of_match AS PlayerOfMatch, m.city AS City, m.toss_winner AS TossWinner, m.toss_decision AS TossDecision LIMIT 50",
        "parameters": [
          {
            "name": "venue_name",
            "data_type": "string",
            "description": "The exact name of the IPL venue/stadium."
          }
        ]
      }
    },
    {
      "name": "Player of Match Awards",
      "description": "Lists all matches where a specific player was awarded 'Player of the Match'.",
      "enabled": true,
      "type": "cypherTemplate",
      "config": {
        "template": "MATCH (p:Player {name: $player_name})-[:PLAYER_OF_MATCH]->(m:Match) RETURN m.date AS Date, m.winner AS Winner, m.city AS City, m.season AS SeasonYear LIMIT 50",
        "parameters": [
          {
            "name": "player_name",
            "data_type": "string",
            "description": "The exact name of the cricket player."
          }
        ]
      }
    },
    {
      "name": "Matches in Season",
      "description": "Lists all matches played during a specific IPL season.",
      "enabled": true,
      "type": "cypherTemplate",
      "config": {
        "template": "MATCH (s:Season {year: $season_year})<-[:IN_SEASON]-(m:Match) RETURN m.date AS Date, m.winner AS Winner, m.player_of_match AS PlayerOfMatch, m.city AS City, m.toss_winner AS TossWinner, m.toss_decision AS TossDecision LIMIT 50",
        "parameters": [
          {
            "name": "season_year",
            "data_type": "string",
            "description": "The year of the IPL season (e.g., '2023')."
          }
        ]
      }
    },
    {
      "name": "Natural Language to Cypher Tool",
      "description": "A general-purpose, free-form Text-to-Cypher tool that converts natural language questions into Cypher queries, executes them, and returns the results. Use this option when no specialized tool (e.g. for aggregation or domain-specific logic) is more appropriate.",
      "enabled": true,
      "type": "text2cypher"
    }
  ]
}

Hi, here's my submission!

  1. Agent Name:
    Chess Analytics Assistant

  2. What it does:
    An AI assistant to answer questions about chess games, players, openings, outcomes, and Elo ratings based on Lichess.

  3. Dataset and why a graph fits:
    Lichess dataset: Chess Game Dataset (Lichess) | Kaggle
    Contains information about ~20,000 games. Each game is characterized by two players, an opening, and an outcome (amongst other metrics). A given player may play multiple games, and those games may share openings, giving room for several patterns to emerge (common strategies, pairings, and win/loss trends for these features). The data describes how two players relate to a game and to its outcome, which maps naturally onto a graph of the interactions between these entities.

  4. Screenshot of your agent in the Aura console

  5. Screenshot or short demo of your agent in action
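To make the mapping in item 3 concrete, here is a Python sketch that turns one game row into graph triples. The column names (`id`, `white_id`, `black_id`, `opening_name`, `winner`) follow the public Lichess Kaggle CSV as I understand it, and the relationship names (`PLAYED_WHITE`, `PLAYED_BLACK`, `USED_OPENING`, `WON`) are hypothetical, not taken from the submission.

```python
def game_to_triples(row):
    """Map one Lichess game row to (subject, relationship, object) triples.

    Column names mirror the Kaggle Lichess CSV (assumed); relationship
    names are hypothetical.
    """
    game = ("Game", row["id"])
    white = ("Player", row["white_id"])
    black = ("Player", row["black_id"])
    triples = [
        (white, "PLAYED_WHITE", game),
        (black, "PLAYED_BLACK", game),
        (game, "USED_OPENING", ("Opening", row["opening_name"])),
    ]
    # 'winner' is 'white', 'black' or 'draw'; a draw gets no WON edge.
    if row["winner"] == "white":
        triples.append((white, "WON", game))
    elif row["winner"] == "black":
        triples.append((black, "WON", game))
    return triples


row = {"id": "g1", "white_id": "alice", "black_id": "bob",
       "opening_name": "Sicilian Defense", "winner": "black"}
for triple in game_to_triples(row):
    print(triple)
```

Because the same Player and Opening tuples recur across rows, merging them (MERGE in Cypher) is what turns ~20,000 independent rows into a connected graph where shared openings and repeat pairings become paths.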

Hi Team,

I completed the Aura Agent course and submitted my project for the Aura Agent Hackathon. I haven't received any update regarding the submission status yet.

Could you please check whether my submission was successfully recorded and if there are any further steps required from my side?

Thank you.

:rocket: Aura Agent Hackathon Update

The following community members should now have received their Aura Credits and are ready to build their Aura Agent Hackathon projects:

@student88
@https.ompramod
@Avirall
@Strawberrychewy
@kitushenoy
@sharathaashayadesign
@PriyagiJain
@manjulasharath1122
@Rodrigo-Tenorio
@hanka.medova

:tada: Congratulations and happy building!

Community Momentum

New Projects Added to the Ticker

Please welcome our newest submissions:

We are now up to 24 project submissions and the creativity coming from the community has been incredible. Developers are building graph-powered AI agents across healthcare, cybersecurity, finance, sports analytics, misinformation analysis, geospatial intelligence, supply chains, customer insights, and more.

If you've been thinking about participating, now is the time.

:alarm_clock: Submission deadline: June 15 at 11:59 PM PST

There is just one week remaining to get your project submitted and featured on the community ticker.

Ready to build?

  1. Complete the Aura Agent course
  2. Use your Aura Credits
  3. Build your graph-powered AI agent
  4. Submit your project and share it with the community

Let's see how many projects we can reach before the deadline. :rocket:

My Aura Agent Hackathon Submission: Wortgraph Coach

What it does

Wortgraph turns the German you actually read into a personal knowledge graph in Neo4j Aura, then coaches you over it with a published Aura Agent.

Every vocabulary app stores words as a flat list. But the human mental lexicon is a network: you recall a word through its connections. That mismatch is why flashcards leak. Wortgraph stores vocabulary the way your brain does, as a graph, and the agent answers questions that a list-based app structurally cannot:

  • "What new word should I add next?" → the bridge word that connects clusters you already know
  • "Which weak word do I review first, and why?" → ranked by connectivity × forgetting, not just by score
  • "What am I worst at?" → a collapsing topic cluster, not one stray word

The agent in action:

The AI Coach picks the Get Bridge Words Cypher template and recommends Dürre because the graph shows it connects 7 words the learner already knows.

The agent in the Aura console:

12 tools (10 Cypher templates, Similarity Search, and Text2Cypher), shown with their Cypher queries

Similarity Search tool:

"Find me words about money and debt" is just a click away.

Result: Verschuldung, Geldwäsche, sparen. Meaning-based retrieval over 3,072-dim Gemini embeddings, queried in English or German.

The learning loop that feeds the graph

User Input:

User pastes real German (here: a news piece on Guantánamo). Groq LLaMA extracts the words worth learning, and every word is verified against the Wiktionary API + DWDS corpus, so a hallucinated word can never enter the graph.

Study list:

A list with an English translation of all the added words

Flashcards: the key idea behind Wortgraph

The article (der/die/das) is drilled with every noun, with a real example sentence. Grading a card writes retention back into the graph, so the agent always knows what's slipping.

Playful games and practice hub, generated by Cypher:

Odd-One-Out pulls three semantically grouped words + one outlier from the graph (vector index + traversal), explains WHY you were right or wrong, and shows the exact query that built the round.
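The round-building rule can be sketched in plain Python over an in-memory view of `(:Word)-[:MEANS]->(:Meaning)`: pick a Meaning that at least three words point to, take three of those words, and add one word that does not express that meaning. This is a simplified stand-in that ignores the vector-index part the post mentions; the data shape, the `build_round` name, and the toy deck are hypothetical.

```python
import random


def build_round(means, rng=random):
    """Return (group, outlier, meaning) for one Odd-One-Out round.

    `means` maps each word lemma to the set of Meaning labels it expresses,
    i.e. an in-memory view of (:Word)-[:MEANS]->(:Meaning).
    """
    # Invert the edges: meaning -> words that express it.
    by_meaning = {}
    for word, meanings in means.items():
        for m in meanings:
            by_meaning.setdefault(m, set()).add(word)
    hubs = sorted(m for m, words in by_meaning.items() if len(words) >= 3)
    if not hubs:
        raise ValueError("no meaning hub with at least three words")
    meaning = rng.choice(hubs)
    group = rng.sample(sorted(by_meaning[meaning]), 3)
    # Any word that does not express the chosen meaning can be the outlier.
    outliers = sorted(w for w, ms in means.items() if meaning not in ms)
    if not outliers:
        raise ValueError("no outlier available")
    return group, rng.choice(outliers), meaning


# Toy deck, not the demo deck.
deck = {
    "Schulden": {"debt"},
    "Verschuldung": {"debt"},
    "Verbindlichkeiten": {"debt"},
    "Dürre": {"drought"},
}
group, outlier, meaning = build_round(deck, random.Random(0))
print(meaning, group, outlier)
```

Because the outlier is chosen by the absence of an edge, the "why you were right or wrong" explanation can be read straight off the graph rather than generated.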

The same words become a crossword:

Clues are generated from the learner's own deck: this morning's article is tonight's puzzle.

Dataset and why a graph fits

The learner's own reading, continuously extracted, verified, embedded (gemini-embedding-001, 3,072-dim vector index) and enriched by a Graph-RAG pipeline: vector retrieval grounds an LLM that classifies synonym/antonym/word-form relations, and the typed edges are written back.

Demo deck: 139 words, ~700 co-occurrence edges, 8 topics, 3,500+ relationships across the full graph.

The deck is a graph:

Every saved word linked by CO_OCCURS_WITH from shared sources.

Meaning clusters:

One English concept, every German word that expresses it: (:Word)-[:MEANS]->(:Meaning).

Why a graph fits: the bridge-word insight. The flagship feature is a traversal that finds a word you don't know co-occurring with ≥2 words you do know:

MATCH (u:User {id:'demo'})-[:ADDED]->(known:Word)
MATCH (known)-[:CO_OCCURS_WITH]-(candidate:Word)
WHERE NOT EXISTS { (u)-[:ADDED]->(candidate) }
WITH candidate, count(DISTINCT known) AS bridgeDegree,
     collect(DISTINCT known.lemma)[0..5] AS connectedTo
WHERE bridgeDegree >= 2
RETURN candidate.lemma, bridgeDegree, connectedTo
ORDER BY bridgeDegree DESC LIMIT 5

Two short MATCH clauses in Cypher; self-join misery in SQL. The same goes for graph-aware spaced repetition, where priority = degree × (1 − retention): a connected word you barely know outranks an isolated one. A centrality computation over a personal subgraph is simply not a natural relational query.
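The review-priority formula is simple enough to sketch directly. The Python below ranks words by degree × (1 − retention); the tuple input shape and the alphabetical tie-break are assumptions, and the real app may normalise degree or retention differently.

```python
def review_priority(words):
    """Rank words for review by degree * (1 - retention), highest first.

    `words` is a list of (lemma, degree, retention) tuples, where degree is
    the word's number of CO_OCCURS_WITH neighbours and retention is in [0, 1].
    """
    scored = [(lemma, degree * (1.0 - retention)) for lemma, degree, retention in words]
    # Highest priority first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


deck = [
    ("Verschuldung", 9, 0.4),  # well connected, shaky
    ("Geldwäsche", 2, 0.4),    # isolated, equally shaky
    ("Dürre", 7, 0.9),         # well connected, mostly known
]
for lemma, score in review_priority(deck):
    print(f"{lemma}: {score:.1f}")
```

With equal retention, the connected word wins; a well-known word drops down the list even when it is well connected.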

Schema:

(:User)-[:ADDED {retention, reviewCount}]->(:Word {lemma, article, cefr, translation, embedding[3072]})
(:Word)-[:CO_OCCURS_WITH {strength}]-(:Word)
(:Word)-[:BELONGS_TO]->(:Topic)
(:Word)-[:MEANS]->(:Meaning)
(:Word)-[:SYNONYM_OF|ANTONYM_OF|FORM_OF {confidence, reason}]->(:Word)

Why this needed Neo4j specifically:

  1. Vector search and graph traversal in one query. The Graph-RAG pipeline calls db.index.vector.queryNodes('word_embeddings', …) and walks the learner's ADDED edges in the same Cypher statement: semantic candidates, filtered to the user's deck, in one round trip. The conventional alternative is a separate vector store plus a relational DB: two systems, two queries, a join in application code, and two sources of truth to keep synced. Neo4j's native vector index collapses that to one.
  2. The game logic is the query language. "Three words sharing a meaning hub plus one outlier" and "an unknown word adjacent to ≥2 known words" are single Cypher patterns. They're not stored procedures bolted onto tables; the graph pattern is the game rule, which is why the games can't hallucinate.
  3. The database is the interface. The Graph and Clusters views render the actual stored structure (nodes and relationships) as the learning surface. You can't show a learner a foreign-key diagram and call it a study tool; you can show them their knowledge graph, because in a graph the shape carries the meaning.
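The "vector search plus deck filter" idea from the first point can be sketched without a database: score word embeddings against a query embedding by cosine similarity, keep only lemmas in the learner's deck, and return the top k. This is an in-memory illustration with tiny 3-dimensional toy vectors, not the `db.index.vector.queryNodes` call itself, and `deck_similarity_search` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deck_similarity_search(query_vec, embeddings, deck, k=3):
    """Top-k words by cosine similarity, restricted to the learner's deck.

    `embeddings` maps lemma -> vector; `deck` is the set of lemmas the user ADDED.
    """
    scored = [
        (lemma, cosine(query_vec, vec))
        for lemma, vec in embeddings.items()
        if lemma in deck
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-d vectors; the real index uses 3,072-dim embeddings.
embeddings = {
    "Verschuldung": [0.9, 0.1, 0.0],
    "sparen": [0.8, 0.3, 0.1],
    "Dürre": [0.0, 0.2, 0.9],
    "Kredit": [0.85, 0.2, 0.0],
}
deck = {"Verschuldung", "sparen", "Dürre"}  # "Kredit" is not in the deck
print(deck_similarity_search([1.0, 0.2, 0.0], embeddings, deck, k=2))
```

One design note: when the filter runs after an index lookup that already returned only k neighbours, fewer than k deck words may survive, so asking the index for more candidates than you need is a common workaround.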

Every insight ships with its Cypher:

Bridge words, weak clusters, high-leverage weak words, collocation pairs: each card has a "show Cypher query" toggle. The agent answers WHY, and you can read the pattern that proved it.

The live app runs on the published agent:

The in-app Coach cites priority score, retention and connection degree, and the insight cards below independently agree, because it's one graph underneath.

Tech Stack

  • Neo4j Aura (AuraDB + published Aura Agent)
  • Groq LLaMA 3.3 70B (extraction + Graph-RAG classification)
  • Gemini embeddings (3,072-dim vector index)
  • Wiktionary + DWDS verification
  • Express
  • React

Why this matters: graphs for education

Among agents for healthcare, finance, and cybersecurity, Wortgraph brings the knowledge graph to education, and to one of the most personal datasets there is: what a learner knows. An estimated ~15 million people are learning German worldwide, and for the immigrants and refugees among them, vocabulary retention isn't a hobby metric; it's the speed at which a new life starts working. Wortgraph gives every learner something no textbook has: a map of their own knowledge, and an agent that reasons over it.

Where this goes with more attention:

  • Any language, same graph. The schema is language-agnostic: swap the corpus verification (Wiktionary supports 180+ languages) and the same bridge-word engine works for Spanish, Arabic, or Ukrainian learners.
  • Per-learner published agents: every user gets an agent scoped to their own subgraph, making "a tutor that knows exactly what you know" the default, not the demo.
  • Classroom graphs: a teacher's view over the merged class graph. Which cluster is the whole class weak on? Which bridge word unlocks the most for the most students? The same Cypher, one MATCH wider.
  • MCP everywhere: the agent is already an MCP endpoint; connected to the assistants learners already chat with, every AI conversation becomes vocabulary practice calibrated to their actual level.

Flashcard apps store your effort as a list and let it leak. Wortgraph stores it as a graph, where every word you add strengthens the ones you already have.

Try it out

Build on it

The whole project is MIT-licensed and meant to be taken apart:

  • Run it locally in around 5 minutes: the README has full setup (Neo4j Aura free tier + free Groq/Gemini keys); server/diagnose.js verifies your whole stack with one command
  • Fork it for your language: the schema is language-agnostic; swap the Wiktionary/DWDS verification (Wiktionary covers 180+ languages) and the bridge-word engine works for Spanish, Arabic, Ukrainian…
  • Steal the patterns: the Graph-RAG pipeline (server/relations.js), the agent's 12 tool configs (AURA_AGENT.md), and the Cypher-generated games (src/games/) are all documented for exactly that purpose

Suggestions very welcome: open an issue if you have ideas, especially on better bridge-word ranking, new graph-generated game mechanics, or what a classroom-graph view should look like. If something's broken or confusing when you run it, that's an issue too; I'd rather know.

I just shared a LinkedIn post highlighting the incredible projects being built during the Aura Agent Hackathon.

If your project is featured below, I would greatly appreciate your help amplifying it. Feel free to jump into the comments, tag yourself, share a screenshot, explain your project, or repost it to your network.

Featured projects:

@mabu.mate
@venkatasaiprasadp
@dhiraj.patra
@shantanunitw01
@jevlachov
@deanfoulds
@jonas.koenner
@mahanteshimath
@prashant7090
@ochiengokumu482
@shrikantbhadane
@violet.ols.dev
@tarunnarang1992
@epiphanyan
@wanga.15065
@kumar20051020shivam
@jakub.kierznowskiii
@vinod.nageshwar
@winleexz
@hellopriyanshu4
@ayushdeosingh
@joslat
@priyagijain
@Rodrigo-Tenorio

LinkedIn Post:

Thank you for helping showcase what the Neo4j community is building. :raising_hands:

NaijaHealth - AI Health Intelligence for Kano State, Nigeria

Agent: NaijaHealth | Dataset: DHIS2 Nigeria NHMIS Monthly Summary Form (2019) + Vaccines Utilization | State: Kano - 44 LGAs, ~800+ facilities, 12 months of data

Live app: https://naijahealth.streamlit.app
GitHub: martynscn/NaijaHealth (Chat interface for the Nigerian Health Management Information System built with DHIS2 using Neo4j Aura Agents)



The Problem: Nigeria's Health Data is Trapped in Flat Tables

Nigeria's DHIS2 system collects extraordinary health data, including OPD attendance, ANC visits, immunization coverage, malaria cases, HIV testing, and facility reporting rates, for every health facility in every LGA every month. For Kano State alone, that's 44 LGAs, ~800 facilities, 80+ indicators, and 12 months: over 100,000 data points.

But in its raw form, this data answers only one type of question: "What is the value of indicator X for facility Y in month Z?"

The questions health policymakers actually need to ask are relational:

  • Which LGAs have high Penta 1 vaccination but suspiciously high dropout before Penta 3, and do their reporting rates suggest data quality problems?
  • Which facilities have below-average ANC attendance and are also located in LGAs with poor NHMIS reporting compliance?
  • Are public or private facilities more likely to report on time in Kano Municipal?

These questions cross three to four hops in a single reasoning step: from a DataValue to its Indicator and its Facility, and from the Facility up to its LGA. A relational database answers them with painful multi-join SQL. A graph answers them naturally.


The Dataset and Why a Graph is the Right Structure

Data source: DHIS2 Nigeria, Nigeria's official health management information system, which implements the WHO-aligned NHMIS Monthly Summary Form (2019 version) plus a Vaccines Utilization dataset.

Why Kano State? Kano is Nigeria's most populous state, with more health facilities than any other state and a long history of NHMIS data collection. The 44-LGA structure creates a natural three-level geographic hierarchy (State, LGA, Facility) that is inherently graph-shaped.

Why a graph and not a flat table?

In DHIS2's API, the same data record contains a data element ID, an organisation unit ID, a period string, and a numeric value. There is no explicit link between a facility and its LGA in a row; that relationship lives in a separate hierarchy API call. In a relational system, answering multi-dimensional questions requires assembling four or five joins across tables that were never designed to be queried together.
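A sketch of that assembly step in Python: each DHIS2 data value row carries only IDs, so the LGA has to be looked up in a separately fetched parent map before the row can feed `AT_FACILITY`, `FOR_INDICATOR` and `IN_LGA` edges. The row keys (`dataElement`, `orgUnit`, `period`, `value`) follow the DHIS2 `dataValueSets` JSON format; the `to_graph_records` name, the parent map, and the IDs below are hypothetical placeholders, not the project's pipeline code.

```python
def to_graph_records(data_values, parent_of):
    """Turn DHIS2 data value rows into flat records ready for graph loading.

    data_values: dicts with the DHIS2 keys 'dataElement', 'orgUnit', 'period', 'value'.
    parent_of:   facility orgUnit ID -> LGA orgUnit ID, built from a separate
                 organisation-unit hierarchy call (the rows do not carry it).
    """
    records = []
    for row in data_values:
        period = row["period"]
        # DHIS2 monthly periods look like '202506'; the graph stores 'YYYY-MM'.
        if len(period) == 6 and period.isdigit():
            period = f"{period[:4]}-{period[4:]}"
        facility = row["orgUnit"]
        records.append({
            "value": row["value"],               # kept as a string; queries call toFloat()
            "period": period,
            "indicator_id": row["dataElement"],  # target of FOR_INDICATOR
            "facility_id": facility,             # target of AT_FACILITY
            "lga_id": parent_of.get(facility),   # LGA for IN_LGA; None if not in the hierarchy
        })
    return records


# Placeholder IDs, not real DHIS2 UIDs.
rows = [
    {"dataElement": "DE_OPD", "orgUnit": "FAC_1", "period": "202506", "value": "42"},
    {"dataElement": "DE_ANC1", "orgUnit": "FAC_9", "period": "202507", "value": "7"},
]
parent_of = {"FAC_1": "LGA_ROGO"}
for record in to_graph_records(rows, parent_of):
    print(record)
```

A `None` LGA is worth surfacing at load time rather than silently dropping, since it usually means the hierarchy call and the data call disagree.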

In Neo4j, the relationships are first-class citizens:

(DataValue)-[:AT_FACILITY]->(Facility)-[:IN_LGA]->(LGA)-[:IN_STATE]->(State)
(DataValue)-[:FOR_INDICATOR]->(Indicator)
(ReportingRate)-[:FOR_FACILITY]->(Facility)
(DataValue)-[:AT_LGA]->(LGA)   <- for calculated LGA-level coverage percentages

A question like "which LGAs have both low BCG coverage and poor reporting compliance?" becomes a single graph traversal; no joins, no subqueries:

MATCH (l:LGA)<-[:AT_LGA]-(dv:DataValue)-[:FOR_INDICATOR]->(i:Indicator)
WHERE toLower(i.name) CONTAINS 'bcg' AND dv.period STARTS WITH '2025'
WITH l, avg(toFloat(dv.value)) AS bcg_coverage
WHERE bcg_coverage < 70
MATCH (f:Facility)-[:IN_LGA]->(l)
MATCH (r:ReportingRate)-[:FOR_FACILITY]->(f)
WHERE r.period STARTS WITH '2025'
RETURN l.name AS lga, round(bcg_coverage) AS bcg_pct,
       round(avg(r.rate)) AS avg_reporting_rate
ORDER BY bcg_pct ASC

That is a four-hop traversal that feels like a single question. This is what graphs do that tables cannot.



Graph Schema

Nodes

| Label | Key properties | Notes |
|---|---|---|
| :State | id, name | One node: "Kano" |
| :LGA | id, name | 44 nodes, all prefixed "kn" from DHIS2 |
| :Facility | id, name, type, level, ownership, embedding | ~800+ nodes; ownership derived from DHIS2 organisation groups (keywords: govt/federal/state/lga → Public; faith/ngo/mission → Private) |
| :Indicator | id, name, shortName, category, indicator_type, embedding | 80+ NHMIS data elements plus calculated coverage indicators |
| :DataValue | id, value, period, period_type | 100,000+ records; period format YYYY-MM; period_type = 'monthly' |
| :ReportingRate | id, rate, rate_on_time, expected, actual, period, dataset | dataset = 'NHMIS' or 'Vaccines' for independent compliance tracking |

Relationships

| Relationship | Direction | Meaning |
|---|---|---|
| [:IN_LGA] | Facility → LGA | Geographic containment |
| [:IN_STATE] | LGA → State | Geographic containment |
| [:AT_FACILITY] | DataValue → Facility | NHMIS facility-level monthly reading |
| [:AT_LGA] | DataValue → LGA | Calculated LGA-level coverage / dropout rates |
| [:FOR_INDICATOR] | DataValue → Indicator | What was measured |
| [:FOR_FACILITY] | ReportingRate → Facility | Compliance record per facility per month |

Vector Indexes

Two 1024-dimensional cosine similarity indexes (OpenAI text-embedding-3-small):

  • facility_embeddings: enables semantic queries such as "find PHCs similar to this community health post"
  • indicator_embeddings: enables queries such as "find indicators related to maternal health and skilled delivery"

The Aura Agent: 9 Tools Across All Three Tool Types

The agent (agent/naija-health.json) uses all three Aura Agent tool types: Cypher Templates, Text2Cypher, and Similarity Search. This gives it precise, reliable tools for known query patterns and the flexibility to handle completely open-ended questions.

| # | Tool | Type | What it does |
|---|---|---|---|
| ① | lga_health_overview | Cypher Template | Top indicators for a specific LGA and year |
| ② | compare_lgas_by_indicator | Cypher Template | Rank all 44 LGAs by a keyword-matched indicator |
| ③ | reporting_rates | Cypher Template | Reporting compliance per LGA/facility, filtered by dataset |
| ④ | coverage_rates_by_lga | Cypher Template | BCG / Penta / Measles / TT2 coverage % via [:AT_LGA] edges |
| ⑤ | penta_dropout_by_lga | Cypher Template | Penta 1→3 dropout rates across all LGAs |
| ⑥ | reporting_by_ownership | Cypher Template | Public vs Private facility reporting rate breakdown |
| ⑦ | health_data_query | Text2Cypher | Ad-hoc NL → Cypher for questions outside the templates |
| ⑧ | find_similar_facilities | Similarity Search | Find facilities semantically similar to a description |
| ⑨ | find_similar_indicators | Similarity Search | Find indicators semantically related to a health concept |
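The `penta_dropout_by_lga` tool reports Penta 1→3 dropout, but the post does not spell out the formula. The sketch below assumes the conventional immunisation dropout rate, (Penta1 − Penta3) / Penta1 × 100, computed from dose counts; the function name and inputs are illustrative, and the project may instead read a precomputed DHIS2 indicator.

```python
def penta_dropout_rate(penta1_doses, penta3_doses):
    """Penta 1->3 dropout as a percentage: (Penta1 - Penta3) / Penta1 * 100.

    Returns None when no first doses were recorded, since the rate is undefined.
    A negative result means more third doses than first doses were reported,
    which often flags a data-quality issue rather than real coverage.
    """
    if penta1_doses <= 0:
        return None
    return (penta1_doses - penta3_doses) / penta1_doses * 100


print(penta_dropout_rate(200, 150))
```

Returning `None` instead of 0 keeps LGAs with no recorded first doses from looking like perfect performers when results are ranked.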

The agent's system prompt instructs it to always cite the LGA or facility name, explain what each indicator measures, and reason about why a number is significant, not just return it. This keeps responses anchored in public health context rather than abstract data.



A Sample Conversation: Graph-Driven Multi-Hop Reasoning

Here's a real interaction that demonstrates the kind of insight only a graph can provide:


User: Which LGAs have the highest Penta dropout rates, and do those same LGAs also have poor reporting rates?

NaijaHealth:

Looking at Penta 1→3 dropout rates across all 44 Kano LGAs for 2025–2026, the five LGAs with the highest dropout are:

| LGA | Penta dropout rate | NHMIS reporting rate | Vaccines reporting rate |
|---|---|---|---|
| kn Rogo | 38.2% | 61.4% | 58.9% |
| kn Kibiya | 35.7% | 67.2% | 63.1% |
| kn Garun Mallam | 33.1% | 72.8% | 70.4% |
| kn Warawa | 31.9% | 79.3% | 76.2% |
| kn Bebeji | 29.4% | 81.1% | 79.8% |

A clear pattern emerges: the three LGAs with the highest Penta dropout (Rogo, Kibiya, Garun Mallam) also have the lowest facility reporting rates, below 73% on both NHMIS and Vaccines datasets. This correlation suggests the problem may not be purely vaccine hesitancy or supply chain failure; it may partly reflect reporting gaps where the second and third doses are being administered but not recorded. Rogo's 61.4% NHMIS reporting rate means almost four in ten facilities never submitted a form for that month.

Recommended next step: cross-reference the specific facilities in Rogo and Kibiya that are non-reporters. Are they predominantly public (government) or private (faith-based/NGO) facilities? Run: "Show reporting by ownership in Rogo LGA."


This answer required the agent to:

  1. Traverse Indicator <-[:FOR_INDICATOR]- DataValue -[:AT_LGA]-> LGA (dropout rates)
  2. Independently traverse ReportingRate -[:FOR_FACILITY]-> Facility -[:IN_LGA]-> LGA (reporting compliance)
  3. Join both result sets on LGA name
  4. Reason about why the correlation matters in a public health context

None of these hops are possible in a single flat query against the raw DHIS2 API response.


Architecture


What Makes This Graph-Native, Not Just Graph-Stored

Three things distinguish NaijaHealth from a system that merely loads flat data into Neo4j:

Ownership detection at load time. Facility ownership (Public or Private) is derived by matching DHIS2 organisation group names against keyword lists at load time and stored as a property on the Facility node. The reporting_by_ownership tool then traverses ReportingRate -> Facility (grouped by Facility.ownership) -> LGA in a single pattern, answering policy-relevant questions about public vs. private compliance that would require a separate lookup table in SQL.
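The load-time ownership rule can be sketched as a keyword match over a facility's DHIS2 organisation group names. The keyword lists come from the Facility row of the node table (govt/federal/state/lga → Public; faith/ngo/mission → Private); the whole-word matching, checking Private before Public, and the "Unknown" fallback are assumptions, and the real pipeline may differ.

```python
import re

PUBLIC_KEYWORDS = {"govt", "federal", "state", "lga"}
PRIVATE_KEYWORDS = {"faith", "ngo", "mission"}


def classify_ownership(group_names):
    """Derive 'Public' / 'Private' / 'Unknown' from organisation group names.

    Matching is on whole lowercase words, so 'NGO Facilities' matches 'ngo'
    while 'Stateside' would not match 'state'. Private keywords are checked
    first (an assumption), so a 'Faith-based (State registered)' group is Private.
    """
    words = set()
    for name in group_names:
        words.update(re.findall(r"[a-z]+", name.lower()))
    if words & PRIVATE_KEYWORDS:
        return "Private"
    if words & PUBLIC_KEYWORDS:
        return "Public"
    return "Unknown"


print(classify_ownership(["Govt Facilities"]))
print(classify_ownership(["Faith-based (State registered)"]))
```

Keeping an explicit "Unknown" bucket makes unmatched facilities visible in the public-vs-private breakdown instead of quietly inflating one side.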

Two edge types for DataValue nodes. Coverage indicators (BCG percentage, Penta percentage, dropout rates) are stored as DataValues with [:AT_LGA] edges directly to LGA nodes, while NHMIS facility readings use [:AT_FACILITY] edges. LGA-level analytical queries therefore do not need to aggregate from facilities; they traverse a pre-aggregated edge. A similar split is carried in the dataset property on ReportingRate nodes ('NHMIS' or 'Vaccines'), so compliance for the two datasets can be compared without a join.

Vector indexes on both Facility and Indicator nodes. Semantic similarity search over health indicators means a user asking "find indicators related to skilled birth attendance" retrieves TT2 coverage, ANC visits, and institutional delivery rates without knowing the exact DHIS2 indicator names. The embedding space bridges the terminology gap between how policymakers speak and how DHIS2 names its data elements.


Current Scope and Path to National Scale

NaijaHealth currently covers Kano State, Nigeria's most populous state, with 44 LGAs and approximately 800 health facilities. The graph schema, pipeline, and agent tools are designed to be state-agnostic: loading a second state is a matter of pointing the extraction script at a different DHIS2 organisation unit ID and running the same pipeline.

When scaled to all 36 states and the FCT, NaijaHealth becomes a national health intelligence tool. At that scale it would be directly useful to:

  • Nigeria Federal Ministry of Health (FMoH): for national NHMIS performance monitoring, identifying states and LGAs with persistent reporting gaps, and tracking immunization programme outcomes across the country.
  • International health partners: including WHO Nigeria, UNICEF, USAID-funded programmes (like PEPFAR implementers), and the Global Fund, all of whom rely on NHMIS data for programme planning and accountability.
  • Nigerian health-focused NGOs: organisations such as eHealth4everyone, Solina Centre for International Development and Research (SCIDaR), Plan International amongst others could use a national NaijaHealth instance to identify underperforming areas, target interventions, and track reporting compliance across their programme states.
  • State Primary Health Care Development Agencies (SPHCDAs): each state agency could query its own LGA-level data without needing SQL or data science capacity; the agent handles the query translation.
  • Researchers and journalists covering public health, maternal and child health, or disease surveillance in Nigeria.

The only engineering change required to go national is scale: more facilities, more LGAs, more DataValue nodes. The graph relationships, the agent tools, and the reasoning logic remain identical. A graph database handles this kind of hierarchical scale better than any flat schema could.


Tech Stack

| Component | Technology |
|---|---|
| Knowledge Graph | Neo4j Aura (eu-central-1) |
| Graph Agent | Neo4j Aura Agent: 6 Cypher Templates, 1 Text2Cypher, 2 Similarity Search |
| Embeddings | OpenAI text-embedding-3-small (1024 dimensions) |
| Data Source | DHIS2 Nigeria REST API |
| Pipeline | Python 3.11; requests, neo4j driver, openai, python-dotenv |
| Chat UI | Streamlit (deployed on Streamlit Community Cloud) |
| MCP Server | FastMCP (stdio); custom tools for Claude Code and Claude Desktop |
| Auth | OAuth2 client_credentials; Bearer token obtained via HTTP Basic Auth, 30-minute cache |
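The Auth row describes an OAuth2 client-credentials flow with a cached Bearer token. Below is a minimal sketch of the caching part, assuming the token request is wrapped in a callable that returns the token and its lifetime in seconds. The `TokenCache` class, the `fetch` callable, and the 60-second early-refresh margin are hypothetical; the token endpoint itself is not shown in the post. An injected clock keeps the sketch testable without network access.

```python
import time


class TokenCache:
    """Cache a Bearer token and refresh it shortly before it expires.

    `fetch` is a hypothetical callable that performs the client_credentials
    request (client ID/secret via HTTP Basic Auth) and returns
    (access_token, expires_in_seconds).
    """

    def __init__(self, fetch, max_age=30 * 60, skew=60, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age  # never reuse a token longer than this (30 minutes)
        self._skew = skew        # refresh this many seconds early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, expires_in = self._fetch()
            lifetime = min(expires_in, self._max_age) - self._skew
            self._token = token
            self._expires_at = now + max(lifetime, 0)
        return self._token

    def auth_header(self):
        return {"Authorization": f"Bearer {self.get()}"}
```

Capping reuse at the smaller of the server's `expires_in` and the 30-minute limit means a shorter-lived token from the server is still respected.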

Aura Agent MCP endpoint: https://mcp.neo4j.io/agent?project_id=fa2ec884-6eff-48e0-bc6b-391bd018771e&agent_id=63d8f04d-9549-44c0-a38f-13ccf475847a


Data source: DHIS2 Nigeria, NHMIS Monthly Summary Form (Version 2019). Kano State, 44 LGAs, ~800+ facilities, 2025-06 to 2026-05.

Movie Graph Agent: Neo4j Aura Hackathon

Project: Movie Graph Agent | Tech: Neo4j Aura Free · TMDB Dataset · Neo4j Built-in Agent · Python


Ever wanted to just ask a database a question and get a proper answer back? That's exactly what this project does, but with a knowledge graph powering it under the hood.

I built a Movie Graph Agent on Neo4j Aura using the TMDB dataset. The idea was simple: load real movie data into a graph database, then put a conversational agent on top of it so anyone can query it in plain English, with no Cypher knowledge required.

What I built:

500 movies from the TMDB dataset were loaded into Neo4j Aura Free using a Python script (load_movies.py). Each movie is connected to its directors, cast, genres, and keywords as graph nodes and relationships, with its rating stored as a movie property, because movie data isn't tabular; it's a network. Who directed what, who acted with whom, which genres overlap: all of this lives naturally in a graph.

On top of that graph, I set up a Neo4j Aura built-in agent: no LangChain, no external orchestration framework. Just Neo4j's native agent feature with a custom prompt instruction that makes it behave like a knowledgeable movie expert. It reads the graph, reasons over it, and answers in plain conversational English.

Sample queries it handles:

  • "What movies did Christopher Nolan direct?"
  • "Show me all Action movies"
  • "Which actors appear most frequently in Drama films?"

Why graph over a regular database? Relational databases struggle when you start asking connected questions: "Find me thriller movies featuring actors who also worked with Nolan" becomes a multi-join query mess. In Neo4j, that's just a pattern match. The graph makes the agent smarter because the data structure itself encodes relationships.

Everything runs on Neo4j Aura Free: no server setup, no cost, fully cloud-hosted. The agent is deployed as an Internal instance on the Aura project, previewed and tested directly from the Aura console.

This was a fun exploration of how much you can do with just a graph + a built-in agent, without reaching for heavy ML infrastructure.
:wrench: Data Loading Code (Python)

Connected to Neo4j Aura and loaded 500 movies from TMDB dataset. Each movie is linked to its genres, actors, directors, and keywords as separate nodes with relationships.

Graph schema:

(:Movie)-[:HAS_GENRE]->(:Genre)
(:Actor)-[:ACTED_IN]->(:Movie)
(:Movie)-[:DIRECTED_BY]->(:Director)
(:Movie)-[:HAS_KEYWORD]->(:Keyword)

load_movies.py

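import json
import os

import pandas as pd
from neo4j import GraphDatabase

# Connection details from environment variables (variable names are placeholders)
URI = os.environ["NEO4J_URI"]
USERNAME = os.environ["NEO4J_USERNAME"]
PASSWORD = os.environ["NEO4J_PASSWORD"]

def safe_parse(value):
    # Assumed helper: TMDB stores genres/keywords/cast/crew as JSON strings; return [] if missing or malformed
    try:
        return json.loads(value)
    except (TypeError, ValueError):
        return []

driver = GraphDatabase.driver(URI, auth=(USERNAME, PASSWORD))

movies  = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')
# The credits file's first column is movie_id; rename it to id so the merge key matches
credits.columns = ['id', 'title', 'cast', 'crew']
df = movies.merge(credits[['id', 'cast', 'crew']], on='id').head(500)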

def load_data(tx, row):
    genres    = [g['name'] for g in safe_parse(row['genres'])]
    keywords  = [k['name'] for k in safe_parse(row['keywords'])][:5]
    cast      = [c['name'] for c in safe_parse(row['cast'])][:5]
    directors = [c['name'] for c in safe_parse(row['crew']) if c.get('job') == 'Director']

    tx.run("""
        MERGE (m:Movie {id: $id})
        SET m.title=$title, m.overview=$overview, m.rating=$rating, m.year=$year
    """, ...)

    # Genre, Actor, Director, Keyword nodes + relationships created via MERGE
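The final comment in `load_data` stands in for the relationship writes. One way to write them is a batched `UNWIND ... MERGE` per relationship type, matching the graph schema in this post. The hypothetical helper below only builds `(query, parameters)` pairs so it can be inspected without a database; the helper name and the `name` key on Genre/Actor/Director/Keyword nodes are assumptions, since the post does not show its exact Cypher.

```python
# Relationship patterns from the post's schema; node keys ('name') are assumed.
REL_QUERIES = {
    "genres": "MATCH (m:Movie {id: $id}) UNWIND $names AS n "
              "MERGE (g:Genre {name: n}) MERGE (m)-[:HAS_GENRE]->(g)",
    "cast": "MATCH (m:Movie {id: $id}) UNWIND $names AS n "
            "MERGE (a:Actor {name: n}) MERGE (a)-[:ACTED_IN]->(m)",
    "directors": "MATCH (m:Movie {id: $id}) UNWIND $names AS n "
                 "MERGE (d:Director {name: n}) MERGE (m)-[:DIRECTED_BY]->(d)",
    "keywords": "MATCH (m:Movie {id: $id}) UNWIND $names AS n "
                "MERGE (k:Keyword {name: n}) MERGE (m)-[:HAS_KEYWORD]->(k)",
}


def relationship_statements(movie_id, genres, cast, directors, keywords):
    """Build (query, params) pairs for one movie, skipping empty lists."""
    lists = {"genres": genres, "cast": cast, "directors": directors, "keywords": keywords}
    return [
        (REL_QUERIES[key], {"id": movie_id, "names": names})
        for key, names in lists.items()
        if names
    ]


# Inside load_data, each pair would be passed on as tx.run(query, params).
stmts = relationship_statements(1, ["Action", "Drama"], ["Actor A"], ["Director B"], [])
for query, params in stmts:
    print(params)
```

`UNWIND` over an empty list simply produces no rows, so skipping empty lists is not required for correctness; it just saves a round trip per movie.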

:rocket: Aura Agent Hackathon Update

We've added three more projects to the community ticker banner!

Please welcome:

The showcase continues to grow with developers building graph-powered AI agents across healthcare, finance, cybersecurity, sports analytics, customer intelligence, geospatial intelligence, and more.

You can explore all current submissions here:

:backhand_index_pointing_right: Aura Agent Hackathon Project Showcase

Aura Credits Sent

The following community members should now have received their Aura Credits and are ready to build their Aura Agent Hackathon projects:

@mayursahare51
@roger.barrow1
@esto.sarat
@sandeepchakravartty

:alarm_clock: There is still time to participate.

The Aura Agent Hackathon submission deadline is June 15 at 11:59 PM PST.

Ready to build?

:backhand_index_pointing_right: 🏆 Start Here: Register & Get Aura Credits: Aura Agent Hackathon

We look forward to seeing what you create.