Turn your data into something that actually thinks.
Build an AI agent on top of a knowledge graph that understands context, not just answers. Learn fast, build something real, and put your skills to work.
All qualified submissions get our brand-new "Agent" T-shirt! *
*One T-Shirt per mailing address
My Aura Hackathon Submission: Apple HealthGraph Agent
What it does
HealthGraph Agent turns your Apple Health data into a knowledge graph that reasons about your body. It answers questions no health app can: "Why was my recovery terrible last Thursday?" "What pattern precedes my best sleep?" "How does strength training vs. running affect my HRV differently?"
Apple Health collects 50+ metric types daily, but stores them as disconnected time series. You can see what happened, but never why. HealthGraph Agent connects the dots by building a graph where workouts, sleep sessions, heart rate variability, resting heart rate, blood oxygen, and activity rings are linked through temporal and causal relationships. The agent then uses multi-hop graph reasoning to surface patterns that are invisible in flat dashboards.
Dataset and why a graph fits
Dataset: Apple Health XML export (or synthetic data for people without an iPhone/Apple Watch). The ETL pipeline parses the raw export.xml using streaming XML processing, transforms ~48,000 records per year into graph-ready structures, and batch-loads into AuraDB.
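The streaming-parse step can be sketched in a few lines. This is a minimal illustration, not the project's actual ETL: the `stream_records` function name and the dict shapes are assumptions, but the element and attribute names (`Record`, `Workout`, `type`, `value`, `startDate`, `workoutActivityType`, `duration`) match Apple's export.xml schema, and `iterparse` plus `elem.clear()` is what keeps memory flat on a multi-year export.

```python
# Hedged sketch of the streaming ETL parse. iterparse yields elements as they
# close, so even a 48,000-record export never lives in memory all at once.
import xml.etree.ElementTree as ET

def stream_records(path_or_file):
    """Yield one dict per <Record>/<Workout>, freeing each element as we go."""
    for _event, elem in ET.iterparse(path_or_file, events=("end",)):
        if elem.tag == "Record":
            yield {
                "kind": "record",
                "type": elem.get("type"),        # e.g. HKQuantityTypeIdentifier...
                "value": elem.get("value"),
                "start": elem.get("startDate"),
            }
            elem.clear()  # release memory held by the parsed element
        elif elem.tag == "Workout":
            yield {
                "kind": "workout",
                "activity": elem.get("workoutActivityType"),
                "duration_min": float(elem.get("duration", 0)),
            }
            elem.clear()
```

The resulting dicts can then be batched into `UNWIND $rows ...` Cypher statements for the AuraDB load.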
Why a graph fits (this could be the key insight):
Health data is inherently relational. A table can tell you your HRV was 28ms on Thursday. A graph can tell you why:
That HIIT session on Thursday → poor recovery on Friday isn't visible in any time-series chart. But in the graph, it's a two-hop traversal. The agent follows these relationship chains to explain causality, not just correlation.
The FOLLOWED_BY relationship between workouts and sleep sessions is where the graph gets powerful: it captures temporal causation with the hours_between property, allowing the agent to reason about recovery windows.
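One way the hours_between property could be derived at load time is shown below. This is a sketch under assumptions: the function name, the 24-hour window, and the workout/sleep dict shapes are illustrative, not the repo's actual code; the idea of pairing each workout with the next sleep session and stamping the gap on the edge is from the description above.

```python
# Illustrative derivation of FOLLOWED_BY edges with an hours_between property:
# each workout is linked to the next sleep session starting within a day.
from datetime import datetime

def followed_by_edges(workouts, sleeps, max_gap_hours=24):
    """Return (workout_id, sleep_id, hours_between) tuples for batch loading."""
    edges = []
    for w in workouts:
        candidates = [s for s in sleeps if s["start"] > w["end"]]
        if not candidates:
            continue
        nxt = min(candidates, key=lambda s: s["start"])  # nearest later sleep
        gap = (nxt["start"] - w["end"]).total_seconds() / 3600
        if gap <= max_gap_hours:
            edges.append((w["id"], nxt["id"], round(gap, 1)))
    return edges
```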
Scale: ~700 nodes and ~1,200 relationships for 12 months of data. 366 Day nodes, 366 DailySummary nodes, ~220 Workout nodes, 366 SleepSession nodes, 53 Week nodes, plus Device and MetricType nodes. Small graph, rich reasoning.
Agent tools
| Tool | Type | What it enables |
| --- | --- | --- |
| Weekly overview | Cypher Template | "How was my health last week?" → returns daily metrics across 8 dimensions |
| Workout → sleep impact | Cypher Template | "How does running affect my sleep?" → multi-hop traversal from workout through day to next sleep session and next-day HRV |
| Free-form health queries | Text2Cypher | Any natural language question about health patterns → agent generates Cypher from the graph schema |
| Pattern matching | Similarity Search | "Find days like my best recovery day" → vector similarity on DailySummary text embeddings |
Example conversation
User: "How does strength training affect my sleep compared to running?"
Agent reasoning (multi-hop):
Matches all Workouts where activity_type = 'TraditionalStrengthTraining'
Follows FOLLOWED_BY → SleepSession to get post-workout sleep duration
Follows ON_DAY → Day → NEXT_DAY → Day → HAS_SUMMARY → DailySummary for next-day HRV
Repeats for activity_type = 'Running'
Compares the two distributions
Agent response: After strength training sessions (avg 50 min), you sleep 7.4 hours on average with next-day HRV of 38ms. After running sessions (avg 35 min), sleep drops to 6.9 hours but next-day HRV recovers to 44ms. Your body recovers faster from cardio than resistance training; consider scheduling strength sessions earlier in the week when you can afford a slower recovery.
This answer requires 4 hops through the graph. No flat database can produce it.
What makes this different
1. The graph drives the insight, not just stores data. Every answer traces a relationship path. "Your HRV dropped because..." always cites the specific workout → sleep → recovery chain that explains it.
2. Anyone can use it. Don't have an Apple Watch? The repo includes a synthetic data generator with 4 persona profiles (athlete, biohacker, sedentary, default) that produce 12 months of realistic, correlated health data. One command: GENERATE=1 PERSONA=biohacker bash scripts/run_pipeline.sh
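A persona-driven generator can be surprisingly small. The sketch below is an assumption-heavy illustration, not the repo's generator: the persona parameters, the 8ms post-workout HRV dip, and all function names are invented for this example. The point it demonstrates is the correlation structure: HRV dips the day after a workout, so downstream graph queries find the same workout → recovery pattern real data shows.

```python
# Minimal persona-profile generator producing correlated synthetic days.
# All numbers here are illustrative, not the project's actual parameters.
import random

PERSONAS = {
    "athlete":   {"workouts_per_week": 6, "base_hrv": 70, "sleep_mean": 7.8},
    "biohacker": {"workouts_per_week": 4, "base_hrv": 55, "sleep_mean": 7.5},
    "sedentary": {"workouts_per_week": 1, "base_hrv": 35, "sleep_mean": 6.5},
}

def generate_days(persona, n_days, seed=0):
    rng = random.Random(seed)  # seeded RNG keeps runs reproducible
    p = PERSONAS[persona]
    days, worked_out_yesterday = [], False
    for d in range(n_days):
        worked_out = rng.random() < p["workouts_per_week"] / 7
        # HRV is depressed the morning after a workout: the correlation
        # the agent's multi-hop queries are meant to surface.
        hrv = p["base_hrv"] - (8 if worked_out_yesterday else 0) + rng.gauss(0, 3)
        sleep = p["sleep_mean"] + rng.gauss(0, 0.5)
        days.append({"day": d, "workout": worked_out,
                     "hrv": round(hrv, 1), "sleep_h": round(sleep, 2)})
        worked_out_yesterday = worked_out
    return days
```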
3. The agent explains its reasoning. When the agent says "your best recovery days follow yoga sessions," it shows the Cypher path it traversed and the specific nodes it visited. The reasoning tab makes the graph traversal transparent.
4. It's a real tool for a real community. The quantified self and biohacking community (millions of Apple Watch users tracking their health) currently has no way to query cross-metric correlations. This agent fills that gap.
DepGraph Agent turns your PyPI dependency tree into a knowledge graph that reasons about supply chain security. It answers questions no health app (sorry, no security tool) can: "What's the blast radius of werkzeug?" "Why is my app exposed to this CVE?" "How does a vulnerability in a low-level library propagate up through 4 hops to my top-level packages?"
PyPI packages have hundreds of transitive dependencies, but most tools only show direct CVEs. You can see what is vulnerable, but never why your app is affected. DepGraph connects the dots by building a graph where packages, dependency relationships, and known OSV vulnerabilities are linked through explicit edges. The agent uses multi-hop graph reasoning to trace propagation paths that are invisible in flat dashboards.
Dataset and why a graph fits
Dataset: Two free public APIs (no Kaggle, no signup):
PyPI JSON API (pypi.org/pypi/<pkg>/json): live package metadata and dependency lists
OSV API (api.osv.dev/v1/query): real CVE/GHSA vulnerability data for PyPI packages
25 seed packages (flask, django, requests, cryptography, pillow, urllib3, celery, etc.) + 1 hop of their dependencies → 269 packages, 801 real vulnerabilities.
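The OSV side of the ingestion is a single JSON POST per package. To keep this sketch runnable offline it only builds the request body (the helper name is mine); the real pipeline POSTs it to https://api.osv.dev/v1/query and reads the `vulns` array from the response.

```python
# Build the OSV v1/query request body for a PyPI package.
# Sending it (e.g. with urllib.request or requests) returns {"vulns": [...]}.
import json

def osv_query_payload(package, version=None, ecosystem="PyPI"):
    payload = {"package": {"name": package, "ecosystem": ecosystem}}
    if version:
        payload["version"] = version  # omit to get all known vulns for the package
    return json.dumps(payload)
```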
Why a graph fits (this is the key insight):
A table can tell you werkzeug has CVEs. Only a graph can tell you why your app is affected:
That flask → werkzeug → CVE chain isn't visible in any dependency scanner's flat output. But in the graph, it's a two-hop traversal. The agent follows these relationship chains to explain causality, not just correlation.
When Log4Shell dropped in 2021, teams needed to know their blast radius across 3β4 hops of transitive dependencies. That's a graph traversal problem. Recursive CTEs in SQL break at scale and can't explain the path. A graph returns the full chain in milliseconds and makes the reasoning transparent.
Follows (upstream)-[:DEPENDS_ON*1..4]->(werkzeug) → finds flask, starlette, and others
Returns affected packages with hop count, vuln count, and sample CVEs
LLM synthesizes the chain into a human explanation
Agent response:
Werkzeug 3.1.3 carries 20 known vulnerabilities including CVE-2023-25577 (HIGH, path injection) and CVE-2023-46136 (HIGH, DoS via multipart parsing). Flask directly depends on werkzeug (1 hop), meaning any application using flask is transitively exposed. The dependency chain is: your-app → flask → werkzeug → [CVE]. Starlette also has werkzeug in its dependency tree (2 hops) with 7 additional vulnerabilities in the chain.
This answer requires 3 hops through the graph. No flat database produces it.
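The blast-radius traversal `(upstream)-[:DEPENDS_ON*1..4]->(target)` is, in graph terms, a reverse breadth-first search over dependency edges. The sketch below makes that explicit in plain Python; the toy edge list and function name stand in for the real graph in AuraDB.

```python
# Reverse BFS over DEPENDS_ON edges: which packages are within max_hops
# (upstream) of a vulnerable target, and at what depth?
from collections import deque

def blast_radius(depends_on, target, max_hops=4):
    """depends_on: list of (pkg, dependency) edges. Returns {pkg: hop_count}."""
    reverse = {}
    for pkg, dep in depends_on:
        reverse.setdefault(dep, []).append(pkg)  # invert the edge direction
    hops, queue = {}, deque([(target, 0)])
    while queue:
        node, h = queue.popleft()
        if h == max_hops:
            continue  # mirror the *1..4 bound in the Cypher pattern
        for upstream in reverse.get(node, []):
            if upstream not in hops:
                hops[upstream] = h + 1
                queue.append((upstream, h + 1))
    return hops
```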
What makes this different
1. The graph drives the insight, not just stores data.
Every answer traces a relationship path. "Your app is exposed because..." always cites the specific package → dependency → CVE chain. The reasoning is transparent: the agent shows which tool it used and what graph data it retrieved.
2. Real data, live APIs.
No synthetic datasets. Every vulnerability is pulled live from the OSV database. Every dependency edge is pulled live from PyPI. The graph reflects the actual state of the ecosystem today.
3. Semantic vulnerability search.
Vulnerability embeddings (3072-dim via Gemini) enable queries like "find CVEs similar to SQL injection in ORM layer", matching by attack pattern rather than CVE ID. This surfaces related vulnerabilities across different packages that keyword search misses entirely.
4. It solves a real problem for a real community.
Every Python developer has transitive dependencies they don't fully understand. DepGraph makes the hidden exposure visible and explainable: not just a list of CVEs, but the exact path through which each one reaches your code.
Aura Credits sent! Check the email address you used when you completed the Graph Academy course and signed up on the form.
Huge shoutout to everyone below: your Aura Credits have been sent. If you signed up and your name is not below, please DM me, as you were missing some important information needed to receive your credits.
The next batch of credits will be sent out later this week!
We can't wait to see what you build with Aura. Go create something awesome and share it with the community. Can't wait to get your projects on the ticker!
I have completed the course and received the credits. I have started analysis for the hackathon so that I can develop real-life solutions with Aura, graphs, RAG, and agents.
Thanks for this opportunity.
Update! Thanks to your reports, I was able to prioritize this with our support team. The team resolved it, and the updated fix is live. I tested it and have redeemed my credits :).
Please give me a thumbs up here once you are able to claim them.
Hello Neo4j Community. Earlier today, I completed the course. I would like my $100 Aura credits posted to my account. Thank you for your help.
Thank you. Your credits will be sent within 2-3 business days by email with redemption instructions. If you do not receive them by Friday morning, please DM me.
"Which Basel IV controls is this bank missing and why?"
"What is the full counterparty contagion chain if this bank fails?"
"Which regulatory rule was violated in this risk event?"
"Find past incidents semantically similar to this new breach"
"What controls would have prevented this risk event?"
"Which Basel IV rules supersede older Basel III rules?"
The agent traverses a knowledge graph of financial institutions, regulatory obligations, control implementations, counterparty exposures, and historical risk events to explain WHY something is compliant or risky, not just whether it is. Every answer cites the exact graph relationship path that produced it.
What makes this unique: the agent is also published as a live MCP server endpoint and connected directly to Claude.ai via MCP connector β judges can query it live from Claude right now.
Dataset and why a graph fits
The graph models the global banking regulatory landscape with fictional institution names:
| Node Type | Count | Description |
| --- | --- | --- |
| Entity | 8 | Banks across US, UK, Germany, Switzerland, India |
| Regulation | 5 | Basel IV, Basel III, DORA, Dodd-Frank, SREP |
| Rule | 10 | Regulatory rules with supersession chains |
| Control | 10 | Compliance controls linked to specific rules |
| RiskEvent | 7 | Historical incidents with severity and financial impact |
| Regulator | 6 | Enforcement bodies with jurisdiction |
| Total | 46 nodes | 96 relationships, 10 relationship types |
Why only a graph solves this:
A compliance officer asking "Is Horizon Bank compliant with Basel IV?" needs to traverse:
That is a 4-hop traversal. The missing controls are the difference between those two sets. No SQL query or flat table can express this cleanly, and it gets worse when you add counterparty contagion:
This is a multi-hop path traversal with relationship property aggregation β total USD exposure accumulated across each hop. The graph makes the reasoning transparent and auditable, which is exactly what regulators demand.
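Once the two traversals have run (controls the regulation requires, controls the bank implements), the compliance gap itself reduces to a set difference. The sketch below uses the graph's own vocabulary (REQUIRES_CONTROL, IMPLEMENTS) but toy data and an invented function name.

```python
# Compliance gap = controls required by the regulation minus controls
# the bank actually implements (the two ends of the 4-hop traversal).
def missing_controls(requires_control, implements, bank, regulation):
    """requires_control: {regulation: set_of_controls};
    implements: {bank: set_of_controls}."""
    required = requires_control.get(regulation, set())
    implemented = implements.get(bank, set())
    return sorted(required - implemented)
```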
Additional graph-native features:
SUPERSEDES chain: Basel IV rules supersede Basel III rules, enabling regulatory evolution tracking
TRIGGERED_BY: links risk events directly to the rule that was breached
AFFECTED: tracks which entities were impacted by contagion from a risk event
Vector embeddings (3072-dim Gemini) on RiskEvent nodes for semantic similarity search
Agent Tools
| Tool | Type | What it does |
| --- | --- | --- |
| Missing Regulatory Controls for Bank | Cypher Template | 4-hop traversal finding controls required by regulation but not implemented by the bank |
| Counterparty Contagion Paths | Cypher Template | Multi-hop COUNTERPARTY_OF traversal up to 3 hops with cumulative USD exposure |
| Rules Violated by Risk Event | Cypher Template | Links RiskEvent → Rule → Regulation → Entity for full violation context |
| Bank Regulations and Enforcement | Cypher Template | Entity → Regulation → Regulator with jurisdiction and authority level |
| Superseding Basel Rules | Cypher Template | SUPERSEDES chain showing Basel IV replacing Basel III rules |
| Preventive Controls for Risk Event | Cypher Template | RiskEvent → Rule → Control with implementation status and prevention assessment |
| Search Similar Risk Events | Similarity Search | 3072-dim Gemini embeddings on risk event descriptions via vector cosine index |
| Natural Language to Cypher | Text2Cypher | Last-resort fallback for ad-hoc aggregation queries |
Example interactions
1. Compliance gap analysis (Cypher Template)
Query: "What regulatory controls is Horizon Bank missing?"
Response: Identifies 3 missing controls (Leverage Ratio Monitor: Basel IV, daily; Stress Testing Framework: Basel III + SREP, quarterly; NSFR Monitor: Basel III + Basel IV, monthly) with regulation, frequency, purpose, and risk exposure summary.
2. Counterparty contagion (Cypher Template)
Query: "Show me counterparty contagion paths from Horizon Bank"
Response: Returns 10 multi-hop paths with full chain traversal, cumulative USD exposure per path, hop count, and circular risk loops where contagion returns to the originating bank; total systemic exposure quantified.
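The contagion template's core logic (walk COUNTERPARTY_OF chains up to 3 hops, accumulating USD exposure and flagging loops back to the origin) can be sketched in plain Python. The edge dict, function name, and toy amounts are illustrative; the real tool does this as a single Cypher traversal with relationship property aggregation.

```python
# Enumerate counterparty contagion chains from an origin bank, accumulating
# USD exposure per hop. A path ending back at the origin is a circular loop.
def contagion_paths(exposures, origin, max_hops=3):
    """exposures: {(a, b): usd} directed COUNTERPARTY_OF edges.
    Returns a list of (path, cumulative_usd) tuples."""
    adj = {}
    for (a, b), usd in exposures.items():
        adj.setdefault(a, []).append((b, usd))
    results = []

    def walk(node, path, total):
        if len(path) - 1 == max_hops:
            return
        for nxt, usd in adj.get(node, []):
            if nxt in path and nxt != origin:
                continue  # skip cycles that don't close on the origin
            results.append((path + [nxt], total + usd))
            if nxt != origin:  # closing the loop ends the chain
                walk(nxt, path + [nxt], total + usd)

    walk(origin, [origin], 0)
    return results
```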
3. Semantic incident matching (Similarity Search)
Query: "Find past risk events similar to a ransomware attack on banking infrastructure"
Response: Matches EVT-007 (score 0.837, ransomware on third-party vendor) and EVT-003 (score 0.861, IT outage disrupting operations) with semantic reasoning explaining why each event matches.
4. Preventive control analysis (Cypher Template)
Query: "What controls would have prevented risk event EVT-001?"
Response: Traces EVT-001 → TRIGGERED_BY → BASEL4-LIQ-1 → REQUIRES_CONTROL → Liquidity Coverage Ratio Monitor, confirms the control was NOT IMPLEMENTED at Alpine Investment Bank, and explains the causal chain.
What makes this different from other submissions
1. Graph drives every single answer. No response is a simple lookup. Every answer traces a multi-hop relationship path and explains the compliance reasoning. The agent never returns a fact without the graph path that produced it.
2. Live MCP integration with Claude.ai. The agent is published as an MCP server and connected directly to Claude.ai. This is not just a demo inside Aura console β it works as a real tool inside Claude conversations right now.
3. Real-world enterprise domain. Financial regulatory compliance is a $50B/year industry problem. This agent addresses the exact compliance gap analysis and systemic risk assessment work that banks do manually today.
4. All 3 tool types used strategically. Cypher Templates for precise multi-hop traversals (6 templates), Similarity Search for semantic incident matching with 3072-dim Gemini embeddings, and Text2Cypher as a true last-resort fallback, exactly per Neo4j best practice.
5. Regulatory evolution tracking. The SUPERSEDES relationship between Basel IV and Basel III rules is unique to this submission: it models how regulations evolve over time, enabling queries like "which of our controls were designed for Basel III rules that Basel IV has now superseded?"
6. Explainability built in. Every answer states which rule requires which control, which regulator enforces which regulation, and which relationship path produced the answer, making the reasoning auditable and transparent.
Dhiraj Patra
AI, Agentic Workflows Lead & Architect | 10+ Years Driving Enterprise AI Innovation
Senior Engineering Lead | AI/ML Architect | Technical Architect EY β embedded within BNY Mellon Enterprise Risk Technology
28+ years experience | Patent holder in EDGE AI
What it does: CineGraph AI is a cinematic intelligence agent that transforms 100,000 movies into a reasoning engine. It performs multi-hop graph traversals to explain why certain creators and genres dominate the industry, moving beyond simple keyword search to true relational analysis.
Dataset and why a graph fits: I used the Global Movies Dataset (1950-2026), which includes 100,000 movies and over 400,000 relationships. Movies are inherently connected by Directors, Actors, Genres, and Eras. A graph is the perfect fit because it captures the causality chains (the ripple effect of a successful collaboration or a genre trend) that a flat table simply cannot represent.
Scale: 115,000+ nodes, 400,000+ relationships.
Technology Stack
Database: Neo4j Aura (Knowledge Graph)
Reasoning Engine: Neo4j Aura Agent (Powered by LLM with multi-hop graph traversal)
Data Pipeline: Python with neo4j driver and pandas