My Aura Hackathon Submission: Apple HealthGraph Agent
What it does
HealthGraph Agent turns your Apple Health data into a knowledge graph that reasons about your body. It answers questions no health app can: "Why was my recovery terrible last Thursday?" "What pattern precedes my best sleep?" "How does strength training vs. running affect my HRV differently?"
Apple Health collects 50+ metric types daily, but stores them as disconnected time series. You can see what happened, but never why. HealthGraph Agent connects the dots by building a graph where workouts, sleep sessions, heart rate variability, resting heart rate, blood oxygen, and activity rings are linked through temporal and causal relationships. The agent then uses multi-hop graph reasoning to surface patterns that are invisible in flat dashboards.
Dataset and why a graph fits
Dataset: Apple Health XML export (or synthetic data for people without an iPhone/Apple Watch). The ETL pipeline parses the raw export.xml using streaming XML processing, transforms ~48,000 records per year into graph-ready structures, and batch-loads into AuraDB.
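A minimal sketch of the streaming step, assuming Python's standard-library `iterparse` and a tiny inline sample in place of a real export.xml. The helper names `stream_records` and `batches` are hypothetical and are not taken from the repo. The filter on quantity-type records is also an assumption, made so that every `value` parses as a number.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for export.xml; a real export holds tens of thousands of records.
SAMPLE_EXPORT = b"""<?xml version="1.0" encoding="UTF-8"?>
<HealthData locale="en_US">
  <Record type="HKQuantityTypeIdentifierHeartRateVariabilitySDNN" unit="ms"
          startDate="2024-03-07 06:10:00 +0000" value="28"/>
  <Record type="HKQuantityTypeIdentifierRestingHeartRate" unit="count/min"
          startDate="2024-03-07 00:00:00 +0000" value="72"/>
  <Workout workoutActivityType="HKWorkoutActivityTypeRunning" duration="35"/>
</HealthData>
"""


def stream_records(source):
    """Yield one dict per quantity <Record> without loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        is_quantity = elem.get("type", "").startswith("HKQuantityTypeIdentifier")
        if elem.tag == "Record" and is_quantity:
            yield {
                "type": elem.get("type"),
                "unit": elem.get("unit"),
                "start": elem.get("startDate"),
                "value": float(elem.get("value")),
            }
        elem.clear()  # drop the element's contents so memory stays flat


def batches(rows, size):
    """Group rows into lists of at most `size`, ready for a batched write."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


records = list(stream_records(io.BytesIO(SAMPLE_EXPORT)))
for batch in batches(records, 1000):
    print(f"would load {len(batch)} records")
```

Each batch could then be passed as a single parameter to one write query, which keeps the number of round trips to the database small.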
Why a graph fits (this may be the key insight):
Health data is inherently relational. A table can tell you your HRV was 28ms on Friday. A graph can tell you why:
(Workout:HIIT {duration:45min, energy:540kcal})
-[:ON_DAY]-> (Day:Thursday)
-[:NEXT_DAY]-> (Day:Friday)
-[:HAS_SUMMARY]-> (DailySummary {hrv_mean:28, resting_hr:72})
That HIIT session on Thursday → poor recovery on Friday isn't visible in any time-series chart. But in the graph, it's a three-hop traversal: ON_DAY, then NEXT_DAY, then HAS_SUMMARY. The agent follows these relationship chains to explain causality, not just correlation.
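To make the chain concrete, here is a small in-memory sketch that walks ON_DAY, NEXT_DAY, and HAS_SUMMARY from the HIIT workout to the following day's summary. The dictionaries, node ids, and the `follow` helper are hypothetical stand-ins for the graph stored in AuraDB. The property values are the ones from the example above.

```python
# Hypothetical stand-in for the graph: (source node, relationship) -> target node.
EDGES = {
    ("workout:hiit", "ON_DAY"): "day:thursday",
    ("day:thursday", "NEXT_DAY"): "day:friday",
    ("day:friday", "HAS_SUMMARY"): "summary:friday",
}
NODES = {
    "workout:hiit": {"activity_type": "HIIT", "duration_min": 45, "energy_kcal": 540},
    "summary:friday": {"hrv_mean": 28, "resting_hr": 72},
}


def follow(start, path):
    """Walk the relationships in `path` from `start`; return None if a hop is missing."""
    node = start
    for rel in path:
        node = EDGES.get((node, rel))
        if node is None:
            return None
    return node


summary_id = follow("workout:hiit", ["ON_DAY", "NEXT_DAY", "HAS_SUMMARY"])
next_day_summary = NODES[summary_id]
print(summary_id, next_day_summary)
```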
Graph schema:
(:Person)-[:USES]->(:Device)-[:RECORDED]->(:Workout)
(:Workout)-[:ON_DAY]->(:Day)-[:HAS_SUMMARY]->(:DailySummary)
(:Workout)-[:FOLLOWED_BY {hours_between}]->(:SleepSession)
(:SleepSession)-[:ON_DAY]->(:Day)
(:Day)-[:NEXT_DAY]->(:Day)
(:Day)-[:PART_OF]->(:Week)
The FOLLOWED_BY relationship between workouts and sleep sessions is where the graph gets powerful — it captures temporal causation with the hours_between property, allowing the agent to reason about recovery windows.
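One way such edges could be derived during ETL is sketched below. It assumes that hours_between is measured from the end of the workout to the start of the next sleep session, and that pairs more than 24 hours apart are not linked. Both choices, and the function name `link_followed_by`, are illustrative assumptions rather than the pipeline's documented behavior.

```python
from datetime import datetime


def link_followed_by(workouts, sleeps, max_gap_hours=24):
    """Pair each workout with the first sleep session that starts after it ends."""
    ordered = sorted(sleeps, key=lambda s: s["start"])
    edges = []
    for w in workouts:
        nxt = next((s for s in ordered if s["start"] >= w["end"]), None)
        if nxt is None:
            continue  # no later sleep session recorded
        hours = (nxt["start"] - w["end"]).total_seconds() / 3600
        if hours <= max_gap_hours:  # assumed cutoff, not from the source
            edges.append(
                {"workout": w["id"], "sleep": nxt["id"], "hours_between": round(hours, 2)}
            )
    return edges


# Hypothetical sample data
WORKOUTS = [{"id": "w1", "end": datetime(2024, 3, 7, 18, 45)}]
SLEEPS = [
    {"id": "s0", "start": datetime(2024, 3, 6, 23, 0)},
    {"id": "s1", "start": datetime(2024, 3, 7, 23, 15)},
]

followed_by = link_followed_by(WORKOUTS, SLEEPS)
print(followed_by)
```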
Scale: ~700 nodes and ~1,200 relationships for 12 months of data. 366 Day nodes, 366 DailySummary nodes, ~220 Workout nodes, 366 SleepSession nodes, 53 Week nodes, plus Device and MetricType nodes. Small graph, rich reasoning.
Agent tools
| Tool | Type | What it enables |
|---|---|---|
| Weekly overview | Cypher Template | "How was my health last week?" — returns daily metrics across 8 dimensions |
| Workout → sleep impact | Cypher Template | "How does running affect my sleep?" — multi-hop traversal from workout through day to next sleep session and next-day HRV |
| Free-form health queries | Text2Cypher | Any natural language question about health patterns → agent generates Cypher from the graph schema |
| Pattern matching | Similarity Search | "Find days like my best recovery day" — vector similarity on DailySummary text embeddings |
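The pattern-matching tool relies on vector similarity. As a rough illustration of the idea, the sketch below ranks days by cosine similarity to a reference day. The three-dimensional vectors are made-up toy values (real text embeddings have hundreds of dimensions), and in the actual agent the search would run inside the database rather than in Python.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_id, embeddings, k=2):
    """Return the ids of the k days whose vectors are closest to the query day."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vec), day) for day, vec in embeddings.items() if day != query_id
    ]
    scored.sort(reverse=True)
    return [day for _score, day in scored[:k]]


# Toy embeddings keyed by day (hypothetical values)
EMBEDDINGS = {
    "2024-03-02": [0.9, 0.1, 0.3],  # reference "best recovery" day
    "2024-03-09": [0.8, 0.2, 0.3],
    "2024-03-05": [0.1, 0.9, 0.2],
}

print(most_similar("2024-03-02", EMBEDDINGS, k=1))
```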
Example conversation
User: "How does strength training affect my sleep compared to running?"
Agent reasoning (multi-hop):
- Matches all Workouts where activity_type = 'TraditionalStrengthTraining'
- Follows FOLLOWED_BY → SleepSession to get post-workout sleep duration
- Follows ON_DAY → Day → NEXT_DAY → Day → HAS_SUMMARY → DailySummary for next-day HRV
- Repeats for activity_type = 'Running'
- Compares the two distributions
Agent response: After strength training sessions (avg 50 min), you sleep 7.4 hours on average with next-day HRV of 38ms. After running sessions (avg 35 min), sleep drops to 6.9 hours but next-day HRV recovers to 44ms. Your body recovers faster from cardio than resistance training — consider scheduling strength sessions earlier in the week when you can afford a slower recovery.
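This answer traverses four relationships in the graph. A flat table could only produce it through a chain of joins, including a self-join on days for the next-day step, which is far harder to write and to explain.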
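The reasoning steps above might translate into something like the following sketch. The Cypher string follows the schema in this post, but the property `s.duration_hours` is an assumed name, and the query is shown only as text, not executed. The rows are hypothetical sample results, grouped and averaged in Python to mirror the final comparison step.

```python
from collections import defaultdict
from statistics import mean

# Assumed query shape; s.duration_hours is a hypothetical property name.
WORKOUT_IMPACT_QUERY = """
MATCH (w:Workout {activity_type: $activity_type})-[:FOLLOWED_BY]->(s:SleepSession)
MATCH (w)-[:ON_DAY]->(:Day)-[:NEXT_DAY]->(:Day)-[:HAS_SUMMARY]->(ds:DailySummary)
RETURN w.activity_type AS activity_type,
       s.duration_hours AS sleep_hours,
       ds.hrv_mean AS next_day_hrv
"""

# Hypothetical rows, one per workout, shaped like the query's RETURN clause.
ROWS = [
    {"activity_type": "TraditionalStrengthTraining", "sleep_hours": 7.5, "next_day_hrv": 37},
    {"activity_type": "TraditionalStrengthTraining", "sleep_hours": 7.3, "next_day_hrv": 39},
    {"activity_type": "Running", "sleep_hours": 7.0, "next_day_hrv": 45},
    {"activity_type": "Running", "sleep_hours": 6.8, "next_day_hrv": 43},
]


def compare(rows):
    """Average post-workout sleep and next-day HRV per activity type."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["activity_type"]].append(row)
    return {
        activity: {
            "n": len(items),
            "avg_sleep_hours": round(mean(r["sleep_hours"] for r in items), 2),
            "avg_next_day_hrv": round(mean(r["next_day_hrv"] for r in items), 1),
        }
        for activity, items in groups.items()
    }


summary = compare(ROWS)
for activity, stats in summary.items():
    print(activity, stats)
```

With only averages, two activity types can look different by chance, so a real comparison would also want the sample size (returned here as `n`) and some measure of spread.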
What makes this different
1. The graph drives the insight, not just stores data. Every answer traces a relationship path. "Your HRV dropped because..." always cites the specific workout → sleep → recovery chain that explains it.
2. Anyone can use it. Don't have an Apple Watch? The repo includes a synthetic data generator with 4 persona profiles (athlete, biohacker, sedentary, default) that produce 12 months of realistic, correlated health data. One command: GENERATE=1 PERSONA=biohacker bash scripts/run_pipeline.sh
3. The agent explains its reasoning. When the agent says "your best recovery days follow yoga sessions," it shows the Cypher path it traversed and the specific nodes it visited. The reasoning tab makes the graph traversal transparent.
4. It's a real tool for a real community. The quantified self and biohacking community (millions of Apple Watch users tracking their health) currently has no easy way to query cross-metric correlations in plain language. This agent fills that gap.
