Movie Graph Agent: Neo4j Aura Hackathon
Project: Movie Graph Agent
Tech: Neo4j Aura Free · TMDB Dataset · Neo4j Built-in Agent · Python
Ever wanted to just ask a database a question and get a proper answer back? That's exactly what this project does, with a knowledge graph powering it under the hood.
I built a Movie Graph Agent on Neo4j Aura using the TMDB dataset. The idea was simple: load real movie data into a graph database, then put a conversational agent on top of it so anyone can query it in plain English, no Cypher knowledge required.
What I built:
I loaded 500 movies from the TMDB dataset into Neo4j Aura Free using a Python script (load_movies.py). Each movie is connected to its directors, cast, genres, and keywords through graph nodes and relationships, with its rating stored as a property on the movie node. Movie data isn't tabular; it's a network. Who directed what, who acted with whom, which genres overlap: all of this lives naturally in a graph.
On top of that graph, I set up a Neo4j Aura built-in agent: no LangChain, no external orchestration framework. Just Neo4j's native agent feature with a custom prompt instruction that makes it behave like a knowledgeable movie expert. It reads the graph, reasons over it, and answers in plain conversational English.
Sample queries it handles:
- "What movies did Christopher Nolan direct?"
- "Show me all Action movies"
- "Which actors appear most frequently in Drama films?"
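For a sense of what has to happen behind the scenes, the three questions above could translate to Cypher roughly as follows. This is a sketch, not the agent's actual output: it uses the project's graph schema and assumes that Director, Genre, and Actor nodes keep their text in a `name` property, which this write-up doesn't confirm. `SAMPLE_QUERIES` and `run_sample` are hypothetical names, and `run_sample` expects an open driver from the neo4j Python driver 5.x.

```python
# Hypothetical Cypher for the sample questions; not the agent's actual output.
# Assumption: Director, Genre and Actor nodes store their text in a `name` property.
SAMPLE_QUERIES = {
    "What movies did Christopher Nolan direct?": (
        """
        MATCH (m:Movie)-[:DIRECTED_BY]->(:Director {name: $name})
        RETURN m.title AS title
        ORDER BY title
        """,
        {"name": "Christopher Nolan"},
    ),
    "Show me all Action movies": (
        """
        MATCH (m:Movie)-[:HAS_GENRE]->(:Genre {name: $name})
        RETURN m.title AS title
        ORDER BY title
        """,
        {"name": "Action"},
    ),
    "Which actors appear most frequently in Drama films?": (
        """
        MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)-[:HAS_GENRE]->(:Genre {name: $name})
        RETURN a.name AS actor, count(m) AS films
        ORDER BY films DESC
        LIMIT 10
        """,
        {"name": "Drama"},
    ),
}


def run_sample(driver, question):
    """Run one sample question on an open neo4j driver and return plain dicts."""
    query, params = SAMPLE_QUERIES[question]
    # execute_query returns a (records, summary, keys) named tuple
    records, _, _ = driver.execute_query(query, parameters_=params)
    return [record.data() for record in records]
```

The point of the agent is that nobody has to write these by hand, but seeing them makes it easier to sanity-check its answers.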
Why a graph over a regular database? Relational databases struggle when you start asking connected questions: "Find me thriller movies featuring actors who also worked with Nolan" becomes a multi-join query mess. In Neo4j, that's just a pattern match. The graph makes the agent smarter because the data structure itself encodes relationships.
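To make the pattern-match claim concrete, the Nolan question could be written roughly like this. It is a sketch against the project's graph schema, with one assumption the write-up doesn't confirm: that Director, Actor, and Genre nodes expose a `name` property. `CO_WORKER_GENRE_QUERY` and `PARAMS` are hypothetical names.

```python
# Sketch of "thriller movies featuring actors who also worked with Nolan".
# Assumption: Director, Actor and Genre nodes expose a `name` property.
CO_WORKER_GENRE_QUERY = """
MATCH (:Director {name: $director})<-[:DIRECTED_BY]-(:Movie)<-[:ACTED_IN]-(a:Actor)
MATCH (a)-[:ACTED_IN]->(m:Movie)-[:HAS_GENRE]->(:Genre {name: $genre})
RETURN m.title AS title, collect(DISTINCT a.name) AS shared_actors
ORDER BY title
"""

# Query parameters are kept separate from the Cypher text
PARAMS = {"director": "Christopher Nolan", "genre": "Thriller"}
```

Each arrow in the pattern stands in for a join that a relational schema would typically route through a linking table. As written, the query also returns thrillers directed by Nolan himself; excluding those would take one extra WHERE clause.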
Everything runs on Neo4j Aura Free β no server setup, no cost, fully cloud-hosted. The agent is deployed as an Internal instance on the Aura project, previewed and tested directly from the Aura console.
This was a fun exploration of how much you can do with just a graph + a built-in agent, without reaching for heavy ML infrastructure.
Data Loading Code (Python)
The script connects to Neo4j Aura and loads 500 movies from the TMDB dataset. Each movie is linked to its genres, actors, directors, and keywords as separate nodes with relationships.
Graph schema:
(:Movie)-[:HAS_GENRE]->(:Genre)
(:Actor)-[:ACTED_IN]->(:Movie)
(:Movie)-[:DIRECTED_BY]->(:Director)
(:Movie)-[:HAS_KEYWORD]->(:Keyword)
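Since the loader creates nodes with MERGE, one optional addition is a uniqueness constraint per label, which also gives MERGE an index to look up against. The write-up doesn't say whether the project did this, so treat the following as a sketch in Neo4j 5 constraint syntax. The key properties (`id` for Movie, `name` for the other labels) and the helper names are assumptions.

```python
# Assumed key property per label: Movie uses `id` (as in load_movies.py);
# `name` for the other labels is an assumption.
UNIQUE_KEYS = {
    "Movie": "id",
    "Genre": "name",
    "Actor": "name",
    "Director": "name",
    "Keyword": "name",
}


def constraint_statements(keys=UNIQUE_KEYS):
    """Build one idempotent CREATE CONSTRAINT statement per label."""
    return [
        f"CREATE CONSTRAINT {label.lower()}_{prop}_unique IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
        for label, prop in keys.items()
    ]


def create_constraints(driver):
    """Apply the constraints using an open neo4j driver (5.x)."""
    for statement in constraint_statements():
        driver.execute_query(statement)
```

Running `create_constraints(driver)` once before loading would be enough, since `IF NOT EXISTS` makes repeat runs harmless.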
load_movies.py
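import pandas as pd
from neo4j import GraphDatabase

# URI, USERNAME and PASSWORD hold the Aura connection details (not shown in this excerpt)
driver = GraphDatabase.driver(URI, auth=(USERNAME, PASSWORD))

movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')
# Rename the credits columns so both frames share an 'id' join key
credits.columns = ['id', 'title', 'cast', 'crew']
df = movies.merge(credits[['id', 'cast', 'crew']], on='id').head(500)

def load_data(tx, row):
    # safe_parse (not shown in this excerpt) turns a column string into a list of dicts
    genres = [g['name'] for g in safe_parse(row['genres'])]
    keywords = [k['name'] for k in safe_parse(row['keywords'])][:5]  # first 5 keywords only
    cast = [c['name'] for c in safe_parse(row['cast'])][:5]  # first 5 cast members only
    directors = [c['name'] for c in safe_parse(row['crew']) if c.get('job') == 'Director']
    # MERGE keeps the load idempotent: re-running updates a movie instead of duplicating it
    tx.run("""
        MERGE (m:Movie {id: $id})
        SET m.title=$title, m.overview=$overview, m.rating=$rating, m.year=$year
    """, ...)
    # Genre, Actor, Director, Keyword nodes + relationships created via MERGE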
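The excerpt leaves out `safe_parse` and the relationship step, so here is one way they might look. This is a guess at the shape, not the project's actual code: it assumes the TMDB columns hold JSON-encoded lists of objects, that related nodes are keyed on a `name` property, and `RELATIONSHIP_QUERIES` and `link_related_nodes` are hypothetical names.

```python
import json


def safe_parse(value):
    """Parse a JSON-list column; return [] for missing or malformed input."""
    if not isinstance(value, str):
        return []  # e.g. NaN from pandas for an empty cell
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []


# One Cypher statement per kind of related node, following the graph schema.
# Assumption: related nodes are identified by a `name` property.
RELATIONSHIP_QUERIES = {
    "genres": """
        MATCH (m:Movie {id: $id})
        UNWIND $names AS name
        MERGE (g:Genre {name: name})
        MERGE (m)-[:HAS_GENRE]->(g)
    """,
    "cast": """
        MATCH (m:Movie {id: $id})
        UNWIND $names AS name
        MERGE (a:Actor {name: name})
        MERGE (a)-[:ACTED_IN]->(m)
    """,
    "directors": """
        MATCH (m:Movie {id: $id})
        UNWIND $names AS name
        MERGE (d:Director {name: name})
        MERGE (m)-[:DIRECTED_BY]->(d)
    """,
    "keywords": """
        MATCH (m:Movie {id: $id})
        UNWIND $names AS name
        MERGE (k:Keyword {name: name})
        MERGE (m)-[:HAS_KEYWORD]->(k)
    """,
}


def link_related_nodes(tx, movie_id, names_by_kind):
    """MERGE related nodes and relationships for one movie inside a transaction."""
    for kind, names in names_by_kind.items():
        if names:  # skip empty lists to avoid a pointless round trip
            tx.run(RELATIONSHIP_QUERIES[kind], id=movie_id, names=names)
```

Inside `load_data`, this could be called as `link_related_nodes(tx, int(row['id']), {"genres": genres, "cast": cast, "directors": directors, "keywords": keywords})`. Using UNWIND sends each list in a single statement rather than one query per name.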


