Explore the data behind the Women’s World Cup with our World Cup Graph.
The Women’s World Cup 2019 started on Friday, and over the weekend we dusted off our World Cup scraping scripts and created a Women’s World Cup Graph.
Image courtesy of https://www.behance.net/gallery/79721663/FIFA-Womens-world-cup-poster
TL;DR
We have a hosted version of a World Cup Graph on a Neo4j Cloud instance at 5d37db5a.databases.neo4j.io. You can log in with the username worldcup and password worldcup. Once you’ve logged in, run :play womens-worldcup-queries.
Graph Model
We’ve made some tweaks to the graph model that we used for Men’s World Cup 2018, so let’s have a look at our new and improved model.
World Cup Graph Model
Let’s start with the Tournament node. We have one Tournament node for each World Cup tournament, so there are 8 of these nodes, one for each of the tournaments from 1991 to 2019. Teams participate in these tournaments, so we create a PARTICIPATED_IN relationship between Team and Tournament nodes.
Squads are NAMED by Teams FOR each of these tournaments, and a Person can either be in a squad or be the coach for that squad.
After exploring the data, I realised that people only ever play for one team, so we have a REPRESENTS relationship between the Person and Team nodes.
The relationship between Person and Match has been simplified from the previous model. We’ve removed the concept of Appearance, and now have direct relationships from Person to Match. The PLAYED_IN relationship is used both for players who start a match and for those who come on as a substitute.
And finally, Teams play in Matches. We capture the result of the match on the PLAYED_IN relationships.
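To make the model a little more concrete, here is a minimal sketch of a query that walks from teams to their squads to the players in them. It uses the IN_SQUAD and FOR relationships that appear in the queries later in this post. The Squad label is an assumption on our part, and NAMED is matched without a direction because the description above doesn't state which way it points.

```cypher
// Sketch: list the players named in each team's squad for the 2019 tournament.
// Assumption: squad nodes carry the label Squad.
// NAMED is matched in either direction because its direction is not stated above.
MATCH (team:Team)-[:NAMED]-(squad:Squad)-[:FOR]->(:Tournament {year: 2019}),
      (player:Person)-[:IN_SQUAD]->(squad)
RETURN team.name AS team, collect(player.name) AS players
ORDER BY team
```

If the query returns nothing, dropping the Squad label from the pattern is the first thing to try.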
Show me the data!
We have a Neo4j Browser guide that you can use to import the data into your own local Neo4j instance if you want to play along.
:play womens-worldcup
We’ve also got a hosted version of a World Cup Graph on a Neo4j Cloud instance at 5d37db5a.databases.neo4j.io. You can log in with the username worldcup and password worldcup.
If you use that one, you don’t need to bother with the data import and can go straight to the queries by running the following guide:
:play womens-worldcup-queries
Let’s have a look at some of the queries that we can run against this dataset.
Which teams have played in every World Cup?
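// Pair every team with the full list of tournaments
MATCH (tournament:Tournament), (team:Team)
WITH team, collect(tournament) AS tournaments
// Keep only teams that participated in every one of them
WHERE all(t in tournaments WHERE (team)-[:PARTICIPATED_IN]->(t))
// Return each remaining team's PARTICIPATED_IN paths
RETURN [(team)-[:PARTICIPATED_IN]->()]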
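The query above returns participation paths rather than team names. If you’d prefer a plain list of names, one possible variant compares each team’s participation count with the total number of tournaments. This is a sketch rather than the query from the guide; it relies only on the Team, Tournament and PARTICIPATED_IN names used above.

```cypher
// Count all tournaments once
MATCH (t:Tournament)
WITH count(t) AS total
// Count the distinct tournaments each team took part in
MATCH (team:Team)-[:PARTICIPATED_IN]->(tournament:Tournament)
WITH team, total, count(DISTINCT tournament) AS played
// Keep teams that appeared in all of them
WHERE played = total
RETURN team.name AS team
ORDER BY team
```

Counting distinct tournaments means a duplicated PARTICIPATED_IN relationship, should one exist, would not inflate a team’s total.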
Who won the previous World Cups?
MATCH (t1:Team)-[p1:PLAYED_IN]-(m:Match)<-[p2:PLAYED_IN]-(t2:Team),
      (m)-[:IN_TOURNAMENT]->(tourn)
// id(t1) < id(t2) stops each final from being returned twice
WHERE id(t1) < id(t2) AND m.stage = "Final"
RETURN tourn.name AS name, tourn.year AS year,
       t1.name AS team1, t2.name AS team2,
       // Append the penalty shoot-out score when the match score was level
       CASE WHEN p1.score = p2.score
            THEN p1.score + "-" + p2.score + " (" +
                 p1.penaltyScore + "-" + p2.penaltyScore + ")"
            ELSE p1.score + "-" + p2.score
       END AS result,
       // The winner is decided on score first, then on penalties
       (CASE WHEN p1.score > p2.score THEN t1
             WHEN p2.score > p1.score THEN t2
             ELSE
               CASE WHEN p1.penaltyScore > p2.penaltyScore THEN t1
                    ELSE t2 END END).name AS winner
ORDER BY tourn.year
Who are the top scorers across all the World Cups?
MATCH (p:Person)-[:SCORED_GOAL]->(match)-[:IN_TOURNAMENT]->(tourn),
      (p)-[:REPRESENTS]->(team)
RETURN p.name, team.name AS team, count(*) AS goals,
       apoc.coll.sort(collect(DISTINCT tourn.year)) AS years
ORDER BY goals DESC
LIMIT 10
Who’s the top scorer playing in the 2019 World Cup?
MATCH (p:Person)-[:SCORED_GOAL]->(match)-[:IN_TOURNAMENT]->(tourn),
      (p)-[:REPRESENTS]->(team)
WITH p, team, count(*) AS goals,
     apoc.coll.sort(collect(DISTINCT tourn.year)) AS years
WHERE (p)-[:IN_SQUAD]->()-[:FOR]->(:Tournament {year: 2019})
RETURN p.name, team.name AS team, goals
ORDER BY goals DESC
LIMIT 10
Next Steps
We hope you enjoy the dataset. If you have any questions or suggestions on what we should do next, let us know in the comments or send us an email at devrel@neo4j.com.
We encourage you to take the data and build your own APIs, applications, or analysis notebooks on top of it. We’d love to hear all about your ideas.
Now Available: Women’s World Cup 2019 Graph was originally published in neo4j on Medium, where people are continuing the conversation by highlighting and responding to this story.