Explore a Neo4j Graph of football (soccer) transfers in the Summer 2019 Window
The Football Transfer window is currently open, which means players are moving around the globe for increasingly exorbitant fees.
Transfer Markt — The best place to keep track of transfersThe best place to keep track of what’s going on is the Latest Transfers page of the transfermarkt website, which captures details of the players, clubs, and fees involved.
Irfan and I were trying to work out where the money was flowing based on the transfers, so we decided to create a Neo4j Graph to help us out.
The scraping code to get the data from the transfermarkt website is in the mneedham/football-transfers repository. The transfers.json file contains JSON documents for all the transfers that happened since June 2019.
Loading the data
One line of the transfers.json file looks like this:
{
"season":"2019/2020",
"player":{
"href":"/antoine-griezmann/profil/spieler/125781",
"name":"Antoine Griezmann",
"position":"Centre-Forward",
"age":"28",
"image":"https://tmssl.akamaized.net//images/portrait/medium/125781-1533626871.jpg?lm=1533626889",
"nationality":"France"
},
"from":{
"href":"/atletico-madrid/startseite/verein/13",
"name":"Atl\u00e9tico Madrid",
"country":"Spain",
"league":"LaLiga",
"leagueHref":"/primera-division/transfers/wettbewerb/ES1",
"image":"https://tmssl.akamaized.net//images/wappen/tiny/13.png?lm=1519120744"
},
"to":{
"href":"/fc-barcelona/startseite/verein/131",
"name":"FC Barcelona",
"country":"Spain",
"league":"LaLiga",
"leagueHref":"/primera-division/transfers/wettbewerb/ES1",
"image":"https://tmssl.akamaized.net//images/wappen/tiny/131.png?lm=1406739548"
},
"transfer":{
"href":"/jumplist/transfers/spieler/125781/transfer_id/2552096",
"value":"\u00a3108.00m",
"timestamp":1563058800
}
}
We have players, from and to clubs, and the transfer itself. We’ll import that data into the following graph model:
Football Transfers Graph ModelWe can execute the following query, that uses APOC’s Load JSON procedure, to create players, transfers, leagues, and clubs:
CALL apoc.load.json("https://github.com/mneedham/football-transfers/raw/master/data/transfers.json")
YIELD value
WITH value, apoc.text.replace(value.transfer.value, "£", "") AS transferValue
WHERE transferValue <> "?" AND transferValue <> "-"
MERGE (p:Player {id: value.player.href})
SET p.name = value.player.name
MERGE (from:Club {id: value.from.href})
SET from.name = value.from.name
FOREACH(ignoreMe IN CASE WHEN value.from.leagueHref = "" THEN [] ELSE [1] END |
MERGE (fromLeague:League {id: value.from.leagueHref})
SET fromLeague.name = value.from.league
MERGE (from)-[:IN_LEAGUE]->(fromLeague)
)
MERGE (to:Club {id: value.to.href})
SET to.name = value.to.name
FOREACH(ignoreMe IN CASE WHEN value.to.leagueHref = "" THEN [] ELSE [1] END |
MERGE (toLeague:League {id: value.to.leagueHref})
SET toLeague.name = value.to.league
MERGE (to)-[:IN_LEAGUE]->(toLeague)
)
MERGE (t:Transfer {id: value.transfer.href})
SET t.value = CASE
WHEN transferValue contains "k"
THEN toFloat(apoc.text.replace(transferValue, "k", ""))
* 1000
WHEN transferValue contains "m"
THEN toFloat(apoc.text.replace(transferValue, "m", ""))
* 1000000
ELSE 0.0 END
SET t.date = date(datetime({epochseconds:value.transfer.timestamp}))
MERGE (t)-[:OF_PLAYER]->(p)
MERGE (t)-[:FROM_CLUB]->(from)
MERGE (t)-[:TO_CLUB]->(to);
After we’ve done that we’ll run the following query to create a relationship from a league to the country that it belongs to:
CALL apoc.load.json("https://github.com/mneedham/football-transfers/raw/master/data/leagues.json")Graph of the top 10 transfers
YIELD value
MATCH (l:League {id: value.league})
MERGE (c:Country {name: value.country})
MERGE (l)-[:IN_COUNTRY]->(c);
Now we’ve ready to query the graph.
Top 10 transfers
The following query finds the top 10 transfers by value, and returns the player and clubs involved:
MATCH (t:Transfer)-[:OF_PLAYER]->(player),Top 10 Transfers Atlético Madrid
(from)<-[:FROM_CLUB]-(t)-[:TO_CLUB]->(to)
RETURN player.name, from.name, to.name,
apoc.number.format(t.value) AS price
ORDER BY t.value DESC
LIMIT 10
Atlético Madrid and Real Madrid appear on several of these transfers.
We can aggregate the queries involving these teams to see how much money has been involved in their transfers.
Money In, Money Out
The following query finds the clubs that have spent and received the most money during the transfer window:
MATCH (club:Club)Money In and Money Out
WITH club,
apoc.coll.sumLongs(
[(club)<-[:FROM_CLUB]-(t) | t.value]) AS moneyIn,
apoc.coll.sumLongs(
[(club)<-[:TO_CLUB]-(t) | t.value]) AS moneyOut
RETURN club.name,
apoc.number.format(moneyIn) AS in,
apoc.number.format(moneyOut) AS out
ORDER BY moneyIn + moneyOut DESC
LIMIT 10
As we guessed, the Madrid clubs are at the top of the list. Surprisingly there aren’t any English clubs in the top 10.
What about if we only look at the money spent on transfers?
Money Out
The following query finds the clubs that have spent and received the most money during the transfer window:
MATCH (club:Club)-[*2]->(country:Country)Money spent on transfers
WITH club, country,
apoc.coll.sumLongs(
[(club)<-[:TO_CLUB]-(t) | t.value]) AS moneyOut
RETURN club.name, country.name, apoc.number.format(moneyOut) AS out
ORDER BY moneyOut DESC
LIMIT 10
The Spanish teams still dominate the top 3 positions, but interestingly Aston Villa have spent the most money of the English teams so far. Presumably that will change by the end of the summer.
Next let’s go a level up, and see which countries money is flowing between.
Money Flow by Country
Money FlowThe following query finds the total fees spent moving players from teams in one country to another, excluding transfers between clubs that play in the same country.
We also return the most expensive transfer between those countries:
MATCH (t:Transfer)-[:OF_PLAYER]->(player),Money flow between countries
(fromCountry)<-[:IN_COUNTRY]-(fromLeague),
(fromLeague)<-[:IN_LEAGUE]-(from)<-[:FROM_CLUB]-(t),
(t)-[:TO_CLUB]->(to)-[:IN_LEAGUE]->(toLeague),
(toLeague)-[:IN_COUNTRY]->(toCountry)
WITH *
ORDER BY fromLeague, toLeague, t.value DESC
WITH fromLeague, toLeague, sum(t.value) AS totalFees,
fromCountry, toCountry,
collect({player: player.name, fee: t.value}) AS transfers
WHERE fromCountry <> toCountry
RETURN fromCountry.name, toCountry.name,
apoc.number.format(totalFees) AS total,
transfers[0].player AS player,
apoc.number.format(transfers[0].fee) AS fee,
size(transfers) AS numberOfTransfers
ORDER By totalFees DESC
LIMIT 10
The most money has transferred from Portugal to Spain, although this is a bit skewed by the transfer of João Felix, which accounts for almost 60% of the money flow.
Next Steps
We hope you enjoy the dataset and if you have any questions or suggestions on what we should do next let us know in the comments or send us an email to devrel@neo4j.com.
Football Transfers Graph was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.