
Digging Into the ICIJ Pandora Papers Dataset with Neo4j

greta
Graph Fellow

Yesterday, the Pulitzer Prize-winning International Consortium of Investigative Journalists (ICIJ) published the first data release from the recent Pandora Papers investigation.

This time the data publication is not split across the different investigations. It contains the full offshoreleaks database in one dataset, so you can explore the data on shell companies, law firms, banks, and ultimate owners across all leaks and investigations.

The data model is consistent with previous publications. Officers are connected in several ways (as directors, shareholders, or beneficiaries) to Entities (shell companies). Intermediaries (banks, law firms) manage the creation and operation of those shell companies. All of them have addresses, which can also be used in investigations.

![](upload://zVJoecS6zjYJOYeiCJymvIVgyz2.png)Offshoreleaks data model

Each node carries country and country-code fields that tie it to specific geographies, along with many other pieces of information.

The first data release for the Pandora Papers consists of 26k Officers, 18k Entities, and 1,000 Intermediaries.
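Once the database is running (see the installation steps that follow), you can check these figures against your own copy. The following is a minimal JavaScript sketch rather than code from the example repository: `countBySource` is a hypothetical helper, and it assumes that every node type carries the `sourceID` property that the Officer queries later in this post rely on. It takes an open neo4j-driver session as an argument.

```javascript
// Hypothetical helper (not from the example repository).
// Assumption: all node types have a sourceID property, as Officer nodes do.
const COUNT_QUERY = `
MATCH (n)
WHERE n.sourceID STARTS WITH $source
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC`;

// `session` is a neo4j-driver session, e.g. driver.session({database: 'neo4j'}).
async function countBySource(session, source = 'Pandora Papers') {
  const result = await session.run(COUNT_QUERY, { source });
  const counts = {};
  for (const record of result.records) {
    const n = record.get('nodes');
    // By default the driver returns Cypher integers as Integer objects,
    // so convert them to plain JavaScript numbers.
    const value = typeof n === 'number' ? n : n.toNumber();
    // Join the labels in case a node has more than one.
    counts[record.get('labels').join(':')] = value;
  }
  return counts;
}

// Usage, given an open session:
//   countBySource(session).then((counts) => console.log(counts));
```

The result is a plain object keyed by label, which makes it easy to compare with the numbers quoted above.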

![](upload://4mceQXn2UhfbzaQmX5ouqRGEV4D.png)

Installing the Dataset in Neo4j Desktop

The easiest way to get started with the dataset is to download and install Neo4j Desktop.

1. Download the “dump” file from the public GitHub repository.
2. In either the example project or a newly created project, use “Add File” to add the dump file to your project.

![](upload://kfGWoqdPFCHZj7M9OJVe0iyQFEo.png)

3. Then choose “Create new DBMS from Dump.”

![](upload://fqRQ4761CSx0MulkGWGZOEjS1ic.png)

4. Provide a password.

![](upload://47y3KSVMpOKljjy5opLrYA1sHPR.png)

5. Wait a few seconds until the database is created, then hit “Start.”
6. After it’s started, open “Neo4j Browser” on the running database.

![](upload://5uIDXhJcEmYRX7wBP5xqYO7Aeh0.png)

7. Within Neo4j Browser use :play icij-offshoreleaks to launch the interactive guide. (Pin it on top with the pin icon).

Exploring the Data with Neo4j Bloom

Besides exploring the provided guides, you can also pick any of the published stories and look behind the scenes by searching for the people, organizations, and jurisdictions mentioned in the story.

![](upload://a6ezULfdVFntJkc0ZHJ2FMQ6eGq.png)

If you don’t feel comfortable with a query language, you can also use the Neo4j Bloom visualization software to explore the data visually, through an interface that is closer to natural language.

You can start Neo4j Bloom from the “Graph Apps” sidebar in Neo4j Desktop. It will open for your currently running database.

![](upload://t8iuvXiw1TRFvmanFWEd6i2eFdp.png)Neo4j Bloom with Search Phrase

![](upload://cCwrF2yZvY4FVQrAs4jcOFBSk7A.png)Neo4j Bloom Visualization

Follow a Story from the “The Landlords” Investigations

To show how you can investigate published stories yourself, let’s take “South Africa’s smart city” from the “The Landlords” investigations into property ownership by shell companies with undisclosed owners.

The Land Lords - ICIJ

In “South Africa’s smart city,” Ruslan Goryukhin, a key aide to some of Putin’s closest friends, is reported to be connected to companies involved in the development of the first “post apartheid smart city,” Cradle City in South Africa.

Let’s see what we find in our data, first querying for Goryukhin.

MATCH path = (o:Officer)-[r]->(:Entity)
WHERE o.name CONTAINS 'GORYUKHIN' AND o.sourceID STARTS WITH "Pandora Papers"
RETURN path LIMIT 100
![](upload://ek7MADpbmKbNlSWxfNsbfYHextZ.png)

Another person mentioned in the story was “Preston Hampton Haskell IV, the son of a Texas construction billionaire.” Let’s see if we can find him too, and his connections to Goryukhin.

MATCH (o:Officer), (o2:Officer)
WHERE o.name CONTAINS 'GORYUKHIN' AND o.sourceID STARTS WITH "Pandora Papers"
  AND o2.name CONTAINS 'PRESTON HAMPTON HASKELL'
MATCH path = allShortestPaths((o)-[*]-(o2))
RETURN path LIMIT 25
![](upload://45kyKdEiNeeLRC0qOfRBEidnwdl.png)

The report mentions two shell companies, Kelburn One and Amari Land International Ltd (later renamed Forum Properties Africa), which unfortunately are not in the published dataset.
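To repeat this kind of lookup for people named in other stories, it can help to keep the name fragments out of the query text. The following is a minimal JavaScript sketch, not part of the example repository: `shortestPathQuery` is a hypothetical helper that returns the shortest-path Cypher from above with the two name fragments and the source prefix passed as parameters. It deviates from the original query in two ways, both of which are my own choices: it compares names case-insensitively with `toUpper()`, and it wraps the limit in `toInteger()` because the JavaScript driver sends plain numbers as floats.

```javascript
// Hypothetical helper (not from the example repository): builds the
// shortest-path query shown above with the names passed as parameters.
function shortestPathQuery(nameA, nameB, source = 'Pandora Papers', limit = 25) {
  const query = `
MATCH (o:Officer), (o2:Officer)
WHERE toUpper(o.name) CONTAINS toUpper($nameA)
  AND o.sourceID STARTS WITH $source
  AND toUpper(o2.name) CONTAINS toUpper($nameB)
MATCH path = allShortestPaths((o)-[*]-(o2))
RETURN path LIMIT toInteger($limit)`;
  // Parameters travel separately from the query text, so no quoting
  // or escaping of the names is needed.
  const params = { nameA, nameB, source, limit };
  return { query, params };
}

// Usage, given an open neo4j-driver session:
//   const { query, params } = shortestPathQuery('Goryukhin', 'Preston Hampton Haskell');
//   session.run(query, params).then((result) => console.log(result.records.length));
```

The case-insensitive comparison is convenient but prevents the database from using an index on `name`, so on large graphs the exact-case form in the original query may be faster.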

Programmatic Access

The example repository also comes with code examples in Python, Java, JavaScript, .NET, and Go. There is also a full GraphQL project for the dataset, with a schema for the neo4j/graphql integration library that you can use to run and deploy a GraphQL API.

These examples show how to connect to the database and run a query against the data. So if you’re inclined to build a dashboard or app, feel free to use those.

Here is the JavaScript example:

// npm install --save neo4j-driver
// node example.js
const neo4j = require('neo4j-driver');

// Replace the placeholders with your own connection details.
const driver = neo4j.driver('bolt://<HOST>:<BOLTPORT>',
  neo4j.auth.basic('<USERNAME>', '<PASSWORD>'),
  {/* encrypted: 'ENCRYPTION_OFF' */});

// Follow up to 10 relationships out from the named officer
// and return the names found, limited to 20 rows.
const query =
`
MATCH (a:Officer {name:$name})-[r:officer_of|intermediary_of|registered_address*..10]-(b)
RETURN b.name as name LIMIT 20
`;
const params = {"name": "Ross, Jr. - Wilbur Louis"};
const session = driver.session({database:"neo4j"});

session.run(query, params)
  .then((result) => {
    result.records.forEach((record) => {
      console.log(record.get('name'));
    });
  })
  .catch((error) => {
    console.error(error);
  })
  // Close the session and driver whether or not the query succeeded.
  .finally(() => session.close())
  .then(() => driver.close());
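If you want to reuse this lookup inside a dashboard or app, one option is to wrap it in a function that takes an already opened session. The sketch below is an illustration, not code from the example repository: `connectedNames` is a hypothetical name, and the query is the one from the example above with `DISTINCT` added so that each name appears once. It relies only on the `session.run()` and `record.get()` calls already used there.

```javascript
// Hypothetical wrapper (not from the example repository).
// `session` is a neo4j-driver session, e.g. driver.session({database: 'neo4j'}).
async function connectedNames(session, officerName) {
  const query = `
MATCH (a:Officer {name:$name})-[r:officer_of|intermediary_of|registered_address*..10]-(b)
RETURN DISTINCT b.name AS name LIMIT 20`;
  const result = await session.run(query, { name: officerName });
  // A node without a name property yields null; drop those rows.
  return result.records
    .map((record) => record.get('name'))
    .filter((name) => name !== null);
}

// Usage:
//   const session = driver.session({database: 'neo4j'});
//   connectedNames(session, 'Ross, Jr. - Wilbur Louis')
//     .then((names) => console.log(names))
//     .finally(() => session.close());
```

Passing the session in keeps connection handling in one place and lets you exercise the function with a stub object during development.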

Load the Database into Neo4j AuraDB

If you want to dive deeper into the dataset, you can also load it into an AuraDB Professional instance on our cloud service, Neo4j AuraDB.

  1. Register or log in at console.neo4j.io.
  2. Create an AuraDB Professional instance (a size of 2 GB should be enough to upload the file).
  3. Save the password!
  4. Go to “Import Database” and upload the dump file.
  5. “Open” the database and provide your password.
  6. Run :play icij-offshoreleaks for the interactive browser guide.
  7. From here you can continue to explore however you like.

Available on the Neo4j Labs Demo Server (read-only)

A read-only database is also available on the neo4j-labs demo server.
Just use “offshoreleaks” as username/password/database.

With :play icij-offshoreleaks you can run the interactive guides there.

Happy investigating! Please share in the comments or tag #neo4j on Twitter when you find something interesting.


Digging Into the ICIJ Pandora Papers Dataset with Neo4j was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

1 REPLY 1

vialard
Node Clone

Unfortunately the dumps are not available due to exceeded quota...

Downloading data/icij-offshoreleaks-42.dump (471 MB)
Error downloading object: data/icij-offshoreleaks-42.dump (ca331de): 
Smudge error: Error downloading data/icij-offshoreleaks-42.dump (ca331dedbabe722bc4e9ed6121b4b0760146d5a57c018a1089f5d20faa22f46a): 
batch response: This repository is over its data quota. 
Account responsible for LFS bandwidth should purchase more data packs to restore access.