You can read the readme.html file on GitHub for a full description of what I have done.
I have created a graph from the data that the ICIJ has made available. This data covers the:
Offshore Leaks
Bahamas Leaks
Paradise Papers
Panama Papers
The data needed a lot of formatting before it could be imported into Neo4j using the import command.
Formatting / Transformation
The addresses in particular were very disappointing: each one is held in a single address field, which is not good for fine-grained data analysis. The changes I made are listed below. In brief, they were:
Removing CRLF
Changing ID Fields to Ensure Uniqueness
Creating a New Relationship for Duplicate Officer and Intermediary
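As a rough illustration of the first two changes, here is a minimal Python sketch. It is not the code I used; the function names, the source-prefix ID scheme and the node_id column name are assumptions made for the example. It replaces embedded carriage returns and line feeds inside a field with a single space, and prefixes each ID with the name of the leak it came from so that IDs from different leaks cannot collide.

```python
import re


def clean_field(value):
    """Replace any run of CR/LF characters inside a field with one space."""
    return re.sub(r"[\r\n]+", " ", value or "").strip()


def make_unique_id(source, raw_id):
    """Prefix the raw ID with its source leak (hypothetical scheme)."""
    return f"{source}-{raw_id}"


def transform_rows(rows, source, id_column="node_id"):
    """Yield cleaned copies of CSV rows (dicts) with a unique ID.

    id_column is an assumed column name; adjust it to match the file.
    """
    for row in rows:
        cleaned = {key: clean_field(value) for key, value in row.items()}
        cleaned[id_column] = make_unique_id(source, cleaned[id_column])
        yield cleaned
```

The rows can come from csv.DictReader, for example transform_rows(csv.DictReader(f), "bahamas"), with the file opened using newline="" so that line breaks inside quoted fields reach the cleaning step intact.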
Future Work
There are a number of things I am planning to do with this data:
The FinCEN graph will be added to this data.
I have a separate WikiLeaks graph, which I plan to search for any references to the people or entities in this data in order to create a combined graph of the WikiLeaks and ICIJ leaks.
Data analysis will be done using the Neo4j Graph Data Science library to detect any patterns.