Requesting suggestions to load Panama Papers dataset into Neo4j Desktop

Kevin6482
Node Clone

I downloaded the Panama Papers from the ICIJ website and tried to import the CSV files into Neo4j Desktop using the queries below, with the heap size set to 10G. All nodes were created successfully, but creating the relationships was very slow: the query had been running for more than an hour and still had not finished. Is there a better way to load the dataset?

Addresses:
:auto USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///ppa.csv' AS line CREATE (:Addresses { address: line.address, icij_id: line.icij_id, valid_until: line.valid_until, country_codes: line.country_codes, countries: line.countries, node_id: toInteger(line.node_id), sourceID: line.sourceID})

Intermediaries:
:auto USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///ppi.csv' AS line CREATE (:Intermediaries { name: line.name, internal_id: line.internal_id, address: line.address, valid_until: line.valid_until, country_codes: line.country_codes, countries: line.countries, status: line.status, node_id: toInteger(line.node_id), sourceID: line.sourceID})

Officers:
:auto USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///ppo.csv' AS line CREATE (:Officers { name: line.name, icij_id: line.icij_id, valid_until: line.valid_until, country_codes: line.country_codes, countries: line.countries, node_id: toInteger(line.node_id), sourceID: line.sourceID})

Entities:
:auto USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///ppe.csv' AS line CREATE (:Entities { name: line.name, original_name: line.original_name, former_name: line.former_name, jurisdiction: line.jurisdiction, jurisdiction_description: line.jurisdiction_description, company_type: line.company_type, address: line.address, internal_id: line.internal_id, incorporation_date: line.incorporation_date, inactivation_date: line.inactivation_date, struck_off_date: line.struck_off_date, dorm_date: line.dorm_date, status: line.status, service_provider: line.service_provider, ibcRUC: toInteger(line.ibcRUC) , country_codes: line.country_codes, countries: line.countries, note: line.note, valid_until: line.valid_until, node_id: toInteger(line.node_id), sourceID: line.sourceID})

Relationships/Edges:

:auto USING PERIODIC COMMIT 100000
LOAD CSV WITH HEADERS FROM 'file:///pp_edg.csv' AS csvLine
MATCH (n1 { id: toInteger(csvLine.node_1)}),(n2 { id: toInteger(csvLine.node_2)})
CREATE(n1)-[:ACCOC {role: csvLine.rel_type}]->(n2)

2 REPLIES

ameyasoft
Graph Maven
Per your node ingestion queries, there is no property with name 'id'. The only property I see is 'node_id'. Try this:

:auto USING PERIODIC COMMIT 100000
LOAD CSV WITH HEADERS FROM 'file:///pp_edg.csv' AS csvLine
MATCH (n1 { node_id: toInteger(csvLine.node_1)})
MATCH (n2 { node_id: toInteger(csvLine.node_2)})
CREATE(n1)-[:ACCOC {role: csvLine.rel_type}]->(n2)
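
A likely reason the relationship load is slow regardless of the property name: the MATCH patterns carry no label, so Neo4j cannot use an index and has to scan every node for each CSV row. Below is a minimal sketch of one workaround, not a tested recipe. It assumes Neo4j 4.x (4.3 or later for `IS NOT NULL`) and that node_id values are unique across the four node files. The shared label `PPNode` and the index name `ppnode_node_id` are made up for illustration.

```cypher
// Hypothetical shared label so a single index covers all four node types.
MATCH (n) WHERE n.node_id IS NOT NULL
SET n:PPNode;

// Index on the lookup key (Neo4j 4.x syntax); the name is an assumption.
CREATE INDEX ppnode_node_id FOR (n:PPNode) ON (n.node_id);

// Wait (up to 300 seconds) until the index is online before loading.
CALL db.awaitIndexes(300);

:auto USING PERIODIC COMMIT 100000
LOAD CSV WITH HEADERS FROM 'file:///pp_edg.csv' AS csvLine
// With a label on both patterns, each MATCH can be an index seek.
MATCH (n1:PPNode { node_id: toInteger(csvLine.node_1) })
MATCH (n2:PPNode { node_id: toInteger(csvLine.node_2) })
CREATE (n1)-[:ACCOC { role: csvLine.rel_type }]->(n2)
```

Run the statements one at a time in Neo4j Browser. Prefixing the final query with EXPLAIN should show whether the plan uses a NodeIndexSeek rather than an AllNodesScan.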

Thanks for responding. Even after changing the property name, the query ran for more than an hour, so I went with an alternative method to import the data.

I imported the data with the import command of neo4j-admin and loaded the graph model successfully.

bin/neo4j-admin import --database panamapapers --nodes H_ppa.csv,ppa.csv --nodes H_ppe.csv,ppe.csv --nodes H_ppi.csv,ppi.csv --nodes H_ppo.csv,ppo.csv --relationships H_pp_edg.csv,pp_edg.csv --trim-strings=true > import.out

IMPORT DONE in 6s 161ms.
Imported:
559600 nodes
674102 relationships
7096599 properties
Peak memory usage: 1.037GiB
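
To sanity-check the result against the import summary above, counts can be pulled from the new database. This is a small sketch that only assumes the panamapapers database is started and selected; the labels and relationship types returned depend on the header files, which are not shown here.

```cypher
// Node count per label combination; the rows should sum to the imported node total.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes;

// Relationship count per type; the rows should sum to the imported relationship total.
MATCH ()-[r]->()
RETURN type(r) AS type, count(*) AS relationships;
```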