How can I import a large Excel file into Neo4j?


(Mehdi Ajroud) #1

I want to import a large Excel file into Neo4j (144 MB); when I convert it to CSV it is around 590 MB. I used this query to import it:

LOAD CSV WITH HEADERS FROM 'file:///Contrats2018.csv' AS Contracts FIELDTERMINATOR ';'
CREATE (c:Contrats {
id: Contracts.contract_id, 
complete_object: Contracts.contract_complete_object ,
object: Contracts.contract_object,
tranche : Contracts.contract_conditional_tranche,
description: Contracts.contract_description, 
duration: Contracts.contract_duration,
exec_dep_code: Contracts.contract_execution_department_code,
exec_geo_city: Contracts.contract_execution_geo_city,
floor_area: Contracts.contract_floor_area,
firm_tranche: Contracts.contract_firm_tranche,
housing_count: Contracts.contract_housing_count,
site_visit: Contracts.contract_mandatory_site_visit,
notice_first_post: Contracts.contract_notice_first_publication,
posting: Contracts.contract_posting,
progress: Contracts.contract_progress,
response: Contracts.contract_response,
social_criteria: Contracts.contract_social_criteria,
state_intitule: Contracts.contract_state_intitule,
time_frame_duration_type: Contracts.contract_time_frame_duration_type,
time_frame_end: Contracts.contract_time_frame_end,
time_frame_start: Contracts.contract_time_frame_start,
parts: Contracts.contract_with_parts,
variant: Contracts.contract_with_variant,
type: Contracts.TYPE,
CPV_main_code_court: Contracts.contract_CPV_main_code_court,
intitule_CPV_court: Contracts.contract_intitule_CPV_court,
estimated_amount_single_value: Contracts.contract_estimated_amount_single_value
})

After waiting 15 minutes, Neo4j crashes.
For context, I am on Windows 10, 64-bit, with 4.00 GB of RAM, running Neo4j Browser version 3.2.5.

Could anyone help, please?


(Christophe Willemsen) #2

You can try prepending the query with USING PERIODIC COMMIT:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS ...

Reference : https://neo4j.com/docs/developer-manual/current/cypher/clauses/load-csv/#load-csv-importing-large-amounts-of-data
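Applied to the query from the original post, the combined statement would look like this (a sketch; 1000 is just a starting batch size to tune, and the property list is abbreviated):

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///Contrats2018.csv' AS Contracts FIELDTERMINATOR ';'
CREATE (c:Contrats {
  id: Contracts.contract_id,
  object: Contracts.contract_object
  // ... remaining properties exactly as in the original query
})
```

Note that USING PERIODIC COMMIT must be the very first clause of the query.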


(Mehdi Ajroud) #3

Well, I just used 10000 and it crashed! I will try to increase it.


(Paul Thomas) #4

Try lowering it; the more often you commit, the less memory is needed.


(Mehdi Ajroud) #5

Shall I try 1000, then?


(Christophe Willemsen) #6

The number you use for the periodic commit tells Neo4j after how many lines it should "commit" the operation. The lower the value, the more often it writes to the database, and therefore the less memory each single transaction (commit) uses.

You can monitor progress by opening a second window in the browser and counting the number of created nodes every X seconds, for example.


(Mehdi Ajroud) #7

Thanks Christophe :)
but how can I do that: "count the number of created nodes every X seconds"?


(Christophe Willemsen) #8

MATCH (c:Contrats) RETURN count(c)

Run that query manually in the Neo4j Browser and repeat it as many times as you want.


(Paul Thomas) #9

Note: if you have millions of rows to load, a bulk load from the command line (which rebuilds the entire graph from scratch) will be much faster than LOAD CSV.
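For reference, the command-line bulk loader in Neo4j 3.x is `neo4j-admin import`. A rough sketch, with several assumptions: the file name matches the earlier query, the target database is empty (the tool only works on a fresh store), and the CSV header has been adapted to the importer's format (it needs an `:ID` column, e.g. `contract_id:ID`):

```shell
# Stop Neo4j first; the importer writes a fresh store into an empty database.
# --delimiter ";" matches the FIELDTERMINATOR used in the LOAD CSV query.
bin/neo4j-admin import \
  --mode=csv \
  --database=graph.db \
  --delimiter ";" \
  --nodes:Contrats "Contrats2018.csv"
```

The importer skips the transaction layer entirely, which is why it scales to millions of rows where LOAD CSV struggles on a 4 GB machine.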


(Mehdi Ajroud) #10

I am making some progress: I used USING PERIODIC COMMIT 1000, and after a few minutes it displayed this message:


(Christophe Willemsen) #11

Your CSV is not valid
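One way to narrow down where the file is malformed (a sketch, reusing the file name from the original post) is to load it without creating anything and inspect the raw lines:

```cypher
LOAD CSV FROM 'file:///Contrats2018.csv' AS line FIELDTERMINATOR ';'
RETURN line
LIMIT 5
```

Common culprits are unescaped quote characters or stray line breaks inside a field, which make a row appear to have the wrong number of columns.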


(Mehdi Ajroud) #12

I just deleted that column since I won't need it later in my work, and it also contains many spaces between strings.