Import CSV hanging

pa0lo · June 27, 2019, 6:37pm

The cypher below is hanging. The CSV has 180K rows

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///connectedEntity_00000.csv' AS row
MERGE(con:Connection {uen:row.UEN, name:row.SHORT_NM})
MERGE(e:Entity {id:row.ID, uen:row.UEN_1, legalName:row.LEGAL_NM, is_primary:row.PRIMARY_IND})
MERGE(c:Country {cty_code:row.CUST_CTRY_RISK})
MERGE(p:Product {prod_name:row.PRODUCT, auth_amt:row.AUTHAMT, ccy:row.CCY})
MERGE(i:Industry {ind_name:row.SIC_CA})
MERGE(e)-[:BELONGS_TO {conn_type:row.CONN_TYPE}]->(con)
MERGE (e)-[:HAS_PRODUCT]->(p)
MERGE (e)-[:IS_IN_INDUSTRY]->(i)
MERGE (e)-[:RISK_CTY]->(c)

andrew_bowman · June 27, 2019, 6:53pm

Which version of Neo4j are you on?

Can you provide the query plan from an EXPLAIN of the query (please expand all elements of the plan using the double-down arrow in the lower-right corner of the result pane first)?

Also please give us the output of :schema. My initial guess is that you don't have the right indexes in place to support quick lookup, so you are likely doing up to 5 different label scans for each of those 180k rows, which would explain the slowdown.

pa0lo · June 27, 2019, 7:03pm

Thanks Andrew. Here is a screenshot of the explain result

I haven't created any indexes. Should I create those before running the load or in the same statement?

andrew_bowman · June 27, 2019, 7:09pm

You will need to create those before your load (and figure out if it makes sense to create indexes alone or unique constraints (which include an index), depending on what makes sense in your data model).

A MERGE is like a MATCH followed by a CREATE (provided the match failed), so these include lookups by default, and without indexes you were performing 5 label scans across the 5 labels of the MERGEd nodes, and that was happening per row of your 180k rows, so 900k total label scans, each row getting a bit more costly as nodes were added to the associated labels.

If we add the appropriate indexes or unique constraints you'll be executing 5 index lookups (hopefully unique) per row, much more efficient.

Also when you MERGE you should only merge on the properties associated with the indexes you want to use. So if we assume that `id is unique to :Entity nodes, then you might have a unique constraint on :Entity(id), so we only want to MERGE on that and use ON CREATE SET for the remaining properties:

...
MERGE(e:Entity {id:row.ID})
ON CREATE SET uen = row.UEN_1, legalName = row.LEGAL_NM, is_primary = row.PRIMARY_IND
...

pa0lo · June 27, 2019, 7:11pm

Got it! I'll try it that way. Thanks for the quick response

Topic		Replies	Views
CSV import issue Import / Export	26	711	June 21, 2023
hello guys, I have a problem when I want to load this query, it takes a very long time, up to 2 hours, does anyone know why? Cypher performance , cypher , operations , import	2	90	June 6, 2024
CSV import hanging server Import / Export cypher	4	1307	January 7, 2019
Merge of few thousand relations is very slow Cypher	4	4351	October 11, 2018
Neo4j Data Import Slowness Import / Export	3	322	August 19, 2020

Get Certified in June!

Import CSV hanging

Related topics