Merge of few thousand relations is very slow

norbert · October 10, 2018, 4:25am

Dear all,

I am trying to get better performance with CSV import/merge of relations. I have a database created with neo4j-import from CSV files, and on a daily basis there will be some updates to the CSV files.

I can easily import the changes using merge with

LOAD CSV WITH HEADERS FROM "file:///some.csv" AS row
  MERGE (c:nType { uuid: row.uuid, name: row.name, revision: toInt(row.revision) }) ;

which also works pretty fast, even for CSV files with 160000 entries of which only some have changed.

But when I try to do the same with relations, matching on the uuid property:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///tlpdb/48871/out/edge-contains.csv" AS row
  MATCH (p { uuid: row.`:START_ID`}), (q { uuid: row.`:END_ID` } )
  MERGE (p)-[r:contains]->(q) ;

then it takes about 11min for a simple csv file with 3491 new relations:

0 rows available after 708392 ms, consumed after another 0 ms
Created 3491 relationships

I haven't even started on the CSV file containing 164690 lines (most of which are already present).

I have created an index (and it is online) on uuid as well as (name, revision).

This is with Neo4j 3.4.8 running on Debian/sid.

Do I need to set up another index? The uuids are unique across all nodes.

Thanks for any suggestion

Norbert

andrew_bowman · October 10, 2018, 4:55am

Indexes are only used when both the label and property that are indexed are present in your match pattern.

This:
MATCH (p { uuid: row.`:START_ID`}), (q { uuid: row.`:END_ID` } )
doesn't have labels present on either of these, so an index won't be used. It's instead doing an all nodes scan for both, and accessing the properties of all the nodes in your db twice to fulfill this single match.

Add in the label for the index, and double-check by running an EXPLAIN of your query plan.
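As a minimal sketch of that check, the query from the question could be rewritten with a label on both nodes and prefixed with EXPLAIN. The label `:nType` is borrowed from the first query in this thread; whether the nodes matched here actually carry that label, and whether the uuid index was created on it, are assumptions. USING PERIODIC COMMIT is left out because EXPLAIN only plans the query and does not run it.

```cypher
// Sketch only: :nType is assumed to be the label the uuid index was created on.
// EXPLAIN shows the plan without executing the query or touching any data.
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:///tlpdb/48871/out/edge-contains.csv" AS row
  MATCH (p:nType { uuid: row.`:START_ID` }), (q:nType { uuid: row.`:END_ID` })
  MERGE (p)-[r:contains]->(q) ;
```

If the index is picked up, the plan should contain NodeIndexSeek operators for p and q rather than AllNodesScan.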

norbert · October 10, 2018, 5:30am

Thanks, indeed. But what if the label (node type) of p and q can be any one of a strict subset of all node types? I have 5 node types: p:Package, p:Collection, p:Scheme, ... and I want to restrict the search to, say, only Package and Collection.

I found a faster way but it collapses all the node types into one, and distinguishes them via attributes. That way the index runs over all possible nodes and the merge is very fast.

Is there another way to speed this up without collapsing node types and searching across node types? That is, emulating something like an index over multiple node types (labels)?

Thanks

michael.hunger · October 10, 2018, 9:14pm

You can use a higher level label (you can use multiple labels for nodes), e.g. :Component
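A rough sketch of how that could look, assuming the shared label is called `:Component` as in the example above and is added to the `Package` and `Collection` nodes mentioned in the question. The index statement uses the Neo4j 3.x syntax; schema changes and data changes have to run as separate statements/transactions.

```cypher
// Sketch only: :Component is an illustrative shared label, not an existing one.

// 1. Add the shared label to every node type that takes part in the relations.
MATCH (n) WHERE n:Package OR n:Collection
SET n:Component ;

// 2. Index uuid on the shared label (Neo4j 3.x syntax).
CREATE INDEX ON :Component(uuid) ;

// 3. Match through the shared label so the index can be used for both lookups.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///tlpdb/48871/out/edge-contains.csv" AS row
  MATCH (p:Component { uuid: row.`:START_ID` }), (q:Component { uuid: row.`:END_ID` })
  MERGE (p)-[r:contains]->(q) ;
```

The nodes keep their original labels, so queries that target only `:Package` or only `:Collection` are unaffected. Since the uuids are unique across all nodes, a uniqueness constraint on `:Component(uuid)` would be an alternative to the plain index.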

norbert · October 11, 2018, 12:56am

Thanks, yes that is what I am going for (thus my other question about adding multiple labels).
