I am using the Neo4j Python driver and connecting to a local database. My graph contains 700,000 nodes, which I can create very quickly using:
with session.begin_transaction() as tx:
    cypher_query = 'UNWIND $batch AS row ' \
                   'CREATE (n:Node) ' \
                   'SET n += row'
    tx.run(cypher_query, batch=batch)
The graph has 4M relationships, which I am trying to create in the following way:
with session.begin_transaction() as tx:
    cypher_query = 'UNWIND $batch AS row ' \
                   'MATCH (head:Node) WHERE head.id = row.head_id ' \
                   'MATCH (tail:Node) WHERE tail.id = row.tail_id ' \
                   'CREATE (head)-[rel:RELATIONSHIP]->(tail) ' \
                   'SET rel += row.properties'
    tx.run(cypher_query, batch=batch)
The batch size is 10K.
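For completeness, this is roughly how I split the rows into batches (a sketch; `all_rows` stands in for my actual list of relationship dicts):

```python
def chunks(rows, size):
    """Yield successive batches of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# all_rows would be the list of ~4M dicts, each like
# {'head_id': ..., 'tail_id': ..., 'properties': {...}}
#
# for batch in chunks(all_rows, 10_000):
#     with session.begin_transaction() as tx:
#         tx.run(cypher_query, batch=batch)
#         tx.commit()  # newer drivers require an explicit commit
```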
Creating the relationships is extremely slow: I calculated that it would take around 30 days. Is it normal for it to be so slow? Do you know a workaround?
Hi @cobra, thanks for your answer. Do you mean my own id property, or the internal Neo4j <id>? I assume you mean my id property, since the latter should always be unique. I did not use a UNIQUE CONSTRAINT, but the batch of nodes is an unordered set in which the nodes are unique.
Out of curiosity, how would you use UNIQUE CONSTRAINTS?
Indeed, there were nodes with the same id, and the command CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE threw an error. Now, without duplicated ids, the relationships are created super quickly. Thanks a lot for the suggestion!
Question: is the constraint only meant to detect nodes with the same property value, which slows down the MATCH? If, hypothetically, all nodes had already been unique w.r.t. the id, would the constraint have made a difference in terms of performance?
The unique constraint is there to avoid duplicate ids, but it also speeds up loading and reading: a uniqueness constraint is backed by an index, so each MATCH on id becomes an index lookup instead of a full label scan. You should always use a unique constraint if you want such queries to be fast.
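For example, you would create the constraint once, before loading the relationships (a sketch; the exact Cypher syntax depends on your Neo4j version):

```python
# Neo4j 5 syntax; on 4.x the equivalent is:
#   CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE
constraint_query = (
    'CREATE CONSTRAINT node_id IF NOT EXISTS '
    'FOR (n:Node) REQUIRE n.id IS UNIQUE'
)

# Run it once against your session before the relationship load:
# session.run(constraint_query)
```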