Very slow relationship creation (UNWIND)

flpgrz

Hello everyone!

I am using the Neo4j Python driver and connecting to a local database. I have a graph which contains 700,000 nodes. I can create the nodes very quickly by using:

with session.begin_transaction() as tx: 
    cypher_query = 'UNWIND $batch as row ' \
    'CREATE (n:Node) ' \
    'SET n += row'
    tx.run(cypher_query, batch=batch)

The graph has 4M relationships, and I am trying to create them in the following way:

with session.begin_transaction() as tx: 
    cypher_query = 'UNWIND $batch as row ' \
    'MATCH (head:Node) WHERE head.id = row.head_id ' \
    'MATCH (tail:Node) WHERE tail.id = row.tail_id ' \
    'CREATE (head)-[rel:RELATIONSHIP]->(tail) ' \
    'SET rel += row.properties'
    tx.run(cypher_query, batch=batch)

The batch size is 10K.
The creation of the relationships is extremely slow. I calculated that it would take around 30 days. Do you know a workaround? Is it normal for it to be so slow?
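For reference, a minimal sketch of how the 10K batches described above might be fed to the relationship query. The helper names `chunked` and `load_relationships` are hypothetical, and `session` is assumed to be an already open Neo4j driver session. The commit is explicit so the sketch does not rely on what the context manager does on exit.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(rows: Iterable[dict], size: int = 10_000) -> Iterator[list]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Same relationship query as in the post, kept as one string.
REL_QUERY = (
    "UNWIND $batch AS row "
    "MATCH (head:Node) WHERE head.id = row.head_id "
    "MATCH (tail:Node) WHERE tail.id = row.tail_id "
    "CREATE (head)-[rel:RELATIONSHIP]->(tail) "
    "SET rel += row.properties"
)


def load_relationships(session, rows, size=10_000):
    """Run one explicit transaction per batch (assumes an open driver session)."""
    for batch in chunked(rows, size):
        with session.begin_transaction() as tx:
            tx.run(REL_QUERY, batch=batch)
            tx.commit()
```

Batching alone does not make the two MATCH clauses any cheaper; it only bounds the size of each transaction.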

8 REPLIES

Cobra

Hello @flpgrz

Did you use UNIQUE CONSTRAINTS on id?

Regards,
Cobra

Hi @Cobra, thanks for your answer. Do you mean "my id", or the internal Neo4j <id>? I assume you refer to "my id", as I think the latter should always be unique. I did not use a UNIQUE CONSTRAINT, but the batch of nodes is an unordered set in which the nodes are unique.

Out of curiosity, how would you use UNIQUE CONSTRAINTS?

Thanks,
Filippo

Yeah, I mean your id, not the Neo4j one.

Execute this query on your database:

CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE

It will speed up everything.
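The statement above uses the `ON ... ASSERT` form from Neo4j 4.x. Neo4j 4.4 introduced the `FOR ... REQUIRE` form and Neo4j 5 no longer accepts the older one, so on a newer server the equivalent statement would look different. A small sketch that builds either form; the helper name `unique_constraint_cypher` is hypothetical and not part of the driver:

```python
def unique_constraint_cypher(name: str, label: str, prop: str, legacy: bool = False) -> str:
    """Build a uniqueness-constraint statement.

    legacy=True  -> ON ... ASSERT   (Neo4j 4.x syntax, as used in this thread)
    legacy=False -> FOR ... REQUIRE (Neo4j 4.4 and later)
    """
    if legacy:
        return f"CREATE CONSTRAINT {name} ON (n:{label}) ASSERT n.{prop} IS UNIQUE"
    return f"CREATE CONSTRAINT {name} FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"


statement = unique_constraint_cypher("node_id", "Node", "id")
# With an open driver session (assumed, not shown here):
#     session.run(statement)
```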

Ok, thanks, @Cobra, I will try.
I should run that command before I create the nodes, right?

Cobra

Normally yes, but you can also do it afterwards; you just need to wait a bit while the constraint is built over the existing nodes.

flpgrz

Indeed, there were nodes with the same id, and the command CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE threw an error. Now, without duplicated ids, the relationships are created super quickly. Thanks a lot for the suggestion!

Question: is the constraint only meant to detect nodes with the same property value, which slow down the MATCH? If, hypothetically, all nodes had been unique w.r.t. the id, would the constraint have made a difference in terms of performance?
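Duplicates like the ones that made the constraint fail can be spotted before loading. A minimal sketch, assuming each row in the node batch is a dict with an `id` key; `duplicate_ids` and `DUPLICATES_QUERY` are hypothetical names, and the Cypher string is one possible way to list duplicates that are already in the database:

```python
from collections import Counter


def duplicate_ids(rows, key="id"):
    """Return {id: count} for every id that appears more than once in `rows`."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Cypher to list ids shared by more than one :Node already in the graph.
DUPLICATES_QUERY = (
    "MATCH (n:Node) "
    "WITH n.id AS id, count(*) AS copies "
    "WHERE copies > 1 "
    "RETURN id, copies ORDER BY copies DESC"
)

batch = [{"id": 1}, {"id": 2}, {"id": 1}]
print(duplicate_ids(batch))
```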

Cobra

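The unique constraint is there to prevent duplicate ids, but it also speeds up loading and reading: the constraint is backed by an index, so a MATCH on id becomes an index lookup instead of a scan over every :Node. You should always put a unique constraint on the property you MATCH on if you want your queries to run quickly.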
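A rough back-of-the-envelope illustration of why this matters at the sizes mentioned in the thread. This is a plain-Python analogy, not how Neo4j works internally (its indexes are not Python dicts): a list scan stands in for a MATCH without an index, a dict stands in for an indexed lookup.

```python
def scan_lookup(nodes, wanted_id):
    """Check every node, as an unindexed match must: without a uniqueness
    guarantee the scan cannot stop at the first hit."""
    comparisons = 0
    hits = []
    for node in nodes:
        comparisons += 1
        if node["id"] == wanted_id:
            hits.append(node)
    return hits, comparisons


def build_index(nodes):
    """Map id -> node once, so later lookups do not scan the list."""
    return {node["id"]: node for node in nodes}


nodes = [{"id": i} for i in range(1000)]
hits, comparisons = scan_lookup(nodes, 999)   # touches all 1000 nodes
index = build_index(nodes)
indexed_hit = index[999]                      # single keyed lookup

# Numbers from the thread: 700,000 nodes, 4M relationships, 2 MATCHes each.
NODES = 700_000
RELATIONSHIPS = 4_000_000
scan_comparisons = RELATIONSHIPS * 2 * NODES  # id comparisons without an index
indexed_lookups = RELATIONSHIPS * 2           # lookups with an index
```

Under these assumptions the unindexed load performs about 5.6 trillion id comparisons, against 8 million index lookups with the constraint in place.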

This is awesome - setting the index literally made a hundredfold improvement in query times for a very long UNWIND clause inserting data.