Very slow relationship creation (UNWIND)

Hello everyone!

I am using the Neo4j Python driver and connecting to a local database. I have a graph that contains 700,000 nodes. I can create the nodes very quickly with:

with session.begin_transaction() as tx: 
    cypher_query = 'UNWIND $batch as row ' \
    'CREATE (n:Node) ' \
    'SET n += row'
    tx.run(cypher_query, batch=batch)
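Here is a minimal sketch of feeding the node rows to that query in batches of 10K, one transaction per batch. It assumes the rows are plain dicts in a Python iterable and that `session` is an open session from the official `neo4j` driver. The helper names `chunked`, `load_in_batches` and `NODE_QUERY` are hypothetical and are not part of the driver.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]

# Same query as above, written as one concatenated string
NODE_QUERY = (
    "UNWIND $batch AS row "
    "CREATE (n:Node) "
    "SET n += row"
)


def chunked(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_batches(session, query: str, rows: Iterable[Row], size: int = 10_000) -> int:
    """Run `query` once per batch, each in its own transaction; return the batch count.

    `session` is assumed to be a neo4j driver session (hypothetical wrapper).
    """
    count = 0
    for batch in chunked(rows, size):
        with session.begin_transaction() as tx:
            tx.run(query, batch=batch)
            tx.commit()  # commit this batch explicitly
        count += 1
    return count
```

With 700,000 node rows and `size=10_000`, this runs 70 transactions. The 4M relationship rows would be sent the same way in 400 transactions.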

The graph has 4M relationships, and I am trying to create them in the following way:

with session.begin_transaction() as tx: 
    cypher_query = 'UNWIND $batch as row ' \
    'MATCH (head:Node) WHERE head.id = row.head_id ' \
    'MATCH (tail:Node) WHERE tail.id = row.tail_id ' \
    'CREATE (head)-[rel:RELATIONSHIP]->(tail) ' \
    'SET rel += row.properties'
    tx.run(cypher_query, batch=batch)
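The query above expects each element of `$batch` to be a map with `head_id`, `tail_id` and a nested `properties` map. The following is a minimal sketch of building rows in that shape. It assumes, hypothetically, that the edges start out as `(head_id, tail_id, properties)` tuples. `to_rel_rows` is an illustrative name.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Hypothetical input format: (head_id, tail_id, properties or None)
Edge = Tuple[Any, Any, Optional[Dict[str, Any]]]


def to_rel_rows(edges: Iterable[Edge]) -> List[Dict[str, Any]]:
    """Convert (head_id, tail_id, properties) tuples into rows for the UNWIND query."""
    rows = []
    for head_id, tail_id, props in edges:
        rows.append({
            "head_id": head_id,
            "tail_id": tail_id,
            # SET rel += row.properties needs a map, so use {} when there are none
            "properties": dict(props or {}),
        })
    return rows
```

The `head_id`/`tail_id` values must have the same type as the `id` stored on the nodes. Cypher does not treat `1` and `"1"` as equal, so a type mismatch makes the MATCH find nothing.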

The batch size is 10K.
Creating the relationships is extremely slow: I estimate it would take around 30 days. Do you know a workaround? Is it normal for it to be this slow?
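Working backwards from the figures in the question (4M relationships, 10K rows per batch, about 30 days in total), each batch would be taking close to two hours:

```latex
\frac{4\,000\,000}{10\,000} = 400 \text{ batches}, \qquad
\frac{30 \times 24 \times 60\ \text{min}}{400} = \frac{43\,200\ \text{min}}{400} = 108\ \text{min per batch}
```

That works out to about 0.65 s per relationship (6,480 s / 10,000 rows). This is far too slow for two simple lookups per row, and suggests that each MATCH is scanning the nodes rather than using an index.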

Hello @flpgrz :slight_smile:

Did you create a UNIQUE constraint on id?

Regards,
Cobra


Hi @cobra, thanks for your answer. Do you mean "my id" or the internal Neo4j <id>? I assume you mean "my id", since the latter should always be unique. I did not create a UNIQUE constraint, but the batch of nodes is an unordered set in which the nodes are unique.

Out of curiosity, how would you use a UNIQUE constraint?

Thanks,
Filippo


Yeah I mean your id, not the Neo4j one :slight_smile:

Execute this query on your database:

CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE

It will speed up everything, because the constraint is backed by an index on n.id that your MATCH clauses can use :slight_smile:
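The statement above uses the Neo4j 4.x syntax. Neo4j 5 removed `ON ... ASSERT` in favour of `FOR ... REQUIRE`. A loader that has to work against either version could therefore choose the form based on the server's major version. This is a minimal sketch. `unique_constraint_query` is a hypothetical helper, and the caller is assumed to already know the major version.

```python
def unique_constraint_query(label: str, prop: str, name: str, major_version: int) -> str:
    """Build a named uniqueness-constraint statement for Neo4j 4.x or 5.x.

    Labels and property names cannot be passed as query parameters, so they are
    interpolated here: only pass trusted identifiers.
    """
    if major_version < 4:
        raise ValueError("named constraints need Neo4j 4.0 or later")
    if major_version >= 5:
        # Neo4j 5 syntax
        return f"CREATE CONSTRAINT {name} FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
    # Neo4j 4.x syntax, as used in this thread
    return f"CREATE CONSTRAINT {name} ON (n:{label}) ASSERT n.{prop} IS UNIQUE"
```

For example, `session.run(unique_constraint_query("Node", "id", "node_id", 4))` issues the same statement as the one above.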


Ok, thanks, @cobra, I will try.
I should run that command before I create the nodes, right?

Normally yes, but you can also do it afterwards; you just need to wait a bit while the index is built :slight_smile:
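If the constraint is added after the nodes already exist, Neo4j has to populate the backing index before lookups can use it. One way to block until that is done is the built-in `db.awaitIndexes` procedure, which takes a timeout in seconds. This is a minimal sketch. It assumes `session` is a `neo4j` driver session and `constraint_query` is a CREATE CONSTRAINT statement like the one in this thread. The function name and the 300-second default are illustrative choices.

```python
def create_constraint_and_wait(session, constraint_query: str, timeout_s: int = 300) -> None:
    """Create the constraint, then wait for indexes to come online (hypothetical helper)."""
    # Schema changes run in their own auto-commit transaction, separate from data writes.
    session.run(constraint_query).consume()
    # Wait up to timeout_s seconds for all indexes, including the new one, to be online.
    session.run("CALL db.awaitIndexes($timeout)", timeout=timeout_s).consume()
```

Running the relationship load only after this returns means its MATCH clauses can use the finished index.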


Indeed, there were nodes with the same id, and the command CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE threw an error. Now, without duplicate ids, the relationships are created super quickly. Thanks a lot for the suggestion :slight_smile:
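Because the constraint cannot be created while duplicates exist, it can help to look for them first. The following is a minimal sketch that checks the node rows client-side before loading. It assumes the rows are dicts with an `id` key, and `duplicate_ids` is a hypothetical helper. `DUPLICATE_ID_QUERY` is a Cypher query that finds the same duplicates in data already in the database.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Cypher that lists ids shared by more than one :Node already in the database
DUPLICATE_ID_QUERY = (
    "MATCH (n:Node) "
    "WITH n.id AS id, count(*) AS c "
    "WHERE c > 1 "
    "RETURN id, c ORDER BY c DESC"
)


def duplicate_ids(rows: Iterable[Dict[str, Any]]) -> List[Any]:
    """Return the ids that appear in more than one row, in first-seen order."""
    counts = Counter(row["id"] for row in rows)
    return [node_id for node_id, c in counts.items() if c > 1]
```

An empty result from `duplicate_ids` means the rows can be loaded without breaking the uniqueness constraint.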

Question: is the constraint only meant to detect nodes with the same property value, which slow down the MATCH? If, hypothetically, all nodes had had unique ids, would the constraint have made a difference in terms of performance?

The unique constraint is there to prevent duplicate ids, but it also speeds up loading and reading, because it is backed by an index :slight_smile: You must always use unique constraints on the ids you match on if you want your queries to be fast :slight_smile:
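A toy model can show why the index matters so much even when the ids are already unique. It is plain Python, not Neo4j internals. Without an index, each `MATCH ... WHERE n.id = ...` has to filter every `:Node` and cannot stop at the first hit, because nothing guarantees uniqueness. The index behind a uniqueness constraint lets the lookup go straight to the node; a dict stands in for it here. A real index lookup costs slightly more than the single step counted below, but grows very slowly with the number of nodes. All names are illustrative.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]


def scan_lookup(nodes: List[Node], node_id: Any) -> Tuple[List[Node], int]:
    """Filter every node by id, like a label scan; return (matches, nodes inspected)."""
    matches = [node for node in nodes if node["id"] == node_id]
    return matches, len(nodes)


def build_index(nodes: List[Node]) -> Dict[Any, Node]:
    """Map id -> node: a stand-in for the index behind a uniqueness constraint."""
    return {node["id"]: node for node in nodes}


def index_lookup(index: Dict[Any, Node], node_id: Any) -> Tuple[List[Node], int]:
    """Jump straight to the node by id; treated as a single step in this model."""
    node = index.get(node_id)
    return ([node] if node is not None else []), 1


def nodes_inspected(num_rows: int, num_nodes: int, indexed: bool) -> int:
    """Rough total for the relationship load: two lookups (head and tail) per row."""
    per_lookup = 1 if indexed else num_nodes
    return num_rows * 2 * per_lookup
```

With the numbers from this thread, `nodes_inspected(4_000_000, 700_000, indexed=False)` is 5,600,000,000,000, while `indexed=True` gives 8,000,000.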


This is awesome: setting the index literally gave a hundredfold improvement in query times for a very long UNWIND clause inserting data.