I am using the Neo4j Python driver and connecting to a local database. My graph contains 700,000 nodes, which I can create very quickly using:
with session.begin_transaction() as tx:
    cypher_query = 'UNWIND $batch AS row ' \
                   'CREATE (n:Node) ' \
                   'SET n += row'
    tx.run(cypher_query, batch=batch)
The graph has 4M relationships, which I am trying to create in the following way:
with session.begin_transaction() as tx:
    cypher_query = 'UNWIND $batch AS row ' \
                   'MATCH (head:Node) WHERE head.id = row.head_id ' \
                   'MATCH (tail:Node) WHERE tail.id = row.tail_id ' \
                   'CREATE (head)-[rel:RELATIONSHIP]->(tail) ' \
                   'SET rel += row.properties'
    tx.run(cypher_query, batch=batch)
The batch size is 10K.
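For completeness, this is roughly how I split the rows into batches (a sketch; `all_rows` stands in for my actual list of relationship dicts):

```python
def chunks(rows, size):
    """Yield successive batches of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# all_rows would be the list of ~4M dicts, each like
# {'head_id': ..., 'tail_id': ..., 'properties': {...}}
#
# for batch in chunks(all_rows, 10_000):
#     with session.begin_transaction() as tx:
#         tx.run(cypher_query, batch=batch)
#         tx.commit()  # newer drivers require an explicit commit
```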
Creating the relationships is extremely slow: I calculated that it would take around 30 days. Is it normal for it to be so slow? Do you know a workaround?
Hi @cobra, thanks for your answer. Do you mean my own id property, or the internal Neo4j <id>? I assume you mean my id property, since the latter should always be unique. I did not use a UNIQUE CONSTRAINT, but the batch of nodes is an unordered set in which the nodes are unique.
Out of curiosity, how would you use UNIQUE CONSTRAINTS?
Indeed, there were nodes with the same id, and the command CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE threw an error. Now, without duplicated ids, the relationships are created super quickly. Thanks a lot for the suggestion!
Question: is the constraint only meant to detect nodes with the same property value, which slows down the MATCH? If, hypothetically, all nodes had already been unique w.r.t. the id, would the constraint have made a difference in terms of performance?
The unique constraint is there to avoid duplicate ids, but it also speeds up loading and reading: a uniqueness constraint is backed by an index, so each MATCH on id becomes an index lookup instead of a full label scan. You should always use a unique constraint if you want such queries to be fast.
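For example, you would create the constraint once, before loading the relationships (a sketch; the exact Cypher syntax depends on your Neo4j version):

```python
# Neo4j 5 syntax; on 4.x the equivalent is:
#   CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE
constraint_query = (
    'CREATE CONSTRAINT node_id IF NOT EXISTS '
    'FOR (n:Node) REQUIRE n.id IS UNIQUE'
)

# Run it once against your session before the relationship load:
# session.run(constraint_query)
```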