I'm trying to create relationships between two groups of nodes. The first group is Listing, with 10 million nodes. The second is Picture, with 50 million nodes (5 pictures per listing), and each Picture node is supposed to be connected to exactly one Listing node.
First I loaded the listings CSV and created the 10 million Listing nodes. Then I wrapped my next query in 'apoc.periodic.iterate'. As it loads the pictures CSV to create the Picture nodes, it finds the Listing node each picture should be connected to and creates that relationship. With a batch size of 10k, the heap runs out of memory after about 30k relationships have been created.
Any help would be much appreciated. I'm super new to Neo4j and would love to learn anything I can!
My query to load listings and create Listing nodes
CREATE CONSTRAINT ON (listing:Listing) ASSERT listing.id IS UNIQUE
I also changed parallel to false:
CALL apoc.periodic.iterate("
CALL apoc.load.csv('file:///pictures.csv',{
mapping:{
id: {type:'int'},
listing: {type:'int'}
}
}) YIELD map as row RETURN row
","
CREATE (p:Picture) SET p = row
WITH p
MATCH (l:Listing)
WHERE p.listing = l.id
CREATE (p)-[:PICTURE_OF]->(l)
", {batchSize:10000, parallel:false, iterateList:true});
It created 760k relationships and Picture nodes this time but still ran out of heap memory. I'm confused about why it would run out of memory, since "apoc.periodic.iterate" should commit each batch in its own transaction.
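One variation of the query above is to look up the Listing by row.listing before creating the Picture, so the lookup goes directly through the index that backs the unique constraint on Listing.id. This is only a sketch under the same assumptions as the original query (a pictures.csv with integer id and listing columns), and it is not guaranteed to fix the heap problem, which may lie in APOC or Neo4j 4.0.0 itself. One behavioral difference: a row whose listing id has no matching Listing node is skipped, instead of producing an unconnected Picture.

```cypher
// Outer statement streams CSV rows; the inner statement runs once per batch of 10,000 rows.
// Inner statement: find the listing first (index seek on Listing.id), then create and link the picture.
CALL apoc.periodic.iterate("
  CALL apoc.load.csv('file:///pictures.csv', {
    mapping: {
      id: {type: 'int'},
      listing: {type: 'int'}
    }
  }) YIELD map AS row RETURN row
", "
  MATCH (l:Listing {id: row.listing})
  CREATE (p:Picture) SET p = row
  CREATE (p)-[:PICTURE_OF]->(l)
", {batchSize: 10000, parallel: false, iterateList: true});
```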
Thank you for the suggestions! I'm using Neo4j Desktop 1.2.4 running Neo4j 4.0.0, and the APOC version is 4.0.0.3. When I prefixed my outer query with CYPHER runtime=INTERPRETED, it capped out at 625k. I also tried CYPHER runtime=SLOTTED, and it capped out at 710k.
Yes, I ran out of heap at this point (sorry for using an inaccurate word). I will raise an issue on the APOC GitHub page. Thank you again for your help!
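Since the problem may be in APOC, an alternative that avoids it entirely (not something tried in this thread) is plain LOAD CSV with USING PERIODIC COMMIT, which commits every N rows. The sketch below assumes pictures.csv has a header row with id and listing columns, as the mapping in the original query implies. LOAD CSV reads every value as a string, hence the toInteger() conversions. A periodic-commit query has to run as an implicit (auto-commit) transaction, not inside an explicit one.

```cypher
// Commit every 10,000 rows, similar in spirit to batchSize: 10000 in the APOC call.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///pictures.csv' AS row
// Look the listing up first; the unique constraint on Listing.id provides the index.
MATCH (l:Listing {id: toInteger(row.listing)})
CREATE (p:Picture)
SET p = row
// LOAD CSV yields strings, so overwrite id and listing with integer values.
SET p.id = toInteger(row.id), p.listing = toInteger(row.listing)
CREATE (p)-[:PICTURE_OF]->(l);
```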
Thank you for pointing it out! I created the constraint because I read this in the documentation (3.5 Defining a schema)
Adding the unique constraint will implicitly add an index on that property. If the constraint is dropped, but the index is still needed, the index will have to be created explicitly.
I wanted to make sure each of my Listing nodes has a unique id and is indexed to improve performance. Would you say it's better that I just create the index without adding the unique constraint?
Thank you again for the help!
That's great. Somehow I missed this part while going through the documentation. Thanks for sharing.
No, it's better to have the constraint when you want to make sure you are creating only one node per id.
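For comparison, the two options discussed above look roughly like this in Neo4j 4.0 syntax. This is an illustrative sketch: the index name listing_id is hypothetical, and only one of the two options is needed, since the constraint already creates a backing index that CALL db.indexes() should list.

```cypher
// Option 1: uniqueness plus an implicit backing index (what the reply recommends).
CREATE CONSTRAINT ON (listing:Listing) ASSERT listing.id IS UNIQUE;

// Inspect indexes; the constraint's backing index should appear in this list.
CALL db.indexes();

// Option 2: index only, with no uniqueness guarantee. If the constraint above is
// dropped, the index has to be created explicitly, as the documentation notes.
DROP CONSTRAINT ON (listing:Listing) ASSERT listing.id IS UNIQUE;
CREATE INDEX listing_id FOR (l:Listing) ON (l.id);
```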