Optimizing the writing of large amounts of data in Neo4j with APOC Parquet and periodic iterate

Hi,
I need to import hundreds of millions of nodes and relationships into my database while avoiding duplicates.
I created indexes for most of my nodes.
I'm using Neo4j 5.13 and APOC.

This is what my Cypher query looks like:

            CALL apoc.periodic.iterate('
            CALL apoc.load.parquet("file:///my_file.parquet") YIELD value RETURN value
        ','
            MERGE (n1:Label1 {property: value.property1})
            ON CREATE SET n.property2 = value.property2

            FOREACH (_ IN CASE WHEN value.info1 IS NOT NULL AND value.info1 <> \\'\\' THEN [1] ELSE [] END |
                MERGE (n2:Label2 {info: value.info1})
                MERGE (n)-[:HAS_INFO]->(n2)
            )
            
            FOREACH (_ IN CASE WHEN value.info2 IS NOT NULL AND value.info2 <> \\'\\' THEN [1] ELSE [] END |
                MERGE (n3:Label3 {info: value.info2})
                MERGE (n)-[:HAS_INFO]->(n3)
            )

            WITH value.unknown_values AS uvalues, n
            UNWIND uvalues AS uvalue
            MERGE (n4:Label4{info:uvalue})
            MERGE (n)-[:HAS_UNKNOWN_INFO]->(n4)

            ',{batchSize: 10000, parallel: true}
            )

I'd like to know whether this approach looks well optimized to you.

Thanks

It does look typical. Two comments though. First, I don't see where the variable 'n' that is referenced throughout the code is defined. The first MERGE binds its result to 'n1'. Should 'n1' be 'n' instead?

Second, sometimes running the batches in parallel becomes problematic when the code creates relationships. This is because there could be blocking due to the end nodes in a relationship needing to be locked in order to create the relationship. This would occur if the data is creating multiple relationships to the same node. Just a heads up.
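One way to sidestep that contention, sketched here under the assumption that the import can be split into two passes over the same file: write the Label1 nodes in parallel first, then write the relationships with parallel set to false. Labels and property names come from the query above; the split itself is only a suggestion and has not been tested on this data. Only the info1 branch is shown, and the other relationship types would follow the same pattern.

```cypher
// Pass 1: Label1 nodes only. No relationships are written here,
// so batches do not compete for relationship locks.
// Assumption: property1 is backed by a uniqueness constraint (or is unique
// per row); otherwise concurrent MERGEs may create duplicates and
// parallel should be false here as well.
CALL apoc.periodic.iterate(
  'CALL apoc.load.parquet("file:///my_file.parquet") YIELD value RETURN value',
  'MERGE (n1:Label1 {property: value.property1})
   ON CREATE SET n1.property2 = value.property2',
  {batchSize: 10000, parallel: true}
);

// Pass 2: Label2 nodes and HAS_INFO relationships, one batch at a time,
// so no two batches try to lock the same end node concurrently.
CALL apoc.periodic.iterate(
  'CALL apoc.load.parquet("file:///my_file.parquet") YIELD value
   WHERE value.info1 IS NOT NULL AND value.info1 <> ""
   RETURN value',
  'MATCH (n1:Label1 {property: value.property1})
   MERGE (n2:Label2 {info: value.info1})
   MERGE (n1)-[:HAS_INFO]->(n2)',
  {batchSize: 10000, parallel: false}
);
```

Filtering in the first statement replaces the FOREACH/CASE guard, since rows without info1 never reach the second statement.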

Thanks for your answer.
Indeed, I made a mistake: 'n' should be 'n1'.
Regarding locks between batches, I do have some issues. I am using the 'retries' parameter of apoc.periodic.iterate, which I forgot to include in the example.
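A sketch of where that parameter goes in the config map. The retry count of 3 is chosen only for illustration and is not necessarily the value used in the real import, and the second statement is shortened to the first MERGE.

```cypher
CALL apoc.periodic.iterate(
  'CALL apoc.load.parquet("file:///my_file.parquet") YIELD value RETURN value',
  'MERGE (n1:Label1 {property: value.property1})
   ON CREATE SET n1.property2 = value.property2',
  // retries: a batch that fails (for example on a lock conflict)
  // is attempted again, up to this many times
  {batchSize: 10000, parallel: true, retries: 3}
);
```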
