Hi,
I need to import hundreds of millions of nodes and relationships into my database while avoiding duplicates.
I created indexes for most of my node labels (sketched below).
I'm using Neo4j 5.13 and APOC.
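For reference, the index statements look roughly like this (a sketch only: I'm assuming range indexes on the properties used as MERGE keys in the query below, and the index names are made up):

// hypothetical index names, one index per MERGE-key property
CREATE INDEX label1_property IF NOT EXISTS FOR (n:Label1) ON (n.property);
CREATE INDEX label2_info IF NOT EXISTS FOR (n:Label2) ON (n.info);
CREATE INDEX label3_info IF NOT EXISTS FOR (n:Label3) ON (n.info);
CREATE INDEX label4_info IF NOT EXISTS FOR (n:Label4) ON (n.info);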
This is what my Cypher query looks like:
CALL apoc.periodic.iterate('
  CALL apoc.load.parquet("file:///my_file.parquet") YIELD value RETURN value
','
  MERGE (n:Label1 {property: value.property1})
  ON CREATE SET n.property2 = value.property2
  FOREACH (_ IN CASE WHEN value.info1 IS NOT NULL AND value.info1 <> "" THEN [1] ELSE [] END |
    MERGE (n2:Label2 {info: value.info1})
    MERGE (n)-[:HAS_INFO]->(n2)
  )
  FOREACH (_ IN CASE WHEN value.info2 IS NOT NULL AND value.info2 <> "" THEN [1] ELSE [] END |
    MERGE (n3:Label3 {info: value.info2})
    MERGE (n)-[:HAS_INFO]->(n3)
  )
  WITH value.unknown_values AS uvalues, n
  UNWIND uvalues AS uvalue
  MERGE (n4:Label4 {info: uvalue})
  MERGE (n)-[:HAS_UNKNOWN_INFO]->(n4)
', {batchSize: 10000, parallel: true}
)
I'd like to know whether this approach seems optimized to you or not.
Thanks