Relationship creation taking a long time with the Spark connector

Hi, I'm creating a graph with approximately 200 million nodes and 500 million relationships. Node creation doesn't take long, but the problem lies in relationship creation. I'm using a PySpark DataFrame to write my data through the Spark connector. This is my query for writing relationships:

WITH event LIMIT 1
CALL apoc.periodic.iterate(
  'CYPHER runtime=parallel
   UNWIND $batch AS event
   MATCH (a:EDISubscriberHL {TransactionID: event.Encounter2})
   MATCH (b:EDIClaim {PatientAccountNumber: event.Claims_ClaimInfo_PatientAccountNumber})
   WHERE NOT EXISTS((a)-[:HAS_CLAIM]->(b))
   RETURN event, a, b',
  'CREATE (a)-[r:HAS_CLAIM]->(b)',
  {batchSize: 1000, params: {batch: $events}, parallel: false}
)
YIELD batches, total, timeTaken, committedOperations, failedOperations, failedBatches,
      retries, errorMessages, batch, operations, wasTerminated, failedParams, updateStatistics
RETURN batches, total, timeTaken, committedOperations, failedOperations, failedBatches,
       retries, errorMessages, batch, operations, wasTerminated, failedParams, updateStatistics;
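
For context, the WITH event LIMIT 1 at the top is there because, as far as I understand the connector docs, a query write gets wrapped in UNWIND $events AS event before it is sent to the server. Roughly, in Python terms:

    # Rough sketch of what the server receives per batch, assuming `query`
    # holds the Cypher above (elided here for brevity):
    query = "WITH event LIMIT 1 CALL apoc.periodic.iterate( /* ...as above... */ )"
    effective_query = "UNWIND $events AS event " + query
    # LIMIT 1 collapses the UNWIND to a single row, so apoc.periodic.iterate
    # runs once per batch and reads the whole batch via the $events parameter.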

This is the batch size I'm using with the Spark connector:

sub_df3.write.format("org.neo4j.spark.DataSource") \
    .mode("Overwrite") \
    .option("url", URL) \
    .option("authentication.basic.username", USER) \
    .option("authentication.basic.password", PASSWORD) \
    .option("database", DATABASE) \
    .option("batch.size", 5000) \
    .option("query", query) \
    .option("transaction.retries", 5) \
    .save()
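
For reference, the connector also supports writing relationships directly, without a custom query, which I'm considering as an alternative. An untested sketch, with option names taken from the connector docs and assuming my DataFrame columns map to node keys as shown:

    # Untested sketch of the connector's built-in relationship write.
    # The "keys" strategy maps DataFrame columns to node key properties
    # ("column:property"), and save mode "Match" only matches existing
    # nodes rather than merging them.
    sub_df3.write.format("org.neo4j.spark.DataSource") \
        .mode("Overwrite") \
        .option("url", URL) \
        .option("authentication.basic.username", USER) \
        .option("authentication.basic.password", PASSWORD) \
        .option("database", DATABASE) \
        .option("batch.size", 5000) \
        .option("relationship", "HAS_CLAIM") \
        .option("relationship.save.strategy", "keys") \
        .option("relationship.source.labels", ":EDISubscriberHL") \
        .option("relationship.source.save.mode", "Match") \
        .option("relationship.source.node.keys", "Encounter2:TransactionID") \
        .option("relationship.target.labels", ":EDIClaim") \
        .option("relationship.target.save.mode", "Match") \
        .option("relationship.target.node.keys",
                "Claims_ClaimInfo_PatientAccountNumber:PatientAccountNumber") \
        .save()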

My question is: what should the batch size be in the apoc.periodic.iterate call, and what should batch.size be in the Spark connector options?

First, do you have indexes defined on the properties you are matching on, i.e. EDISubscriberHL(TransactionID) and EDIClaim(PatientAccountNumber)?
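
If not, creating them first will make a far bigger difference than any batch size tuning, since each MATCH otherwise does a full label scan. A quick sketch using the Neo4j Python driver; the index names are placeholders, and URL/USER/PASSWORD/DATABASE are the same values used in your writer above:

    # Sketch: create the two lookup indexes before loading relationships.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(URL, auth=(USER, PASSWORD))
    with driver.session(database=DATABASE) as session:
        session.run("CREATE INDEX edi_subscriber_txid IF NOT EXISTS "
                    "FOR (n:EDISubscriberHL) ON (n.TransactionID)").consume()
        session.run("CREATE INDEX edi_claim_account IF NOT EXISTS "
                    "FOR (n:EDIClaim) ON (n.PatientAccountNumber)").consume()
    driver.close()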

The batch size depends on memory. I read in an article by Michael Hunger that batches of 10,000 should be fine. As such, you may not need apoc.periodic.iterate at all, since your Spark connector write already passes batches of 5,000 rows; see the sketch below.
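
Something like this untested sketch would rely purely on the connector's own batching. Since the connector prepends UNWIND $events AS event to a query write, the query can use event directly, and MERGE stands in for your NOT EXISTS + CREATE pair:

    # Untested sketch: no apoc.periodic.iterate. The connector wraps this in
    # UNWIND $events AS event and commits one transaction per batch.size rows.
    # MERGE replaces the WHERE NOT EXISTS(...) check plus CREATE.
    query = """
    MATCH (a:EDISubscriberHL {TransactionID: event.Encounter2})
    MATCH (b:EDIClaim {PatientAccountNumber: event.Claims_ClaimInfo_PatientAccountNumber})
    MERGE (a)-[:HAS_CLAIM]->(b)
    """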

Yes, I have indexes on those particular properties. Approximately 180 GB of RAM is allocated to Neo4j. I'll try 10,000.