10k writes taking around 4 minutes. Is this the limit?

andreperez
Graph Buddy

I'm using the Pyingest script to read 10 CSV files with 10k rows and 7 columns each. The best result I've had so far was with a chunk size of 10000.
My PC has 16 GB of RAM and a 4c/8t CPU.
The first two files always take about 1~2 minutes, then the time grows a bit, and by the last file the write process takes around 5 minutes.
My DB memory configs are the following:

# Java Heap Size: by default the Java heap size is dynamically calculated based
# on available system resources. Uncomment these lines to set specific initial
# and maximum heap size.
dbms.memory.heap.initial_size=5G
dbms.memory.heap.max_size=5G

# The amount of memory to use for mapping the store files.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the Java heap size.
dbms.memory.pagecache.size=7G

I got these values from neo4j-admin memrec (well, not from my machine: I'm using the AppImage version of Neo4j, so I can't run memrec; I found them on this forum, posted for a similar machine).

Since Pyingest uses Pandas to optimize the CSV reads, I'm guessing the timings get slower because of garbage-collector pressure.

I'm trying to get the best result I can on my PC, so that when I put this on a server (obviously much more powerful than my machine) it will perform beautifully.

Is there any way to optimize more?

EDIT: I forgot to include the queries I'm using.

      WITH $dict.rows AS rows
      UNWIND rows AS row
      MERGE (a:Cookie_id {domain: row.domain})
      MERGE (b:OS {version: row.version})
      MERGE (c:Device_type {classification: row.classification})
      MERGE (d:Device_model {model: row.model})
      MERGE (e:IP {addr: row.addr})
      MERGE (f:Access_time {hour_group: row.hour_group})
      MERGE (g:Access_day {is_weekend: row.is_weekend})
      MERGE (a)-[:USING_OS]->(b)
      MERGE (a)-[:BY_TYPE]->(c)
      MERGE (a)-[:ACCESSED_BY]->(d)
      MERGE (a)-[:HAS_IP]->(e)
      MERGE (a)-[:ACCESSED_AT_TIME]->(f)
      MERGE (a)-[:ACCESSED_AT_DAY]->(g)
      RETURN a
1 ACCEPTED SOLUTION

Hi there! Have you created indices on the nodes you're MERGE-ing?
As you may know, MERGE=MATCH+CREATE, so creating indices on the MERGE patterns boosts the speed of the preliminary MATCH.
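
For example, with the labels and properties from the query above, the indices could look like this (a sketch assuming Neo4j 4.x syntax; the index names are arbitrary):

      // One index per label/property pair used in the MERGE patterns
      CREATE INDEX cookie_domain IF NOT EXISTS FOR (n:Cookie_id) ON (n.domain);
      CREATE INDEX os_version IF NOT EXISTS FOR (n:OS) ON (n.version);
      CREATE INDEX device_type_class IF NOT EXISTS FOR (n:Device_type) ON (n.classification);
      CREATE INDEX device_model IF NOT EXISTS FOR (n:Device_model) ON (n.model);
      CREATE INDEX ip_addr IF NOT EXISTS FOR (n:IP) ON (n.addr);
      CREATE INDEX access_time_hour IF NOT EXISTS FOR (n:Access_time) ON (n.hour_group);
      CREATE INDEX access_day_weekend IF NOT EXISTS FOR (n:Access_day) ON (n.is_weekend);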


4 REPLIES

Actually I didn't.
Will try this to see the boost. Thanks for your answer!

I've got a question about this: can I set a constraint on any property to act as an index, or does it need to be the id?

Oh man! The total time to process the 10 files is now 15 seconds.
I set a constraint for the main property and set it as the node key.
Thank you so much!
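
For reference, a node key constraint like the one described above might look like this (a sketch assuming Neo4j 4.4+ syntax, and guessing Cookie_id.domain as the main property; note that node key constraints require Enterprise Edition, while plain uniqueness constraints work in all editions):

      // Node key: enforces existence + uniqueness, and is backed by an index
      CREATE CONSTRAINT cookie_domain_key IF NOT EXISTS
      FOR (n:Cookie_id) REQUIRE (n.domain) IS NODE KEY;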
