Optimizing the load of a 38 character string with Index on it

Hello,

I have to have an Unique Identifier on one of the node is a 38 character string, eg - "37300097819223399953200236708848436046".

Is there anyway to optimize this kind of Unique Identifier. I have an index on this. This fails to load the 9M rows.

I have around 9M records which I am loading into graph with below command -

CALL apoc.periodic.iterate("
UNWIND ['activity.csv'] AS file CALL apoc.load.csv('file:///dcm/' + file, {header:true, sep:',', quoteChar:'"', nullValues:['']}) YIELD
map as row WHERE row.ecid is not null return row
","
WITH row
MERGE (ec:ec_id {ec_id:row.ecid})
ON CREATE SET ec+={create_dt:datetime()}
", {batchSize: 10000, iterateList: true, parallel: false})

Community Edition Server -
Server Size - 16 Core 128GB
Server is running on 70% CPU when loading data

Memory Configs -
dbms.memory.heap.initial_size=31g
dbms.memory.heap.max_size=31g
dbms.memory.pagecache.size=73g
dbms.tx_state.max_off_heap_memory=8g

I am seeing below logs in debug.log and database becomes inaccessible.

2020-05-29 04:34:54.697+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5933, gcTime=5972, gcCount=2}
2020-05-29 04:35:01.850+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5951, gcTime=5963, gcCount=2}
2020-05-29 04:35:08.773+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5910, gcTime=5916, gcCount=2}
2020-05-29 04:35:15.831+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6021, gcTime=6102, gcCount=2}
2020-05-29 04:35:22.785+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6131, gcTime=6173, gcCount=2}
2020-05-29 04:35:29.434+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5847, gcTime=5935, gcCount=3}
2020-05-29 04:35:36.352+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6082, gcTime=6097, gcCount=2}
2020-05-29 04:35:42.847+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5680, gcTime=5773, gcCount=2}
2020-05-29 04:35:49.551+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6101, gcTime=6196, gcCount=2}
2020-05-29 04:35:56.490+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6137, gcTime=6181, gcCount=2}
2020-05-29 04:36:03.019+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5927, gcTime=5974, gcCount=3}
2020-05-29 04:36:09.480+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5960, gcTime=6023, gcCount=2}
2020-05-29 04:36:15.959+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5978, gcTime=6049, gcCount=3}
2020-05-29 04:36:22.346+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5885, gcTime=5933, gcCount=2}
2020-05-29 04:36:28.693+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5945, gcTime=5987, gcCount=2}
2020-05-29 04:36:35.028+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5807, gcTime=5881, gcCount=2}
2020-05-29 04:36:41.493+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6063, gcTime=6132, gcCount=3}
2020-05-29 04:36:47.641+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5747, gcTime=0, gcCount=0}
2020-05-29 04:36:53.606+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5764, gcTime=5796, gcCount=2}
2020-05-29 04:36:59.946+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=6039, gcTime=6083, gcCount=3}
2020-05-29 04:37:05.837+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5590, gcTime=5618, gcCount=2}
2020-05-29 04:37:11.823+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=5669, gcTime=5759, gcCount=2}

Do you have a constraint on the label and key you merge on?

What neo4j version do you use?

Hi @michael.hunger,

Sorry for the late response. Here is how I created the index. Yes, I do have a Unique constraint and I do not have a key as I was using Community Edition.

CREATE CONSTRAINT UQ_EC_ID
ON (n:ec_id)
ASSERT n.ec_id IS UNIQUE;

I am on Community Edition Neo4j 4.0.4.

This is really weird, could you try to use LOAD CSV instead of apoc.load.csv and see if it persists?

Hi @michael.hunger,

I tried using LOAD CSV as per your suggestion and I am seeing those errors just once in a while but they are going away ultimately. Attaching log below. We've used APOC for all our ingestion jobs and seeing this error in other places too makinf DB unresponsive. Especially when the CSV file size is big (>9M rows around). Is LOAD CSV recommended over APOC procedures for such cases?

Also, it will be helpful if we know what actually this error mean.

2020-06-03 17:35:49.315+0000 INFO [o.n.c.i.ExecutionEngine] [idgraph] Discarded stale query from the query cache after 239 seconds: CALL dbms.listConfig()
2020-06-03 17:35:54.921+0000 INFO [o.n.k.d.Database] [idgraph] Rotated to transaction log [/neo4jdrive/data/transactions/idgraph/neostore.transaction.db.4] version=3, last transaction in previous log=5466, rotation took 35 millis, started after 87672 millis.
2020-06-03 17:37:28.525+0000 INFO [o.n.k.d.Database] [idgraph] Rotated to transaction log [/neo4jdrive/data/transactions/idgraph/neostore.transaction.db.5] version=4, last transaction in previous log=7202, rotation took 37 millis, started after 93567 millis.
2020-06-03 17:38:57.762+0000 INFO [o.n.k.d.Database] [idgraph] Rotated to transaction log [/neo4jdrive/data/transactions/idgraph/neostore.transaction.db.6] version=5, last transaction in previous log=9176, rotation took 34 millis, started after 89203 millis.
2020-06-03 17:40:29.268+0000 INFO [o.n.k.d.Database] [idgraph] Rotated to transaction log [/neo4jdrive/data/transactions/idgraph/neostore.transaction.db.7] version=6, last transaction in previous log=11420, rotation took 36 millis, started after 91470 millis.
2020-06-03 17:41:58.480+0000 INFO [o.n.k.d.Database] [idgraph] Rotated to transaction log [/neo4jdrive/data/transactions/idgraph/neostore.transaction.db.8] version=7, last transaction in previous log=13593, rotation took 33 millis, started after 89179 millis.
2020-06-03 17:43:27.346+0000 INFO [o.n.k.d.Database] [idgraph] Rotated to transaction log [/neo4jdrive/data/transactions/idgraph/neostore.transaction.db.9] version=8, last transaction in previous log=15650, rotation took 34 millis, started after 88832 millis.
2020-06-03 17:44:49.275+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=429, gcTime=0, gcCount=0}
2020-06-03 17:45:10.743+0000 INFO [o.n.k.d.Database] [idgraph] Rotated to transaction log [/neo4jdrive/data/transactions/idgraph/neostore.transaction.db.10] version=9, last transaction in previous log=17688, rotation took 34 millis, started after 103363 millis.
2020-06-03 17:46:43.655+0000 INFO [o.n.k.d.Database] [idgraph] Rotated to transaction log [/neo4jdrive/data/transactions/idgraph/neostore.transaction.db.11] version=10, last transaction in previous log=19630, rotation took 32 millis, started after 92880 millis.
2020-06-03 17:46:48.741+0000 INFO [o.n.c.i.ExecutionEngine] [idgraph] Discarded stale query from the query cache after 659 seconds:
CALL db.labels() YIELD label
RETURN {name:'labels', data:COLLECT(label)[..1000]} AS result
UNION ALL
CALL db.relationshipTypes() YIELD relationshipType
RETURN {name:'relationshipTypes', data:COLLECT(relationshipType)[..1000]} AS result
UNION ALL
CALL db.propertyKeys() YIELD propertyKey
RETURN {name:'propertyKeys', data:COLLECT(propertyKey)[..1000]} AS result
UNION ALL
CALL dbms.functions() YIELD name, signature, description
RETURN {name:'functions', data: collect({name: name, signature: signature, description: description})} AS result
UNION ALL
CALL dbms.procedures() YIELD name, signature, description
RETURN {name:'procedures', data:collect({name: name, signature: signature, description: description})} AS result
UNION ALL
MATCH () RETURN { name:'nodes', data:count() } AS result
UNION ALL
MATCH ()-[]->() RETURN { name:'relationships', data: count(
)} AS result

Thanks,
Sai

Generally not, it seems to be something particular with your file. not sure if you can share it.

The error means that the virtual machine is trying to free memory and it takes quite long and might not be successful.

I am including a row from the file, it has header too just do not want to include it here. Bolded value is the one I created the index on -

"2019-01-18T17:39:12",237249526,,,,,"66690879344007303623597366135687768876","aaklshin",8084382,"C","mathij0"

@michael.hunger - Does increasing the heap size help in anyway.

Hi @michael.hunger - I am still seeing this issue and was reaching out to see if you have any suggestions on this.