Database unresponsive sometimes

usamabin.masood1 · April 6, 2021, 6:17pm

Hi folks,

We are experiencing unresponsiveness issue while loading data into neo4j for complete delta cycle, mostly on the step of node deletion and relationship creation.

Our Environment Configuration can be described as:
GCP Compute Engine Machine 16vCPU 64GB RAM
Operating System Ubuntu 18.04LTS

Neo4J Community edition.
Memory settings:
dbms.memory.heap.initial_size=24100m
dbms.memory.heap.max_size=24100m
dbms.memory.pagecache.size=19900m
dbms.tx_state.max_off_heap_memory=8g

Node in Neo4j: 500,000,000

Issue details:
The Neo4J Database just go into infinite state of processing for that particular deletion or relationship creation query. Metrics indicates 40% RAM usage but CPU shoots upto 80%.
If the compute engine instance is restarted and job is submitted again, it works like a charm. No, simultaneous queries were executed during this scenario. Before every data load, we are clearing database cache but this does not solves the problem.

Is there a way I can configure my database where it's response is consistent and Query response time always remain same. If it takes 5 minutes to execute, it must take 5 minutes every time. Caching can speed up the things but I don't understand how restarting VM instance or database engine speeds up the process. Is it due to any kind of deadlocks ?

Can anyone help in this case.

andrew_bowman · April 7, 2021, 1:27am

Check your debug log. If you're seeing a significant number of stop-the-world pauses (it's in milliseconds, so 5000 or more for the pause times) that would explain what you're seeing. Those would be garbage collection events, and those are often because of poorly tuned queries. You'd want to identify the queries in question that are either hanging or causing the pauses. Perhaps you're trying to commit too much data in a single transactions, in which case you should look to use apoc.periodic.iterate() to batch the changes. Or it could be something else. Without knowing what's going on with the query or being able to see the query plan or the debug logs, we don't have much to go on.

usamabin.masood1 · April 7, 2021, 11:20am

Thanks Andrew for taking interest in this issue. While examining the debug logs, I found stop-the-world pauses many times and delayed the execution for up-to 5 hours. It looks like it was the main cause. Queries that were submitted use apoc.periodic.iterate()

Query used Incase of Create Nodes:

CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "{}" AS row return row
','
                CREATE (a:NodeMarket
				{ marketCode : coalesce(row.marketCode,"")
                  ,market : coalesce(row.market,"")                    
                    ,createdBy : coalesce(row.createdBy,"")
                    ,modifiedBy : coalesce(row.modifiedBy,"")
                    ,createdDateTime : datetime(coalesce(replace(row.createdDateTime," ","T"),"1900-01-01T00:00:00"))
                    ,modifiedDateTime : datetime(coalesce(replace(row.modifiedDateTime," ","T"),"1900-01-01T00:00:00"))   })'
,{batchSize:2500, iterateList:true, parallel:true, concurrency:200});

Incase of Delete:

CALL apoc.periodic.iterate(
'LOAD CSV WITH HEADERS FROM "{}" AS row return row'
,
                'MATCH (a:NodeMarket
				{marketCode : coalesce(row.marketCode,"")} )
			
				DETACH DELETE a'
				, {batchSize:1000, iterateList:true});

Incase of Relationships Create:

CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "{}" AS row return row
','
MATCH (a:NodeStore { storeID : coalesce(row.storeID,"") })     
WITH a,row 
MATCH (b:NodeMarket)
where b.marketCode = row.marketCode
CREATE (b)-[rel:REL_STORE]->(a)',
 {batchSize:1000, iterateList:true});

Based of debug logs uploaded, can you please suggest what optimizations and query tuning can be done?
What steps can be taken to avoid stop-the-world pauses?

debug(trimmed).txt (108.1 KB)

andrew_bowman · April 19, 2021, 4:42pm

The first big thing to check is if you have the correct indexes in place. You need indexes to support efficient MATCH and MERGE operations.

For the queries you provided, you would need to make sure you have indexes (or unique constraints, whichever makes the most sense for your data model) on:

:NodeMarket(marketCode)
:NodeStore(storeID)

Topic		Replies	Views
Database becomes non-responsive in production Neo4j Graph Platform performance	12	7605	January 6, 2020
Neo4j periodically becoming unresponsive Neo4j Graph Platform performance	2	2298	November 11, 2019
Experiencing GC pause and high CPU Cypher	2	489	May 11, 2022
Performance Recommendation Needed Neo4j Graph Platform migrated	10	258	November 9, 2022
Extremely slow query time with concurrent queries Neo4j Graph Platform performance	8	2588	October 22, 2020

Database unresponsive sometimes

Related topics