Neo4j read from disk is slow

meet2sukalp · October 22, 2019, 12:07pm

Hi Team,

I am experiencing performance issues while reading data from disk. Here are some details about the dataset and environment.

The graph is roughly 34G, of which 6G is indexes.
Total number of nodes in the db: 24M; total relationships: 61M.
Page cache size is 12G.
We are using an Azure Premium SSD (P30) for persistence (https://azure.microsoft.com/en-us/pricing/details/managed-disks/), which offers 5000 IOPS per disk and a throughput of 200MB/s.
Neo4j is Community Edition 3.4.5, running as a single pod on a k8s cluster on Azure.

I am running a Cypher query on one of the indexed fields, which is expected to return at most 1000 records of maximum size 4MB.
I understand that since my graph is bigger than the page cache, some of the data will be read from disk. But the read operation takes more than 30 seconds in some cases.
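
As a rough sanity check on the numbers above, here is a back-of-envelope sketch in Python. It assumes the 34G figure is the relevant store size, that the disk can actually be driven at its rated 5000 IOPS, and that each page fault costs one I/O operation. None of this is measured, and the variable and function names are made up for illustration.

```python
# Back-of-envelope figures taken from the post above; nothing here is measured.
store_size_gib = 34      # reported graph size (may include transaction logs)
page_cache_gib = 12      # configured page cache
rated_iops = 5000        # rated IOPS of the Azure P30 disk
observed_seconds = 30    # duration of the slow query
records_returned = 1000  # upper bound on rows returned


def cache_coverage(page_cache, store):
    """Fraction of the store that can be resident in the page cache at once."""
    return min(1.0, page_cache / store)


def io_budget(seconds, iops):
    """Upper bound on I/O operations the disk can serve in the given time."""
    return seconds * iops


coverage = cache_coverage(page_cache_gib, store_size_gib)
budget = io_budget(observed_seconds, rated_iops)
reads_per_record = budget / records_returned

print(f"page cache covers at most {coverage:.0%} of the store")
print(f"{observed_seconds}s at {rated_iops} IOPS allows up to {budget} reads")
print(f"that is {reads_per_record:.0f} reads per returned record")
```

If these assumptions roughly hold, 30 seconds would correspond to something like 150 disk reads per returned record, which may hint that the time is going into per-record reads or read latency rather than the index seek itself.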

Is that normal behaviour when Neo4j reads indexed data from disk? Any help would be appreciated.

Thanks

dana_canzano · October 22, 2019, 6:57pm

Have you prefaced the query in question with PROFILE or EXPLAIN to determine whether the index is being used?

How have you determined that your graph size is 34G? Does this include transaction logs as well? Those are not held in the page cache.

meet2sukalp · October 22, 2019, 7:23pm

Hello Dana,

Yes, I have used EXPLAIN with the query, and it does use the index. Here is the output of EXPLAIN:

(image: EXPLAIN query plan)

The Filter after NodeIndexSeek doesn't matter much, since there will be at most 1000 records. I have also tried removing those filters, but there was no significant improvement in the query.

I used :sysinfo to determine the size of the graph. Here is its output. I think it includes the transaction logs as well, but I am not sure. The transaction logs are around 2G.

dana_canzano · October 22, 2019, 7:30pm

Thanks for this detail.
The :sysinfo 'Total Store Size:' of 33.93G does in fact include all graph.db/neostore.transaction* files. From your screenshot it appears your graph might be on the order of 25GB, give or take.

The profile looks good, and it is surprising that this would take 30 seconds. Is there some network latency in play here? If you run the query on the Azure instance itself with bin/cypher-shell, do you encounter the same 30 seconds? Are you running this through the Neo4j Browser, where some of the time may be spent rendering the result as a graph?

meet2sukalp · October 22, 2019, 7:43pm

The queries are run from a different pod (a Spring Boot application) within the same cluster. The network doesn't seem to be the issue, since there are lots of other things running in the cluster. Initially we thought it could be a disk throughput issue, since we were using a smaller disk, but a recent disk upgrade didn't help either.

What read time should we expect when Neo4j reads indexed data from disk? Are there any benchmarking statistics? One other observation: reads get even slower while checkpointing is happening.

dana_canzano · October 22, 2019, 7:46pm

Checkpointing? How long is checkpointing taking? If you have access to logs/debug.log and are running a *nix OS, you should be able to get this detail by running

grep -i triggered logs/debug.log | grep -i check

Are you encountering a lot of garbage collection events in the debug.log?
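
Where grep is not available, the same filter can be sketched in Python. This is only an illustration: the function name and the sample lines are hypothetical, and the real debug.log format may differ.

```python
def checkpoint_lines(lines):
    """Keep lines that mention both 'triggered' and 'check', ignoring case."""
    kept = []
    for line in lines:
        lowered = line.lower()
        if "triggered" in lowered and "check" in lowered:
            kept.append(line)
    return kept


# Hypothetical sample lines, not copied from a real debug.log.
sample = [
    "INFO Checkpoint triggered by scheduler for time threshold",
    "INFO GC pause detected",
    "WARN something triggered an alert",
]

for line in checkpoint_lines(sample):
    print(line)
```

To run it against a real log, the lines could be read with open("logs/debug.log") and passed to checkpoint_lines in place of the sample list.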

andrew_bowman · October 22, 2019, 8:44pm

Would you be able to provide the query, and is it possible to PROFILE the query and expand all elements of the query plan? The row and db hit info from a PROFILE plan is more useful for tuning.

meet2sukalp · October 23, 2019, 9:12am

Checkpointing was slow when we were using a P10 disk; it sometimes took up to 14 minutes. After the disk upgrade, checkpointing has become very fast and now mostly takes under 10 seconds.

There is not much garbage collection, just one event in the last 24 hours. Do you have a rough idea of how long the query (with the above query plan) should take if all the data is read from disk?

meet2sukalp · October 23, 2019, 9:19am

Hi Andrew,

Please find the expanded query plan below:

The query is:
MATCH (n:artifact)
WHERE n.docId IN ['3747ee26-8b2e-40cf-bccc-c262be69fe67', '5cd4923c-0c22-4e79-b6da-75bd919da31f', 'e9afe2ec-3324-4027-968d-4f5839d71287', 'acc4a43c-9cb2-4bce-8ebc-9fc43bb5453a', '41579a30-809c-4cc8-bfc4-b01a114caa26', '0a37fe3a-0068-41eb-8d12-fb0795931501', 'eebd3a93-7da2-47e7-ae79-786f24ade2aa', 'fa689188-5075-4051-9515-903ec9042383', 'c791aaef-6b89-499f-974b-071eec329755', '4d350433-4874-4cec-9f6f-ad5621c1d232']
  AND n.tenantId = 'my-tenant'
  AND n.language = 'en'
RETURN n.id, n.graphVecEmbedding

docId is indexed on artifact nodes.

andrew_bowman · October 23, 2019, 9:33am

I think a composite index would help you here.

Please create an index on :artifact(docId, tenantId), then rerun the query and see if that helps.

meet2sukalp · October 23, 2019, 11:38am

But after filtering on the indexed column, there would be 2000 nodes at most. A full scan of 2000 records should not take 30 seconds. I can even remove the filters on tenantId and language altogether. But I am still not sure why reading from disk is slow on an indexed column. Are there any benchmarking stats for Neo4j reads from disk?
