Hi Team,
I am experiencing performance issues while reading data from disk. Here are some details about the dataset and environment:
- Graph size is roughly 34G, of which 6G is indexes
- Total no. of nodes in db: 24M, total relationships: 61M
- Page cache size is 12G
- We are using an Azure premium SSD (P30) for persistence (https://azure.microsoft.com/en-us/pricing/details/managed-disks/), which offers 5000 IOPS per disk and a throughput of 200MB/s.
- Neo4j is Community Edition 3.4.5, running as a single pod on a k8s cluster on Azure.
I am running a Cypher query on one of the indexed fields, which is expected to return at most 1000 records with a maximum size of 4MB.
I understand that since my graph is bigger than the page cache, some of the data will be read from disk. But the read operation takes more than 30 seconds in some cases.
Is that normal behaviour when neo4j reads indexed data from disk? Any help would be appreciated
Thanks
Have you prefaced the query in question with PROFILE or EXPLAIN so as to determine if the index is being utilized?
How have you determined your graph size is 34G? Does this include transaction logs as well, which are not included in the page cache?
hello Dana,
Yes, I have used EXPLAIN with the query, and it does use the index. Here is the output of EXPLAIN.
The Filter after NodeIndexSeek doesn't matter much, since it will see at most 1000 records. I have also tried removing those filters, but there was no significant improvement in the query.
I used :sysinfo to determine the size of the graph. Here is its output. I think it includes the transaction logs as well, though I am not sure. The transaction logs are around 2G.
Thanks for this detail.
The `Total Store Size:` of 33.93G reported by :sysinfo does in fact include all graph.db/neostore.transaction* files. From your screenshot it appears your graph might be on the order of 25GB.
The profile looks really good, and it is surprising this would take 30 seconds. Is there some network latency in play here? If you run the query on the Azure instance itself with bin/cypher-shell, do you encounter the same 30 seconds? Are you running this through the Neo4j Browser, where some of the time may be spent rendering the result as a graph?
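To put the sizes mentioned so far side by side, here is a minimal Python sketch of how much of the store a 12G page cache could hold. It assumes page accesses are spread uniformly over the store, which real workloads rarely are, so treat the result as a rough upper bound on misses rather than a measurement. The two store sizes are the ~25GB estimate above and 33.93G minus the ~2G of transaction logs reported earlier; the function name is just for illustration.

```python
def miss_fraction(store_gb, page_cache_gb):
    """Fraction of page accesses expected to miss the page cache.

    Assumes uniformly random access over the store (an assumption,
    not something measured on this system).
    """
    if store_gb <= 0:
        raise ValueError("store_gb must be positive")
    cached = min(page_cache_gb, store_gb) / store_gb
    return 1.0 - cached


PAGE_CACHE_GB = 12.0  # from the thread

# ~25GB estimate, and 33.93G total minus ~2G of transaction logs
for store_gb in (25.0, 33.93 - 2.0):
    pct = 100 * miss_fraction(store_gb, PAGE_CACHE_GB)
    print(f"store {store_gb:.2f}G: about {pct:.0f}% of accesses may go to disk")
```

Under that assumption, more than half of page accesses could go to disk for either store size, so disk latency is worth measuring directly.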
The queries are run from a different pod (a Spring Boot application) within the same cluster. The network doesn't seem to be the issue, since lots of other things are running in the cluster without problems. Initially we thought it could be a disk throughput issue since we were using a smaller disk, but a recent disk upgrade didn't help either.
What read time should we expect when Neo4j reads indexed data from disk? Are there any benchmarking statistics? One more observation: reads get even slower while checkpointing is happening.
Checkpointing? How long is checkpointing taking? If you have access to logs/debug.log and are running a *nix OS, you should be able to get this detail by running
grep -i triggered logs/debug.log | grep -i check
Are you encountering a lot of garbage collection events in the debug.log?
Would you be able to provide the query, and is it possible to PROFILE the query and expand all elements of the query plan? The row and db hit info from a PROFILE plan is more useful for tuning.
Checkpointing was slow when we were using a P10 disk; it sometimes took up to 14 minutes. Since the disk upgrade it has become much faster, mostly under 10 seconds now.
There is not much garbage collection, just one event in the last 24 hours. Do you have a rough idea how long the query (with the above query plan) should take if all the data is read from disk?
Hi Andrew,
Please find expanded query plan below:
The query is:
MATCH (n:artifact) WHERE n.docId IN ['3747ee26-8b2e-40cf-bccc-c262be69fe67', '5cd4923c-0c22-4e79-b6da-75bd919da31f', 'e9afe2ec-3324-4027-968d-4f5839d71287', 'acc4a43c-9cb2-4bce-8ebc-9fc43bb5453a', '41579a30-809c-4cc8-bfc4-b01a114caa26', '0a37fe3a-0068-41eb-8d12-fb0795931501', 'eebd3a93-7da2-47e7-ae79-786f24ade2aa', 'fa689188-5075-4051-9515-903ec9042383', 'c791aaef-6b89-499f-974b-071eec329755', '4d350433-4874-4cec-9f6f-ad5621c1d232'] AND n.tenantId='my-tenant' AND n.language='en' RETURN n.id, n.graphVecEmbedding
docId is indexed on artifact nodes.
I think a composite index would help you here.
Please create an index on :artifact(docId, tenantId), then rerun the query and see if that helps.
But after filtering on the indexed column, there would be hardly 2000 nodes at most. Scanning 2000 records should not take 30 seconds. I can even remove the filters on tenantId and language altogether. But I am still not sure why reading from disk is slow on an indexed column. Are there any benchmarking stats for Neo4j reads from disk?
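For a rough sense of scale, here is a back-of-envelope sketch in Python using only the disk limits quoted in this thread (5000 IOPS, 200MB/s). It is not a benchmark: it ignores caching, bursting, queue depth, and any VM-level limits. The number of random reads per node is a made-up placeholder, not a measured value. It also works through both readings of "maximum size 4MB" (4MB in total, or 4MB per record), since the thread does not say which is meant.

```python
DISK_IOPS = 5000      # from the thread: P30 IOPS limit
DISK_MB_PER_S = 200   # from the thread: P30 throughput limit


def iops_bound_seconds(random_reads):
    """Minimum time to issue this many random reads at the IOPS limit."""
    return random_reads / DISK_IOPS


def throughput_bound_seconds(megabytes):
    """Minimum time to transfer this much data at the throughput limit."""
    return megabytes / DISK_MB_PER_S


NODES = 1000
READS_PER_NODE = 10  # hypothetical placeholder; measure on the real system

print("IOPS bound:", iops_bound_seconds(NODES * READS_PER_NODE), "s")
print("throughput bound, 4MB in total:", throughput_bound_seconds(4), "s")
print("throughput bound, 4MB per record:", throughput_bound_seconds(NODES * 4), "s")
```

If the 4MB is per record, 1000 records is about 4000MB, and the 200MB/s limit alone accounts for 20 seconds when everything comes from disk. If 4MB is the total, throughput is negligible and the number of random reads dominates, so the actual reads per node would be the figure to measure.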