Config optimization for heavy query traffic

mason_edmison
Node Link

I will soon be throwing a bunch of queries at a neo4j community instance and was wondering what sorts of modifications I could make to the neo4j.conf to prevent the server from getting bogged down.

Already I have made the following changes:

NEO4J_dbms_memory_pagecache_size=32G
NEO4J_dbms_memory_heap_max__size=32G

The graph I will be querying is quite large (~1.7 billion nodes with about the same number of relationships) and the query itself is:

UNWIND $pmids AS id // unserializes $pmids
MATCH (n:Article) MATCH (n:Article {id:id}) -[:HAS_MENTION]->() 
WITH n, [(n)-[:HAS_MENTION]->(m:Mention) | m] as mentions
ORDER BY n.id   // should use index-backed ordering, if a type hint is supplied earlier
RETURN n.id as pmid, mentions

... in which I am passing batches of approximately 1000 elements in $pmids.

As it is now, each query gets drastically slower (from about 5 seconds to eventually over a minute) over the course of 100K pmids. I am using Python multithreading to query the DB concurrently with 16 workers.
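
For reference, a client setup like the one described could be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: `chunked`, `run_all`, and the `run_batch` callable are hypothetical names, and `run_batch` stands in for whatever function sends one batch of ids to the database (for example through the official Python driver). Timing each batch makes it easier to see at which point the slowdown sets in.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_all(pmids, run_batch, batch_size=1000, workers=16):
    """Run `run_batch` on every batch using a thread pool.

    `run_batch` is a hypothetical callable: it takes a list of ids and
    returns the rows for that batch. The result is a list of
    (rows, elapsed_seconds) tuples in the same order as the batches.
    """
    def timed(batch):
        started = time.perf_counter()
        rows = run_batch(batch)
        return rows, time.perf_counter() - started

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order the batches were submitted
        return list(pool.map(timed, chunked(pmids, batch_size)))
```

Passing the query function in as an argument keeps the batching and timing logic independent of the driver, so the same harness can be used to compare different worker counts or batch sizes.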

Any tips or tricks?

Thanks in advance.

4 REPLIES 4

dana_canzano
Neo4j

Is this a continuation of https://community.neo4j.com/t/query-optimization-that-collects-and-orders-nodes-on-very-large-graph ?

Regardless, I'm curious how you arrived at

NEO4J_dbms_memory_pagecache_size=32G
NEO4J_dbms_memory_heap_max__size=32G

How large is your graph on disk, and how much total RAM is on the instance?

mason_edmison
Node Link

Yep, sorry, I thought I had already linked my previous post focusing on the query itself, but thanks for including it.

In terms of the pagecache and heap_max size, I think it was a bit of a shot in the dark, to be honest: something that I had set initially as part of my bash script to spin up the instance.

Total RAM on the instance is 128GB.
Graph on disk size (assuming you mean the size of the database directory in the databases directory): 327GB

Thanks!

Oh, and what version of Neo4j? If this is pre-4.x, then the size of the graph should not be determined simply by taking the size of data/databases, since that includes transaction logs, which are not held in the page cache and so do not count toward dbms.memory.pagecache.size.
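
One way to approximate the store size without the transaction logs is to sum the file sizes in the database directory and skip the log files. The sketch below assumes the pre-4.x logs are the files whose names start with `neostore.transaction.db`; that prefix and the function name `store_size_bytes` are assumptions for illustration, so check them against the actual directory listing.

```python
import os


def store_size_bytes(db_dir, skip_prefix="neostore.transaction.db"):
    """Sum the sizes of all files under `db_dir`, skipping transaction logs.

    `skip_prefix` is an assumption about how the log files are named;
    adjust it to match what is actually on disk.
    """
    total = 0
    for root, _dirs, files in os.walk(db_dir):
        for name in files:
            if name.startswith(skip_prefix):
                continue  # transaction log, not part of the page-cached store
            total += os.path.getsize(os.path.join(root, name))
    return total
```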

If the size of the graph is 327GB and dbms.memory.pagecache.size=32G, then only about 1/10th of your graph fits in RAM, so most of your queries are reading data from the file system rather than from RAM.
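
The arithmetic behind that estimate, using the numbers from this thread, can be sketched as below. It treats "G" and "GB" as the same unit for simplicity and assumes the whole 327GB is store data, which are both simplifications; the function names are hypothetical.

```python
def pagecache_coverage(pagecache_gb, store_gb):
    """Fraction of the on-disk store that can be held in the page cache."""
    return pagecache_gb / store_gb


def remaining_ram(total_gb, heap_gb, pagecache_gb):
    """RAM left for the OS and everything else after heap and page cache."""
    return total_gb - heap_gb - pagecache_gb


# Numbers quoted in this thread
print(round(pagecache_coverage(32, 327), 3))  # 0.098, i.e. roughly 1/10th
print(remaining_ram(128, 32, 32))             # 64 (GB not assigned to heap or page cache)
```

With 64GB of the 128GB left unassigned, there appears to be room to raise the page cache, though how much to leave for the operating system is a judgment call that the memory configuration documentation covers.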

For 4.x, the Memory configuration section of the Operations Manual provides details on how these parameters should be set.

version=4.1

Cool, I will take a look at the documentation.
