Neo4j Crashes due to 'java.lang.OutOfMemoryError: Java heap space'

out-of-memory

(Anshulghogre) #1

Hi All,

I am using a Neo4j container instance in Azure. After loading data and running a few queries from the frontend app (integrated with Neo4j), I am continuously getting an out-of-memory exception. Setting the heap memory to 16GB max and 4GB initial doesn't resolve the issue.

Everything was working fine for almost 2-3 months; this issue has only been occurring for the last 2 days.

I have around 2500 nodes and loaded a CSV file of 400KB.
Version of Neo4j:
Neo4j Browser version: 3.2.15
Neo4j Server version: [3.5.1]

Below is the log:-
`

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.Scheduler-1"
Exception in thread "Scheduler-200452658" java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.409+0000 WARN Java heap space java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.410+0000 ERROR Unexpected error detected in bolt session 'bolt-385'. Java heap space java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.410+0000 WARN Java heap space java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.410+0000 WARN Java heap space java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.410+0000 WARN Java heap space java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.410+0000 WARN Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@4831ab11 in QueuedThreadPool[qtp1401633928]@538b3c88{STARTED,8<=13<=14,i=4,q=0}[ReservedThreadExecutor@4289f562{s=1/1,p=0}]
2019-02-09 08:29:55.410+0000 WARN Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@4831ab11 in QueuedThreadPool[qtp1401633928]@538b3c88{STARTED,8<=13<=14,i=5,q=0}[ReservedThreadExecutor@4289f562{s=1/1,p=0}]
2019-02-09 08:29:55.410+0000 WARN Java heap space java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.410+0000 WARN Java heap space java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.411+0000 WARN Java heap space java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.411+0000 WARN Java heap space java.lang.OutOfMemoryError: Java heap space
2019-02-09 08:29:55.411+0000 WARN Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@4831ab11 in QueuedThreadPool[qtp1401633928]@538b3c88{STARTED,8<=13<=14,i=7,q=0}[ReservedThreadExecutor@4289f562{s=1/1,p=0}]
Exception in thread "neo4j.VmPauseMonitor-1" java.lang.OutOfMemoryError: Java heap space

`


(Kunal Goyal) #2

Could you please let us know what kind of queries you are running and when this error occurs?


(Anshulghogre) #3

Everything was working fine for almost 2-3 months; this issue has only been occurring for the last 2 days.
Below is one of the sample queries I am using:

MATCH (a:Contact)-[:working_with]->(d:Leadcompany)-[:contacted_to]->(c:LeadCompanyContact)
WITH max(date(c.date)) - duration('PYD') AS latest_date
MATCH (b:LeadcompanyTechnology)<-[:working_on]-(d:Leadcompany)-[:contactd_to]->(c:LeadCompanyContact)
where date(c.date)>latest_date and d.LeadCompanyNam<>"NA"
Return distinct(d.LeadCompanyName),date(c.date)
order by date(c.date) DESC
LIMIT 50

The rest of the queries are similar to the one above (approx. 20 more queries).


(Andrew Bowman) #4

There are several things to improve.

The first is the initial match. It can match the same :LeadCompanyContact node multiple times, once for every distinct path through other nodes that fits the pattern. A better alternative would be:

MATCH (c:LeadCompanyContact)
WHERE (:Contact)-[:working_with]->(:Leadcompany)-[:contacted_to]->(c)
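
To gauge how much duplication the original pattern produces, one rough check is to compare the number of matched paths with the number of distinct contact nodes. This is only a diagnostic sketch using the labels and relationship types from the query above; it is not part of the fix itself.

```cypher
// Each matched path yields one row, so count(c) counts paths,
// while count(DISTINCT c) counts the contact nodes actually involved.
MATCH (:Contact)-[:working_with]->(:Leadcompany)-[:contacted_to]->(c:LeadCompanyContact)
RETURN count(c) AS matchedRows,
       count(DISTINCT c) AS distinctContacts
```

If matchedRows is much larger than distinctContacts, the aggregation in the original query is doing that much redundant work.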

Next, max(date(c.date)) seems inefficient. Is there any way to refactor so that c.date is already a date type on all nodes? That would ensure you don't waste operations applying the type conversion across all rows.

More importantly, it would open up the possibility of adding an index on :LeadCompanyContact(date), which would let WHERE c.date > latest_date perform an index lookup. As it stands, this match uses no index.
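
A minimal sketch of that refactor might look like the following. It assumes every existing c.date value is a string that date() can parse (for example an ISO 8601 date), which the thread does not confirm, so it is worth trying on a copy of the data first. The two statements are meant to be run separately.

```cypher
// One-off migration: overwrite the string property with a native Date value.
// Assumption: all existing c.date values are parseable by date().
MATCH (c:LeadCompanyContact)
WHERE c.date IS NOT NULL
SET c.date = date(c.date);

// Index creation in the Neo4j 3.5 syntax used by the server version in this thread.
CREATE INDEX ON :LeadCompanyContact(date);
```

After this, the queries can compare c.date directly instead of wrapping it in date() on every row.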

We can adjust the second match to reduce the possibility of excessive cardinality by moving the section of the pattern including :LeadcompanyTechnology into the WHERE clause, since you don't use b in any way in the rest of the query:

MATCH (d:Leadcompany)-[:contacted_to]->(c:LeadCompanyContact) 
WHERE (:LeadcompanyTechnology)<-[:working_on]-(d) AND c.date > latest_date AND d.LeadCompanyNam <> "NA"

If this particular predicate (d.LeadCompanyNam <> "NA") is used often, you might consider applying an :NA node label to these nodes, and adjusting your predicate to WHERE NOT d:NA, as label checking is cheaper than filtering by property.
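
As a rough illustration of the label approach, the tagging could be done once and the predicate swapped afterwards. The property key is spelled as in the predicate above; adjust it if the actual key is LeadCompanyName. The RETURN clause in the second statement is only a placeholder.

```cypher
// One-off tagging step: mark the "NA" companies with an extra label.
MATCH (d:Leadcompany)
WHERE d.LeadCompanyNam = "NA"
SET d:NA;

// Later queries can then filter on the label instead of the property.
MATCH (d:Leadcompany)-[:contacted_to]->(c:LeadCompanyContact)
WHERE NOT d:NA
RETURN d, c
LIMIT 50;
```

Two caveats: the label has to be kept in sync whenever the property changes, and nodes that lack the property entirely pass WHERE NOT d:NA, whereas the <> comparison evaluates to null for them and filters them out.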

Lastly, just a note that DISTINCT is a keyword that applies to the entire row. It is not a function, so there is no need for parentheses.
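
Putting these suggestions together, the query might end up looking something like the sketch below. It rests on several assumptions that are not confirmed in the thread: c.date is already stored as a native date, the property key is LeadCompanyName (as in the original RETURN clause), and $lookback is a hypothetical parameter holding an ISO 8601 duration string that stands in for the duration literal in the original query.

```cypher
// Find the cutoff date from distinct contact nodes only.
MATCH (c:LeadCompanyContact)
WHERE (:Contact)-[:working_with]->(:Leadcompany)-[:contacted_to]->(c)
WITH max(c.date) - duration($lookback) AS latest_date

// Fetch recent contacts for companies that work on some technology.
MATCH (d:Leadcompany)-[:contacted_to]->(c:LeadCompanyContact)
WHERE (:LeadcompanyTechnology)<-[:working_on]-(d)
  AND c.date > latest_date
  AND d.LeadCompanyName <> "NA"
RETURN DISTINCT d.LeadCompanyName, c.date
ORDER BY c.date DESC
LIMIT 50
```

Prefixing the query with PROFILE shows whether the second MATCH actually uses the index and how many rows flow through each step.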