Keep getting error of java heap space when use Merge and FOREACH

dannyfan558 · July 23, 2021, 8:50pm

Hi, sorry I am new to the Neo4j. I hope someone can help me to work this out or give me a direction (or Cypher function) I should working on.

The problem is that now I am trying to install a CSV file of collected movie information into the Neo4j to generate the graph. The filed of cast of the data set contains the names of the caste for each movie. In order to linked the nodes of movies to the node of the cast who appeared in bot the the movie. I tried to use the MERGE to check, create the cast nodes.
The entry in the CSV file looks like

The syntax looks like:
FOREACH (x IN cast_list_of_a_movie | MEREGE ( c: cast {name: x})
MERGE (a) -[:cast] -> (c)
Note: the a is the current movie we are working on.

However, the error of java heap always raised during this process. I have changed the parameters in neo4j.conf to 2G or higher but it still not work.

If I use the CREATE to create the node and relationship, it will work but this will miss the ability of finding two movies which has the same actor. The total number of nodes will be created is about 30,000 nodes, many of them are definitely duplicated.

So I believe the problem is due to the usage of Merge and FOREACH together. Looks like the MERGE will have to search through these 30,000 nodes and raised this error. Is there any way we can solve this problem? Thank you so much for helping.

Or whether I should try to CREATE the nodes first and the merge the cast nodes with the same name attributes?

david_allen · July 25, 2021, 7:15pm

If you're loading the data with LOAD CSV and you're having troubles with the heap the first thing to do is to use a smaller batch size, with USING PERIODIC COMMIT

I doubt the out of heap error is related to the FOREACH, but for completeness sake, it'd be worth posting your full query if I've got something wrong, there may be something else that's causing an issue. You might also try to prefix your query with EXPLAIN to see if you can spot any issues

dannyfan558 · July 26, 2021, 5:10am

Thank you so much for replying.
The full query is below:

MATCH (n) DETACH DELETE n;
LOAD CSV WITH HEADERS FROM 'file:///netflix_movie.csv' AS row

MERGE (a:movie {name:row.id,title:row.title})
MERGE (e:duration {name:row.duration})
MERGE (a) -[:duration]->(e)
MERGE (f:available {time:coalesce(row.available_date,1000)})
MERGE (a) -[:available_date]->(f)
MERGE (g:parent_control {name:row.parent_control})
MERGE (a) -[:parent_control]->(g)
MERGE (h:production_detial {name:coalesce(row.production_detail,1)})
MERGE (a) -[:production_detail]->(h)
WITH

row.id as id_1,
row.title as title_1,

split(replace(replace(replace (row.cast,'[',''),"'",''),']',''),',') AS q1,
split(replace(replace(replace (row.genre,'[',''),"'",''),']',''),',') AS r1,
split(replace(replace(replace (row.director,'[',''),"'",''),']',''),',') AS s1,
split(replace(replace(replace (row.language,'[',''),"'",''),']',''),',') AS t1,
split(replace(replace(replace (row.audio_description,'[',''),"'",''),']',''),',') AS u1,
split(replace(replace(replace (row.subtitle,'[',''),"'",''),']',''),',') AS v1,
split(replace(replace(replace (row.rating,'{',''),"'",''),'}',''),',') AS w1

//FOREACH (x IN r1 | MERGE (b:genre {name:x}),MERGE (a) -[:genre]->(b))
MERGE (a:movie {name:id_1,title:title_1})
FOREACH (x IN r1 |
MERGE (b:genre {name:x})
MERGE (b) <-[:genre]-(a))

FOREACH (x IN q1 | // where probelm occur. If I delete this three lines
MERGE (c:cast {name:x}) // for the cast nodes and its realtion ship. All the
MERGE (a) -[:cast]->(c)) // other querys works.

FOREACH (x IN s1 |
MERGE (d:director {name:x})
MERGE (a) -[:director]-> (d))

FOREACH (x IN t1 |
MERGE (i:lanaguage {name:x})
MERGE (a) -[:language]-> (i))

FOREACH (x IN u1 |
MERGE (j:audio {name:x})
MERGE (a) -[:audio]-> (j))

FOREACH (x IN v1 |
MERGE (k:subtitle {name:x})
MERGE (a) -[:subtitle]-> (k))

FOREACH (x IN w1 |
MERGE (l:rating {name:split(x,':')[0],rate:coalesce(split(x,':')[1],'no rate')})
MERGE (a) -[:rate]-> (l))

dannyfan558 · July 26, 2021, 5:14am

Thank you so much for replying!
I have tried to use the periodic committ as you suggested. but the same error still raised.
I have posted the full query code below.

david_allen · July 26, 2021, 2:19pm

Which periodic commit setting did you use? I don't see it in your code. Additionally, what does your input data look like?

Unless each of these lines is enormous, this should work fine, I'm not quite seeing the issue from what you're posting so far.

Also, please be sure to post the exact error. Also, it'd help if you indicate what your neo4j server memory settings are.

dannyfan558 · July 26, 2021, 5:31pm

HI, sorry I think I posted the previous query, which is almost the same except the first line.
The query with periodic commit setting looks like:

:auto USING PERIODIC COMMIT 1000

MERGE (a:movie {name:row.id,title:row.title})
MERGE (e:duration {name:row.duration})
MERGE (a) -[:duration]->(e)
MERGE (f:available {time:coalesce(row.available_date,1000)})
MERGE (a) -[:available_date]->(f)
MERGE (g:parent_control {name:row.parent_control})
MERGE (a) -[:parent_control]->(g)
MERGE (h:production_detial {name:coalesce(row.production_detail,1)})
MERGE (a) -[:production_detail]->(h)
WITH

row.id as id_1,
row.title as title_1,

split(replace(replace(replace (row.cast,'[',''),"'",''),']',''),',') AS q1,
split(replace(replace(replace (row.genre,'[',''),"'",''),']',''),',') AS r1,
split(replace(replace(replace (row.director,'[',''),"'",''),']',''),',') AS s1,
split(replace(replace(replace (row.language,'[',''),"'",''),']',''),',') AS t1,
split(replace(replace(replace (row.audio_description,'[',''),"'",''),']',''),',') AS u1,
split(replace(replace(replace (row.subtitle,'[',''),"'",''),']',''),',') AS v1,
split(replace(replace(replace (row.rating,'{',''),"'",''),'}',''),',') AS w1

MERGE (a:movie {name:id_1,title:title_1})
FOREACH (x IN r1 |
MERGE (b:genre {name:x})
MERGE (b) <-[:genre]-(a))

FOREACH (x IN q1 |
MERGE (c:cast {name:x})
MERGE (a) -[:cast]->(c))

FOREACH (x IN s1 |
MERGE (d:director {name:x})
MERGE (a) -[:director]-> (d))

FOREACH (x IN t1 |
MERGE (i:lanaguage {name:x})
MERGE (a) -[:language]-> (i))

FOREACH (x IN u1 |
MERGE (j:audio {name:x})
MERGE (a) -[:audio]-> (j))

FOREACH (x IN v1 |
MERGE (k:subtitle {name:x})
MERGE (a) -[:subtitle]-> (k))

FOREACH (x IN w1 |
MERGE (l:rating {name:split(x,':')[0],rate:coalesce(split(x,':')[1],'no rate')})
MERGE (a) -[:rate]-> (l))

The exact error shew up was:

ERROR Neo.DatabaseError.Statement.ExecutionFailed

Java heap space

dannyfan558 · July 26, 2021, 5:44pm

The input data is a csv file which I collected by web scrap. The detail looks like:

As you said, the query works fine without trying to import and merge the caste nodes and relationship.

On the other hand, if I write the query as:
FOREACH (x IN q1 |
CREATE (c:cast {name:x})
CREATE (a) -[:cast]->(c))

It also works but this will treat each cast name as a new node. This will results in 30,000 new nodes and relationships. So I think the MERGE will try to match all the exciting cast nodes during the execution and this is why the Java Heap Sapce error was raised. I don't know wherther I am correct.

Thank you so much

david_allen · July 26, 2021, 6:24pm

Several suggestions:

Check the EXPLAIN plan of your query and make sure those FOREACHs aren't blowing up the execution tree.
Consider replacing FOREACH with WITH ... UNWIND. For example:

Instead of:

FOREACH (x IN q1 |
CREATE (c:cast {name:x})
CREATE (a) -[:cast]->(c))

Do this:

WITH a, line, q1
UNWIND q1 AS x
CREATE (c:cast {name:x})
CREATE (a) -[:cast]->(c)

dannyfan558 · July 27, 2021, 12:35am

Thanks for helping. I did a test on the cast node only as your suggestion.

The query looks like:
:auto USING PERIODIC COMMIT 1000

LOAD CSV WITH HEADERS FROM 'file:///netflix_movie.csv' AS row

MERGE (a:movie {name:row.id,title:row.title})

WITH

split(replace(replace(replace (row.cast,'[',''),"'",''),']',''),',') AS q1

UNWIND q1 AS x

MERGE (c:cast {name:x})

MERGE (a) -[:cast]->(c)

However the result is still:
ERROR Neo.DatabaseError.Statement.ExecutionFailed

Java heap space

andrew_bowman · July 27, 2021, 6:11am

There were some Eager operations in the query plan, those disable periodic commits.

In this case it looks like it can be fixed pretty easily. You need to include a in your WITH clause so it remains in scope, otherwise your MERGE at the end won't use the a node from earlier, but will create a new blank node instead, which isn't what you want.

Also, you want to make sure you have indexes or unique constraints on:
:movie(title)
:cast(name)

That will speed up your MERGE operations. If you do an EXPLAIN of the query, you want to make sure you do NOT see any NodeByLabelScans, that indicates that you are lacking an index for a MATCH or MERGE operation, which may be costly over larger data sets.

After your indexes are created and your EXPLAIN looks clear, try this, it includes a in scope in your WITH clause, and should get rid of any Eager operations in your plan, which should let the periodic commit do its thing and keep heap usage low.

Try this:

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///netflix_movie.csv' AS row
MERGE (a:movie {name:row.id,title:row.title})
WITH a, split(replace(replace(replace (row.cast,'[',''),"'",''),']',''),',') AS q1
UNWIND q1 AS x
MERGE (c:cast {name:x})
MERGE (a) -[:cast]->(c)

Topic		Replies	Views
Memory Problem(Overflow) Cypher cypher , out-of-memory	0	500	June 16, 2020
While creating relationship, I got 'java.lang.OutOfMemoryError: Java heap space' please help! Cypher	1	351	June 30, 2020
Java heap space error when loading a huge csv file Import / Export	8	2287	May 26, 2021
Neo4j Crashes due to 'java.lang.OutOfMemoryError: Java heap space' Neo4j Graph Platform out-of-memory	3	3309	February 13, 2019
neo.TransientError.General.OutOfMemoryError Cypher	1	1662	October 24, 2019

Keep getting error of java heap space when use Merge and FOREACH

Related topics