Keep getting error of java heap space when use Merge and FOREACH

Hi, sorry I am new to the Neo4j. I hope someone can help me to work this out or give me a direction (or Cypher function) I should working on.

The problem is that now I am trying to install a CSV file of collected movie information into the Neo4j to generate the graph. The filed of cast of the data set contains the names of the caste for each movie. In order to linked the nodes of movies to the node of the cast who appeared in bot the the movie. I tried to use the MERGE to check, create the cast nodes.
The entry in the CSV file looks like

The syntax looks like:
FOREACH (x IN cast_list_of_a_movie | MEREGE ( c: cast {name: x})
MERGE (a) -[:cast] -> (c)
Note: the a is the current movie we are working on.

However, the error of java heap always raised during this process. I have changed the parameters in neo4j.conf to 2G or higher but it still not work.

If I use the CREATE to create the node and relationship, it will work but this will miss the ability of finding two movies which has the same actor. The total number of nodes will be created is about 30,000 nodes, many of them are definitely duplicated.

So I believe the problem is due to the usage of Merge and FOREACH together. Looks like the MERGE will have to search through these 30,000 nodes and raised this error. Is there any way we can solve this problem? Thank you so much for helping.

Or whether I should try to CREATE the nodes first and the merge the cast nodes with the same name attributes?

If you're loading the data with LOAD CSV and you're having troubles with the heap the first thing to do is to use a smaller batch size, with USING PERIODIC COMMIT

I doubt the out of heap error is related to the FOREACH, but for completeness sake, it'd be worth posting your full query if I've got something wrong, there may be something else that's causing an issue. You might also try to prefix your query with EXPLAIN to see if you can spot any issues

Thank you so much for replying.
The full query is below:

MATCH (n) DETACH DELETE n;
LOAD CSV WITH HEADERS FROM 'file:///netflix_movie.csv' AS row

MERGE (a:movie {name:row.id,title:row.title})
MERGE (e:duration {name:row.duration})
MERGE (a) -[:duration]->(e)
MERGE (f:available {time:coalesce(row.available_date,1000)})
MERGE (a) -[:available_date]->(f)
MERGE (g:parent_control {name:row.parent_control})
MERGE (a) -[:parent_control]->(g)
MERGE (h:production_detial {name:coalesce(row.production_detail,1)})
MERGE (a) -[:production_detail]->(h)
WITH

row.id as id_1,
row.title as title_1,

split(replace(replace(replace (row.cast,'[',''),"'",''),']',''),',') AS q1,
split(replace(replace(replace (row.genre,'[',''),"'",''),']',''),',') AS r1,
split(replace(replace(replace (row.director,'[',''),"'",''),']',''),',') AS s1,
split(replace(replace(replace (row.language,'[',''),"'",''),']',''),',') AS t1,
split(replace(replace(replace (row.audio_description,'[',''),"'",''),']',''),',') AS u1,
split(replace(replace(replace (row.subtitle,'[',''),"'",''),']',''),',') AS v1,
split(replace(replace(replace (row.rating,'{',''),"'",''),'}',''),',') AS w1

//FOREACH (x IN r1 | MERGE (b:genre {name:x}),MERGE (a) -[:genre]->(b))
MERGE (a:movie {name:id_1,title:title_1})
FOREACH (x IN r1 |
MERGE (b:genre {name:x})
MERGE (b) <-[:genre]-(a))

FOREACH (x IN q1 | // where probelm occur. If I delete this three lines
MERGE (c:cast {name:x}) // for the cast nodes and its realtion ship. All the
MERGE (a) -[:cast]->(c)) // other querys works.

FOREACH (x IN s1 |
MERGE (d:director {name:x})
MERGE (a) -[:director]-> (d))

FOREACH (x IN t1 |
MERGE (i:lanaguage {name:x})
MERGE (a) -[:language]-> (i))

FOREACH (x IN u1 |
MERGE (j:audio {name:x})
MERGE (a) -[:audio]-> (j))

FOREACH (x IN v1 |
MERGE (k:subtitle {name:x})
MERGE (a) -[:subtitle]-> (k))

FOREACH (x IN w1 |
MERGE (l:rating {name:split(x,':')[0],rate:coalesce(split(x,':')[1],'no rate')})
MERGE (a) -[:rate]-> (l))

Thank you so much for replying!
I have tried to use the periodic committ as you suggested. but the same error still raised.
I have posted the full query code below.

Which periodic commit setting did you use? I don't see it in your code. Additionally, what does your input data look like?

Unless each of these lines is enormous, this should work fine, I'm not quite seeing the issue from what you're posting so far.

Also, please be sure to post the exact error. Also, it'd help if you indicate what your neo4j server memory settings are.

HI, sorry I think I posted the previous query, which is almost the same except the first line.
The query with periodic commit setting looks like:

:auto USING PERIODIC COMMIT 1000

MERGE (a:movie {name:row.id,title:row.title})
MERGE (e:duration {name:row.duration})
MERGE (a) -[:duration]->(e)
MERGE (f:available {time:coalesce(row.available_date,1000)})
MERGE (a) -[:available_date]->(f)
MERGE (g:parent_control {name:row.parent_control})
MERGE (a) -[:parent_control]->(g)
MERGE (h:production_detial {name:coalesce(row.production_detail,1)})
MERGE (a) -[:production_detail]->(h)
WITH

row.id as id_1,
row.title as title_1,

split(replace(replace(replace (row.cast,'[',''),"'",''),']',''),',') AS q1,
split(replace(replace(replace (row.genre,'[',''),"'",''),']',''),',') AS r1,
split(replace(replace(replace (row.director,'[',''),"'",''),']',''),',') AS s1,
split(replace(replace(replace (row.language,'[',''),"'",''),']',''),',') AS t1,
split(replace(replace(replace (row.audio_description,'[',''),"'",''),']',''),',') AS u1,
split(replace(replace(replace (row.subtitle,'[',''),"'",''),']',''),',') AS v1,
split(replace(replace(replace (row.rating,'{',''),"'",''),'}',''),',') AS w1

MERGE (a:movie {name:id_1,title:title_1})
FOREACH (x IN r1 |
MERGE (b:genre {name:x})
MERGE (b) <-[:genre]-(a))

FOREACH (x IN q1 |
MERGE (c:cast {name:x})
MERGE (a) -[:cast]->(c))

FOREACH (x IN s1 |
MERGE (d:director {name:x})
MERGE (a) -[:director]-> (d))

FOREACH (x IN t1 |
MERGE (i:lanaguage {name:x})
MERGE (a) -[:language]-> (i))

FOREACH (x IN u1 |
MERGE (j:audio {name:x})
MERGE (a) -[:audio]-> (j))

FOREACH (x IN v1 |
MERGE (k:subtitle {name:x})
MERGE (a) -[:subtitle]-> (k))

FOREACH (x IN w1 |
MERGE (l:rating {name:split(x,':')[0],rate:coalesce(split(x,':')[1],'no rate')})
MERGE (a) -[:rate]-> (l))

The exact error shew up was:

ERROR Neo.DatabaseError.Statement.ExecutionFailed

Java heap space

The input data is a csv file which I collected by web scrap. The detail looks like:

As you said, the query works fine without trying to import and merge the caste nodes and relationship.

On the other hand, if I write the query as:
FOREACH (x IN q1 |
CREATE (c:cast {name:x})
CREATE (a) -[:cast]->(c))

It also works but this will treat each cast name as a new node. This will results in 30,000 new nodes and relationships. So I think the MERGE will try to match all the exciting cast nodes during the execution and this is why the Java Heap Sapce error was raised. I don't know wherther I am correct.

Thank you so much

Several suggestions:

  • Check the EXPLAIN plan of your query and make sure those FOREACHs aren't blowing up the execution tree.

  • Consider replacing FOREACH with WITH ... UNWIND. For example:

Instead of:

FOREACH (x IN q1 |
CREATE (c:cast {name:x})
CREATE (a) -[:cast]->(c))

Do this:

WITH a, line, q1
UNWIND q1 AS x
CREATE (c:cast {name:x})
CREATE (a) -[:cast]->(c)

Thanks for helping. I did a test on the cast node only as your suggestion.

The query looks like:
:auto USING PERIODIC COMMIT 1000

LOAD CSV WITH HEADERS FROM 'file:///netflix_movie.csv' AS row

MERGE (a:movie {name:row.id,title:row.title})

WITH

split(replace(replace(replace (row.cast,'[',''),"'",''),']',''),',') AS q1

UNWIND q1 AS x

MERGE (c:cast {name:x})

MERGE (a) -[:cast]->(c)

However the result is still:
ERROR Neo.DatabaseError.Statement.ExecutionFailed

Java heap space

There were some Eager operations in the query plan, those disable periodic commits.

In this case it looks like it can be fixed pretty easily. You need to include a in your WITH clause so it remains in scope, otherwise your MERGE at the end won't use the a node from earlier, but will create a new blank node instead, which isn't what you want.

Also, you want to make sure you have indexes or unique constraints on:
:movie(title)
:cast(name)

That will speed up your MERGE operations. If you do an EXPLAIN of the query, you want to make sure you do NOT see any NodeByLabelScans, that indicates that you are lacking an index for a MATCH or MERGE operation, which may be costly over larger data sets.

After your indexes are created and your EXPLAIN looks clear, try this, it includes a in scope in your WITH clause, and should get rid of any Eager operations in your plan, which should let the periodic commit do its thing and keep heap usage low.

Try this:

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///netflix_movie.csv' AS row
MERGE (a:movie {name:row.id,title:row.title})
WITH a, split(replace(replace(replace (row.cast,'[',''),"'",''),']',''),',') AS q1
UNWIND q1 AS x
MERGE (c:cast {name:x})
MERGE (a) -[:cast]->(c)