OutOfMemoryError with USING PERIODIC COMMIT LOAD CSV


(Wergeland) #1

We have written a sightly complex LOAD CSV script to load a few million lines. Unfortunately, I can't post the script as is, but each line results in two nodes, where one has an extra label (set with apoc.create.addLabel), an edge between the two nodes, and optionally two more edges to existing nodes. We also have implemented custom triggers to set created_by, created_at, updated_by, and updated_at for all created/updated nodes and edges.

When I run the script with LIMIT up to 10000, it succeeds, but with USING PERIODIC COMMIT even as low as 100, it fails with OutOfMemoryError.

We are trying to read the CSV file as provided (no preprocessing). We could of course read the files in another way, but none the less I would like to understand the behaviour of USING PERIODIC COMMIT.

For solution, I am considering:

  • Try apoc.periodic.commit and pass in values to SKIP and LIMIT (if this behaves different from USING PERIODIC COMMIT).
  • Preprocess/split the file.
  • Read the file using an application to batch the transactions.

Any input is greatly valued!

Best regards,

Øyvind Matheson Wergeland


(Michael Hunger) #2

It very likely turns into an EAGER operation where cypher pulls in the whole file through each step, basically disabling periodic commit.

Usually apoc.periodic.iterate solves it for you putting the load csv into the driving statement and the remainder into the action statement.

If possible share the EXPLAiN output of your statement.


for eager see:


(Wergeland) #3

Thank you for the reply, and the link to more information about EAGER. (I had just found another post referring to problems with LOAD CSV and EAGER, without explaining what it actually means.) Profiling the script indeed reports an execution plan with several EAGER operations. There are about 100 steps, so I don't know how useful it would be to post it.

We are doing a large initial import, while daily (or so) updates are quite small, but using the same format, so we're using the same script as a basis for both.

I will try running the initial import with apoc.periodic.commit.

Best regards,

Øyvind Matheson Wergeland.


(Michael Hunger) #4

Use apoc.periodic.iterate that works better for imports.


(Wergeland) #5

I was just going to post that we successfully imported using apoc.periodic.iterate, but it was somewhat slow (1 hour for 6,5 million lines). We managed to optimise the query somewhat, but were not able to get rid of all EAGER.

The EAGERs I have not been able to get rid of, comes from that we're trying to detect name changes, and preserve historic names. A distilled version of the script follows:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:/companies.csv' AS line
MERGE (c:company { id:line.id })
SET
  c.name = line.name,
  c.type = line.type
// [removed code]
// Detect name change
WITH c,line
OPTIONAL MATCH (c)-[r:named { is_active:true }]->(name:name)
  WHERE name.name <> line.name
SET
  r.end_date = date(),
  r.is_active = false
WITH c,line
MERGE (name:companyName { name:line.name })
MERGE (c)-[r:named]->(name)
ON CREATE SET
  r.start_date = date(),
  r.is_active = true

Could this be a case where we should read the CSV file in two passes - first create the names, and then just use MATCH in the second pass?


(Michael Hunger) #6

yes either multi-pass or use periodic iterate -> what was the exact call you used?

are you sure you have indexes/constraints for company(id) and :companyName(name) ? (odd choice of labels?)


(Wergeland) #7

Yes, there are unique constraint on both properties. And the labels and properties have been renamed from the actual script, so I see why you find them a bit odd.

I ran it this way:

CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:/companies.csv' AS line RETURN line",
  <import script>,
  { batchSize: 10000, parallel: false }
);

I assume the best would be to refactor the script to get rid of the two EAGERs, but it is running within acceptable time, as we will only have to do the initial import once (per environment).


(Michael Hunger) #8

Hmm looks ok to me.
It should do iterateList:true automatically.

perhaps you can try this:

EXPLAIN
WITH {} AS line
MERGE (c:company { id:line.id })
SET
  c.name = line.name,
  c.type = line.type
// [removed code]
// Detect name change
WITH c,line
OPTIONAL MATCH (c)-[r:named { is_active:true }]->(name:name)
  WHERE name.name <> line.name
SET
  r.end_date = date(),
  r.is_active = false
WITH c,line
MERGE (name:companyName { name:line.name })
MERGE (c)-[r:named]->(name)
ON CREATE SET
  r.start_date = date(),
  r.is_active = true

(with your setup and share the query plan).


(Wergeland) #9

Of course:

+----------------------------------------------------------------------+
| Plan      | Statement    | Version      | Planner | Runtime   | Time |
+----------------------------------------------------------------------+
| "EXPLAIN" | "WRITE_ONLY" | "CYPHER 3.5" | "COST"  | "SLOTTED" | 390  |
+----------------------------------------------------------------------+

+------------------------------+----------------+---------------------------+---------------------------------------------+
| Operator                     | Estimated Rows | Identifiers               | Other                                       |
+------------------------------+----------------+---------------------------+---------------------------------------------+
| +ProduceResults              |              1 | r, line, name, name, r, c |                                             |
| |                            +----------------+---------------------------+---------------------------------------------+
| +EmptyResult                 |              1 | r, line, name, name, r, c |                                             |
| |                            +----------------+---------------------------+---------------------------------------------+
| +Apply                       |              1 | r, line, name, name, r, c |                                             |
| |\                           +----------------+---------------------------+---------------------------------------------+
| | +AntiConditionalApply      |              1 | name, r, c                |                                             |
| | |\                         +----------------+---------------------------+---------------------------------------------+
| | | +SetProperty             |              1 | name, r, c                |                                             |
| | | |                        +----------------+---------------------------+---------------------------------------------+
| | | +SetProperty             |              1 | name, r, c                |                                             |
| | | |                        +----------------+---------------------------+---------------------------------------------+
| | | +MergeCreateRelationship |              1 | name, r, c                |                                             |
| | | |                        +----------------+---------------------------+---------------------------------------------+
| | | +Argument                |              1 | name, c                   |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +AntiConditionalApply      |              1 | name, r, c                |                                             |
| | |\                         +----------------+---------------------------+---------------------------------------------+
| | | +Optional                |              1 | name, r, c                |                                             |
| | | |                        +----------------+---------------------------+---------------------------------------------+
| | | +ActiveRead              |              0 | name, r, c                |                                             |
| | | |                        +----------------+---------------------------+---------------------------------------------+
| | | +Expand(Into)            |              0 | name, r, c                | (c)-[r:named]->(name)                       |
| | | |                        +----------------+---------------------------+---------------------------------------------+
| | | +LockNodes               |              1 | name, c                   | c, name                                     |
| | | |                        +----------------+---------------------------+---------------------------------------------+
| | | +Argument                |              1 | name, c                   |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +Optional                  |              1 | name, r, c                |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +ActiveRead                |              0 | name, r, c                |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +Expand(Into)              |              0 | name, r, c                | (c)-[r:named]->(name)                       |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +Argument                  |              1 | name, c                   |                                             |
| |                            +----------------+---------------------------+---------------------------------------------+
| +Apply                       |              1 | line, name, name, r, c    |                                             |
| |\                           +----------------+---------------------------+---------------------------------------------+
| | +AntiConditionalApply      |              1 | name, line                |                                             |
| | |\                         +----------------+---------------------------+---------------------------------------------+
| | | +MergeCreateNode         |              1 | name, line                |                                             |
| | | |                        +----------------+---------------------------+---------------------------------------------+
| | | +Argument                |              1 | line                      |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +Optional                  |              1 | name, line                |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +ActiveRead                |              0 | name, line                |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +Filter                    |              0 | name, line                | `name`.name = line.name                     |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +NodeByLabelScan           |              1 | name, line                | :companyName                                |
| |                            +----------------+---------------------------+---------------------------------------------+
| +Eager                       |              1 | name, r, c, line          |                                             |
| |                            +----------------+---------------------------+---------------------------------------------+
| +Apply                       |              1 | name, r, c, line          |                                             |
| |\                           +----------------+---------------------------+---------------------------------------------+
| | +SetProperty               |              1 | name, r, c, line          |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +SetProperty               |              1 | name, r, c, line          |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +Eager                     |              1 | name, r, c, line          |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +OptionalExpand(All)       |              1 | name, r, c, line          | (c)-[r:named]->(name); `r`.is_active = true |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +Argument                  |              1 | c, line                   |                                             |
| |                            +----------------+---------------------------+---------------------------------------------+
| +Eager                       |              1 | c, line                   |                                             |
| |                            +----------------+---------------------------+---------------------------------------------+
| +SetProperty                 |              1 | c, line                   |                                             |
| |                            +----------------+---------------------------+---------------------------------------------+
| +SetProperty                 |              1 | c, line                   |                                             |
| |                            +----------------+---------------------------+---------------------------------------------+
| +Apply                       |              1 | c, line                   |                                             |
| |\                           +----------------+---------------------------+---------------------------------------------+
| | +AntiConditionalApply      |              1 | c, line                   |                                             |
| | |\                         +----------------+---------------------------+---------------------------------------------+
| | | +MergeCreateNode         |              1 | c, line                   |                                             |
| | | |                        +----------------+---------------------------+---------------------------------------------+
| | | +Argument                |              1 | line                      |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +Optional                  |              1 | c, line                   |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +ActiveRead                |              0 | c, line                   |                                             |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +Filter                    |              0 | c, line                   | c.id = line.id                              |
| | |                          +----------------+---------------------------+---------------------------------------------+
| | +NodeByLabelScan           |        4910263 | c, line                   | :company                                    |
| |                            +----------------+---------------------------+---------------------------------------------+
| +Projection                  |              1 | line                      | {line : {}}                                 |
+------------------------------+----------------+---------------------------+---------------------------------------------+