What's wrong with apoc.periodic.iterate.sub-batching.cypher example code?

apoc
import

(SunPark) #1

Hello,
I tried the apoc.periodic.iterate.sub-batching.cypher example code from https://gist.github.com/jexp/caeb53acfe8a649fecade4417fb8876a, but it failed with the error below.

Failed to invoke procedure apoc.periodic.iterate: Caused by: org.neo4j.cypher.internal.util.v3_4.SyntaxException: Unknown function 'apoc.coll.partition' (line 2, column 7 (offset: 67))

apoc.periodic.iterate.sub-batching.cypher

CALL apoc.periodic.iterate(
"LOAD CSV WITH HEADERS FROM 'FILE:///dropd_noun.csv' AS line
WITH apoc.coll.partition(collect(line),10000) AS batchesOfLines
UNWIND batchesOfLines as batch
RETURN batch",
"UNWIND {batch} AS word
MERGE (w:Word {word: word.sentence_noun})",
{batchSize: 1, parallel: true});

I wrote similar Cypher code based on the apoc.periodic.iterate.sub-batching.cypher example above, and it works. dropd_noun.csv has a single column, sentence_noun.

LOAD CSV WITH HEADERS FROM 'FILE:///dropd_noun.csv' AS line
WITH collect(line) AS nounlists
UNWIND nounlists AS nounlist
CREATE (w:Word {word:nounlist.sentence_noun} )

Thank you,


(Michael Hunger) #2

Which apoc version do you have?
Does it find the function otherwise?

Why make it so complicated? It's all built into periodic iterate.

CALL apoc.periodic.iterate(
"LOAD CSV WITH HEADERS FROM 'FILE:///dropd_noun.csv' AS line
RETURN line",
"MERGE (w:Word {word: line.sentence_noun})",
{batchSize: 10000, iterateList:true, parallel: true});

(SunPark) #3

Here are the answers.

  1. apoc version? The plugins directory has only one file, apoc-3.4.0.3-all.jar. The Neo4j server version is 3.4.6 on Ubuntu 18.04.
~/neo4j/plugins$ ls
apoc-3.4.0.3-all.jar
  2. Does it find the function otherwise? I don't know how to check whether it finds the function otherwise. Please let me know how to get a debug trace showing which functions it finds. I did check that apoc.coll.partition is in apoc-xxx.jar:
neo4j> CALL apoc.coll.partition([1,2,3,4,5,6], 5) YIELD value
       RETURN value;
+-----------------+
| value           |
+-----------------+
| [1, 2, 3, 4, 5] |
| [6]             |
+-----------------+

2 rows available after 6 ms, consumed after another 1 ms

  3. Why make it so complicated? It's all built into periodic iterate.
I am testing apoc.periodic.iterate because I read the comment below on GitHub:
https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/714.

3. Use apoc.periodic.iterate

* the biggest benefit of using iterate() is you don't need a large Heap memory anymore;
* iterate() can split large update into smaller batches to execute and submit, which keeps the Heap memory usage low;
* iterate() can also leverage the CPU power by running updates in parallel (set parallel:true). This works particularly well for SSD, but avoid using it on mechanical HD;

Thank you,
Sun


(Michael Hunger) #4

So partition is a procedure, not a function, so it would have to be called differently in your first example.
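
As a rough sketch of what "called differently" could look like, here is the first example with the partition step moved into a CALL ... YIELD clause, the same form that worked in the cypher-shell test above. It assumes the same Neo4j 3.4 / apoc 3.4.0.3 setup and the same dropd_noun.csv file, and has not been run against that data. The names lines and batch are only illustrative.

```cypher
// Sketch only: assumes apoc.coll.partition is registered as a procedure,
// as it is in the apoc 3.4.0.3 install discussed in this thread.
CALL apoc.periodic.iterate(
"LOAD CSV WITH HEADERS FROM 'file:///dropd_noun.csv' AS line
WITH collect(line) AS lines
CALL apoc.coll.partition(lines, 10000) YIELD value AS batch
RETURN batch",
"UNWIND {batch} AS word
MERGE (w:Word {word: word.sentence_noun})",
{batchSize: 1, parallel: true});
```

Note that collect(line) still gathers every row of the file in the first statement before anything is partitioned, so this form gives up part of the memory benefit that batching is meant to provide.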

But I wouldn't recommend that approach anyway unless you really know why you're using it; I suggest using the built-in functionality instead.
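
On the question of how to check whether a name is available as a function or as a procedure: one possible way, assuming a Neo4j 3.x server like the 3.4.6 one above, is to query the built-in dbms.procedures() and dbms.functions() listings and see which of them contains the name.

```cypher
// Lists apoc.coll.partition if it is registered as a procedure.
CALL dbms.procedures() YIELD name, signature
WHERE name = 'apoc.coll.partition'
RETURN name, signature;

// Lists apoc.coll.partition if it is registered as a function.
CALL dbms.functions() YIELD name, signature
WHERE name = 'apoc.coll.partition'
RETURN name, signature;
```

If only the first query returns a row, the name has to be used with CALL ... YIELD and cannot appear inside an expression such as WITH apoc.coll.partition(...).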