Apoc.periodic.iterate for apoc.load.jdbcUpdate method

Hey, everyone

I'm trying to use apoc for loading data to click house by means of apoc.periodic.iterate

My query is big (about 400k and more, much more in future)

My code is as follows:

CALL apoc.periodic.iterate(
"
MY_BIG_QUERY return row1, row2
",
"
CALL apoc.load.jdbcUpdate("jdbc:my_clickhouse:///?user=&password=**", "INSERT INTO test.new_speed_test VALUES (?, ?)", [row1, row2])
yield row
return count(
)
",
{batchSize:10000, parallel:true}
)

Problems:

  1. It is slow, how can I increase the speed? It's about 1k records per 2 seconds
  2. As I use batches, I was expecting that click house would add records by portions too. But executing count(1) on click while the code is running in Neo, is not divisible by 10000 - batch size. Why doesn't that work like that?

The apoc.periodic.iterate method batches the results into batches of 10,000. It then effectively calls the second query for each result in the batch via an 'unwind' operation of the results, but in one transaction. As such, the apoc.load.jdbcUpdate is going to be called once per record, which means it is going to open/close a connection for each row. I would suspect this takes is adding time to your execution performance.

The default 'batching' approach is what I described above. You could try changing the 'batchMode' to 'BATCH_SINGLE' and unpack the batch results into one update statement. I gave it a try with this:

CALL apoc.periodic.iterate(
"
MY_BIG_QUERY return row1, row2
",
"
WITH (s="", i in $_batch, s + "("+i.row1+","+i.row2+"),") as x
WITH left(x,size(x)-1) as data
WITH "jdbc:my_clickhouse:///?user=&password=**", "INSERT INTO test.new_speed_test VALUES " + data as jdbcUpdateStatement
CALL apoc.load.jdbcUpdate(jdbcUpdateStatement)
yield row
return count()
",
{batchSize:10000, parallel:true, batchMode: 'BATCH_SINGLE'}
)

Thank you for your reply!

It helped a lot to get, what was wrong and what should be done, but the code itself caused error about the ',' in this string: WITH (s="", i in $_batch, s + "("+i.row1+","+i.row2+"),") as x

I didn't understand this structure: (s='', I in iterate..., action), are u sure this is correct? Looks like syntax for the function reduce

I changed code to this and it worked for me:

CALL apoc.periodic.iterate(
"
MATCH(a:Address{blockchain:'btc'})--(tx:Transaction{blockchain:'btc'})
where (a.entity_id_after_clusterization =~ 'cluster_.*') and (not tx.transaction_type is NULL) and (tx.was_clustered is NULL)
return distinct a.address AS row1, a.entity_id_after_clusterization AS row2
",
"
WITH [x IN $_batch | {row1: x.row1, row2: x.row2}] as rows
WITH replace('INSERT INTO test.new_speed_test VALUES ' + reduce (acc='', row in rows | acc + \"('\" + row.row1 + \"', '\" + row.row2 + \"'), \"), '`', '') as jdbcUpdateStatement

        CALL apoc.load.jdbcUpdate('jdbc:clickhouse://***/?user=***&password=***', jdbcUpdateStatement)
        yield row
        return count(*)
    ",
    {batchSize:1000, parallel:true, batchMode: 'BATCH_SINGLE'}
    )

If you help me to understand, what you meant, I would be great full
But...cypher and apoc do not have a ready function or method to load data to click house?? Really?

So glad it helped and you figured it out.

In response to your question, it looks as if I messed up. I forgot to include the ‘reduce’ clause in front of the parenthesis. Sorry if this caused you extended time and frustration.

yeah, it took some time of me))
But anyway thank you! You really helped a lot (not only in this topic, by the way)

1 Like