It is slow, how can I increase the speed? It's about 1k records per 2 seconds
As I use batches, I was expecting that click house would add records by portions too. But executing count(1) on click while the code is running in Neo, is not divisible by 10000 - batch size. Why doesn't that work like that?
The apoc.periodic.iterate method batches the results into batches of 10,000. It then effectively calls the second query for each result in the batch via an 'unwind' operation of the results, but in one transaction. As such, the apoc.load.jdbcUpdate is going to be called once per record, which means it is going to open/close a connection for each row. I would suspect this takes is adding time to your execution performance.
The default 'batching' approach is what I described above. You could try changing the 'batchMode' to 'BATCH_SINGLE' and unpack the batch results into one update statement. I gave it a try with this:
CALL apoc.periodic.iterate(
"
MY_BIG_QUERY return row1, row2
",
"
WITH (s="", i in $_batch, s + "("+i.row1+","+i.row2+"),") as x
WITH left(x,size(x)-1) as data
WITH "jdbc:my_clickhouse:///?user=&password=**", "INSERT INTO test.new_speed_test VALUES " + data as jdbcUpdateStatement
CALL apoc.load.jdbcUpdate(jdbcUpdateStatement)
yield row
return count()
",
{batchSize:10000, parallel:true, batchMode: 'BATCH_SINGLE'}
)
It helped a lot to get, what was wrong and what should be done, but the code itself caused error about the ',' in this string: WITH (s="", i in $_batch, s + "("+i.row1+","+i.row2+"),") as x
I didn't understand this structure: (s='', I in iterate..., action), are u sure this is correct? Looks like syntax for the function reduce
I changed code to this and it worked for me:
CALL apoc.periodic.iterate(
"
MATCH(a:Address{blockchain:'btc'})--(tx:Transaction{blockchain:'btc'})
where (a.entity_id_after_clusterization =~ 'cluster_.*') and (not tx.transaction_type is NULL) and (tx.was_clustered is NULL)
return distinct a.address AS row1, a.entity_id_after_clusterization AS row2
",
"
WITH [x IN $_batch | {row1: x.row1, row2: x.row2}] as rows
WITH replace('INSERT INTO test.new_speed_test VALUES ' + reduce (acc='', row in rows | acc + \"('\" + row.row1 + \"', '\" + row.row2 + \"'), \"), '`', '') as jdbcUpdateStatement
If you help me to understand, what you meant, I would be great full
But...cypher and apoc do not have a ready function or method to load data to click house?? Really?
In response to your question, it looks as if I messed up. I forgot to include the ‘reduce’ clause in front of the parenthesis. Sorry if this caused you extended time and frustration.