Neo4j Streams replacement in Neo4j 5.x

I'm trying to migrate from Neo4j 4.4.42 Community Edition to Neo4j 5.26.6 Community Edition, as the 4.4 EOL is nearing. I have a primary node and a secondary node: all operations are executed against the primary, and the secondary is kept in sync with the primary using Neo4j Streams. I can see that Neo4j Streams support has been removed in Neo4j 5.x.

Based on the documentation I've seen, there are two alternative solutions:

  1. Apache Kafka Connect
  2. APOC triggers

I have a few questions about this:

  1. Apache Kafka Connect uses queries to pull the data from the primary and keep the secondary in sync. Will Kafka Connect put more load on Neo4j than Neo4j Streams does?

  2. Is there a chance that the queries time out? Will resources be locked until the Kafka Connect query completes (and again, would that not impact the performance of the primary)?

    2.1. I've set a transaction timeout of 30 seconds in neo4j.conf. Since a query timeout can be set in both places (Neo4j and the connector), will the Kafka Connect queries time out?

  3. Will APOC triggers add more load on Neo4j than Neo4j Streams?

  4. Which is a better solution to implement?
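For context on question 3, here is a minimal sketch of what an APOC trigger looks like in 5.x, written as Python that only builds the Cypher call. Assumptions: APOC core 5.x with `apoc.trigger.enabled=true` in `apoc.conf`, a hypothetical trigger name `stampLastModified`, and a hypothetical `lastModified` property meant to feed a query-strategy source. Because it runs in the `before` phase, the trigger executes inside each writing transaction, which is where its extra load would show up.

```python
# Hypothetical APOC 5.x trigger: stamps lastModified on newly created nodes so
# that a query-strategy source can pick them up. It runs in the 'before' phase,
# i.e. inside the writing transaction, so its cost is added to every commit.
# Updates are not covered; they would need $assignedNodeProperties as well.
TRIGGER_STATEMENT = (
    "UNWIND $createdNodes AS n "
    "SET n.lastModified = timestamp()"
)


def trigger_install_call(database: str, name: str) -> tuple[str, dict]:
    """Return (query, parameters) for apoc.trigger.install.

    Must be executed against the `system` database and requires
    apoc.trigger.enabled=true in apoc.conf.
    """
    query = "CALL apoc.trigger.install($db, $name, $statement, {phase: 'before'})"
    params = {"db": database, "name": name, "statement": TRIGGER_STATEMENT}
    return query, params


if __name__ == "__main__":
    query, params = trigger_install_call("neo4j", "stampLastModified")
    # With the official Python driver (not imported here):
    #   with driver.session(database="system") as session:
    #       session.run(query, params)
    print(query)
    print(params["statement"])
```

Passing the trigger body as a parameter keeps `$createdNodes` from being interpreted by the outer `CALL`; it is only bound when the trigger fires.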

Hi, thank you so much for reaching out.

I would recommend using the latest version of the Kafka connector (5.1) with the query strategy to read data, since you're on Community Edition (where CDC is not available). On the other side, you can use the connector's sink feature to keep the secondary updated. This gives you a solid integration between the primary and the secondary. The query strategy might add some load, but I don't think it will noticeably affect overall performance.
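A minimal sketch of the setup suggested above: a source connector on the primary using the query strategy, and a sink connector on the secondary applying each message with Cypher. The connector names, hosts, topic, `Person` label, `lastModified` streaming property and credentials are all hypothetical, and the configuration keys reflect our reading of the Neo4j Connector for Kafka 5.x documentation, so check each one against the official reference before use.

```python
import json

# Hypothetical connector definitions for primary -> Kafka -> secondary.
# Key names follow the Neo4j Connector for Kafka 5.x docs as we understand them;
# verify each one against the connector's configuration reference.
SOURCE = {
    "name": "neo4j-primary-source",  # hypothetical connector name
    "config": {
        "connector.class": "org.neo4j.connectors.kafka.source.Neo4jConnector",
        "neo4j.uri": "neo4j://primary:7687",  # hypothetical host
        "neo4j.authentication.type": "BASIC",
        "neo4j.authentication.basic.username": "neo4j",
        "neo4j.authentication.basic.password": "change-me",
        "neo4j.source-strategy": "QUERY",
        "neo4j.start-from": "NOW",
        "neo4j.query.topic": "person-changes",  # hypothetical topic
        # Assumes every write on the primary sets p.lastModified (epoch millis).
        "neo4j.query.streaming-property": "lastModified",
        "neo4j.query.poll-interval": "5s",
        "neo4j.query": (
            "MATCH (p:Person) WHERE p.lastModified > $lastCheck "
            "RETURN p.id AS id, p.name AS name, p.lastModified AS lastModified"
        ),
    },
}

SINK = {
    "name": "neo4j-secondary-sink",  # hypothetical connector name
    "config": {
        "connector.class": "org.neo4j.connectors.kafka.sink.Neo4jConnector",
        "topics": "person-changes",
        "neo4j.uri": "neo4j://secondary:7687",  # hypothetical host
        "neo4j.authentication.type": "BASIC",
        "neo4j.authentication.basic.username": "neo4j",
        "neo4j.authentication.basic.password": "change-me",
        # Cypher sink strategy: __value is the message value bound by the connector.
        "neo4j.cypher.topic.person-changes": (
            "MERGE (p:Person {id: __value.id}) "
            "SET p.name = __value.name, p.lastModified = __value.lastModified"
        ),
    },
}


def rest_payload(connector: dict) -> str:
    """JSON body for POST /connectors on the Kafka Connect REST API."""
    return json.dumps(connector, indent=2)


if __name__ == "__main__":
    for connector in (SOURCE, SINK):
        print(rest_payload(connector))
```

Each payload would be POSTed to the Kafka Connect REST API's `/connectors` endpoint. The source query only works if every write on the primary keeps `lastModified` up to date.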

On the timeout, it's better to keep the connector's timeout less than or equal to the database's transaction timeout, so you don't run into issues.
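The rule of thumb above can be expressed as a small check. This assumes Neo4j's simple duration syntax (`30s`, `500ms`, ...) and that the 30-second limit lives in `db.transaction.timeout` (the 5.x name, if we read the config changes correctly; 4.4 used `dbms.transaction.timeout`). The connector-side timeout setting is not named here; take it from the connector's documentation. If the connector allowed more time than the database, Neo4j would typically terminate the poll transaction first, and the connector would see an error rather than its own timeout.

```python
import re

# Milliseconds per unit for simple duration strings such as those used in
# neo4j.conf (e.g. db.transaction.timeout=30s) and in connector configs.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}


def to_millis(duration: str) -> int:
    """Parse '500ms', '30s', '2m' or '1h' into milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m|h)\s*", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_MS[unit]


def connector_timeout_ok(connector_timeout: str, db_timeout: str) -> bool:
    """True if the connector gives up no later than Neo4j would terminate the transaction."""
    return to_millis(connector_timeout) <= to_millis(db_timeout)


if __name__ == "__main__":
    print(connector_timeout_ok("20s", "30s"))  # True
    print(connector_timeout_ok("45s", "30s"))  # False
```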

Read-only polls don't take write locks, so they don't block writers. They still use some CPU and cache resources, but I believe they won't have any noticeable impact on performance.
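To illustrate why a poll only reads, here is an in-memory sketch of one query-strategy style poll: select rows whose streaming property is newer than the last watermark, then advance the watermark. `Row`, `last_modified` and the watermark handling are illustrative assumptions, not the connector's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int
    last_modified: int  # epoch millis; the assumed streaming property


def poll(rows: list[Row], last_check: int) -> tuple[list[Row], int]:
    """One query-strategy style poll: read rows changed since last_check.

    Equivalent in spirit to
      MATCH (p:Person) WHERE p.lastModified > $lastCheck RETURN ...
    It only reads, so it needs no write locks; its cost is the filter itself.
    """
    changed = sorted(
        (r for r in rows if r.last_modified > last_check),
        key=lambda r: r.last_modified,
    )
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_check = changed[-1].last_modified if changed else last_check
    return changed, new_check


if __name__ == "__main__":
    rows = [Row(1, 100), Row(2, 250), Row(3, 400)]
    batch, watermark = poll(rows, 200)
    print([r.id for r in batch], watermark)  # [2, 3] 400
    batch, watermark = poll(rows, watermark)
    print([r.id for r in batch], watermark)  # [] 400
```

In Neo4j, the cost of each real poll is that filter: without an index on the streaming property every poll is a full label scan, so a range index on it is likely the biggest lever on the extra load.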

Thanks @emre.hizal for the quick response. Your response got me wondering whether it's possible to compare the load the query strategy adds against Neo4j Streams. That would give me a fair idea of how to allocate resources.
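One way to reason about the comparison before measuring is a back-of-envelope model. As we understand it, Neo4j Streams published changes from a transaction event listener, so its work scaled with committed writes, while a polling source runs its query on a fixed schedule whether or not anything changed. The functions and all numbers below are hypothetical; for real figures, `PROFILE` the actual source query on the primary.

```python
SECONDS_PER_DAY = 86_400


def polls_per_day(poll_interval_s: float) -> int:
    """How many times the source query runs per day, changes or not."""
    return int(SECONDS_PER_DAY / poll_interval_s)


def rows_examined_per_day(poll_interval_s: float, label_size: int,
                          changes_per_day: int, indexed: bool) -> int:
    """Very rough count of nodes the poll query touches per day.

    indexed=True:  a range seek on the streaming property reads roughly only
                   the changed nodes (each assumed to be returned once).
    indexed=False: every poll scans the whole label.
    Ignores per-query planning and network overhead.
    """
    if indexed:
        return changes_per_day
    return polls_per_day(poll_interval_s) * label_size


if __name__ == "__main__":
    print(polls_per_day(5))
    print(rows_examined_per_day(5, 1_000_000, 50_000, indexed=False))
    print(rows_examined_per_day(5, 1_000_000, 50_000, indexed=True))
```

With a 5-second interval, 1,000,000 `Person` nodes and 50,000 changes per day, this gives 17,280 polls per day; unindexed they would touch about 17.28 billion nodes, indexed roughly 50,000.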
Also, I see that Apache Kafka connectors handle soft deletes only. Is there a way to get around that?
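If "soft deletes only" refers to the query strategy being unable to see hard deletes (a deleted node can no longer be matched by the poll query, so no event is emitted), a common workaround is a tombstone pattern: mark the node as deleted and bump its streaming property instead of deleting it, let the sink delete it on the secondary, and purge tombstones on the primary later. The sketch below models this with plain dictionaries; the `deleted` flag, the purge window and all function names are hypothetical.

```python
def soft_delete(primary: dict, node_id: int, now_ms: int) -> None:
    """Primary side: mark instead of DETACH DELETE, and bump the streaming
    property so the next query-strategy poll picks the change up."""
    node = primary[node_id]
    node["deleted"] = True
    node["lastModified"] = now_ms


def apply_event(secondary: dict, event: dict) -> None:
    """Secondary side (what the sink's Cypher would do): delete on a tombstone, upsert otherwise."""
    if event.get("deleted"):
        secondary.pop(event["id"], None)
    else:
        secondary[event["id"]] = {"name": event["name"], "lastModified": event["lastModified"]}


def purge_tombstones(primary: dict, older_than_ms: int) -> list[int]:
    """Primary side housekeeping: hard-delete tombstones the sink has surely applied."""
    stale = [k for k, n in primary.items()
             if n.get("deleted") and n["lastModified"] < older_than_ms]
    for k in stale:
        del primary[k]
    return stale


if __name__ == "__main__":
    primary = {1: {"id": 1, "name": "Ada", "lastModified": 100}}
    secondary: dict = {}
    apply_event(secondary, primary[1])     # initial sync
    soft_delete(primary, 1, now_ms=200)
    apply_event(secondary, primary[1])     # tombstone reaches the sink
    print(secondary)                       # {}
    print(purge_tombstones(primary, 300))  # [1]
```

In this pattern, application queries on the primary have to filter out `deleted` nodes until they are purged, and the purge window should comfortably exceed the worst-case replication lag.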