Neo4j Kafka Connect vs Neo4j Streams

Hi Team,

I'm trying to migrate from Neo4j Community Edition 4.4 to 5.26 LTS CE. I noticed that the Neo4j Streams plugin has been replaced by the Neo4j Connector for Kafka (Query strategy - Neo4j Connector for Kafka), and I have a concern.
How does the load profile of the Neo4j Connector for Kafka (Query strategy - Neo4j Connector for Kafka) differ from that of the Neo4j Streams plugin (procedures + embedded connector) in terms of query execution overhead, transaction throughput, and impact on Neo4j performance?

We would like to know the performance impact on the primary application's data persistence and query execution (in terms of response time and throughput) when the Kafka connector pulls data (via query) to enrich a secondary instance. Will it increase query response times, and will it have any impact on the overall performance of the primary Neo4j instance?


Hello @harsha.gennerahalli,

Thanks for raising this question.

The Neo4j Connector for Kafka uses a completely different architecture from the neo4j-streams plugin. While the streams plugin ran in-process within the underlying Neo4j DBMS, the Neo4j Connector for Kafka runs within a Kafka Connect cluster and queries the Neo4j DBMS externally using the official drivers.
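To make the architectural difference concrete, here is a minimal Python sketch of what an externally running, query-based source does: it periodically sends a read query to Neo4j and keeps its own offset outside the database. It assumes the query strategy filters on a `$lastCheck` parameter compared against a "streaming property". The `Person` label, the `updatedAt` property, `QueryPoller` and `FakeNeo4j` are all hypothetical, and the fake replaces an official driver session so the example runs without a database. This is not the connector's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# A "run query" function: Cypher text + parameters -> list of row dicts.
# In a real deployment this would be a session of an official Neo4j driver.
RunQuery = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]

# Hypothetical polling query: only rows whose streaming property is newer
# than the last checkpoint are returned ($lastCheck is an assumption here).
POLL_QUERY = (
    "MATCH (p:Person) WHERE p.updatedAt > $lastCheck "
    "RETURN p.name AS name, p.updatedAt AS updatedAt "
    "ORDER BY p.updatedAt"
)


@dataclass
class QueryPoller:
    run_query: RunQuery
    streaming_property: str = "updatedAt"
    last_check: int = 0  # offset tracked by the connector, not by Neo4j

    def poll_once(self) -> List[Dict[str, Any]]:
        """Execute one read query on Neo4j and advance the offset."""
        rows = self.run_query(POLL_QUERY, {"lastCheck": self.last_check})
        if rows:
            self.last_check = max(r[self.streaming_property] for r in rows)
        return rows


class FakeNeo4j:
    """In-memory stand-in that counts how many queries it had to execute."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows
        self.calls = 0

    def __call__(self, query: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
        self.calls += 1
        return [r for r in self.rows if r["updatedAt"] > params["lastCheck"]]


db = FakeNeo4j([{"name": "a", "updatedAt": 10}, {"name": "b", "updatedAt": 20}])
poller = QueryPoller(db)
first = poller.poll_once()   # both rows are new; offset moves to 20
second = poller.poll_once()  # nothing new, but the query still ran on the server
print(len(first), len(second), db.calls)  # 2 0 2
```

In this model the Neo4j server sees ordinary read transactions arriving from outside at every poll interval, even when nothing has changed, rather than work being done inside the database process.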

I think the answer to your question depends heavily on your use case, your data model and the rate of change on the database. The external model does incur additional network transfer of query results, but I don't expect this overhead to be dramatic in most scenarios. Furthermore, several of our customers use the Neo4j Connector for Kafka in production without any problems.
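One way to reason about the dependence on change rate is a rough model of a single polling source: the number of queries hitting Neo4j is set by the poll interval, while the volume of data transferred scales with how much changes. The sketch below is a back-of-envelope model, not a benchmark. The inputs (a 5 s interval, 200 changes/s, 500-byte rows) are made up for illustration, `estimate_poll_load` and `PollLoad` are hypothetical names, and the model ignores the per-query execution cost, which depends on your data model (for example, whether the filtered property is indexed).

```python
from dataclasses import dataclass


@dataclass
class PollLoad:
    queries_per_second: float  # read queries the source sends to Neo4j
    rows_per_second: float     # result rows pulled over the network
    bytes_per_second: float    # rough network transfer for those rows


def estimate_poll_load(
    poll_interval_s: float, changes_per_s: float, avg_row_bytes: int
) -> PollLoad:
    """Back-of-envelope model of a single polling source (hypothetical)."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    # One read query per poll interval, independent of the change rate.
    queries = 1 / poll_interval_s
    # Assumes each changed entity is returned once before the offset passes it.
    rows = changes_per_s
    return PollLoad(queries, rows, rows * avg_row_bytes)


busy = estimate_poll_load(poll_interval_s=5, changes_per_s=200, avg_row_bytes=500)
idle = estimate_poll_load(poll_interval_s=5, changes_per_s=0, avg_row_bytes=500)
print(busy)  # PollLoad(queries_per_second=0.2, rows_per_second=200, bytes_per_second=100000)
print(idle)  # PollLoad(queries_per_second=0.2, rows_per_second=0, bytes_per_second=0)
```

Under these assumptions the query rate stays constant while transfer volume tracks the change rate, which is why the actual impact is hard to state without knowing your workload.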

An important side note: we have introduced an official Change Data Capture (CDC) feature in our on-prem and Neo4j AuraDB offerings, and Neo4j Connector for Kafka 5.1 already has built-in support for it. It sounds like a more scalable solution for your scenario.
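For comparison, a CDC-based source follows a cursor over a log of changes instead of re-running a graph query that filters on a property. Below is a minimal Python sketch under stated assumptions: CDC is available and enabled on your deployment (check the edition requirements, since the question concerns Community Edition), and changes are read with `db.cdc.query` starting from a change identifier such as one returned by `db.cdc.current()`. Treating an empty selector list as "all changes" is also an assumption. `read_changes` and `FakeChangeLog` are hypothetical, and the fake replaces a real driver session. This is not the connector's actual implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

# Cypher text + parameters -> list of row dicts (a driver session in practice).
RunQuery = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]

# Assumed CDC procedure call; an empty selector list is taken to mean "all changes".
CDC_QUERY = "CALL db.cdc.query($from, [])"


def read_changes(run_query: RunQuery, cursor: str) -> Tuple[List[Dict[str, Any]], str]:
    """Fetch change events after `cursor` and return them with the new cursor."""
    events = run_query(CDC_QUERY, {"from": cursor})
    if events:
        cursor = events[-1]["id"]  # resume after the last event seen
    return events, cursor


class FakeChangeLog:
    """In-memory stand-in for the CDC log; change ids are opaque strings."""

    def __init__(self, change_ids: List[str]) -> None:
        self.log = [{"id": cid} for cid in change_ids]

    def __call__(self, query: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
        ids = [e["id"] for e in self.log]
        start = ids.index(params["from"]) + 1  # events strictly after the cursor
        return self.log[start:]


log = FakeChangeLog(["c1", "c2", "c3"])
events, cursor = read_changes(log, "c1")
print([e["id"] for e in events], cursor)  # ['c2', 'c3'] c3
events, cursor = read_changes(log, cursor)
print(events, cursor)  # [] c3
```

The design difference is that each read only has to return what changed after the cursor, rather than evaluating a `MATCH ... WHERE` over the graph on every poll.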

Thanks,

Ali
