Curious how streaming of results works

I am curious how to approach streaming data to clients. I guess you have to structure your Cypher query in such a way that the results can be streamed and displayed as they are received. The other thing is of course how Neo4j streams: does it send data as fast as possible, or is there some sort of communication where the client notifies the server that it has processed x records and is now ready to receive another x records? Also interesting: how does this go when two clients execute the same query, but at a different time? Can you then 'reuse' the data of the client that already received these results?


Hi @f1-outsourcing
Our drivers and the Bolt protocol stream records one by one, pulled in batches.
The way this works under the hood is that the driver issues a query, which opens a cursor on the server, and then sends requests to read from that cursor.

If you enable driver logging at the debug level, you will see these messages. They are pipelined so that pulling starts as soon as values are available.

--> RUN ...
--> PULL X
<-- SUCCESS 
<-- RECORD ...
<-- SUCCESS {has_more:true}

Generally, on the server, records are written to the network as soon as they are ready, up until the end of a batch, at which point the server sends a SUCCESS indicating whether there is any more data. When data is written to the network it may or may not be flushed immediately, depending on the size of the data written; once all data for the batch has been written, the server flushes.

When the driver sees the SUCCESS {has_more:true} and has processed that batch (there is some variation in behaviour between drivers here), it sends another PULL X. Note that this is abstracted away for users; the driver just exposes an iterator of records.
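For example, with the Python driver (the other official drivers work the same way), iterating over the result consumes records as they stream in, and the driver issues further PULL requests behind the scenes. This is only a sketch: the URI, credentials and fetch_size value are illustrative, and fetch_size (the number of records requested per PULL) is optional and shown here only to make the batching visible.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "secret"))

# fetch_size controls how many records each PULL asks for (assuming a 4.x/5.x driver).
with driver.session(fetch_size=1000) as session:
    result = session.run("MATCH (n) RETURN n")
    for record in result:        # records arrive as the server streams them;
        print(record["n"])       # the driver sends the next PULL once the batch is consumed

driver.close()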

how does this go when two clients execute the same query, but at a different time. Can you then 'reuse' the data of the client that already received these results?

This is not something the driver does. Whether reusing results is viable really depends on your model and on whether those results are still valid at the time of reuse; making this work would require some sort of caching layer, which is far beyond the scope of a driver.
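If you did want that kind of reuse, it would have to live in your application. Purely as an illustration (every name here is hypothetical, nothing like this exists in the driver), a naive caching layer could key materialised results by query text and parameters and decide for itself when they are stale:

import time

from neo4j import GraphDatabase

class QueryCache:
    """Hypothetical application-side cache; the TTL policy is made up for illustration."""

    def __init__(self, driver, ttl_seconds=60):
        self._driver = driver
        self._ttl = ttl_seconds
        self._cache = {}  # (query, params) -> (timestamp, records)

    def run(self, query, **params):
        key = (query, tuple(sorted(params.items())))
        hit = self._cache.get(key)
        if hit and time.time() - hit[0] < self._ttl:
            return hit[1]  # reuse the results fetched for an earlier client
        with self._driver.session() as session:
            records = [r.data() for r in session.run(query, **params)]
        self._cache[key] = (time.time(), records)
        return records

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "secret"))
cache = QueryCache(driver)
rows = cache.run("MATCH (n) RETURN n LIMIT 100")

Note that a cache like this trades streaming for reuse: the results are fully materialised, so the second client gets a list rather than a stream, and you still have to decide yourself when the cached data is no longer valid.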

Ok, so basically the only thing I can do then is optimize the query I send.

This obviously streams the results nicely; whenever a record is available it can be sent immediately:

MATCH (n) RETURN n

But when this is too much and I need certain data first, I can e.g. add some ordering so that the records I want arrive first.

MATCH (n)
RETURN n
ORDER BY head(labels(n))

But I guess this will prevent the database from transmitting as early as in the first example, because it needs to wait for the sort to finish.

So a better alternative would be to identify some other criterion and issue multiple queries, for example one per category:

MATCH (n)-[:REL]->(category:Label {name: 'category1'})
RETURN n
ORDER BY head(labels(n))
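As a sketch of that approach (the relationship type, label and category names are just the placeholders from the example above), you could run one parameterised query per category from the driver, most important category first; each query then starts streaming on its own:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "secret"))

query = """
MATCH (n)-[:REL]->(category:Label {name: $name})
RETURN n
ORDER BY head(labels(n))
"""

# Most important category first; each query streams independently,
# although the ORDER BY still has to finish before its first record is sent.
with driver.session() as session:
    for name in ["category1", "category2", "category3"]:
        for record in session.run(query, name=name):
            print(record["n"])

driver.close()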

Is this a correct approach, or are there other options?

When you create a query, that query is "decomposed" into parts to produce an execution plan (prefix your query with EXPLAIN and you'll get that plan as a graph). Some of those parts can be done in parallel if you have a cluster (e.g. "fetch all nodes from the servers"), while others need to be a singleton (e.g. a "join" or an "order by").

But once the last step has started processing, the results are likely streamed as they are produced.
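You can also look at that plan from a driver rather than the browser; a minimal sketch, assuming a recent Python driver where the summary of an EXPLAIN query carries the plan:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "secret"))

with driver.session() as session:
    # EXPLAIN builds the execution plan without running the query.
    result = session.run("EXPLAIN MATCH (n) RETURN n ORDER BY head(labels(n))")
    summary = result.consume()
    print(summary.plan)  # operators such as Sort mark the points where streaming has to wait

driver.close()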