Is the Distinct operator actually eager, or does it just involve a memory build-up?


The operator summary docs label the Distinct operator as "Eager". 

It doesn't seem like the type of operation that needs to be eager. De-duplication can be done in a streaming fashion: emit each row the first time it is seen, and remember it so that later duplicates can be dropped.

The operator detail docs say:

"The Distinct operator removes duplicate rows from the incoming stream of rows. To ensure only distinct elements are returned, Distinct will pull in data lazily from its source and build up state. This may lead to increased memory pressure in the system."

The "lazily" above suggests it's not eager, but that it may have the same memory implications as an eager operation.
So, is Distinct eager? Or is it actually streaming/lazy, but with memory implications?
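
To make the distinction I'm asking about concrete, here is a minimal Python sketch of the two behaviours. It is only an illustration of the general idea, not a claim about how the Distinct operator is actually implemented; the names `logged_source`, `lazy_distinct` and `eager_distinct` are made up for this example.

```python
from typing import Hashable, Iterable, Iterator, List, Tuple


def logged_source(values: Iterable[Hashable], log: List[Tuple[str, Hashable]]) -> Iterator[Hashable]:
    """Hypothetical upstream operator that records every row pulled from it."""
    for value in values:
        log.append(("pull", value))
        yield value


def lazy_distinct(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Streaming de-dup: emits a row as soon as it is first seen.

    The `seen` set still grows with the number of distinct rows,
    so memory use builds up even though nothing is drained up front.
    """
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


def eager_distinct(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Eager de-dup: drains the whole source before emitting anything."""
    buffered = list(rows)  # the entire input is materialized here
    yield from dict.fromkeys(buffered)  # keeps first-seen order


if __name__ == "__main__":
    lazy_log: List[Tuple[str, Hashable]] = []
    lazy = lazy_distinct(logged_source([1, 1, 2, 3], lazy_log))
    print(next(lazy), len(lazy_log))  # first row out after a single pull

    eager_log: List[Tuple[str, Hashable]] = []
    eager = eager_distinct(logged_source([1, 1, 2, 3], eager_log))
    print(next(eager), len(eager_log))  # first row out only after all pulls
```

In both versions the state held grows with the number of distinct rows. The difference is only in when the first row can be passed downstream, which is what I understand "eager" versus "lazy" to mean.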