What sampling method is used when calculating standard deviation using the "stDev()" function?

On this Neo4j Documentation page, it says that the function stDev():

Returns the standard deviation for the given value over a group for a sample of a population.

But it does not say anything about the sampling method. So, my question is: what sampling method is being used?

stDev() is an aggregating function. As such, it calculates the standard deviation of the given value over the rows in the query result.

If you have a grouping value, it will aggregate over the rows for each grouping. Here are examples.

Sample Data:

create
  (:Employee {id: 0, salary: 45000, dept: "IT"}),
  (:Employee {id: 1, salary: 110000, dept: "IT"}),
  (:Employee {id: 2, salary: 75000, dept: "IT"}),
  (:Employee {id: 3, salary: 95000, dept: "HR"}),
  (:Employee {id: 4, salary: 225000, dept: "HR"}),
  (:Employee {id: 5, salary: 150000, dept: "FINANCE"}),
  (:Employee {id: 6, salary: 200000, dept: "FINANCE"}),
  (:Employee {id: 7, salary: 125000, dept: "FINANCE"})

Data:

match(n:Employee)
return n.dept, n.salary

Result of aggregating over the dept:

match(n:Employee)
return n.dept, stdev(n.salary)

Result of aggregating over the entire company (no grouping):

match(n:Employee)
return stdev(n.salary)

This is how all the aggregation functions behave.

Hi Glilienfield,

Thanks for your answer. However, I'm interested in knowing about the sampling method used in the function itself.

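It's not that Neo4j takes a sample. The formula for standard deviation has two versions: one for when your data represents the whole population, and another for when your data represents a sample (a subset) of the population. stDev() applies the sample version, which divides by N - 1; stDevP() applies the population version, which divides by N.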
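
To see the two versions of the formula side by side, here is a minimal sketch run against the sample data above. It computes both formulas by hand and returns them next to the built-in functions. The column aliases (builtinSample, manualSample, and so on) are names made up for illustration. Assuming stDev() divides by N - 1 and stDevP() divides by N, as the Cypher manual describes, each manual column should agree with its built-in counterpart up to floating-point rounding.

```cypher
// Gather the salaries plus the built-in results in a single aggregation step.
match (n:Employee)
with collect(n.salary) as salaries,
     avg(n.salary) as mean,
     count(n) as cnt,
     stDev(n.salary) as builtinSample,
     stDevP(n.salary) as builtinPopulation
// Sum of squared deviations from the mean, computed by hand.
with builtinSample, builtinPopulation, cnt,
     reduce(ss = 0.0, s in salaries | ss + (s - mean) ^ 2) as sumSquares
return builtinSample,
       sqrt(sumSquares / (cnt - 1)) as manualSample,      // sample formula: divide by N - 1
       builtinPopulation,
       sqrt(sumSquares / cnt) as manualPopulation         // population formula: divide by N
```

The sample value comes out larger than the population value because it divides the same sum of squares by a smaller number. No rows are drawn at random in either case: both functions use every row in the group.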
