Slow performance on time series queries

vjramirez · January 23, 2019, 3:51pm

Hello,

I'm testing different databases to storage time series. I have some results using cassandra and InfluxDB. Now, I am trying with Neo4j Community using the schema suggested in https://www.graphgrid.com/modeling-time-series-data-with-neo4j/
I have data from 30 sensors for each second in a month time lapse. If I try to query the 86400 data points for a day time lapse of one sensor, the cypher query:

MATCH (y:Year)-[:TC]->(m:Month)-[:TC]->(d:Day)-[:TC]->(h:Hour)-[:TC]->(mi:Minute)-[:TC]->(se:Second)-[:EXIST]->(o:Observation)-[:OBSERVED_BY]->(s:Sensor) WHERE y.value = 2018 AND m.value = 08 AND d.value=11 AND s.id = 'XXXXXX' RETURN s.id, o.date, o.value

takes almost 6 seconds to complete. Same query in other databases takes less than a second. I tried to use indexes for year, month, day but the result is even worse (9 seconds).

Currently, the database has ~ 52 million nodes and relationships, which is not even close to Big Data, and Neo4j is intended to be used in that sort of scenarios, hence, I can't understand why the query is slow.

Some other queries I am testing are : The data of a sensor in a week and in a month combined with aggregations : AVG, COUNT, MIN, MAX. Always Neo4j has shown less performance than cassandra and influxDB (and SQL Server).

david_allen · January 23, 2019, 3:58pm

Unfortunately I believe that page could use some updating, it was written in 2015.

In recent 3.4+ releases of Neo4j we have temporal date types, which you can index. For most uses, they're goign to be far better than this time tree approach:

Consider putting dates/times on the observation, indexing that field, and then querying directly for the time property on the observation. It's going to be simpler and more performant.

vjramirez · January 24, 2019, 8:58am

Thanks David for your reply.
Using your suggestion, now the query is:

"PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date.year=2018 AND o.date.month=12 AND o.date.day=11 RETURN s.id, o.date, o.value;"
(I'm using PROFILE to get the execution time)

As you pointed out, the performance is better but it's still taken more than 4 seconds. I'm trying to get a model than can be compared with the results using cassandra, influx or SQL Server. Now I'm really far from that.

andrew_bowman · January 24, 2019, 12:02pm

I'm assuming you have an index on :Observation(date).

However even with that index, if you run a PROFILE of that query, you'll see that it's not using an index lookup. You can't use this piecemeal approach to lookup the date by index.

Instead, use a date type for comparison like this:

"PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date = date({year:2018, month:12, day:11}) RETURN s.id, o.date, o.value;"

You should see the index being used in the profile plan, and your query speed should improve.

There are other ways to specify the date object of course, refer to the temporal types documentation.

vjramirez · January 24, 2019, 1:20pm

Hello Andrew, thanks for your replay.
Currently, the property o.date is a datetime type (i.e. "2018-12-11T00:10:47Z") and there is an index on :Observation(date).

So, the query:

"PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date = date({year:2018, month:12, day:11}) RETURN s.id, o.date, o.value;"

Returns 0 records.

Trying to convert the datetime to date, I used two aproximations:

PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date >= datetime({year:2018, month:12, day:11, hour:0, minute:0, second:0}) AND  o.date < datetime({year:2018, month:12, day:12, hour:0, minute:0, second:0}) RETURN s.id, o.date, o.value;

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 11664003 total db hits in 12173 ms.

And...

PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE date(o.date) = date({year:2018, month:12, day:11}) RETURN s.id, o.date, o.value;

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 7014848 total db hits in 9019 ms.

As you can see, the performance is even worse.
Could you give me another possibility to improve the query?

andrew_bowman · January 28, 2019, 7:18am

Can you expand the elements of both query plans? It's hard to determine which part of the query the operations refer to without them expanded.

vjramirez · January 28, 2019, 8:40am

Hi Andrew,

These are the expanded query plans, I've created an index on :Sensor(id):

PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date >= datetime({year:2018, month:12, day:11, hour:0, minute:0, second:0}) AND  o.date < datetime({year:2018, month:12, day:12, hour:0, minute:0, second:0}) RETURN s.id, o.date, o.value;

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 14342405 total db hits in 9694 ms.

PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]-&gt;(o:Observation) WHERE date(o.date) = date({year:2018, month:12, day:11}) RETURN s.id, o.date, o.value;

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 6928359 total db hits in 8143 ms.

Topic		Replies	Views
Very slow performance when aggregate on node property Cypher performance	21	4682	December 21, 2020
Very slow performance when aggregate on node property Neo4j Graph Platform performance , migrated , cypher-tagged	2	318	March 5, 2023
Time Series Quantitative Data Newbie Questions	4	546	January 11, 2021
Slow query performance on filter by date on relationship Cypher performance	6	1718	September 16, 2020
Analytics on Neo4j General performance	1	304	August 19, 2020

Get Certified in June!

Slow performance on time series queries

Related topics