Slow performance on time series queries

performance
cypher

(Vjramirez) #1

Hello,

I'm testing different databases to storage time series. I have some results using cassandra and InfluxDB. Now, I am trying with Neo4j Community using the schema suggested in https://www.graphgrid.com/modeling-time-series-data-with-neo4j/
I have data from 30 sensors for each second in a month time lapse. If I try to query the 86400 data points for a day time lapse of one sensor, the cypher query:

MATCH (y:Year)-[:TC]->(m:Month)-[:TC]->(d:Day)-[:TC]->(h:Hour)-[:TC]->(mi:Minute)-[:TC]->(se:Second)-[:EXIST]->(o:Observation)-[:OBSERVED_BY]->(s:Sensor) WHERE y.value = 2018 AND m.value = 08 AND d.value=11 AND s.id = 'XXXXXX' RETURN s.id, o.date, o.value

takes almost 6 seconds to complete. Same query in other databases takes less than a second. I tried to use indexes for year, month, day but the result is even worse (9 seconds).

Currently, the database has ~ 52 million nodes and relationships, which is not even close to Big Data, and Neo4j is intended to be used in that sort of scenarios, hence, I can't understand why the query is slow.

Some other queries I am testing are : The data of a sensor in a week and in a month combined with aggregations : AVG, COUNT, MIN, MAX. Always Neo4j has shown less performance than cassandra and influxDB (and SQL Server).


(M. David Allen) #2

Unfortunately I believe that page could use some updating, it was written in 2015.

In recent 3.4+ releases of Neo4j we have temporal date types, which you can index. For most uses, they're goign to be far better than this time tree approach:

https://neo4j.com/docs/cypher-manual/current/syntax/temporal/

Consider putting dates/times on the observation, indexing that field, and then querying directly for the time property on the observation. It's going to be simpler and more performant.


(Vjramirez) #3

Thanks David for your reply.
Using your suggestion, now the query is:

"PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date.year=2018 AND o.date.month=12 AND o.date.day=11 RETURN s.id, o.date, o.value;"
(I'm using PROFILE to get the execution time)

As you pointed out, the performance is better but it's still taken more than 4 seconds. I'm trying to get a model than can be compared with the results using cassandra, influx or SQL Server. Now I'm really far from that.


(Andrew Bowman) #4

I'm assuming you have an index on :Observation(date).

However even with that index, if you run a PROFILE of that query, you'll see that it's not using an index lookup. You can't use this piecemeal approach to lookup the date by index.

Instead, use a date type for comparison like this:

"PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date = date({year:2018, month:12, day:11}) RETURN s.id, o.date, o.value;"

You should see the index being used in the profile plan, and your query speed should improve.

There are other ways to specify the date object of course, refer to the temporal types documentation.


(Vjramirez) #5

Hello Andrew, thanks for your replay.
Currently, the property o.date is a datetime type (i.e. "2018-12-11T00:10:47Z") and there is an index on :Observation(date).

So, the query:

"PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date = date({year:2018, month:12, day:11}) RETURN s.id, o.date, o.value;"

Returns 0 records.

Trying to convert the datetime to date, I used two aproximations:

PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date >= datetime({year:2018, month:12, day:11, hour:0, minute:0, second:0}) AND  o.date < datetime({year:2018, month:12, day:12, hour:0, minute:0, second:0}) RETURN s.id, o.date, o.value;

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 11664003 total db hits in 12173 ms.

And...

PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE date(o.date) = date({year:2018, month:12, day:11}) RETURN s.id, o.date, o.value;

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 7014848 total db hits in 9019 ms.

As you can see, the performance is even worse.
Could you give me another possibility to improve the query?


(Andrew Bowman) #6

Can you expand the elements of both query plans? It's hard to determine which part of the query the operations refer to without them expanded.


(Vjramirez) #7

Hi Andrew,

These are the expanded query plans, I've created an index on :Sensor(id):

PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]->(o:Observation) WHERE o.date >= datetime({year:2018, month:12, day:11, hour:0, minute:0, second:0}) AND  o.date < datetime({year:2018, month:12, day:12, hour:0, minute:0, second:0}) RETURN s.id, o.date, o.value;

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 14342405 total db hits in 9694 ms.

PROFILE MATCH (s:Sensor{id:'XXXXXX'})-[:MAKE_OBSERVATION]-&gt;(o:Observation) WHERE date(o.date) = date({year:2018, month:12, day:11}) RETURN s.id, o.date, o.value;

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 6928359 total db hits in 8143 ms.