Hello,
I'm testing different databases for storing time series. I already have some results with Cassandra and InfluxDB. Now I am trying Neo4j Community, using the schema suggested in https://www.graphgrid.com/modeling-time-series-data-with-neo4j/
I have data from 30 sensors, one reading per second, covering one month. If I query the 86,400 data points of one sensor for a single day, the Cypher query:
MATCH (y:Year)-[:TC]->(m:Month)-[:TC]->(d:Day)-[:TC]->(h:Hour)-[:TC]->(mi:Minute)-[:TC]->(se:Second)-[:EXIST]->(o:Observation)-[:OBSERVED_BY]->(s:Sensor) WHERE y.value = 2018 AND m.value = 08 AND d.value=11 AND s.id = 'XXXXXX' RETURN s.id, o.date, o.value
takes almost 6 seconds to complete. The same query in the other databases takes less than a second. I tried adding indexes on year, month and day, but the result is even worse (9 seconds).
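For reference, indexes of that kind can be created roughly as follows. This is only a sketch in Neo4j 3.x syntax, assuming the `value` property used in the query above; the exact statements are an assumption, not a copy of what was run. Prefixing the query with PROFILE should show whether the planner actually uses them.

```cypher
// Sketch only: single-property indexes on the time-tree levels filtered in the query.
// Neo4j 3.x syntax; newer versions use CREATE INDEX FOR (n:Label) ON (n.property).
CREATE INDEX ON :Year(value);
CREATE INDEX ON :Month(value);
CREATE INDEX ON :Day(value);
```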
Currently the database has ~52 million nodes and relationships, which is not even close to Big Data, and Neo4j is intended for that sort of scenario, so I can't understand why the query is slow.
Other queries I am testing are: the data of one sensor over a week and over a month, combined with aggregations (AVG, COUNT, MIN, MAX). Neo4j has always performed worse than Cassandra and InfluxDB (and SQL Server).
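To make the aggregation case concrete, the weekly query looks roughly like the sketch below. It assumes the same schema as the daily query; the day range 6 to 12 is a hypothetical week chosen for illustration, and the sensor id is left as the same placeholder.

```cypher
// Sketch only: one week of one sensor, aggregated over all observations.
// The day range (6..12) is a hypothetical example, not taken from the actual tests.
MATCH (y:Year)-[:TC]->(m:Month)-[:TC]->(d:Day)-[:TC]->(h:Hour)-[:TC]->(mi:Minute)-[:TC]->(se:Second)-[:EXIST]->(o:Observation)-[:OBSERVED_BY]->(s:Sensor)
WHERE y.value = 2018 AND m.value = 8 AND d.value >= 6 AND d.value <= 12 AND s.id = 'XXXXXX'
RETURN s.id, avg(o.value), count(o), min(o.value), max(o.value)
```

The monthly variant would simply drop the filter on d.value.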