How to improve the performance of a query that compares properties relative to other nodes?

shane.a.husson · December 19, 2018, 4:48pm

I have a simple model in which a Patient has many events. Each event has a start date and a category like 'medication'.

It is normal for a patient to have 10s of events of each category.

I would like to find patients where:

a measurement event happened after an admission
and an operation event happened after the same admission event
and the admission event itself happened after a medication event

I've modelled the data in neo4j like:

(:Patient {id: "ABCD"})-[:HAS]-&gt;(:Event {startDate: '2018-01-01', category: 'medication'})

Which produces a graph like:

The query I've come up with is:

MATCH p1 = (measurement:Event {category: 'measurement'})&lt;-[:HAS]-(patient:Patient)
MATCH p2 = (admission:Event {category: 'admission'})&lt;-[:HAS]-(patient)
MATCH p3 = (operation:Event {category: 'operation'})&lt;-[:HAS]-(patient)
MATCH p4 = (medication:Event {category: 'medication'})&lt;-[:HAS]-(patient)
WHERE date(measurement.startDate) &gt; date(admission.startDate)
    AND date(operation.startDate) &gt; date(admission.startDate)
    AND date(admission.startDate) &gt; date(medication.startDate)
RETURN collect(distinct patient.pseudoId);

However this query times out with my test dataset of 1000 patients on my machine. The Explain produces:

Is there more efficient way to model or query the data in neo4j?

I've uploaded my test data here.

stefan.armbruster · December 19, 2018, 4:59pm

Some ideas to improve the data model:

convert your category property into a second label for :Event nodes, so you have e.g. :Event:Measurement nodes.
consider adding [:AFTER] relationships between Events of the same person. So you have a timeline of that patient explicitly and don't need to compare dates - that approach is what I call "graph thinking" in contrast to "relational thinking" (comparing dates)

shane.a.husson · December 20, 2018, 9:41am

Thanks for the ideas @stefan.armbruster.

Can you point to any examples of a model that has :AFTER? This approach sounds promising but we also have queries that want to know events that happened before or on the same day/month/year as another event. So I'm concerned that it will be complicated to create the graph and the cost will also be high.

stefan.armbruster · December 20, 2018, 9:50am

On important factor for a decision is IMO if you need to do cross-patient comparisions on dates.

If you only do this in a non-cross-patient way I'd maintain for each patient a chain of events. If "same day" is an important aspect of your domain, you can overlay the rather generic AFTER relationship with SAME_DAY relationships.

shane.a.husson · December 20, 2018, 9:54am

Yeah we only need to do this in a non-cross-patient way.

If you only do this in a non-cross-patient way I'd maintain for each patient a chain of events. If "same day" is an important aspect of your domain, you can overlay the rather generic AFTER relationship with SAME_DAY relationships.

Ok, I'll give it a shot.

michael.hunger · December 20, 2018, 12:13pm

You can also drive the query with a USING SCAN ON patient so it doesn't do the index lookups but instead filters.

michael.hunger · December 20, 2018, 1:21pm

Hmm I played around a bit with it, my original ideas (turning the categories into labels or rel-types) didn't help so much due to the 4x expand that the planner does.

It's much better to do the predicate on a collection per patient.

profile
MATCH (patient:Patient)
WITH patient, [(patient)-[:HAS]->(e:Event) WHERE e.category IN ['measurement','admission','operation','medication'] | e] as events
WITH patient, apoc.map.groupByMulti(events,'category') as grouped
WHERE all(pair IN [['admission','measurement'],['admission','operation'],['medication','admission']] WHERE ANY(first IN grouped[pair[0]] WHERE ANY(second IN grouped[pair[1]] WHERE first.startDate < second.startDate)))
RETURN size(collect(distinct patient.pseudoId));
// Started streaming 1 records after 85 ms and completed after 85 ms.
// 959

TODO does it have to be the same admission or not?
If yes then the predicate has to be adapted a bit.

profile
MATCH (patient:Patient)
WITH patient, [(patient)-[:HAS]->(e:Event) WHERE e.category IN ['measurement','admission','operation','medication'] | e] as events
WITH patient, apoc.map.groupByMulti(events,'category') as grouped
WHERE any(admission IN grouped['admission'] WHERE 
          ANY(medication IN grouped['medication'] WHERE admission.startDate > medication.startDate) AND 
          ANY(operation IN grouped['operation'] WHERE admission.startDate < operation.startDate) AND 
          ANY(measurement IN grouped['measurement'] WHERE admission.startDate < measurement.startDate)
      )
RETURN size(collect(distinct patient.pseudoId));
// Started streaming 1 records after 127 ms and completed after 127 ms.
// 875

shane.a.husson · December 20, 2018, 3:50pm

Thanks @michael.hunger, the query you suggested is much faster and seems to scale as more filters are added. I'll continue to investigate it.

Topic		Replies	Views
Sequential query with order and co-occurence Cypher apoc , cypher , relationship , knowledge-base	3	480	August 5, 2021
How to compare datetime with only month and day in Cypher Cypher cypher , knowledge-base	17	350	April 17, 2024
How to create relationships between consecutive nodes based on Date property? Cypher	8	1274	November 24, 2020
Best way to create timeline from events? Neo4j Graph Platform migrated	5	403	July 6, 2022
Slow query performance on filter by date on relationship Cypher performance	6	1721	September 16, 2020

August 🏄 🏖️ 🏊

How to improve the performance of a query that compares properties relative to other nodes?

Related topics