Virtual graph grouping with hierarchy

Hi everyone,

I'm new to Neo4j. I’m doing an initial investigation to understand whether Neo4j is suitable for my problem and, if so, how to approach it technically.

I have events that have happened to patients, and the order in which they occur matters. I have tens of thousands of patients, and each patient can have tens of events. I need to group these patients by their events and the order in which they occur, to understand what the most common path of events is. The diagram below hopefully elaborates. I will also want to filter patients by, say, gender, but I think this is fairly straightforward. Also, is this a fairly trivial problem for Neo4j in terms of performance?

I’ve had a play around with the movie graph example and I think I need to use virtual graph grouping. What I’ve struggled to find is an example of grouping while maintaining the graph hierarchy.

Thanks,
Ali

Grouping

CREATE (David:Person {name:'David'})
CREATE (Ali:Person {name:'Ali'})
CREATE (Danny:Person {name:'Danny'})
CREATE (Amin:Person {name:'Amin'})

CREATE (A1_1:Event_01 {name:'A'})
CREATE (A1_2:Event_01 {name:'A'})
CREATE (B1:Event_01 {name:'B'})
CREATE (C1:Event_01 {name:'C'})

CREATE (D2_1:Event_02 {name:'D'})
CREATE (D2_2:Event_02 {name:'D'})
CREATE (A2:Event_02 {name:'A'})

CREATE (E3:Event_03 {name:'E'})
CREATE (C3:Event_03 {name:'C'})

CREATE
 (David)-[:NEXT]->(A1_1), (A1_1)-[:NEXT]->(D2_1), (D2_1)-[:NEXT]->(E3),
 (Ali)-[:NEXT]->(A1_2), (A1_2)-[:NEXT]->(D2_2), (D2_2)-[:NEXT]->(C3),
 (Danny)-[:NEXT]->(B1), (B1)-[:NEXT]->(A2),
 (Amin)-[:NEXT]->(C1)
;
call apoc.nodes.group(['Event_01','Event_02', 'Event_03'],['name']);

It depends; I think we need you to say more about how you're planning on using this.

Note that the events don't have to be linked to one another. If they have dates on them, you can date order and still get them in the right order, even if all of the events are linked to the same patient.

One thing that's not clear to me about the question is if the events cluster in any way. Imagine a patient who comes to get diabetes treated (this could create a "thread" of many events/encounters) but who separately is being treated for a comorbidity like heart disease (this could create an only marginally related "thread" of other events/encounters). Do you have one thread per patient, or many? You might consider creating a node per thread, and then link all of the events to the "Thread node". Then link all of the threads to the patient, meaning you have a hierarchy, while retaining order of the individual event sequences.

Hi David,

Thanks for replying.

I think we need you to say more about how you're planning on using this.
The plan is to use something like d3.js to create dynamic visualisations to show the most common path taken by the patients, possibly a graph and sankey diagram combined. The user may then choose to filter e.g. by gender and then the visualisation updates accordingly.

Note that the events don't have to be linked to one another. If they have dates on them, you can date order and still get them in the right order, even if all of the events are linked to the same patient.
It would be good to know how to do this; it would save me some upfront data wrangling.

Do you have one thread per patient, or many?
At the moment, it's one thread per patient.

Thanks,
Ali

I'm making this up, but this is what I mean by a "Thread".

CREATE (p:Patient { name: "Bob" })
CREATE (t:Thread { name: "Diabetes" })
CREATE (e1:Event { name: "Do a Thing", date: date("2019-09-30") })
CREATE (e2:Event { name: "Do another thing", date: date("2019-10-01") })
CREATE (p)-[:THREAD]->(t)
CREATE (t)-[:EVENT]->(e1)
CREATE (t)-[:EVENT]->(e2)

Bob has a diabetes thread where he did 2 things. Each has a date. By doing:

MATCH (t:Thread)-[:EVENT]->(e:Event)
RETURN e
ORDER BY e.date

You'll never lose ordering on the events. But notice that the events aren't connected to one another. They're grouped by a "Thread" object. To temporally order things, all you need is a "date" field; you don't need relationships between events.
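As a rough sketch of how the same date ordering could feed a "most common path" count, the query below collects each thread's event names in date order and then counts how many threads share the same sequence. It assumes the Patient/Thread/Event model from the example above, and it relies on collect() keeping the row order produced by the preceding ORDER BY, which is how Neo4j behaves in practice but is worth checking on your version.

```cypher
// Sketch only: assumes the (:Patient)-[:THREAD]->(:Thread)-[:EVENT]->(:Event)
// model from the example above, with a "date" property on every event.
MATCH (p:Patient)-[:THREAD]->(t:Thread)-[:EVENT]->(e:Event)
WITH p, t, e
ORDER BY e.date
// One ordered list of event names per thread.
WITH p, t, collect(e.name) AS path
// Group identical sequences and count how many threads follow each one.
RETURN path, count(*) AS threads
ORDER BY threads DESC
```

If two events in one thread can share a date, a tie-breaker (for example a datetime or a sequence number) would be needed to keep the order stable.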

I would model this in Neo4j as you have illustrated above.

Then, to count all sequence occurrences (including subsequences), you can expand all sequences with a recursive path expansion Cypher query:

match p=(firstevent:Event)-[:NEXT*]->(:Event)
where not((firstevent)<-[:NEXT]-(:Event)
return p as sequence, count(1) as occurences

Finally, you can create a suffix tree from the results of this query.

You should also be able to easily create a Sankey diagram from the results of the above query.

David,

I've rewritten my example with dates, as you've described (I haven't added Thread). How do I then do the grouping to get the result I want?

CREATE (David:Person {name:'David'})
CREATE (Ali:Person {name:'Ali'})
CREATE (Danny:Person {name:'Danny'})
CREATE (Amin:Person {name:'Amin'})


CREATE (DavidA:Event {name:'A', date: date("2019-09-01")})
CREATE (DavidD:Event {name:'D', date: date("2019-09-02")})
CREATE (DavidE:Event {name:'E', date: date("2019-09-03")})


CREATE (AliA:Event {name:'A', date: date("2019-09-04")})
CREATE (AliD:Event {name:'D', date: date("2019-09-05")})
CREATE (AliC:Event {name:'C', date: date("2019-09-06")})

CREATE (DannyB:Event {name:'B', date: date("2019-09-01")})
CREATE (DannyA:Event {name:'A', date: date("2019-09-02")})

CREATE (AminC:Event {name:'C', date: date("2019-09-01")})


CREATE
 (David)-[:NEXT]->(DavidA), (David)-[:NEXT]->(DavidD), (David)-[:NEXT]->(DavidE),
 (Ali)-[:NEXT]->(AliA), (Ali)-[:NEXT]->(AliD), (Ali)-[:NEXT]->(AliC),
 (Danny)-[:NEXT]->(DannyB), (Danny)-[:NEXT]->(DannyA),
 (Amin)-[:NEXT]->(AminC)
;

Thanks,
Ali
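One possible way to group the date-based data, offered as a sketch rather than a definitive answer: order each person's events by date, collect the names into a path, and then count how many people share each leading sub-sequence (prefix) of that path. It assumes every event is linked directly to its person with :NEXT, as in the script above, and that no person has two events on the same date.

```cypher
// Sketch only: assumes (:Person)-[:NEXT]->(:Event) with a "date" on each event.
MATCH (p:Person)-[:NEXT]->(e:Event)
WITH p, e
ORDER BY p.name, e.date
// One date-ordered list of event names per person.
WITH p, collect(e.name) AS path
// Emit every prefix of the path: [A], [A, D], [A, D, E], ...
UNWIND range(1, size(path)) AS n
WITH path[0..n] AS prefix, count(DISTINCT p) AS patients
RETURN prefix, patients
ORDER BY size(prefix), patients DESC
```

Each returned row corresponds to one node of the grouped hierarchy, with the prefix identifying its position and patients giving its count. For the data above, the prefixes [A] and [A, D] should each come back with a count of 2, because David and Ali share those first two events.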

Hi Niclas,

Are you referring to my solution or David's solution?

Thanks,
Ali

Hi,

Sorry for the confusion. I was referring to your original post, where you illustrate the sequences.

/Niclas

Thanks for clarifying Niclas.

It doesn't give precisely what I want. There are two A--->D subgraphs and node C is missing.

Also, there was a missing bracket:

match p=(firstevent:Event)-[:NEXT*]->(:Event)
where not((firstevent)<-[:NEXT]-(:Event))
return p as sequence, count(1) as occurences

What I need is as follows: nodes A and D in the A--->D path each have a count property with a value of 2.

Ali
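A likely reason for the duplicate A--->D results is that the query groups by the path p itself. Every path is made of distinct nodes, so no two paths are ever equal and count(1) is always 1. A sketch of one alternative, assuming the chained model from the first post (person, then event, then event, all joined by :NEXT), is to group by the list of event names along the path instead:

```cypher
// Sketch only: assumes (:Person)-[:NEXT]->(event)-[:NEXT]->(event)...
// as in the first post. The events carry different labels there
// (Event_01, Event_02, Event_03), so no label is used on them here.
MATCH path = (:Person)-[:NEXT*]->(e)
// Drop the person node and keep only the event names along the path.
WITH [n IN tail(nodes(path)) | n.name] AS prefix
// Identical name sequences from different patients now fall into one group.
RETURN prefix, count(*) AS patients
ORDER BY size(prefix), patients DESC
```

On the sample data from the first post, this should give a count of 2 for both [A] and [A, D], and a count of 1 for [A, D, E], [A, D, C], [B], [B, A] and [C]. Each row can be treated as one node of the grouped tree, with its parent being the same prefix minus the last element, which is the shape a Sankey or tree layout in d3.js would need.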
