Hi all, I've setup a 30 min transaction time out configuration in my Neo4j Community 3.5.9 instance. Most of my queries finish well before that. Rare use cases where I'm refactoring the graph, time out and then I rely on apoc.periodic.iterate as a first pass and if the part of the graph my query has to run through is too big, I then rewrite my query to use apoc.periodic.commit instead with the appropriate syntax. This two step approach has always worked for me with big updates, until now. I first tried apoc.periodic.iterate with the appropriate syntax adjustments and it timed out, then if I run the following query using apoc.periodic.commit it times out as well:
call apoc.periodic.commit("
MATCH (b:Encounter:PB:FY2017)<-[:HAS_ENCOUNTER]-(p:Patient)
-[:HAS_ENCOUNTER]->(a:Encounter:Inpatient_Type_HB:FY2017),
(e:Encounter:PB:FY2017)-[r:HAS_ADMIT]->(a)
where size((b)-[:NEXT]->()) = 0
and size((b)-[:PREVIOUS]->()) = 0
and size((b)-[:HAS_ADMIT]->()) = 0
and coalesce(e.min_svc_dt,e.adm_svc_dt) = coalesce(b.min_svc_dt,b.adm_svc_dt)
with p, b, a
limit {limit}
set p.complete_link = 'N',
p.complete_first_spec = 'N'
set b.unlink = 'Y'
merge (b)-[:HAS_ADMIT]->(a)
return count(*)
",{limit : 100}
)
;
The kicker is, that running the query above without the apoc wrapper and just pure cypher, it doesn't time out. I'm aware that the pure cypher version is equivalent to one pass of the multi pass iteration of the apoc.periodic.commit instance, but I thought each pass of the apoc.periodic.commit is an individual transaction in itself, hence each pass has 30 mins to complete not the overall query?
Pure cypher version:
MATCH (b:Encounter:PB:FY2017)<-[:HAS_ENCOUNTER]-(p:Patient)
-[:HAS_ENCOUNTER]->(a:Encounter:Inpatient_Type_HB:FY2017),
(e:Encounter:PB:FY2017)-[r:HAS_ADMIT]->(a)
where size((b)-[:NEXT]->()) = 0
and size((b)-[:PREVIOUS]->()) = 0
and size((b)-[:HAS_ADMIT]->()) = 0
and coalesce(e.min_svc_dt,e.adm_svc_dt) = coalesce(b.min_svc_dt,b.adm_svc_dt)
with p, b, a
limit 100
set p.complete_link = 'N',
p.complete_first_spec = 'N'
set b.unlink = 'Y'
merge (b)-[:HAS_ADMIT]->(a)
return count(*)
;
Results of first query with apoc.periodic.commit, 30 min timeout:
"
Neo.ClientError.Transaction.TransactionTimedOut
Neo.ClientError.Transaction.TransactionTimedOut: The transaction has been terminated. Retry your operation in a new transaction, and you should see a successful result. The transaction has not completed within the specified timeout (dbms.transaction.timeout). You may want to retry with a longer timeout.
"
Results of pure cypher query:
Profile plan of the successful query (since the failed query is a procedure call, the plan doesn't provide relevant info):
Label counts for the labels used in my query:
count(*) | label |
---|---|
29325132 | ":Encounter" |
22239827 | ":PB" |
6109010 | ":FY2017" |
1856828 | ":Patient" |
455051 | ":Inpatient_Type_HB" |
Relevant indices:
"INDEX ON :Encounter(adm_svc_dt)"
"INDEX ON :Encounter(min_svc_dt)"
Although the query without apoc is successful, I would need to sit there and refresh it until zero results which would be avoided with apoc.periodic.commit. I know I could increase the transaction time out but I feel like there may be an optimization step I may be missing.