Query times out when using apoc.periodic.commit, but removing the procedure wrapper and running the pure Cypher succeeds

Hi all, I've set up a 30-minute transaction timeout in my Neo4j Community 3.5.9 instance. Most of my queries finish well before that. For the rare graph-refactoring cases that time out, I rely on apoc.periodic.iterate as a first pass, and if the part of the graph the query has to run through is too big, I rewrite the query to use apoc.periodic.commit with the appropriate syntax. This two-step approach has always worked for me with big updates, until now. I first tried apoc.periodic.iterate (with the appropriate syntax adjustments) and it timed out; then the following apoc.periodic.commit query timed out as well:

call apoc.periodic.commit("

MATCH (b:Encounter:PB:FY2017)<-[:HAS_ENCOUNTER]-(p:Patient)
-[:HAS_ENCOUNTER]->(a:Encounter:Inpatient_Type_HB:FY2017),
(e:Encounter:PB:FY2017)-[r:HAS_ADMIT]->(a)

where size((b)-[:NEXT]->()) = 0
and size((b)-[:PREVIOUS]->()) = 0
and size((b)-[:HAS_ADMIT]->()) = 0
and coalesce(e.min_svc_dt,e.adm_svc_dt) = coalesce(b.min_svc_dt,b.adm_svc_dt)

with p, b, a
limit {limit}

set p.complete_link = 'N',
p.complete_first_spec = 'N'

set b.unlink = 'Y'

merge (b)-[:HAS_ADMIT]->(a)

return count(*)

",{limit : 100}
)
;

The kicker is that running the query above without the APOC wrapper, as pure Cypher, doesn't time out. I'm aware that the pure Cypher version is equivalent to one pass of the multi-pass apoc.periodic.commit loop, but I thought each pass of apoc.periodic.commit was an individual transaction in itself, so each pass would have 30 minutes to complete rather than the overall query?

Pure Cypher version:

MATCH (b:Encounter:PB:FY2017)<-[:HAS_ENCOUNTER]-(p:Patient)
-[:HAS_ENCOUNTER]->(a:Encounter:Inpatient_Type_HB:FY2017),
(e:Encounter:PB:FY2017)-[r:HAS_ADMIT]->(a)

where size((b)-[:NEXT]->()) = 0
and size((b)-[:PREVIOUS]->()) = 0
and size((b)-[:HAS_ADMIT]->()) = 0
and coalesce(e.min_svc_dt,e.adm_svc_dt) = coalesce(b.min_svc_dt,b.adm_svc_dt)

with p, b, a
limit 100

set p.complete_link = 'N',
p.complete_first_spec = 'N'

set b.unlink = 'Y'

merge (b)-[:HAS_ADMIT]->(a)

return count(*)
;

Results of the first query with apoc.periodic.commit, 30-minute timeout:
"
Neo.ClientError.Transaction.TransactionTimedOut

Neo.ClientError.Transaction.TransactionTimedOut: The transaction has been terminated. Retry your operation in a new transaction, and you should see a successful result. The transaction has not completed within the specified timeout (dbms.transaction.timeout). You may want to retry with a longer timeout.
"

Results of the pure Cypher query:

Profile plan of the successful query (since the failed query is a procedure call, the plan doesn't provide relevant info):

Label counts for the labels used in my query:

count(*)    label
29325132    ":Encounter"
22239827    ":PB"
6109010     ":FY2017"
1856828     ":Patient"
455051      ":Inpatient_Type_HB"

Relevant indices:
"INDEX ON :Encounter(adm_svc_dt)"
"INDEX ON :Encounter(min_svc_dt)"

Although the query without APOC succeeds, I would need to sit there and rerun it until it returns zero results, which apoc.periodic.commit would avoid. I know I could increase the transaction timeout, but I feel there may be an optimization step I'm missing.
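For reference, the timeout being hit here is the server-wide setting in neo4j.conf; a minimal fragment of what that configuration looks like (the value is just the 30 minutes described above):

```properties
# neo4j.conf -- maximum time a single transaction may run before being terminated
# (applies to the outer transaction of a procedure call as well)
dbms.transaction.timeout=30m
```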

So, point #1: because you're using count(*) as the return value, are you sure this query ever returns nothing? I can see how the query works, but some of the items in your last WITH statement are still bound. It might be safer to return how many items you merged, rather than how many total items exist at the end of the query, considering that some of those items came from the earlier MATCH and not the MERGE.

Second: when you use apoc.periodic.commit, the APOC "wrapper" is one big transaction that commits sub-transactions for you, if you will. That one big transaction is still subject to the transaction timeout configured in Neo4j.

Imagine each round of processing took 10 seconds and you needed 10 batches, so the whole job ought to take roughly 100 seconds. If your transaction timeout were, say, 90 seconds, you'd be guaranteed to fail, because apoc.periodic.commit can't finish in 90 seconds (it needs 100), while at the same time a single batch still succeeds when run standalone. Make sense? If this is the cause, do some rough math (number of batches × average batch time) and make sure your transaction timeout is greater than that value.
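The arithmetic above can be sketched directly (the numbers are the illustrative ones from the example, not measurements):

```python
# apoc.periodic.commit runs all batches inside one outer transaction,
# so the configured timeout applies to the SUM of the batch times.
batch_seconds = 10        # time for one pass of the inner query
batches = 10              # passes needed before count(*) returns 0
tx_timeout_seconds = 90   # dbms.transaction.timeout

total_seconds = batch_seconds * batches  # 100 seconds of overall work

single_batch_ok = batch_seconds <= tx_timeout_seconds  # one pass fits easily
whole_job_ok = total_seconds <= tx_timeout_seconds     # the wrapper does not

print(single_batch_ok, whole_job_ok)  # True False
```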

The documentation for apoc.periodic.commit requires RETURN count(*) to be the last statement when using that procedure; I return the same thing in the pure Cypher version just to show it's exactly the same query. From my understanding, the loop ends when nothing is returned. Since I'm not using OPTIONAL MATCH, I don't see how anything would be returned for a partial match of the full pattern and WHERE clause. The query I included was for a specific fiscal-year label (FY2017); the same issue happened with other years, and running the pure Cypher version worked for those years until the query eventually returned nothing.

Thanks, Nelson Asport

Also, thanks for confirming that the transaction timeout applies to the outer procedure call as well. Since each pass was a commit, I was thinking each pass had to be an isolated transaction.

Thanks, Nelson Asport

We would definitely recommend using apoc.periodic.iterate() instead of apoc.periodic.commit(), since in nearly all cases apoc.periodic.iterate() will be more efficient. apoc.periodic.commit() usually has to reprocess data you already processed: on each subsequent pass it re-matches and filters out previously processed entries, over and over, as many times as it has to loop, so it's doing redundant work either way.
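A rough way to see the redundant work (a toy model of the behavior described above, not APOC internals): if `n` rows match initially and each pass processes `batch` of them, a commit-style loop re-runs the MATCH over everything still matching on every pass, while iterate streams the MATCH once.

```python
def rows_examined_commit(n, batch):
    """Toy model: each pass of a commit-style loop re-matches all
    remaining rows, then processes only `batch` of them."""
    examined = 0
    remaining = n
    while remaining > 0:
        examined += remaining              # full re-match each pass
        remaining -= min(batch, remaining)
    return examined

def rows_examined_iterate(n, batch):
    """Toy model: an iterate-style job matches each row exactly once."""
    return n

print(rows_examined_commit(1000, 100))   # 1000 + 900 + ... + 100 = 5500
print(rows_examined_iterate(1000, 100))  # 1000
```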

Could you try something like this instead:

CALL apoc.periodic.iterate("
MATCH (b:Encounter:PB:FY2017)<-[:HAS_ENCOUNTER]-(p:Patient)
-[:HAS_ENCOUNTER]->(a:Encounter:Inpatient_Type_HB:FY2017),
(e:Encounter:PB:FY2017)-[:HAS_ADMIT]->(a)

WHERE NOT (b)-[:NEXT]->()
AND NOT (b)-[:PREVIOUS]->()
AND NOT (b)-[:HAS_ADMIT]->()
AND coalesce(e.min_svc_dt,e.adm_svc_dt) = coalesce(b.min_svc_dt,b.adm_svc_dt)

RETURN p, b, a", "
SET p.complete_link = 'N',
p.complete_first_spec = 'N'
SET b.unlink = 'Y'
MERGE (b)-[:HAS_ADMIT]->(a)", {}) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages

Is there a particular reason why the NOT operator might be faster than checking the degree of the relationship with size()? My SQL thinking tells me that perhaps avoiding a function call is the reason?

Thanks, Nelson Asport

The plans generated from either case should be equivalent, so there shouldn't be a performance difference.