Cypher query speed slows dramatically after every few CSV imports

  • Version: 3.5.8
  • Edition: Community

Problem

I have a neo4j database whose data is replaced hourly via CSV import. After importing there are 6 distinct node types, 5 indexes, and 1042 total nodes.
Every so often after this CSV import runs (maybe 1/30 imports) neo4j slows down significantly, and won't speed up again until the import runs again. When this happens, we simply rerun the csv import manually or wait for the next hour to roll around and neo4j becomes fast again.

Here's a profile of a Match query during slowness:

And here is the same profile after simply rerunning my csv import with the same data:

Question

What could be causing this slowness? What can we do to avoid it? What would be good to check the next time?

Some things I've checked / tried

  • I've compared the raw csv values with the resulting data, the number of nodes is the same during slowness as it is after and it matches the number of nodes in the csvs
  • I've made sure this isn't due to the actual data in the CSV, importing the same CSVs will fix the slowness
  • All 5 indexes are created during and after slowness

Debug.log during slowness:

2019-12-19 01:30:44.134+0000 INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache after 7186 seconds: MATCH(c:Ch {hostName: $hn})
return c.id
2019-12-19 01:30:44.149+0000 INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache after 2814 seconds: MATCH(c:Ch {id: $key})
       -[gr]-(a:Ag)
       -[ar]-(ad:Ad)
       -[fr]-(ff:Ff)
       -[sr]-(sp:Sp)
  WHERE sp.id IN $sps
  AND (:Cu)-[:HAS_AG]-(a)
  AND NOT ()-[:HAS_FILL_AG]->(a)
  AND ad.act = true
  AND ad.eD > datetime()
  AND ad.sD < datetime()
//AND a.eD > datetime()
//AND a.sD < datetime()
with distinct ad, sp, c, ff, a
RETURN ad.id, ad.nm, ad.iag, ad.ual, ad.ml, sp.id, c.hN as hn, c.spl as spl, ff.width, ff.height, a.isHq
2019-12-19 01:30:45.524+0000 INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache after 8236 seconds: MATCH(c:Ch {id: $key})
       -[gr]-(a:Ag)
       -[ar]-(ad:Ad)
       -[fr]-(ff:Ff)
       -[sr]-(sp:Sp)
  WHERE sp.id IN $sps
  AND (:Cu)-[:HAS_AG]-(a)
  AND NOT ()-[:HAS_FILL_AG]->(a)
  AND ad.act = true
  AND ad.eD > datetime()
  AND ad.sD < datetime()
//AND a.eD > datetime()
//AND a.sD < datetime()
with distinct ad, sp, c, ff, a
RETURN ad.id, ad.nm, ad.iag, ad.ual, ad.ml, sp.id, c.hN as hn, c.spl as spl, ff.width, ff.height, a.isHq
2019-12-19 01:30:51.880+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for time threshold" @ txId: 20253 checkpoint started...
2019-12-19 01:30:51.895+0000 INFO [o.n.k.i.s.c.CountsTracker] Rotated counts store at transaction 20253 to [/var/lib/neo4j/data/databases/graph.db/neostore.counts.db.a], from [/var/lib/neo4j/data/databases/graph.db/neostore.counts.db.b].
2019-12-19 01:30:51.912+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for time threshold" @ txId: 20253 checkpoint completed in 31ms
2019-12-19 01:30:51.912+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] No log version pruned, last checkpoint was made in version 21
2019-12-19 01:30:54.886+0000 INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache after 10 seconds: MATCH(c:Ch {hostName: $hn})
return c.id
2019-12-19 01:38:29.422+0000 INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache after 464 seconds: MATCH(c:Ch {id: $key})
       -[gr]-(a:Ag)
       -[ar]-(ad:Ad)
       -[fr]-(ff:Ff)
       -[sr]-(sp:Sp)
  WHERE sp.id IN $sps
  AND (:Cu)-[:HAS_AG]-(a)
  AND NOT ()-[:HAS_FILL_AG]->(a)
  AND ad.act = true
  AND ad.eD > datetime()
  AND ad.sD < datetime()
//AND a.eD > datetime()
//AND a.sD < datetime()
with distinct ad, sp, c, ff, a
RETURN ad.id, ad.nm, ad.iag, ad.ual, ad.ml, sp.id, c.hN as hn, c.spl as spl, ff.width, ff.height, a.isHq
2019-12-19 01:47:42.049+0000 WARN [o.n.b.r.DefaultBoltConnection] The client is unauthorized due to authentication failure.
2019-12-19 01:47:42.585+0000 WARN [o.n.b.r.DefaultBoltConnection] The client is unauthorized due to authentication failure.
2019-12-19 02:04:05.996+0000 INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache after 2000 seconds: MATCH(c:Ch {id: $key})
       -[gr]-(a:Ag)
       -[ar]-(ad:Ad)
       -[fr]-(ff:Ff)
       -[sr]-(sp:Sp)
  WHERE sp.id IN $sps
  AND (:Cu)-[:HAS_AG]-(a)
  AND NOT ()-[:HAS_FILL_AG]->(a)
  AND ad.act = true
  AND ad.eD > datetime()
  AND ad.sD < datetime()
//AND a.eD > datetime()
//AND a.sD < datetime()
with distinct ad, sp, c, ff, a
RETURN ad.id, ad.nm, ad.iag, ad.ual, ad.ml, sp.id, c.hN as hn, c.spl as spl, ff.width, ff.height, a.isHq
2019-12-19 02:04:11.967+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for time threshold" @ txId: 20260 checkpoint started...
2019-12-19 02:04:11.984+0000 INFO [o.n.k.i.s.c.CountsTracker] Rotated counts store at transaction 20260 to [/var/lib/neo4j/data/databases/graph.db/neostore.counts.db.b], from [/var/lib/neo4j/data/databases/graph.db/neostore.counts.db.a].
2019-12-19 02:04:12.010+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for time threshold" @ txId: 20260 checkpoint completed in 43ms
2019-12-19 02:04:12.011+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] No log version pruned, last checkpoint was made in version 21

Replying to add more than 2 images:

Profile during slowness:

Profile after reimport:

The 2 extra db hits can be seen in the bottommost box on the right hand side

Is the difference between "fast" and "slow" here the difference between 70ms and 6ms? Because 70ms is already pretty low, that's less than 1/10th of a second. But yes it's 10x as long. There are some external factors here like garbage collection in the virtual machine, and your page cache size that you might want to look into and control for. Let us know what your heap/page cache settings are (although for a data set this small, it shouldn't matter unless you have other workload happening at the same time.

Also, are you destroying and re-creating the indexes every time?

All 5 indexes are created during and after slowness

Index build itself is probably unnecessary. Create one index and just leave it.

@david.allen Thanks for the reply!

Is the difference between "fast" and "slow" here the difference between 70ms and 6ms? Because 70ms is already pretty low, that's less than 1/10th of a second.

70ms is absolutely when this is slow, 6ms is what I'd expect to be the performance for this query at any point but instead I'm seeing 70ms after every 20-30 imports or so. I'd argue that 6ms is already pretty slow but that's neither here nor there.

Also, are you destroying and re-creating the indexes every time?
I'm not dropping the indexes, but I am rerunning the create index query with my import query. The full flow is:

  1. Run MATCH (n) DETACH DELETE n to clear nodes and relationships
  2. Run a bunch of LOAD CSV WITH HEADERS FROM ... AS row CREATE (sp:Sp {...}) with sp with associations and create index on :Sp(id); after data is loaded.

Let us know what your heap/page cache settings are
For both heap and page cache we have left it default but memrec says:

dbms.memory.heap.initial_size=1900m
dbms.memory.heap.max_size=1900m
dbms.memory.pagecache.size=512m