I have a rather large cypher query to import a CSV, the current version that exists works fine, but I'm trying to add one bit to it which seems to be blowing up the server.
Here's the current (working) version:
using periodic commit load csv with headers from 'https://s3..csv' as r
// lookup the analysis node to connect to later on
match (q:Analysis) where ID(q)=$id
// create the variant node if it doesn't exist
merge (v:Variant {name: r.variant_id})
on create set v.field = ...
// create the consequence node
create (n:TranscriptConsequence {
// ~120 fields being initialized from the CSV here
})
// create connections
merge (n)-[:PRODUCED_BY]->(q)
merge (n)-[:FOR_VARIANT]->(v)
Now, what I'm attempting to do is take one of the fields in the CSV (a gene_id
) and - if it exist - create a connection to a Gene
node already in the database. So, I've modified the query like so:
match (q:Analysis) where ID(q)=$id
// lookup the optional gene for this consequence
optional match (g:Gene) where r.gene_id is not null and g.name = r.gene_id
// ... create nodes + relationships ...
// optionally create a relationship to the gene
with n, case g when null then [] else [g] end as gs
foreach (g in gs | merge (n)-[:FOR_GENE]->(g))
Adding this bit of code ends up hanging the server. The query never inserts anything into the database, there are no cypher errors/warnings, and all cypher queries done from then on never return (or timeout for that matter). It's as if the server has locked the DB and every future query is waiting for a global transaction to finish. I have to kill -9
the server and restart it.
The last log entries that have any new information are in debug.log:
2019-01-05 02:54:46.490+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by scheduler for time threshold @ txId: 321297 checkpoint started...
2019-01-05 02:54:48.002+0000 INFO [o.n.k.i.s.c.CountsTracker] Rotated counts store at transaction 321297 to [/home/ec2-user/efs/neo4j/data/databases/graph.db/neostore.counts.db.b], from [/home/ec2-user/efs/neo4j/data/databases/graph.db/neostore.counts.db.a].
2019-01-05 02:57:49.822+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by scheduler for time threshold @ txId: 321297 checkpoint completed in 3m 3s 326ms
2019-01-05 02:57:50.866+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] No log version pruned, last checkpoint was made in version 1070
I'm hoping someone here can point out to me the error of my ways or tell me a better way to do what I'd like?
I'm using neo4j version 3.4.4.
Thanks!