Hey all,
I am using Neo4j 3.5.6 Community Edition and APOC 3.5.0.2 on a 16-core, 64 GB machine with an additional 64 GB of swap (which is never used, but is there for safety).
Memory on the machine is split 20/20/20/4 as described in:
...this seems stable now; it's small, but this is a proof-of-concept machine.
I am importing about 500M objects and about 3B relationships, with about 100 relationship types and about 50 labels/object types; the imports come from $big gigabytes of compressed, database-generated CSV files.
The Cypher code for importing the data is statically generated by a Python script, using templates, because the data warehouse produces (literally) 10,000 CSV files, and each one has to have its own LOAD CSV statement, since LOAD CSV cannot be iterated over input filenames (e.g. via UNWIND).
These templated Cypher scripts (some over 5,000 lines long) contain code like:
USING PERIODIC COMMIT 256
LOAD CSV WITH HEADERS
FROM 'file:/path/foo001.csv.gz' AS batch
MERGE (a:Account { uid:toInteger(batch.INTEGER__ID) })
SET
a.status = toInteger(batch.INTEGER__STATUS);
or
USING PERIODIC COMMIT 256
LOAD CSV WITH HEADERS
FROM 'file:/path/bar001.csv' AS batch
MATCH (x:Account { uid:toInteger(batch.INTEGER__X) })
MATCH (y:Account { uid:toInteger(batch.INTEGER__Y) })
MERGE (x)-[edge:Linky]->(y)
ON CREATE SET edge.reason = batch.REASON
ON MATCH SET edge.reason = batch.REASON;
...simply repeated for files foo001 through foo300, and so forth.
I am aware of, and am considering eventual migration to, neo4j-admin import, as described here:
...which sounds amazing, but I want to check that it fits my use case, which includes updating pre-existing counters (ON MATCH SET edge.weight = batch.more_weight + edge.weight) as well as other stuff like that.
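For concreteness, here is roughly what such a counter-updating statement looks like in my current LOAD CSV world; the file and column names below are illustrative, but the shape follows the examples above:
USING PERIODIC COMMIT 256
LOAD CSV WITH HEADERS
FROM 'file:/path/baz001.csv' AS batch
MATCH (x:Account { uid:toInteger(batch.INTEGER__X) })
MATCH (y:Account { uid:toInteger(batch.INTEGER__Y) })
MERGE (x)-[edge:Linky]->(y)
// CSV fields arrive as strings, hence toInteger(); ON MATCH accumulates the counter
ON CREATE SET edge.weight = toInteger(batch.more_weight)
ON MATCH SET edge.weight = edge.weight + toInteger(batch.more_weight);
It is that accumulate-on-match behaviour that I want to be sure still fits after any migration.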
Also, I am using small transactions of 256 rows (hence the USING PERIODIC COMMIT 256 above), having read in a previous posting that small transactions tend to reduce collisions.
Further, I am following the advice of (I think it was) @michael.hunger, elsewhere in this community, to ORDER/GROUP my CSV data in order to minimise deadlock issues across parallel transactions.
I am also following Michael's advice to load distinct labels, and distinct relationship types, in parallel:
WITH ALL OF THIS CONTEXT: here's the question:
- I am collecting CSVs and importing them in phases, with the loads within each phase executed in parallel
- in any given phase, the sets of objects (or relationships) being imported are completely disjoint
- that is to say, I am importing (:A), (:B), (:C)-[:foo]->(:D) ... and in any given batch, none of A/B/C/foo/D are ever of the same type, and for each of A/B/C/foo/D the LOAD CSV statements are run serially, one per shard/CSV input file.
In theory, there should be no lock contention? But no, I am wrong; it appears that LOAD CSV can tread upon itself:
LockClient[13625877] can't wait on resource RWLock[NODE(93708867), hash=515089118] since => LockClient[13625877] <-[:HELD_BY]- RWLock[NODE(469290148), hash=686715308] <-[:WAITING_FOR]- LockClient[13626010] <-[:HELD_BY]- RWLock[NODE(93708867), hash=515089118]
LockClient[13626299] can't wait on resource RWLock[NODE(469054858), hash=2000485859] since => LockClient[13626299] <-[:HELD_BY]- RWLock[NODE(469317257), hash=1199527701] <-[:WAITING_FOR]- LockClient[13626312] <-[:HELD_BY]- RWLock[NODE(469054858), hash=2000485859]
LockClient[13630461] can't wait on resource RWLock[RELATIONSHIP(2319588353), hash=477452471] since => LockClient[13630461] <-[:HELD_BY]- RWLock[NODE(411438449), hash=1187551531] <-[:WAITING_FOR]- LockClient[13630397] <-[:HELD_BY]- RWLock[RELATIONSHIP(2319588353), hash=477452471]
NB: 2x NODE deadlocks + 1x RELATIONSHIP deadlock in the examples above.
...I have perhaps a dozen of these from the latest [edit: small, delta] import, and it seems that the problem occurs when a node is "hot" and has several (tens? hundreds?) of thousands of relationships pointing at it; this is the sort of thing/requirement that I have discussed previously:
I am not sure how I can mitigate this, if at all; possibly it is related to the issue described here:
...or perhaps it's because of the sorting/grouping: the number of relationships linking
(foo)-[:Bar]-(where:City {name:"London"})
...greatly exceeds one or more transactions' worth of space.
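To make that concrete, here is a hypothetical statement of the shape I have in mind (invented file and column names). As I understand it, the relationship MERGE has to take write locks on both end nodes, so if most rows resolve to the same hot :City node, every 256-row batch in every loader that touches that node queues on the same lock:
USING PERIODIC COMMIT 256
LOAD CSV WITH HEADERS
FROM 'file:/path/qux001.csv' AS batch
MATCH (foo:Account { uid:toInteger(batch.INTEGER__ID) })
MATCH (where:City { name:batch.CITY_NAME })
// most rows carry the same value here, e.g. name = "London"
MERGE (foo)-[:Bar]->(where);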
The annoying thing here is that I apparently cannot fix this; all of the commentary I see online about deadlocks "solves" them by saying something like "simply trap the exception in Java and re-run the transaction", but, as above, I don't have any of that.
Hence this posting, with its vast amounts of context. Have I missed anything, please, or is anything in the above somehow a stupid choice, given what I am trying to achieve?
Thanks!