How to debug NOT PART OF CHAIN! error

I have a CSV file with these data:
latitude,longitude,duplicate_ids
-84.93620649907857,-166.584936,"163,150"
-84.92621120161785,-176.6079961,"94,64,93"
-78.4748578001507,163.73635759999996,"64270335,64270336"
-78.45997140147799,163.7564968,"64270468,64272133,64272834"

The duplicate_ids column contains 2-4 IDs of nodes already created in Neo4j with the property global_id. I am trying to run a Cypher query that parses the CSV file and creates EQUAL_TO relationships between the nodes whose global_id values are listed in each CSV row. This is my query:
LOAD CSV WITH HEADERS FROM 'file:///duplicate-nodes-planet.csv' AS row
CALL {
WITH row
WITH split(row.duplicate_ids, ',') AS ids
WITH [id IN ids | toInteger(id)] AS integerIds
MATCH (n:WaterNode) WHERE n.global_id IN integerIds
WITH collect(n) AS nodes
WITH nodes AS n1, nodes AS n2
UNWIND n1 AS node1
UNWIND n2 AS node2
WITH node1, node2 WHERE node1 <> node2
AND NOT (node1)-[:EQUAL_TO]-(node2)
AND node1.latitude = node2.latitude
AND node1.longitude = node2.longitude
MERGE (node1)-[r:EQUAL_TO]-(node2)
SET r.distance = 0
} IN TRANSACTIONS OF 1000 ROWS;

On a small dataset the query runs without any issues. On a large dataset it fails after creating 9,897,806 relationships. It always stops at the same spot, but unfortunately I could not find the CSV row where it fails. This is the error:
NOT PART OF CHAIN!
RelationshipTraversalCursor
[id=4293918719, open state with: denseNode=false, next=4293918719, , underlying record=Relationship[4293918719,used=false,source=-1,target=-1,type=-1,sCount=1,sNext=-1,tCount=1,tNext=-1,prop=-1, sFirst, tFirst]]

Could you please advise how to debug this issue or how to circumvent it to run the query without failing? Thank you.

This may be a tricky one :wink: Let's hope we can sort it out together.

Is this on a cluster?

The issue may be related to the reuse of internal IDs. There are some well-hidden configuration settings that increase the time before an ID can be reused.

# In Neo4j 5.x
internal.dbms.cluster.raft.id_reuse.min_time=10m
internal.dbms.cluster.raft.id_reuse.max_time=20m
internal.dbms.cluster.raft.id_reuse.max_commits=5000

# In Neo4j 4.4
causal_clustering.min_time_delay_id_reuse=10m
causal_clustering.max_time_delay_id_reuse=20m
causal_clustering.max_commits_delay_id_reuse=5000

It may also help to split your CSV file into smaller parts and process them one at a time. I would probably start with that: it makes it easier to see whether the failure sits inside a specific part, and each part is faster to re-test.
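
If it helps, here is a minimal Python sketch of that splitting step. It is only an illustration: the function name split_csv, the part-file naming and the part size are my own assumptions, not anything Neo4j requires. Each part repeats the header row so that LOAD CSV WITH HEADERS still works, and the quoted duplicate_ids field is re-quoted by the csv module.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_part):
    """Split src into numbered CSV parts of at most rows_per_part data rows.

    Every part starts with the original header row.
    Returns the list of part paths, in order.
    (Hypothetical helper; names and layout are illustrative.)
    """
    if rows_per_part < 1:
        raise ValueError("rows_per_part must be at least 1")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    parts = []
    out = None
    writer = None
    try:
        with src.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, None)
            if header is None:  # empty input file: nothing to split
                return parts
            for i, row in enumerate(reader):
                if i % rows_per_part == 0:
                    # Start a new part file and repeat the header in it.
                    if out is not None:
                        out.close()
                    part = out_dir / f"{src.stem}-part{len(parts) + 1:04d}.csv"
                    out = part.open("w", newline="", encoding="utf-8")
                    writer = csv.writer(out, lineterminator="\n")
                    writer.writerow(header)
                    parts.append(part)
                writer.writerow(row)
    finally:
        if out is not None:
            out.close()
    return parts
```

For example, split_csv("duplicate-nodes-planet.csv", "parts", 1_000_000) would write parts/duplicate-nodes-planet-part0001.csv and so on (the part size is an arbitrary choice). The parts then need to be placed in the Neo4j import directory and loaded one at a time with the same query.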

Thank you for your advice. I'll split the CSV in two and see how the two parts run. I'll delete all relationships created and re-run the two CSVs.

BTW, I think your algorithm is going to create relationships in both directions. Is that what you intend, or do you just need one relationship between each pair of nodes that are considered equal?

If you only need one, change WHERE node1 <> node2 to WHERE elementId(node1) < elementId(node2), so that each pair is considered only once.
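
To illustrate why the < comparison matters, here is a small sketch in plain Python (not Cypher). It mimics the double UNWIND over the same collection; the integers simply stand in for any comparable key such as elementId, and the function names are made up for this example.

```python
from itertools import product


def ordered_pairs(ids):
    """Pairs kept by a 'node1 <> node2' style filter: (a, b) and (b, a) both survive."""
    return [(a, b) for a, b in product(ids, repeat=2) if a != b]


def unique_pairs(ids):
    """Pairs kept by a 'node1 < node2' style filter: each unordered pair survives once."""
    return [(a, b) for a, b in product(ids, repeat=2) if a < b]


ids = [94, 64, 93]  # the IDs from the second CSV row in the question
print(len(ordered_pairs(ids)))  # 6
print(len(unique_pairs(ids)))   # 3
```

With the <> filter every pair is processed twice, once in each order; with < the number of rows reaching MERGE is halved.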