How to debug NOT PART OF CHAIN! error

tijuhasz · January 5, 2024, 8:48am

I have a CSV file with these data:
latitude,longitude,duplicate_ids
-84.93620649907857,-166.584936,"163,150"
-84.92621120161785,-176.6079961,"94,64,93"
-78.4748578001507,163.73635759999996,"64270335,64270336"
-78.45997140147799,163.7564968,"64270468,64272133,64272834"

The duplicate_ids column contains 2-4 IDs of nodes that already exist in Neo4j, identified by their global_id property. I am trying to run a Cypher query that parses the CSV file and creates EQUAL_TO relationships between the nodes whose global_id values are listed in each CSV row. This is my query:
LOAD CSV WITH HEADERS FROM 'file:///duplicate-nodes-planet.csv' AS row
CALL {
WITH row
WITH split(row.duplicate_ids, ',') AS ids
WITH [id IN ids | toInteger(id)] AS integerIds
MATCH (n:WaterNode) WHERE n.global_id IN integerIds
WITH collect(n) AS nodes
WITH nodes AS n1, nodes AS n2
UNWIND n1 AS node1
UNWIND n2 AS node2
WITH node1, node2 WHERE node1 <> node2
AND NOT (node1)-[:EQUAL_TO]-(node2)
AND node1.latitude = node2.latitude
AND node1.longitude = node2.longitude
MERGE (node1)-[r:EQUAL_TO]-(node2)
SET r.distance = 0
} IN TRANSACTIONS OF 1000 ROWS;

On a small dataset the query runs without any issues. On a large dataset it fails after creating 9,897,806 relationships. It always stops at the same spot, but unfortunately I could not find the CSV row where it fails. I get this error:
NOT PART OF CHAIN!
RelationshipTraversalCursor
[id=4293918719, open state with: denseNode=false, next=4293918719, , underlying record=Relationship[4293918719,used=false,source=-1,target=-1,type=-1,sCount=1,sNext=-1,tCount=1,tNext=-1,prop=-1, sFirst, tFirst]]

Could you please advise how to debug this issue or how to circumvent it to run the query without failing? Thank you.

hakan.lofqvist1 · January 5, 2024, 9:15am

This may be a tricky one. Let's hope we can sort it out together.

Is this on a cluster?

The issue may be related to the reuse of internal IDs. There are some well-hidden configuration settings that increase the time before an ID can be reused.

# In neo4j 5.X
internal.dbms.cluster.raft.id_reuse.min_time=10m
internal.dbms.cluster.raft.id_reuse.max_time=20m
internal.dbms.cluster.raft.id_reuse.max_commits=5000

# In neo4j 4.4
causal_clustering.min_time_delay_id_reuse=10m
causal_clustering.max_time_delay_id_reuse=20m
causal_clustering.max_commits_delay_id_reuse=5000

It may also help (and be easier) to split your CSV file into smaller parts and process them one at a time. I would probably start with that: it makes it easier to see whether the failure occurs in a specific part, and it is faster to re-test.
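
For the splitting step, here is a minimal Python sketch. It assumes the file has a single header line and is small enough to stream row by row. The function name split_csv, the part-file naming scheme, and the rows_per_part parameter are made up for illustration and are not part of any Neo4j tooling.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_part):
    """Split `source` into numbered part files, repeating the header in each.

    Returns the list of part file paths in the order they were written.
    """
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    part_paths = []
    out_file = None
    writer = None
    with source.open(newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes the first line is the header
        for index, row in enumerate(reader):
            # Start a new part file every `rows_per_part` data rows.
            if index % rows_per_part == 0:
                if out_file is not None:
                    out_file.close()
                part_path = out_dir / f"{source.stem}-part{len(part_paths) + 1:03d}.csv"
                out_file = part_path.open("w", newline="")
                writer = csv.writer(out_file, lineterminator="\n")
                writer.writerow(header)
                part_paths.append(part_path)
            # csv.writer re-quotes fields that contain commas, e.g. "163,150".
            writer.writerow(row)
    if out_file is not None:
        out_file.close()
    return part_paths
```

Calling split_csv("duplicate-nodes-planet.csv", "parts", 1000000) would write parts/duplicate-nodes-planet-part001.csv and so on. Each part can then be copied into the Neo4j import directory and loaded with the same LOAD CSV query, which narrows a failure down to one part.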

tijuhasz · January 5, 2024, 11:31am

Thank you for your advice. I'll split the CSV in two and see how the two parts run. I'll delete all relationships created and re-run the two CSVs.

glilienfield · January 5, 2024, 4:43pm

BTW, I think your algorithm is going to create relationships in both directions. Is that what you intend, or do you just need one between each pair of nodes that are considered equal?

If you only need one, change WHERE node1 <> node2 to WHERE elementId(node1) < elementId(node2)
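
To see what the two filters do to the rows coming out of the double UNWIND, here is a small Python sketch. It only models the pairing step, using the IDs from one sample CSV row as stand-ins for the nodes. It does not model what MERGE does with those rows afterwards.

```python
from itertools import permutations

# Stand-ins for the nodes matched from one CSV row (IDs taken from the sample data).
ids = [64270468, 64272133, 64272834]

# node1 <> node2 keeps every ordered pair, so each pair shows up twice: (a, b) and (b, a).
ordered_pairs = list(permutations(ids, 2))

# A strict "less than" comparison keeps each pair exactly once.
unordered_pairs = [(a, b) for a, b in permutations(ids, 2) if a < b]

print(len(ordered_pairs))    # 6
print(len(unordered_pairs))  # 3
```

With the "less than" filter, a row with k IDs passes k(k-1)/2 pairs on to MERGE instead of k(k-1).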
