Creating a family node for connected nodes

Hello everyone,
I'm trying to build a device graph in which many different user identifiers are connected to a user node.
My goal is to group the user nodes ("Row" nodes) that share at least one identifier under a single "family" node.

Example of raw data:

As you can see, rows 1 and 2 share an identifier,
rows 2 and 4 share an identifier, and rows 3 and 4 share an identifier.
My ultimate goal is to create a family node whose index is the minimum row index in the group, hence "1".
The scale is approximately 36M nodes (11.5M row nodes, rest are unique identifiers), and 46M connections.
I know the initial load is significant, but the incremental uploads would be much smaller.

After running into multiple memory issues, the best I was able to achieve is:

CALL apoc.periodic.iterate(
  "UNWIND range(1,11500000) AS id RETURN id",
  'MATCH (a:Row {index:id})-[:USES]->(c)<-[:USES]-(b:Row)
   WITH a, collect(DISTINCT b) AS familyMembers,
        CASE WHEN a.index < min(b.index) THEN a.index ELSE min(b.index) END AS min_index_final
   MERGE (f:Family {index: min_index_final})
   MERGE (a)-[:BELONGS_TO]->(f)
   WITH min_index_final, familyMembers, f
   UNWIND familyMembers AS member
   MERGE (member)-[r:BELONGS_TO]->(f)',
  {batchSize:5000})

Basically, it iterates through all the Row nodes, finds the rows connected in the first degree (through a shared identifier), and creates a master node that takes the lowest index in the cluster (to keep it deterministic).

The result (in the comment, since I can't post 2 pictures):

As you can see, there are two issues here:

  1. Redundant families were created. When iterating on rows 1 and 2, everything went smoothly and nodes 1, 2 and 4 were connected to family 1. When iterating on rows 3 and 4, row 1 wasn't an immediate relation, so the query couldn't pick up that index, which resulted in additional families. I could potentially clean this up later, but row 3 still wouldn't connect to family 1 without extending the relationship degree.
  2. When applied to millions of rows, it takes forever. It took 4 hours to go through 600k of the 11.5M rows.
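The first issue above is essentially a connected-components problem: a family is everything reachable through shared identifiers, not just first-degree neighbours. As a minimal sketch of that idea outside the database (plain Python, not Cypher; the identifier names and the `build_families` helper are made up for illustration), a union-find structure can merge rows in a single pass over the USES edges and always keep the smallest row index as the family index:

```python
from collections import defaultdict

def find(parent, x):
    # Walk up to the root, then point every visited node straight at it.
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        # Keep the smaller index as the root, so root == family index.
        if ra < rb:
            parent[rb] = ra
        else:
            parent[ra] = rb

def build_families(uses):
    """uses: iterable of (row_index, identifier) pairs, i.e. the USES edges."""
    parent = {}
    first_row_for_identifier = {}
    for row, identifier in uses:
        parent.setdefault(row, row)
        if identifier in first_row_for_identifier:
            # Another row already uses this identifier: same family.
            union(parent, row, first_row_for_identifier[identifier])
        else:
            first_row_for_identifier[identifier] = row
    families = defaultdict(list)
    for row in parent:
        families[find(parent, row)].append(row)
    return {family: sorted(rows) for family, rows in families.items()}

# Hypothetical identifiers reproducing the example: rows 1-2, 2-4 and 3-4 share one each.
uses = [(1, "email_a"), (2, "email_a"), (2, "cookie_b"),
        (4, "cookie_b"), (3, "device_c"), (4, "device_c")]
families = build_families(uses)
print(families)  # {1: [1, 2, 3, 4]}
```

Because each edge is processed once, the same structure could in principle absorb smaller incremental uploads by feeding in only the new (row, identifier) pairs, assuming the `parent` and identifier maps are kept between runs.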

Would love to hear if there's anything I'm doing wrong, or anything that could make it smarter or faster, as I'm running out of ideas.
Thanks

I was able to achieve the end result by first creating immediate connections ("RELATED") between the Row nodes, and then looking for 2nd-degree connections.
This works on a small subset of the data, but it does not work well when applied to the whole dataset.
How can this be made more efficient? Maybe by deleting the "RELATED" connections while creating the families?

query:

CALL apoc.periodic.iterate(
  "UNWIND range(1,4) AS id RETURN id",
  'MATCH (a:Row {index:id})-[r:RELATED*..2]->(b:Row)
   WHERE b.index < id
   WITH a, collect(DISTINCT b) AS familyMembers,
        CASE WHEN a.index < min(b.index) THEN a.index ELSE min(b.index) END AS min_index_final
   MERGE (f:Family {index: min_index_final})
   MERGE (a)-[:BELONGS_TO]->(f)
   WITH min_index_final, familyMembers, f
   UNWIND familyMembers AS member
   MERGE (member)-[r:BELONGS_TO]->(f)',
  {batchSize:5000})
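One thing to watch with a fixed hop limit such as `*..2` is that a chain of rows longer than the limit may still end up split. The sketch below (plain Python, not Cypher) compares a 2-hop search with an unbounded one. It assumes the RELATED links are treated as undirected, and it adds a hypothetical row 5 attached to row 3 to make the chain long enough to show the effect:

```python
from collections import deque

def reachable(adjacency, start, max_hops=None):
    """Return the rows reachable from start within max_hops (None = no limit)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops == max_hops:
            continue  # hop budget used up, do not expand further
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen

# Example from the post (1-2, 2-4, 3-4) plus a hypothetical row 5 linked to row 3.
related = {1: [2], 2: [1, 4], 3: [4, 5], 4: [2, 3], 5: [3]}

two_hops = reachable(related, 5, max_hops=2)
full = reachable(related, 5)
print(min(two_hops))  # 3
print(min(full))      # 1
```

Under these assumptions, row 5 would pick family 3 with a 2-hop limit even though it belongs with row 1, which is why an unbounded traversal (or a connected-components pass) tends to be a safer basis for the family index than a fixed degree.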