Right now I have a graph with ~70M nodes of one type and have just finished uploading ~20M nodes of another type. These node types look like this:
// name is unique, index on position
CREATE (:A { name: string, position: int })
// index on start, end guaranteed > start
CREATE (:B { start: int, end: int })
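For reference, the indexes mentioned in the comments were created roughly like this (exact syntax varies by Neo4j version; this is the 3.x form, and the uniqueness constraint on `name` implicitly backs it with an index):

```cypher
// unique names on A, plus an index on position for the range check
CREATE CONSTRAINT ON (a:A) ASSERT a.name IS UNIQUE;
CREATE INDEX ON :A(position);
// index on B.start (end is guaranteed > start)
CREATE INDEX ON :B(start);
```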
Now, what I need to do is create (a)-[:IN]->(b) relationships wherever a.position falls within b's [start, end) range.
I've tried multiple times using various forms of apoc.periodic.iterate
(streaming over all B's or all A's):
CALL apoc.periodic.iterate(
  "MATCH (b:B) RETURN b", // stream all B's
  "MATCH (a:A) WHERE a.position >= b.start AND a.position < b.end AND NOT (a)-[:IN]->(b) MERGE (a)-[:IN]->(b)",
  // I've tried parallel true & false, various batch sizes, etc.
  { batchSize: 1000, parallel: false }
)
And apoc.periodic.commit
(admittedly, I'm not sure if commit even makes sense for this, or where in the query the LIMIT
should be applied):
CALL apoc.periodic.commit(
  "MATCH (a:A) MATCH (b:B)
   WHERE a.position >= b.start AND a.position < b.end AND NOT (a)-[:IN]->(b)
   WITH a, b LIMIT {limit}
   MERGE (a)-[:IN]->(b)
   RETURN count(*)",
  { limit: 1000 }
)
Regardless of how I run this, one of a few outcomes happens:
- The query returns almost immediately and does nothing.
- The query runs for ~2 hours, the connection is lost, nothing has been updated.
- The query runs for a little while (~1 hour), ends "successfully", nothing has been updated.
My "last hope" option is to perform the link-up while loading the B's into the database, as part of the LOAD CSV query. This works, but it is abysmally slow: at its current rate it would take upwards of 6 days to complete.
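For completeness, that LOAD CSV variant looks roughly like this (the file path and column names here are illustrative, not my actual ones):

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///b_nodes.csv' AS row
// create each B, then immediately link the matching A's
CREATE (b:B { start: toInteger(row.start), end: toInteger(row.end) })
WITH b
MATCH (a:A)
WHERE a.position >= b.start AND a.position < b.end
MERGE (a)-[:IN]->(b)
```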
Any help solving this would be a huge help. Thanks!