The query below sometimes finishes and sometimes doesn't. With a batch size of 5000 it finished in 93 hours, which is outrageous. The version below has a few optimizations but still gets stuck on the last batch.
"CALL apoc.export.json.query('UNWIND $_batch as row with row.comm as comm MATCH (m:User) USING INDEX m:User(mup_id) where m.mup_id = comm
OPTIONAL MATCH (m)-[rs:OBSERVED_WITH]->(s:Segment)
OPTIONAL MATCH (m)-[rg:OBSERVED_WITH]->(e:Email)
OPTIONAL MATCH (m)-[rh:OBSERVED_WITH]->(h:Hash)
OPTIONAL MATCH (m)-[ra:OBSERVED_WITH]->(a:Adobe)
OPTIONAL MATCH (m)-[rl:OBSERVED_WITH]->(l:Liveramp)
WITH comm AS mup_id,
COLLECT(distinct {muid: m.uid, first_obs: m.first_obs, last_obs: m.last_obs}) AS uids,
COLLECT(distinct
CASE
WHEN s IS NULL THEN NULL
ELSE {id: s.segment_id, first_obs: rs.first_obs} END
) AS segs,
COLLECT(distinct
CASE
WHEN e IS NULL THEN NULL
ELSE {id: e.email, first_obs: rg.first_obs} END
) AS eml,
COLLECT(distinct
CASE
WHEN h IS NULL THEN NULL
ELSE {id: h.hash_id, first_obs: rh.first_obs} END
) AS hashs,
COLLECT(distinct
CASE
WHEN a IS NULL THEN NULL
ELSE {id: a.adobe_id, first_obs: ra.first_obs} END
) AS adb,
COLLECT(distinct
CASE
WHEN l IS NULL THEN NULL
ELSE {id: l.liveramp_id, first_obs: rl.first_obs} END
) AS lrs
RETURN {mup_id:mup_id,uid:uids, segment:segs, email:eml,adobe:adb,liveramp:lrs,hash_id:hashs} as map',
'/mnt/lv1/export_data/init_test/test-' + $_count+'.json',
{useTypes:true, storeNodeIds:false,params:{_batch:$_batch}}) YIELD nodes return sum(nodes)",
{batchSize:50000,iterateList:true,parallel:true,concurrency:20,retries:2});
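One thing that may be worth ruling out, independent of the hang: the five OPTIONAL MATCH clauses are chained before a single aggregation, so for each user the intermediate rows are the product of the matches per label (for example, 10 segments and 10 emails give 100 rows) before COLLECT(distinct ...) collapses them. The following is an untested sketch of the inner query only, assuming the same labels, properties and $_batch parameter as above. It aggregates after each expansion, so the row counts add instead of multiply. It is shown unquoted for readability and would need the same quoting as the original to be passed to apoc.export.json.query.

```cypher
// Untested sketch: same output map as the inner query above, but each
// label is collected before the next OPTIONAL MATCH is expanded.
UNWIND $_batch AS row
WITH row.comm AS comm
MATCH (m:User) USING INDEX m:User(mup_id)
WHERE m.mup_id = comm
// One row per mup_id; keep the matched users for the later expansions.
WITH comm AS mup_id,
     collect(m) AS users,
     collect(DISTINCT {muid: m.uid, first_obs: m.first_obs, last_obs: m.last_obs}) AS uids

// Segments
UNWIND users AS m
OPTIONAL MATCH (m)-[rs:OBSERVED_WITH]->(s:Segment)
WITH mup_id, users, uids,
     collect(DISTINCT CASE WHEN s IS NULL THEN NULL
                           ELSE {id: s.segment_id, first_obs: rs.first_obs} END) AS segs

// Emails
UNWIND users AS m
OPTIONAL MATCH (m)-[rg:OBSERVED_WITH]->(e:Email)
WITH mup_id, users, uids, segs,
     collect(DISTINCT CASE WHEN e IS NULL THEN NULL
                           ELSE {id: e.email, first_obs: rg.first_obs} END) AS eml

// Hashes
UNWIND users AS m
OPTIONAL MATCH (m)-[rh:OBSERVED_WITH]->(h:Hash)
WITH mup_id, users, uids, segs, eml,
     collect(DISTINCT CASE WHEN h IS NULL THEN NULL
                           ELSE {id: h.hash_id, first_obs: rh.first_obs} END) AS hashs

// Adobe ids
UNWIND users AS m
OPTIONAL MATCH (m)-[ra:OBSERVED_WITH]->(a:Adobe)
WITH mup_id, users, uids, segs, eml, hashs,
     collect(DISTINCT CASE WHEN a IS NULL THEN NULL
                           ELSE {id: a.adobe_id, first_obs: ra.first_obs} END) AS adb

// Liveramp ids (last expansion, so users no longer needs to be carried)
UNWIND users AS m
OPTIONAL MATCH (m)-[rl:OBSERVED_WITH]->(l:Liveramp)
WITH mup_id, uids, segs, eml, hashs, adb,
     collect(DISTINCT CASE WHEN l IS NULL THEN NULL
                           ELSE {id: l.liveramp_id, first_obs: rl.first_obs} END) AS lrs

RETURN {mup_id: mup_id, uid: uids, segment: segs, email: eml,
        adobe: adb, liveramp: lrs, hash_id: hashs} AS map
```

The returned map has the same keys as the original, so the export call around it would not need to change. Whether this has any effect on the stuck final batch is not known.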
With a batch size of 50k, it produces 734 files before it gets stuck, which means it reaches approximately 36,700,000 lines. This was the runtime with the earlier 5k batch size:
"Starting Init Export at 18:41:20.586Z"
batches, total, timeTaken, committedOperations, failedOperations, failedBatches, retries, errorMessages, batch, operations, wasTerminated, failedParams
7334, 36665559, 334991, 36665559, 0, 0, 0, {}, {total: 7334, committed: 7334, failed: 0, errors: {}}, {total: 36665559, committed: 36665559, failed: 0, errors: {}}, FALSE, {}
As you can see, it ran 7334 batches of 5k apiece, which is about 36,670,000 rows (36,665,559 actual), and it completed in 334,991 seconds, which is roughly 93 hours. My guess is that it gets stuck on the last batch, because before the final batch it was producing a file every few seconds. I think it may be a bug.
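The figures quoted here can be checked with a quick calculation. This is a small Cypher sketch that uses only numbers from the result row and the file count reported above; the column aliases are made up for illustration.

```cypher
// Sanity check of the quoted figures; no graph data is read.
RETURN 7334 * 5000          AS rows_if_every_5k_batch_full,  // 36670000
       734 * 50000          AS rows_at_734_files_of_50k,     // 36700000
       334991 / 3600.0      AS hours_taken,                  // about 93.05
       334991 / 7334.0      AS avg_seconds_per_5k_batch,     // about 45.68
       36665559 / 334991.0  AS avg_rows_per_second           // about 109.45
```

The two row estimates agree with the reported total of 36,665,559 to within one batch, which is consistent with the 50k run stalling at or near its final batch.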
- neo4j version 3.5.5 Enterprise
- neo4j-graph-algorithms-3.5.6.0-standalone.jar
- PROFILE of the query, limited to 10 and run without the csv export component:
+-------------------------------------------------------------------------------------+
| Plan | Statement | Version | Planner | Runtime | Time | DbHits | Rows |
+-------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 3.5" | "COST" | "SLOTTED" | 4 | 236 | 5 |
+-------------------------------------------------------------------------------------+
+------------------------+----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Cache H/M | Identifiers | Other |
+------------------------+----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| +ProduceResults | 5 | 5 | 0 | 0/0 | map, lrs, adb, hashs, uids, mup_id, eml, segs | |
| | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| +Projection | 5 | 5 | 0 | 0/0 | map, lrs, adb, hashs, uids, mup_id, eml, segs | {map : {mup_id: mup_id, uid: uids, segment: segs, email: eml, adobe: adb, liveramp: lrs, hash_id: hashs}} |
| | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| +EagerAggregation | 5 | 5 | 60 | 380/0 | lrs, adb, hashs, uids, mup_id, eml, segs | mup_id |
| | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| +Apply | 26 | 10 | 0 | 380/0 | e, comm, s, rs, rh, a, m, ra, l, rl, h, rg | |
| |\ +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| | +OptionalExpand(All) | 26 | 10 | 30 | 78/0 | e, comm, s, rs, rh, a, m, ra, l, rl, h, rg | (m)-[rl:OBSERVED_WITH]->(l); l:Liveramp |
| | | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| | +OptionalExpand(All) | 26 | 10 | 30 | 78/0 | e, comm, s, rs, rh, a, m, ra, h, rg | (m)-[ra:OBSERVED_WITH]->(a); a:Adobe |
| | | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| | +OptionalExpand(All) | 26 | 10 | 30 | 78/0 | e, comm, s, rs, rh, m, h, rg | (m)-[rh:OBSERVED_WITH]->(h); h:Hash |
| | | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| | +OptionalExpand(All) | 26 | 10 | 30 | 78/0 | e, comm, s, rs, m, rg | (m)-[rg:OBSERVED_WITH]->(e); e:Email |
| | | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| | +OptionalExpand(All) | 26 | 10 | 30 | 78/0 | comm, m, rs, s | (m)-[rs:OBSERVED_WITH]->(s); s:Segment |
| | | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| | +NodeIndexSeek | 26 | 10 | 15 | 78/0 | comm, m | :User(mup_id) |
| | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| +Distinct | 10 | 5 | 0 | 380/0 | comm | comm |
| | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| +Limit | 10 | 10 | 0 | 380/0 | cached[u.mup_id], u | 10 |
| | +----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| +NodeIndexScan | 100120771 | 10 | 11 | 0/0 | cached[u.mup_id], u | :User(mup_id) |
+------------------------+----------------+------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------+
- neo4j.log and debug.log
2019-06-26 12:59:07.395+0000 WARN [o.n.k.i.p.Procedures] Retrying operation 0 of 2
2019-06-26 12:59:07.395+0000 WARN [o.n.k.i.p.Procedures] Retrying operation 1 of 2
2019-06-26 12:59:13.745+0000 WARN [o.n.k.i.p.Procedures] Retrying operation 0 of 2
2019-06-26 12:59:13.845+0000 WARN [o.n.k.i.p.Procedures] Retrying operation 1 of 2
2019-06-26 12:59:13.946+0000 WARN [o.n.k.i.p.Procedures] Error during iterate.commit:
2019-06-26 12:59:13.946+0000 WARN [o.n.k.i.p.Procedures] 1 times: java.lang.NullPointerException