What is the most efficient way to Delete 20 Million Nodes from a database with over 7.5 Billion Nodes and 15 Billion relationships?

(Benjamin Squire) #1

I have been having trouble doing a mass delete of the data observed on the first day. I am trying to delete around 20-30 million nodes each day to cycle my data.

I have three proposed methods, and they are all too slow.

Method 1:
Step 1 - Mark all the nodes to be deleted in parallel using apoc.periodic
Step 2 - Detach delete user nodes using apoc.periodic (non-parallel)

Method 2:
Step 1 - Mark all the nodes to be deleted in parallel using apoc.periodic
Step 2 - Delete the marked nodes using apoc.periodic + apoc.nodes.delete (non-parallel)

Method 3:
Step 1 - Mark all the nodes to be deleted in parallel using apoc.periodic
Step 2 - Delete all the relationships using apoc.periodic (non-parallel)
Step 3 - Delete all the disconnected marked nodes in parallel using apoc.periodic

Right now my estimate is that deleting 20 million nodes using method 1 will take 55 hours.
Method 2 is even slower.

Using Method 3 it will take 6 hours to delete all the relationships and I am testing how long to delete all the nodes in parallel now that the relationships have been removed.

Looking for thoughts and ideas on improving this. Seems like Neo4j struggles at scale.

  • neo4j 3.4.9
    - 32 vCPU, 244 GB RAM server with 16,000 IOPS SSD
(Kunal Goyal) #2

Hi @benjamin.squire ,

There is a trick to deleting a large dataset from Neo4j:
simply use LIMIT with each delete and perform the operation on 100k records at a time.
It takes some manual effort, but it is much faster.
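A sketch of that trick, borrowing the label and property names Benjamin posts later in the thread (rerun the statement until it deletes 0 rows):

```cypher
// Delete in fixed-size chunks; repeat until nothing is left to delete.
MATCH (u:User)
WHERE u.last_obs < datetime('2018-01-03')
WITH u LIMIT 100000
DETACH DELETE u;
```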

(Senthilc78) #3

Just curious to know why you have to delete 20M nodes and relationships every day.

[1] If so, I believe the data model needs to be revisited.

[2] If cost and time factors allow, could we create one more graph DB, take only the needed data and relationships, and import them into the new DB instead of messing with the existing one (assuming the data is not sensitive and storage cost is not a big constraint)?

[3] I think this kind of operation would take time in any DB.

(Benjamin Squire) #4

@12kunal34 I have deleted in this fashion using LIMIT 1000 and 10000; it takes 8-10 seconds for 1k and 80-90 seconds for 10k detach-deletes. Hence the time estimate of 55 hours. Further, apoc.periodic.commit and apoc.periodic.iterate batch the same way LIMIT does, so they should be considered equivalent.

@senthilc78 the reasoning behind deleting 20M records per day comes down to the amount of data I am dealing with and the retention window. I need to maintain at least 1 year of clickstream records for 150 websites. Right now, at 10 months, the data is 7.5 billion nodes and 15 billion relationships, and it will keep growing as I hit 12 months. The fact is I am using Community, which has a limit of 34 billion nodes and relationships; with a previous data model I already hit that amount at 10 months. Since then I have revised the model down to bare minimum needs. The 20M records per day corresponds to deleting the first day of logs from a year ago. Since I am adding about 20M records per day and will be getting closer to the 34 billion limit (plus the hardware constraints of my machine), I need to reclaim that space by deleting these old logs, which we have identified as no longer required.

1 Like
(Michael Black) #5

It's not a technical solution but it does solve getting around the size limit of community, purchase an enterprise license. Then you won't have a size limit and don't need to prune your data. It would also open you up to more flexibility such as clustering as your data grows. Otherwise 2 hrs to prune that much data isn't that bad, you could consider moving to an hourly pruning schedule if daily is too large.

(Senthilc78) #6

Got it, Benjamin, thanks for the details. You've got me thinking about time-to-live (TTL) nodes. I can see some third-party plugins for that, but I believe that using a trigger and a common counter, with a little change to the data model, we could achieve this in Community Edition as an alternative. Let me check that out in detail and update. Thanks!
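One possible shape for that TTL idea in Community Edition, sketched with APOC's background scheduler rather than a trigger (the label and property names are made up for illustration; apoc.periodic.repeat takes a rate in seconds):

```cypher
// Illustrative only: an hourly job that expires nodes whose ttl
// (epoch millis, a hypothetical property) has passed, in small chunks.
CALL apoc.periodic.repeat(
  'ttl-cleanup',
  'MATCH (n:Expirable) WHERE n.ttl < timestamp()
   WITH n LIMIT 10000
   DETACH DELETE n',
  3600);
```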

1 Like
(Michael Hunger) #7

What are your batch sizes for period.iterate calls?
If you observe the machine is it bound on IO or CPU?

What are the degrees of the nodes you are deleting?
Are those nodes related/connected to each other or unrelated?

I will try to reproduce your use-case and see how it behaves on our test machine.

1 Like
(Michael Hunger) #8

Also what is your heap config and your page-cache size?
And how do you invoke the procedures?

(Benjamin Squire) #9

@michael.hunger the batch sizes I tested were 1k and 10k. Using method 1, detach-deleting nodes in serial batches took 8-10 seconds for 1k and 80-100 seconds for 10k. When I tested apoc.nodes.delete (method 2) it was even slower: 15-17 seconds for 1k, and 10k took almost 2.5 minutes per batch. Method 3, where I delete rels in batches, was much faster: 1k took 5-6 seconds and 10k took about 30 seconds. The estimated time for method 1 is around 55 hours ((20M / 10,000-node batches) * 100 sec ≈ 55 hrs). Method 2 was even slower, and method 3 finished deleting all rels in 7 hours and the remaining floating nodes in 9 hours.

I don't have the exact degree of each node, but there should be an average of 4 nodes attached to each user, given there were 5M users on the first day and 20M connected nodes; I will have to do some sampling to give a direct answer. They do have some connecting third-party nodes.


total machine memory is 244 GB and all of it is dedicated to the neo4j server

This is Method 3 (deleting rels in series and deleting nodes in parallel that remain)

Call apoc.periodic.iterate("Match (u:User) where u.last_obs < datetime('2018-01-03') return u","Match (u) with apoc.coll.toSet([(u)-[:OBSERVED_WITH]->(a) | a]) + collect(distinct u) as nodes unwind nodes as n Set n:MarkDel",{batchSize:100,iterateList:true,parallel:true,retries:3});

Call apoc.periodic.iterate("Match (u:MarkDel:User)-[r]->() return r", "delete r",{batchSize:10000});

Call apoc.periodic.iterate("Match (u:MarkDel) where size((u)--()) = 0 return u", "detach delete u",{batchSize:1000,iterateList:true,parallel:true,retries:3});

The reason I mark :MarkDel but only delete rels from Users: 1) users are the central connector in the graph, so everything that is connected has a rel from a user, i.e. (User)-[r]->(id); 2) I don't want to delete every (id) connected to a MarkDel user, because some (id) nodes also touch more recent users. For example, an (id) connected to a first-day user may also have been observed connected to a user yesterday, and I don't want those deleted. So the third iterate looks for orphaned nodes after I delete all the users that connect them. Note: there is no race condition once all the nodes' rels are deleted, because they all have degree 0.

(Michael Hunger) #10

Did you observe the machine during the deletion process to see where the bottleneck was?
Also check with iotop how much write throughput is happening.

For single-threaded deletes you can also increase the batch size, to anywhere from 100k to 10M.
One other thing that could help is to order the to-be-deleted rels by id, but I have to try that first.

One other thing one could do is to run a clustering algo on the :MarkDel nodes and delete clusters (nodes and rels) in parallel.
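Concretely, the larger-batch and order-by-id suggestion might look like this sketch (reusing the :MarkDel label from earlier in the thread; whether the ordering helps is untested, as noted above):

```cypher
// Larger batches for the single-threaded relationship delete, with the
// driving rows ordered by relationship id (untested hunch: sequential
// ids may translate into more sequential store writes).
CALL apoc.periodic.iterate(
  "MATCH (:MarkDel)-[r]->() RETURN r ORDER BY id(r)",
  "DELETE r",
  {batchSize: 1000000});
```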

(Benjamin Squire) #11

I checked iotop and these are the results. I upped the relationship-delete batch size from 1,000 to 1,000,000, running the batches in series:

Total DISK READ :      23.07 M/s | Total DISK WRITE :	 1703.66 K/s
Actual DISK READ:      23.01 M/s | Actual DISK WRITE:       0.00 B/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                     
 50759 be/4 neo4j      10.91 M/s   46.25 K/s  0.00 % 73.48 % java -cp /var/lib/neo4j/plugins:/etc/ne~=/var/lib/neo4j --config-dir=/etc/neo4j
 46510 be/4 neo4j      11.14 M/s   34.69 K/s  0.00 % 70.67 % java -cp /var/lib/neo4j/plugins:/etc/ne~=/var/lib/neo4j --config-dir=/etc/neo4j
 42817 be/4 neo4j    1040.70 K/s 1622.72 K/s  0.00 % 15.63 % java -cp /var/lib/neo4j/plugins:/etc/ne~=/var/lib/neo4j --config-dir=/etc/neo4j
 43008 be/4 neo4j	0.00 B/s    0.00 B/s  0.00 %  0.00 % java -cp /var/lib/neo4j/plugins:/etc/ne~=/var/lib/neo4j --config-dir=/etc/neo4j

Are these expected results? Do I need a higher IOPS SSD- currently it is a 16000 IOPS EBS

Tasks: 344 total,   2 running, 182 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.1 sy,  0.0 ni, 97.3 id,  2.4 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 25174608+total,  1702016 free, 20220547+used, 47838596 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 47205400 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                               
 42623 neo4j     20   0  210.2g 192.0g   3656 S  25.9 80.0   1961:36 java                                             
(Michael Hunger) #12

Those numbers are not really high.
I think we should see at least 10x that (250 MB/s) on IO.

CPU utilization is near zero; it's all I/O wait.

I need to test this on our machine to see how it behaves there.

(Benjamin Squire) #13

I have reviewed my EC2 instance and everything appears to be correct; Neo4j is heavily underutilizing the available hardware.

ec2 - 32 vCPUs - 244 GB Ram - 8 TB 16000 IOPS Provisioned SSD - r4.8xlarge

Instance type	EBS-optimized by default	Maximum bandwidth (Mbps)	Maximum throughput (MB/s, 128 KB I/O)	Maximum IOPS (16 KB I/O)
r4.8xlarge	    Yes	                              7,000	                           875	                         37,500

These instance types can support maximum performance for 30 minutes at least once every 24 hours. For example, r4.8xlarge instances can deliver 875 MB/s for 30 minutes at least once every 24 hours.

(Benjamin Squire) #15

I have reached out to AWS support team to see what their opinion on this is.

(Michael Hunger) #16

I forgot to mention one thing. Due to the way the compiled runtime works, it doesn't behave well in periodic.iterate,
so it's best to prefix the driving statement with cypher runtime=slotted.

Here I tried deleting 1M rels, which took about 90s:

 Call apoc.periodic.iterate("cypher runtime=slotted Match (:MarkDel)-[r]->() return r limit 1000000", "delete r",{batchSize:100000});
| batches | total   | timeTaken | committedOperations | failedOperations | failedBatches | retries | errorMessages | batch                                                     | operations                                                          | wasTerminated |
| 10      | 1000000 | 91        | 1000000             | 0                | 0             | 0       | {}            | {total -> 10, committed -> 10, failed -> 0, errors -> {}} | {total -> 1000000, committed -> 1000000, failed -> 0, errors -> {}} | false         |
1 row
92048 ms

I'm recording my experiments here:

(Michael Hunger) #17

I now deleted 82,291,607 (~82M) rels on our machine and it took 2 hrs 44 minutes.

With disk I/O going up to these levels.

Total DISK READ :       0.00 B/s | Total DISK WRITE :     772.07 M/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:    1790.73 M/s

| batches | total    | timeTaken | committedOperations | failedOperations | failedBatches | retries | errorMessages | batch                                                       | operations                                                            | wasTerminated |
| 823     | 82291607 | 8677      | 82291607            | 0                | 0             | 0       | {}            | {total -> 823, committed -> 823, failed -> 0, errors -> {}} | {total -> 82291607, committed -> 82291607, failed -> 0, errors -> {}} | false         |
1 row
8677673 ms

Now off to delete the nodes.

(Gigauser) #18

We did it like the code below, and it took a lot of time; we then tried Michael's guide, but it took a little longer than our method:

delete.cypher:

MATCH (n:News)-[r]-(w:Word2)
WITH r LIMIT 10000
DELETE r;

and the driver loop:

counter=0
while [ $counter -le 50772 ]
do
  echo $counter
  time cat cypher/delete.cypher | bin/cypher-shell -a bolt://192.168.1.:* -u neo4j -p "********"
  counter=$((counter+1))
done

(Michael Hunger) #20

Btw, I realized that I had too little page cache configured on that machine; that's why it took much longer than expected. If you can increase your page cache, it works better. You can also delete more per batch, e.g. 100k.

I would have used APOC's periodic.iterate instead so you don't start a new driver/connection for every query.

In graph-algos we're currently trying to use clustering (e.g. unionFind) to find independent clusters whose relationships can then be created/deleted concurrently. That might also be an option for you.
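For reference, memory settings live in neo4j.conf; a hedged example split for a 244 GB box (values are illustrative, not a recommendation; the point is that most RAM should go to the page cache rather than the heap):

```
# neo4j.conf (Neo4j 3.x) - illustrative split for a 244 GB machine
dbms.memory.heap.initial_size=31g
dbms.memory.heap.max_size=31g
# large page cache so the store files being scanned/deleted stay in memory
dbms.memory.pagecache.size=180g
```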

(Jameslaneconkling) #21

In graph-algos we're currently trying to use clustering (e.g. unionFind) to find independent clusters which then can have their relationships created/deleted in concurrently.

@michael.hunger does this imply that relationships can be deleted in parallel, as long as neither of a relationship's nodes is also being deleted? I tried to run a parallel delete of relationships via apoc.path.subgraphAll to extract fully disjoint subgraphs (which I thought would be similar to your unionFind suggestion above), but got NullPointerExceptions in the logs.


CALL apoc.periodic.iterate(
  "MATCH (metadata:METADATA)
  WHERE metadata.uri STARTS WITH $sourceUri
  RETURN metadata",
  "CALL apoc.path.subgraphAll(metadata, {relationshipFilter: 'METADATA>,RELATIONSHIP'}) YIELD relationships
  UNWIND relationships AS r
  DELETE r",
  { batchSize: 1000, iterateList: true, parallel: true, concurrency: 50, params: { sourceUri: $sourceUri } }
);
(Benjamin Squire) #22

@jameslaneconkling What @michael.hunger was talking about is what happens when you try to delete in parallel: the reason you get NullPointerException errors is basically that there is a race condition when you can't guarantee segregation of the nodes and relationships per thread. When he mentioned unionFind, he was saying that if you can cluster all the data (e.g. by running subgraphAll on all of your data) and then use periodic iterate so that each thread only deletes rels within a given community, you can avoid the race condition.

What you are doing is gathering all the rels for each of the nodes from your first query, but on successive iterations some of those nodes may already have been deleted, because they were connected to your subgraphAll result in one of the other batches. To delete in parallel, you want to make sure that for each DELETE r unwind, no other thread touches the data being deleted in a different thread.

Hence: run unionFind over all the data so that it writes a community property (write: true), or stream the clusters (unionFind.stream()) into a periodic iterate; then delete the clusters in parallel so that each thread's batch contains data exclusive to that thread and can't be accessed by another thread.
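A sketch of that recipe, using the graph-algorithms plugin from this era of Neo4j 3.x (procedure and property names are from that library; the :MarkDel label comes from earlier in the thread, and the isolation caveat in the comments matters):

```cypher
// Sketch only: write a partition id per connected component of the
// :MarkDel subgraph. Isolation only holds if marked nodes have no
// relationships to nodes outside their own partition.
CALL algo.unionFind('MarkDel', null,
  {write: true, partitionProperty: 'partition'});

// Delete one whole partition per batch, in parallel; each thread's
// batch is a disjoint component, so threads never touch shared data.
CALL apoc.periodic.iterate(
  "MATCH (n:MarkDel) RETURN DISTINCT n.partition AS p",
  "MATCH (n:MarkDel {partition: p}) DETACH DELETE n",
  {batchSize: 1, parallel: true, concurrency: 8});
```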

Hope that helps