I have a db with about 730000000 nodes. I need to delete some orphaned node trees. I've written a query that locates the top of each tree (it returns about 5500000 nodes) and deletes all of these nodes along with the nodes connected to them at any depth. I know there are no loops, i.e. no inner node in a tree is connected to another tree or to anything above the tree head. My db has 28800 MB of heap and 62GB of heap space. When running this query Neo4j runs out of heap memory and I have to restart. I need help getting this job done.
Here is my query:
call apoc.periodic.iterate('match (s:BudgetDetails) where not (s)-[:BUDEGT_DETAILS_OF]-() with s optional match (s)-[*]-(a) return s,a','detach delete s,a',{batches:1000}) yield total return total
match (s:BudgetDetails) where not (s)-[:BUDEGT_DETAILS_OF]-()
returns about 5500000 nodes
optional match (s)-[*]-(a)
this part may return up to 20 nodes for each node found in the first query.
I don't care how long this query runs as long as it does not crash my db.
Maybe we can use only one match query to do the same. Try this: call apoc.periodic.iterate('match (s:BudgetDetails)-[r]-(a) where type(r) <> "BUDEGT_DETAILS_OF" return s,a','detach delete s,a',{batches:1000}) yield total return total
Also, I think you can change the memory limits in <where_neo4j_is>/conf/neo4j.conf
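For reference, the memory limits mentioned above are usually controlled by three settings in that file. This is only a sketch: the setting names below are the Neo4j 3.x/4.x ones (Neo4j 5 renames them with a server.memory.* prefix), and the values are placeholders, not a recommendation for this database.

```properties
# Sketch of the memory section of neo4j.conf (Neo4j 3.x/4.x setting names).
# All values are illustrative placeholders; size them for your own machine.

# JVM heap: used for query execution and transaction state.
dbms.memory.heap.initial_size=16g
dbms.memory.heap.max_size=16g

# Page cache: used to cache the graph store files, separate from the heap.
dbms.memory.pagecache.size=32g
```

Neo4j has to be restarted for these settings to take effect. Raising the heap only helps if the query's working set is bounded; it will not rescue a traversal that touches far more of the graph than intended.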
I believe the issue might lie with optional match (s)-[*]-(a) as the * operator goes out an indeterminate length of connections. This might be unintentionally walking the entire graph.
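One way to check whether the traversal really stays inside small trees is to run it on a handful of start nodes with a depth cap and count what it reaches. The sketch below keeps the question's WHERE clause verbatim; the LIMIT of 10 and the depth cap of 5 are arbitrary assumptions chosen to keep the test cheap.

```cypher
// Sample a few start nodes exactly as the original query selects them.
MATCH (s:BudgetDetails)
WHERE NOT (s)-[:BUDEGT_DETAILS_OF]-()
WITH s LIMIT 10
// Cap the depth so the check itself cannot walk the whole graph.
OPTIONAL MATCH (s)-[*..5]-(a)
// Count distinct nodes reachable from each sampled start node.
RETURN s, count(DISTINCT a) AS reachable
ORDER BY reachable DESC
```

If reachable comes back far above the roughly 20 nodes expected per tree, the start nodes are not the isolated tree heads the query assumes.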
If you are just trying to go out 1 hop from s to get a, you should remove the *.
The '*' is intentional. As I wrote, I need to delete a tree of nodes where the 'BudgetDetails' node is the top. I know for a fact that all nodes under the 'BudgetDetails' do not connect outside their tree, thus I know that I'm not walking the whole graph. Taking this out would mean that I will need to delete the nodes in the tree manually, and this is not an option.
Thnx
This query is wrong. I'm looking for all the BudgetDetails that do not have the BUDGET_DETAILS_OF relation. This query simply disregards it and would bring back all BUDGET_DETAILS as they all have other connections.
Thanx
This has happened to me more than I'd like to admit -- I feel your pain. :)
I wish there was some reasonable way to design a test harness for cypher to catch this sort of thing. I don't even know what such a beast would look like -- I just know that in more usual development environments, my tests pick up nearly all mistakes like this.
Actually if I were to try this query without 'apoc', the query window shows a warning that it does not have the misspelled relationship in the db. So if this window was able to analyze 'apoc' queries it would have shown me the problem straight away.
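The check described above can be done by running the inner match on its own with EXPLAIN, which plans the query without executing it and lets the query window show its warnings. The second statement is a hedged sketch of the batched delete with the relationship type spelled BUDGET_DETAILS_OF. It assumes that spelling is the one in the database, it uses batchSize because that is the config key apoc.periodic.iterate reads for the batch size, and it has not been run against this data.

```cypher
// Step 1: plan the inner query only. A misspelled relationship type
// shows up as a warning in the query window.
EXPLAIN
MATCH (s:BudgetDetails)
WHERE NOT (s)-[:BUDGET_DETAILS_OF]-()
RETURN count(s);

// Step 2 (sketch): the first statement streams only the tree heads;
// the second expands and deletes each tree inside its own batch.
// [*0..] includes s itself, so one DETACH DELETE covers head and tree.
CALL apoc.periodic.iterate(
  'MATCH (s:BudgetDetails) WHERE NOT (s)-[:BUDGET_DETAILS_OF]-() RETURN s',
  'MATCH (s)-[*0..]-(a) DETACH DELETE a',
  {batchSize: 1000}
) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Keeping the variable-length expansion in the second statement means each transaction only holds the trees of one batch of heads, rather than every (s, a) pair produced up front.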