Efficiently filter nodes which have multiple relationships


(M Kiuchi) #1

Hi all,

I have graph data with millions of nodes, which are grouped by a property. For several reasons, I don't use node labels.
When I filter for nodes that have multiple relationships, Neo4j returns an "Out of memory" error and the following query cannot complete.

MATCH (n) WHERE n.idtype='ip'
WITH n
MATCH p=(n)-[r:have_ip]->()
WITH count(r) as cntr, p
WHERE cntr>2
RETURN p LIMIT 100

Neo.TransientError.General.OutOfMemoryError: There is not enough memory to perform the current task. Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.

How can I complete this query efficiently (without tweaking the heap size)?
Any comments are welcome!


(M Kiuchi) #2

I resolved it myself... Haha.

MATCH (n) WHERE n.idtype='ip'
WITH n
MATCH p=(n)-[r:have_ip]->()
WITH count(r) as cntr, n
WHERE cntr>2
WITH n
MATCH p=(n)-[:have_ip]->()<--()
RETURN p LIMIT 50


(Andrew Bowman) #3

For one, you really should be using labels; AllNodesScans are expensive.
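
As a rough sketch of what moving to labels could look like: the :Ip label below is a hypothetical name chosen for illustration, not something from the original data model. On a graph with millions of nodes the one-time migration may itself need to be run in batches to stay within the heap.

```cypher
// One-time migration: tag the property-grouped nodes with a label.
// :Ip is a hypothetical label name (an assumption, not from the thread).
MATCH (n)
WHERE n.idtype = 'ip'
SET n:Ip;

// Later queries can then start from a label scan
// rather than an AllNodesScan over the whole graph.
MATCH (n:Ip)
RETURN count(n);
```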

Second, you can use size() on a pattern that specifies just the relationship type and/or direction to get a node's degree for that relationship without paying the cost of expanding it. That's a more efficient way to get the info you need.

MATCH (n)
WHERE n.idtype='ip' AND size((n)-[:have_ip]->()) > 1
WITH n
LIMIT 50 // every node would have at least 2 relationships, so at least 100 paths total
MATCH p = (n)-[:have_ip]->()
RETURN p
LIMIT 100
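
On newer Neo4j releases (5.x, as far as I know), size() over a pattern is no longer accepted, and a COUNT { } subquery expression plays the same role. A sketch of the same query under that assumption:

```cypher
// Same degree filter, written with a COUNT subquery expression
// (assumes a Neo4j version that supports COUNT { ... }).
MATCH (n)
WHERE n.idtype = 'ip' AND COUNT { (n)-[:have_ip]->() } > 1
WITH n
LIMIT 50
MATCH p = (n)-[:have_ip]->()
RETURN p
LIMIT 100
```

Prefixing either version with PROFILE should show whether the planner answers the filter from the node's stored degree rather than by expanding every relationship.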

(M Kiuchi) #4

Very informative. Thanks a lot!