
Neo4j Efficiently Query Multiple Nodes Each With Different WHERE Values

marcuschiu
Node

Given a database with thousands of nodes:

  • MERGE (n1:Node {id:1});
  • MERGE (n2:Node {id:2, first:"John", last:"Doe"});
  • etc

I want to construct a SCALABLE query that returns nodes if any 1 of the following conditions applies:

  • (node.id = 12) OR (node.first = "Turkey" AND node.last = "Legs")
  • (node.id = 12) OR (node.first = "John" AND node.last = "Doe")
  • (node.id = 1) OR (node.first = "Jiggly" AND node.last = "Puff")
  • thousands more

I came up with the following query, but it threw an out-of-memory error when the number of conditions reached 10,000:

WITH [
   {id: 12, first: "Turkey", last: "Legs"},
   {id: 12, first: "John", last: "Doe"},
   {id: 1, first: "Jiggly", last: "Puff"}
   // thousands more
] AS conditions
UNWIND conditions AS condition
MATCH (n:Node) WHERE (n.id = condition.id) OR (n.first = condition.first AND n.last = condition.last)
RETURN collect(DISTINCT n)

In case you are wondering, I have added indices and composite indices where needed.

5 REPLIES

micro
Node

IMHO, this is not a graph traversal; you are taking the typical SQL approach of scanning the entire table through an index filter. When you say there are 1000 more such conditions, we need labels or relationships based on those conditions.
It would be great to understand your problem statement better. Is it that you want to check a friend-of-friend network at the hop level?
Can't we relabel the generic 'Node' to differentiate between 'SpecialNode' | 'General' | 'Churn', and then use a subquery to consolidate the results from the different labels?

Check this apoc.path.expand - APOC Documentation (neo4j.com)
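
The label-plus-subquery idea above could be sketched roughly as follows. This is only an illustration: it assumes the nodes have already been relabelled with the three labels named in the post, and that the Neo4j version in use supports CALL { ... } subqueries. The per-label filters are omitted because the thread does not say which conditions would map to which label.

```cypher
// Sketch only: the labels come from the post above; the filters are left out.
CALL {
  MATCH (n:SpecialNode) RETURN n
  UNION
  MATCH (n:General) RETURN n
  UNION
  MATCH (n:Churn) RETURN n
}
// UNION (without ALL) already removes duplicate rows, so n is distinct here.
RETURN n
```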
 

Yes, you are correct: it is more of a typical SQL-style scan of the entire table based on an index filter. There are no graph traversals in the query that I am trying to construct, so the conditions involve neither labels nor relationships. All nodes are assumed to have the same label (i.e. Node in this case).

You may be wondering why I use Neo4j instead of an SQL database. This is a subproblem extracted from a much larger graph database with different node labels and relationships, which I left out since it wasn't needed for the question.

glilienfield
Ninja

Do you run out of memory if you return ‘n’ without the collect and distinct?

Yes, I ran it without collect(DISTINCT ?) and it still ran out of memory.

Can you post the query plan? Insert ‘explain’ at the beginning of the query. There will be a new ‘Plan’ icon in the list of output formats.
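
For illustration, prefixing the query from the question with EXPLAIN might look like the sketch below. It assumes the OR form of the conditions as stated in the question, with the list shortened to the three sample entries. EXPLAIN only produces the plan and does not run the query, so it should be safe to try even with the full list of conditions.

```cypher
// EXPLAIN shows the planned operators without executing the query.
EXPLAIN
WITH [
   {id: 12, first: "Turkey", last: "Legs"},
   {id: 12, first: "John", last: "Doe"},
   {id: 1, first: "Jiggly", last: "Puff"}
] AS conditions
UNWIND conditions AS condition
MATCH (n:Node)
WHERE n.id = condition.id
   OR (n.first = condition.first AND n.last = condition.last)
RETURN collect(DISTINCT n)
```

In the resulting plan, the operator feeding the MATCH is the thing to look at: an index seek suggests the indices are being used, whereas a label scan followed by a filter suggests they are not.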
