cancel
Showing results for 
Search instead for 
Did you mean: 

Check if there exists a path between two nodes when data is big

shan63
Node

Let's assume in a shopping platform, every user is part of a single customer group and every transaction is from a customer to another customer. The data is modelled by the following paths:

(:CustomorGroup)<-[:PART_OF]-(:User)-[:TRANSACTION_FROM)->(:Transaction)-[:TRANSACTION_TO]->(:User)-[:PART_OF]->(:CustomorGroup)

Now the goal is to write a query the finds all pairs of CustomerGroups that are connected through transactions. One way of doing this is:
MATCH (cg1:CustomorGroup),(cg2:CustomorGroup) WHERE (cg1)<-[:PART_OF]-(:User)-[:TRANSACTION_FROM)->(:Transaction)-[:TRANSACTION_TO]->(:User)-[:PART_OF]->(cg2) RETURN DISTINCT cg1, cg2

Another solution which is more efficient is:

MATCH (cg1:CustomorGroup)<-[:PART_OF]-(:User)-[:TRANSACTION_FROM)->(:Transaction)-[:TRANSACTION_TO]->(:User)-[:PART_OF]->(cg2:CustomorGroup) RETURN DISTINCT cg1, cg2

But in general none of these are efficient enough because they first find all paths between pairs of CustomerGroup nodes while we are interested in knowing if there exists a path. In my data there could be many many paths between two CustomerGroups but I care only if there exists a path. I tried to tune it further by adding a hint to join on the Transaction node but still didn't change the performance a lot.

What are the more efficient queries for this usecase? How can I somehow include this logic in the query that I only care about path existence between two CustomerGroups?

0 REPLIES 0