Cypher Query for Forking and Merging Path Patterns

Hi all,

Using Neo4j 4.1.0, I was wondering if it's possible to detect patterns where transactions are being split up and then merged together. The following example is a trail of payments between users, where user 1 is paying user 6 $500.

Here, User 1 is splitting the money up via two different routes and different amounts so they don't pay user 6 directly.

I understand it's easy to hard-code a query that matches this, but is it possible to write queries that detect this pattern in various forms (where user 1 is paying user 6 indirectly, using various transactions and other users)? What about if more than 2 branches were used?

Any guidance will be much appreciated.

Thanks!

There are probably several ways of approaching this, but you can count distinct paths as one way.

For example:

MATCH (start:User { id: 1 })
MATCH (end:User { id: 6 })
WITH start, end
MATCH p=(start)-[:TRANSACTION*]->(end)
RETURN count(distinct(p)) as paths

OK, that would get you the number of distinct paths from A -> B. This is good when your analysis is targeted, i.e. you know the A and B you're interested in. And it would probably work OK if you only knew the start of the path, or only the end of the path.

Where this gets complicated is when you want to blow it out to every user pair in the database. This would produce a cartesian product; if you have 1,000 users, then the number of pairwise connections you have is (N * (N-1))/2 possible paths to consider, which quickly gets out of hand.

So what you might consider to do is some pre-filtering. For example, you could label nodes that have some trivial amount of total "cash-in" so they don't ever get considered in a path. And you could limit the total path length; for example maybe you'll look at 6-hop chains, but not be willing to consider 20-hop chains.

Hi David,

Thanks for getting back to me! Sorry if I wasn't clear - I'm not just looking for all paths (up to N hops) between two nodes, I'm looking for patters where:

  1. The same amount of money is passed on, on the same day, several hops. Eg. return A - $100 -> B - $100 -> C - $100 -> D (if these transactions are all on the same day, or within a few days of each other).
  2. For patterns that match 1. are there any cases where 1 person essentially pays a certain amount of money via a network of transactions that all make it to another person (again, on the same day or within a date tolerance). So looking at the above diagram, "User 1" pays "User 6" $500 indirectly, by sending $100 through the top path, and $400 through the bottom path.

If something like this is possible, we're not too concerned about performance as this wouldn't be run very often. But to begin with, we can assume we know the starting node

Sure, here's one way you could do this:

MATCH (start:User { id: 1 })-[r:Transaction]->(first)
WHERE r.sendDate >= date('2020-10-20')
WITH start, r, first, r.sendDate as sendDate, r.sendDate + duration({days:1}) as endDate, r.amount as amount
MATCH path = (start)-[r]->(first)-[:Transaction*]->(last)
WHERE all(rel in relationships(path) WHERE sendDate <= r.sendDate <= endDate AND r.amount = amount) 
 AND NOT EXISTS { 
    MATCH (last)-[rel:Transaction]->() 
    WHERE sendDate <= rel.sendDate <= endDate AND rel.amount = amount}
RETURN last, sum(amount) as totalTransferred, collect(path) as pathsUsed

This requires existential subqueries from Neo4j 4.x.

1 Like

Hi Andrew,

Thank you for your reply, that's very helpful!

Quick question - is the "not exists" statement meant for performance, so it rules out fragments of a payment chain (i.e. only targeting full length chains, rather than several snippets)?

I'm using this query to look for this pattern (for transactions above a certain threshold value), however, the starting user is not known. Therefore I'm running it across the full database (10m+ transactions and ~800k users), and unsurprisingly it's taking a very long time to return a result.

Is there anything I can do to make this more efficient? As I can't index a relationship property, I've also tried replicating this model where the transaction is a node, and then index the transaction amount. Should I also index the datetime aspect of it?

Many thanks,
Nick