Head's Up! Site maintenance this Wednesday, February 1. Disruptions expected as we migrate the forums.
β10-20-2020 04:37 AM
Hi all,
Using Neo4j 4.1.0, I was wondering if it's possible to detect patterns where transactions are being split up and then merged together. The following example is a trail of payments between users, where user 1 is paying user 6 $500.
Here, User 1 is splitting the money up via two different routes and different amounts so they don't pay user 6 directly.
I understand it's easy to hard-code a query that matches this, but is it possible to write queries that detect this pattern in various forms (where user 1 is paying user 6 indirectly, using various transactions and other users)? What about if more than 2 branches were used?
Any guidance will be much appreciated.
Thanks!
Solved! Go to Solution.
β10-22-2020 05:03 PM
Sure, here's one way you could do this:
MATCH (start:User { id: 1 })-[r:Transaction]->(first)
WHERE r.sendDate >= date('2020-10-20')
WITH start, r, first, r.sendDate as sendDate, r.sendDate + duration({days:1}) as endDate, r.amount as amount
MATCH path = (start)-[r]->(first)-[:Transaction*]->(last)
WHERE all(rel in relationships(path) WHERE sendDate <= r.sendDate <= endDate AND r.amount = amount)
AND NOT EXISTS {
MATCH (last)-[rel:Transaction]->()
WHERE sendDate <= rel.sendDate <= endDate AND rel.amount = amount}
RETURN last, sum(amount) as totalTransferred, collect(path) as pathsUsed
This requires existential subqueries from Neo4j 4.x.
β10-20-2020 06:57 AM
There are probably several ways of approaching this, but you can count distinct paths as one way.
For example:
MATCH (start:User { id: 1 })
MATCH (end:User { id: 6 })
WITH start, end
MATCH p=(start)-[:TRANSACTION*]->(end)
RETURN count(distinct(p)) as paths
OK, that would get you the number of distinct paths from A -> B. This is good when your analysis is targeted, i.e. you know the A and B you're interested in. And it would probably work OK if you only knew the start of the path, or only the end of the path.
Where this gets complicated is when you want to blow it out to every user pair in the database. This would produce a cartesian product; if you have 1,000 users, then the number of pairwise connections you have is (N * (N-1))/2 possible paths to consider, which quickly gets out of hand.
So what you might consider to do is some pre-filtering. For example, you could label nodes that have some trivial amount of total "cash-in" so they don't ever get considered in a path. And you could limit the total path length; for example maybe you'll look at 6-hop chains, but not be willing to consider 20-hop chains.
β10-20-2020 07:27 AM
Hi David,
Thanks for getting back to me! Sorry if I wasn't clear - I'm not just looking for all paths (up to N hops) between two nodes, I'm looking for patters where:
If something like this is possible, we're not too concerned about performance as this wouldn't be run very often. But to begin with, we can assume we know the starting node
β10-22-2020 05:03 PM
Sure, here's one way you could do this:
MATCH (start:User { id: 1 })-[r:Transaction]->(first)
WHERE r.sendDate >= date('2020-10-20')
WITH start, r, first, r.sendDate as sendDate, r.sendDate + duration({days:1}) as endDate, r.amount as amount
MATCH path = (start)-[r]->(first)-[:Transaction*]->(last)
WHERE all(rel in relationships(path) WHERE sendDate <= r.sendDate <= endDate AND r.amount = amount)
AND NOT EXISTS {
MATCH (last)-[rel:Transaction]->()
WHERE sendDate <= rel.sendDate <= endDate AND rel.amount = amount}
RETURN last, sum(amount) as totalTransferred, collect(path) as pathsUsed
This requires existential subqueries from Neo4j 4.x.
β10-28-2020 04:25 AM
Hi Andrew,
Thank you for your reply, that's very helpful!
Quick question - is the "not exists" statement meant for performance, so it rules out fragments of a payment chain (i.e. only targeting full length chains, rather than several snippets)?
I'm using this query to look for this pattern (for transactions above a certain threshold value), however, the starting user is not known. Therefore I'm running it across the full database (10m+ transactions and ~800k users), and unsurprisingly it's taking a very long time to return a result.
Is there anything I can do to make this more efficient? As I can't index a relationship property, I've also tried replicating this model where the transaction is a node, and then index the transaction amount. Should I also index the datetime aspect of it?
Many thanks,
Nick
All the sessions of the conference are now available online