Query optimization for fraud rings matching


(Borism) #1

Hello,
I have the following case :
We are looking for fraud were 2 persons are connected to one car in one police case
and the same 2 persons are connected to 2 different cars in another police case.
unnamed

The script for database creation :

CREATE (police1:PoliceFile {FileID:1})
CREATE (police2:PoliceFile {FileID:2})
CREATE (car1:Car {CarID:1})
CREATE (car2:Car {CarID:2})
CREATE (car3:Car {CarID:3})
CREATE (inv1:Involved {PesronID:1})
CREATE (inv2:Involved {PesronID:2})
CREATE (cl1:Claim {ClaimID:1})
CREATE (cl2:Claim {ClaimID:2})
CREATE (police1)-[:ConnectedTo]->(car1)
CREATE (police2)-[:ConnectedTo]->(car2)
CREATE (police2)-[:ConnectedTo]->(car3)
CREATE (car1)-[:ConnectedTo]->(inv1)
CREATE (car1)-[:ConnectedTo]->(inv2)
CREATE (car2)-[:ConnectedTo]->(inv1)
CREATE (car3)-[:ConnectedTo]->(inv2)
CREATE (inv1)-[:ConnectedTo]->(cl1)
CREATE (inv1)-[:ConnectedTo]->(cl2)
CREATE (cl1)-[:ConnectedTo]->(police1)
CREATE (cl2)-[:ConnectedTo]->(police2)

I use the following query to find the ring :

MATCH p=(c3:Car)-[:ConnectedTo]-(inv2:Involved)-[:ConnectedTo]-(c2:Car)-[:ConnectedTo]-(p1:PoliceFile)-[:ConnectedTo]-(c1:Car)-[:ConnectedTo]-(inv1:Involved)-[:ConnectedTo]-(c3:Car)-[:ConnectedTo]-(p2:PoliceFile)
where ID(p1)<>ID(p2) and ID(inv1)<>ID(inv2)
return p

It works with the sample data, but in the real database with 2 million nodes it works hours and does not bring results back.

How should I change the query in order to efficiently find fraud rings in the database?

Thank you,
Boris


(Paul Thomas) #2

Could be the data, could be not enough RAM on the box ...

But your query is looking for a long chain not a ring!

Any ring loops back to the start for example, notice p1 at the start and at the end

match pathx=(p1:Person)--(c1:Claim)--(p2:Person)--(c2:Claim)--(p1)
return pathx


(Ameyasoft) #3

Hi,

May be using the same label for all relationship is causing the problem?.

You can try this.
It may be better to collect all the claims filed by PersonID = 1 and use the collected claim ids to see all other persons involved in these claims.

Here is the query:

MATCH (c:Claim)-->(p)-->(d:Car)--(i:Involved)
WHERE i.PesronID = 1
WITH COLLECT (c) as n1
UNWIND n1 as n2
MATCH (n2)-->(p)-->(d:Car)--(i:Involved)
RETURN n2, p, d, i;
Result:

fraudcase2

If I may suggest a better model, here it is:

fraudcase .

-Kamal