Hi,
I am using Neo4j Community Edition for a POC, and I have configured Neo4j in HA mode on two high-end servers: one acts as the master and the other as a slave. A data simulator continuously pushes data to Kafka, and a Kafka consumer collects that data into batches and inserts them into Neo4j.
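The ingestion path (a Kafka consumer feeding batched inserts) can be sketched as a small helper that groups consumed messages into fixed-size batches. This is a minimal sketch only. The batch size of 1000, the message tuple shape, and the name `batched` are hypothetical, and no Kafka or Neo4j client is involved. Each yielded batch would presumably become the list parameter of one `UNWIND`-based write.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(messages: Iterable[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Group a (possibly endless) stream of consumed messages into lists of
    at most batch_size items. The default of 1000 is a hypothetical value."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(messages)
    while True:
        batch = list(islice(it, batch_size))  # take up to batch_size items
        if not batch:  # stream exhausted
            return
        yield batch


if __name__ == "__main__":
    # Hypothetical stand-in for Kafka messages: (card_no, terminal_id, is_fraud).
    fake_stream = [(f"c{i}", f"t{i % 7}", i % 5 == 0) for i in range(2500)]
    print([len(b) for b in batched(fake_stream, batch_size=1000)])
```

With 2500 messages and a batch size of 1000, this yields batches of 1000, 1000 and 500 items.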
We have more than 2 billion nodes and relationships, modeled like this: (card:Card)-[r:Transaction]->(terminal:Terminal). The official Neo4j documentation says it is a good fit for fraud detection, and I am trying out CPP (compromised point of purchase) detection.
The Transaction relationship has properties such as txndatetime, isFraud, location, etc. Out of those 2 billion nodes, I am trying to retrieve 200 cards (these 200 card numbers are the input to my Cypher query) and their relationships.
For example, say the input to the Cypher query is two cards, c1 and c2, and the database contains relationships such as:
c1-[Transaction]->(t1) [isFraud = false],
c1-[Transaction]->(t3) [isFraud = true],
c1-[Transaction]->(t11) [isFraud = true],
c1-[Transaction]->(t100) [isFraud = false],
c2-[Transaction]->(t10) [isFraud = false],
c2-[Transaction]->(t100) [isFraud = false],
c2-[Transaction]->(t150) [isFraud = true],
c2-[Transaction]->(t500) [isFraud = true],
.....
and so on (relationships for other cards exist as well), making up the 2 billion nodes.
Given the input c1 and c2, I want to retrieve only the relationships whose card number is c1 or c2 and whose isFraud = false.
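To pin down the expected result, here is a minimal in-memory Python sketch of the same filter applied to the example relationships: keep a relationship only if its card is in the input list and its isFraud is false. The tuple layout and the name `non_fraud_transactions` are illustrative and not part of the real schema.

```python
from typing import Iterable, List, Tuple

# (card, terminal, isFraud), copied from the example relationships.
Relationship = Tuple[str, str, bool]

EXAMPLE: List[Relationship] = [
    ("c1", "t1", False),
    ("c1", "t3", True),
    ("c1", "t11", True),
    ("c1", "t100", False),
    ("c2", "t10", False),
    ("c2", "t100", False),
    ("c2", "t150", True),
    ("c2", "t500", True),
]


def non_fraud_transactions(
    rels: Iterable[Relationship], cards: Iterable[str]
) -> List[Relationship]:
    """Return relationships whose card is one of `cards` and whose isFraud is False."""
    wanted = set(cards)  # set for constant-time membership checks
    return [r for r in rels if r[0] in wanted and not r[2]]


if __name__ == "__main__":
    for card, terminal, _ in non_fraud_transactions(EXAMPLE, ["c1", "c2"]):
        print(f"({card})-[:Transaction]->({terminal})")
```

For the input c1 and c2, this keeps c1 to t1, c1 to t100, c2 to t10 and c2 to t100.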
Here is my Cypher query:
WITH {batch_list} AS batch
UNWIND batch AS row
MATCH (card)-[transact:Transaction]->(anotherTerminal)
WHERE card.Cardno = row[0]
  AND transact.isFraud = false
WITH card, transact, anotherTerminal
RETURN *
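Because the query reads `row[0]`, `batch_list` is presumably a list of rows whose first element is the card number. The Python sketch below builds that parameter from a list of card numbers. It also holds a hypothetical variant of the query that adds the `:Card` label from the data model, assuming (not confirmed in the post) that `Cardno` lives only on `:Card` nodes. Neo4j schema indexes are defined on a label plus a property, so an unlabeled `(card)` pattern cannot use one, and each row may end up examining every node. The variant also uses the newer `$batch_list` parameter form and drops the redundant `WITH`. The index statement in the comment is Neo4j 3.x syntax; newer versions use `CREATE INDEX FOR (c:Card) ON (c.Cardno)`. Running the query, for example with the official `neo4j` driver's `session.run(query, parameters)`, is left out so the sketch stays self-contained.

```python
from typing import Dict, List, Sequence

# Hypothetical variant of the query: same filter, but the start node carries the
# :Card label so that an index on :Card(Cardno), if one exists, can be used.
# Assumed index (Neo4j 3.x syntax):
#   CREATE INDEX ON :Card(Cardno)
LABELED_QUERY = """
UNWIND $batch_list AS row
MATCH (card:Card)-[transact:Transaction]->(anotherTerminal)
WHERE card.Cardno = row[0]
  AND transact.isFraud = false
RETURN card, transact, anotherTerminal
"""


def build_batch_list(card_numbers: Sequence[str]) -> Dict[str, List[List[str]]]:
    """Wrap each card number in its own row so that row[0] is the card number."""
    return {"batch_list": [[card_no] for card_no in card_numbers]}


if __name__ == "__main__":
    params = build_batch_list(["c1", "c2"])
    print(params)
    # With a driver session (not executed here): session.run(LABELED_QUERY, params)
```

For the cards c1 and c2, the parameter is `{'batch_list': [['c1'], ['c2']]}`. Comparing the output of `PROFILE` for both versions would show whether the index is actually used.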
When I pass 200 card numbers in batch_list, the query takes more than 12 hours, so displaying the results as a graph is not possible. Even so, if I could get the data, I could prepare reports from it.
I was under the impression that the results would come back within 3 to 5 minutes and that I could display them as a graph.
Please let me know what I am doing wrong. I would really appreciate any help.