Hello, I have read through your comments, and there is a lot to try and unpick, but here is a start.
1. How to model your data.
The data model you were sent in the post above was led by me, but contributed to by many people who work on fraud solutions daily (as it is a major use case in our industry).
The idea is that it is a starting point and one you can work from. Whether you use it or not is up to you, but the way we model transactions, for example, is based on new features around QPP (Quantified Path Patterns) and enhancements coming through GQL.
Your example aligns nicely with a few additions, which is EXACTLY what we want. We never planned to boil the ocean with the data model. Here is how it relates:
Constraints and Indices
-- Core constraints from baseline model
CREATE CONSTRAINT account_number IF NOT EXISTS
FOR (a:Account) REQUIRE a.accountNumber IS NODE KEY;
CREATE CONSTRAINT transaction_id IF NOT EXISTS
FOR (t:Transaction) REQUIRE t.transactionId IS NODE KEY;
CREATE CONSTRAINT customer_id IF NOT EXISTS
FOR (c:Customer) REQUIRE c.customerId IS NODE KEY;
CREATE CONSTRAINT counterparty_id IF NOT EXISTS
FOR (cp:Counterparty) REQUIRE cp.counterpartyId IS NODE KEY;
CREATE CONSTRAINT country_code IF NOT EXISTS
FOR (c:Country) REQUIRE c.code IS NODE KEY;
-- Additional constraint for Institution nodes (extending the baseline)
CREATE CONSTRAINT institution_id IF NOT EXISTS
FOR (i:Institution) REQUIRE i.institutionId IS NODE KEY;
-- Performance indices for common queries
CREATE INDEX transaction_date_idx IF NOT EXISTS FOR (t:Transaction) ON (t.date);
CREATE INDEX transaction_amount_idx IF NOT EXISTS FOR (t:Transaction) ON (t.amount);
CREATE INDEX account_trust_score_idx IF NOT EXISTS FOR (a:Account) ON (a.trustScore);
Node Labels and Properties
Institution (Extension to baseline model)
Labels: :Institution
This node represents financial institutions in the network:
Property |
Type |
Description |
Example |
institutionId |
String |
Unique identifier |
"SBZAZAJJ" |
name |
String |
Institution name |
"Barclays" |
swiftCode |
String |
SWIFT/BIC code |
"BARCGB22" |
institutionType |
String |
Type of institution |
"BANK", "CREDIT_UNION", "FINTECH" |
Account
Labels: :Account
, with additional labels :Internal
(our accounts) or :External
(other banks' accounts)
Standard baseline properties:
Property |
Type |
Description |
accountNumber |
String |
Unique account identifier |
accountType |
String |
"CURRENT", "SAVINGS", "BUSINESS" |
openDate |
DateTime |
Account opening date |
closedDate |
DateTime |
Account closure date (optional) |
suspendedDate |
DateTime |
Account suspension date (optional) |
Our extensions:
Property |
Type |
Description |
trustScore |
Float |
Calculated trust score (0-100) |
lastTransactionAt |
DateTime |
Most recent transaction timestamp |
Transaction
Labels: :Transaction
Standard baseline properties:
Property |
Type |
Description |
transactionId |
String |
Unique transaction identifier |
amount |
Float |
Transaction amount (positive value) |
currency |
String |
ISO currency code (e.g., "GBP", "EUR") |
date |
DateTime |
Transaction timestamp |
message |
String |
Description/reference text |
type |
String |
"SWIFT", "FASTER_PAYMENT", "SEPA" |
Our extension:
Property |
Type |
Description |
riskScore |
Float |
Risk assessment score from our ML model |
Customer
Labels: :Customer
Standard baseline properties:
Property |
Type |
Description |
customerId |
String |
Unique customer identifier |
firstName |
String |
Customer's first name (optional) |
middleName |
String |
Customer's middle name (optional) |
lastName |
String |
Customer's surname (optional) |
dateOfBirth |
Date |
Customer's date of birth |
placeOfBirth |
String |
Customer's place of birth |
countryOfBirth |
String |
Customer's country of birth |
Country
Labels: :Country
Property |
Type |
Description |
Example |
code |
String |
ISO 3166-1 alpha-2 code |
"GB" |
name |
String |
Full country name |
"United Kingdom" |
Relationships
Standard Baseline Relationships
-- Account initiates transaction
(a:Account)-[:PERFORMS]->(t:Transaction)
-- Transaction benefits receiving account
(t:Transaction)-[:BENEFITS_TO]->(a:Account)
-- Customer owns account
(c:Customer)-[:HAS_ACCOUNT]->(a:Account)
-- Account is domiciled in country
(a:Account)-[:IS_HOSTED]->(c:Country)
### Our Extensions
-- Account is held at institution (our addition)
(a:Account)-[:IS_HELD_AT]->(i:Institution)
2. How to model the data if it comes in out of order
Here is some cypher code that implements your comments below:
Where I'm stuck at the moment is how to model, define the Cypher code (for the Kafka Connect Neo4J sink) to edges/link the outbound Txn event with the outbound accountHolder (Payer) onto the inbound accountHolder (Payee) as a PaidFundsTo relationship, and then similar define the inbound event for the inbound accountHolder (Payee) originating from the outbound accountHolder (Payer) as a ReceivedFundsFrom Edge/Link.
Each transaction event have unique eventId's, but do share a common transactionID.
Each event additionally have an accountId and counterAccountId.
for the outbound we have an accountId which matches the Payer.accountID and a counterAccountId matching the Payee's accountID.
for the inbound we have a accountId which matches the inbound Payee.accountId and counterAccountId which now matches the payers.accountId.
Cypher:
//--------------------
// OUT-OF-ORDER EVENT PROCESSING EXAMPLE
//
// KEY INSIGHT: "Data coming in the wrong way" is NOT a problem!
// This example proves that events can arrive in ANY order and still produce
// the correct graph structure using the BASELINE DATA MODEL.
//
// BASELINE RELATIONSHIPS USED:
// - (Account)-[:PERFORMS]->(Transaction) - Account initiates transaction
// - (Transaction)-[:BENEFITS_TO]->(Account) - Transaction benefits receiving account
//--------------------
// ===== SCENARIO 1: TRANSACTION A - OUTBOUND EVENT ARRIVES FIRST =====
// Step 1A: Process OUTBOUND event for Transaction A
// Event data: eventId="EVT-OUT-A", transactionId="TXN-A", accountId="TEST-PAYER", counterAccountId="TEST-PAYEE"
//
// EXPLANATION: The outbound event knows:
// - Who is paying (accountId = TEST-PAYER)
// - Who is receiving (counterAccountId = TEST-PAYEE)
// - The transaction details
//
// Create COMPLETE baseline model with Institution nodes
MERGE (internalBank:Institution {institutionId: "BANK-INTERNAL"})
ON CREATE SET
internalBank.name = "Our Bank",
internalBank.swiftCode = "OURBANK22",
internalBank.institutionType = "BANK"
MERGE (externalBank:Institution {institutionId: "BANK-EXTERNAL"})
ON CREATE SET
externalBank.name = "External Bank",
externalBank.swiftCode = "EXTBANK33",
externalBank.institutionType = "BANK"
MERGE (payerAccount:Account {accountNumber: "TEST-PAYER"}) // accountId from event
ON CREATE SET
payerAccount:Internal,
payerAccount.accountType = "CURRENT",
payerAccount.trustScore = 80.0
MERGE (payeeAccount:Account {accountNumber: "TEST-PAYEE"}) // counterAccountId from event
ON CREATE SET
payeeAccount:External,
payeeAccount.accountType = "CURRENT",
payeeAccount.trustScore = 70.0
// BASELINE MODEL: Connect accounts to their institutions
MERGE (payerAccount)-[:IS_HELD_AT]->(internalBank)
MERGE (payeeAccount)-[:IS_HELD_AT]->(externalBank)
// Create baseline transaction model
MERGE (txnA:Transaction {transactionId: "TXN-A"})
ON CREATE SET
txnA.amount = 1000.00,
txnA.currency = "GBP",
txnA.date = datetime("2024-07-24T10:00:00"),
txnA.message = "Payment A",
txnA.riskScore = 15.0
// Standard baseline relationships
MERGE (payerAccount)-[:PERFORMS]->(txnA);
// BASELINE MODEL: The outbound event only establishes the PERFORMS relationship
// The payer account performs the transaction (this is what the outbound event knows)
// Step 2A: Process INBOUND event for Transaction A (ARRIVES LATER)
// Event data: eventId="EVT-IN-A", transactionId="TXN-A", accountId="TEST-PAYEE", counterAccountId="TEST-PAYER"
//
// EXPLANATION: The inbound event knows:
// - Who received the funds (accountId = TEST-PAYEE)
// - Who sent the funds (counterAccountId = TEST-PAYER)
// - The transaction details from the receiver's perspective
//
// IMPORTANT: All nodes already exist from the outbound event!
// We just complete the BENEFITS_TO relationship (what the inbound event knows)
MATCH (txnA_existing:Transaction {transactionId: "TXN-A"})
MATCH (payeeAccount_existing:Account {accountNumber: "TEST-PAYEE"}) // accountId from inbound event
MATCH (payerAccount_existing:Account {accountNumber: "TEST-PAYER"}) // counterAccountId from inbound event
WITH txnA_existing, payeeAccount_existing
// Complete the baseline transaction model
MERGE (txnA_existing)-[:BENEFITS_TO]->(payeeAccount_existing);
// BASELINE MODEL: The inbound event establishes the BENEFITS_TO relationship
// The transaction benefits the payee account (this is what the inbound event knows)
// Note: No additional relationship needed - the Transaction node connects everything
// ===== SCENARIO 2: TRANSACTION B - INBOUND EVENT ARRIVES FIRST =====
// Step 1B: Process INBOUND event for Transaction B (ARRIVES FIRST!)
// Event data: eventId="EVT-IN-B", transactionId="TXN-B", accountId="TEST-PAYEE", counterAccountId="TEST-PAYER"
//
// EXPLANATION: The inbound event arrives first this time, but that's fine!
// - We know who received (accountId = TEST-PAYEE)
// - We know who sent (counterAccountId = TEST-PAYER)
// - We create what we know about and set up the BENEFITS_TO relationship
// Note: Institution nodes and IS_HELD_AT relationships already exist from Transaction A
MERGE (payeeAccount_b:Account {accountNumber: "TEST-PAYEE"}) // accountId from inbound event
MERGE (payerAccount_b:Account {accountNumber: "TEST-PAYER"}) // counterAccountId from inbound event
// Create baseline transaction (same transactionId will be used by outbound event later)
MERGE (txnB:Transaction {transactionId: "TXN-B"})
ON CREATE SET
txnB.amount = 2000.00,
txnB.currency = "GBP",
txnB.date = datetime("2024-07-24T11:00:00"),
txnB.message = "Payment B",
txnB.riskScore = 25.0
// Standard baseline relationship from inbound perspective
MERGE (txnB)-[:BENEFITS_TO]->(payeeAccount_b);
// BASELINE MODEL: The inbound event establishes the BENEFITS_TO relationship
// Even though this arrives first, we can still create the complete relationship
// The Transaction node serves as the hub connecting both accounts
// Step 2B: Process OUTBOUND event for Transaction B (ARRIVES LATER)
// Event data: eventId="EVT-OUT-B", transactionId="TXN-B", accountId="TEST-PAYER", counterAccountId="TEST-PAYEE"
//
// EXPLANATION: The outbound event arrives second, but finds everything already exists!
// - Transaction node exists
// - Both account nodes exist
// - We just add the missing PERFORMS relationship (baseline model)
MATCH (txnB_existing:Transaction {transactionId: "TXN-B"})
MATCH (payerAccount_b_existing:Account {accountNumber: "TEST-PAYER"}) // accountId from outbound event
MATCH (payeeAccount_b_existing:Account {accountNumber: "TEST-PAYEE"}) // counterAccountId from outbound event
WITH txnB_existing, payerAccount_b_existing
// Complete baseline transaction model
MERGE (payerAccount_b_existing)-[:PERFORMS]->(txnB_existing);
// BASELINE MODEL: The outbound event establishes the PERFORMS relationship
// This completes the baseline transaction model:
// (Payer)-[:PERFORMS]->(Transaction)-[:BENEFITS_TO]->(Payee)
//--------------------
// VERIFICATION QUERIES
// These prove that both scenarios produce the same final graph structure
//--------------------
// Show final state of Transaction A (outbound first scenario) - COMPLETE BASELINE MODEL
MATCH (payerBank:Institution)<-[:IS_HELD_AT]-(payer:Account {accountNumber: "TEST-PAYER"})
MATCH (payeeBank:Institution)<-[:IS_HELD_AT]-(payee:Account {accountNumber: "TEST-PAYEE"})
MATCH (txn:Transaction {transactionId: "TXN-A"})
MATCH (payer)-[:PERFORMS]->(txn)-[:BENEFITS_TO]->(payee)
RETURN "Transaction A Final State" as scenario,
payerBank.name as payer_bank,
payer.accountNumber as payer_account,
payeeBank.name as payee_bank,
payee.accountNumber as payee_account,
txn.transactionId as transaction_id,
txn.amount as amount,
"OUTBOUND_FIRST" as arrival_order;
// Show final state of Transaction B (inbound first scenario) - COMPLETE BASELINE MODEL
MATCH (payerBank:Institution)<-[:IS_HELD_AT]-(payer:Account {accountNumber: "TEST-PAYER"})
MATCH (payeeBank:Institution)<-[:IS_HELD_AT]-(payee:Account {accountNumber: "TEST-PAYEE"})
MATCH (txn:Transaction {transactionId: "TXN-B"})
MATCH (payer)-[:PERFORMS]->(txn)-[:BENEFITS_TO]->(payee)
RETURN "Transaction B Final State" as scenario,
payerBank.name as payer_bank,
payer.accountNumber as payer_account,
payeeBank.name as payee_bank,
payee.accountNumber as payee_account,
txn.transactionId as transaction_id,
txn.amount as amount,
"INBOUND_FIRST" as arrival_order;
//--------------------
// KEY TAKEAWAYS
//--------------------
// 1. Event arrival order DOES NOT MATTER
// 2. Each event creates the graph elements it knows about using BASELINE MODEL
// 3. MERGE operations ensure no duplicates
// 4. transactionId links both events together via the Transaction node
// 5. Final graph follows COMPLETE baseline model: (Institution)<-[:IS_HELD_AT]-(Account)-[:PERFORMS]->(Transaction)-[:BENEFITS_TO]->(Account)-[:IS_HELD_AT]->(Institution)
// 6. Transaction node serves as the central hub - no direct account-to-account relationships needed
// 7. Institution nodes provide complete institutional context for compliance and reporting
// 8. This approach naturally handles the distributed/async nature of Kafka while maintaining 100% data model standards
Hope this all makes sense!