Project is something between tech explore and Blog covering what I've learned

Probably nothing mind blowing.

Real time Financial data stream from a source (for development it is from comprised gz data files), published onto Kakfa, from there we're using Kafka Connect Sink framework to sink messaged into Neo4J. As AccountHolders and TransactionEvents.

Where we link Banks (node/vertices) to AccountHolders (node/vertices) and AccountHolders to TransactionEvents, each transaction is represented as 2 Events, an outbound event from the Payer and a Inbound event on the Payee side,

Each event has a Risk score associated with it, each AccountHolder has a TrustScore associated with it.

Below is a graphical overview of the current idea, MVP1. Currently have the Banks loaded, they are a very static list so got loaded from a csv file into Bank nodes

Next we have the AccountHolders, we have a inbound and outbound AccountHolder, they get extracted from Inbound and Outbound transactions, they are first associated with their banks, this way we dynamically expand our known AccountHolders as we get transactions...

This is all done/working...

Where I'm stuck at the moment is how to model, define the Cypher code (for the Kafka Connect Neo4J sink) to edges/link the outbound Txn event with the outbound accountHolder (Payer) onto the inbound accountHolder (Payee) as a PaidFundsTo relationship, and then similar define the inbound event for the inbound accountHolder (Payee) originating from the outbound accountHolder (Payer) as a ReceivedFundsFrom Edge/Link.

Each transaction event have unique eventId's, but do share a common transactionID.
Each event additionally have an accountId and counterAccountId.

for the outbound we have an accountId which matches the Payer.accountID and a counterAccountId matching the Payee's accountID.

for the inbound we have a accountId which matches the inbound Payee.accountId and counterAccountId which now matches the payers.accountId.

Currently I show AccountHolders, understand AccountHolders can be (for now will be) associated with a single Person (or as a Corporate node)

Future
Future I want to change the accountHolder and the owner separate. An Account has an accountHolder which is a corporate or person, if corporate then we also define owners of the corporate.

The plan is to further model the Person (or corporate) that is associated with AccountHolder containing a SSN or company Registration number, providing the ability to match "Person/s" that have accounts at multiple banks and Companies that have multiple Accounts at multiple banks (Not depicted at this point).

Also been thinking of making an fin transaction as an event... on the account, with the potential capability of handling other types of events on an account.

for current, See below.

This is the idea for lets call it phase 1 - blog 1... The next bit after this might be to add some AI/ML, some how, thats a different rabbit hole to then explore...

G

My colleague at Neo4j just recently published this - may be worth having a look at it

1 Like

Hi John

Awesome, thank you.

Def be looking at this... Looking at what I see... some diagrams and pdf downloadable makes for easy off line reading,

Appreciate share.
Gives ideas for future...

My model however is being "made to fit" our ISV's application/structure/flow/design...

So the share is great to read/understand, but can see it does not fit my need/pattern.

G

bump bump.

so would love to get some feedback, my thinking makes me think it would be better to model a payment as a event and the event in this question is of type payment, thus allowing me to handle account and accountholder detail changes as update events also.

I then tie the transaction (as a node) to the outbound payer as a edge (TAG:PaidFundsTo or Debit_Event) using the outbound event

and tie the transaction/node to the inbound payee account/node using the inbound event and (TAG:ReceivedFundsFrom or Credit_Event).

G

Hello, I have read through your comments, and there is a lot to try and unpick, but here is a start.

1. How to model your data.

The data model you were sent in the post above was led by me, but contributed to by many people who work on fraud solutions daily (as it is a major use case in our industry).

The idea is that it is a starting point and one you can work from. Whether you use it or not is up to you, but the way we model transactions, for example, is based on new features around QPP (Quantified Path Patterns) and enhancements coming through GQL.

Your example aligns nicely with a few additions, which is EXACTLY what we want. We never planned to boil the ocean with the data model. Here is how it relates:

Constraints and Indices

-- Core constraints from baseline model
CREATE CONSTRAINT account_number IF NOT EXISTS
FOR (a:Account) REQUIRE a.accountNumber IS NODE KEY;

CREATE CONSTRAINT transaction_id IF NOT EXISTS
FOR (t:Transaction) REQUIRE t.transactionId IS NODE KEY;

CREATE CONSTRAINT customer_id IF NOT EXISTS
FOR (c:Customer) REQUIRE c.customerId IS NODE KEY;

CREATE CONSTRAINT counterparty_id IF NOT EXISTS
FOR (cp:Counterparty) REQUIRE cp.counterpartyId IS NODE KEY;

CREATE CONSTRAINT country_code IF NOT EXISTS
FOR (c:Country) REQUIRE c.code IS NODE KEY;

-- Additional constraint for Institution nodes (extending the baseline)
CREATE CONSTRAINT institution_id IF NOT EXISTS
FOR (i:Institution) REQUIRE i.institutionId IS NODE KEY;

-- Performance indices for common queries
CREATE INDEX transaction_date_idx IF NOT EXISTS FOR (t:Transaction) ON (t.date);
CREATE INDEX transaction_amount_idx IF NOT EXISTS FOR (t:Transaction) ON (t.amount);
CREATE INDEX account_trust_score_idx IF NOT EXISTS FOR (a:Account) ON (a.trustScore);

Node Labels and Properties

Institution (Extension to baseline model)

Labels: :Institution

This node represents financial institutions in the network:

Property Type Description Example
institutionId String Unique identifier "SBZAZAJJ"
name String Institution name "Barclays"
swiftCode String SWIFT/BIC code "BARCGB22"
institutionType String Type of institution "BANK", "CREDIT_UNION", "FINTECH"

Account

Labels: :Account, with additional labels :Internal (our accounts) or :External (other banks' accounts)

Standard baseline properties:

Property Type Description
accountNumber String Unique account identifier
accountType String "CURRENT", "SAVINGS", "BUSINESS"
openDate DateTime Account opening date
closedDate DateTime Account closure date (optional)
suspendedDate DateTime Account suspension date (optional)

Our extensions:

Property Type Description
trustScore Float Calculated trust score (0-100)
lastTransactionAt DateTime Most recent transaction timestamp

Transaction

Labels: :Transaction

Standard baseline properties:

Property Type Description
transactionId String Unique transaction identifier
amount Float Transaction amount (positive value)
currency String ISO currency code (e.g., "GBP", "EUR")
date DateTime Transaction timestamp
message String Description/reference text
type String "SWIFT", "FASTER_PAYMENT", "SEPA"

Our extension:

Property Type Description
riskScore Float Risk assessment score from our ML model

Customer

Labels: :Customer

Standard baseline properties:

Property Type Description
customerId String Unique customer identifier
firstName String Customer's first name (optional)
middleName String Customer's middle name (optional)
lastName String Customer's surname (optional)
dateOfBirth Date Customer's date of birth
placeOfBirth String Customer's place of birth
countryOfBirth String Customer's country of birth

Country

Labels: :Country

Property Type Description Example
code String ISO 3166-1 alpha-2 code "GB"
name String Full country name "United Kingdom"

Relationships

Standard Baseline Relationships

-- Account initiates transaction
(a:Account)-[:PERFORMS]->(t:Transaction)

-- Transaction benefits receiving account
(t:Transaction)-[:BENEFITS_TO]->(a:Account)

-- Customer owns account
(c:Customer)-[:HAS_ACCOUNT]->(a:Account)

-- Account is domiciled in country
(a:Account)-[:IS_HOSTED]->(c:Country)

### Our Extensions

-- Account is held at institution (our addition)
(a:Account)-[:IS_HELD_AT]->(i:Institution)

2. How to model the data if it comes in out of order

Here is some cypher code that implements your comments below:

Where I'm stuck at the moment is how to model, define the Cypher code (for the Kafka Connect Neo4J sink) to edges/link the outbound Txn event with the outbound accountHolder (Payer) onto the inbound accountHolder (Payee) as a PaidFundsTo relationship, and then similar define the inbound event for the inbound accountHolder (Payee) originating from the outbound accountHolder (Payer) as a ReceivedFundsFrom Edge/Link.

Each transaction event have unique eventId's, but do share a common transactionID.
Each event additionally have an accountId and counterAccountId.

for the outbound we have an accountId which matches the Payer.accountID and a counterAccountId matching the Payee's accountID.

for the inbound we have a accountId which matches the inbound Payee.accountId and counterAccountId which now matches the payers.accountId.

Cypher:

//--------------------
// OUT-OF-ORDER EVENT PROCESSING EXAMPLE
// 
// KEY INSIGHT: "Data coming in the wrong way" is NOT a problem!
// This example proves that events can arrive in ANY order and still produce
// the correct graph structure using the BASELINE DATA MODEL.
// 
// BASELINE RELATIONSHIPS USED:
// - (Account)-[:PERFORMS]->(Transaction) - Account initiates transaction
// - (Transaction)-[:BENEFITS_TO]->(Account) - Transaction benefits receiving account
//--------------------

// ===== SCENARIO 1: TRANSACTION A - OUTBOUND EVENT ARRIVES FIRST =====

// Step 1A: Process OUTBOUND event for Transaction A
// Event data: eventId="EVT-OUT-A", transactionId="TXN-A", accountId="TEST-PAYER", counterAccountId="TEST-PAYEE"
// 
// EXPLANATION: The outbound event knows:
// - Who is paying (accountId = TEST-PAYER) 
// - Who is receiving (counterAccountId = TEST-PAYEE)
// - The transaction details
//
// Create COMPLETE baseline model with Institution nodes
MERGE (internalBank:Institution {institutionId: "BANK-INTERNAL"})
ON CREATE SET
    internalBank.name = "Our Bank",
    internalBank.swiftCode = "OURBANK22",
    internalBank.institutionType = "BANK"

MERGE (externalBank:Institution {institutionId: "BANK-EXTERNAL"})
ON CREATE SET
    externalBank.name = "External Bank",
    externalBank.swiftCode = "EXTBANK33",
    externalBank.institutionType = "BANK"

MERGE (payerAccount:Account {accountNumber: "TEST-PAYER"})  // accountId from event
ON CREATE SET 
    payerAccount:Internal,
    payerAccount.accountType = "CURRENT",
    payerAccount.trustScore = 80.0

MERGE (payeeAccount:Account {accountNumber: "TEST-PAYEE"})  // counterAccountId from event
ON CREATE SET
    payeeAccount:External,
    payeeAccount.accountType = "CURRENT", 
    payeeAccount.trustScore = 70.0

// BASELINE MODEL: Connect accounts to their institutions
MERGE (payerAccount)-[:IS_HELD_AT]->(internalBank)
MERGE (payeeAccount)-[:IS_HELD_AT]->(externalBank)

// Create baseline transaction model
MERGE (txnA:Transaction {transactionId: "TXN-A"})
ON CREATE SET
    txnA.amount = 1000.00,
    txnA.currency = "GBP",
    txnA.date = datetime("2024-07-24T10:00:00"),
    txnA.message = "Payment A",
    txnA.riskScore = 15.0

// Standard baseline relationships
MERGE (payerAccount)-[:PERFORMS]->(txnA);

// BASELINE MODEL: The outbound event only establishes the PERFORMS relationship
// The payer account performs the transaction (this is what the outbound event knows)

// Step 2A: Process INBOUND event for Transaction A (ARRIVES LATER)
// Event data: eventId="EVT-IN-A", transactionId="TXN-A", accountId="TEST-PAYEE", counterAccountId="TEST-PAYER"
//
// EXPLANATION: The inbound event knows:
// - Who received the funds (accountId = TEST-PAYEE)
// - Who sent the funds (counterAccountId = TEST-PAYER) 
// - The transaction details from the receiver's perspective
//
// IMPORTANT: All nodes already exist from the outbound event!
// We just complete the BENEFITS_TO relationship (what the inbound event knows)
MATCH (txnA_existing:Transaction {transactionId: "TXN-A"})
MATCH (payeeAccount_existing:Account {accountNumber: "TEST-PAYEE"})  // accountId from inbound event
MATCH (payerAccount_existing:Account {accountNumber: "TEST-PAYER"})  // counterAccountId from inbound event
WITH txnA_existing, payeeAccount_existing
// Complete the baseline transaction model
MERGE (txnA_existing)-[:BENEFITS_TO]->(payeeAccount_existing);

// BASELINE MODEL: The inbound event establishes the BENEFITS_TO relationship
// The transaction benefits the payee account (this is what the inbound event knows)
// Note: No additional relationship needed - the Transaction node connects everything

// ===== SCENARIO 2: TRANSACTION B - INBOUND EVENT ARRIVES FIRST =====

// Step 1B: Process INBOUND event for Transaction B (ARRIVES FIRST!)
// Event data: eventId="EVT-IN-B", transactionId="TXN-B", accountId="TEST-PAYEE", counterAccountId="TEST-PAYER"
//
// EXPLANATION: The inbound event arrives first this time, but that's fine!
// - We know who received (accountId = TEST-PAYEE)
// - We know who sent (counterAccountId = TEST-PAYER)
// - We create what we know about and set up the BENEFITS_TO relationship
// Note: Institution nodes and IS_HELD_AT relationships already exist from Transaction A
MERGE (payeeAccount_b:Account {accountNumber: "TEST-PAYEE"})  // accountId from inbound event
MERGE (payerAccount_b:Account {accountNumber: "TEST-PAYER"})  // counterAccountId from inbound event

// Create baseline transaction (same transactionId will be used by outbound event later)
MERGE (txnB:Transaction {transactionId: "TXN-B"})
ON CREATE SET
    txnB.amount = 2000.00,
    txnB.currency = "GBP", 
    txnB.date = datetime("2024-07-24T11:00:00"),
    txnB.message = "Payment B",
    txnB.riskScore = 25.0

// Standard baseline relationship from inbound perspective
MERGE (txnB)-[:BENEFITS_TO]->(payeeAccount_b);

// BASELINE MODEL: The inbound event establishes the BENEFITS_TO relationship
// Even though this arrives first, we can still create the complete relationship
// The Transaction node serves as the hub connecting both accounts

// Step 2B: Process OUTBOUND event for Transaction B (ARRIVES LATER)
// Event data: eventId="EVT-OUT-B", transactionId="TXN-B", accountId="TEST-PAYER", counterAccountId="TEST-PAYEE"
//
// EXPLANATION: The outbound event arrives second, but finds everything already exists!
// - Transaction node exists
// - Both account nodes exist
// - We just add the missing PERFORMS relationship (baseline model)
MATCH (txnB_existing:Transaction {transactionId: "TXN-B"})
MATCH (payerAccount_b_existing:Account {accountNumber: "TEST-PAYER"})  // accountId from outbound event
MATCH (payeeAccount_b_existing:Account {accountNumber: "TEST-PAYEE"})  // counterAccountId from outbound event
WITH txnB_existing, payerAccount_b_existing
// Complete baseline transaction model
MERGE (payerAccount_b_existing)-[:PERFORMS]->(txnB_existing);

// BASELINE MODEL: The outbound event establishes the PERFORMS relationship
// This completes the baseline transaction model:
// (Payer)-[:PERFORMS]->(Transaction)-[:BENEFITS_TO]->(Payee)

//--------------------
// VERIFICATION QUERIES
// These prove that both scenarios produce the same final graph structure
//--------------------

// Show final state of Transaction A (outbound first scenario) - COMPLETE BASELINE MODEL
MATCH (payerBank:Institution)<-[:IS_HELD_AT]-(payer:Account {accountNumber: "TEST-PAYER"})
MATCH (payeeBank:Institution)<-[:IS_HELD_AT]-(payee:Account {accountNumber: "TEST-PAYEE"})
MATCH (txn:Transaction {transactionId: "TXN-A"})
MATCH (payer)-[:PERFORMS]->(txn)-[:BENEFITS_TO]->(payee)
RETURN "Transaction A Final State" as scenario,
       payerBank.name as payer_bank,
       payer.accountNumber as payer_account,
       payeeBank.name as payee_bank,
       payee.accountNumber as payee_account,
       txn.transactionId as transaction_id,
       txn.amount as amount,
       "OUTBOUND_FIRST" as arrival_order;

// Show final state of Transaction B (inbound first scenario) - COMPLETE BASELINE MODEL
MATCH (payerBank:Institution)<-[:IS_HELD_AT]-(payer:Account {accountNumber: "TEST-PAYER"})
MATCH (payeeBank:Institution)<-[:IS_HELD_AT]-(payee:Account {accountNumber: "TEST-PAYEE"})
MATCH (txn:Transaction {transactionId: "TXN-B"})
MATCH (payer)-[:PERFORMS]->(txn)-[:BENEFITS_TO]->(payee)
RETURN "Transaction B Final State" as scenario,
       payerBank.name as payer_bank,
       payer.accountNumber as payer_account,
       payeeBank.name as payee_bank,
       payee.accountNumber as payee_account,
       txn.transactionId as transaction_id,
       txn.amount as amount,
       "INBOUND_FIRST" as arrival_order;

//--------------------
// KEY TAKEAWAYS
//--------------------
// 1. Event arrival order DOES NOT MATTER
// 2. Each event creates the graph elements it knows about using BASELINE MODEL
// 3. MERGE operations ensure no duplicates
// 4. transactionId links both events together via the Transaction node
// 5. Final graph follows COMPLETE baseline model: (Institution)<-[:IS_HELD_AT]-(Account)-[:PERFORMS]->(Transaction)-[:BENEFITS_TO]->(Account)-[:IS_HELD_AT]->(Institution)
// 6. Transaction node serves as the central hub - no direct account-to-account relationships needed
// 7. Institution nodes provide complete institutional context for compliance and reporting
// 8. This approach naturally handles the distributed/async nature of Kafka while maintaining 100% data model standards

Hope this all makes sense!

1 Like

Hi there

Thanks, will work through this, need to see how I map this as a target model into my source data, which is fixed... I need to pull out of there and see if i can push it into this.

appreciate.

G

You can do the translation in the cypher when writing to Neo4j.

If you give me an example record, I can give you some sample code to give you something to review.

taking if into chat, as I'm sharing ISV "propriety" data...

G