Project is something between tech explore and Blog covering what I've learned

georgelza · July 22, 2025, 6:29pm

Probably nothing mind blowing.

Real time Financial data stream from a source (for development it is from comprised gz data files), published onto Kakfa, from there we're using Kafka Connect Sink framework to sink messaged into Neo4J. As AccountHolders and TransactionEvents.

Where we link Banks (node/vertices) to AccountHolders (node/vertices) and AccountHolders to TransactionEvents, each transaction is represented as 2 Events, an outbound event from the Payer and a Inbound event on the Payee side,

Each event has a Risk score associated with it, each AccountHolder has a TrustScore associated with it.

Below is a graphical overview of the current idea, MVP1. Currently have the Banks loaded, they are a very static list so got loaded from a csv file into Bank nodes

Next we have the AccountHolders, we have a inbound and outbound AccountHolder, they get extracted from Inbound and Outbound transactions, they are first associated with their banks, this way we dynamically expand our known AccountHolders as we get transactions...

This is all done/working...

Where I'm stuck at the moment is how to model, define the Cypher code (for the Kafka Connect Neo4J sink) to edges/link the outbound Txn event with the outbound accountHolder (Payer) onto the inbound accountHolder (Payee) as a PaidFundsTo relationship, and then similar define the inbound event for the inbound accountHolder (Payee) originating from the outbound accountHolder (Payer) as a ReceivedFundsFrom Edge/Link.

Each transaction event have unique eventId's, but do share a common transactionID.
Each event additionally have an accountId and counterAccountId.

for the outbound we have an accountId which matches the Payer.accountID and a counterAccountId matching the Payee's accountID.

for the inbound we have a accountId which matches the inbound Payee.accountId and counterAccountId which now matches the payers.accountId.

Currently I show AccountHolders, understand AccountHolders can be (for now will be) associated with a single Person (or as a Corporate node)

Future
Future I want to change the accountHolder and the owner separate. An Account has an accountHolder which is a corporate or person, if corporate then we also define owners of the corporate.

The plan is to further model the Person (or corporate) that is associated with AccountHolder containing a SSN or company Registration number, providing the ability to match "Person/s" that have accounts at multiple banks and Companies that have multiple Accounts at multiple banks (Not depicted at this point).

Also been thinking of making an fin transaction as an event... on the account, with the potential capability of handling other types of events on an account.

for current, See below.

This is the idea for lets call it phase 1 - blog 1... The next bit after this might be to add some AI/ML, some how, thats a different rabbit hole to then explore...

G

john.stegeman · July 23, 2025, 12:08pm

My colleague at Neo4j just recently published this - may be worth having a look at it

georgelza · July 23, 2025, 12:34pm

Hi John

Awesome, thank you.

Def be looking at this... Looking at what I see... some diagrams and pdf downloadable makes for easy off line reading,

Appreciate share.
Gives ideas for future...

My model however is being "made to fit" our ISV's application/structure/flow/design...

So the share is great to read/understand, but can see it does not fit my need/pattern.

G

georgelza · July 24, 2025, 5:57am

bump bump.

so would love to get some feedback, my thinking makes me think it would be better to model a payment as a event and the event in this question is of type payment, thus allowing me to handle account and accountholder detail changes as update events also.

I then tie the transaction (as a node) to the outbound payer as a edge (TAG:PaidFundsTo or Debit_Event) using the outbound event

and tie the transaction/node to the inbound payee account/node using the inbound event and (TAG:ReceivedFundsFrom or Credit_Event).

G

michael.down · July 24, 2025, 2:11pm

Hello, I have read through your comments, and there is a lot to try and unpick, but here is a start.

1. How to model your data.

The data model you were sent in the post above was led by me, but contributed to by many people who work on fraud solutions daily (as it is a major use case in our industry).

The idea is that it is a starting point and one you can work from. Whether you use it or not is up to you, but the way we model transactions, for example, is based on new features around QPP (Quantified Path Patterns) and enhancements coming through GQL.

Your example aligns nicely with a few additions, which is EXACTLY what we want. We never planned to boil the ocean with the data model. Here is how it relates:

Constraints and Indices

-- Core constraints from baseline model
CREATE CONSTRAINT account_number IF NOT EXISTS
FOR (a:Account) REQUIRE a.accountNumber IS NODE KEY;

CREATE CONSTRAINT transaction_id IF NOT EXISTS
FOR (t:Transaction) REQUIRE t.transactionId IS NODE KEY;

CREATE CONSTRAINT customer_id IF NOT EXISTS
FOR (c:Customer) REQUIRE c.customerId IS NODE KEY;

CREATE CONSTRAINT counterparty_id IF NOT EXISTS
FOR (cp:Counterparty) REQUIRE cp.counterpartyId IS NODE KEY;

CREATE CONSTRAINT country_code IF NOT EXISTS
FOR (c:Country) REQUIRE c.code IS NODE KEY;

-- Additional constraint for Institution nodes (extending the baseline)
CREATE CONSTRAINT institution_id IF NOT EXISTS
FOR (i:Institution) REQUIRE i.institutionId IS NODE KEY;

-- Performance indices for common queries
CREATE INDEX transaction_date_idx IF NOT EXISTS FOR (t:Transaction) ON (t.date);
CREATE INDEX transaction_amount_idx IF NOT EXISTS FOR (t:Transaction) ON (t.amount);
CREATE INDEX account_trust_score_idx IF NOT EXISTS FOR (a:Account) ON (a.trustScore);

Node Labels and Properties

Institution (Extension to baseline model)

Labels: :Institution

This node represents financial institutions in the network:

Property	Type	Description	Example
institutionId	String	Unique identifier	"SBZAZAJJ"
name	String	Institution name	"Barclays"
swiftCode	String	SWIFT/BIC code	"BARCGB22"
institutionType	String	Type of institution	"BANK", "CREDIT_UNION", "FINTECH"

Account

Labels: :Account, with additional labels :Internal (our accounts) or :External (other banks' accounts)

Standard baseline properties:

Property	Type	Description
accountNumber	String	Unique account identifier
accountType	String	"CURRENT", "SAVINGS", "BUSINESS"
openDate	DateTime	Account opening date
closedDate	DateTime	Account closure date (optional)
suspendedDate	DateTime	Account suspension date (optional)

Our extensions:

Property	Type	Description
trustScore	Float	Calculated trust score (0-100)
lastTransactionAt	DateTime	Most recent transaction timestamp

Transaction

Labels: :Transaction

Standard baseline properties:

Property	Type	Description
transactionId	String	Unique transaction identifier
amount	Float	Transaction amount (positive value)
currency	String	ISO currency code (e.g., "GBP", "EUR")
date	DateTime	Transaction timestamp
message	String	Description/reference text
type	String	"SWIFT", "FASTER_PAYMENT", "SEPA"

Our extension:

Property	Type	Description
riskScore	Float	Risk assessment score from our ML model

Customer

Labels: :Customer

Standard baseline properties:

Property	Type	Description
customerId	String	Unique customer identifier
firstName	String	Customer's first name (optional)
middleName	String	Customer's middle name (optional)
lastName	String	Customer's surname (optional)
dateOfBirth	Date	Customer's date of birth
placeOfBirth	String	Customer's place of birth
countryOfBirth	String	Customer's country of birth

Country

Labels: :Country

Property	Type	Description	Example
code	String	ISO 3166-1 alpha-2 code	"GB"
name	String	Full country name	"United Kingdom"

Relationships

Standard Baseline Relationships

-- Account initiates transaction
(a:Account)-[:PERFORMS]->(t:Transaction)

-- Transaction benefits receiving account
(t:Transaction)-[:BENEFITS_TO]->(a:Account)

-- Customer owns account
(c:Customer)-[:HAS_ACCOUNT]->(a:Account)

-- Account is domiciled in country
(a:Account)-[:IS_HOSTED]->(c:Country)

### Our Extensions

-- Account is held at institution (our addition)
(a:Account)-[:IS_HELD_AT]->(i:Institution)

2. How to model the data if it comes in out of order

Here is some cypher code that implements your comments below:

Where I'm stuck at the moment is how to model, define the Cypher code (for the Kafka Connect Neo4J sink) to edges/link the outbound Txn event with the outbound accountHolder (Payer) onto the inbound accountHolder (Payee) as a PaidFundsTo relationship, and then similar define the inbound event for the inbound accountHolder (Payee) originating from the outbound accountHolder (Payer) as a ReceivedFundsFrom Edge/Link.

Each transaction event have unique eventId's, but do share a common transactionID.
Each event additionally have an accountId and counterAccountId.

for the outbound we have an accountId which matches the Payer.accountID and a counterAccountId matching the Payee's accountID.

for the inbound we have a accountId which matches the inbound Payee.accountId and counterAccountId which now matches the payers.accountId.

Cypher:

//--------------------
// OUT-OF-ORDER EVENT PROCESSING EXAMPLE
// 
// KEY INSIGHT: "Data coming in the wrong way" is NOT a problem!
// This example proves that events can arrive in ANY order and still produce
// the correct graph structure using the BASELINE DATA MODEL.
// 
// BASELINE RELATIONSHIPS USED:
// - (Account)-[:PERFORMS]->(Transaction) - Account initiates transaction
// - (Transaction)-[:BENEFITS_TO]->(Account) - Transaction benefits receiving account
//--------------------

// ===== SCENARIO 1: TRANSACTION A - OUTBOUND EVENT ARRIVES FIRST =====

// Step 1A: Process OUTBOUND event for Transaction A
// Event data: eventId="EVT-OUT-A", transactionId="TXN-A", accountId="TEST-PAYER", counterAccountId="TEST-PAYEE"
// 
// EXPLANATION: The outbound event knows:
// - Who is paying (accountId = TEST-PAYER) 
// - Who is receiving (counterAccountId = TEST-PAYEE)
// - The transaction details
//
// Create COMPLETE baseline model with Institution nodes
MERGE (internalBank:Institution {institutionId: "BANK-INTERNAL"})
ON CREATE SET
    internalBank.name = "Our Bank",
    internalBank.swiftCode = "OURBANK22",
    internalBank.institutionType = "BANK"

MERGE (externalBank:Institution {institutionId: "BANK-EXTERNAL"})
ON CREATE SET
    externalBank.name = "External Bank",
    externalBank.swiftCode = "EXTBANK33",
    externalBank.institutionType = "BANK"

MERGE (payerAccount:Account {accountNumber: "TEST-PAYER"})  // accountId from event
ON CREATE SET 
    payerAccount:Internal,
    payerAccount.accountType = "CURRENT",
    payerAccount.trustScore = 80.0

MERGE (payeeAccount:Account {accountNumber: "TEST-PAYEE"})  // counterAccountId from event
ON CREATE SET
    payeeAccount:External,
    payeeAccount.accountType = "CURRENT", 
    payeeAccount.trustScore = 70.0

// BASELINE MODEL: Connect accounts to their institutions
MERGE (payerAccount)-[:IS_HELD_AT]->(internalBank)
MERGE (payeeAccount)-[:IS_HELD_AT]->(externalBank)

// Create baseline transaction model
MERGE (txnA:Transaction {transactionId: "TXN-A"})
ON CREATE SET
    txnA.amount = 1000.00,
    txnA.currency = "GBP",
    txnA.date = datetime("2024-07-24T10:00:00"),
    txnA.message = "Payment A",
    txnA.riskScore = 15.0

// Standard baseline relationships
MERGE (payerAccount)-[:PERFORMS]->(txnA);

// BASELINE MODEL: The outbound event only establishes the PERFORMS relationship
// The payer account performs the transaction (this is what the outbound event knows)

// Step 2A: Process INBOUND event for Transaction A (ARRIVES LATER)
// Event data: eventId="EVT-IN-A", transactionId="TXN-A", accountId="TEST-PAYEE", counterAccountId="TEST-PAYER"
//
// EXPLANATION: The inbound event knows:
// - Who received the funds (accountId = TEST-PAYEE)
// - Who sent the funds (counterAccountId = TEST-PAYER) 
// - The transaction details from the receiver's perspective
//
// IMPORTANT: All nodes already exist from the outbound event!
// We just complete the BENEFITS_TO relationship (what the inbound event knows)
MATCH (txnA_existing:Transaction {transactionId: "TXN-A"})
MATCH (payeeAccount_existing:Account {accountNumber: "TEST-PAYEE"})  // accountId from inbound event
MATCH (payerAccount_existing:Account {accountNumber: "TEST-PAYER"})  // counterAccountId from inbound event
WITH txnA_existing, payeeAccount_existing
// Complete the baseline transaction model
MERGE (txnA_existing)-[:BENEFITS_TO]->(payeeAccount_existing);

// BASELINE MODEL: The inbound event establishes the BENEFITS_TO relationship
// The transaction benefits the payee account (this is what the inbound event knows)
// Note: No additional relationship needed - the Transaction node connects everything

// ===== SCENARIO 2: TRANSACTION B - INBOUND EVENT ARRIVES FIRST =====

// Step 1B: Process INBOUND event for Transaction B (ARRIVES FIRST!)
// Event data: eventId="EVT-IN-B", transactionId="TXN-B", accountId="TEST-PAYEE", counterAccountId="TEST-PAYER"
//
// EXPLANATION: The inbound event arrives first this time, but that's fine!
// - We know who received (accountId = TEST-PAYEE)
// - We know who sent (counterAccountId = TEST-PAYER)
// - We create what we know about and set up the BENEFITS_TO relationship
// Note: Institution nodes and IS_HELD_AT relationships already exist from Transaction A
MERGE (payeeAccount_b:Account {accountNumber: "TEST-PAYEE"})  // accountId from inbound event
MERGE (payerAccount_b:Account {accountNumber: "TEST-PAYER"})  // counterAccountId from inbound event

// Create baseline transaction (same transactionId will be used by outbound event later)
MERGE (txnB:Transaction {transactionId: "TXN-B"})
ON CREATE SET
    txnB.amount = 2000.00,
    txnB.currency = "GBP", 
    txnB.date = datetime("2024-07-24T11:00:00"),
    txnB.message = "Payment B",
    txnB.riskScore = 25.0

// Standard baseline relationship from inbound perspective
MERGE (txnB)-[:BENEFITS_TO]->(payeeAccount_b);

// BASELINE MODEL: The inbound event establishes the BENEFITS_TO relationship
// Even though this arrives first, we can still create the complete relationship
// The Transaction node serves as the hub connecting both accounts

// Step 2B: Process OUTBOUND event for Transaction B (ARRIVES LATER)
// Event data: eventId="EVT-OUT-B", transactionId="TXN-B", accountId="TEST-PAYER", counterAccountId="TEST-PAYEE"
//
// EXPLANATION: The outbound event arrives second, but finds everything already exists!
// - Transaction node exists
// - Both account nodes exist
// - We just add the missing PERFORMS relationship (baseline model)
MATCH (txnB_existing:Transaction {transactionId: "TXN-B"})
MATCH (payerAccount_b_existing:Account {accountNumber: "TEST-PAYER"})  // accountId from outbound event
MATCH (payeeAccount_b_existing:Account {accountNumber: "TEST-PAYEE"})  // counterAccountId from outbound event
WITH txnB_existing, payerAccount_b_existing
// Complete baseline transaction model
MERGE (payerAccount_b_existing)-[:PERFORMS]->(txnB_existing);

// BASELINE MODEL: The outbound event establishes the PERFORMS relationship
// This completes the baseline transaction model:
// (Payer)-[:PERFORMS]->(Transaction)-[:BENEFITS_TO]->(Payee)

//--------------------
// VERIFICATION QUERIES
// These prove that both scenarios produce the same final graph structure
//--------------------

// Show final state of Transaction A (outbound first scenario) - COMPLETE BASELINE MODEL
MATCH (payerBank:Institution)<-[:IS_HELD_AT]-(payer:Account {accountNumber: "TEST-PAYER"})
MATCH (payeeBank:Institution)<-[:IS_HELD_AT]-(payee:Account {accountNumber: "TEST-PAYEE"})
MATCH (txn:Transaction {transactionId: "TXN-A"})
MATCH (payer)-[:PERFORMS]->(txn)-[:BENEFITS_TO]->(payee)
RETURN "Transaction A Final State" as scenario,
       payerBank.name as payer_bank,
       payer.accountNumber as payer_account,
       payeeBank.name as payee_bank,
       payee.accountNumber as payee_account,
       txn.transactionId as transaction_id,
       txn.amount as amount,
       "OUTBOUND_FIRST" as arrival_order;

// Show final state of Transaction B (inbound first scenario) - COMPLETE BASELINE MODEL
MATCH (payerBank:Institution)<-[:IS_HELD_AT]-(payer:Account {accountNumber: "TEST-PAYER"})
MATCH (payeeBank:Institution)<-[:IS_HELD_AT]-(payee:Account {accountNumber: "TEST-PAYEE"})
MATCH (txn:Transaction {transactionId: "TXN-B"})
MATCH (payer)-[:PERFORMS]->(txn)-[:BENEFITS_TO]->(payee)
RETURN "Transaction B Final State" as scenario,
       payerBank.name as payer_bank,
       payer.accountNumber as payer_account,
       payeeBank.name as payee_bank,
       payee.accountNumber as payee_account,
       txn.transactionId as transaction_id,
       txn.amount as amount,
       "INBOUND_FIRST" as arrival_order;

//--------------------
// KEY TAKEAWAYS
//--------------------
// 1. Event arrival order DOES NOT MATTER
// 2. Each event creates the graph elements it knows about using BASELINE MODEL
// 3. MERGE operations ensure no duplicates
// 4. transactionId links both events together via the Transaction node
// 5. Final graph follows COMPLETE baseline model: (Institution)<-[:IS_HELD_AT]-(Account)-[:PERFORMS]->(Transaction)-[:BENEFITS_TO]->(Account)-[:IS_HELD_AT]->(Institution)
// 6. Transaction node serves as the central hub - no direct account-to-account relationships needed
// 7. Institution nodes provide complete institutional context for compliance and reporting
// 8. This approach naturally handles the distributed/async nature of Kafka while maintaining 100% data model standards

Hope this all makes sense!

georgelza · July 24, 2025, 3:43pm

Hi there

Thanks, will work through this, need to see how I map this as a target model into my source data, which is fixed... I need to pull out of there and see if i can push it into this.

appreciate.

G

michael.down · July 24, 2025, 3:54pm

You can do the translation in the cypher when writing to Neo4j.

If you give me an example record, I can give you some sample code to give you something to review.

georgelza · July 24, 2025, 4:07pm

taking if into chat, as I'm sharing ISV "propriety" data...

G

Topic		Replies	Views
Bank transaction analysis Introduce-Yourself	1	199	August 10, 2020
Neo4j Sink Connector: CDC ingestion with schema strategy not working properly Streaming (Kafka, Spark, Flint)	12	1898	November 29, 2019
Real time log integration Cypher	2	692	February 26, 2019
Graph dataset for Financial Fraud Cypher cypher , knowledge-base	2	692	June 22, 2020
Hello from Reid in Evanston Introduce-Yourself	0	226	March 30, 2020

Demystifying Neo4j UX Research

Project is something between tech explore and Blog covering what I've learned

1. How to model your data.

Constraints and Indices

Node Labels and Properties

Institution (Extension to baseline model)

Account

Transaction

Customer

Country

Relationships

Standard Baseline Relationships

2. How to model the data if it comes in out of order

Related topics