Node Classification with two node labels

philip · July 21, 2021, 12:54pm

Hello,

I am trying to run a node classification on a fraud dataset.
The relevant properties are split between two node labels: Customer [age, gender, fastRP] and Transaction [amount, fraud]. If I run the code, I get this error:

The feature properties ['age_group', 'amount', 'fastrp_embedding', 'gender_group'] are not present for all requested labels. Requested labels: ['Customer', 'Transaction']. Properties available on all requested labels: ['']

CALL gds.alpha.ml.nodeClassification.train('fraud_model_data', {
    nodeLabels: ['Transaction', 'Customer'],
    modelName: 'fraud-model-properties',
    featureProperties: ['age_group', 'fastrp_embedding', 'gender_group', 'amount'],
    targetProperty: 'fraud',
    metrics: ['F1_WEIGHTED', 'ACCURACY'],
    holdoutFraction: 0.2,
    validationFolds: 5,
    randomSeed: 2,
    params: [
        {penalty: 0.0625, maxIterations: 1000},
        {penalty: 0.125, maxIterations: 1000},
        {penalty: 0.25, maxIterations: 1000},
        {penalty: 0.5, maxIterations: 1000}
    ]
}) YIELD modelInfo;

If I only select a single nodeLabel ('Transaction' or 'Customer'), I am able to see the properties of the selected node but not the properties from the other node.

This is the code to create the in-memory graph:

CALL gds.graph.create(
    'fraud_model_data', {
        Customer: { 
            label: 'Customer',
            properties: {
                fastrp_embedding:{property:'fastRPExtended-embedding', defaultValue:0},
                gender_group:{property:'gender_group', defaultValue:0},
                age_group:{property:'age_group', defaultValue:0}
            }
         },
        Transaction: { 
            label: 'Transaction',
            properties: {
                fraud:{property:'fraud', defaultValue:0},
                amount:{property:'amount', defaultValue:0},
                category_group:{property:'category_group', defaultValue:0}
            }
        },
        Bank: { 
            label: 'Bank',
            properties: {
            }
        }
    },
    '*'
)
YIELD graphName, nodeCount, relationshipCount;

Do you have any solution for this problem? Thank you very much!

alicia_frame1 · July 21, 2021, 8:41pm

You'll need to either:

- create a mono-partite projection (so you only have customers) using a Cypher projection or collapse path, or
- pad the missing properties with default values when you load the graph (so Bank nodes have a fraud property but it's always 0, for example).
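
The padding option could look roughly like the sketch below. It reuses the native projection syntax from the question and projects each scalar feature and the target on both labels, so nodes that lack a property in the database receive the default. The graph name `fraud_model_data_padded` is a made-up name for illustration. The array-valued `fastrp_embedding` is left out on purpose: whether a scalar default is accepted for an array property should be checked against the installed GDS version.

```cypher
// Sketch only: pad scalar properties so they exist on every requested label.
CALL gds.graph.create(
    'fraud_model_data_padded',  // hypothetical graph name
    {
        Customer: {
            label: 'Customer',
            properties: {
                gender_group: {property: 'gender_group', defaultValue: 0},
                age_group:    {property: 'age_group', defaultValue: 0},
                // Padded: Customer nodes carry no amount or fraud value.
                amount:       {property: 'amount', defaultValue: 0},
                fraud:        {property: 'fraud', defaultValue: 0}
            }
        },
        Transaction: {
            label: 'Transaction',
            properties: {
                fraud:        {property: 'fraud', defaultValue: 0},
                amount:       {property: 'amount', defaultValue: 0},
                // Padded: Transaction nodes carry no customer attributes.
                gender_group: {property: 'gender_group', defaultValue: 0},
                age_group:    {property: 'age_group', defaultValue: 0}
            }
        }
    },
    '*'
)
YIELD graphName, nodeCount, relationshipCount;
```

With this projection, a training call that lists `['age_group', 'gender_group', 'amount']` as `featureProperties` should find them on both labels. Note that every padded Customer node then enters training as a non-fraud example.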

If you choose the second option, you'll likely need to post process your predictions, because there's no easy way to tell the node classification model not to predict banks could be fraudulent. Although, using bank nodes as part of your negative dataset - and making sure they aren't incorrectly predicted to be fraudsters - could be part of your model tuning and evaluation.
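
The post-processing step mentioned above might be sketched as follows. It assumes the GDS 1.x alpha procedure `gds.alpha.ml.nodeClassification.predict.stream` with the `includePredictedProbabilities` option, and reuses the graph and model names from the question. Predictions are streamed for all requested labels, and only those for Transaction nodes are kept.

```cypher
// Sketch only: stream predictions, then discard every node that is not a Transaction.
CALL gds.alpha.ml.nodeClassification.predict.stream('fraud_model_data', {
    nodeLabels: ['Transaction', 'Customer'],
    modelName: 'fraud-model-properties',
    includePredictedProbabilities: true
})
YIELD nodeId, predictedClass, predictedProbabilities
WITH gds.util.asNode(nodeId) AS n, predictedClass, predictedProbabilities
WHERE n:Transaction
RETURN id(n) AS transactionId, predictedClass, predictedProbabilities
ORDER BY transactionId;
```

Filtering in Cypher after the call keeps the model untouched. Counting how many non-Transaction nodes were predicted as fraud before the filter is one way to use them as the negative examples described above.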

philip · July 22, 2021, 11:31am

Thanks @alicia_frame1 for the tips.

Unfortunately, I am not sure how to create a mono-partite projection, since, for example, the same customer made 5 normal transactions and 1 fraudulent transaction. In theory, I would need to replicate a customer node once per transaction and copy every attribute of the Transaction node (fraud, amount) onto that specific Customer node. Do I understand this correctly?

Sadly I don't know how to implement it in Neo4j - could you help me with this?
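
One possible alternative to replicating Customer nodes goes in the opposite direction: copy the customer attributes onto each Transaction node and then project and classify only Transaction nodes. The sketch below is not from the thread. The relationship type `PERFORMED` and the property name `customer_embedding` are assumptions for illustration, and it presumes each transaction belongs to exactly one customer.

```cypher
// Sketch only: denormalise customer features onto their transactions.
// Assumption: the data model is (:Customer)-[:PERFORMED]->(:Transaction);
// replace PERFORMED with the relationship type used in the actual database.
MATCH (c:Customer)-[:PERFORMED]->(t:Transaction)
SET t.age_group          = c.age_group,
    t.gender_group       = c.gender_group,
    t.customer_embedding = c.`fastRPExtended-embedding`;  // hypothetical property name
```

After this write, every feature and the `fraud` target sit on the Transaction label, so the projection and the training call can request `nodeLabels: ['Transaction']` alone.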

philip · August 3, 2021, 2:14pm

@alicia_frame1 do you have any advice for me?

nuraishahzaidi01 · November 16, 2021, 5:24am

Hi Philip,

I encountered the same problem. Did you ever resolve this issue? If so, may I know how you did it?

Thank you.
