Error in graph projection

I have created a database:

The database contains nodes for monthly precipitation (with properties such as the node name and the monthly precipitation amount), for monthly temperatures (with their own properties), and for the hydrothermal coefficient (with the cht value and the type of drought as properties), as well as productivity nodes with a harvest value.

MERGE(r2c:YRC{name:'Rainfall 2002 Center', Rainfall:604}); . . .
MERGE (t5c:YTC{name:'Temperature 2005 Center', temp:10.5}); . . .
MERGE (y5C:YCHTC{name:'CHT 2005 Center', cht:1, drought:0}); . . .
MERGE(soy05:Productivity{name:'Productivity 2005', harvest:18}); . . .

Relationships are created between the nodes:

// :YRC → :YCHTC
MATCH (r), (c) WHERE id(r) = 652 AND id(c) = 611 MERGE (r)-[rel:DETERMINE]->(c);
// :YTC → :YCHTC
MATCH (t), (c) WHERE id(t) = 672 AND id(c) = 611 MERGE (t)-[rel:DETERMINE]->(c);
// :YCHTC → :Productivity
MATCH (c), (p) WHERE id(c) = 611 AND id(p) = 631 MERGE (c)-[rel:DETERMINE]->(p);
. . .
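The relationships above are created by internal node id. A possible alternative is to match on the name property instead, since Neo4j may reuse the internal ids of deleted nodes. This is a minimal sketch that assumes names are unique within each label; the year-to-year pairings shown are hypothetical illustrations, not taken from the original data.

```cypher
// Sketch: create DETERMINE relationships by matching on name instead of id().
// Assumption: names are unique per label; these pairings are hypothetical.
MATCH (r:YRC {name: 'Rainfall 2002 Center'}), (c:YCHTC {name: 'CHT 2002 Center'})
MERGE (r)-[:DETERMINE]->(c);

MATCH (c:YCHTC {name: 'CHT 2005 Center'}), (p:Productivity {name: 'Productivity 2005'})
MERGE (c)-[:DETERMINE]->(p);
```

Matching by name keeps the statements valid even if the database is rebuilt and the nodes receive different internal ids.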

  1. I developed a prediction model that uses this information to predict drought levels for future periods.

a) Creating a pipeline for training:

CALL gds.beta.pipeline.nodeClassification.create('pipe')

Configuring the pipeline

b) Adding a node property step to the pipeline. Here, the input graph contains a cht property:

CALL gds.beta.pipeline.nodeClassification.addNodeProperty('pipe', 'alpha.scaleProperties', { nodeProperties: 'cht', scaler: 'L1Norm', mutateProperty:'scaledSizes' }) YIELD name, nodePropertySteps

c) Selecting features for the pipeline:

CALL gds.beta.pipeline.nodeClassification.selectFeatures('pipe', ['scaledSizes', 'cht']) YIELD name, featureProperties

// Adding a logistic regression model with default configuration:
CALL gds.beta.pipeline.nodeClassification.addLogisticRegression('pipe') YIELD parameterSpace;

// Adding a random forest model:
CALL gds.alpha.pipeline.nodeClassification.addRandomForest('pipe', {numberOfDecisionTrees: 5}) YIELD parameterSpace;

// Adding a multi-layer perceptron model with weighted focal loss:
CALL gds.alpha.pipeline.nodeClassification.addMLP('pipe', {classWeights: [0.4,0.3,0.3], focusWeight: 0.5}) YIELD parameterSpace;

// Adding a logistic regression model with an interval parameter:
CALL gds.beta.pipeline.nodeClassification.addLogisticRegression('pipe', {maxEpochs: 500, penalty: {range: [1e-4, 1e2]}}) YIELD parameterSpace RETURN parameterSpace.RandomForest AS randomForestSpace, parameterSpace.LogisticRegression AS logisticRegressionSpace, parameterSpace.MultilayerPerceptron AS MultilayerPerceptronSpace;

// Configuring autotuning:
CALL gds.alpha.pipeline.nodeClassification.configureAutoTuning('pipe', { maxTrials: 2 }) YIELD autoTuningConfig
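Before training, it can help to check what the pipeline actually contains (node property steps, selected features, model candidates). This is a minimal sketch assuming GDS 2.x, where pipelines are stored in a pipeline catalog; the exact fields inside pipelineInfo may differ between versions.

```cypher
// Sketch (GDS 2.x assumed): inspect the stored configuration of 'pipe'.
CALL gds.beta.pipeline.list('pipe')
YIELD pipelineName, pipelineType, pipelineInfo
RETURN pipelineName, pipelineType, pipelineInfo;

// To rebuild the pipeline from scratch, drop it first, because a pipeline
// name can only be used once. Left commented out so it is not run by accident:
// CALL gds.beta.pipeline.drop('pipe');
```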

Training the pipeline

The following statement will project a graph using a native projection and store it in the graph catalog under the name "dtGraph".
CALL gds.graph.project('dtGraph', { YCHTC: { properties: ['cht', 'drought'] }, YCHTCp: { properties: 'cht' } }, '*' )

   I got the following error:

Failed to invoke procedure gds.graph.project: Caused by: java.lang.UnsupportedOperationException: Cannot safely convert 0.80 into an long value.

It looks like it is having trouble converting 0.80 into a long. Do your 'cht' and/or 'drought' properties contain decimal values? I see from your MERGE code that you are setting 'cht' and 'drought' to integer values. If your data does contain decimal values, try storing every value of that property as a decimal instead of mixing integers and decimals, i.e. use 1.0 and 0.0 instead of 1 and 0. This may be relevant, based on the following snippet from the manual:

Perhaps the projection types the property as long because it encounters an integer value first, and then fails when it loads a decimal value for the same property.

Just a guess, as I have not used these routines.
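One way to apply this suggestion is sketched below: convert the stored cht values to floats, then drop and re-project the in-memory graph, since an existing projection is a snapshot and does not pick up later changes to the database. The sketch assumes the affected nodes carry the YCHTC or YCHTCp labels used in the projection above, and that gds.graph.list exposes a schema column, as it does in the GDS 2.x versions I am aware of.

```cypher
// Store cht as a float on every node the projection reads it from,
// so all values of the property share one type (e.g. 1 becomes 1.0).
MATCH (n)
WHERE n:YCHTC OR n:YCHTCp
SET n.cht = toFloat(n.cht);

// Remove any earlier projection with the same name (false = no error if missing),
// then project again so the converted values are loaded.
CALL gds.graph.drop('dtGraph', false) YIELD graphName;
CALL gds.graph.project('dtGraph', { YCHTC: { properties: ['cht', 'drought'] }, YCHTCp: { properties: 'cht' } }, '*');

// Check which type each projected property received.
CALL gds.graph.list('dtGraph') YIELD graphName, nodeCount, schema
RETURN graphName, nodeCount, schema;
```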

Thank you very much. The cht property had integer values, and I changed them to real (floating-point) values. But now another error appears:

Failed to invoke procedure gds.beta.pipeline.nodeClassification.train: Caused by: java.lang.IllegalArgumentException: Node with ID 651 has invalid feature property value NaN for property scaledSizes

id(n) | n
------+---------------------------------------------------
651   | {"drought":0,"name":"CHT 2002 Center","cht":1.0}
652   | {"drought":1,"name":" CHT 2003 Center","cht":0.8}
653   | {"drought":0,"name":"CHT 2004 Center","cht":1.2}
654   | {"drought":0,"name":"CHT 2005 Center","cht":1.6}

CALL gds.beta.pipeline.nodeClassification.train('dtGraph', {
  pipeline: 'pipe',
  targetNodeLabels: ['YCHTC'],
  modelName: 'nc-pipeline-model',
  targetProperty: 'drought',
  randomSeed: 1337,
  metrics: ['ACCURACY', 'OUT_OF_BAG_ERROR']
}) YIELD modelInfo, modelSelectionStats
RETURN
  modelInfo.bestParameters AS winningModel,
  modelInfo.metrics.ACCURACY.train.avg AS avgTrainScore,
  modelInfo.metrics.ACCURACY.outerTrain AS outerTrainScore,
  modelInfo.metrics.ACCURACY.test AS testScore,
  [cand IN modelSelectionStats.modelCandidates | cand.metrics.ACCURACY.validation.avg] AS validationScores

I think that with this call you are adding the property 'scaledSizes' as a feature of the model, but the nodes don't seem to have a value for this property:

CALL gds.beta.pipeline.nodeClassification.selectFeatures('pipe', ['scaledSizes', 'cht']) YIELD name, featureProperties
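A hedged way to follow up on this: scaledSizes is meant to be computed by the scaleProperties step from cht. With the L1Norm scaler each value is divided by the sum of the absolute cht values, so a single NaN among the inputs would make every scaled value NaN, including node 651. One possible source of such a NaN is a node that has no cht value, which the projection would fill with its default (NaN for floating-point properties unless a defaultValue is configured). Whether the step also runs over the YCHTCp nodes depends on the GDS version and pipeline settings, so treat the query below as a diagnostic guess rather than a confirmed cause.

```cypher
// List nodes the projection reads cht from whose cht is missing or NaN.
// (n.cht <> n.cht is true only for NaN, the one value not equal to itself.)
MATCH (n)
WHERE (n:YCHTC OR n:YCHTCp)
  AND (n.cht IS NULL OR n.cht <> n.cht)
RETURN id(n) AS nodeId, labels(n) AS nodeLabels, n.name AS name;
```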