Link Prediction Pipeline - Node properties do not exist

Hi, I received the following error when training the link prediction pipeline:

Failed to invoke procedure gds.beta.pipeline.linkPrediction.train: Caused by: java.lang.IllegalArgumentException: Node properties [xxx1, xxx2] defined in the feature steps do not exist in the graph or part of the pipeline.

I ran gds.pipeline.list and got the output below. There are node properties in the feature steps, but nodePropertySteps is empty. Does that mean I need to add the node properties to nodePropertySteps? If so, what is the syntax for adding an existing node property to nodePropertySteps, please?
{
  "splitConfig": {
    "testFraction": 0.2,
    "validationFolds": 3,
    "trainFraction": 0.4,
    "negativeSamplingRatio": 2.0
  },
  "autoTuningConfig": {
    "maxTrials": 10
  },
  "featurePipeline": {
    "nodePropertySteps": [],
    "featureSteps": [
      {
        "name": "HADAMARD",
        "config": {
          "nodeProperties": [
            "embedding",
            "ProposalEncoding"
          ]
        }
      }
    ]
  },
  "trainingParameterSpace": {
    "MultilayerPerceptron": [],
    "RandomForest": [],
    "LogisticRegression": [
      {
        "minEpochs": 1,
        "maxEpochs": 5000,
        "focusWeight": 0.0,
        "patience": 2,
        "tolerance": 0.001,
        "learningRate": 0.001,
        "batchSize": 100,
        "penalty": {
          "range": [
            0.0001,
            10000.0
          ]
        },
        "methodName": "LogisticRegression",
        "classWeights": []
      }
    ]
  }
}

Hi @guanyuxiong2020 !

Would you be able to share your code for projecting the graph you are training on, and for creating the pipeline you are asking about?

As the error message says, "Node properties [xxx1, xxx2] defined in the feature steps do not exist in the graph or part of the pipeline": your graph projection must either contain these node properties, or your link prediction pipeline must have node property steps that produce them (example: Configuring the pipeline - Neo4j Graph Data Science).
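
A minimal sketch of the two options, not a definitive fix. The pipeline name 'pipe', the graph name 'myGraph', the label 'Proposal', the relationship type 'RELATED', and the FastRP settings are all assumptions for illustration. Note that a node property step runs an algorithm in mutate mode, so it computes a new property rather than registering an existing one; a property that already exists in the database has to come in through the projection instead.

```cypher
// Option 1: let the pipeline compute the property with a node property step.
// FastRP and its settings are illustrative; any algorithm with a mutate mode works.
CALL gds.beta.pipeline.linkPrediction.addNodeProperty('pipe', 'fastRP', {
  mutateProperty: 'embedding',
  embeddingDimension: 256,
  randomSeed: 42
}) YIELD nodePropertySteps;

// Option 2: the properties already exist on the nodes in the database,
// so include them in the graph projection (label and type are hypothetical).
CALL gds.graph.project(
  'myGraph',
  { Proposal: { properties: ['embedding', 'ProposalEncoding'] } },
  { RELATED: { orientation: 'UNDIRECTED' } }
);
```

With either option in place, the properties named in the HADAMARD feature step should be available when training starts.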

Regards,
Adam

Hi Adam @adam_schill-col !

Could you also help me, please? I have the same problem and the same error, but my properties are projected in the graph.

My graph:

CALL gds.graph.project(
    'myGraph1',
    {
        Project: {
            label: 'Project',
            properties: [
                'projectPlannedRevenue', 
                'projectRevenueRatio', 
                'projectOverallSlippagePercent', 
                'projectCostRatio', 
                'projectTeamSize', 
                'projectDuration', 
                'totalRiskValue', 
                'avgRiskValue', 
                'riskCount', 
                'revenueToCostRatio',
                'embedding'
            ]
        },
        Risk: {
            label: 'Risk',
            properties: [
                'riskValue',  
                'riskDuration',
                'embedding'
            ]
        }
    },
    {
        HAS_RISK: {
            type: 'HAS_RISK',
            orientation: 'UNDIRECTED'
        }
    }
);

My query to train the pipeline:

CALL gds.beta.pipeline.linkPrediction.train('myGraph1', {
  pipeline: 'pipe',
  modelName: 'lp-pipeline-model',
  metrics: ['AUCPR', 'OUT_OF_BAG_ERROR'],
  targetRelationshipType: 'HAS_RISK',
  randomSeed: 12
}) YIELD modelInfo, modelSelectionStats
RETURN
  modelInfo.bestParameters AS winningModel,
  modelInfo.metrics.AUCPR.train.avg AS avgTrainScore,
  modelInfo.metrics.AUCPR.outerTrain AS outerTrainScore,
  modelInfo.metrics.AUCPR.test AS testScore,
  [cand IN modelSelectionStats.modelCandidates | cand.metrics.AUCPR.validation.avg] AS validationScores

(EDIT: also asked in Error by invoking gds.beta.pipeline.linkPrediction.train procedure. Node properties do not exist in the graph. But they do! - #3 by alfeeva.anastasia)
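
One way to narrow this down is to compare the properties each label carries in the projection with the properties the pipeline's feature steps reference. The sketch below assumes the graph name 'myGraph1' and pipeline name 'pipe' from the queries above. In this projection only 'embedding' is declared on both Project and Risk, so it is worth checking whether the feature steps name a property that one of the two labels lacks; whether that is the actual cause here is an assumption to verify, not something this thread confirms.

```cypher
// Which properties does each node label have in the in-memory graph?
CALL gds.graph.list('myGraph1')
YIELD graphName, schema
RETURN graphName, schema.nodes AS propertiesPerLabel;

// Which properties do the pipeline's feature steps expect,
// and which node property steps (if any) are configured?
CALL gds.pipeline.list('pipe')
YIELD pipelineName, pipelineInfo
RETURN pipelineName,
       pipelineInfo.featurePipeline.nodePropertySteps AS nodePropertySteps,
       pipelineInfo.featurePipeline.featureSteps AS featureSteps;
```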