Using Neo4j for link prediction between heterogeneous nodes

I want to use Neo4j for link prediction between heterogeneous nodes.
There are two node labels, and I need to predict the connections between nodes of these two types.
Do you have any actual code examples for this?

I first tested the example from "Training the pipeline" in the Neo4j Graph Data Science documentation.
Then I ran it on my own data (code below) with several changes (e.g. sourceNodeLabel: 'Person', targetNodeLabel: 'Product', targetRelationshipType: 'BUY').
However, the results seem unreasonable: every predicted probability is 0.4999.

Could you please advise how I should modify the code? Where did I go wrong? Are there any other examples I can refer to? Thank you.

code: (below)

CREATE
  (alice:Person {name: 'Alice', age: 38, gender:1}),
  (michael:Person {name: 'Michael', age: 67, gender:0}),
  (karin:Person {name: 'Karin', age: 30, gender:1}),
  (chris:Person {name: 'Chris', age: 52, gender:0}),
  (will:Person {name: 'Will', age: 18, gender:0}),
  (mark:Person {name: 'Mark', age: 20, gender:0}),
  (milk:Product {name: 'milk', price:80}),
  (apple:Product {name: 'apple', price:25}),
  (orange:Product {name: 'orange', price:35}),
  (piapple:Product {name: 'piapple', price:65}),
  (watermelon:Product {name: ' watermelon', price:45}),
  (cantaloupe:Product {name: 'cantaloupe', price:55}),
  (notebook:Product {name: 'notebook', price:25000}),
  (tv:Product {name: 'TV', price:19999}),
  (aircon:Product {name: 'aircondition', price:29000}),
  (shoe_air:Product {name: 'shoe_air', price:5000}),
  (runshoes:Product {name: 'runshoes', price:1999}),
  (sportshoes:Product {name: 'sportshoes', price:3900}),
  (goodshoes:Product {name: 'goodshoes', price:7900}),
  (underwear:Product {name: 'Underwear', price:390}),
  (pants:Product {name: 'Pants', price:499}),
  (jen:Product {name: 'Jen', price:600}),

  (fruit:fruit {name: 'fruit'}),
  (ele:ele {name: 'ele'}),
  (shoe:shoe {name: 'shoe'}),
  (clothe:clothe {name: 'clothe'}),


CALL gds.graph.project(
  'fullGraph',
  {
    Person: {properties: {age: {defaultValue: 1}, price: {defaultValue: 1}}},
    Product: {properties: {age: {defaultValue: 1}, price: {defaultValue: 1}}}
  },
  {
    BUY: {orientation: 'UNDIRECTED'},
    FRIEND: {},
    BELONG: {}
  }
)

CALL gds.beta.pipeline.linkPrediction.create('pipe-with-context')

CALL gds.beta.pipeline.linkPrediction.addNodeProperty('pipe-with-context', 'fastRP', {
  mutateProperty: 'embedding',
  embeddingDimension: 256,
  randomSeed: 42,
  contextRelationshipTypes: ['FRIEND', 'BELONG']
})

CALL gds.beta.pipeline.linkPrediction.addFeature('pipe-with-context', 'hadamard', {
  nodeProperties: ['embedding', 'age', 'price']
})

CALL gds.beta.pipeline.linkPrediction.configureSplit('pipe-with-context', {
  testFraction: 0.25,
  trainFraction: 0.6,
  validationFolds: 3
})

CALL gds.alpha.pipeline.linkPrediction.addMLP('pipe-with-context',
{hiddenLayerSizes: [4, 2], penalty: 1, patience: 2})

CALL gds.beta.pipeline.linkPrediction.train('fullGraph', {
  pipeline: 'pipe-with-context',
  modelName: 'lp-pipeline-model-filtered',
  metrics: ['AUCPR', 'OUT_OF_BAG_ERROR'],
  sourceNodeLabel: 'Person',
  targetNodeLabel: 'Product',
  targetRelationshipType: 'BUY',
  randomSeed: 12
}) YIELD modelInfo, modelSelectionStats
RETURN
  modelInfo.bestParameters AS winningModel,
  modelInfo.metrics.AUCPR.train.avg AS avgTrainScore,
  modelInfo.metrics.AUCPR.outerTrain AS outerTrainScore,
  modelInfo.metrics.AUCPR.test AS testScore,
  [cand IN modelSelectionStats.modelCandidates | cand.metrics.AUCPR.validation.avg] AS validationScores
CALL gds.beta.pipeline.linkPrediction.predict.stream('fullGraph', {
  modelName: 'lp-pipeline-model-filtered',
  topN: 50,
  threshold: 0
})
 YIELD node1, node2, probability
 RETURN gds.util.asNode(node1).name AS Person, gds.util.asNode(node2).name AS Product, probability
 ORDER BY Product

Result : (below)

│Person     │Product       │probability       │
│"Alice"    │" watermelon" │0.4999999999999998│
│"Alice"    │"cantaloupe"  │0.4999999999999998│
│"Alice"    │"TV"          │0.4999999999999998│
│"Alice"    │"shoe_air"    │0.4999999999999998│
│"Alice"    │"sportshoes"  │0.4999999999999998│
│"Alice"    │"runshoes"    │0.4999999999999998│
│"Alice"    │"aircondition"│0.4999999999999998│
│"Alice"    │"notebook"    │0.4999999999999998│
│"Chris"    │"milk"        │0.4999999999999998│
│"Chris"    │"cantaloupe"  │0.4999999999999998│
│"Chris"    │"TV"          │0.4999999999999998│
│"Chris"    │"shoe_air"    │0.4999999999999998│
│"Chris"    │"aircondition"│0.4999999999999998│
│"Chris"    │"notebook"    │0.4999999999999998│
│"Chris"    │" watermelon" │0.4999999999999998│
│"Chris"    │"piapple"     │0.4999999999999998│
│"Chris"    │"orange"      │0.4999999999999998│
│"Chris"    │"apple"       │0.4999999999999998│
│"Karin"    │"piapple"     │0.4999999999999998│
│"Karin"    │"cantaloupe"  │0.4999999999999998│
│"Karin"    │"TV"          │0.4999999999999998│
│"Karin"    │"shoe_air"    │0.4999999999999998│
│"Karin"    │"sportshoes"  │0.4999999999999998│
│"Karin"    │"runshoes"    │0.4999999999999998│
│"Karin"    │"aircondition"│0.4999999999999998│
│"Karin"    │"notebook"    │0.4999999999999998│
│"Karin"    │" watermelon" │0.4999999999999998│
│"Karin"    │"orange"      │0.4999999999999998│
│"Karin"    │"apple"       │0.4999999999999998│
│"Mark"     │"milk"        │0.4999999999999998│
│"Mark"     │"orange"      │0.4999999999999998│
│"Mark"     │"TV"          │0.4999999999999998│
│"Mark"     │"shoe_air"    │0.4999999999999998│
│"Mark"     │"runshoes"    │0.4999999999999998│
│"Mark"     │"sportshoes"  │0.4999999999999998│
│"Mark"     │"aircondition"│0.4999999999999998│
│"Mark"     │"notebook"    │0.4999999999999998│
│"Mark"     │"piapple"     │0.4999999999999998│
│"Mark"     │"apple"       │0.4999999999999998│
│"Pants"    │"Alice"       │0.4999999999999998│
│"Pants"    │"Michael"     │0.4999999999999998│
│"Pants"    │"Will"        │0.4999999999999998│
│"Pants"    │"Mark"        │0.4999999999999998│
│"Pants"    │"Chris"       │0.4999999999999998│
│"Underwear"│"Alice"       │0.4999999999999998│
│"Underwear"│"Michael"     │0.4999999999999998│
│"Underwear"│"Mark"        │0.4999999999999998│
│"Underwear"│"Will"        │0.4999999999999998│
│"Underwear"│"Chris"       │0.4999999999999998│
│"Will"     │"milk"        │0.4999999999999998│

Neo4j Desktop-1.5.7
Version 5.18.0

Hi @alanwtmec,

The output does indeed look a bit fishy. Unfortunately, I'm not aware of any other examples that do this.

What is the output of the call to gds.beta.pipeline.linkPrediction.train? Do the scores seem reasonable at that point in the pipeline?

It looks like you're leaving some node labels out of your projection: fruit, ele, shoe, clothe. Should these perhaps share a single node label instead? That shared label would then also need to be included in the projection.
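A rough sketch of the shared-label idea, in case it helps. The label name Category is made up for illustration and is not in the original data. The sketch also assumes that the BELONG relationships end at these nodes, which the post does not show. A native projection skips relationships whose endpoints are not both projected, so including the label is what would let FastRP see BELONG as context. The default-valued age and price properties on Category are only there to keep the node properties uniform across labels.

```cypher
// Assumption: 'Category' is a hypothetical shared label, not part of the original post.
MATCH (n)
WHERE n:fruit OR n:ele OR n:shoe OR n:clothe
SET n:Category;

// Drop the old projection; 'false' means do not fail if it does not exist.
CALL gds.graph.drop('fullGraph', false);

// Re-project with the shared label included.
CALL gds.graph.project(
  'fullGraph',
  {
    Person:   {properties: {age: {defaultValue: 1}, price: {defaultValue: 1}}},
    Product:  {properties: {age: {defaultValue: 1}, price: {defaultValue: 1}}},
    Category: {properties: {age: {defaultValue: 1}, price: {defaultValue: 1}}}
  },
  {
    BUY: {orientation: 'UNDIRECTED'},
    FRIEND: {},
    BELONG: {}
  }
);
```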

Apart from that, the code looks OK to me. But training a model on such a small graph is very noisy, so it's hard to predict what you're going to get. To see if you can get interesting scores, here are some things to experiment with:

  • Use a different training method, like logistic regression or random forest
  • Try different parameters for the MLP
  • Try using fewer features for the pipeline, maybe only 'embedding'
  • Experiment with different default values for age and price
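The first and third experiments could look roughly like the sketch below. The pipeline name pipe-embedding-only, the model name lp-model-embedding-only, and the hyperparameter values are placeholders, not recommendations. The procedure names are the ones I believe recent GDS 2.x releases use; older releases may have addRandomForest under the alpha namespace, so check the documentation for your installed version.

```cypher
// Assumption: pipeline and model names below are hypothetical.
CALL gds.beta.pipeline.linkPrediction.create('pipe-embedding-only');

// Same FastRP step as in the original pipeline.
CALL gds.beta.pipeline.linkPrediction.addNodeProperty('pipe-embedding-only', 'fastRP', {
  mutateProperty: 'embedding',
  embeddingDimension: 256,
  randomSeed: 42,
  contextRelationshipTypes: ['FRIEND', 'BELONG']
});

// Use only the embedding as link feature input (no age or price).
CALL gds.beta.pipeline.linkPrediction.addFeature('pipe-embedding-only', 'hadamard', {
  nodeProperties: ['embedding']
});

CALL gds.beta.pipeline.linkPrediction.configureSplit('pipe-embedding-only', {
  testFraction: 0.25,
  trainFraction: 0.6,
  validationFolds: 3
});

// Two non-MLP model candidates; values are illustrative only.
CALL gds.beta.pipeline.linkPrediction.addLogisticRegression('pipe-embedding-only', {penalty: 0.01});
CALL gds.beta.pipeline.linkPrediction.addRandomForest('pipe-embedding-only', {numberOfDecisionTrees: 10});

CALL gds.beta.pipeline.linkPrediction.train('fullGraph', {
  pipeline: 'pipe-embedding-only',
  modelName: 'lp-model-embedding-only',
  metrics: ['AUCPR'],
  sourceNodeLabel: 'Person',
  targetNodeLabel: 'Product',
  targetRelationshipType: 'BUY',
  randomSeed: 12
}) YIELD modelInfo
RETURN
  modelInfo.bestParameters AS winningModel,
  modelInfo.metrics.AUCPR.test AS testScore;
```

If the predictions from this model vary while the MLP ones stay constant, that would point at the MLP candidate rather than at the data or the projection.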

Below is the output of gds.beta.pipeline.linkPrediction.train.

I have also tried another dataset (the Hetionet dataset), but the predicted probabilities are still all 0.49999.
Is there something I haven't noticed?
Thank you!

Hi again @alanwtmec,

So the train output looks reasonable, that's good.

It's concerning that you get strange scores also on the other dataset.

"Is there something I haven't noticed?"

I cannot see anything more than what I mentioned in my last post. Did you try any of the things I suggested as experiments? I wonder if there's something fishy going on with the MLP implementation.
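One quick way to compare runs, sketched here with the graph and model names from the post: aggregate the streamed probabilities instead of reading them row by row. If minProbability and maxProbability are equal, the model is returning a constant score for every candidate pair.

```cypher
// Summarize the spread of predicted probabilities for a trained model.
CALL gds.beta.pipeline.linkPrediction.predict.stream('fullGraph', {
  modelName: 'lp-pipeline-model-filtered',
  topN: 50,
  threshold: 0
})
YIELD probability
RETURN
  min(probability) AS minProbability,
  max(probability) AS maxProbability,
  count(*) AS predictions;
```

Swapping in the name of a model trained with a different method gives a side-by-side check.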
