Does Neo4j GraphSAGE work for heterogeneous graphs?

From my understanding, the original GraphSAGE algorithm only works for homogeneous graphs.
For heterogeneous graphs to work, message passing has to change substantially to handle the different node types.

Does Neo4j's GraphSAGE work for heterogeneous graphs?

Hi @shaowei!

Neo4j's GraphSAGE algorithm does support a multi-label mode (heterogeneous graphs). You enable it by setting the projectedFeatureDimension configuration parameter of the algorithm.

Some important assumptions:

  • A requirement for multi-label mode is that each node belongs to exactly one label.
  • A GraphSAGE model trained in this mode must be applied to graphs with the same schema with regard to node labels and properties.
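The two assumptions above can be checked on your own data before training. Below is a minimal Python sketch that assumes nodes are plain dicts with hypothetical "labels" and "properties" keys. The helper names are illustrative only and are not part of the GDS API.

```python
# Hypothetical pre-flight checks for the two multi-label assumptions.
# Nodes are plain dicts; none of these helpers exist in GDS.

def labels_are_exclusive(nodes):
    """Assumption 1: every node carries exactly one label."""
    return all(len(node["labels"]) == 1 for node in nodes)


def schema_of(nodes):
    """Map each label to the set of property keys seen on nodes with that label."""
    schema = {}
    for node in nodes:
        (label,) = node["labels"]  # raises ValueError unless exactly one label
        schema.setdefault(label, set()).update(node["properties"])
    return schema


def same_schema(train_nodes, apply_nodes):
    """Assumption 2: same labels with the same properties in both graphs."""
    return schema_of(train_nodes) == schema_of(apply_nodes)


# Made-up example data mirroring the labels used in this thread.
train_graph = [
    {"labels": ["Person"], "properties": {"age": 20, "heightAndWeight": [185, 75]}},
    {"labels": ["Instrument"], "properties": {"cost": 1337.0}},
]
other_graph = [
    {"labels": ["Person"], "properties": {"age": 31}},  # heightAndWeight missing
    {"labels": ["Instrument"], "properties": {"cost": 42.0}},
]

print(labels_are_exclusive(train_graph))      # True
print(same_schema(train_graph, other_graph))  # False
```

A model trained on train_graph would not satisfy the second assumption when applied to other_graph, because Person nodes there lack heightAndWeight.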

Neo4j GraphSAGE can be run on top of a heterogeneous graph. However, as far as I understand, the algorithm does not differentiate between node labels or relationship types, so it effectively treats every heterogeneous graph as a homogeneous one.
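To make the "treats every heterogeneous graph as homogeneous" point concrete, here is a simplified Python sketch of the neighbor-averaging step of a GraphSAGE-style mean aggregator that never looks at relationship types. It illustrates that behavior under the stated assumption and is not the GDS implementation. It also assumes all feature vectors already have the same length.

```python
# Simplified neighbor averaging that ignores relationship types.
# Edges are (source, rel_type, target) triples, treated as undirected.

def neighbors(node, edges):
    """All nodes adjacent to `node`, regardless of relationship type."""
    out = []
    for source, _rel_type, target in edges:  # rel_type is never inspected
        if source == node:
            out.append(target)
        elif target == node:
            out.append(source)
    return out


def mean_neighbor_features(node, edges, features):
    """Element-wise mean of the neighbors' feature vectors."""
    nbrs = neighbors(node, edges)
    dim = len(features[node])
    if not nbrs:
        return [0.0] * dim
    return [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]


features = {"dan": [1.0, 0.0], "annie": [0.0, 1.0], "guitar": [3.0, 3.0]}
typed = [("dan", "KNOWS", "annie"), ("dan", "LIKES", "guitar")]
untyped = [(s, "RELATED", t) for s, _, t in typed]

# Same result whether or not the relationship types differ.
print(mean_neighbor_features("dan", typed, features))    # [1.5, 2.0]
print(mean_neighbor_features("dan", untyped, features))  # [1.5, 2.0]
```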

Thanks, @bratanic_tomaz. @alejandropuerto, can you confirm whether what @bratanic_tomaz said is correct?

Sorry, but I would say that is not correct. To make this clear, here is an example:
We have this example graph with two labels and two relationship types.

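We can generate the graph with the following code. It assumes the Person nodes already exist, with the age and heightAndWeight properties that the projection below reads and with KNOWS relationships between them; the code only adds the instruments and the LIKES relationships: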

```cypher
// Look up the existing Person nodes
MATCH
  (dan:Person {name: "Dan"}),
  (annie:Person {name: "Annie"}),
  (matt:Person {name: "Matt"}),
  (brie:Person {name: "Brie"}),
  (john:Person {name: "John"})
// Create the instruments and the LIKES relationships
CREATE
  (guitar:Instrument {name: 'Guitar', cost: 1337.0}),
  (synth:Instrument {name: 'Synthesizer', cost: 1337.0}),
  (bongos:Instrument {name: 'Bongos', cost: 42.0}),
  (trumpet:Instrument {name: 'Trumpet', cost: 1337.0}),
  (dan)-[:LIKES]->(guitar),
  (dan)-[:LIKES]->(synth),
  (dan)-[:LIKES]->(bongos),
  (annie)-[:LIKES]->(guitar),
  (annie)-[:LIKES]->(synth),
  (matt)-[:LIKES]->(bongos),
  (brie)-[:LIKES]->(guitar),
  (brie)-[:LIKES]->(synth),
  (brie)-[:LIKES]->(bongos),
  (john)-[:LIKES]->(trumpet)
```

Now, we create the projection:

```cypher
CALL gds.graph.create(
  'persons_with_instruments',
  {
    Person: {
      label: 'Person',
      properties: ['age', 'heightAndWeight']
    },
    Instrument: {
      label: 'Instrument',
      properties: ['cost']
    }
  }, {
    KNOWS: {
      type: 'KNOWS',
      orientation: 'UNDIRECTED'
    },
    LIKES: {
      type: 'LIKES',
      orientation: 'UNDIRECTED'
    }
})
```

Notice that the projection includes two different labels and two different types.

We can now run GraphSAGE in multi-label mode on that graph by specifying the projectedFeatureDimension parameter. Also, you must take into account the assumptions I mentioned in my first reply.

```cypher
CALL gds.beta.graphSage.train(
  'persons_with_instruments',
  {
    modelName: 'multiLabelModel',
    featureProperties: ['age', 'heightAndWeight', 'cost'],
    projectedFeatureDimension: 4
  }
)
```

Based on this, we can see that GraphSAGE works on heterogeneous graphs and does differentiate between node labels, each of which keeps its own set of feature properties.
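One way to picture why projectedFeatureDimension is needed: Person and Instrument nodes have feature vectors of different lengths, so they must be mapped into a shared space before neighbors can be aggregated. The Python sketch below assumes one weight matrix per label that projects that label's features to 4 dimensions (mirroring projectedFeatureDimension: 4). It also assumes heightAndWeight is a two-element list. The weights are fixed toy values where a real model would learn them, and this is not necessarily how GDS implements multi-label mode.

```python
# Hypothetical per-label projection into a shared dimension.
PROJECTED_DIM = 4  # mirrors projectedFeatureDimension: 4 in the train call

# Which featureProperties each label carries (from the projection above).
LABEL_PROPERTIES = {
    "Person": ["age", "heightAndWeight"],
    "Instrument": ["cost"],
}

# Toy weights, one PROJECTED_DIM x input_width matrix per label.
# Person input width is 3 (age + 2-element heightAndWeight); Instrument is 1.
WEIGHTS = {
    "Person": [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]],
    "Instrument": [[0.0], [0.0], [0.0], [0.5]],
}


def flatten(props, keys):
    """Concatenate scalar and list-valued properties into one flat vector."""
    vec = []
    for key in keys:
        value = props[key]
        vec.extend(value if isinstance(value, list) else [value])
    return vec


def project(matrix, vec):
    """Matrix-vector product; every row must match the input width."""
    assert all(len(row) == len(vec) for row in matrix), "width mismatch"
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def projected_features(label, props):
    """Flatten a node's own properties and project them with its label's weights."""
    return project(WEIGHTS[label], flatten(props, LABEL_PROPERTIES[label]))


person = projected_features("Person", {"age": 20, "heightAndWeight": [185, 75]})
bongos = projected_features("Instrument", {"cost": 42.0})
print(person)  # [10.0, 92.5, 37.5, 0.0]
print(bongos)  # [0.0, 0.0, 0.0, 21.0]
```

Both labels end up as 4-dimensional vectors even though their raw inputs have 3 and 1 values, so a single aggregator can combine them.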

Hope this helps. If it does, you can mark this answer as the solution 🙂


For docs on running GraphSAGE on a multi-label graph, check these out: https://neo4j.com/docs/graph-data-science/current/algorithms/graph-sage/#_train_with_multiple_node_labels

We automatically encode the node labels for you and pad out the missing feature dimensions (e.g., Person nodes have age and heightAndWeight properties but not cost, while Instrument nodes have cost but not age or heightAndWeight). We also modified the GraphSAGE code so that, when you set projectedFeatureDimension, it handles the different feature sets appropriately 🙂
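A minimal Python sketch of the encoding described above. It assumes a fixed label order, the featureProperties order from the train call, and that heightAndWeight is a two-element list. The values are made up, and the code illustrates the idea rather than reproducing the actual GDS code.

```python
# Sketch of the encoding described above: one-hot node label plus zero
# padding for feature properties that a label does not have.
LABELS = ["Person", "Instrument"]                   # assumed label order
FEATURE_ORDER = ["age", "heightAndWeight", "cost"]  # the featureProperties list
FEATURE_WIDTHS = {"age": 1, "heightAndWeight": 2, "cost": 1}  # assumed widths


def encode(label, props):
    """Return [one-hot label] + [features, zero-padded where missing]."""
    one_hot = [1.0 if label == name else 0.0 for name in LABELS]
    features = []
    for key in FEATURE_ORDER:
        width = FEATURE_WIDTHS[key]
        if key in props:
            value = props[key]
            values = value if isinstance(value, list) else [value]
            assert len(values) == width, f"{key} should have {width} value(s)"
            features.extend(float(v) for v in values)
        else:
            features.extend([0.0] * width)  # pad a property this label lacks
    return one_hot + features


person = encode("Person", {"age": 20, "heightAndWeight": [185, 75]})
bongos = encode("Instrument", {"cost": 42.0})
print(person)  # [1.0, 0.0, 20.0, 185.0, 75.0, 0.0]
print(bongos)  # [0.0, 1.0, 0.0, 0.0, 0.0, 42.0]
```

Every node ends up with a vector of the same length (6 here), which is what lets one model consume both labels.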