Graph embedding using GDS library

georg_kf_heiler

How do the embeddings of GDS handle categorical attributes and different relation or node types?

From the examples in 6.7. Node embeddings (Chapter 6. Algorithms), it looks like the projection should contain only a single type of node and a single type of relationship, with numeric weights.

If these are the current limitations, are there any plans to address them, especially to support multiple relationship and node types?


GraphSAGE supports multiple node types (see the section 'train with multiple labels'), and as of 1.5 GraphSAGE also supports lists as input features.


If you have categorical data, you'll need to either encode it yourself or use our one hot encoding utility function.
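As a minimal sketch of the utility function route: this assumes your GDS version ships the alpha function gds.alpha.ml.oneHotEncoding(availableValues, selectedValues), and it uses a hypothetical label Item with hypothetical properties category and categoryEncoding, so adjust the names to your data.

```cypher
// Collect the distinct category values once, so every node is encoded
// against the same list in the same order.
// Item, category and categoryEncoding are hypothetical names.
MATCH (n:Item)
WHERE n.category IS NOT NULL
WITH collect(DISTINCT n.category) AS categories
MATCH (n:Item)
WHERE n.category IS NOT NULL
// One position per known category; the node's own category is marked with 1.
SET n.categoryEncoding = gds.alpha.ml.oneHotEncoding(categories, [n.category])
```

As far as I know the function returns a list of integers, so depending on your GDS version the stored list may need converting to floats before an embedding algorithm accepts it as a feature.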


We don't currently have support for multiple relationship types on our roadmap, but we'll look into options. Each embedding technique (FastRP, Node2Vec and GraphSAGE) is formulated for different types of graphs (mono- or multi-partite, weighted or unweighted, with and without properties).


Outside of Neo4j, I'd recommend looking at Zitnik et al.'s Decagon embedding as one approach.

Hi,
I am trying to use the gds.beta.graphSage.train procedure and am having some challenges, first of all with the multi-label approach. If I create my graph in the catalog with the following code, I should get two node labels (output and tx) along with one relationship type (PAYS):
CALL gds.graph.create(
   'addresses_with_transactions',
   {
      output: {
         label: 'output',
         properties: ['onehotencode', 'pageRank', 'depth']
      },
      tx: {
         label: 'tx',
         properties: ['pageRank', 'depth', 'time_stamp']
      }
   },
   {
      PAYS: {
         type: 'PAYS',
         properties: ['amount', 'time_stamp']
      }
   }
)

When I try to train a model using these properties, I get an error.
First, I tried it this way:
CALL gds.beta.graphSage.train(
   'addresses_with_transactions',
   {
      modelName: 'TrainModel',
      featureProperties: ['onehotencode', 'pageRank', 'depth', 'time_stamp', 'amount'],
      projectedFeatureDimension: 5
   }
)
I got this error:
Failed to invoke procedure gds.beta.graphSage.train: Caused by: java.lang.IllegalArgumentException: The feature properties ['amount'] are not present for all requested labels. Requested labels: ['output', 'tx']. Properties available on all requested labels: ['depth', 'pageRank']

Then I tried another way:

CALL gds.beta.graphSage.train(
   'addresses_with_transactions',
   {
      modelName: 'weightedTrainedModel',
      featureProperties: ['onehotencode', 'pageRank', 'depth', 'time_stamp'],
      aggregator: 'mean',
      activationFunction: 'sigmoid',
      sampleSizes: [25, 10],
      degreeAsProperty: true,
      relationshipWeightProperty: 'amount',
      nodeLabels: ['output', 'tx'],
      relationshipTypes: ['PAYS']
   }
)

and got a similar error:

Failed to invoke procedure gds.beta.graphSage.train: Caused by: java.lang.IllegalArgumentException: The following node properties are not present for each label in the graph: [onehotencode, time_stamp]. Properties that exist for each label are [pageRank, depth]

So I decided to try it with just one node label (output), using only the properties associated with 'output' in my catalog. I used this:
CALL gds.beta.graphSage.train(
   'addresses_with_transactions',
   {
      modelName: 'weightedTrainedModel',
      featureProperties: ['onehotencode', 'pageRank', 'depth'],
      aggregator: 'mean',
      activationFunction: 'sigmoid',
      sampleSizes: [25, 10],
      degreeAsProperty: true,
      relationshipWeightProperty: 'amount',
      nodeLabels: ['output'],
      relationshipTypes: ['PAYS']
   }
)

Then I got an error:
Failed to invoke procedure gds.beta.graphSage.train: Caused by: java.lang.IllegalStateException: Unknown ValueType LONG_ARRAY
However, I thought this data type was OK to use, according to this page: Node Properties - Neo4j Graph Data Science
The following table lists the supported property types, as well as, their corresponding fallback values.

* Long - Long.MIN_VALUE
* Double - NaN
* Long Array - null
* Float Array - null
* Double Array - null

In sum, I must be doing something wrong: I cannot get the multi-label mode working with my model, and I am having a problem with the LongArray type. Any help greatly appreciated 🙂

Thanks,
Adam

PS. Just to let you know, the procedure works when I strip out the long array and just use:
CALL gds.beta.graphSage.train(
   'addresses_with_transactions',
   {
      modelName: 'weightedTrainedModel',
      featureProperties: ['pageRank'],
      aggregator: 'mean',
      activationFunction: 'sigmoid',
      sampleSizes: [25, 10],
      degreeAsProperty: true,
      relationshipWeightProperty: 'amount',
      nodeLabels: ['output'],
      relationshipTypes: ['PAYS']
   }
)

You'll want to add a defaultValue configuration parameter to your graph create statement so that no properties are blank. That's covered here

So taking your code snippet, you'll want to load each property separately, in order to be able to set a default value:

tx: {
   label: 'tx',
   properties: {
      pageRank: {
         property: 'pageRank',
         defaultValue: 0
      },
      depth: {
         property: 'depth',
         defaultValue: 0
      },
      time_stamp: {
         property: 'time_stamp',
         defaultValue: 0
      }
   }
}


That way, you can set a value for any node that is missing the property. It doesn't have to be 0; it can be whatever you want.
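Putting that together for both labels, a rough sketch of the whole flow might look like the following. It assumes that scalar defaults of 0 are acceptable for your data, it leaves out onehotencode (which only exists on output and is a list), and the model name multiLabelModel is made up for illustration. Note that amount is projected as a relationship property on PAYS, so it is passed as relationshipWeightProperty rather than listed in featureProperties.

```cypher
// If a graph with this name is already in the catalog, drop it first:
// CALL gds.graph.drop('addresses_with_transactions')

// Project the same three scalar properties on both labels, with a default
// wherever a node does not carry the property. 0 is an arbitrary choice.
CALL gds.graph.create(
   'addresses_with_transactions',
   {
      output: {
         label: 'output',
         properties: {
            pageRank:   {property: 'pageRank',   defaultValue: 0},
            depth:      {property: 'depth',      defaultValue: 0},
            time_stamp: {property: 'time_stamp', defaultValue: 0}
         }
      },
      tx: {
         label: 'tx',
         properties: {
            pageRank:   {property: 'pageRank',   defaultValue: 0},
            depth:      {property: 'depth',      defaultValue: 0},
            time_stamp: {property: 'time_stamp', defaultValue: 0}
         }
      }
   },
   {
      PAYS: {type: 'PAYS', properties: ['amount', 'time_stamp']}
   }
);

// Train on both labels, using only node properties that now exist on each.
// 'multiLabelModel' is a hypothetical model name.
CALL gds.beta.graphSage.train(
   'addresses_with_transactions',
   {
      modelName: 'multiLabelModel',
      featureProperties: ['pageRank', 'depth', 'time_stamp'],
      relationshipWeightProperty: 'amount',
      nodeLabels: ['output', 'tx'],
      relationshipTypes: ['PAYS']
   }
)
```

Whether 0 is a sensible stand-in depends on the property; for something like a time stamp, a default that cannot be confused with a real value may be the better choice.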

Thanks Alicia. I have updated my catalog and tried to re-run the training procedure. However, I think there is a limit on which data types the current implementation supports, as I get the following error:
Failed to invoke procedure gds.beta.graphSage.train: Caused by: java.lang.IllegalStateException: Unknown ValueType LONG_ARRAY

Digging around, I found the following reference:
https://towardsdatascience.com/using-graphsage-embeddings-for-downstream-classification-model-4492e0...
"The current implementation of the GraphSAGE algorithm supports only node features that are of type Float."
This means I might have to follow the approach they have taken:
"For this reason, you will include the decoupled node properties ranging from embedding_0 to embedding_49 in the graph projection instead of a single property embeddings_all, which holds all the node features in the form of a list of Floats."
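For my own data, a rough sketch of that decoupling could look like this. It assumes the onehotencode list has exactly three entries, and the onehot_0 to onehot_2 property names are made up for illustration:

```cypher
// Split the list into one float property per position.
// Assumes a list of length 3; onehot_0..onehot_2 are hypothetical names.
MATCH (n:output)
WHERE n.onehotencode IS NOT NULL
SET n.onehot_0 = toFloat(n.onehotencode[0]),
    n.onehot_1 = toFloat(n.onehotencode[1]),
    n.onehot_2 = toFloat(n.onehotencode[2])
```

The new properties would then need to be added to the graph projection and listed individually in featureProperties.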

If you're using GDS 1.5, we just added support for lists (of floats): GraphSAGE - Neo4j Graph Data Science
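Since the error mentions LONG_ARRAY, the stored list is probably made of integers. A minimal sketch of converting it to floats first, assuming a float list is what the list support expects and using a made-up property name onehotencode_float:

```cypher
// Write a float copy of the integer one-hot list.
// onehotencode_float is a hypothetical property name.
MATCH (n:output)
WHERE n.onehotencode IS NOT NULL
SET n.onehotencode_float = [x IN n.onehotencode | toFloat(x)]
```

After that, re-create the in-memory graph with the new property and reference it in featureProperties instead of onehotencode.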

Hopefully that makes your life easier!