Hi, I'm doing something similar, although I'm replicating the original proteins test from the GraphSAGE paper (following the directions from this tutorial). The model trains fine when I feed in each node property as a separate feature in the featureProperties setting of gds.beta.graphSage.train(). However, when I create a graph projection using only the property that is already a list of floats, training fails with the same MatrixDimensionException error.
If I had to guess, the GraphSAGE training algorithm is using the length of the featureProperties list of strings to determine the dimensionality of the input, instead of inspecting the properties that those strings point to. That would be consistent with the "50 != 1" in the error message below: the list has a single entry, while the property it names holds 50 values.
Note that this graph effectively has two copies of the embedding vector stored as properties on each node: one stored as 'embeddings_all', which is a list of floats, and the other spread across the individual node properties 'embedding_i', where i goes from 0 to 49. Here is the full code for what works and what doesn't:
What works, but requires each value of the list to be stored as a separate property (here called 'embedding_0' through 'embedding_49'):
// Create a graph projection
UNWIND range(0, 49) AS i
WITH collect('embedding_' + toString(i)) AS embeddings
CALL gds.graph.create('train_noList', 'Train',
  {INTERACTS: {orientation: 'UNDIRECTED'}},
  {nodeProperties: embeddings})
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
Then
// Train the model
UNWIND range(0, 49) AS i
WITH collect('embedding_' + toString(i)) AS embeddings
CALL gds.beta.graphSage.train('train_noList', {
  modelName: 'proteinModel',
  aggregator: 'pool',
  batchSize: 512,
  activationFunction: 'relu',
  epochs: 10,
  sampleSizes: [25, 10],
  learningRate: 0.0000001,
  embeddingDimension: 256,
  featureProperties: embeddings,
  degreeAsProperty: false})
YIELD modelInfo
RETURN modelInfo
What I'd like to do, but which fails:
// Create a graph projection
CALL gds.graph.create('train', 'Train',
  {INTERACTS: {orientation: 'UNDIRECTED'}},
  {nodeProperties: 'embeddings_all'})
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
Then
// Train the model
CALL gds.beta.graphSage.train('train', {
  modelName: 'proteinModel',
  aggregator: 'pool',
  batchSize: 512,
  activationFunction: 'relu',
  epochs: 10,
  sampleSizes: [25, 10],
  learningRate: 0.0000001,
  embeddingDimension: 256,
  featureProperties: ['embeddings_all'],
  degreeAsProperty: false})
YIELD modelInfo
RETURN modelInfo
Returns
ClientError: [Procedure.ProcedureCallFailed] Failed to invoke procedure `gds.beta.graphSage.train`: Caused by: org.ejml.MatrixDimensionException: 50 != 1 The 'A' and 'B' matrices do not have compatible dimensions
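In case it helps narrow things down, a sanity check along these lines could confirm whether every node really carries a 50-element list, which would rule out a data problem as the cause. This is only a sketch: it assumes the nodes have the Train label and the embeddings_all property used in the projections above, and it queries the stored graph rather than the in-memory projection.

```cypher
// Sanity check (sketch): how long is the embeddings_all list on each Train node?
// Assumes the 'Train' label and 'embeddings_all' property used above.
MATCH (n:Train)
RETURN size(n.embeddings_all) AS listLength, count(*) AS nodes
ORDER BY nodes DESC
```

If this comes back as a single row with a listLength of 50, the stored list matches the 50 separate properties, and the mismatch would more likely come from how the training procedure sizes its input than from the data itself.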