graphSAGE with List<Float>

Hi, I'm new to Neo4j and am trying to use GraphSAGE with a float-list property.
I'm using GDS 1.5, so I thought it would work.

First, I tested with the example from the GraphSAGE documentation:
https://neo4j.com/docs/graph-data-science/current/algorithms/graph-sage/

I created the nodes and relationships using the example Cypher, then created the graph.
The one thing I changed was the Person nodes, which now carry a float list, e.g.:
( dan:Person {name: 'Dan', age: 20, heightAndWeight: [185.0, 75.0]}),

But I got an error when training on this graph.

The error message is:
Failed to invoke procedure gds.beta.graphSage.train: Caused by: org.ejml.MatrixDimensionException: 4 != 3 The 'A' and 'B' matrices do not have compatible dimensions

Why does this error occur, and what should I do?


Hi.

Can you please do me a favor and verify what happens if you run the example as is (without changing to a list of floats)?

Thanks!


Hi, I'm doing something similar, although replicating the original proteins test done in the graphSAGE paper (following the directions from this tutorial). I'm able to get the model to train fine when I simply feed in each node property as a separate feature to the featureProperties setting in gds.beta.graphSage.train(). However, when I create a graph projection using only the property that is already a list of floats, it fails with the same MatrixDimensionException error.

If I had to guess, the graphSAGE training algo is using the length of the featureProperties list of strings to dictate the dimensionality of the input, instead of inspecting the properties that those strings point to. Note that this graph effectively has two copies of the "embeddings" list/vector stored as properties on each node: one stored as 'embeddings_all', which is a list of floats, and the other with each individual list element stored in node properties 'embedding_i', where i goes from 0 to 49. Here are the full code sets for what works and what doesn't:

What works, but requires each value of a list to be stored as separate properties (here called 'embedding_0' through 'embedding_49')

//Create a graph projection
UNWIND range(0,49) as i
WITH collect('embedding_' + toString(i)) as embeddings
CALL gds.graph.create('train_noList','Train',
  {INTERACTS:{orientation:'UNDIRECTED'}},
  {nodeProperties:embeddings}) 
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount

Then

//Train the model
UNWIND range(0,49) as i
WITH collect('embedding_' + toString(i)) as embeddings
CALL gds.beta.graphSage.train('train_noList',{
  modelName:'proteinModel',
  aggregator:'pool',
  batchSize:512,
  activationFunction:'relu',
  epochs:10,
  sampleSizes:[25,10],
  learningRate:0.0000001,
  embeddingDimension:256,
  featureProperties:embeddings,
  degreeAsProperty: false})
YIELD modelInfo
RETURN modelInfo

What I'd like to do, but fails

//Create graph projection
CALL gds.graph.create('train','Train',
  {INTERACTS:{orientation:'UNDIRECTED'}},
  {nodeProperties:'embeddings_all'}) 
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount

Then

//Train the model
CALL gds.beta.graphSage.train('train',{
  modelName:'proteinModel',
  aggregator:'pool',
  batchSize:512,
  activationFunction:'relu',
  epochs:10,
  sampleSizes:[25,10],
  learningRate:0.0000001,
  embeddingDimension:256,
  featureProperties:['embeddings_all'],
  degreeAsProperty: false})
YIELD modelInfo
RETURN modelInfo

Returns

ClientError: [Procedure.ProcedureCallFailed] Failed to invoke procedure `gds.beta.graphSage.train`: Caused by: org.ejml.MatrixDimensionException: 50 != 1 The 'A' and 'B' matrices do not have compatible dimensions
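
A quick sanity check before training on a list property is to confirm that every node carries the list and that all lists have the same length. The query below is only a sketch; it assumes the `Train` label and the `embeddings_all` property from the post above, reads the stored data, and does not touch the projection.

```cypher
// Group nodes by the length of their embeddings_all list.
// A single row with dim = 50 is the expected result; extra rows (or a
// null dim) point to nodes whose list is missing or has another length.
MATCH (n:Train)
RETURN size(n.embeddings_all) AS dim, count(*) AS nodes
ORDER BY nodes DESC
```

If this returns one row with dim = 50, the data itself is consistent and the mismatch is more likely on the algorithm side.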

Hello Clair,
Yes, I ran the example as is, e.g.
( dan:Person {name: 'Dan', age: 20, heightAndWeight: [185, 75]}),

I can create the nodes and the graph with the example Cypher.
But when I try to train graphSAGE, I get the error below:
Neo.ClientError.Procedure.ProcedureCallFailed
It says:
Failed to invoke procedure gds.beta.graphSage.train: Caused by: java.lang.IllegalStateException: Unknown ValueType LONG_ARRAY

Since the error message says "Unknown ValueType LONG_ARRAY",
I thought I needed to change the type of "heightAndWeight" (which holds a list value) to a float list.

So I changed the Cypher as below and created the graph with the same code as in the example.
( dan:Person {name: 'Dan', age: 20, heightAndWeight: [185.0, 75.0]})

When I then tried to train graphSAGE on this graph,
I got a Neo.ClientError.Procedure.ProcedureCallFailed error.
It says:
Failed to invoke procedure gds.beta.graphSage.train: Caused by: org.ejml.MatrixDimensionException: 4 != 3 The 'A' and 'B' matrices do not have compatible dimensions
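
For anyone who hits the `Unknown ValueType LONG_ARRAY` error on data that is already loaded, one option is to convert the integer lists to float lists in place instead of recreating the nodes. This is a sketch that assumes the `Person` label and `heightAndWeight` property from the documentation example. As this post shows, on the GDS version used here it only gets past the LONG_ARRAY error, not the MatrixDimensionException.

```cypher
// Rewrite every heightAndWeight list so that its elements are floats.
// toFloat() converts integers and leaves existing floats unchanged.
MATCH (p:Person)
WHERE p.heightAndWeight IS NOT NULL
SET p.heightAndWeight = [x IN p.heightAndWeight | toFloat(x)]
RETURN p.name AS name, p.heightAndWeight AS heightAndWeight
```

The in-memory graph is a snapshot of the database, so a projection created before this change would need to be dropped and recreated to pick up the new values.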

Thanks for the follow-up info! Let me ping some people and get back to you...

Thanks for posting, @song and @emigre459. You uncovered a bug, and we've just issued a patch release.

You can grab the new version from our download center or on GitHub. You'll want 1.5.2.


That's awesome, thanks @alicia_frame1 ! I'm doing dev testing on Neo4j Desktop right now and it doesn't seem to see the new plugin version (just telling me 1.5.1 is the newest available). Does that need to be indexed or something so that it will show up there? What is the likely timeframe for that if so?

Hm, there might be a lag before it shows up in Desktop.

What you can do, if you want it right away, is grab the plugin from the download center (download the zip and extract the .jar file) and install it manually. In Desktop, on your project (with the DB stopped), select "Manage", then click on "open folder", and navigate to your plugins folder. Remove the 1.5.1 jar, replace it with the 1.5.2 version, and then restart your database.
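
After the restart, it can be worth confirming which plugin version the database actually loaded. `gds.version()` is a function in the GDS library, so a minimal check looks like this:

```cypher
// Should report 1.5.2 once the new jar has been picked up.
RETURN gds.version() AS gdsVersion
```

If it still reports 1.5.1, the old jar is most likely still sitting in the plugins folder.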


Cool, that works. Thanks! A new question arises now that I'm running in GDS v1.5.2: I'm getting this error: Failed to invoke procedure gds.beta.graphSage.train: Caused by: java.lang.OutOfMemoryError: Java heap space.

Do you know why that may be? I'm using the same memory config (min of 1 GB heap, max of 4 GB) that I used to successfully train the non-list version of the model, but training the list version seems to use up more memory somehow. I'll test to make sure I can get the previously-working non-list version to work with the new version of GDS, just to be safe, and post here the results.
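
To rule out a configuration mix-up, the effective heap settings can be read back from the running database. This sketch assumes a Neo4j 4.x instance, where the relevant settings are named `dbms.memory.heap.initial_size` and `dbms.memory.heap.max_size`, and a user with permission to call `dbms.listConfig`:

```cypher
// List the heap-related settings the database is currently running with.
CALL dbms.listConfig('dbms.memory.heap')
YIELD name, value
RETURN name, value
```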

@alicia_frame1, I've confirmed that it also runs out of memory with the old, non-list approach that worked in 1.5.1. Perhaps a memory leak was introduced somehow?

I don't think it's a memory leak, but I'll double check.

In general, lists take up quite a bit more space in memory than doubles. The best thing to do is increase your heap, if you can. If you run gds.beta.graphSage.train.estimate (with everything else the same as if you were running .train), it should give you an estimate of how much memory training might consume.
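
As a sketch of that suggestion applied to the list-based projection from earlier in the thread: the call below assumes the estimate procedure sits in the same beta namespace as the `gds.beta.graphSage.train` calls above, and it reuses the `train` graph and the same configuration. The yielded fields are the ones GDS estimate procedures generally return.

```cypher
// Estimate the memory needed for training without actually running it.
CALL gds.beta.graphSage.train.estimate('train', {
  modelName: 'proteinModel',
  aggregator: 'pool',
  batchSize: 512,
  activationFunction: 'relu',
  epochs: 10,
  sampleSizes: [25, 10],
  learningRate: 0.0000001,
  embeddingDimension: 256,
  featureProperties: ['embeddings_all'],
  degreeAsProperty: false})
YIELD requiredMemory, bytesMin, bytesMax
RETURN requiredMemory, bytesMin, bytesMax
```

Comparing bytesMax against the configured maximum heap gives a rough idea of whether the training run is likely to fit.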

Ah, thanks for the tip! Upon restarting Neo4j Desktop and adjusting the heap, I was able to get the list version to run. Weirdly, the memory estimate for the list version has a smaller max memory estimate when compared to the non-list version.
