Applying GraphSage to a heterogeneous graph

Hi,

I have a graph with heterogeneous nodes and relationships: 5 different node labels and 6 different relationship types.
I am applying the GraphSage algorithm with a multi-label model. However, when I generate and stream embeddings, I only get two columns, nodeId and embedding. How should I interpret such a result for multiple nodes?

nodeId embedding
0 [0.1948958694445715, 0.0001800546782075261, 0.19406631559315193, 0.19493872366563952, 0.000705097004648917, 0.000003388974594228501, 0.19083289810989526, 0.000002014584449745825, 7.051919794437064e-7, 0.000006999474761642623, 0.19423149778931623, 0.19368949482178377, 0.002917233017627386, 0.19304021179085323, 0.00003119599340087245, 0.00008882960684746318, 0.1925640923719248, 0.19486124822647047, 0.19444818740201195, 0.017719491062357838, 9.611256840356973e-7, 0.0000016192473754360628, 0.0000030463409349330754, 2.7329713812546727e-7, 0.19363178132921924, 0.1946949389375109, 0.001968607945078218, 4.5727801354503396e-7, 0.0000017458586442969841, 8.848725513553522e-7, 3.324837029674128e-7, 0.000001356462587999999, 0.00045292929695413253, 0.19485227385862508, 3.4330067938633304e-7, 3.979646159503678e-8, 0.19459916188500834, 0.1948096174743879, 0.19437476565380093, 0.19439199220294745, 0.0901633916180792, 0.19304614785026583, 0.1949326164544649, 0.194935558724247, 9.107071592555197e-7, 2.848278838804885e-7, 0.000001955619799570531, 0.0003524311002902929, 0.009713880674989597, 0.19465093839812256, 0.19424193243023521, 0.19473764108228483, 0.0003059292348285726, 0.03795620207712272, 0.0000023390707642761213, 0.0000031972776815036726, 1.8109230756074904e-7, 0.0226481851339385, 0.19493755214313557, 0.09268200040512822, 0.1947008658348285, 0.19493731325147484, 6.906014403934934e-7, 0.000009713789533021416]
1 [0.19489660585375337, 0.00021976438633121483, 0.1938858283318875, 0.19494980737642087, 0.0007925734813813353, 0.0000029886229860751536, 0.18972849054036656, 0.000001831607063540768, 6.59616104709603e-7, 0.000006599911942566881, 0.19401633091617515, 0.19336799130910104, 0.003035813519195428, 0.19220538798634684, 0.000022142642267670633, 0.00010047104047833898, 0.19183325868757425, 0.19485365657086157, 0.19437125399595426, 0.023316289682309862, 8.187896804456681e-7, 0.000001505150823064471, 0.000002790863094787745, 2.287574336933576e-7, 0.1932631290783527, 0.19464484804983703, 0.0023829834717379534, 4.0724396122551104e-7, 0.0000016090668048522257, 8.093514938907032e-7, 3.105759346655915e-7, 0.000001258265502024247, 0.0003304535487910961, 0.19485709455534003, 2.9724275247797387e-7, 3.583908645688664e-8, 0.19449846920804384, 0.19480252764940226, 0.19425054008983375, 0.19427500979301568, 0.09574442857513082, 0.19238500218328222, 0.19494248134592224, 0.19494561441680872, 8.116472743864722e-7, 2.451041812317067e-7, 0.0000020736895969837454, 0.0004001666743944087, 0.007031296980850237, 0.194588961997742, 0.19404125084044718, 0.19470349250412428, 0.00036648979373229284, 0.032622165760702114, 0.000002621364332222492, 0.0000032309241989432535, 1.6227262970308154e-7, 0.01731015526929791, 0.19494807668035272, 0.09976893621536377, 0.19464958039983907, 0.19494818977864148, 6.373061438393313e-7, 0.000009662496414628383]

Thanks

Hi @kamalika.ray ,
I am not sure what you are asking. Do you want more details on how GraphSage uses the different node labels, or on why the result does not contain the node labels?

The embeddings are created using the node label information.
The algorithm does not return the node labels, as they are part of your input.

To return the embeddings for individual labels, you can first mutate the embeddings (instead of streaming them), and then use gds.graph.nodeProperties.stream(<graph>, <mutateProperty>, {listNodeLabels: true}).
Alternatively, you can use gds.util.asNode with the stream mode to attach the labels to the result:

CALL gds.beta.graphSage.stream(...) 
 YIELD nodeId, embedding 
 WITH nodeId, labels(gds.util.asNode(nodeId)) AS labels, embedding
 RETURN nodeId, labels, embedding 
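
For the mutate route mentioned first, a rough sketch could look like the following. The names 'myGraph', 'myModel' and 'embedding' are placeholders, not anything from your setup. I am also assuming the four-argument form of gds.graph.nodeProperties.stream, where the third argument is a node-label filter and the configuration map comes last. Whether listNodeLabels is available depends on your GDS version, so please check this against the docs for the version you run.

```cypher
// Assumption: 'myGraph' is your projected graph and 'myModel' is your trained
// GraphSage model; both names are placeholders.
// Step 1: store the embeddings in the in-memory graph instead of streaming them.
CALL gds.beta.graphSage.mutate('myGraph', {
  modelName: 'myModel',
  mutateProperty: 'embedding'
});

// Step 2: stream the stored property together with each node's labels.
// Assumption: ['*'] selects all node labels; replace it with a single label
// to get the embeddings for that label only.
CALL gds.graph.nodeProperties.stream('myGraph', 'embedding', ['*'], {listNodeLabels: true})
YIELD nodeId, propertyValue, nodeLabels
RETURN nodeId, nodeLabels, propertyValue AS embedding;
```

Either way, the result has one row per node, and the label column tells you which of your node labels each embedding belongs to.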

Hi @florentin_dorre,

Thank you for your reply. I have two questions:

  1. I want to return the embeddings for all the individual labels. I don't see a stream method for a multi-label node model in the GraphSage documentation. It would be very helpful if you could point me to any source that covers the full process of applying GraphSage to a multi-label node model.
  2. How can I retrieve the properties for the nodeId and labels given in the embedding output?

Thanks

@kamalika.ray I don't understand what's missing in my previous answer.

For (1), the last part of my previous comment gives you the solution. There is no special stream mode for multi-label models. We detect internally that the model is multi-label and apply it correctly.

For (2), the easiest way is to use gds.util.asNode(nodeId).
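
As a minimal sketch of (2), you could extend the stream query from my earlier comment so that it also returns the stored properties of each node. The arguments of the stream call are left elided as before. The `props` alias is just an illustrative name, and properties() is the standard Cypher function that returns all properties of a node as a map.

```cypher
CALL gds.beta.graphSage.stream(...)
YIELD nodeId, embedding
// Look up the database node behind each nodeId.
WITH nodeId, gds.util.asNode(nodeId) AS n, embedding
RETURN nodeId, labels(n) AS labels, properties(n) AS props, embedding
```

If you only need one specific property, you can return n.<propertyName> in place of properties(n).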