Applying GraphSage to a Heterogeneous Graph

kamalika.ray · August 8, 2023, 10:22am

Hi,

I have a graph with heterogeneous nodes and relationships: 5 different node labels and 6 different relationship types.
I am applying the GraphSage algorithm with a multi-label model. However, when I generate and stream embeddings, I only get two columns, nodeId and embedding. How should I interpret this result when the nodes have different labels?

nodeId | embedding
0 | [0.1948958694445715, 0.0001800546782075261, 0.19406631559315193, 0.19493872366563952, 0.000705097004648917, 0.000003388974594228501, 0.19083289810989526, 0.000002014584449745825, 7.051919794437064e-7, 0.000006999474761642623, 0.19423149778931623, 0.19368949482178377, 0.002917233017627386, 0.19304021179085323, 0.00003119599340087245, 0.00008882960684746318, 0.1925640923719248, 0.19486124822647047, 0.19444818740201195, 0.017719491062357838, 9.611256840356973e-7, 0.0000016192473754360628, 0.0000030463409349330754, 2.7329713812546727e-7, 0.19363178132921924, 0.1946949389375109, 0.001968607945078218, 4.5727801354503396e-7, 0.0000017458586442969841, 8.848725513553522e-7, 3.324837029674128e-7, 0.000001356462587999999, 0.00045292929695413253, 0.19485227385862508, 3.4330067938633304e-7, 3.979646159503678e-8, 0.19459916188500834, 0.1948096174743879, 0.19437476565380093, 0.19439199220294745, 0.0901633916180792, 0.19304614785026583, 0.1949326164544649, 0.194935558724247, 9.107071592555197e-7, 2.848278838804885e-7, 0.000001955619799570531, 0.0003524311002902929, 0.009713880674989597, 0.19465093839812256, 0.19424193243023521, 0.19473764108228483, 0.0003059292348285726, 0.03795620207712272, 0.0000023390707642761213, 0.0000031972776815036726, 1.8109230756074904e-7, 0.0226481851339385, 0.19493755214313557, 0.09268200040512822, 0.1947008658348285, 0.19493731325147484, 6.906014403934934e-7, 0.000009713789533021416]
1 | [0.19489660585375337, 0.00021976438633121483, 0.1938858283318875, 0.19494980737642087, 0.0007925734813813353, 0.0000029886229860751536, 0.18972849054036656, 0.000001831607063540768, 6.59616104709603e-7, 0.000006599911942566881, 0.19401633091617515, 0.19336799130910104, 0.003035813519195428, 0.19220538798634684, 0.000022142642267670633, 0.00010047104047833898, 0.19183325868757425, 0.19485365657086157, 0.19437125399595426, 0.023316289682309862, 8.187896804456681e-7, 0.000001505150823064471, 0.000002790863094787745, 2.287574336933576e-7, 0.1932631290783527, 0.19464484804983703, 0.0023829834717379534, 4.0724396122551104e-7, 0.0000016090668048522257, 8.093514938907032e-7, 3.105759346655915e-7, 0.000001258265502024247, 0.0003304535487910961, 0.19485709455534003, 2.9724275247797387e-7, 3.583908645688664e-8, 0.19449846920804384, 0.19480252764940226, 0.19425054008983375, 0.19427500979301568, 0.09574442857513082, 0.19238500218328222, 0.19494248134592224, 0.19494561441680872, 8.116472743864722e-7, 2.451041812317067e-7, 0.0000020736895969837454, 0.0004001666743944087, 0.007031296980850237, 0.194588961997742, 0.19404125084044718, 0.19470349250412428, 0.00036648979373229284, 0.032622165760702114, 0.000002621364332222492, 0.0000032309241989432535, 1.6227262970308154e-7, 0.01731015526929791, 0.19494807668035272, 0.09976893621536377, 0.19464958039983907, 0.19494818977864148, 6.373061438393313e-7, 0.000009662496414628383]

Thanks

florentin_dorre · August 8, 2023, 10:37am

Hi @kamalika.ray ,
I am not sure what you are asking. Do you want more detail on how GraphSage uses the different node labels, or do you want to know why the result does not contain the node labels?

The embeddings are created using the node label information.
The algorithm does not return the node labels because they are part of your input.

To return the embeddings for individual labels, you can first mutate the embeddings (instead of streaming them) and then use gds.graph.nodeProperties.stream(<graph>, <mutateProperty>, {listNodeLabels: true}).
Alternatively, you can use gds.util.asNode with the stream mode to attach the labels to the result:

CALL gds.beta.graphSage.stream(...)
YIELD nodeId, embedding
WITH nodeId, labels(gds.util.asNode(nodeId)) AS labels, embedding
RETURN nodeId, labels, embedding
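
The mutate-based route described above might look roughly like the following sketch. The graph name 'myGraph', the model name 'myModel', and the property name 'embedding' are placeholders. The argument order of gds.graph.nodeProperties.stream (a node-label filter before the configuration map) and the nodeLabels result column are assumed from recent GDS 2.x releases, so check them against the documentation for the version you run.

```cypher
// Step 1: write the embeddings into the in-memory graph instead of streaming them.
// 'myGraph', 'myModel' and 'embedding' are hypothetical names.
CALL gds.beta.graphSage.mutate('myGraph', {
  modelName: 'myModel',
  mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten;

// Step 2: stream the mutated property and list each node's labels next to it.
// ['*'] is assumed to mean "all node labels in the projection".
CALL gds.graph.nodeProperties.stream('myGraph', ['embedding'], ['*'], {listNodeLabels: true})
YIELD nodeId, nodeProperty, propertyValue, nodeLabels
RETURN nodeId, nodeLabels, propertyValue AS embedding;
```

Compared with the stream-mode query, this keeps the embeddings in the projected graph, so they can be reused by later algorithms without recomputing them.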

kamalika.ray · August 8, 2023, 10:54am

Hi @florentin_dorre,

Thank you for your reply. I have two questions:

1. I want to return the embeddings for all the individual labels. I don't see a stream method for a multi-label model in the GraphSage documentation. It would be very helpful if you could point me to a source that covers the full process of applying GraphSage with a multi-label model.
2. How can I retrieve the properties of the nodes identified by the nodeId and labels in the embedding output?

Thanks

florentin_dorre · August 10, 2023, 3:16pm

@kamalika.ray I don't understand what's missing from my previous answer.

(1) The last part of my previous comment gives you the solution. There is no special stream mode for multi-label models. Internally, we detect that the model is multi-label and apply it accordingly.

(2) The easiest way is to use gds.util.asNode(nodeId).
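
As a rough illustration of (2), the node returned by gds.util.asNode can be passed to the standard Cypher functions labels() and properties(). This is only a sketch: 'myGraph' and 'myModel' are placeholder names, not taken from the question.

```cypher
// 'myGraph' and 'myModel' are hypothetical names for the projected graph and trained model.
CALL gds.beta.graphSage.stream('myGraph', {modelName: 'myModel'})
YIELD nodeId, embedding
// Look the node up in the database once, then read its labels and stored properties.
WITH nodeId, gds.util.asNode(nodeId) AS n, embedding
RETURN nodeId, labels(n) AS labels, properties(n) AS props, embedding
```

properties(n) returns all stored properties as a map. To return a single property, replace it with an expression such as n.name, where name stands for whatever property key your nodes actually have.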
