GDS results combination

  • neo4j 5.5, gds 2.4.5

I have a GDS graph projection ('Project101') with several nodes, and I compute both node embeddings and centrality scores on it as follows:

call gds.beta.graphSage.train(
    'Project101',
    {
        modelName:'Model101',
        featureProperties: ['label'],
        aggregator: 'mean',
        activationFunction: 'sigmoid',
        randomSeed: 1337,
        sampleSizes: [3, 3]
    }
)

CALL gds.beta.graphSage.stream(
     'Project101',
    {
     modelName: 'Model101'
    }
)
YIELD nodeId, embedding

which gave me the embedding for each node,

nodeId | embedding
101    | [0.02, 0.015, 0.01 ...]
102    | [0.03, 0.013, 0.04 ...]

and:

CALL gds.eigenvector.stream(
     'Project101'
)
YIELD nodeId, score

which gave me the centrality score for each node

nodeId | score
101    | 0.95
102    | 0.90

Now I am trying to output a dataframe like

nodeId | score | embedding
101    | 0.95  | [0.02, 0.015, 0.01 ...]
102    | 0.90  | [0.03, 0.013, 0.04 ...]

by combining both outputs above. How should I write my Cypher to do this?

P.S. Please do not use write mode, and try to complete it only on Neo4j without any other coding platform.

I would suggest using the mutate mode and then gds.graph.nodeProperties.stream to combine them.

In general, I would advise you to look into our Python client for GDS if you want a pandas DataFrame.
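
The mutate-then-stream approach could look roughly like the sketch below. It reuses the graph name 'Project101' and the model name 'Model101' from the question; the property names 'embedding' and 'score' are illustrative choices for mutateProperty, not anything required by GDS. Each statement is meant to be run on its own. Mutate mode only adds the results to the in-memory projection, so nothing is written back to the database.

```cypher
// 1. Store the GraphSAGE embeddings on the in-memory graph.
//    'embedding' is an assumed property name.
CALL gds.beta.graphSage.mutate(
    'Project101',
    {
        modelName: 'Model101',
        mutateProperty: 'embedding'
    }
)
YIELD nodePropertiesWritten;

// 2. Store the eigenvector centrality on the same in-memory graph.
//    'score' is an assumed property name.
CALL gds.eigenvector.mutate(
    'Project101',
    {
        mutateProperty: 'score'
    }
)
YIELD nodePropertiesWritten;

// 3. Stream both properties back in a single call.
//    The result has one row per (node, property) pair.
CALL gds.graph.nodeProperties.stream(
    'Project101',
    ['embedding', 'score']
)
YIELD nodeId, nodeProperty, propertyValue
RETURN nodeId, nodeProperty, propertyValue;
```

The streamed result is in long format (one row per node and property), so a further grouping step is needed to get one row per node.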

Thank you for this cool combination method!
Using mutate mode and gds.graph.nodeProperties.stream, I now get

nodeId | nodeProperty | propertyValue
101    | "score"      | 0.95
101    | "embedding"  | [0.02, 0.015, 0.01 ...]
102    | "score"      | 0.90
102    | "embedding"  | [0.03, 0.013, 0.04 ...]

However, there is still one more step left to reach the form I want:

nodeId | score | embedding
101    | 0.95  | [0.02, 0.015, 0.01 ...]
102    | 0.90  | [0.03, 0.013, 0.04 ...]

Sure, I can use pandas or other Python packages to reshape it. However, my Neo4j server is much more powerful than my Python server, so I hope to finish it on Neo4j only.
Is there any Cypher for such a dataframe transformation?

I figured out the following Cypher, which works:

CALL gds.graph.nodeProperties.stream(
        'Project101',
        ['embedding', 'score']
)
YIELD nodeId, nodeProperty, propertyValue
// Group by nodeId. A CASE without ELSE yields null for non-matching rows,
// and collect() skips nulls, so each list holds exactly one value per node.
// This does not depend on the order in which the properties are streamed.
WITH
nodeId,
collect(CASE WHEN nodeProperty = "score" THEN propertyValue END)[0] AS score,
collect(CASE WHEN nodeProperty = "embedding" THEN propertyValue END)[0] AS embedding
RETURN nodeId, score, embedding
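
The long-to-wide reshaping can also be tried without a projected graph. The following is a minimal sketch that replaces the GDS stream with a few hand-written rows; the row values are toy data shortened from the tables in this thread, not real results. It relies on two Cypher behaviours: a CASE expression without ELSE returns null when no branch matches, and collect() ignores null values.

```cypher
// Toy rows standing in for the output of gds.graph.nodeProperties.stream.
// Values are illustrative only.
UNWIND [
    {nodeId: 101, nodeProperty: 'score',     propertyValue: 0.95},
    {nodeId: 101, nodeProperty: 'embedding', propertyValue: [0.02, 0.015, 0.01]},
    {nodeId: 102, nodeProperty: 'embedding', propertyValue: [0.03, 0.013, 0.04]},
    {nodeId: 102, nodeProperty: 'score',     propertyValue: 0.90}
] AS row
// nodeId is the grouping key; each collect() keeps only the matching property.
WITH row.nodeId AS nodeId,
     collect(CASE WHEN row.nodeProperty = 'score'
                  THEN row.propertyValue END)[0] AS score,
     collect(CASE WHEN row.nodeProperty = 'embedding'
                  THEN row.propertyValue END)[0] AS embedding
RETURN nodeId, score, embedding
ORDER BY nodeId
```

The rows for node 102 are deliberately listed in the opposite order to those for node 101, to show that the result does not depend on which property arrives first.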

Glad you got something to run.

In the Python client we offer a parameter called separate_property_columns, where we do this transformation for you.