GDS node embedding aggregation

- Neo4j 5.5
- GDS 2.4.5

In my GDS project, I've computed an embedding for each node with GraphSAGE. Now I'd like to aggregate all of these node embeddings into a single vector to use as my graph embedding, so that I can check the cosine similarity between two graph embeddings.

My current solution is to stream the node embedding results to my Python client and aggregate them there. However, both the export procedure and the Python aggregation take a long time.

Does GDS (or any other Neo4j plugin) have a built-in function for this kind of aggregation? If not, are there any suggestions for aggregation methods?

Hi @yuzr1,

Did you try using APOC's aggregation functionality?

If that doesn't suit your use-case there are at least two ways to speed up the export of the node embeddings a great deal:

  • Fastest: (Requires GDS Enterprise Edition) Enable the GDS Arrow server (enabled by default in AuraDS), and then use gds.graph.nodeProperty.stream with the GDS Python client. You would first have to run GraphSAGE in mutate mode so the embeddings are stored on the in-memory graph, and then call the node property streaming method.
  • Faster than regular streaming: Install the Rust extension for the Neo4j Python driver. This should speed up any streaming done through the Python driver, including through the GDS client, which uses the driver under the hood.
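
On the aggregation step itself, one simple baseline is mean pooling: average the node embeddings element-wise and compare the resulting graph vectors with cosine similarity. The sketch below only illustrates that idea with toy data. It is not a GDS feature, the function names `mean_pool` and `cosine_similarity` are hypothetical, and whether mean pooling is a good graph representation for your data is an assumption you would need to check.

```python
import math
from typing import List, Sequence


def mean_pool(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average a collection of equal-length node embeddings element-wise."""
    if not embeddings:
        raise ValueError("need at least one node embedding")
    dim = len(embeddings[0])
    totals = [0.0] * dim
    for vec in embeddings:
        if len(vec) != dim:
            raise ValueError("all embeddings must have the same dimension")
        for i, value in enumerate(vec):
            totals[i] += value
    return [total / len(embeddings) for total in totals]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy data standing in for the node embeddings of two graphs (made up).
graph_a_nodes = [[1.0, 0.0], [0.0, 1.0]]
graph_b_nodes = [[2.0, 2.0], [4.0, 4.0]]

graph_a_embedding = mean_pool(graph_a_nodes)
graph_b_embedding = mean_pool(graph_b_nodes)
similarity = cosine_similarity(graph_a_embedding, graph_b_embedding)
```

With real data, the lists of node vectors would come from whatever export path you end up using. For large graphs, a vectorized library would likely do the averaging much faster than these plain Python loops.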

Hope this is helpful,
Adam