How does FastRPExtended from GDS work?

I have been working with FastRP and have started to use the extended version of the algorithm, which can handle node properties. However, the original FastRP paper does not have this capability, and I am struggling to understand how it works from the documentation (Fast Random Projection - Neo4j Graph Data Science).

Does anyone have a source on how this is implemented behind the scenes (including mathematical details), or, if not, can anyone explain how it works? I understand that the property portion of the FastRPExtended embedding comes from a linear combination of randomly generated property vectors, but why would we expect this to be useful in theory? Furthermore, if an attribute of the nodes is a vector that encodes similarity between the nodes in a certain domain (e.g. a text embedding associated with each node), is FastRPExtended able to make use of this notion of similarity in the resulting embeddings? Essentially, I would just like to understand in greater detail how this algorithm works and what it achieves.

Good question! We're updating our documentation to cover this, but for now...

The FastRPExtended embedding concatenates an embedding describing the graph topology with an embedding that describes a node's properties and the properties of the surrounding graph. When you set embeddingDimension to X and propertyDimension to Y, the first (X-Y) entries of the embeddings, and of all intermediate embeddings created during the run, behave like classic FastRP, and the last Y entries encode the feature embedding. The two parts are almost independent, except that we do some L2 normalizations after each iteration, and then the entire vectors are normalized across all X entries.
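To make the layout concrete, here is a minimal NumPy sketch of how one finished embedding splits into its two parts. It is not GDS code: the values X = 128 and Y = 64 are illustrative, the embedding is a stand-in array, and the final whole-vector L2 normalization is only a rough picture of the normalization mentioned above.

```python
import numpy as np

embedding_dimension = 128  # X (illustrative value)
property_dimension = 64    # Y (illustrative value)

# Stand-in for one node's embedding; a real one would come from the algorithm.
embedding = np.arange(1, embedding_dimension + 1, dtype=float)

# The first X - Y entries carry the topology part, as in classic FastRP.
topology_part = embedding[: embedding_dimension - property_dimension]

# The last Y entries carry the property (feature) part.
property_part = embedding[embedding_dimension - property_dimension :]

# Assumed picture of the final step: L2-normalize over all X entries at once.
normalized = embedding / np.linalg.norm(embedding)
```

Because the normalization runs over the whole vector, the relative scale of the two parts affects how much each one contributes to similarity scores.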

Explaining how it works probably makes the most sense with an example. Let's say you're computing embeddings of length 128, you have set propertyDimension to 64, and your nodes have 5 properties. What FastRPExtended will do is:

  1. Your first node property out of the 5 gets assigned a 64-dimensional random vector. The same happens for the other 4 node properties: each also gets its own 64-dimensional random vector.
  2. If you have a node whose values are (n {p1: 1, p2: 2, p3: 3, p4: 4, p5: 5}), then we construct a vector for that node which is 1 * <the random vector for p1> + 2 * <the random vector for p2> + ... + 5 * <the random vector for p5>, where the "random vector for p" are the ones obtained in step 1.
  3. Take the vector from step 2 and use it to fill the last 64 entries of the "initial vector" for the node.
  4. Continue running normal FastRP, ignoring that the initial vectors for the nodes have been initialized in a different way.
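Steps 1 to 3 can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not the GDS implementation: the property vectors are drawn from a dense Gaussian and the topology part from {-1, 0, 1}, whereas the actual distributions and scaling used by the library may differ. The names `property_vectors` and `initial_vector` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

embedding_dimension = 128
property_dimension = 64
property_names = ["p1", "p2", "p3", "p4", "p5"]

# Step 1: one 64-dimensional random vector per property, shared by all nodes.
# Assumption: dense Gaussian entries; the real distribution may be different.
property_vectors = {
    name: rng.standard_normal(property_dimension) for name in property_names
}

def initial_vector(node_properties, rng):
    """Build a hypothetical initial vector for one node."""
    topology_dimension = embedding_dimension - property_dimension
    # Placeholder for the classic FastRP random initialization (assumption).
    topology_part = rng.choice([-1.0, 0.0, 1.0], size=topology_dimension)

    # Step 2: linear combination of property vectors weighted by the node's values.
    property_part = np.zeros(property_dimension)
    for name, value in node_properties.items():
        property_part += value * property_vectors[name]

    # Step 3: the property part fills the last 64 entries.
    return np.concatenate([topology_part, property_part])

node = {"p1": 1, "p2": 2, "p3": 3, "p4": 4, "p5": 5}
vector = initial_vector(node, rng)
```

Step 4 would then propagate these initial vectors over the graph exactly as FastRP does for its usual random initial vectors, so the last 64 entries of a node's embedding end up mixing the property vectors of its neighbourhood.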