How does FastRPExtended from gds work?

ben_akres · October 26, 2021, 6:02am

I have been working with FastRP and have started to use the extended version of the algorithm which is able to handle node properties. However, the original paper for FastRP (https://arxiv.org/pdf/1908.11512.pdf) does not have this capability, and I am struggling to understand how this works from the documentation (Fast Random Projection - Neo4j Graph Data Science).

Does anyone have a source on how this is implemented behind the scenes (including mathematical details), or, if not, can anyone explain how it works? I understand that the property portion of the FastRPExtended embedding comes from a linear combination of randomly generated property vectors, but why would we expect this to be useful in theory? Furthermore, if an attribute of the nodes is a vector which encodes similarity between the nodes in a certain domain (e.g. a text embedding associated with each node), is FastRPExtended able to make use of this notion of similarity in the resulting embeddings? Essentially, I would just like to understand how this algorithm works and what it achieves in greater detail.

alicia_frame1 · November 1, 2021, 5:13pm

Good question! We're updating our documentation to cover this, but for now...

The FastRPExtended embedding concatenates an embedding describing the graph topology with an embedding that describes a node's properties and the properties of the surrounding graph. When you set embeddingDimension to X and propertyDimension to Y, the first (X-Y) entries of the embeddings, and of all intermediate embeddings created during the run, behave like classic FastRP, and the last Y entries encode the feature embedding. The two parts are almost independent, except that we do some L2 normalizations after each iteration, and those normalize entire vectors across all X entries rather than each part separately.
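To make the layout concrete, here is a minimal Python sketch of how a finished embedding splits into its two parts, and why normalizing over the whole vector couples them. The helper names `split_embedding` and `l2_normalize` are hypothetical and not part of GDS, and the toy input values are made up; exactly where GDS applies its normalizations is not spelled out in this thread.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def split_embedding(embedding, property_dimension):
    """Return (topology part, property part) of one embedding.

    The first (X - Y) entries are the topology part and the last Y entries
    are the property part, with X = len(embedding), Y = property_dimension.
    """
    topology_dim = len(embedding) - property_dimension
    return embedding[:topology_dim], embedding[topology_dim:]

# Toy embedding with embeddingDimension = 128 and propertyDimension = 64.
embedding = [float(i) for i in range(1, 129)]
topology_part, property_part = split_embedding(embedding, 64)

# The norm is computed over all 128 entries, so the scale applied to the
# property half depends on the topology half too (and vice versa).
normalized = l2_normalize(embedding)
```

Because the norm runs over the whole vector, a node with a very large topology part ends up with a proportionally smaller property part after normalization, which is the sense in which the two halves are only "almost" independent.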

Explaining how it works probably makes the most sense with an example. Let's say you're computing an embedding of length 128, you have set propertyDimension to 64, and you have 5 properties. What FastRPExtended will do is:

1. Your first node property out of the 5 gets assigned a 64-dimensional random vector.
2. The same happens for the other 4 node properties: each of them also gets a 64-dimensional random vector.
3. If you have a node whose values are (n {p1: 1, p2: 2, p3: 3, p4: 4, p5: 5}), then we construct a vector for that node which is 1 * <the random vector for p1> + 2 * <the random vector for p2> + ... + 5 * <the random vector for p5>, where the "random vectors for p" are the ones obtained in steps 1 and 2.
4. Take the vector from step 3 and use it to fill the last 64 entries of the "initial vector" for the node.
5. Continue running normal FastRP, ignoring that the initial vectors for the nodes have been initialized in a different way.
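
The initialization described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the GDS implementation: the function names are hypothetical, the random vectors are drawn from a Gaussian purely for simplicity (the distribution GDS uses is not given in this thread), and the topology half of the initial vector is filled with zeros as a placeholder for the classic FastRP initialization.

```python
import random

def random_property_vectors(property_names, property_dimension, seed=42):
    """Assign one fixed random vector to each property name.

    Assumption: Gaussian entries, chosen only for illustration.
    """
    rng = random.Random(seed)
    return {
        name: [rng.gauss(0.0, 1.0) for _ in range(property_dimension)]
        for name in property_names
    }

def initial_property_part(node_properties, property_vectors, property_dimension):
    """Linear combination of the property vectors, weighted by the node's values."""
    combined = [0.0] * property_dimension
    for name, value in node_properties.items():
        vec = property_vectors[name]
        for i in range(property_dimension):
            combined[i] += value * vec[i]
    return combined

def initial_vector(topology_part, property_part):
    """Place the property part in the last entries of the node's initial vector."""
    return list(topology_part) + list(property_part)

embedding_dimension = 128
property_dimension = 64
names = ["p1", "p2", "p3", "p4", "p5"]

vectors = random_property_vectors(names, property_dimension)
node = {"p1": 1, "p2": 2, "p3": 3, "p4": 4, "p5": 5}
prop_part = initial_property_part(node, vectors, property_dimension)

# Placeholder: a real run would use the classic FastRP random initialization here.
topo_part = [0.0] * (embedding_dimension - property_dimension)
init = initial_vector(topo_part, prop_part)
```

Since the property vectors are shared by all nodes, two nodes with similar property values start from similar property parts, and the usual FastRP iterations then average those parts over each node's neighbourhood.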
