Unipartite graph projection required for FastRPExtended

stu_v_kerr · July 1, 2021, 4:29am

The topic title is basically my question. I reviewed the fastRP documentation, and it appears (implicitly) that a unipartite projection is required for both the fastRP and fastRPExtended algorithms. Is this true?
I ask because the general configuration for algorithm execution on a named graph has an option to include node labels.

alicia_frame1 · July 1, 2021, 4:05pm

FastRP (and FastRPExtended) were originally designed with monopartite graphs in mind (1 node type / 1 relationship type).

It's possible to run them on heterogeneous graphs. In theory, the embeddings themselves should encode that the nodes are different, because the topology around different types of nodes is significantly different (e.g., people and restaurants: people visit a few different restaurants, while each restaurant is visited by many people).

You can also one-hot encode the labels themselves as properties, which is what we do in GraphSAGE. You'd add a property like NodeLabel holding a vector over the possible labels: [0 1] for restaurant, [1 0] for person, etc. However, if you take this approach, you'll also need to pad out missing features by loading them into the in-memory graph with a default value (like 0). For example, people have ages but restaurants don't, and restaurants have a number of stars but people don't.
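
As a rough sketch of the one-hot label and padding steps, assuming GDS 1.x-style native projection and a hypothetical schema (Person and Restaurant labels, a VISITED relationship, float-valued age and stars properties, and a graph named 'visits'), the defaults could be supplied at projection time. None of these names come from a real dataset, and the procedure name and property-mapping syntax should be checked against the documentation for your GDS version:

```cypher
// Hypothetical schema: (:Person {age})-[:VISITED]->(:Restaurant {stars})
// Assumes age and stars are stored as floats.

// Step 1: write a one-hot label vector (as floats) onto every node.
MATCH (n)
WHERE n:Person OR n:Restaurant
SET n.NodeLabel = CASE WHEN n:Person THEN [1.0, 0.0] ELSE [0.0, 1.0] END;

// Step 2: project the graph, padding features a label lacks with 0.0.
CALL gds.graph.create(
  'visits',
  ['Person', 'Restaurant'],
  {VISITED: {orientation: 'UNDIRECTED'}},
  {
    nodeProperties: {
      NodeLabel: {property: 'NodeLabel'},
      age:       {property: 'age',   defaultValue: 0.0},
      stars:     {property: 'stars', defaultValue: 0.0}
    }
  }
)
YIELD graphName, nodeCount, relationshipCount;
```

With a projection like this, every node in the in-memory graph carries all three properties, so they can all be listed as feature properties regardless of label.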

You'll have to tune the embeddings to make sure they distinguish the different node types (try using node classification for a quick check, and adjust accordingly), but they should be able to encode the information. Because they're not explicitly built for heterogeneous graphs (unlike ReScal or ComplEx), they won't be perfect, but they're a start - and much faster!

stu_v_kerr · July 2, 2021, 3:57am

Thanks Alicia.
I have a follow-up question regarding one hot encoding.
Say I have a categorical property on a guest node; call the property guest.tier.

There are multiple levels for tier (e.g., gold, silver, platinum, etc.). I one-hot encoded those tiers.

`MATCH (g:Guest) 
WITH collect(distinct g.tier) as tiers
MATCH (g1:Guest) 
SET g1.tierEmbedding = gds.alpha.ml.oneHotEncoding(tiers, [g1.tier])
RETURN count(*)`

So far so good. However, I want to use that one-hot encoded property and others as featureProperties for the fastRPExtended algorithm. The fastRPExtended documentation notes that:
"All property names must exist in the in-memory graph and be of type Float or List<Float>."
Is there a convenient way in Cypher to convert the list of integers generated by the one-hot encoding into floats so that it can be used as a fastRPExtended featureProperty?

I brute forced it this way:

`MATCH (g:Guest)
UNWIND g.tierEmbedding AS x
WITH toFloat(x) AS newEmbeddingValue, g
WITH collect(newEmbeddingValue) AS newEmbedding, g
SET g.tierEmbedding = newEmbedding`
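
A more compact option might be a Cypher list comprehension, which casts each element without the UNWIND and collect round trip. This is only a minimal sketch, assuming the same Guest label and tierEmbedding property as above:

```cypher
// Cast every element of the one-hot list to a float in a single pass.
// Nodes without a tierEmbedding property are skipped.
MATCH (g:Guest)
WHERE g.tierEmbedding IS NOT NULL
SET g.tierEmbedding = [x IN g.tierEmbedding | toFloat(x)]
RETURN count(*)
```

The comprehension maps over the list in place, so element order is preserved by construction.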

frank.deviney · November 3, 2021, 2:53pm

I'd also like to know if casting to float is necessary.

alicia_frame1 · November 4, 2021, 4:48pm

I think that will be fixed in our next release, which will be available next Thursday.
