The topic title is basically my question. I reviewed the fastRP documentation and it appears (implicitly) that a unipartite projection is required for both fastRP and fastRPExtended algorithms. Is this true?
That seems odd, given that the general configuration for algorithm execution on a named graph includes a `nodeLabels` option.
FastRP (and FastRPExtended) were originally designed with monopartite graphs in mind (1 node type / 1 relationship type).
It's possible to run them on heterogeneous graphs. In theory, the embeddings themselves should encode that the nodes are different, because the topology around different types of nodes differs significantly (e.g. people and restaurants: each person visits a few different restaurants, while each restaurant is visited by many people).
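To make the "different topology" point concrete, here is a tiny Python sketch on hypothetical toy data (the names and visits are made up). It only counts degree, which is one easy-to-see structural difference; FastRP doesn't compute degree explicitly, but its neighbourhood propagation is sensitive to this kind of structure.

```python
from collections import Counter

# Hypothetical toy data: (person, restaurant) "visited" pairs.
visits = [
    ("alice", "bistro"), ("alice", "diner"),
    ("bob", "bistro"), ("bob", "diner"),
    ("carol", "bistro"),
    ("dave", "bistro"), ("dave", "diner"),
]

def degrees(edges):
    """Count how many edges touch each node (undirected degree)."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

deg = degrees(visits)
people = {p for p, _ in visits}
restaurants = {r for _, r in visits}

# Average degree per node type: people have few connections, restaurants many.
avg_person = sum(deg[p] for p in people) / len(people)
avg_restaurant = sum(deg[r] for r in restaurants) / len(restaurants)
print(f"average person degree: {avg_person:.2f}")
print(f"average restaurant degree: {avg_restaurant:.2f}")
```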
You can also one-hot encode the labels themselves as properties, which is what we do in GraphSAGE. You'd add a property like `NodeLabel` with a vector over the possible labels, e.g. `[0, 1]` for a restaurant, `[1, 0]` for a person, etc. However, if you take this approach, you'll also need to pad out missing features (say people have ages but restaurants don't, and restaurants have a number of stars but people don't) by loading them into the in-memory graph with a default value (like 0).
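A minimal Python sketch of that feature layout, assuming hypothetical `Person`/`Restaurant` nodes with `age` and `stars` properties (the names are illustrative). Each node gets a one-hot label vector (`[1, 0]` for a person, `[0, 1]` for a restaurant) followed by every numeric property, padded with 0 where the node lacks it. In GDS the padding would come from the default value you set when loading properties into the in-memory graph; here it is done in plain Python just to show the resulting vectors.

```python
# Hypothetical node records; property names are illustrative only.
LABELS = ["Person", "Restaurant"]   # fixed order defines the one-hot layout
FEATURES = ["age", "stars"]         # union of numeric properties across labels
DEFAULT = 0.0                       # padding value for missing properties

nodes = [
    {"label": "Person", "age": 34},
    {"label": "Restaurant", "stars": 4},
]

def one_hot(label, labels=LABELS):
    """Return a float one-hot vector with 1.0 at the label's position."""
    return [1.0 if label == candidate else 0.0 for candidate in labels]

def feature_vector(node):
    """One-hot label followed by every feature, padded with DEFAULT when absent."""
    padded = [float(node.get(name, DEFAULT)) for name in FEATURES]
    return one_hot(node["label"]) + padded

for node in nodes:
    print(node["label"], feature_vector(node))
```

Every node ends up with a vector of the same length, which is the point of the padding: the algorithm can't mix property lists of different shapes.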
You'll have to tune the algorithm parameters to make sure the embeddings distinguish the different types (node classification is a quick way to check; then adjust accordingly), but they should be able to encode the information. Because they're not explicitly built for heterogeneous graphs (unlike RESCAL or ComplEx), they won't be perfect, but they're a start, and much faster!
Thanks Alicia.
I have a follow-up question regarding one hot encoding.
Say I have a categorical property on a `Guest` node; call it `tier`.
There are multiple levels of tier (e.g. gold, silver, platinum, etc.). I one-hot encoded those tiers:
```cypher
MATCH (g:Guest)
WITH collect(DISTINCT g.tier) AS tiers
MATCH (g1:Guest)
SET g1.tierEmbedding = gds.alpha.ml.oneHotEncoding(tiers, [g1.tier])
RETURN count(*)
```
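The query's logic can be sketched in Python, assuming `gds.alpha.ml.oneHotEncoding(available, selected)` returns a 1 at each position of `available` that appears in `selected` and 0 elsewhere (that reading is an assumption here, not taken from the GDS docs). The guests and the helper name `one_hot_encoding` are hypothetical. Two things fall out: the result is a list of integers, which is what prompts the float question below, and the positions depend on the order of the tier list. Cypher doesn't guarantee the order of `collect(DISTINCT ...)` without an `ORDER BY`, so positions are consistent within one run but could differ between runs.

```python
def one_hot_encoding(available_values, selected_values):
    """1 where an available value is selected, else 0 (integers, like the Cypher output)."""
    selected = set(selected_values)
    return [1 if value in selected else 0 for value in available_values]

# Hypothetical guests; in Cypher the tier list comes from collect(DISTINCT g.tier).
guests = [
    {"name": "g1", "tier": "gold"},
    {"name": "g2", "tier": "silver"},
    {"name": "g3", "tier": "gold"},
]

tiers = []
for guest in guests:
    if guest["tier"] not in tiers:  # distinct values, in first-seen order
        tiers.append(guest["tier"])

# Every guest is encoded against the same tier list, so positions line up.
for guest in guests:
    guest["tierEmbedding"] = one_hot_encoding(tiers, [guest["tier"]])
    print(guest["name"], guest["tierEmbedding"])
```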
So far so good. However, I want to use that one-hot encoded property, along with others, as one of the `featureProperties` for the `fastRPExtended` algorithm. The `fastRPExtended` documentation notes that:
All property names must exist in the in-memory graph and be of type Float or List<Float>.
Is there a convenient way in Cypher to convert the list of integers generated by the one-hot encoding into floats, so that it can be used as a `fastRPExtended` feature property?
I brute forced it this way:
```cypher
MATCH (g:Guest)
UNWIND g.tierEmbedding AS x
WITH toFloat(x) AS newEmbeddingValue, g
WITH collect(newEmbeddingValue) AS newEmbedding, g
SET g.tierEmbedding = newEmbedding
```
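For what it's worth, Cypher list comprehensions can do the same element-wise cast in one `SET` without `UNWIND`/`collect`, e.g. `SET g.tierEmbedding = [x IN g.tierEmbedding | toFloat(x)]`, and they keep the element order. The Python below only sketches that logic: cast each element, then check that the result is a list of floats, the shape the `fastRPExtended` documentation asks for. The helper names are hypothetical, and whether GDS would accept an integer list directly isn't settled by this sketch.

```python
def to_float_list(values):
    """Cast every element to float, keeping order (like toFloat(x) per element)."""
    return [float(x) for x in values]

def is_float_list(values):
    """True when the value is a list whose elements are all floats."""
    return isinstance(values, list) and all(isinstance(x, float) for x in values)

tier_embedding = [0, 1, 0]  # integer output of the one-hot encoding
converted = to_float_list(tier_embedding)
print(converted, is_float_list(tier_embedding), is_float_list(converted))
```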
I'd also like to know if casting to float is necessary.
I think that will be fixed in our next release, available next Thursday.