Let me quickly describe my graph.
I have a graph that links every mammalian species by its taxon. See the small example below for Hominoidea:
There are five organisms (HSA, PPS, PTR, GGO, PON) at the end of this lineage. Only organisms at the end of a lineage have the property kegg=kegg_genome_id. Each of these nodes has relationships to a different node type labelled KO (functional orthologs). See the example below for just two organisms. The same KO node can link to many mammalian organisms, such as elephant, human or mouse (or even to all mammals).
This results in a network with 337 taxon nodes (111 of which are organisms), 12,142 KO nodes and over 1,200,000 relationships.
Now I want to build a model that predicts, based on KO, whether a given species belongs to Euarchontoglires. Every organism node that is linked to Euarchontoglires has the property category=1. The rest of the organisms have the property category=0.
This was just an introduction.
What I want to know is how I can calculate node2vec ONLY for these organism nodes. We do not want to have embeddings for KO nodes.
I have a projected graph:
I do not know how to run gds.beta.node2vec.write only for the nodes that I will later use for ML.
Can you guide me?
You can probably use a Cypher projection instead, where arbitrary filters and pattern matches determine which nodes and relationships are projected into the in-memory graph.
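To make that concrete, here is a rough sketch of what such a projection could look like, written as Python that only assembles the Cypher calls. Several things here are assumptions, since the actual schema is not shown in the question: the `Taxon` label, the `HAS_KO` relationship type, the graph name `organisms`, and the `embedding` write property are all placeholders. The procedure name `gds.graph.project.cypher` is the GDS 2.x one; on GDS 1.x the equivalent was `gds.graph.create.cypher`. Because organisms only connect to KO nodes, the relationship query links two organisms whenever they share a KO, so the KO nodes never enter the in-memory graph and node2vec only produces embeddings for organisms.

```python
GRAPH_NAME = "organisms"  # hypothetical in-memory graph name

# Only organism nodes, i.e. taxa that carry the kegg property.
# The Taxon label is an assumption about the schema.
NODE_QUERY = (
    "MATCH (o:Taxon) WHERE o.kegg IS NOT NULL "
    "RETURN id(o) AS id"
)

# Connect two organisms when they share at least one KO node.
# HAS_KO is an assumed relationship type. The pattern is symmetric,
# so each pair is returned in both directions.
REL_QUERY = (
    "MATCH (a:Taxon)-[:HAS_KO]->(k:KO)<-[:HAS_KO]-(b:Taxon) "
    "WHERE a.kegg IS NOT NULL AND b.kegg IS NOT NULL AND a <> b "
    "RETURN id(a) AS source, id(b) AS target, count(k) AS sharedKo"
)

PROJECT_CALL = (
    "CALL gds.graph.project.cypher($graph, $nodeQuery, $relQuery) "
    "YIELD graphName, nodeCount, relationshipCount"
)

NODE2VEC_CALL = (
    "CALL gds.beta.node2vec.write($graph, $config) "
    "YIELD nodePropertiesWritten"
)


def build_steps(dim=64, write_property="embedding"):
    """Return (query, parameters) pairs: project first, then node2vec."""
    return [
        (PROJECT_CALL, {
            "graph": GRAPH_NAME,
            "nodeQuery": NODE_QUERY,
            "relQuery": REL_QUERY,
        }),
        (NODE2VEC_CALL, {
            "graph": GRAPH_NAME,
            "config": {
                "writeProperty": write_property,
                "embeddingDimension": dim,
            },
        }),
    ]


def run_steps(session, steps):
    """Run each step on a session object that exposes run(query, params)."""
    for query, params in steps:
        session.run(query, params).consume()
```

With the official `neo4j` Python driver, this would be used roughly as `run_steps(driver.session(), build_steps())`; the same two `CALL` statements can also be pasted into Neo4j Browser with the parameters filled in by hand. The `sharedKo` column ends up as a relationship property in the projection, so it could serve as a weight if your GDS version's node2vec accepts a relationship weight property.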