I have a graph that links every mammalian species by its taxon. Below is a small example for Hominoidea:
There are five organisms (HSA, PPS, PTR, GGO, PON) at the end of this lineage. Only organisms at the end of a lineage have the property kegg=kegg_genome_id. Each of these nodes has relationships to nodes of a different type, labelled KO (functional orthologs). See the example below for just two organisms. The same KO node can link to many mammalian organisms, such as elephant, human, or mouse (or even to all mammals).
This results in a network with 337 taxa nodes (111 of which are organisms), 12142 KO nodes, and over 1,200,000 relationships.
Now I want to build a model that predicts, based on KO, whether a given species belongs to Euarchontoglires. Every organism node that is linked to Euarchontoglires has the property category=1. The rest of the organisms have the property category=0.
This was just an introduction.
What I want to know is how I can calculate node2vec ONLY for these organism nodes. We do not want to have embeddings for KO nodes.
I have a projected graph:
I do not know how to call gds.beta.node2vec.write so that it writes embeddings only for the nodes that I will later use for ML.
MATCH (n:Taxa) WHERE n.kegg IS NOT NULL RETURN n.name, n.category, n.n2v_all_nodes
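For reference, here is a minimal, untested sketch of one possible way to do this. Instead of write mode, it uses stream mode and stores the embedding only on organism nodes. The graph name 'mammals', the property name n2v_organisms and the embeddingDimension value are placeholders I made up, not part of my actual setup. The random walks still run over the whole projection, so KO nodes still influence the embeddings, but nothing is written to them.

```cypher
// Stream embeddings for every node in the projection
// ('mammals' is a hypothetical projected graph name).
CALL gds.beta.node2vec.stream('mammals', {embeddingDimension: 64})
YIELD nodeId, embedding
// Map the internal id back to the stored node.
WITH gds.util.asNode(nodeId) AS n, embedding
// Keep only organism nodes, i.e. Taxa nodes that carry a kegg property.
WHERE n:Taxa AND n.kegg IS NOT NULL
// Store the embedding on those nodes only (hypothetical property name).
SET n.n2v_organisms = embedding
RETURN count(n) AS organismsWritten
```

With my data I would expect this to touch only the 111 organism nodes, and the query above would then return n.n2v_organisms instead of n.n2v_all_nodes.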