I am working on a huge graph consisting of 230M nodes and 600M edges. I trained embeddings with using built-in FastRP algorithm. The final objective is finding similar nodes with reference pivot embedding. When evaluating embeddings, cosine similarities are observed very similar even nodes are not similar in real life perspective. When I got similarity with using euclidean distance results have become more logical. I dont understand which approach is better for finding similar nodes ? And, Why cosine similarities are too similar? Is it related to training algorithm ?
That's a great question. FastRP unlike methods like Node2Vec and GraphSAGE, does not explicitly minimize for low cosine (dot product really) similarity between node embeddings of far between nodes. Therefore it definitely seems possible that two topologically very far between nodes still by chance could have a high cosine similarity, though I'm still a bit surprised considering the initial sparsity of the projection.
Having said that, kNN with cosine as similarity measure is the metric they use in the FastRP paper [0] to measure the quality of the node embeddings (it's also commonly used with word embeddings). The key observation here is that for each node we look at it's most similar kNN neighbors and if those are reasonable, then the embedding is deemed good. So note that using this metric it's still possible to have a "good embedding" while topologically far between nodes can have a somewhat high cosine similarity, so long as the similarity of actually relevant nodes to each of those nodes are even higher. Hope that makes sense. I would also like to highlight that it's perfectly possible to try this metric for yourself as GDS supports kNN with cosine similarity https://neo4j.com/docs/graph-data-science/current/algorithms/knn/
As for euclidean distance, considering how the internals of the FastRP algorithm works, it does seem like a metric that the algorithm would do well on. Since at each iteration a node's embedding is a weighted summation of its neighbors' embeddings, it should be geometrically close to those topologically near it, especially after more iterations. And the sparsity of the random initialization probably makes it unlikely that far between nodes have similar embeddings if not too many iterations are run. This would be my intuition at least.
Embedding vectors are normalized and order or ranking of similarity should be the same for both distance metrics. There may be a difference in the magnitude because of squared values but the order doesn't change.
Embedding dimensions and the iteration vector length and magnitude of individual elements in the iteration vector make a difference between keeping similar nodes (ones that are close by in the graph) closer in the embedding space and dissimilar nodes (ones that are few hops away and not in the vicinity of the target node) farther than the similar nodes.