This website uses cookies. By clicking Accept, you consent to the use of cookies. Click Here to learn more about how we use cookies.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results forΒ

**Head's Up!** Site migration is underway. Phase 1: replicate users.

- Neo4j
- Technical Discussions
- Neo4j Graph Platform
- Which is better for evaluating FastRP embeddings s...

Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Options

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

β05-26-2022 08:24 AM

Hi everyone,

I am working on a huge graph consisting of 230M nodes and 600M edges. I trained embeddings with using built-in FastRP algorithm. The final objective is finding similar nodes with reference pivot embedding. When evaluating embeddings, cosine similarities are observed very similar even nodes are not similar in real life perspective. When I got similarity with using euclidean distance results have become more logical. I dont understand which approach is better for finding similar nodes ? And, Why cosine similarities are too similar? Is it related to training algorithm ?

Thanks.

2 REPLIES 2

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

β06-03-2022 01:44 AM - edited β06-03-2022 01:51 AM

That's a great question. FastRP unlike methods like Node2Vec and GraphSAGE, does not explicitly minimize for low cosine (dot product really) similarity between node embeddings of far between nodes. Therefore it definitely seems possible that two topologically very far between nodes still by chance could have a high cosine similarity, though I'm still a bit surprised considering the initial sparsity of the projection.

Having said that, kNN with cosine as similarity measure is the metric they use in the FastRP paper [0] to measure the quality of the node embeddings (it's also commonly used with word embeddings). The key observation here is that for each node we look at it's most similar kNN neighbors and if those are reasonable, then the embedding is deemed good. So note that using this metric it's still possible to have a "good embedding" while topologically far between nodes can have a somewhat high cosine similarity, so long as the similarity of actually relevant nodes to each of those nodes are even higher. Hope that makes sense. I would also like to highlight that it's perfectly possible to try this metric for yourself as GDS supports kNN with cosine similarity https://neo4j.com/docs/graph-data-science/current/algorithms/knn/ π

As for euclidean distance, considering how the internals of the FastRP algorithm works, it does seem like a metric that the algorithm would do well on. Since at each iteration a node's embedding is a weighted summation of its neighbors' embeddings, it should be geometrically close to those topologically near it, especially after more iterations. And the sparsity of the random initialization probably makes it unlikely that far between nodes have similar embeddings if not too many iterations are run. This would be my intuition at least.

[0]: https://arxiv.org/abs/1908.11512

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

β06-12-2022 10:03 PM - edited β06-12-2022 10:05 PM

Embedding dimensions and the iteration vector length and magnitude of individual elements in the iteration vector make a difference between keeping similar nodes (ones that are close by in the graph) closer in the embedding space and dissimilar nodes (ones that are few hops away and not in the vicinity of the target node) farther than the similar nodes.

Nodes 2022

All the sessions of the conference are now available online