cancel
Showing results for 
Search instead for 
Did you mean: 

Graph Algorithms Book Not Finding Similarity Algorithms

ameyasoft
Graph Maven

In Graph_Algorithms_Neo4j book, I am not finding the Similarity Algorithms like Jaccard topic in the Index page. Am I missing something? Please let me know.
Thanks

1 ACCEPTED SOLUTION

I don't believe these are covered in the book. I believe the explanation was that although they can be useful in a graph context, they aren't really graph algorithms (they work on vector inputs). This omission is mentioned in the Other Algorithms section:

Other Algorithms

Many algorithms can be used with graph data. In this book, we’ve focused on those that are most representative of classic graph algorithms and those of most use to application developers. Some algorithms, such as coloring and heuristics, have been omitted because they are either of more interest in academic cases or can be easily derived.

Other algorithms, such as edge-based community detection, are interesting but have yet to be implemented in Neo4j or Apache Spark. We expect the list of graph algo‐ rithms used in both platforms to increase as the use of graph analytics grows.

There are also categories of algorithms that are used with graphs but aren’t strictly graphy in nature. For example, we looked at a few algorithms used in the context of machine learning in Chapter 8. Another area of note is similarity algorithms, which are often applied to recommendations and link prediction. Similarity algorithms work out which nodes most resemble each other by using various methods to com‐ pare items like node attributes.

So there's not really anything special about Jaccard or other similarity algorithms in a graph vs any other usage, I think, you just match on two nodes and use property vectors for those nodes in the similarity algorithm. The only really graphy thing about it is the similarity relationships you would create between the nodes being compared (provided you want to do so...it would probably be best to only add those relationships when they're above a certain similarity threshold), and subsequent querying for similar nodes using the relationships that were added as a consequence of running the similarity algorithms previously.

Or, more succinctly:

  1. Establish how you want to pair off nodes for running the similarity calculation (cartesian product (excluding mirrored pairings), or filtered down in some other way?), and what properties you want to include for the similarity calculation.
  2. Figure out if you just want to stream the results, or actually write out the relationships based on the similarity calculations.
  3. Run the query that does that, and use the similarity algorithm of choice, with the approach of choice (to stream or not to stream, to write the rels or not the write).
  4. Use the results (or later, query for similar nodes based on the similar relationships you wrote to the graph)

More info in our graph algo documentation, and here's the section for Jaccard similarity

View solution in original post

2 REPLIES 2

I don't believe these are covered in the book. I believe the explanation was that although they can be useful in a graph context, they aren't really graph algorithms (they work on vector inputs). This omission is mentioned in the Other Algorithms section:

Other Algorithms

Many algorithms can be used with graph data. In this book, we’ve focused on those that are most representative of classic graph algorithms and those of most use to application developers. Some algorithms, such as coloring and heuristics, have been omitted because they are either of more interest in academic cases or can be easily derived.

Other algorithms, such as edge-based community detection, are interesting but have yet to be implemented in Neo4j or Apache Spark. We expect the list of graph algo‐ rithms used in both platforms to increase as the use of graph analytics grows.

There are also categories of algorithms that are used with graphs but aren’t strictly graphy in nature. For example, we looked at a few algorithms used in the context of machine learning in Chapter 8. Another area of note is similarity algorithms, which are often applied to recommendations and link prediction. Similarity algorithms work out which nodes most resemble each other by using various methods to com‐ pare items like node attributes.

So there's not really anything special about Jaccard or other similarity algorithms in a graph vs any other usage, I think, you just match on two nodes and use property vectors for those nodes in the similarity algorithm. The only really graphy thing about it is the similarity relationships you would create between the nodes being compared (provided you want to do so...it would probably be best to only add those relationships when they're above a certain similarity threshold), and subsequent querying for similar nodes using the relationships that were added as a consequence of running the similarity algorithms previously.

Or, more succinctly:

  1. Establish how you want to pair off nodes for running the similarity calculation (cartesian product (excluding mirrored pairings), or filtered down in some other way?), and what properties you want to include for the similarity calculation.
  2. Figure out if you just want to stream the results, or actually write out the relationships based on the similarity calculations.
  3. Run the query that does that, and use the similarity algorithm of choice, with the approach of choice (to stream or not to stream, to write the rels or not the write).
  4. Use the results (or later, query for similar nodes based on the similar relationships you wrote to the graph)

More info in our graph algo documentation, and here's the section for Jaccard similarity

Thanks for your reply.