Jaccard in Alpha forever

I was a big fan of Neo4j v3.4.12 and the state of graph algorithms there, in particular Jaccard Similarity. Ever since I upgraded, the GDS library has had Jaccard in Alpha status.

Question for Alicia and her team: Are you moving everyone over to Node Similarity? And will Jaccard eventually be deprecated?


Check this link, especially the recent update by abk.

I think we've found that the node similarity procedure solved the problem that users had better than the Jaccard one did.

With Jaccard you had to build up lists of arrays before computing, whereas node similarity computes its result directly from the graph structure. The majority of users can solve their problems with node similarity.

Do you have some old code that uses Jaccard in Graph Algos that you're trying to translate to GDS? Perhaps I can help you translate it if you share the query.

We do consider Jaccard part of the algorithms library, but as you correctly guessed and @markhneedham explained, Node Similarity or KNN are better choices than Jaccard or Cosine similarity for performance at scale. Node Similarity uses the Jaccard similarity metric, but it leverages neighboring nodes instead of properties or lists.
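For reference, the Jaccard metric is the standard set-overlap ratio. The sketch below writes it for the neighbor sets of two nodes; the names N(a), N(b) and the songs s_1 to s_4 are illustrative assumptions, not data from this thread.

```latex
\[
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
\]
% Illustrative example: N(a) = \{s_1, s_2, s_3\}, \quad N(b) = \{s_2, s_3, s_4\}
\[
J\bigl(N(a), N(b)\bigr)
  = \frac{|\{s_2, s_3\}|}{|\{s_1, s_2, s_3, s_4\}|}
  = \frac{2}{4} = 0.5
\]
```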

We don't have any plans to deprecate or remove Jaccard, but we're not currently working on promoting it to the beta tier. Do you have a use case that can't be addressed with Node Similarity?


Mark: Here is the query. It determines the similarity of "Song" nodes via Jaccard:

MATCH (Guest{member:'Purple'})-[:PLAYS_SONG]->(t:Song)
WITH {item:id(t), categories: collect(id(Guest))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard(data, {similarityCutoff:0.2, write:true, writeRelationshipType:'SIMILAR_PURPLE', writeProperty:'score_purple'})
YIELD nodes, similarityPairs, stdDev, p25, p50, p75, p90, p95
RETURN nodes, similarityPairs, stdDev, p25, p50, p75, p90, p95

Also, since I will now be using GDS 1.1.1 with graph projections: I don't seem to be able to create a graph projection whose size is reduced by filtering on node properties. Using the template below, I can't work out how to filter the projection based on a node property:

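CALL gds.graph.create(
    'my-graph', {
        City: {
            properties: {
                stateId: {
                    property: 'stateId'  // wanted: only include nodes WHERE stateId = "CA"
                },
                population: {
                    property: 'population'
                }
            }
        }
    },
    '*'  // relationship projection was cut off in the post; '*' (all types) is assumed
)
YIELD graphName, nodeCount, relationshipCount;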

So for your first query, you can compute the similarity between all guests like this:

CALL gds.nodeSimilarity.stream({
  nodeProjection: ['Guest', 'Song'],
  relationshipProjection: "PLAYS_SONG"
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1) AS node1,
       gds.util.asNode(node2) AS node2,
       similarity

And then if you wanted to create a graph first it'd look like this instead:

Create graph:

CALL gds.graph.create(
    'myGraph',
    ['Guest', 'Song'],
    'PLAYS_SONG'
)

Run algorithm:

CALL gds.nodeSimilarity.stream("myGraph")
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1) AS node1,
       gds.util.asNode(node2) AS node2,
       similarity
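
Since the original algo.similarity.jaccard call wrote its results back to the graph (write:true), a write-mode counterpart may be closer to what is needed. This is only a minimal sketch, assuming the named graph 'myGraph' from the create step exists and that gds.nodeSimilarity.write is available in the installed GDS 1.x release. It reuses the cutoff, relationship type and property name from the original query. Like the stream call, it compares Guest nodes by the Songs they play, and it does not apply the member:'Purple' filter.

```cypher
// Sketch only: assumes the named graph 'myGraph' has already been created.
CALL gds.nodeSimilarity.write('myGraph', {
  similarityCutoff: 0.2,                    // same threshold as in the algo.similarity.jaccard call
  writeRelationshipType: 'SIMILAR_PURPLE',  // relationship type taken from the original query
  writeProperty: 'score_purple'             // relationship property that stores the score
})
YIELD nodesCompared, relationshipsWritten
RETURN nodesCompared, relationshipsWritten
```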

Thank you very much, Mark. One other issue:
In our graph model there are more than 2.6 million Guests. To reduce the size of the graph projection and its memory footprint, we wish to filter the Guests on a particular node property (call it 'tier'). From my review of the documentation, the only possible method seems to be to use a parameter identifying the value of 'tier' that we wish to filter on. Is this correct, or is there another way?
Thank you again.