Calculate similarity for nodes at the same level and similarity between two sub-graph depths

In my graph I have nodes labelled Suite (6 nodes), Test (18 nodes), and Keyword (600 nodes), which are related to each other; for example, a Test calls a Keyword (a Keyword acts as a sub-test). I would like to find similarities between nodes of the same type and, at the same time, similarities in the depth of the sub-graph below each node.
I began my investigation with the Node Similarity algorithm, using this procedure call to create a virtual (in-memory) graph:

CALL gds.graph.create(
    'myGraph1',
    ['Test', 'Keyword'],
    {
        CALLS: {
            type: 'CALLS'
            
        }
    }
);

and then

CALL gds.nodeSimilarity.stream('myGraph1')
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS From, gds.util.asNode(node2).name AS To, similarity
ORDER BY similarity DESCENDING, From, To

with no result.

Also I tried the following:

CALL gds.graph.create.cypher(
    'my-cypher-graph_8',
    'MATCH (t:Test) RETURN id(t) AS id',
    'MATCH (t:Test)-[r:NEXT]->(k:Test) RETURN id(t) AS source, id(k) AS target'
)

giving as a result


The similarity call below seems syntactically correct, but it also returns no result:

CALL gds.nodeSimilarity.stream('my-cypher-graph_8')
YIELD node1, node2, similarity
RETURN *

Could you please tell me whether I can find the similarity between nodes of the same kind while also comparing the sub-graph sequence, and how I can improve my queries?

I run Neo4j 4.2.7 as a container and use the APOC plugin.

Thank you in advance.

It shouldn't return nothing. Can you try the following to debug what's going on:

  1. What are the metrics returned when you run gds.graph.create - how many nodes and how many relationships are loaded? If it's 0 relationships, that's a sign that the data you loaded is incorrect.

  2. Try dropping the YIELD statement and just running: CALL gds.nodeSimilarity.stream('myGraph1')

  3. Run statistics mode - gds.nodeSimilarity.stats to see how many nodes are compared.

Off the top of my head, you may end up with no results due to the directionality of the relationships (node similarity is built for a bipartite graph, where you'd have (:Test)-[:CALLS]->(:Keyword), or

Thank you, Alicia, for your reply. I went through the proposed debug steps, but no good news.
Regarding bullet 1: It seems that gds.graph.create does load some nodes and relationships, as in the following picture:

Regarding bullet 2: Even without the YIELD statement I get a "no changes no records" result.
Regarding bullet 3:

I am adding another question that I already posted to the #cypher channel, besides the above case (which I still need to investigate). Is there a way to compare graphs/subgraphs that don't share common nodes, based on their properties, in order to calculate similarities? The node similarity algorithms (Similarity algorithms - Neo4j Graph Data Science) do not seem to fit my case, as the graphs I want to compare don't share common nodes.

Probably the reason you're not getting any results is, then, due to the fact that your nodes don't have any similarity (set similarityCutoff to 0 to test the hypothesis). We calculate similarity between pairs of nodes based on the number of common neighbors (using Jaccard). If no nodes have common neighbors, then they're not similar.
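
To make the common-neighbor point concrete, here is a minimal Python sketch of Jaccard similarity over neighbor sets. It is only an illustration of the idea, not the GDS implementation; the test and keyword names are hypothetical, and the `cutoff` argument merely mimics the role of similarityCutoff under the assumption that pairs scoring below it are dropped.

```python
from itertools import combinations

# Hypothetical data: each Test maps to the set of Keywords it CALLS.
calls = {
    "test_login": {"open_browser", "enter_credentials", "click_submit"},
    "test_logout": {"open_browser", "click_logout"},
    "test_report": {"export_csv"},
}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(neighbors, cutoff):
    """Score every pair of nodes and keep those at or above the cutoff."""
    pairs = []
    for n1, n2 in combinations(sorted(neighbors), 2):
        score = jaccard(neighbors[n1], neighbors[n2])
        if score >= cutoff:
            pairs.append((n1, n2, score))
    return pairs

# A tiny positive cutoff hides pairs that share no neighbor at all.
print(similar_pairs(calls, 1e-42))
# A cutoff of 0 also lists the pairs whose similarity is 0.0.
print(similar_pairs(calls, 0.0))
```

In this toy data only test_login and test_logout share a keyword, so they are the only pair that survives a positive cutoff. If no two nodes share a neighbor, every score is 0.0 and a positive cutoff leaves nothing to return.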

We don't have anything out of the box to compare the similarity of entire graphs. You can use graph algorithms and compare, for example, average number of communities or average number of nodes per community, but we don't offer - for example - full graph embeddings, or graph isomorphism.
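
As a rough sketch of that kind of comparison, the Python below summarizes two graphs by their number of communities and average community size. It assumes the community ids were already produced by some community detection run elsewhere; the node names, ids, and the `community_summary` helper are all hypothetical.

```python
from collections import Counter

def community_summary(assignment):
    """Summarize a node -> community id mapping."""
    sizes = Counter(assignment.values())  # community id -> number of nodes
    count = len(sizes)
    avg = sum(sizes.values()) / count if count else 0.0
    return {"communities": count, "avg_nodes_per_community": avg}

# Hypothetical community assignments for two separate subgraphs.
graph_a = {"a1": 0, "a2": 0, "a3": 1, "a4": 1}
graph_b = {"b1": 0, "b2": 0, "b3": 0, "b4": 1, "b5": 2, "b6": 2}

summary_a = community_summary(graph_a)
summary_b = community_summary(graph_b)
print(summary_a, summary_b)
```

Such summaries only give a coarse, aggregate comparison; two graphs with matching statistics can still differ a lot in structure.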

One option, if you don't have nodes with common neighbors, but you still want to look at similarity, is the Node2Vec embedding - which can encode structural similarity, as well as topological - in combination with KNN (cosine similarity). Check out this blog post: Bringing traditional ML to your Neo4j Graph with node2vec | Dave Voutila
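
The KNN step can be sketched in plain Python as follows. This assumes each node already has an embedding vector (for example from Node2Vec); the vectors and node names below are made up for illustration, and this brute-force loop is not how GDS computes KNN.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def knn(embeddings, k):
    """For each node, return its k most similar other nodes by cosine similarity."""
    result = {}
    for node, vec in embeddings.items():
        scored = [
            (other, cosine(vec, other_vec))
            for other, other_vec in embeddings.items()
            if other != node
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[node] = scored[:k]
    return result

# Hypothetical 2-dimensional embeddings; real ones would have many more dimensions.
embeddings = {
    "test_a": [1.0, 0.0],
    "test_b": [0.9, 0.1],
    "test_c": [0.0, 1.0],
}

nearest = knn(embeddings, 1)
print(nearest)
```

Because the comparison works on vectors rather than shared neighbors, two nodes can come out as similar even when they have no neighbor in common.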

Indeed, Alicia, setting similarityCutoff to 0 gives 0.0 similarity as a result. Thank you for your answer; I will check the possibilities proposed.