
Text Similarity: Compare text property of one node to all other nodes and create relationship

sagarhowal

I have a graph database which will be populated with nodes containing text messages. Every time a node is saved, I need to calculate its similarity to the other nodes. The similarity metric can be any of the text similarity functions available within APOC [https://neo4j.com/docs/labs/apoc/current/misc/text-functions/#text-functions-text-similarity]. When the similarity is more than (say) 0.5, the query should create a SIMILAR_TO relationship between the compared nodes.

My graph looks roughly like this: [image not included]

As of now, this is a learning project/PoC.
I am looking for a Cypher query or a stored procedure.
Can someone give me pointers on how to structure the query and anything else I must know before doing this?

I am aware that the cost will grow quickly as the number of nodes increases, since each new node is compared against every existing one. But for now, I am not worrying about that.

I am using Neo4j version 4.0.3 and the Python driver to create nodes.

Thanks.

2 REPLIES

When you create your node, you can do the comparison right after the insertion and create the relationship in the same query.

CREATE (m:Message {...})
MATCH (o:Message) 
WITH o,m, apoc.text.distance(m.Text, o.Text) as similarity
WHERE similarity > 0.5
CREATE (m)-[:SIMILAR {similarity:similarity}]->(o)
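
One thing to check in the query above: apoc.text.distance returns the Levenshtein edit distance, which is an integer count of edits rather than a score between 0 and 1, so a threshold of 0.5 may not mean what the question intends. The APOC text functions also include apoc.text.levenshteinSimilarity, which returns a value between 0 and 1. The plain-Python sketch below only illustrates the difference between the two kinds of value. Normalizing by the longer string's length is an assumption about how such a score is usually derived, not a copy of APOC's implementation, and the function names are made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

With a raw edit distance, `> 0.5` is true for any two texts that differ by at least one character; with a normalized score it selects only texts that are more than half alike.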

sagarhowal

Thank you Micheal.

This is the error I was getting.

Neo.ClientError.Statement.SyntaxError

WITH is required between CREATE and MATCH (line 2, column 1 (offset: 48)) "MATCH (o:Message)"

I added a WITH:

CREATE (m:Message {...})
WITH m // Edit
MATCH (o:Message) 
WITH o,m, apoc.text.distance(m.Text, o.Text) as similarity
WHERE similarity > 0.5
CREATE (m)-[:SIMILAR {similarity:similarity}]->(o)

This also creates a relationship from the created node to itself.
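
The self-relationship appears because MATCH (o:Message) also finds the node that was just created. One possible way around it is to filter that node out with WHERE o <> m. Below is a minimal sketch of how this could look from Python. It assumes the Message label and Text property used in this thread; the $text and $threshold parameters, the SIMILAR_TO relationship type (taken from the question rather than the reply), and the save_message helper are illustrative. It also swaps apoc.text.distance for apoc.text.levenshteinSimilarity so that the 0.5 threshold applies to a score between 0 and 1.

```python
# Cypher: create the node, then compare it against every *other* message.
SAVE_AND_LINK = """
CREATE (m:Message {Text: $text})
WITH m
MATCH (o:Message)
WHERE o <> m
WITH m, o, apoc.text.levenshteinSimilarity(m.Text, o.Text) AS similarity
WHERE similarity > $threshold
CREATE (m)-[:SIMILAR_TO {similarity: similarity}]->(o)
"""


def save_message(tx, text, threshold=0.5):
    """Run the query on a transaction-like object that exposes run().

    `tx` is expected to behave like a Neo4j driver transaction;
    the parameters are passed separately instead of being
    formatted into the query string.
    """
    return tx.run(SAVE_AND_LINK, text=text, threshold=threshold)
```

With the 4.x Python driver this would typically be called as `session.write_transaction(save_message, "some text")`, assuming an open session.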

So then I matched the existing nodes first and created the new node afterwards, like this:

MATCH (o:Message) 
WITH o
CREATE (m:Message {...})
WITH o,m, apoc.text.distance(m.Text, o.Text) as similarity
WHERE similarity > 0.5
CREATE (m)-[:SIMILAR {similarity:similarity}]->(o)

But this creates 2 additional nodes, and I don't understand why that happens.
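
A likely explanation for the extra nodes: in Cypher, each clause runs once per incoming row. MATCH (o:Message) yields one row per existing message, so the CREATE that follows executes once per row. Two existing messages would therefore produce two new nodes, and an empty graph would produce none at all. The following is only a toy Python model of that row behaviour, not the Neo4j engine, and the function names are made up for illustration.

```python
def match_then_create(existing_messages, new_text):
    """Model of: MATCH (o:Message) WITH o CREATE (m:Message {...})."""
    rows = [{"o": msg} for msg in existing_messages]  # one row per match
    created = []
    for row in rows:              # CREATE runs once for every row
        row["m"] = new_text
        created.append(new_text)
    return created                # as many new nodes as there were rows


def create_then_match(existing_messages, new_text):
    """Model of: CREATE (m) WITH m MATCH (o:Message) WHERE o <> m."""
    created = [new_text]          # CREATE sees a single starting row
    # The pre-existing messages stand in for "every Message except m".
    rows = [{"m": new_text, "o": msg} for msg in existing_messages]
    return created, rows


# Two existing messages: the first ordering creates two nodes,
# the second creates one node and two candidate pairs to compare.
duplicates = match_then_create(["hi there", "hello"], "hello!")
created, pairs = create_then_match(["hi there", "hello"], "hello!")
```

Under this reading, creating the node first (and excluding it from the later match) keeps the number of created nodes at one regardless of how many messages already exist.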
