I have a graph database that will be populated with nodes containing text messages. Every time a node is saved, I need to calculate its similarity to the other nodes. The similarity metric can be any of the text similarity functions available in APOC [https://neo4j.com/docs/labs/apoc/current/misc/text-functions/#text-functions-text-similarity]. When the similarity is above a threshold (say, 0.5), the query should create a SIMILAR_TO relationship between the two nodes compared.
My graph looks kind of like this:
As of now, this is a learning project/PoC.
I am looking for a cypher query or a stored procedure.
Can someone give me pointers on how to structure the query and anything else I must know before doing this?
I am aware that the number of comparisons will grow quickly as nodes are added, since each new node is compared against every existing one. But for now, I am not worrying about that.
I am using Neo4j 4.0.3 and the Python driver to create nodes.
Thanks.
You can do this when you create the node: right after inserting it, run the comparison and create the relationship.
CREATE (m:Message {...})
MATCH (o:Message)
WITH o,m, apoc.text.distance(m.Text, o.Text) as similarity
WHERE similarity > 0.5
CREATE (m)-[:SIMILAR {similarity:similarity}]->(o)
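One thing worth checking in the query above: as far as I know, apoc.text.distance returns the raw Levenshtein edit distance (an integer count of edits), not a score between 0 and 1, so `similarity > 0.5` would be true for any two texts that differ at all. A normalized function such as apoc.text.levenshteinSimilarity is probably closer to what the question asks for. The sketch below is plain Python, not APOC's code. The helper names are made up for illustration, and the normalization (1 minus the distance divided by the longer length) is an assumption about how such a score is usually defined.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the edit distance into a 0..1 score (assumed formula)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))
    print(levenshtein_similarity("kitten", "sitting"))
```

With a raw distance, a threshold of 0.5 only separates identical texts from everything else. With a normalized score, 0.5 roughly means "at least half of the longer text is unchanged".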
Thank you Micheal.
This is the error I was getting.
Neo.ClientError.Statement.SyntaxError
WITH is required between CREATE and MATCH (line 2, column 1 (offset: 48)) "MATCH (o:Message)"
I added a WITH
CREATE (m:Message {...})
WITH m // Edit: added this line
MATCH (o:Message)
WITH o,m, apoc.text.distance(m.Text, o.Text) as similarity
WHERE similarity > 0.5
CREATE (m)-[:SIMILAR {similarity:similarity}]->(o)
This also creates a relationship from the newly created node to itself.
So I then matched the existing nodes first and created the new node afterwards, like this:
MATCH (o:Message)
WITH o
CREATE (m:Message {...})
WITH o,m, apoc.text.distance(m.Text, o.Text) as similarity
WHERE similarity > 0.5
CREATE (m)-[:SIMILAR {similarity:similarity}]->(o)
But this creates 2 additional nodes, and I don't understand why that happens.
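A likely explanation for the extra nodes: MATCH (o:Message) produces one row per existing Message, and a CREATE that follows runs once for each row. With two Message nodes already in the graph, the CREATE runs twice. A possible way around both problems is to create the node first and then exclude it from the match with WHERE o <> m. Below is a minimal sketch using the Python driver, not a definitive implementation. It assumes the text is stored in a property called Text (as in the queries above), uses the SIMILAR_TO name from the original question, and swaps in apoc.text.levenshteinSimilarity to get a 0..1 score. The function name, connection URI, and credentials are placeholders.

```python
# Create the node once, then compare it with every *other* Message node.
LINK_SIMILAR_QUERY = """
CREATE (m:Message {Text: $text})
WITH m
MATCH (o:Message)
WHERE o <> m
WITH m, o, apoc.text.levenshteinSimilarity(m.Text, o.Text) AS similarity
WHERE similarity > $threshold
CREATE (m)-[:SIMILAR_TO {similarity: similarity}]->(o)
RETURN count(o) AS linked
"""


def create_and_link(tx, text, threshold=0.5):
    """Create one Message and link it to similar ones; return the link count."""
    result = tx.run(LINK_SIMILAR_QUERY, text=text, threshold=threshold)
    record = result.single()
    return record["linked"] if record else 0


def main():
    # Imported here so the helper above can be used without a live database.
    from neo4j import GraphDatabase

    # Placeholder connection details; replace with your own.
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    with driver.session() as session:
        # write_transaction is the 4.x driver call; newer driver versions
        # name it execute_write.
        linked = session.write_transaction(create_and_link, "hello world")
        print("relationships created:", linked)
    driver.close()
```

Because the CREATE comes before the MATCH, it runs exactly once, and the WHERE o <> m filter keeps the new node from being linked to itself.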