Similarity query: compare a string to a property on a node

peggyw · March 25, 2020, 9:27pm

I have a sequence string 'TTCTTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCT'

I have nodes with the label Sequence and a property seqFull, which contains a large DNA string.

I want to return the nodes and their similarity scores where the score is greater than 0.75 (75%), i.e. where the input string finds a similar string within the larger string stored on a node in Neo4j.

I'm not looking for an exact match with CONTAINS, but for something like CONTAINS that matches at 75% similarity or greater instead of exactly.

ameyasoft · March 26, 2020, 1:38am

You can use apoc.text.jaroWinklerDistance to get the similarity; it gives a much better result than an exact CONTAINS match.
I am using it in a production database for a different purpose. It requires the APOC library.
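A minimal sketch of how this could look against the data from the question, assuming the Sequence label and seqFull property described there and APOC installed. Two hedges: this scores the input against the whole seqFull string, not against a region inside it; and depending on the APOC version, apoc.text.jaroWinklerDistance may return a similarity (1.0 for identical strings) or a distance (0.0 for identical strings), so it is worth checking what it returns for two identical strings before relying on the >= 0.75 filter.

```cypher
// The input sequence is inlined as a literal here; a query parameter would work too.
WITH 'TTCTTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCT' AS input
MATCH (s:Sequence)
WHERE s.seqFull IS NOT NULL
// Upper-case both sides, assuming letter case should not affect the score.
WITH s, apoc.text.jaroWinklerDistance(toUpper(input), toUpper(s.seqFull)) AS score
// Assumes the function returns a similarity in your APOC version;
// if it returns a distance instead, filter on 1 - score.
WHERE score >= 0.75
RETURN s, score
ORDER BY score DESC
```

The toUpper calls matter here because the question's sequence is upper-case while the examples below are lower-case, and the comparison is character by character.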

Here is an example with two sequence strings that I got from the internet:

WITH "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaaccattg" AS seq1,
     "gaaccgccaatagacaacatatgtaacatatttaggatatacctcgaaaataataaaccg" AS seq2
// Scale the 0..1 score to a whole-number percentage (toInteger truncates).
RETURN toInteger(apoc.text.jaroWinklerDistance(seq1, seq2) * 100) AS similarity

Result:
similarity: 78

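// Same two sequences, written in blocks of 10 bases separated by spaces.
// The spaces are compared like any other character, so the score differs.
WITH "gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg" AS seq1,
     "gaaccgccaa tagacaacat atgtaacata tttaggatat acctcgaaaa taataaaccg" AS seq2
RETURN toInteger(apoc.text.jaroWinklerDistance(seq1, seq2) * 100) AS similarity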

Result:
similarity: 80
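The queries above compare two strings end to end. The question asks for something closer to a fuzzy CONTAINS: is there a region inside the long seqFull string that is at least 75% similar to the shorter input? One way to sketch that (an untested assumption, not something from this thread) is a sliding window: take every substring of seqFull with the same length as the input, score each one, and keep the best. This sketch swaps in apoc.text.levenshteinSimilarity, which returns 1.0 for identical strings, and uses apoc.coll.max to pick the best window; apoc.text.jaroWinklerDistance could be used instead, subject to the similarity-versus-distance caveat.

```cypher
WITH 'TTCTTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCT' AS input
MATCH (s:Sequence)
// Skip nodes with no sequence or a sequence shorter than the input.
WHERE s.seqFull IS NOT NULL AND size(s.seqFull) >= size(input)
WITH s, toUpper(input) AS q, toUpper(s.seqFull) AS seq
// Score every window of seq with the same length as q, moving one base at a time,
// and keep the highest score for the node.
WITH s, apoc.coll.max([i IN range(0, size(seq) - size(q)) |
       apoc.text.levenshteinSimilarity(q, substring(seq, i, size(q)))]) AS bestScore
WHERE bestScore >= 0.75
RETURN s, bestScore
ORDER BY bestScore DESC
```

This does one comparison per window per node, so it will get slow for very long sequences or many nodes. Because the windows have a fixed length, a match that needs several insertions or deletions relative to the input may score lower than expected.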

peggyw · September 30, 2020, 9:36pm

Thank you, and sorry it has taken so long to respond. I moved to a new project, but this is exactly what I was looking for.

ameyasoft · October 1, 2020, 4:29am

Thanks for your appreciation. In a previous career I worked on biomembranes and surfactant-oil miscibility, and through those studies I developed a lot of environmentally friendly solutions. THOSE WERE THE DAYS! LIFE GOES ON!

Now I am purely into Neo4j!
Let me know if you need any help; I am very happy to help.

Thanks

Related topics

Topic	Category (tags)	Replies	Views	Activity
Text similarity	Cypher	2	354	September 12, 2021
Text Similarity: Compare text property of one node to all other nodes and create relationship	Cypher (apoc, cypher, stored-procedures)	2	1619	June 18, 2020
How to calculate similarity based on some properties in neo4j?	Neo4j Graph Platform (migrated)	3	445	October 16, 2022
Fuzzy matching of node properties?	Procedures & APOC	1	273	March 22, 2022
How to use Similarity ? GDS (2.6.5) in neo4j desktop (5.15.0)	Graph Algorithms/Graph Data Science (cypher)	2	237	April 29, 2024
