Calculate Jaccard similarity

abhik1368 · November 16, 2022, 12:49am

I have a file with node IDs and a property called MACCS made up of 0s and 1s. I want to calculate the Jaccard similarity. What is an efficient way to do it? I have attached the file link here. I want to load the file; the query I am using is below, where gph_conn is the connection. Any

gph_conn.query("""
// USING PERIODIC COMMIT 100
LOAD CSV WITH HEADERS FROM 'file:///D:/Github/dbtest.csv' AS row
UNWIND SPLIT(row.MACCS, ',') AS i
CREATE (m:Mol {
  DrugBank_ID: row.DrugBank_ID,
  MACCS: toInteger(i)
})
""")

Then I want to call gds.similarity.jaccard to compute the similarity between one node and all the other nodes. The query below doesn't work because of the format of the

MATCH (n1:Mol {DrugBank_ID: 'DB00146'})
WITH n1, collect(n1:MACCS) AS fp1
MATCH (n2:Mol)
WITH n2, collect(n2:MACCS) as fp2
RETURN n1,n2,
gds.similarity.jaccard(toIntegerList(n1.ECFP4), toIntegerList(n2.ECFP4)) AS jaccard;

The query above should return similarity values. Is there a way to calculate similarity faster with indexes? I want to do this for 10 million rows.

abhik1368 · November 18, 2022, 5:28pm

The correct query is below:
MATCH (n1:Mol {DrugBank_ID: 'DB00146'})
WITH n1, collect(n1:MACCS) AS fp1
MATCH (n2:Mol)
WITH n2, collect(n2:MACCS) as fp2
RETURN n1,n2,
gds.similarity.jaccard(toIntegerList(n1.MACCS), toIntegerList(n2.MACCS)) AS jaccard;

glilienfield · November 18, 2022, 8:41pm

There doesn’t seem to be a need to collect fp1 and fp2, since they are not used and they should be empty.
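
A minimal sketch of what the query could look like without the unused collect steps, written as Python strings in the style of the gph_conn.query call above. It assumes each Mol node stores MACCS as a single list of integers, which is not what the LOAD CSV query in the first post produces (that one creates one node per bit). The names SIMILARITY_QUERY, INDEX_QUERY, mol_drugbank_id, $drug_id and jaccard_bits are hypothetical, and how the parameter is passed depends on the connection wrapper. Note that gds.similarity.jaccard compares the values in the two lists rather than their positions, so it is worth checking a few results against a reference such as jaccard_bits, which compares the positions of the 1 bits.

```python
# Hypothetical query string: same idea as the query above, minus the unused
# collect() steps. Assumes n.MACCS is stored as a list of integers.
SIMILARITY_QUERY = """
MATCH (n1:Mol {DrugBank_ID: $drug_id})
MATCH (n2:Mol)
WHERE n2 <> n1
RETURN n1.DrugBank_ID AS source,
       n2.DrugBank_ID AS target,
       gds.similarity.jaccard(n1.MACCS, n2.MACCS) AS jaccard
ORDER BY jaccard DESC
"""

# Hypothetical index (Neo4j 4.x or later syntax). It can speed up finding the
# single starting node; it does not reduce the number of comparisons.
INDEX_QUERY = """
CREATE INDEX mol_drugbank_id IF NOT EXISTS
FOR (m:Mol) ON (m.DrugBank_ID)
"""


def jaccard_bits(fp1, fp2):
    """Reference Jaccard similarity of two equal-length 0/1 fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    # Positions of the bits that are set in each fingerprint.
    on1 = {i for i, bit in enumerate(fp1) if bit == 1}
    on2 = {i for i, bit in enumerate(fp2) if bit == 1}
    union = on1 | on2
    if not union:
        # Both fingerprints are all zeros; define the similarity as 0.
        return 0.0
    return len(on1 & on2) / len(union)
```

For example, jaccard_bits([1, 1, 0, 0], [1, 0, 1, 0]) shares one set bit out of three distinct set positions, so it gives 1/3.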
