De Dup with Neo4j

tal.shainfeld · August 14, 2020, 4:47am

Hi,
I would like to solve some challenges I have with customer de-duplication.
I am considering a graph DB as a candidate for this.
I found this reference, but unfortunately the full article is missing.

Does anyone have any information or ideas about this issue?

Thanks,
Tal

Joel · August 14, 2020, 9:12pm

I think you are looking for information on a research topic within Natural Language Processing.
Check out the Neo4j chapter on NLP

and the subsection on APOC NLP.

These procedures support entity extraction, key phrase extraction, sentiment analysis, and document classification.

APOC NLP

Other examples

http://www.nltk.org/book/ch07.html

tal.shainfeld · August 16, 2020, 9:25am

Hi Joel,
Sorry, no. I'm not looking for NLP. The article I'm looking for is about the de-duplication challenge. The link I added points to an article that deals with it, but I can't find the rest of it.
Thanks,
Tal

Joel · August 19, 2020, 2:46pm

Hi Tal, I'll rephrase: de-duplicating names is an NLP challenge, and there are NLP solutions built to solve it. Best of luck! Regards, Joel

tal.shainfeld · August 23, 2020, 10:59am

Hi Joel,
In this article, the issue is building a data structure that supports matching AFTER the process:
that is, deciding whether a combination of matches points to the same person. For example, if date of birth, gender, and address are all found to be the same, it is the same person.
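A rule like this could be sketched in Cypher roughly as follows. This is only an illustration under assumed names: the `Customer` label, the `customerId`, `dateOfBirth`, `gender`, and `address` properties, and the `SAME_PERSON` relationship type are all hypothetical and not taken from the article.

```cypher
// Hypothetical model: (:Customer {customerId, dateOfBirth, gender, address})
// Link two customer records when all three attributes match exactly.
MATCH (a:Customer), (b:Customer)
WHERE a.customerId < b.customerId        // visit each pair once, skip self-pairs
  AND a.dateOfBirth = b.dateOfBirth
  AND a.gender = b.gender
  AND a.address = b.address
MERGE (a)-[:SAME_PERSON]->(b)
RETURN a.customerId AS first, b.customerId AS second
```

Comparing every pair of nodes is expensive on a large customer base, so in practice the candidate pairs would probably need to be narrowed first (for example by date of birth).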
I'm looking for the article that discusses this, since when I open the full article I get a 404 ...
Thanks,
Tal

bhoopalkiranappmajix · August 24, 2020, 5:41am

Mindmajix Neo4j online training describes what a graph database is, how to install Neo4j, how to query graphs in Neo4j with the Cypher query language, and how to add and manipulate data. All these topics are well covered in the training curriculum to help learners get better insight.

ameyasoft · August 24, 2020, 6:46am

If your situation is similar to the scenario described in your referenced article, then it is the same as in money laundering schemes.

In these scenarios, you can build a similarity relationship between the two customer names. For this, use Jaro-Winkler similarity.

To explain this in simple terms:

Consider two simple words: 'coronavirus' and 'cornivorus'. They are nearly the same length and share almost all of their letters, arranged slightly differently in 'cornivorus'.

Now we know the invisible connection between these two words!

Here comes the similarity:

with "coronavirus" as norm1, "cornivorus" as norm2
return toInteger(apoc.text.jaroWinklerDistance(norm1, norm2) * 100) as similarity

Result: 96

You can set up your own similarity threshold for building a similarity relationship, and this should help you address de-duplication scenarios.
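As a rough sketch of that idea, the query below creates a relationship between customers whose names score at or above a chosen threshold. The `Customer` label, the `customerId` and `name` properties, the `SIMILAR_TO` relationship type, and the threshold of 90 are all assumptions made for illustration. It also assumes, as in the query above, that `apoc.text.jaroWinklerDistance` returns a value close to 1 for similar strings in the installed APOC version; it is worth checking this on two identical strings first.

```cypher
// Hypothetical model: (:Customer {customerId, name})
// Score every pair of customer names and keep the pairs above the threshold.
MATCH (a:Customer), (b:Customer)
WHERE a.customerId < b.customerId        // visit each pair once, skip self-pairs
WITH a, b,
     toInteger(apoc.text.jaroWinklerDistance(a.name, b.name) * 100) AS similarity
WHERE similarity >= 90                   // assumed threshold, tune for your data
MERGE (a)-[r:SIMILAR_TO]->(b)
SET r.score = similarity
RETURN a.name AS name1, b.name AS name2, similarity
ORDER BY similarity DESC
```

Storing the score on the relationship keeps the decision reviewable: borderline pairs can be inspected before any nodes are actually merged.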
