De-duplication with Neo4j

Hi,
I would like to solve some challenges I have with customer de-duplication.
I am considering a graph database as a candidate solution.
I found this reference, but unfortunately the full article is missing.

Does anyone have any information or ideas about this issue?

Thanks
Tal

I think you are looking for information on a research topic within Natural Language Processing (NLP). Check out the Neo4j chapter on NLP and its subsection on APOC NLP.

These procedures support entity extraction, key phrase extraction, sentiment analysis, and document classification.

Other examples:

http://www.nltk.org/book/ch07.html

Hi Joel,
Sorry, no. I'm not looking for NLP. The article I'm looking for is about the de-duplication challenge. The link I added points to an article that deals with it, but I can't find the continuation.
Thanks, Tal

Hi Tal, I'll rephrase: de-duplicating names is an NLP challenge, and there are NLP solutions built to solve it. Best of luck! Regards, Joel

Hi Joel,
In this article, the issue is building a data structure that supports matching AFTER the process,
meaning deciding whether a combination of matches points to the same person. For example, if date of birth, gender, and address are all found to be the same, then it is the same person.
I'm looking for the article that discusses this, because when I open the full article I get a 404...
Thanks,
Tal
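
The rule described above (several attributes agreeing means the same person) can be sketched outside the database in plain Python. This is only an illustration under stated assumptions: the records, the field names, and the "all three fields must agree" rule are hypothetical, and the grouping step uses a simple union-find in place of a graph query.

```python
from itertools import combinations

# Hypothetical records; field names and values are made up for illustration.
records = [
    {"id": "a", "dob": "1980-02-01", "gender": "F", "address": "12 Main St."},
    {"id": "b", "dob": "1980-02-01", "gender": "f", "address": "12 main st"},
    {"id": "c", "dob": "1975-07-30", "gender": "M", "address": "9 Oak Ave"},
]

# Assumed rule: two records are the same person when all of these fields agree.
MATCH_FIELDS = ("dob", "gender", "address")


def normalize(value):
    """Lower-case, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def same_person(r1, r2):
    """True when every field in MATCH_FIELDS agrees after normalization."""
    return all(normalize(r1[f]) == normalize(r2[f]) for f in MATCH_FIELDS)


def group_duplicates(recs):
    """Merge matching pairs with union-find; each group is one presumed person."""
    parent = {r["id"]: r["id"] for r in recs}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for r1, r2 in combinations(recs, 2):
        if same_person(r1, r2):
            parent[find(r1["id"])] = find(r2["id"])

    groups = {}
    for r in recs:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(groups.values())


groups = group_duplicates(records)
print(groups)
```

In a graph database, each matching pair would become a relationship between two customer nodes, and each group would correspond to a connected component.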

If your situation is similar to the scenario in the article you referenced, then it is the same problem that arises in money laundering schemes.

In these scenarios, you can build similarity relationships between two customer names. For this, use Jaro-Winkler similarity.

To explain this in simple terms:

Consider two simple words: 'coronavirus' and 'cornivorus'. They are nearly the same length (11 and 10 characters) and share almost all of their letters, but some of those letters are arranged differently in 'cornivorus'.

Now we know the invisible connection between these two words!

Here comes the similarity:

with "coronavirus" as norm1, "cornivorus" as norm2
return toInteger(apoc.text.jaroWinklerDistance(norm1, norm2) * 100) as similarity

Result: 96
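
To show what such a score measures, here is a minimal pure-Python sketch of the textbook Jaro-Winkler similarity. It is not APOC's implementation: the function names are made up, and the 0.1 prefix scale and 4-character prefix cap are the usual textbook defaults, assumed here. Implementations differ in small details, so this sketch need not reproduce the figure above exactly.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order, counted in pairs.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


score = jaro_winkler("coronavirus", "cornivorus")
print(int(score * 100))
```

The prefix boost is why names that start the same way score higher than names that differ at the beginning, which tends to suit customer names with typos near the end.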

You can set up your own similarity threshold for building a similarity relationship, and this should help you address de-duplication scenarios.
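
The thresholding step can be sketched in plain Python as follows. This is a rough illustration, not a Neo4j recipe: the customer data is hypothetical, the threshold of 90 is an arbitrary choice, and `difflib.SequenceMatcher` from the standard library is used as a stand-in for Jaro-Winkler to keep the block short. Each returned tuple corresponds to one similarity relationship you would create between two customer nodes.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample data; ids and names are made up for illustration.
customers = {
    1: "Jon Smith",
    2: "John Smith",
    3: "Maria Garcia",
}


def name_similarity(a, b):
    """Score in [0, 100]; SequenceMatcher is a stand-in for Jaro-Winkler."""
    return int(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def similar_pairs(records, threshold):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold."""
    edges = []
    for (id_a, name_a), (id_b, name_b) in combinations(sorted(records.items()), 2):
        score = name_similarity(name_a, name_b)
        if score >= threshold:
            edges.append((id_a, id_b, score))
    return edges


edges = similar_pairs(customers, threshold=90)
for id_a, id_b, score in edges:
    print(id_a, id_b, score)
```

Comparing every pair grows quadratically with the number of customers, so on a large dataset you would likely compare only records that already share something, such as a date of birth or postal code.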