Matching issue with the 'mutual friend problem'

I am trying to collect all of the Person nodes that have one or more matching Personas. Think of Personas as identifiers like a driver license, passport number, ssn, etc.

For each collection of matching Person nodes, I want to create a node (SamePerson) with a [:SAME] relationship from all of the Person nodes in a collection to (SamePerson) like in the image below.

//Person
//UNWIND $rows AS row
UNWIND
[
{ dsrecordid: 1, personid: 1, personname: 'Lucas', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 2, personid: 2, personname: 'Lucas', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 3, personid: 3, personname: 'Franco', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 5, personid: 5, personname: 'Franco', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 4, personid: 4, personname: 'Lucas', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 6, personid: 6, personname: 'Lucas', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 7, personid: 7, personname: 'Franco', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 8, personid: 8, personname: 'Lucas', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 9, personid: 9, personname: 'Franco', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 10, personid: 10, personname: 'Ajesh', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 11, personid: 11, personname: 'Ajesh', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 12, personid: 12, personname: 'Ajesh', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 13, personid: 13, personname: 'Ajesh', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 14, personid: 14, personname: 'Bruce', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 15, personid: 15, personname: 'Bruce', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 16, personid: 16, personname: 'Bruce', dscreateddts: '2022-10-31T19:53:37.655337Z' }
] as row
WITH row
CREATE (n:Person {
dsRecordId: row.dsrecordid,
personId: row.personid,
personName: row.personname,
dsCreatedDts: row.dscreateddts})

//Persona
//UNWIND $rows AS row
UNWIND
[
{ dsrecordid: 1, personaid: 1, personid: 1, personatypeid: 7, personasourcevalue: '222', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 2, personaid: 2, personid: 2, personatypeid: 6, personasourcevalue: '321', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 3, personaid: 3, personid: 3, personatypeid: 6, personasourcevalue: '123', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 4, personaid: 4, personid: 4, personatypeid: 7, personasourcevalue: '222', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 5, personaid: 5, personid: 4, personatypeid: 6, personasourcevalue: '321', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 6, personaid: 6, personid: 5, personatypeid: 6, personasourcevalue: '123', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 7, personaid: 7, personid: 6, personatypeid: 6, personasourcevalue: '321', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 8, personaid: 8, personid: 7, personatypeid: 6, personasourcevalue: '123', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 9, personaid: 9, personid: 8, personatypeid: 6, personasourcevalue: '321', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 10, personaid: 10, personid: 9, personatypeid: 6, personasourcevalue: '123', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 11, personaid: 11, personid: 10, personatypeid: 6, personasourcevalue: '345', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 12, personaid: 12, personid: 10, personatypeid: 7, personasourcevalue: '333', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 13, personaid: 13, personid: 11, personatypeid: 7, personasourcevalue: '333', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 14, personaid: 14, personid: 12, personatypeid: 7, personasourcevalue: '333', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 15, personaid: 15, personid: 13, personatypeid: 6, personasourcevalue: '345', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 16, personaid: 16, personid: 14, personatypeid: 6, personasourcevalue: '678', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 17, personaid: 17, personid: 15, personatypeid: 6, personasourcevalue: '678', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 18, personaid: 18, personid: 16, personatypeid: 6, personasourcevalue: '678', dscreateddts: '2022-10-31T19:53:37.655337Z' },
{ dsrecordid: 19, personaid: 19, personid: 16, personatypeid: 7, personasourcevalue: '444', dscreateddts: '2022-10-31T19:53:37.655337Z' }
] as row
CREATE (n:Persona {
dsRecordId: row.dsrecordid,
personaId: row.personaid,
personId: row.personid,
personaTypeId: row.personatypeid,
personaSourceValue: row.personasourcevalue,
dsCreatedDts: row.dscreateddts})
MERGE (p:Person {personId: row.personid})
MERGE (p)-[:HAS_PERSONA]->(n)

I tried this query but it creates a SamePerson node for each combination of Person and Persona.

MATCH (p:Person)-[:HAS_PERSONA]->(pa:Persona)
WITH pa.personaTypeId AS typeId, pa.personaSourceValue AS sourceValue, COLLECT(DISTINCT p) AS people
MERGE (m:SamePerson { personaTypeId: typeId, personaSourceValue: sourceValue })
WITH m, people
UNWIND people AS person
MERGE (person)-[:SAME]->(m)
RETURN DISTINCT m;

hi, you need to first collect personas with same typeId,sourceValue
then collect all person attached to them

[update: do not use this suggestion that makes no sense - see next post for details]

MATCH (pa:Persona)
WITH DISTINCT pa.personaTypeId AS typeId, pa.personaSourceValue AS sourceValue, COLLECT(pa) AS personas
WITH typeId,sourceValue,personas
UNWIND personas as persona
MATCH (persona)<-[:HAS_PERSONA]-(p:Person)
WITH typeId,sourceValue,collect(p) as people
CREATE (m:SamePerson { personaTypeId: typeId, personaSourceValue: sourceValue })
WITH m,people
UNWIND people AS person
MERGE (person)-[:SAME]->(m)
RETURN DISTINCT m;

@v2belleville, I think your query gives the same result.

@bruce.macarthur, I am not sure I understand what you are looking for. Your example graph does not match your query's result. What is the grouping criteria? Both of the provided queries are grouping by pa.personaTypeId and pa.personaSourceValue. This does not seem what you are looking for, as the Person nodes with two personas would be related to two SamePerson nodes, but your graph shows patient Lucas with two personas, but only one SAME relationship to a SamePerson node.

Can you clarify the grouping criteira?

@glilienfield you are right ... + DISTINCT makes no sense since the collect() aggregation implies it and of course, in what bruce tried in the first place, collect() aggregates both over pa and p :woman_facepalming:

now also wondering about what the request is...

1 Like

First of all, thank you so much for your replies, all of you.

Secondly, I want to find all the Person nodes where ANY of their related Personas match on personaTypeId and PersonaSourceValue. So Gary has 5 Person nodes representing him from 5 different accounts and in each account he has 1 or more Personas for ssn or passport or driver license or ....
A Person with multiple Personas can act as a bridge to find matches and collect all the Person nodes that represents. Once I have done that, I want to create the SamePerson node to connect them all.

And if my response doesn't make sense, please let me know.

This image shows all the Person nodes for Lucas matched to different SamePerson nodes 222 and 321. This is because one Lucas Person has two different Personas. I want to use that as a bridge to collect all the Person nodes that match 1 or more available Personas, with that collection, I want to create one SamePerson node that will consolidate all the Person nodes.

What do you mean? Are you asking to have all five of the blue nodes being connected to one SamePerson node? If so, what would the properties of the SamePerson node be?

I truly don’t understand this data model. Do these specific collection of blue Person nodes actually represent the same person? If so, why have different nodes? Why not one node with properties for their personas? Is this some type of fraud detection where people share emails, driver licenses, etc? If so how about one node will relationships to the shared personas?

Sorry for my confusion.

The individual Person nodes represent different accounts coming from different data sources. I want the SamePerson node to have a unique identifier like a uuid but that's it. I am trying to keep this secure. Once I can validate all the Person nodes are consolidated to a SamePerson node, I can safely query them all as long as i can positively identify one of the Persons, after they are all connected to SamePerson of course.

I AM asking for all five blue nodes (Person) to be connected to that one SamePerson node. It would just have a uuid for a unique identifier. It exists to be a physical foreign key that links all the Person nodes.

That specific collection of blue Person nodes DO represent the same Person. One of those Person nodes has two Personas with different personaTypeId and personaSourceValue. So one could be a passport# and one could be ssn. Separately, those Personas can match with other Personas with the same personaTypeId AND personaSourceValue. The one (or more) Person with multiple Personas acts as a bridge to find and collect all the matching Personas.


This explains what I want to accomplish

What are you using to group these Person nodes into the same SamePerson node? This collection of Person nodes do not have a common pattern with the values of personaTypeId and personaSourceValue values.