How to treat the nodes with different labels but same property as same node?

Hi,


_
I import a .csv file into Neo4j. And there are three columns there: Company_Name, legal_address, and physical_address. And the above picture shows the relationship of these three columns.

My question is: if the legal_address and the physical _address is same for one company, is it possible to treat these two address nodes as one node for that company? (Like below)

To merge your Address and Address_2 nodes were they have the same value,

MATCH (a:Address)<-[:physically_located_at]-(c:Company)-[:legally_located_at]->(b:Address_2)
WHERE a.address = b.address
WITH [a,b] as addresses
CALL apoc.refactor.mergeNodes(addresses) yield node
RETURN node

Is there a reason you need both Address and Address_2 labels in the graph? If not, you may be able to avoid duplicating address nodes on ingest by using MERGE instead of CREATE.

Hi ss,

Thanks for your response. I need both address_1 and address_2 labels is because, in my dataset, most of companies have different legal and physical address. So i am thinking for the company which has different legal and physical address, i want it to look like the picture 1 which i showed before, and for the company which has same physical and legal address, i want it to look like picture 2 which i showed before. Could i make it happen?

Thanks,
Sharon

Hi Sharon,

Yes, it is possible. Something along these lines should do the job:

CREATE (c:Company)
CREATE (a1:Address {<address_details>})
MERGE (a2:Address {<address_details})
CREATE (c)-[:physically_located_at]->(a1)
CREATE (c)-[:legally_located_at]->(a2)

If a1 and a2 have the same details, only one address will be created, pointed to by both relationships from the company node. If they are different, two nodes will be created.

Put an index on the address node's details. This will speed up the merge when the second address node is added.

1 Like

Yes, your approach is good and it works. It worked for me in one of money laundering scenario. Point both relations to the appropriate address node.

Thank you so much! it works for me. Could you take a look my code below? I have few more questions:
(1) i use "merge" instead of "create" for my code. is it okay?
(2) is there an easier way to handle empty value? i wan to treat the empty value as Null.

load csv with headers from "file:///company_person_sample.csv" as line

merge (v1:Company {Company_name:line.Company})

FOREACH (ignoreMe in CASE WHEN exists(line.physical_address) THEN [1] ELSE [] END |MERGE (v5:address {address:line.physical_address}) merge (v1)-[:physically_located_at ]->(v5))

FOREACH (ignoreMe in CASE WHEN exists(line.legal_address) THEN [1] ELSE [] END |MERGE (v6:address {address:line.legal_address}) merge (v1)-[:legally_located_at ]->(v6))

Thanks,
Sharon

Yep. Thanks. I just try and it works. I am just checking if these is a easier way to treat empty cell as Null in Neo4j.

Yes kit will work for null values. Put this condition in the FORTEACH ands it will work.

FOREACH (ignoreMe in CASE WHEN line.physical_address is null THEN [1] ELSE [] END |MERGE 
code......
)

FOREACH (ignoreMe in CASE WHEN line.physical_address is not null THEN [1] ELSE [] END |MERGE 
code.......
)

Hi ameyasoft,

Yep. But since the variable context within the FOREACH parenthesis is separate from the one outside it. This means that if you CREATE a node variable within a FOREACH , you will not be able to use it outside of the foreach statement, unless you match to find it.

So how should i deal with the case that if there are some empty cell for the field "Company"?

merge (v1:Company {Company_name:line.Company})

FOREACH (ignoreMe in CASE WHEN exists(line.physical_address) THEN [1] ELSE [] END |MERGE (v5:address {address:line.physical_address}) merge (v1)-[:physically_located_at ]->(v5))

Thanks,
Sharon

You can use COALESCE (line.Company, 'NA') this will set null value as 'NA' if company name is null otherwise takes the correct value. You can give any value in place of NA

merge (v1:Company {Company_name:COALESCE(line.Company, 'NA')})
For null values: match (v1:Company {Company_name: "NA"}

Thank you! I will try it.

Best wishes,
Sharon

Hi ameyasoft,

if we use "COALESCE (line.Company, 'NA')", then all the companies which have empty value will have the same value "NA". When we use "match" later, all the nodes/relationships related to "NA" will connected together.. which means different addresses will be linked to the same node "NA", right?

Thanks,
Sharon

Yes, that's correct. Let me know if this fulfills your objective.