_
I import a .csv file into Neo4j. And there are three columns there: Company_Name, legal_address, and physical_address. And the above picture shows the relationship of these three columns.
My question is: if the legal_address and the physical _address is same for one company, is it possible to treat these two address nodes as one node for that company? (Like below)
To merge your Address and Address_2 nodes were they have the same value,
MATCH (a:Address)<-[:physically_located_at]-(c:Company)-[:legally_located_at]->(b:Address_2)
WHERE a.address = b.address
WITH [a,b] as addresses
CALL apoc.refactor.mergeNodes(addresses) yield node
RETURN node
Is there a reason you need both Address and Address_2 labels in the graph? If not, you may be able to avoid duplicating address nodes on ingest by using MERGE instead of CREATE.
Thanks for your response. I need both address_1 and address_2 labels is because, in my dataset, most of companies have different legal and physical address. So i am thinking for the company which has different legal and physical address, i want it to look like the picture 1 which i showed before, and for the company which has same physical and legal address, i want it to look like picture 2 which i showed before. Could i make it happen?
If a1 and a2 have the same details, only one address will be created, pointed to by both relationships from the company node. If they are different, two nodes will be created.
Put an index on the address node's details. This will speed up the merge when the second address node is added.
Thank you so much! it works for me. Could you take a look my code below? I have few more questions:
(1) i use "merge" instead of "create" for my code. is it okay?
(2) is there an easier way to handle empty value? i wan to treat the empty value as Null.
load csv with headers from "file:///company_person_sample.csv" as line
merge (v1:Company {Company_name:line.Company})
FOREACH (ignoreMe in CASE WHEN exists(line.physical_address) THEN [1] ELSE [] END |MERGE (v5:address {address:line.physical_address}) merge (v1)-[:physically_located_at ]->(v5))
FOREACH (ignoreMe in CASE WHEN exists(line.legal_address) THEN [1] ELSE [] END |MERGE (v6:address {address:line.legal_address}) merge (v1)-[:legally_located_at ]->(v6))
Yes kit will work for null values. Put this condition in the FORTEACH ands it will work.
FOREACH (ignoreMe in CASE WHEN line.physical_address is null THEN [1] ELSE [] END |MERGE
code......
)
FOREACH (ignoreMe in CASE WHEN line.physical_address is not null THEN [1] ELSE [] END |MERGE
code.......
)
Yep. But since the variable context within the FOREACH parenthesis is separate from the one outside it. This means that if you CREATE a node variable within a FOREACH , you will not be able to use it outside of the foreach statement, unless you match to find it.
So how should i deal with the case that if there are some empty cell for the field "Company"?
merge (v1:Company {Company_name:line.Company})
FOREACH (ignoreMe in CASE WHEN exists(line.physical_address) THEN [1] ELSE [] END |MERGE (v5:address {address:line.physical_address}) merge (v1)-[:physically_located_at ]->(v5))
You can use COALESCE (line.Company, 'NA') this will set null value as 'NA' if company name is null otherwise takes the correct value. You can give any value in place of NA
merge (v1:Company {Company_name:COALESCE(line.Company, 'NA')})
For null values: match (v1:Company {Company_name: "NA"}
if we use "COALESCE (line.Company, 'NA')", then all the companies which have empty value will have the same value "NA". When we use "match" later, all the nodes/relationships related to "NA" will connected together.. which means different addresses will be linked to the same node "NA", right?