Matching and Merging data from multiple csv files

Hi!, Given that I have created nodes and relationships from the "example.csv" file as below. However, I have a new file, "other_names.csv", with the family_names in example.csv as column heads with a series of first_names under each column.

Any HINT/SUGGESTION on how to –> MATCH AND MERGE<– without individually creating nodes for each column, like row.Dickens , row.Wang etc. ? Thanks

"LOAD CSV WITH HEADERS FROM $path AS row "
"MERGE (f:family_name {name: row.family_name}) "
"MERGE (fn:first_name {name: row.first_name}) "
"MERGE (h:hometown {name: row.hometown}) "
"MERGE (r:race {name: row.race}) "
"MERGE (a:age {name: row.age}) "
"MERGE (o:occupation {name: row.occupation}) "
"MERGE (f)-[:first_name_is]->(fn) "
"MERGE (fn)-[:commonly_from]->(h) "
"MERGE (f)-[:often_associated_the_race]->(r) "
"MERGE (fn)-[:age_is]-(a) "
"MERGE (fn)-[:works_as]->(o) "

Reuben_0-1670427445609.png

#Neo4j #CSV #GraphDatabase

how do you want the new data related to the existing data?

Thanks @glilienfield

the first line should be this instead, right?

load csv from "file:///example.csv" as row

-

using this I get the semantic error:

Cannot merge the following node because of null property value for 'name': (:family_name {name: null})

So, I changed the range to :

---

load csv from "file:///trial/other_names.csv" as row

with row as headerRow

limit 1

load csv from "file:///trial/other_names.csv" as row

with headerRow, row

skip 1

unwind range(1,size(headerRow)-1) as index

merge(f:family_name{name:headerRow[index]})

merge(fn:first_name{name:row[index]})

merge (f)-[:first_name_is]->(fn)

I want to update this relationship "MERGE (f)-[:first_name_is]->(fn) " from the file other_names.csv

exa.png

Do you want to related each first name in each column to their corresponding family name node specified in the column header?

`that's correct. `those column headers have already been established as nodes in example.csv so i need to match and merge them from this new CSV file

Try this:

load csv from "file:///other_names.csv" as row
with row as headerRow
limit 1
load csv from "file:///other_names.csv" as row
with headerRow, row
skip 1
unwind range(0,size(headerRow)-1) as index
merge(f:family_name{name:headerRow[index]})
merge(fn:first_name{name:row[index]})
merge (f)-[:first_name_is]->(fn)

@glilienfield pardon me, how come when you paste your codes, they appear in a block form (image kind of)? any trick there? :grimacing:

yes...you need to paste the code in the a code block. First, click on the ellipse symbol in the tool bar (far left - see figure)

Next, click on the code block symbol '</>':

Alright, let me try, :sunglasses:

MATCH (f:flows)-[r]->(f2)
UNWIND keys({mat: 's', pro: 'p', feat: 'd'}) AS keyword
WITH f, f2, r, keyword
WHERE({mat: 's', pro: 'p', feat: 'd'})[keyword] <> '' AND type(r) = keyword AND f2.name CONTAINS {mat: 's', pro: 'p', feat: 'd'}[keyword]
WITH f.name as text, count(*) AS c ORDER BY c DESC, size(text) ASC
RETURN text

@glilienfield with reference to these files”—-Considering that I want to skip all the null values , what’s the best way to tackle it. I tried use the “is null approach” but failed to get it right.

You had change the starting index to one. Is there a column of row numbers in the first column? If so, this column does not have a header and would explain that null value error, which occurred for the family name (header values).

Are there null values in the row data?

The case I presented earlier was just a sample case to help do a knowledge transfer. The real data I am working on looks something like this (48 columns):

There are three approach I can think of for dealing with null values during import: 1) coalesce to set a default value, 2) call subquery testing for null condition, and 3) apoc.do.when

https://neo4j.com/labs/apoc/4.1/overview/apoc.do/apoc.do.when/

do you want to work on the actually query?

yeah, it worked. thanks @glilienfield