Creating new property (from a different label name) and add property to existing nodes

I found this post (How do I use LOAD CSV to update/set properties of existing nodes - Knowledge Base), which would answer my question if the movies dataset being imported actually had a different label name than "movie."

So how can I import a csv with a different label name ("id") which contains the column "age", which I want to make properties of nodes ("source") I have already imported.

In this case, I create the source node from one csv file, and then I want to attach to that node the properties from another csv file (which has "id" which corresponds to "source" and "age," which is the property I want to add). It would be super easy to go into python and join the dataframes, or even change the "id" name in excel to "source", but that would break up the workflow and I'm just missing something (a function or concept) here.

This is what I have, but the ages are mostly 9s, so I'm not sure what is going wrong but certainly isn't setting properties as I expected.

LOAD CSV WITH HEADERS FROM "file:///sheep_edges.csv" AS row
MERGE (s:source {source: row.source})
MERGE (t:target {target:row.target})
MERGE (s) - [w:weight{prop:row.weight}] -> (t)
Return s,w,t;

//somehow I probably need to match "id" and "source"--this wouldn't require me to make a node of "id" would it? Since I won't be using that again.
LOAD CSV WITH HEADERS FROM "file:///sheep_age.csv" AS row
MATCH s:source
ON CREATE SET s.age = row.age
RETURN s;


I am a little confused of your situation. Can you provide a few lines of both csv files?

If you are trying to update the 'source' nodes, then you need some unique identifier to match against to get the node to update its age. Is the property 'source' for 'source' nodes a unique identifier?

Yes, here you go (source is equal here to id):

Assuming the first spreadsheet represents the age you want to set for each 'source' node uniquely identified using 'id', then the following query will find the source node and set its age.

Note, if 'source' and 'target' properties represent their unique identifier, it would be more obvious to use 'id' instead.

LOAD CSV WITH HEADERS FROM "file:///Book1.csv" AS row
MATCH (s:source{source:row.id})
SET s.age = row.age

Thanks for the response! This is helpful!

Where you write "Note, if 'source' and 'target' properties represent their unique identifier, it would be more obvious to use 'id' instead."

are you saying to just use id for both of them? I thought about that...in this case it's a dataset from a biologist looking at bighorn sheep who fought or were fought by another sheep, so I was thinking about both situations, and thought that maybe two separate nodes made sense to represent who began the fight, but maybe that is better represent by edge labels ("fought" and "fought by")

many times there are multiple ways to model data. The best way is the one that allows you to answer efficiently the questions you will need to answer from the data.

One data model could be to label them all 'Sheep' and give each a unique 'id' so you can query on that unique identifier. You would want to create an index on the Sheep(id) label/property pair to improve query performance. As you suggested, you can then use a relationship to indicate which sheep a particular sheep began fighting with. You could add properties to the relationship if needed, such as date of the fight, outcome of the fight, etc.

Okay, thanks!...so if I wanted to go with each sheep as a unique id, I wrote the following code, which gives me pretty much everything I want, except it makes 500 relationships, rather than the 250. It's this line that is the problem MERGE (d) - [n:was_fought{wins:row.weight}] -> (i) because if I leave out that line, I'll get the 250 relationships, and I can see incoming and outgoing arrows (but all the relationships are just defined as fought). So how do how do I need to change that line to create a "was_fought" relationship, if both "source" and "target" are defined as id?

Or maybe, this is just my mistake and there would need to be 500 relationships because inevitably, in a fight, there is a "fought" and a "was_fought"?

LOAD CSV WITH HEADERS FROM "file:///sheep_edges.csv" AS row
MERGE (i:id {id: row.source})
MERGE (d:id {id: row.target})
MERGE (i) - [f:fought{wins:row.weight}] -> (d)
MERGE (d) - [n:was_fought{wins:row.weight}] -> (i)
Return i,f,d;

LOAD CSV WITH HEADERS FROM "file:///age.csv" AS row
MERGE (i:id {id: row.source})
SET i.age = row.age
RETURN i;

I would use something other than 'id' for the label. The label is usually a noun, as it represents an entity. In your case, it could be 'Sheep'. Keep in mind, you can have multiple labels on a node, so you could have a general label, such as 'Animal', and a label specifying the type, such as, 'Sheep'.

Merge works by matching the pattern and creating the entire pattern if it does not exactly match. It will not create the parts that are represented by binding variables, as those are given. In your case, you have specified a property in the relationships, so if it does not find a 'was_fought' relationships with a wins property equal to row.weight, it will create that relationship between 'd' and 'i'.

Note, when merging, you want to include only the attributes that uniquely identify the node or relationship you are trying to match, as the merge's match will look for all the attributes included in the to be true, otherwise a match is not find and a new entity is created with those attributes and values. The way to set the remaining attributes outside the match criteria is to use any of the set options: 'set', 'on match set', and 'on create set'. You can use any combination of these to meet your needs.

If you are tracking each fight, then you need a relationships between the pair of sheep for each fight. The source could be the instigator and the target the victim. You could add a date/time property to the relationships, as well as the outcome, etc. If you do this, you will always be creating a relationships, so you would use 'create' instead of 'merge'. I don't think you need a relationships in both directions for each fight, if you define the source node as the instigator and the target as the victim (or the other way around, just be consistent). The relationship type should express that relationship. For example, (a)-[:STARTED_FIGHT_WITH]->(b) would easily be understood as 'a' started a fight with 'b'.

The best data model is the one that is easily understood and allows you to answer your questions through queries. You can always refactor your data as you learn more about your domain.

Thanks for this!

I don't entirely understand what you're saying below (in the context of our discussion).

Blockquote Merge works by matching the pattern and creating the entire pattern if it does not exactly match. It will not create the parts that are represented by binding variables, as those are given. In your case, you have specified a property in the relationships, so if it does not find a 'was_fought' relationships with a wins property equal to row.weight, it will create that relationship between 'd' and 'i'.

Below, you are saying use the minimal amount necessary to identify the node and then use "set" or other variations, to make sure I am not creating new nodes that I don't want only because there is a new attribute? (I thought that merge would just create that new attribute under that node, but I guess I'm wrong).

BlockquoteNote, when merging, you want to include only the attributes that uniquely identify the node or relationship you are trying to match, as the merge's match will look for all the attributes included in the to be true, otherwise a match is not find and a new entity is created with those attributes and values. The way to set the remaining attributes outside the match criteria is to use any of the set options: 'set', 'on match set', and 'on create set'. You can use any combination of these to meet your needs.

In this example you give below, there would be no way of creating this without using two separate nodes for each sheep (an instigator and a target) correct? And in this case, there would be 2 * 250 edges, right?

Blockquote If you are tracking each fight, then you need a relationships between the pair of sheep for each fight. The source could be the instigator and the target the victim. You could add a date/time property to the relationships, as well as the outcome, etc. If you do this, you will always be creating a relationships, so you would use 'create' instead of 'merge'. I don't think you need a relationships in both directions for each fight, if you define the source node as the instigator and the target as the victim (or the other way around, just be consistent). The relationship type should express that relationship. For example, (a)-[:STARTED_FIGHT_WITH]->(b) would easily be understood as 'a' started a fight with 'b'.

Yeah, in this particular case the data is merely to play with and learn more about the nuances of graph DBs and Neo4j, but point taken about learning about the domain.

The following references discusses how to use merge and what things to look for. It covers the ideas I reference, but it has examples and descriptions.

I don't really know you domain well enough to design a data model. I am just making suggestions to provoke thought. The best data model is the one that allows you to efficiently answer your questions about the data through queries. Data models can also evolve as you learn more our have requirement changes.

Some of these references may help: