Defining a hierarchy relationship with multiple paths and null fields

I have a hierarchy relationship file that represent a tree structure of taxonomy, I am having issue to upload file, create the nodes with defined relationships.

Here is my sample format of file

The graph I am trying to build is as below, how can I write the cypher script to upload this csv file with current structure and create the graph as below?

This CSV is actually not a good way to model hierarchical data because you need an extra column for every additional depth in your hierarchy. This isn't something that's easy to process or to maintain.

Your first 3 columns are fine (though you'll need to use APOC Procedures to add the dynamic labels from your first two columns).

Your 3rd column is fine, every entry needs its own ID.

But forget about the rest.

Instead, use only two additional columns.

One of these, "Name" can be the name value that you'll use for the name property.

The other one can be something like "ParentID", which will be the ID of the entry's parent.
So in this case, line 4 would have:
Label1:"DomainA", Label2:"Object", ID:"DA4", Name:"Desktop_keyboard", ParentID:"DA1"

This way the depth of your hierarchy can be arbitrary, and the depth is independent of the number of columns in the hierarchy, each line only has properties for a single object, and references the id of the object's parent.

With this CSV structure, loading becomes trivial, though it would be best to do two passes over the CSV, one to create all the nodes and a second to create the relationships (though you would want to create indexes and/or unique constraints first).

Using APOC Procedures, the first pass would be (after adding your csv to the import folder):

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS line
CREATE (n:Node {ID:line.ID, Name:line.Name})
CALL apoc.create.addLabels(n, [line.Label1, line.Label2]) YIELD node
RETURN count(node)

We're using a :Node base label here because index lookup in the next query will require knowing the label / property combination, and this can't be dynamic.

Next you would create unique constraints and/or indexes so lookup by certain properties becomes fast.

CREATE CONSTRAINT ON (n:Node) ASSERT n.ID IS UNIQUE

And lastly you would do one more pass to connect the relationships:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS line
MATCH (n:Node {ID:line.ID})
MATCH (parent:Node {ID:line.ParentID})
CREATE (n)-[:Belongs_To]->(parent)

You may also want to rethink your idea of using labels to represent at what depth an object is in the hierarchy, if you never plan on querying using those labels. It's usually enough to simply model the relationships and have a common label on all of the nodes.

Thanks a lot for such detail solution. I do need to query the nodes/labels. The full data set actually have three more layers.

I tried to run the above first importing script but got the error below:

I tried to add a line "WITH n" between CREATE and CALL, still got error message. Could you help take another look? Thank you!

Insert WITH n, parent above the CREATE statement and this should work.

Here is the script I ran:
LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS line
WITH n
CREATE (n:Node {ID:line.ID, Name:line.Name})
CALL apoc.create.addLabels(n,[line.Label1, line.Label2]) YIELD node
RETURN count(node)

Still same error message. If you could give further help, that'd be great. Thanks!

Actually this is the one I used:
LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS line
CREATE (n:Node {ID:line.ID, Name:line.Name})
WITH n
CALL apoc.create.addLabels(n,[line.Label1, line.Label2]) YIELD node
WITH node
RETURN count(node)

And now I get the error as:

Thank you!

It works now. Add WITH n, line

Thanks for your help!

Here is a solution using your data structure as depicted in your post. With that data file you can achieve the results as shown in your post.

I created a .csv file with data shown and used that file to import. As Andrew suggested you need to run two times: one time to create the nodes and another run to add the relationships.

  1. Create nodes:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:/wzhang.csv' AS line
WITH line

FOREACH(ignoreMe IN CASE WHEN line.Label2 = "Category" THEN [1] ELSE END |

MERGE (c:Category{id:line.ID, name: line.Group_L1, domain: line.Label1})

)

FOREACH(ignoreMe IN CASE WHEN line.Label2 = "Object" THEN [1] ELSE END |

MERGE (o:Object{id:line.ID, name: line.Group_L2, domain: line.Label1, category: line.Group_L1})

)

FOREACH(ignoreMe IN CASE WHEN line.Label2 = "Subobject" THEN [1] ELSE END |

MERGE (s:Subobject{id:line.ID, name: line.Group_L3, domain: line.Label1, object: line.Group_L2})

);

  1. Add relationships

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:/wzhang.csv' AS line
WITH line

FOREACH(ignoreMe IN CASE WHEN line.Group_L2 IS NOT NULL THEN [1] ELSE END |

FOREACH(ignoreMe IN CASE WHEN line.Group_L1 <> "Accessory" THEN [1] ELSE END |

MERGE (c:Category {name: line.Group_L1, domain: line.Label1})

MERGE (o:Object {name: line.Group_L2, domain: line.Label1 })

MERGE (o)-[:Belongs_To]->(c)

MERGE (s:Subobject {object: line.Group_L2, domain: line.Label1 })

MERGE (s)-[:Belongs_To]->(o)
)

FOREACH(ignoreMe IN CASE WHEN line.Group_L1 = "Accessory" THEN [1] ELSE END |

MERGE (c:Category {name: line.Group_L1, domain: line.Label1})

MERGE (o:Object {id: line.ID})

MERGE (o)-[:Belongs_To]->(c)

)

);

Final result:

wzhang

Thank you both! I restructured the my csv files and used the simpler script as Andrew suggested above, now it works perforce with my full data files WITH jut one issue, now all the nodes are showing sample color. I have thousands of nodes with different multiple labels, can I force the color for nodes with specific labels? For example, force the nodes with label of 'CATEGORY' as a fixed color, and nodes with label of "OBJECT" with another color?

Thanks again for your kind help!