Data modelling in Neo4j

Hi all,

I started with Neo4j by creating some sample graphs on my own and trying out various Cypher statements to understand Neo4j better.
I used the Neo4j ETL tool to import data from other databases into Neo4j, but I wanted to know whether the data modelling can be automated when data in CSV files is imported using LOAD CSV (i.e., by checking for common properties between the files, either by name or by the data they contain).
I also wanted to know whether any user-defined procedures can be created to overcome the data modelling issue.

Thanks in advance

Recently, I have had to import some data from a poorly designed schema. I had a CSV file where the rows were duplicated EXCEPT one column (Mixed) was alternating between strings and floats.

With Cypher, I used a conditional: if the string in Mixed started with a digit, I did a MERGE on the row data with a separate property named aFloat, using toFloat() to convert the string to a number. If the string didn't start with a digit, I did a MERGE with a different property name for the string.

In another case, I had a CSV file where one column was an Action. The actions were add, remove, and change. When the action was add, I created a new node with the data from the rest of the row. When the action was remove, I matched the node and then deleted the match. When it was change, I changed some of the properties of an existing node (after doing a match).

As I was a newbie, I did it in three passes instead of doing a conditional.
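
For what it's worth, here is a minimal sketch of how the three actions could be dispatched in a single pass from Python instead of three separate runs. It only builds the Cypher text and parameters for each row and does not talk to a database. The Category label and the Name and otherProperty properties are borrowed from the Remove query further down; the otherProperty CSV column, the lowercase action names, and the function name are assumptions for illustration.

```python
# Hypothetical mapping from the Action column to a parameterized Cypher statement.
# Label, property, and column names are assumptions for illustration only.
STATEMENTS = {
    "add": "CREATE (c:Category {Name: $name, otherProperty: $other})",
    "remove": "MATCH (c:Category {Name: $name}) DETACH DELETE c",
    "change": "MATCH (c:Category {Name: $name}) SET c.otherProperty = $other",
}


def statement_for(row):
    """Return (cypher, params) for one CSV row, chosen by its Action value."""
    action = row["Action"].strip().lower()
    if action not in STATEMENTS:
        raise ValueError(f"unknown action: {row['Action']!r}")
    params = {"name": row["Category"]}
    if action != "remove":
        # Only add and change need the extra property value.
        params["other"] = row.get("otherProperty")
    return STATEMENTS[action], params
```

Each (cypher, params) pair could then be handed to whatever driver you use. Printing the pairs first gives you the same kind of dry run as returning the data from Cypher.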

The other thing I do is a dry run, where I assemble the data and RETURN it to make sure I got it right before actually running the write. So, I'd do this first:

LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row FIELDTERMINATOR ','
WITH row.Action AS action, row.Category AS catName
WHERE action = 'Remove'
MATCH (c:Category {Name:catName})
// DETACH DELETE c
RETURN action, c.Name, c.otherProperty  // return info about all the nodes to be deleted.

Then, once the output looks right, I swap the comments and run it for real:

LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row FIELDTERMINATOR ','
WITH row.Action AS action, row.Category AS catName
WHERE action = 'Remove'
MATCH (c:Category {Name:catName})
DETACH DELETE c // do the actual DELETE
// RETURN action, c.Name, c.otherProperty

If your situation is really complicated, you could always use Python (or any other language that has a driver) to massage the data before executing the Cypher code you want.
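
As a rough illustration of that kind of massaging, here is a small sketch that splits the Mixed column from my first example into two separate columns before anything reaches Cypher. It follows the same "starts with a digit" rule, so negative numbers would be kept as strings. The aFloat name is the one I used above; aString, the Name column, and the function names are placeholders I made up for the example.

```python
import csv
import io


def to_float_or_none(text):
    """Return a float if text starts with a digit and parses, else None."""
    if not text[:1].isdigit():
        return None
    try:
        return float(text)
    except ValueError:
        return None


def split_mixed(rows):
    """Yield each row with Mixed replaced by either aFloat or aString."""
    for row in rows:
        row = dict(row)  # copy so the caller's row is left untouched
        value = (row.pop("Mixed", None) or "").strip()
        number = to_float_or_none(value)
        if number is not None:
            row["aFloat"] = number
        else:
            row["aString"] = value
        yield row


# Tiny in-memory example; a real run would open the CSV file instead.
sample = io.StringIO("Name,Mixed\nwidget,3.5\nwidget,blue\n")
cleaned = list(split_mixed(csv.DictReader(sample)))
```

The cleaned rows can be written back out to a new CSV for LOAD CSV, or passed to a driver as a parameter list.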

I hope that helps.

Hi @tharun270297, can you please explain what you mean by "if the data modelling can be automated when data in csv files are imported using LOAD CSV"?

A data model is a predefined schema, created and proposed by data architects/modelers. Once designed and implemented, the structures are shared with the development team for coding.

Hi @dominicvivek06,

My objective is to establish a relationship between a node label (existing data in Neo4j) and the new data that is imported from a CSV.

This relationship has to be created automatically after the import by comparing the existing node properties, irrespective of labels, with the properties of the newly created labels.

The comparison may be done on the node property names or on the property values. But all of this has to happen automatically after inserting the new data, without me manually creating the relationships with a Cypher query.

Any info regarding this will be helpful.

Hi @tharun270297, I guess you are referring to data loading.

I don't know your data, so I created a very simple demo. This may vary from your use case.

emp.csv

emp_id,emp_name
1,Dominic
2,Vivek
3,Ravi
4,Rajesh

dept.csv

emp_id,dept_name
4,IT
3,Finance
1,HR
2,Admin

//Start from an empty database. Careful: this deletes every node and relationship.
MATCH (n) DETACH DELETE n;


//Below query loads master golden records of employee table.
:auto
USING PERIODIC COMMIT 10 LOAD CSV WITH HEADERS FROM "file:///emp.csv" AS line FIELDTERMINATOR ','
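//MERGE on the key only, then SET the rest, so a changed name does not create a duplicate node.
MERGE (e:employee {emp_id: line.emp_id})
SET e.name = line.emp_name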
return e;

//Below query loads departments and then creates a relationship with existing employee id.
:auto
USING PERIODIC COMMIT 10 LOAD CSV WITH HEADERS FROM "file:///dept.csv" AS line FIELDTERMINATOR ','
MERGE (d:department{dept_name:line.dept_name})
WITH d,line
MATCH (e:employee {emp_id:line.emp_id})
MERGE (d)-[r:has_route]->(e)
return d,r,e;

See the Neo4j documentation on the WITH clause.

The loader can also be a single script; it all depends on your dataset. If you can share the column headers, I can assist you further.

Hi @dominicvivek06,

Thanks for your example.

In the above sample data, the employee node labels have properties emp_id and emp_name.
During the loading of the dept.csv file, we know that there is a common column emp_id in both files, so we can specify that property and establish a relationship between the two node labels.

But my scenario is this: when I load the dept.csv file, it has to pick up the common column 'emp_id' automatically, without me manually specifying the relationship in Cypher (i.e., when I just load dept.csv into Neo4j, a relationship should be established between the employee and department labels because they share a property called 'emp_id').

The other expectation is that even if the property name in the department table is different (i.e., 'id' instead of 'emp_id'), the relationship should still be formed by comparing the data within the node properties, since the id property in both labels contains the same kind of data.
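
To make the idea concrete, this is a rough sketch of the comparison described above, done in Python before loading: it pairs up columns from two CSV files when either the header names match or most of the values overlap. It is only a heuristic under my own assumptions. The function names and the 0.8 threshold are made up for illustration, and small integer ids can overlap by coincidence, so the result would still need a human check.

```python
import csv
import io


def column_values(csv_text):
    """Map each header in a CSV document to the set of values in that column."""
    columns = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for header, value in row.items():
            columns.setdefault(header, set()).add(value)
    return columns


def candidate_links(left, right, min_overlap=0.8):
    """Return (left_col, right_col, overlap) where names match or values overlap."""
    found = []
    for l_name, l_vals in left.items():
        for r_name, r_vals in right.items():
            smaller = min(len(l_vals), len(r_vals))
            # Share of the smaller column's distinct values found in the other.
            overlap = len(l_vals & r_vals) / smaller if smaller else 0.0
            if l_name == r_name or overlap >= min_overlap:
                found.append((l_name, r_name, overlap))
    return found


# Same data as the emp.csv / dept.csv demo, but with the key renamed to 'id'.
emp = column_values("emp_id,emp_name\n1,Dominic\n2,Vivek\n3,Ravi\n4,Rajesh\n")
dept = column_values("id,dept_name\n4,IT\n3,Finance\n1,HR\n2,Admin\n")
links = candidate_links(emp, dept)
```

Here links contains the single pair emp_id / id, which could then be used to fill in the MATCH and MERGE of a loader script like the one above.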

I find it difficult to explain. But if you understand what I mean and have any suggestions, I would be happy to get a reply.

Thanks in advance.

Without the headers or the full context of the data dictionary, it's difficult to answer your question. If you want, we can have a chat on Skype; my ID is dominicvivek06.