cancel
Showing results for 
Search instead for 
Did you mean: 

Datalineage with neo4j

I have csv file as shown below

columnname tablename databasename systemname
name emp db1 s1
location emp db1 s1
address emp db1 s1
name emp db2 s2
location emp db2 s2
address emp db2 s2
phonenumber emp db2 s2
email emp db2 s2

I've imported and created nodes and relationships for the above csv file in neo4j

CREATE CONSTRAINT ON (c:columnname) ASSERT c.columnname IS UNIQUE;
CREATE CONSTRAINT ON (c:tablename) ASSERT c.tablename IS UNIQUE;
CREATE CONSTRAINT ON (c:databasename) ASSERT c.databasename IS UNIQUE;
CREATE CONSTRAINT ON (c:systemname) ASSERT c.systemname IS UNIQUE;

LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS line

MERGE (colunname:columnname {columnname: line.columnname})
MERGE (tablename:tablename {tablename:line.tablename})
MERGE (databasename: databasename {databasename:line.databasename})
MERGE (systemname: systmname {systemname:line.systemname})
MERGE (colunname)-[:iscolumnof]->(tablename)
MERGE (tablename)-[:istableof]->(databasename)
MERGE (databasename)-[:isdatabaseof ]->(systemname)

In the above the columnnames and tablenames are same but database and systemname are different.

2X_a_aa003ec51dc81941b972df5b72c10e692851ce01.png

In the image i attached. It shows the flow. But, we cannot visualize the complete dataflow.
For eg :- If we consider the column email. We cannot visualize to which database r system it refers.

How can we visualize the complete data flow?

5 REPLIES 5

ameyasoft
Graph Maven
Try this:

LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS line

MERGE (colunname:columnname {columnname: line.columnname, tablename:line.tablename})
MERGE (tablename:tablename {tablename:line.tablename, databasename:line.databasename})
MERGE (databasename: databasename {databasename:line.databasename, system: line.systemname})
MERGE (systemname: systmname {systemname:line.systemname})

MERGE (colunname)<-[:iscolumnof]-(tablename)
MERGE (tablename)<-[:istableof]-(databasename)
MERGE (databasename)<-[:isdatabaseof ]-(systemname)

Result:

anthapu
Graph Fellow

Try this query

LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS line

MERGE (systemname: systemname {systemname:line.systemname})
MERGE (systemname)<-[: isdatabaseof]-(databasename: databasename {databasename:line.databasename})
MERGE (databasename)<-[:istableof]-(tablename:tablename {tablename:line.tablename})
MERGE (ablename)<-[: iscolumnof]-(colunname:columnname {columnname: line.columnname})

This method makes sure DB name is distinct under System, Table name is distinct under DB etc..

This avoids the need to use the multi-key uniqueness requirement

Hi @anthapu / @ameyasoft,
Thanks for your quick response.
I'm working on one of the POC on DataLinage with Neo4j. We would like to review our model we've created. It would be great if we can connect based on your availability.

I can help you with your POC. You can communicate with me directly by sending email to ameyasoft@gmail.com
Thanks

Hi,

Sorry for the delay. I can help you with your POC. Send me an email to ameyasoft@gmail.com and will answer all your questions.
Thanks,
-Ameyasoft