Hello everyone!
I’m looking for a solution for my PhD research, where I need to do some data wrangling, i.e. transform my dataset into a format suitable for graph/network analysis. More precisely, I need to create relationships from the data based on the list of existing nodes and their temporal properties.
Anyway, I’ve written a Cypher query that loads my CSV into a Neo4j database and at the same time fills in the “EndOfExport” column: wherever “X” is found, today’s date is inserted. Instead of today’s date, how can I get the last day of the current month?
LOAD CSV WITH HEADERS FROM "file:///Test - input.csv" AS csvLine FIELDTERMINATOR ';'
CREATE (e:Country {CountryID: csvLine.CountryID, StartOfExport: date(csvLine.StartOfExport), EndOfExport: date(CASE csvLine.EndOfExport WHEN "X" THEN date() ELSE csvLine.EndOfExport END)})
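If the date handling is done before loading, the “X” rule could be sketched in Python as below. The function names are hypothetical, and `today` is passed in explicitly so the result is reproducible. Inside Cypher, an expression like `date.truncate('month', date()) + duration('P1M') - duration('P1D')` in the `WHEN "X" THEN` branch should give the same last-of-month date, though it is worth checking against the Neo4j version in use.

```python
import calendar
from datetime import date


def last_day_of_month(d: date) -> date:
    """Return the last calendar day of the month containing d."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    _, days_in_month = calendar.monthrange(d.year, d.month)
    return d.replace(day=days_in_month)


def parse_end_of_export(raw: str, today: date) -> date:
    """Map the "X" marker to the end of the current month, else parse an ISO date."""
    value = raw.strip()
    if value == "X":
        return last_day_of_month(today)
    return date.fromisoformat(value)


# Fixed "today" so the output does not depend on when the script runs
print(parse_end_of_export("X", date(2022, 7, 18)))
print(parse_end_of_export("2014-01-31", date(2022, 7, 18)))
```

`calendar.monthrange` takes care of month lengths and leap years, so no manual day counting is needed.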
Furthermore, the main question is this: I want to create relationships between all nodes (countries) whose export periods overlap in time.
For example, the following node list as input:
| CountryID | StartOfExport | EndOfExport |
|---|---|---|
| 55 | 2008-10-16 | 2014-01-31 |
| 47 | 2010-04-19 | 2014-09-15 |
| 73 | 2010-08-09 | 2022-07-18 |
| 61 | 2010-08-10 | 2013-04-30 |
should result in the following relationships:
| CountryID_01 | CountryID_02 | StartOfExport | EndOfExport |
|---|---|---|---|
| 55 | 47 | 2010-04-19 | 2014-01-31 |
| 55 | 73 | 2010-08-09 | 2014-01-31 |
| 55 | 61 | 2010-08-10 | 2013-04-30 |
| 47 | 73 | 2010-08-09 | 2014-09-15 |
| 47 | 61 | 2010-08-10 | 2013-04-30 |
| 73 | 61 | 2010-08-10 | 2013-04-30 |
That means I need to compare the rows pairwise in a loop: each iteration takes the next row X and compares it with every row below it, down to the end of the table (top to bottom).
The comparison is based on CountryID, and for each pair of rows X and Y (Y below X) I’m comparing:
1.) the StartOfExport of the CountryID in row X with the StartOfExport of the CountryID in row Y, taking the later date;
2.) the EndOfExport of the CountryID in row X with the EndOfExport of the CountryID in row Y, taking the earlier date.
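The pairwise comparison described above amounts to iterating over all unordered pairs of rows, which `itertools.combinations` yields in the same top-to-bottom order as the expected table. Below is a minimal Python sketch with the example data hard-coded; the function name is hypothetical, and skipping pairs whose periods do not intersect is an assumption the post does not state (every pair in the example overlaps).

```python
from datetime import date
from itertools import combinations

# Node list from the example: (CountryID, StartOfExport, EndOfExport)
nodes = [
    ("55", date(2008, 10, 16), date(2014, 1, 31)),
    ("47", date(2010, 4, 19), date(2014, 9, 15)),
    ("73", date(2010, 8, 9), date(2022, 7, 18)),
    ("61", date(2010, 8, 10), date(2013, 4, 30)),
]


def overlapping_pairs(rows):
    """Compare each row with every row below it (top to bottom).

    The shared period starts at the later StartOfExport (rule 1) and ends
    at the earlier EndOfExport (rule 2). Assumption: pairs whose shared
    start falls after their shared end do not overlap and are skipped.
    """
    result = []
    for (id_a, start_a, end_a), (id_b, start_b, end_b) in combinations(rows, 2):
        start = max(start_a, start_b)  # rule 1: later start date
        end = min(end_a, end_b)        # rule 2: earlier end date
        if start <= end:
            result.append((id_a, id_b, start, end))
    return result


for rel in overlapping_pairs(nodes):
    print(*rel, sep=";")
```

The comparison is quadratic in the number of rows, which should be manageable for a country-level node list.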
The question is: should I do this kind of data wrangling before loading the data into Neo4j? In that case I would load the node list and the relationship list separately. Alternatively, is something like this possible with Cypher/APOC procedures, using only the node list above?
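Both options could look roughly like this. The Python function writes a relationship list in the same `;`-separated format as the node file, ready for a second LOAD CSV. The two Cypher strings are sketches of (a) loading that file and (b) computing the overlaps directly in Cypher from the node list alone, without APOC. The file name `relationships.csv`, the relationship type `EXPORTED_IN_SAME_PERIOD`, and the function name are placeholders, not anything from the original setup.

```python
import csv
import io
from datetime import date

# Hypothetical relationship rows, e.g. from a pairwise overlap step:
# (CountryID_01, CountryID_02, StartOfExport, EndOfExport)
relationships = [
    ("55", "47", date(2010, 4, 19), date(2014, 1, 31)),
    ("55", "73", date(2010, 8, 9), date(2014, 1, 31)),
]


def to_relationship_csv(rows) -> str:
    """Serialise relationship rows as ';'-separated CSV with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")
    writer.writerow(["CountryID_01", "CountryID_02", "StartOfExport", "EndOfExport"])
    for a, b, start, end in rows:
        # isoformat() gives YYYY-MM-DD, which Cypher's date() can parse
        writer.writerow([a, b, start.isoformat(), end.isoformat()])
    return buf.getvalue()


# (a) Placeholder Cypher: load the file once the Country nodes exist
LOAD_RELATIONSHIPS = """
LOAD CSV WITH HEADERS FROM "file:///relationships.csv" AS row FIELDTERMINATOR ';'
MATCH (a:Country {CountryID: row.CountryID_01})
MATCH (b:Country {CountryID: row.CountryID_02})
CREATE (a)-[:EXPORTED_IN_SAME_PERIOD {
  StartOfExport: date(row.StartOfExport),
  EndOfExport: date(row.EndOfExport)
}]->(b)
"""

# (b) Placeholder Cypher: compute overlaps from the node list only
PAIRWISE_IN_CYPHER = """
MATCH (a:Country), (b:Country)
WHERE a.CountryID < b.CountryID
WITH a, b,
     CASE WHEN a.StartOfExport > b.StartOfExport
          THEN a.StartOfExport ELSE b.StartOfExport END AS overlapStart,
     CASE WHEN a.EndOfExport < b.EndOfExport
          THEN a.EndOfExport ELSE b.EndOfExport END AS overlapEnd
WHERE overlapStart <= overlapEnd
CREATE (a)-[:EXPORTED_IN_SAME_PERIOD {
  StartOfExport: overlapStart, EndOfExport: overlapEnd
}]->(b)
"""

print(to_relationship_csv(relationships), end="")
```

Because the original query stores CountryID as a string, the MATCH clauses compare strings as well. In the pure-Cypher version, `a.CountryID < b.CountryID` is a string comparison that keeps each pair only once, so relationship directions may differ from the table above; the variable names avoid `start` and `end`, which clash with Cypher keywords.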
The ultimate goal would be to calculate centrality metrics in GDS for each year separately.
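For per-year centrality, one option is to keep a single relationship per overlap and select the ones active in a given year when building each projection. The selection condition is sketched in Python below, using the expected relationships from the example. Treating a pair as active in year Y whenever its shared period touches at least one day of Y is an assumption; the same condition could go into the WHERE clause of a Cypher-based GDS projection.

```python
from datetime import date

# Expected relationships from the example: (CountryID_01, CountryID_02, start, end)
relationships = [
    ("55", "47", date(2010, 4, 19), date(2014, 1, 31)),
    ("55", "73", date(2010, 8, 9), date(2014, 1, 31)),
    ("55", "61", date(2010, 8, 10), date(2013, 4, 30)),
    ("47", "73", date(2010, 8, 9), date(2014, 9, 15)),
    ("47", "61", date(2010, 8, 10), date(2013, 4, 30)),
    ("73", "61", date(2010, 8, 10), date(2013, 4, 30)),
]


def active_in_year(rows, year):
    """Keep relationships whose shared period intersects the given calendar year.

    Assumption: a pair counts for a year if it overlaps at least one day of it.
    """
    year_start, year_end = date(year, 1, 1), date(year, 12, 31)
    return [r for r in rows if r[2] <= year_end and r[3] >= year_start]


for year in range(2009, 2016):
    print(year, len(active_in_year(relationships, year)))
```

With the example data, 2010 through 2013 contain all six relationships, 2014 contains three (55-47, 55-73, 47-73), and 2009 and 2015 contain none.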
I’m working on short notice, so any advice or suggested workflow would be very much appreciated!