Parsing date and time separately when loading CSV file

Hey guys. I am using the following code to load a CSV file and parse the date and time columns.

CALL apoc.periodic.iterate(

'CALL apoc.load.csv("file:///newfile.csv") yield map as row'
,
'MERGE (s:Sender{from_send:row.From}) 
MERGE (r:Receiver{to_send:row.To}) 
MERGE (s)-[e:EMAILED {
date_d:datetime({epochMillis:apoc.date.parse(row.Date,'ms','dd/MM/yyyy')}), 
time_d:time(datetime({epochMillis:apoc.date.parse(row.Time,'ms','hh:mm:ss')})), 
subject:row.Subject, message_id : row.MessageID}]->(r)'
,
{batchSize:10000, iterateList:True, parallel:false}
)

but I'm getting a syntax error and I can't figure out what I'm doing wrong. Can anyone please help out? Thanks.

NOTE: the "Date" column is in the format "08/06/2020",
while the "Time" column is in the format "23:59:52".

My initial thought looking at the script you are running is that you need to use different quotes for the strings inside the script than the quotes around the entire script. I would suggest changing the quotes around the entire Cypher script to double quotes and the ones around the strings to single quotes (I like using double quotes around my whole script; you could do it the other way if you prefer). This changes the script to look like this...

CALL apoc.periodic.iterate(
"CALL apoc.load.csv('file:///newfile.csv') yield map as row",
"MERGE (s:Sender{from_send:row.From})
MERGE (r:Receiver{to_send:row.To})
MERGE (s)-[e:EMAILED {
date_d:datetime({epochMillis:apoc.date.parse(row.Date,'ms','dd/MM/yyyy')}),
time_d:time(datetime({epochMillis:apoc.date.parse(row.Time,'ms','HH:mm:ss')})),
subject:row.Subject, message_id : row.MessageID}]->(r)",
{batchSize:10000, iterateList:True, parallel:false}
)

Let me know if this does not fix the issue.

Thank you for the answer. The query now loads and throws no errors, but the execution is taking too long: it still hadn't completed after 4+ hours. The previous file (300k rows) completed in about an hour, but this file contains 700k rows. Can you also advise what else I should do? I have already increased the heap size and reduced the batch size further.

Hello @ahmedfazal405 :slight_smile:

I think your query is taking so long because of the date and time conversion. Did you try to load without the parsing?

Regards,
Cobra

@Cobra I was wondering about the parsing of the date and time. Do you have another way to get this data properly from the CSV into the database as a datetime without the parsing?

Hey @Cobra

I did load the previous CSV file (300k rows) without parsing the date and time, but I need queries that can handle the date and time columns in their proper temporal types, not as strings. Is there another way around this if the parsing is taking too much time? Can these columns be parsed later on?

If possible, please send me the actual value of the date as it appears in your .csv file.

To be honest, I avoid formatting my data in Cypher; I always format the data in Python and load it afterwards :slight_smile:

You can do this in a few seconds and a few lines of code in Python :smile:
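As a minimal sketch of that preprocessing idea, using only Python's standard library: this rewrites the "Date" column from dd/MM/yyyy into ISO 8601 (yyyy-MM-dd), so the load query no longer needs `apoc.date.parse` at all. The function name and output filename are illustrative; it assumes the column names (From, To, Date, Time) shown in the thread.

```python
import csv
from datetime import datetime

def to_iso_dates(src_path: str, dst_path: str) -> None:
    """Copy a CSV, rewriting the Date column to ISO 8601 (yyyy-MM-dd)."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # "08/06/2020" (dd/MM/yyyy) -> "2020-06-08"
            row["Date"] = datetime.strptime(row["Date"], "%d/%m/%Y") \
                                  .date().isoformat()
            # "23:59:52" is already a valid ISO 8601 local time; keep as-is.
            writer.writerow(row)
```

Once the columns are ISO-formatted, the load query should be able to use Neo4j's native temporal constructors directly, e.g. `date_d: date(row.Date), time_d: time(row.Time)`, which avoids the per-row `apoc.date.parse` and `datetime({epochMillis: ...})` round trip.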