I used the following command on neo4j desktop, my csv file has approximately 220000 records and the process take really long and never finish
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS from "file:///file_name.csv" AS line
WITH line.vin AS carid, line.start_time AS st, line.end_time AS et, line.event as event, line.sub_id as sid
Merge (c:Car {carID: carid})
Merge (e:Event {eventType: event, start_time:st, end_time:et, probe_id:sid})
Create (c)-[:takePlace]->(e)
CREATE INDEX index_car_id IF NOT EXISTS FOR (n:Car) ON (n.carID)
CREATE INDEX index_event_id IF NOT EXISTS FOR (n:Event) ON (n.probe_id)
Then, you load Car nodes:
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH DISTINCT line.vin AS carid
MERGE (c:Car {carID: carid})
Then, you load Event nodes:
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.start_time AS st, line.end_time AS et, line.event as event, line.sub_id as sid
MERGE (e:Event {eventType: event, start_time: st, end_time: et, probe_id: sid})
Then, you load relations:
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.vin AS carid, line.sub_id as sid
MATCH (c:Car {carID: carid})
MATCH (e:Event {probe_id: sid})
CREATE (c)-[:takePlace]->(e)
Hi Cobra, Thanks for your advice. The first step is fast and now I'm running the second one (loading event) and it runs for hours. I think it would be the same error for the second step if there is no exception
Did you create the UNIQUE CONSTRAINT for Event nodes?
What is the RAM allocated to Neo4j?
Try this:
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.start_time AS st, line.end_time AS et, line.event as event, line.sub_id as sid
MERGE (e:Event {probe_id: sid})
ON CREATE SET eventType = event, start_time = st, end_time = et,
Not really, the event node is hard to find an unique constrain in my case, but the car do have.
The memory allocation setting is
dbms.memory.heap.initial_size=1G
dbms.memory.heap.max_size=10G
dbms.memory.pagecache.size=2G
CREATE INDEX index_event_id IF NOT EXISTS FOR (n:Event) ON (n.eventID)
Then, load Event nodes:
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.start_time AS st, line.end_time AS et, line.event AS event, line.sub_id AS sid, toString(line.event) + "_" + toString(line.start_time) + "_" + toString(line.end_time) + "_" + toString(line.sub_id) AS eventID
MERGE (e:Event {eventID: eventID})
ON CREATE SET eventType = event, start_time = st, end_time = et, probe_id = sid
Finally, load relations:
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.vin AS carID, toString(line.event) + "_" + toString(line.start_time) + "_" + toString(line.end_time) + "_" + toString(line.sub_id) AS eventID
MATCH (c:Car {carID: carID})
MATCH (e:Event {eventID: eventID})
CREATE (c)-[:takePlace]->(e)
That works, thanks Cobra. May I ask a question about ordering the event by the start time? So, suppose I would retrive the event that next happen from another one
First you should cast start_time and end_time into datetime. Then create indexes for search performance on these both properties. Finally, in your queries, you will be available to compare these properties thanks to operators.
So the case might be I have a car and there is a sequence order by only start time (event 1, event 1, event2, event2, event1, event 2) something like this and what I would know how may transitions from event 1 to event2, something like this. or I can retrive all the cars which contains transitions that event1 to event2. Are there any examples I could follow? Thanks a lot
I don't have examples right now but I'm sure you can find some on Internet. If I'm right, each event is linked to a situation so you can retrieve all events for a specific situation then sort them by start_date then create relationships between them (e.g. Event1 -> Event2 -> Event3).
Yes, I already got the sequence by just simply order by the start time and I get stuck on the transition capture step. Nevermind,I will search and maybe post another question on community. Thanks