"WebSocket connection failure" When loading csv for hours

I used the following command in Neo4j Desktop. My CSV file has approximately 220,000 records, and the process takes a really long time and never finishes:

:auto USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS from "file:///file_name.csv" AS line
WITH line.vin AS carid, line.start_time AS st, line.end_time AS et, line.event AS event, line.sub_id AS sid
MERGE (c:Car {carID: carid})
MERGE (e:Event {eventType: event, start_time: st, end_time: et, probe_id: sid})
CREATE (c)-[:takePlace]->(e)

After a few hours, the "WebSocket connection failure" error message shows up.

Any ideas about this?
Thanks in advance
Oli

Hello @oli :slight_smile:

I think it's because your query takes too long.

First, you should create indexes on the key node properties:

CREATE INDEX index_car_id IF NOT EXISTS FOR (n:Car) ON (n.carID)
CREATE INDEX index_event_id IF NOT EXISTS FOR (n:Event) ON (n.probe_id)

Then, you load Car nodes:

:auto USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH DISTINCT line.vin AS carid
MERGE (c:Car {carID: carid})

Then, you load Event nodes:

:auto USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.start_time AS st, line.end_time AS et, line.event AS event, line.sub_id AS sid
MERGE (e:Event {eventType: event, start_time: st, end_time: et, probe_id: sid})

Then, you load relations:

:auto USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.vin AS carid, line.sub_id AS sid
MATCH (c:Car {carID: carid})
MATCH (e:Event {probe_id: sid})
CREATE (c)-[:takePlace]->(e)

Regards,
Cobra

Hi Cobra, thanks for your advice. The first step is fast, and now I'm running the second one (loading events), which has been running for hours. I expect the second step will end with the same error if nothing changes.

  • Is line.sub_id unique for each Event?
  • Did you create the index for Event nodes?
  • How much RAM is allocated to Neo4j?

Try this:

:auto USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.start_time AS st, line.end_time AS et, line.event AS event, line.sub_id AS sid
MERGE (e:Event {probe_id: sid})
ON CREATE SET e.eventType = event, e.start_time = st, e.end_time = et

Not really. It's hard to find a unique property for the Event nodes in my case, but the Car nodes do have one.
The memory allocation settings are:
dbms.memory.heap.initial_size=1G
dbms.memory.heap.max_size=10G
dbms.memory.pagecache.size=2G

  • I see, but what is the purpose of the probe_id property?
  • Can you add a unique ID to each event?

It's just a sub-class ID. I think I may need to generate an event ID by using the string ("event_st_et_sub_id")?

Yes, it's a good idea.

First, delete the index on probe_id:

DROP INDEX index_event_id

Then, create a new one:

CREATE INDEX index_event_id IF NOT EXISTS FOR (n:Event) ON (n.eventID)

Then, load Event nodes:

:auto USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.start_time AS st, line.end_time AS et, line.event AS event, line.sub_id AS sid,
     toString(line.event) + "_" + toString(line.start_time) + "_" + toString(line.end_time) + "_" + toString(line.sub_id) AS eventID
MERGE (e:Event {eventID: eventID})
ON CREATE SET e.eventType = event, e.start_time = st, e.end_time = et, e.probe_id = sid

Finally, load relations:

:auto USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
WITH line.vin AS carID,
     toString(line.event) + "_" + toString(line.start_time) + "_" + toString(line.end_time) + "_" + toString(line.sub_id) AS eventID
MATCH (c:Car {carID: carID})
MATCH (e:Event {eventID: eventID})
CREATE (c)-[:takePlace]->(e)

Cannot merge the following node because of null property value for 'eventID': (:Event {eventID: null})

It seems to generate a null eventID for some rows

You should add a UUID for each event in your CSV file.
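Something like this might then work for the Event load, assuming the CSV gets a new UUID column (I'm calling it event_uuid here; adjust it to whatever you name the column):

:auto USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///file_name.csv" AS line
// event_uuid is a hypothetical column pre-generated in the CSV
WITH line.event_uuid AS eventID, line.start_time AS st, line.end_time AS et, line.event AS event, line.sub_id AS sid
MERGE (e:Event {eventID: eventID})
ON CREATE SET e.eventType = event, e.start_time = st, e.end_time = et, e.probe_id = sid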

That works, thanks Cobra. May I ask a question about ordering the events by start time? Suppose I want to retrieve the event that happens next after another one.

First, you should cast start_time and end_time into datetime values. Then create indexes on both of these properties for search performance. Finally, in your queries, you will be able to compare these properties with comparison operators.
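As a minimal sketch (it assumes start_time and end_time are ISO-8601 strings, and each statement should be run separately):

// Cast the stored string timestamps to datetime values
MATCH (e:Event)
SET e.start_time = datetime(e.start_time), e.end_time = datetime(e.end_time)

// Indexes on the temporal properties for faster range queries
CREATE INDEX index_event_start IF NOT EXISTS FOR (n:Event) ON (n.start_time)
CREATE INDEX index_event_end IF NOT EXISTS FOR (n:Event) ON (n.end_time)

// Example: events of one car ordered by start time ("some_vin" is a placeholder)
MATCH (c:Car {carID: "some_vin"})-[:takePlace]->(e:Event)
RETURN e.eventType, e.start_time, e.end_time
ORDER BY e.start_time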

So the case might be: I have a car with a sequence of events ordered only by start time (event1, event1, event2, event2, event1, event2), something like that, and I would like to know how many transitions there are from event1 to event2, or to retrieve all the cars that contain a transition from event1 to event2. Are there any examples I could follow? Thanks a lot

I don't have examples right now, but I'm sure you can find some on the Internet. If I'm right, each event is linked to a situation, so you can retrieve all the events for a specific situation, sort them by start_time, and then create relationships between them (e.g. Event1 -> Event2 -> Event3).
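A rough sketch of that idea (the NEXT relationship type and the 'event1'/'event2' values are just examples, and it assumes each Event node belongs to a single Car):

// Chain each car's events in start_time order with a NEXT relationship
MATCH (c:Car)-[:takePlace]->(e:Event)
WITH c, e ORDER BY e.start_time
WITH c, collect(e) AS events
UNWIND range(0, size(events) - 2) AS i
WITH events[i] AS e1, events[i + 1] AS e2
MERGE (e1)-[:NEXT]->(e2)

// Then count the transitions from one event type to another per car
MATCH (c:Car)-[:takePlace]->(e1:Event)-[:NEXT]->(e2:Event)
WHERE e1.eventType = 'event1' AND e2.eventType = 'event2'
RETURN c.carID AS car, count(*) AS transitions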

Yes, I already got the sequence by simply ordering by start time, but I got stuck on the transition-capture step. Never mind, I will search and maybe post another question on the community. Thanks
