How to batch load a set of data with a lot of empty properties (Fast Batched Updates of Graph Structures with Neo4j and Cypher)

paolodipietro58 · December 20, 2018, 12:34am

Following Michael Hunger's article and andrew.bowman's suggestions, I finally have a parameter file definition that loads without problems.

My problem is that the properties matrix is full of null values, and I'm not able to write a query that bypasses those empty values.

How can I define a conditional merge?

andrew_bowman · December 20, 2018, 12:39am

Usually when you're merging in nodes you have the equivalent of a primary key or set of properties as a node key which would uniquely identify that node among others of its type (label).

We'd recommend adding a unique constraint (or a Node Key constraint, if multiple properties rather than a single property on the node signify uniqueness, and you're using Enterprise Edition) to help out here.

When performing the MERGE, you should include on the node just the property (or properties) that are unique for that node (these must not be null). The rest you can add via SET, or via ON CREATE SET if you only want to add them when the node gets created. With SET or ON CREATE SET, the values are allowed to be null.
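As a rough illustration of the MERGE-then-SET pattern described above, here is a minimal Python sketch that prepares the rows for a single batched statement. It is only an assumption-laden example: the `codice` key, the `props` sub-map, the `$rows` parameter, and the helper name `to_merge_rows` are all hypothetical and not taken from the original parameter file.

```python
# Cypher for the MERGE-then-SET pattern. The label, the key property
# (codice) and the parameter name ($rows) are illustrative assumptions.
MERGE_PERSONS = """
UNWIND $rows AS row
MERGE (p:Person {codice: row.codice})
SET p += row.props
"""


def to_merge_rows(records, key="codice"):
    """Split each record into its key and a map of the non-empty other properties."""
    rows = []
    for record in records:
        key_value = record.get(key)
        if key_value in (None, ""):
            # MERGE cannot use a null key property, so rows without a key are skipped.
            continue
        props = {
            name: value
            for name, value in record.items()
            if name != key and value not in (None, "")
        }
        rows.append({key: key_value, "props": props})
    return rows
```

With the official Neo4j Python driver, something like `session.run(MERGE_PERSONS, rows=to_merge_rows(records))` would send the whole batch in one statement. Filtering the empty cells on the client side is optional, since SET accepts nulls, but it keeps the payload small.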

paolodipietro58 · December 20, 2018, 1:45am

Well, that is a very complex solution for my use case!

I have a large Excel table, full of empty values.

The primary key could be the Codice column, or the concatenation of Cognome, Nome, and Data di Nascita.

Also, Luogo di nascita (birth place, column 3) is not mandatory, and Stato (country, column 4) is mandatory only for foreign birth places.

Via (street), Cap (zip code), and City make up the address (all the field combinations are optional!).

There should be a cellulare (mobile phone) and/or an email.

Finally, there should be some annotations.

The cherry on the cake is the label: this row is a Person who is also a Patient, so the label should be Person:Patient. But there are also doctors, whose label should be Person:Doctor, or Person:Patient:Doctor when the doctor is also a patient (the Person properties and basic relationships remain the same!).

As you can see, this simple example describes a very basic situation that is completely different from the one you suggested.

Maybe a conditional statement in the query would solve everything, but an if statement is not available.

And this is why I generated the old, very long query: each fragment was slightly different from the others, creating nodes with only the properties defined for that node.

How can we untangle this knot?

andrew_bowman · December 20, 2018, 2:41am

Okay, so this CSV you're loading, is every row meant to represent a unique person, or could the same person's data be present in multiple rows? Also, are any of these persons already in the graph?

If this is new data, and the persons are unique in the CSV, then feel free to use CREATE instead of MERGE when creating the person, then set the values you need from the CSV.
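For the new-data case just described, a minimal sketch might look like the following. The `$rows` parameter, the helper names `drop_empty` and `batches`, and the batch size are assumptions made for illustration only.

```python
# Cypher for the CREATE-only case: every row becomes a new Person node.
# The label and the parameter name ($rows) are illustrative assumptions.
CREATE_PERSONS = """
UNWIND $rows AS row
CREATE (p:Person)
SET p = row
"""


def drop_empty(record):
    """Return a copy of the record without null or empty-string cells."""
    return {name: value for name, value in record.items() if value not in (None, "")}


def batches(rows, size=10000):
    """Yield consecutive slices of at most `size` rows, one per statement."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

Each batch would then be passed as the `rows` parameter of one `CREATE_PERSONS` statement, so every node ends up with only the properties that were actually filled in for that row.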

Otherwise, if the persons may already be present in the graph, or the same person's data may be duplicated in the CSV, you need to determine what uniquely identifies a person, whether it's a single property or multiple. That is the thing you would need to create a constraint (or node key) on, and that's what you would need to supply when you MERGE on it before you set the remaining properties from the CSV.
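When several properties together identify a person, and extra labels depend on the row, one possible sketch is shown below. It is not taken from the thread: the camel-cased property names, the hypothetical `roles` column, the `isPatient`/`isDoctor` flags, and the helper `to_person_rows` are all assumptions. The conditional labels use the common FOREACH-over-CASE workaround, since Cypher has no if statement.

```python
# MERGE on a composite key, then add optional labels conditionally.
# FOREACH over a one-element or empty list acts as a stand-in for "if".
MERGE_PERSONS_COMPOSITE = """
UNWIND $rows AS row
MERGE (p:Person {cognome: row.cognome, nome: row.nome, dataDiNascita: row.dataDiNascita})
SET p += row.props
FOREACH (ignored IN CASE WHEN row.isPatient THEN [1] ELSE [] END | SET p:Patient)
FOREACH (ignored IN CASE WHEN row.isDoctor THEN [1] ELSE [] END | SET p:Doctor)
"""

# Assumed property names for the Cognome, Nome and Data di Nascita columns.
KEY_FIELDS = ("cognome", "nome", "dataDiNascita")


def to_person_rows(records, key_fields=KEY_FIELDS):
    """Build rows with the composite key, label flags and non-empty other properties."""
    rows = []
    for record in records:
        if any(record.get(field) in (None, "") for field in key_fields):
            # Every part of the composite key must be present for MERGE.
            continue
        roles = record.get("roles") or []  # hypothetical column, e.g. ["Patient", "Doctor"]
        row = {field: record[field] for field in key_fields}
        row["isPatient"] = "Patient" in roles
        row["isDoctor"] = "Doctor" in roles
        row["props"] = {
            name: value
            for name, value in record.items()
            if name not in key_fields and name != "roles" and value not in (None, "")
        }
        rows.append(row)
    return rows
```

A person who is both doctor and patient gets both flags set, so the same node ends up labelled Person:Patient:Doctor while its Person properties are written only once.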
