How to batch load a set of data with a lot of empty properties (Fast Batched Updates of Graph Structures with Neo4j and Cypher)

import
batching

(Paolodipietro58) #1

Following Michael Hunger's article and Andrew Bowman's suggestions, I finally have a parameter file definition that loads without problems.

My problem is that the property matrix is full of null values, and I'm not able to write a query that bypasses those empty values.

How can I define a conditional merge?


(Andrew Bowman) #2

Usually when you're merging in nodes you have the equivalent of a primary key or set of properties as a node key which would uniquely identify that node among others of its type (label).

We'd recommend adding a unique constraint (or a Node Key constraint, if multiple properties rather than a single property on the node signify uniqueness, and you're using Enterprise Edition) to help out here.

When performing the MERGE, you should include on the node just the property (or properties) that are unique for that node (these must not be null). The rest you can add via SET, or via ON CREATE SET if you only want to add them when the node gets created. With SET or ON CREATE SET, the values are allowed to be null.
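
As a rough illustration of the pattern described above, here is a minimal Python sketch that prepares a batch for an UNWIND query: the key goes into the MERGE, and everything else goes into a `props` map from which empty values have been dropped. The label `Person`, the key property `codice`, the parameter name `batch`, the sample records and the helper `to_batch_row` are all assumptions made for illustration, not something taken from the original data.

```python
# Sketch only: MERGE on the unique key, then SET the remaining properties.
# Assumed names: label Person, key property `codice`, query parameter `batch`.
MERGE_PERSONS = """
UNWIND $batch AS row
MERGE (p:Person {codice: row.codice})
SET p += row.props
"""

KEY = "codice"  # assumed unique identifier; must never be null


def to_batch_row(record):
    """Split one spreadsheet record into its key and its non-empty properties."""
    if record.get(KEY) in (None, ""):
        raise ValueError(f"record has no {KEY!r}; MERGE keys must not be null")
    # Drop empty cells so they are neither written nor used to overwrite
    # values that already exist on a merged node.
    props = {k: v for k, v in record.items()
             if k != KEY and v not in (None, "")}
    return {KEY: record[KEY], "props": props}


# Hypothetical sample records with empty cells
records = [
    {"codice": "A001", "cognome": "Rossi", "nome": "Maria", "email": None, "cap": ""},
    {"codice": "A002", "cognome": "Bianchi", "nome": None, "email": "b@example.com"},
]
batch = [to_batch_row(r) for r in records]
```

With the official Python driver, the batch would typically be sent as `session.run(MERGE_PERSONS, batch=batch)`. Replacing `SET` with `ON CREATE SET` in the query would apply the properties only when the node is newly created.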


(Paolodipietro58) #3

Well, you are suggesting a very complex solution for my use case!

I have a large Excel table, full of empty values.

The primary key could be the Codice column, or the concatenation of Cognome, Nome and Data di Nascita.

Also, the Luogo di nascita (birth place, column 3) is not mandatory, and the Stato (country, column 4) is mandatory only for foreign birth places.

Via (street), Cap (zip code) and City make up the address (all the field combinations are optional!).

There should be a cellulare (mobile phone) and/or an email.

Finally, there should be some annotations.

The cherry on the cake is the label: this is a Person and also a Patient, so the label should be Person:Patient. But there are also doctors, and then the label should be Person:Doctor, or Person:Patient:Doctor when the doctor is also a patient (but the Person properties and basic relationships remain the same!).
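
A possible way to handle these label combinations without a conditional inside the query is to group the rows by label set on the client and send one batch per group. Labels cannot be passed as query parameters in plain Cypher, so the sketch below builds each query from a fixed whitelist. The flag fields `is_patient` and `is_doctor`, the row shape (`codice` plus a `props` map) and all function names are hypothetical.

```python
from collections import defaultdict

# Only whitelisted role names are ever interpolated into a query string.
ALLOWED_ROLES = ("Patient", "Doctor")


def labels_for(record):
    """Return the label tuple for a record, e.g. ('Person', 'Patient', 'Doctor')."""
    roles = [r for r in ALLOWED_ROLES if record.get("is_" + r.lower())]
    return ("Person", *roles)


def group_by_labels(records):
    """Bucket records by label combination so each bucket gets its own query."""
    groups = defaultdict(list)
    for rec in records:
        groups[labels_for(rec)].append(rec)
    return dict(groups)


def merge_query(labels):
    """MERGE on Person and its key, then add the role labels with SET."""
    extra = "".join(":" + name for name in labels[1:])
    set_labels = f"SET p{extra}\n" if extra else ""
    return (
        "UNWIND $batch AS row\n"
        "MERGE (p:Person {codice: row.codice})\n"
        f"{set_labels}"
        "SET p += row.props"
    )


# Hypothetical rows: a patient, and a doctor who is also a patient
rows = [
    {"codice": "A001", "props": {"nome": "Maria"}, "is_patient": True},
    {"codice": "A002", "props": {"nome": "Luca"}, "is_patient": True, "is_doctor": True},
]
for labels, batch in group_by_labels(rows).items():
    query = merge_query(labels)  # run each query with its own batch
```

Merging on `Person` alone keeps a single node per person, so someone loaded first as a patient and later as a doctor ends up with both labels rather than as two nodes.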


As you can see, this simple example describes a very basic situation that is completely different from the one you suggested.

Maybe a conditional statement in the query would solve everything, but an if statement is not available.

And this is why I generated the old, very long query: each fragment was slightly different from the others, creating nodes with only the properties defined for that node.

How can we get out of this knot?


(Andrew Bowman) #4

Okay, so in this CSV you're loading, is every row meant to represent a unique person, or could the same person's data be present in multiple rows? Also, are any of these persons already in the graph?

If this is new data, and the persons are unique in the CSV, then feel free to use CREATE instead of MERGE when creating the person, then set the values you need from the CSV.

Otherwise, if the persons may already be present in the graph, or the same person's data may be duplicated in the CSV, you need to determine what uniquely identifies a person, whether it's a single property or multiple. That is the thing you would need to create a constraint (or node key) on, and that's what you would need to supply when you MERGE on it before you set the remaining properties from the CSV.
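
The choice between the two cases above can be checked before loading. The following Python sketch looks for repeated keys in the rows and picks a CREATE or a MERGE query accordingly. The key `codice`, the row shape, the `persons_are_new` flag and the function names are assumptions made for illustration.

```python
from collections import Counter

# Used when the data is new and every person appears exactly once.
CREATE_PERSONS = """
UNWIND $batch AS row
CREATE (p:Person {codice: row.codice})
SET p += row.props
"""

# Used when persons may already exist or may repeat in the file.
# A uniqueness constraint on the key should exist before running this;
# its CREATE CONSTRAINT syntax differs between Neo4j versions, so it is
# not shown here.
MERGE_PERSONS = """
UNWIND $batch AS row
MERGE (p:Person {codice: row.codice})
SET p += row.props
"""


def duplicate_keys(rows, key="codice"):
    """Return the sorted key values that occur in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def choose_query(rows, persons_are_new):
    """CREATE only if the data is new and keys are unique; otherwise MERGE."""
    if persons_are_new and not duplicate_keys(rows):
        return CREATE_PERSONS
    return MERGE_PERSONS
```

If the identifier is the combination of Cognome, Nome and Data di Nascita rather than Codice, the same check applies with a tuple of those three values as the key, and the MERGE pattern would list all three properties.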