Generalised batch import of nodes and relationships

We import approximately 250,000 records into our Neo4j database on a daily basis. The records are split into roughly 30 datasets, each of which focuses on a single type of data, e.g. Systems, People or Servers.

The processing was not as fast as we would have liked!

In an earlier attempt to improve performance we ran a read-before-write process to avoid writing unchanged records. Unfortunately that led us to use individual writes. We now just use a 'batched' process where each CYPHER transaction contains approximately 1000 records.

We have gained a 100x performance increase, but we are wondering whether we have written the optimum CYPHER for those 'batch' transactions.

A typical batch of data looks like this. We chose to split properties from relationships so we can use UNWIND:

[{
	_type: 'Testing',
	code: `01-parent-batch-1000`,
	properties: {
		name: `01-parent-name-batch-1000`,
		description: `this is batch test 01 of 1000 snippets`,
		lifecycleStage: `01-preproduction`,
		serviceTier: `01-unsupported`,
	},
	relationships: [
		{
			name: 'CONNECTED_TO',
			_type: 'Testing',
			code: `01-connected-child-batch-1000`,
			rich: {
				propOne: `first rich prop for 01`,
				propTwo: `second rich prop for 01`
			}
		},
		{
			name: 'ALSO_TO',
			_type: 'Testing',
			code: `01-also-child-batch-1000`,
		},
	],
},.....]

Our CYPHER looks like this:

UNWIND $payloads AS payload
CALL apoc.merge.node([payload._type], {code:payload.code})
YIELD node AS p
SET p += payload.properties
WITH p, payload
UNWIND payload.relationships as relationship
CALL apoc.merge.node([relationship._type], {code:relationship.code})
YIELD node AS c
WITH p,c,relationship
CALL apoc.merge.relationship(p,relationship.name,relationship.rich,null,c,null)
YIELD rel
RETURN p,rel,c

The key reason we have questioned the syntax/performance of the above is that it doesn't handle deletions of relationships: it keeps merging relationships instead of replacing the existing relationships with the newer ones.

Has anyone developed a generic importer or can see any improvements to the above?

Thanks
Geoff

Do you have indexes created on property “code” for each node label that merges on “code”?
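
As a rough sketch of what that could look like, assuming Neo4j 4.4 or later and using the Testing label from your sample payload (the index and constraint names are made up for illustration), you would run one statement like this per label:

```cypher
// Plain index on the merge key. The name `testing_code` is hypothetical.
CREATE INDEX testing_code IF NOT EXISTS
FOR (n:Testing) ON (n.code);

// Alternative, if `code` is meant to be unique per label: a uniqueness
// constraint, which is backed by its own index. Use this INSTEAD of the
// plain index above, not in addition to it.
// CREATE CONSTRAINT testing_code_unique IF NOT EXISTS
// FOR (n:Testing) REQUIRE n.code IS UNIQUE;
```

Without an index, each merge on “code” has to scan all nodes with that label, which adds up quickly at 1000 records per transaction.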

Are you merging on the same relationship type and the same values in “rich” each time, or may they be different? These have to be the same to get a match; otherwise a new relationship will be created. What behavior do you want?

Thanks for the quick reply and for the reminder about indexes. We are checking them now.

Our ideal behaviour is for the payloads we send to replace the nodes/relationships which already exist.

As you can see, the CYPHER does not currently contain any deletes, as we know the nodes and their properties can be replaced (we don't have to delete them individually). However, there does not appear to be a simple 'replace' relationships concept, so it looks like we need to explicitly delete the existing relationships before we create the ones in the payloads. :thinking:
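
For what it's worth, here is a rough, untested sketch of the delete-then-create idea applied to our query. It assumes that every outgoing relationship of a parent node is owned by the import and can safely be removed, and that each parent appears only once per batch; neither assumption has been verified against our data.

```cypher
UNWIND $payloads AS payload
CALL apoc.merge.node([payload._type], {code: payload.code})
YIELD node AS p
SET p += payload.properties
WITH p, payload
// Assumption: all outgoing relationships of p belong to this import.
// OPTIONAL MATCH keeps parents that have no relationships yet.
OPTIONAL MATCH (p)-[old]->()
DELETE old
// Collapse the one-row-per-old-relationship fan-out back to one row per payload.
WITH DISTINCT p, payload
UNWIND payload.relationships AS relationship
CALL apoc.merge.node([relationship._type], {code: relationship.code})
YIELD node AS c
WITH p, c, relationship
// The old relationships are gone, so this creates fresh ones.
// coalesce() covers relationships that have no `rich` map.
CALL apoc.merge.relationship(p, relationship.name, {}, coalesce(relationship.rich, {}), c, {})
YIELD rel
RETURN p, rel, c
```

If only some relationship types should be replaced, the OPTIONAL MATCH would need a filter on type(old), for example against the names listed in the payload.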

The APOC merge relationship procedure looks for existing relationships between the two nodes with the specified type AND with the properties and values specified in the identProps map. If either the type or the properties in the identProps map do not match, then a new relationship will be created. If a match is found, the matching relationships will be updated with the properties in the onMatchProps map.

If you want it to match every time based only on the relationship type, then use the following instead:

CALL apoc.merge.relationship(p,relationship.name,{},{},c,{})
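
If the “rich” values should still be written to the relationship, just not used for matching, one option is to pass them as the create and match properties instead. This is a sketch under the assumption that a missing “rich” map should be treated as empty:

```cypher
// identProps is empty, so matching uses start node, type and end node only.
// coalesce() substitutes an empty map when the payload has no `rich` entry.
CALL apoc.merge.relationship(
  p,
  relationship.name,
  {},                               // identProps: match on type only
  coalesce(relationship.rich, {}),  // onCreateProps
  c,
  coalesce(relationship.rich, {})   // onMatchProps
)
YIELD rel
```

As far as I know the match properties are added to whatever is already on the relationship, so a property that has been dropped from the payload would stay on an existing relationship.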

Thanks for your help. We have been approaching this from multiple angles, and yes the indexes and the parameters within apoc.merge.relationship have helped. Thanks.

For a more complete picture of what we have been trying to do, what we have done, and the key bit of syntax we missed, please take a look at this Stack Overflow question: graph - Replacing all relationships between nodes in Neo4j - Stack Overflow