How to use apoc.create.nodes()

I am trying to create new nodes from properties of existing nodes, but I keep getting errors and cannot find any documentation. Could anyone please provide some pointers?

MATCH (n1:IpAddress) 
WHERE NOT n1.GeoLocationIdObject is null 
WITH collect(n1) as items 
CALL apoc.create.nodes(['GeoLocation:Tag:Geo'], [{
IdObject: items.GeoLocationIdObject, 
IdUnique: apoc.create.uuid(), 
Name: 'geo location', 
IdDatastore: items.IdDatastore, 
GeoLatitude: items.GeoLatitude, 
GeoLongitude: items.GeoLongitude, 
TimeCreated: items.TimeCreated, 
TimeUpdated: datetime(), 
Source: 'graph-refactor-6'
}])
YIELD node
RETURN *

Returns the following error ...

Neo.ClientError.Statement.SyntaxError: Type mismatch: expected Any, Map, Node, Relationship, Point, Duration, Date, Time, LocalTime, LocalDateTime or DateTime but was List<Node> (line 5, column 12 (offset: 182))
"{IdObject: items.GeoLocationIdObject}, "

Which I think is weird because according to the documentation here it does expect a list, no?

CALL apoc.create.nodes(['Label'], [{key:value,…​}])
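The procedure does take a list of maps as its second argument. The mismatch most likely comes from items itself being a list of nodes, so a property lookup such as items.GeoLocationIdObject is not defined on it. A minimal sketch, assuming the goal is one GeoLocation node per matching IpAddress and that the three labels were meant as separate list entries rather than one colon-joined string, collects one property map per node instead:

```cypher
MATCH (n1:IpAddress)
WHERE n1.GeoLocationIdObject IS NOT NULL
// collect() is applied to a map built from a single node (n1),
// so the result is a list of maps, which is what the procedure expects.
WITH collect({
  IdObject: n1.GeoLocationIdObject,
  IdUnique: apoc.create.uuid(),
  Name: 'geo location',
  IdDatastore: n1.IdDatastore,
  GeoLatitude: n1.GeoLatitude,
  GeoLongitude: n1.GeoLongitude,
  TimeCreated: n1.TimeCreated,
  TimeUpdated: datetime(),
  Source: 'graph-refactor-6'
}) AS props
// Assumption: each label is passed as its own list element.
CALL apoc.create.nodes(['GeoLocation', 'Tag', 'Geo'], props)
YIELD node
RETURN count(node) AS created
```

This only creates the nodes; relationships back to the IpAddress nodes would still need a separate step.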

I got this to work using apoc.periodic.iterate() with batchSize:1 but that seems clunky ... and I cannot create relationships at the same time ...

CALL apoc.periodic.iterate("
MATCH (n1:IpAddress) 
WHERE NOT n1.GeoLocationIdObject is null 
RETURN n1", "
CREATE (n3:GeoLocation:Tag:Geo:VendorIpStack:PlatformIpStack)
SET n3.IdObject = n1.GeoLocationIdObject 
SET n3.IdUnique = apoc.create.uuid() 
SET n3.Name = 'geo location' 
SET n3.IdDatastore = n1.IdDatastore 
SET n3.GeoLatitude = n1.GeoLatitude 
SET n3.GeoLongitude = n1.GeoLongitude 
SET n3.TimeCreated = n1.TimeCreated  
SET n3.TimeUpdated = datetime() 
SET n3.Source = 'graph-refactor-6' 
RETURN *", 
{batchSize:1, iterateList:true, parallel:true})

Ideally, I would like to run the following to create thousands of new nodes and relationships from existing nodes, but instead it creates only 1 new node each time. For some reason I cannot iterate through the list, or it does iterate and overwrites the new node again and again so that only 1 is left.

MATCH (n1:IpAddress)-[r1:LOCATED_IN]-(n2:Geo) 
WHERE NOT n1.GeoLocationIdObject is null 
WITH n1, r1, n2 
MERGE (n3:GeoLocation:Tag:Geo:VendorIpStack:PlatformIpStack) 
ON CREATE SET n3.IdObject = n1.GeoLocationIdObject 
ON CREATE SET n3.IdUnique = apoc.create.uuid() 
ON CREATE SET n3.Name = 'geo location' 
ON CREATE SET n3.IdDatastore = n1.IdDatastore 
ON CREATE SET n3.GeoLatitude = n1.GeoLatitude 
ON CREATE SET n3.GeoLongitude = n1.GeoLongitude 
ON CREATE SET n3.TimeCreated = n1.TimeCreated  
ON CREATE SET n3.TimeUpdated = datetime() 
ON CREATE SET n3.Source = 'graph-refactor-6' 
MERGE (n1)-[r2:LOCATED_IN]->(n3)-[r3:LOCATED_IN]->(n2) 
ON CREATE SET r2.IdUnique = apoc.create.uuid() 
ON CREATE SET r2.TimeCreated = r1.TimeCreated 
ON CREATE SET r2.TimeUpdated = datetime() 
ON CREATE SET r2.Source = 'graph-refactor-6' 
ON CREATE SET r3.IdUnique = apoc.create.uuid() 
ON CREATE SET r3.TimeCreated = r1.TimeCreated 
ON CREATE SET r3.TimeUpdated = datetime() 
ON CREATE SET r3.Source = 'graph-refactor-6' 
RETURN n3

First of all, see my answer to the question "Failure to add calculated property to 60MM nodes using apoc.periodic.iterate" (#2 by shan) to see how you can get a status report from apoc.periodic.iterate that shows failed operations.
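The linked answer is not reproduced here. As a rough sketch of the idea, apoc.periodic.iterate yields summary columns (among them failedOperations, failedBatches and errorMessages) that can be returned explicitly. The inner statement and the batch size below are shortened placeholders chosen for illustration, not the full refactoring query:

```cypher
CALL apoc.periodic.iterate(
  "MATCH (n1:IpAddress)
   WHERE n1.GeoLocationIdObject IS NOT NULL
   RETURN n1",
  "CREATE (n3:GeoLocation:Tag:Geo)
   SET n3.IdObject = n1.GeoLocationIdObject",
  // batchSize here is an arbitrary illustrative value
  {batchSize: 1000, parallel: false}
)
// Summary columns: nonzero failedOperations or a non-empty
// errorMessages map indicates that some inner statements failed.
YIELD batches, total, committedOperations, failedOperations, failedBatches, errorMessages
RETURN batches, total, committedOperations, failedOperations, failedBatches, errorMessages
```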

Secondly, in your second query (where you don't use apoc.periodic.iterate), the main problem is this line:

MERGE (n3:GeoLocation:Tag:Geo:VendorIpStack:PlatformIpStack) 

This pattern has no properties, so the first MERGE creates one node and every later MERGE matches that same node instead of creating a new one. Move the identifying property of the GeoLocation (e.g., IdObject) inside the MERGE pattern and that should fix it. Or use CREATE, depending on your use case.
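A sketch of that change, assuming IdObject uniquely identifies a GeoLocation (IpAddress nodes sharing the same GeoLocationIdObject would then share one GeoLocation node). Splitting the path into two MERGE clauses is an additional choice made here so that each relationship is matched or created on its own; it is not something the original query requires:

```cypher
MATCH (n1:IpAddress)-[r1:LOCATED_IN]-(n2:Geo)
WHERE n1.GeoLocationIdObject IS NOT NULL
// The key property is part of the MERGE pattern, so a new node is
// created for every distinct GeoLocationIdObject value.
MERGE (n3:GeoLocation:Tag:Geo:VendorIpStack:PlatformIpStack {IdObject: n1.GeoLocationIdObject})
ON CREATE SET
  n3.IdUnique = apoc.create.uuid(),
  n3.Name = 'geo location',
  n3.IdDatastore = n1.IdDatastore,
  n3.GeoLatitude = n1.GeoLatitude,
  n3.GeoLongitude = n1.GeoLongitude,
  n3.TimeCreated = n1.TimeCreated,
  n3.TimeUpdated = datetime(),
  n3.Source = 'graph-refactor-6'
MERGE (n1)-[r2:LOCATED_IN]->(n3)
ON CREATE SET
  r2.IdUnique = apoc.create.uuid(),
  r2.TimeCreated = r1.TimeCreated,
  r2.TimeUpdated = datetime(),
  r2.Source = 'graph-refactor-6'
MERGE (n3)-[r3:LOCATED_IN]->(n2)
ON CREATE SET
  r3.IdUnique = apoc.create.uuid(),
  r3.TimeCreated = r1.TimeCreated,
  r3.TimeUpdated = datetime(),
  r3.Source = 'graph-refactor-6'
RETURN count(DISTINCT n3) AS geoLocations
```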

By the way, when you set parallel:true in apoc.periodic.iterate and the operations inside iterate lock nodes, some of those operations may fail. That is usually the case when you create relationships inside iterate. Say a is going to be connected to b and c in 2 different batches. In the first batch, both a and b are locked, which means the second batch cannot obtain a lock on a to connect it to c. The answer linked above shows how to see failed operations in the result.
If you really want to set parallel to true for performance, you can create all nodes first in one iterate block with parallel:true and a large batch size, and then create the relationships in a second iterate block with parallel:false.
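The two-block approach could look roughly like the sketch below. The batch sizes are illustrative guesses, the property lists are shortened, and it assumes each GeoLocationIdObject value appears on only one IpAddress, so that the lookup by IdObject in the second block finds exactly one node. The two statements are meant to be run one after the other, for example in cypher-shell.

```cypher
// Block 1: nodes only. CREATE of a standalone node does not lock
// any existing node, so parallel batches should not block each other.
CALL apoc.periodic.iterate(
  "MATCH (n1:IpAddress)
   WHERE n1.GeoLocationIdObject IS NOT NULL
   RETURN n1",
  "CREATE (n3:GeoLocation:Tag:Geo:VendorIpStack:PlatformIpStack)
   SET n3.IdObject = n1.GeoLocationIdObject,
       n3.IdUnique = apoc.create.uuid(),
       n3.Name = 'geo location',
       n3.TimeUpdated = datetime(),
       n3.Source = 'graph-refactor-6'",
  {batchSize: 10000, parallel: true}
)
YIELD batches, total, failedOperations, errorMessages
RETURN batches, total, failedOperations, errorMessages;

// Block 2: relationships, run sequentially to avoid lock contention.
// The extra filter on n2 is an assumption: the new nodes also carry
// the :Geo label and should not be picked up as the old Geo endpoint.
CALL apoc.periodic.iterate(
  "MATCH (n1:IpAddress)-[r1:LOCATED_IN]-(n2:Geo)
   WHERE n1.GeoLocationIdObject IS NOT NULL AND NOT n2:GeoLocation
   RETURN n1, r1, n2",
  "MATCH (n3:GeoLocation {IdObject: n1.GeoLocationIdObject})
   MERGE (n1)-[r2:LOCATED_IN]->(n3)
   ON CREATE SET r2.IdUnique = apoc.create.uuid(),
                 r2.TimeCreated = r1.TimeCreated,
                 r2.Source = 'graph-refactor-6'
   MERGE (n3)-[r3:LOCATED_IN]->(n2)
   ON CREATE SET r3.IdUnique = apoc.create.uuid(),
                 r3.TimeCreated = r1.TimeCreated,
                 r3.Source = 'graph-refactor-6'",
  {batchSize: 1000, parallel: false}
)
YIELD batches, total, failedOperations, errorMessages
RETURN batches, total, failedOperations, errorMessages;
```

An index on GeoLocation(IdObject) would likely help the lookup in the second block, though whether it is needed depends on the data volume.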
