I want to clone a graph and set the node attributes of the new copy based on an array of dictionaries. I wrote the following query:
MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
nodes,
[rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
{ standinNodes:[[rootSkill, rootStudent]],
skipProperties:['id'] })
YIELD input, output, error
WITH collect(output) as nodes, rootStudent
UNWIND nodes as node
UNWIND
[{skill: "Skill 1", value: 0.9999999863}, {skill: "Skill 2", value: 0.3}]
as score
WITH DISTINCT score, node, rootStudent
WHERE node.name = score.skill
SET node.id = apoc.create.uuid(), node.score = score.value
WITH rootStudent
MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
DELETE relation
RETURN DISTINCT learned;
It works perfectly, but when the array is large, for example
[{skill: "name 1", value: 0.9999999863}, {skill: "name 2", value: 0.3}, ...]
I get a MemoryPoolOutOfMemoryError. I am using Neo4j Aura, so I can't change the neo4j.conf file. Is there any way to optimize this query? Can I split it into multiple parts?
Look into apoc.periodic.iterate and break the query up into two pieces: the first enumerates what you have to do, and the second takes that action in batches. This query probably won't work exactly as written, but it should give you the right general process:
CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
nodes,
[rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
{ standinNodes:[[rootSkill, rootStudent]],
skipProperties:['id'] })
YIELD input, output, error
RETURN output AS node, rootStudent",
"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]
as score
WITH DISTINCT score, node, rootStudent
WHERE node.name = score.skill
SET node.id = apoc.create.uuid(), node.score = score.value
WITH rootStudent
MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
DELETE relation
RETURN DISTINCT learned;
",
{ batchSize: 1000, parallel: false });
Note I got rid of one of your UNWINDs. The first query feeds a stream of results to the second mutating query.
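To make the two-statement shape concrete, here is a minimal sketch rather than a drop-in fix. It assumes the score array is passed from the client as a query parameter named $scores (a hypothetical name) and that the cloned nodes can be recognized by a missing id, since the clone step skips that property; neither assumption comes from the original query, so adjust both to your data.

```cypher
// Sketch only: $scores and the "n.id IS NULL" filter are assumptions.
CALL apoc.periodic.iterate(
  // First statement: enumerate the work, one row per score entry.
  "UNWIND $scores AS score RETURN score",
  // Second statement: runs once per row, committed in batches of batchSize.
  "MATCH (n:Skill {name: score.skill})
   WHERE n.id IS NULL
   SET n.id = apoc.create.uuid(), n.score = score.value",
  // params forwards the outer $scores parameter into both statements.
  { batchSize: 1000, parallel: false, params: { scores: $scores } });
```

Passing the array as a parameter keeps the large literal out of the query text, and each batch of 1000 rows is committed in its own transaction instead of one large one.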
Hi @david_allen! Thanks for the answer.
I tried the following:
CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
nodes,
[rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
{ standinNodes:[[rootSkill, rootStudent]],
skipProperties:['id'] })
YIELD input, output, error
RETURN output as item, rootStudent",
"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]
as score
WITH DISTINCT score, item, rootStudent
WHERE item.name = score.skill
SET item.id = apoc.create.uuid(), item.score = score.value
WITH rootStudent
MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
CREATE (student)-[learned:LEARNED {version: 5, created_at: timestamp()}]->(s)
DELETE relation
RETURN DISTINCT learned;
",
{ batchSize: 1000, parallel: false });
It ran with no errors, but no node was created.
I tried to return each node from the cloneSubgraph output and pass it to the second statement, to be processed and updated accordingly. Any ideas why it doesn't work? Thanks.
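One way to narrow this down: apoc.periodic.iterate reports failures from the second statement in its result row instead of raising them, so a call can look like it "ran with no errors" while every batch failed. The sketch below yields the relevant columns. The two inner statements are stand-ins (the checked_at property is hypothetical and would be written to every Skill node), so swap in the real statements before running it.

```cypher
CALL apoc.periodic.iterate(
  // Stand-in first statement: replace with the real enumerating query.
  "MATCH (s:Skill) RETURN s AS item",
  // Stand-in second statement: replace with the real mutating query.
  "SET item.checked_at = timestamp()",
  { batchSize: 1000, parallel: false })
YIELD batches, total, committedOperations, failedOperations, failedBatches, errorMessages
RETURN batches, total, committedOperations, failedOperations, failedBatches, errorMessages;
```

If total is 0, the first statement returned no rows. If failedOperations or failedBatches is non-zero, errorMessages should show what went wrong inside the batches.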