How to clone a graph and set attributes on the new subgraph efficiently?

I want to clone a graph and set the node attributes of the new copy based on an array of dictionaries. I wrote the following query:

MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
WITH collect(output) as nodes, rootStudent
UNWIND nodes as node
UNWIND
[{skill: "Skill 1", value: 0.9999999863}, {skill: "Skill 2", value: 0.3}]
as score
    WITH DISTINCT score, node, rootStudent
    WHERE node.name = score.skill
    SET node.id = apoc.create.uuid(), node.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;

It works perfectly, but when the array is large, for example

[{skill: "name 1", value: 0.9999999863}, {skill: "name 2", value: 0.3}, ...]

I get a MemoryPoolOutOfMemoryError. I am using Neo4j Aura, so I can't change the neo4j.conf file. Is there any way I can optimize this query? Can I split it into multiple parts?

Look into apoc.periodic.iterate and break the query up into two pieces -- the first enumerates what you have to do and the second takes that action on batches. This query probably won't work exactly, but it'll give you the right general process:

CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
RETURN output AS node, rootStudent",

"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]
as score
    WITH DISTINCT score, node, rootStudent
    WHERE node.name = score.skill
    SET node.id = apoc.create.uuid(), node.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;
", 
    { batchSize: 1000, parallel: false });

Note that I got rid of one of your UNWINDs: the collect/UNWIND over the cloned nodes is no longer needed, because the first query streams its results, one row per cloned node, to the second, mutating query.
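
If the score list is large, one option is to pass it in as a query parameter instead of inlining it in the statement text. The sketch below assumes a client-supplied parameter named $scores (a hypothetical name) and, to stay short, matches existing Skill nodes in the first statement rather than cloning anything. The params entry in the config map is what makes $scores visible inside the quoted statements.

```cypher
// Sketch only. $scores is a hypothetical parameter supplied by the client, e.g.
// [{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]
CALL apoc.periodic.iterate(
  // First statement: enumerate the nodes to update, one row per node.
  "MATCH (s:Skill) RETURN s AS node",
  // Second statement: runs once per row, committed in batches of batchSize.
  "UNWIND $scores AS score
   WITH node, score
   WHERE node.name = score.skill
   SET node.score = score.value",
  // params forwards the outer $scores parameter into both statements.
  {batchSize: 1000, parallel: false, params: {scores: $scores}});
```

The same params entry could be added to the config map of the cloning query above, so the list is sent once as data rather than being parsed as part of the query string.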

Hi @david_allen! Thanks for the answer.

I tried the following:

CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
RETURN output as item, rootStudent",

"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]
as score
    WITH DISTINCT score, item, rootStudent
    WHERE item.name = score.skill
    SET item.id = apoc.create.uuid(), item.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 5, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;
", 
    { batchSize: 1000, parallel: false });

It ran with no errors, but no nodes were created.
My intent was to return each node from the cloneSubgraph output and pass it to the second statement of the iterate call, to be processed and updated accordingly. Any ideas why it doesn't work? Thanks.
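
One thing that may help narrow this down: apoc.periodic.iterate does not raise errors that occur inside a batch; it reports them in the row it returns. A minimal sketch for reading those columns follows, using a deliberately trivial pair of statements (the checked property is hypothetical). The same YIELD clause can be appended to the real call.

```cypher
// Sketch only: surface batch failures that apoc.periodic.iterate collects
// in its result row instead of raising.
CALL apoc.periodic.iterate(
  "MATCH (s:Skill) RETURN s AS item",
  // 'checked' is a hypothetical property used only for illustration.
  "SET item.checked = true",
  {batchSize: 1000, parallel: false})
YIELD batches, total, committedOperations, failedOperations, failedBatches, errorMessages
RETURN batches, total, committedOperations, failedOperations, failedBatches, errorMessages;
```

If failedBatches is non-zero, errorMessages should indicate why. One possibility worth checking, which is an assumption and not confirmed in this thread, is that the batches run in their own transactions and therefore cannot see nodes that the first statement created but has not yet committed.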