How to clone a graph and set attributes on the new subgraph efficiently?

I want to clone a graph and set the node attributes of the new copy based on an array of dictionaries. I wrote the following query:

MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
WITH collect(output) as nodes, rootStudent
UNWIND nodes as node
UNWIND
[{skill: "Skill 1", value: 0.9999999863}, {skill: "Skill 2", value: 0.3}]
as score
    WITH DISTINCT score, node, rootStudent
    WHERE node.name = score.skill
    SET node.id = apoc.create.uuid(), node.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;

It works perfectly, but there is a problem when the array is large:

[{skill: "name 1", value: 0.9999999863}, {skill: "name 2", value: 0.3}, ...]

I get a MemoryPoolOutOfMemoryError. I am using Neo4j Aura, so I can't change the neo4j.conf file. Is there any way to optimize this query? Can I split it into multiple parts?


david_allen
Neo4j

Look into apoc.periodic.iterate and break the query up into two pieces -- the first enumerates what you have to do and the second takes that action on batches. This query probably won't work exactly, but it'll give you the right general process:

CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
RETURN output AS node, rootStudent",

"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]
as score
    WITH DISTINCT score, node, rootStudent
    WHERE node.name = score.skill
    SET node.id = apoc.create.uuid(), node.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;
", 
    { batchSize: 1000, parallel: false });

Note I got rid of one of your UNWINDs. The first query feeds a stream of results to the second mutating query.
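
As a rough illustration of that two-statement shape, here is a minimal sketch that passes the score array in as a parameter instead of embedding it in the query text. It assumes the clone has already been created by a separate query and that the cloned skills are still reachable from the student through PARENT relationships; `$scores` and `$studentId` are hypothetical parameter names supplied by the client, not something from the original post.

```cypher
// Sketch only: $scores and $studentId are hypothetical query parameters.
CALL apoc.periodic.iterate(
  // First statement: stream one row per score entry.
  "UNWIND $scores AS score RETURN score",
  // Second statement: runs per row, committed in batches of batchSize.
  // Assumes cloned skills hang off the student via PARENT relationships.
  "MATCH (:Student {id: $studentId})-[:PARENT*]->(s:Skill {name: score.skill})
   SET s.score = score.value",
  {batchSize: 1000, parallel: false,
   params: {scores: $scores, studentId: $studentId}}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Iterating over the scores rather than over every (node, score) pair avoids building the full cross product in a single transaction, which may help with the memory limit.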

Hi @david.allen! Thanks for the answer.

I tried the following:

CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
RETURN output as item, rootStudent",

"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]

as score
    WITH DISTINCT score, item, rootStudent
    WHERE item.name = score.skill
    SET item.id = apoc.create.uuid(), item.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 5, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;
", 
    { batchSize: 1000, parallel: false });

It ran with no errors, but no node was created.
I tried to return each node from the cloneSubgraph output and pass it to the second statement of iterate to process and update accordingly. Any ideas why it doesn't work? Thanks
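
One thing that may help narrow this down: apoc.periodic.iterate reports failures inside its batches in its result row rather than raising them, so a call can appear to run cleanly while every batch failed. A minimal sketch of inspecting those columns follows; the `Probe` label and the `range(1, 3)` input are placeholders chosen for illustration only.

```cypher
// Sketch only: the Probe label and the range input are illustrative placeholders.
CALL apoc.periodic.iterate(
  "UNWIND range(1, 3) AS i RETURN i",
  "CREATE (:Probe {n: i})",
  {batchSize: 1000, parallel: false}
)
// Yield the bookkeeping columns instead of discarding them.
YIELD batches, total, committedOperations, failedOperations, failedBatches, errorMessages
RETURN batches, total, committedOperations, failedOperations, failedBatches, errorMessages;
```

If `total` is 0, the first statement produced no rows; if `failedOperations` is non-zero, `errorMessages` should show what the second statement rejected.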
