How to clone a graph and set attributes of the new subgraph efficiently?

lucca.zenobio · June 30, 2021, 12:44am

I want to clone a graph and set the node attributes of the new copy based on an array of dictionaries. I wrote the following query:

MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
WITH collect(output) as nodes, rootStudent
UNWIND nodes as node
UNWIND
[{skill: "Skill 1", value: 0.9999999863}, {skill: "Skill 2", value: 0.3}]
as score
    WITH DISTINCT score, node, rootStudent
    WHERE node.name = score.skill
    SET node.id = apoc.create.uuid(), node.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;

It works perfectly, but the problem appears when the array is large, for example:

[{skill: "name 1", value: 0.9999999863}, {skill: "name 2", value: 0.3}, ...]

I get a MemoryPoolOutOfMemoryError. I am using Neo4j Aura, so I can't change the neo4j.conf file. Is there any way I can optimize this query? Can I split it into multiple parts?

david_allen · July 7, 2021, 12:41pm

Look into apoc.periodic.iterate and break the query up into two pieces -- the first enumerates what you have to do and the second takes that action on batches. This query probably won't work exactly, but it'll give you the right general process:

CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
RETURN output AS node, rootStudent",

"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]
as score
    WITH DISTINCT score, node, rootStudent
    WHERE node.name = score.skill
    SET node.id = apoc.create.uuid(), node.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;
", 
    { batchSize: 1000, parallel: false });

Note I got rid of one of your UNWINDs. The first query feeds a stream of results to the second mutating query.
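
The "enumerate first, then act in batches" split described above can be sketched on its own. This is only a minimal sketch under several assumptions that are not in the original posts: the clone has already been committed by a separate, earlier query; the student still has its PARENT relationships to the cloned skills; and the student id and the score array are passed in as the query parameters $studentId and $scores (hypothetical names) rather than written inline. It uses the params entry of the apoc.periodic.iterate config to make those parameters visible to both statements.

```cypher
// Sketch only: assumes the cloned subgraph is already committed and that
// $studentId and $scores (hypothetical names) are supplied as query parameters.
CALL apoc.periodic.iterate(
  // 1) Enumerate the work: every Skill node reachable from the student.
  "MATCH (student:Student {id: $studentId})
   CALL apoc.path.subgraphNodes(student, {relationshipFilter: 'PARENT>|DEPENDS_ON>'})
   YIELD node
   WITH node WHERE node:Skill
   RETURN node",
  // 2) Act on each batch: look up the matching score entry, skip nodes without one.
  "WITH node, head([sc IN $scores WHERE sc.skill = node.name]) AS score
   WHERE score IS NOT NULL
   SET node.id = apoc.create.uuid(), node.score = score.value",
  {batchSize: 1000, parallel: false, params: {studentId: $studentId, scores: $scores}}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Here the first statement only reads, and each batch of 1000 nodes is updated in its own transaction, so the full score array is never unwound against every node in a single transaction.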

lucca.zenobio · July 11, 2021, 1:28pm

Hi @david_allen! Thanks for the answer.

I tried the following:

CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
RETURN output as item, rootStudent",

"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]

as score
    WITH DISTINCT score, item, rootStudent
    WHERE item.name = score.skill
    SET item.id = apoc.create.uuid(), item.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 5, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;
", 
    { batchSize: 1000, parallel: false });

It ran with no errors, but no node was created.
I tried to return each node from the cloneSubgraph output and pass it to the second statement of the iterate call to process and update accordingly. Any ideas why it doesn't work? Thanks
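
One thing that may help narrow this down: apoc.periodic.iterate collects failures from its batches in its result columns instead of raising them, so a call can finish without a visible error even when batches failed. The sketch below yields those columns explicitly. $enumerateQuery and $actionQuery are hypothetical parameters standing in for the two statement strings from the call above; they are not part of the original post.

```cypher
// Sketch only: $enumerateQuery and $actionQuery are hypothetical parameters
// holding the two statement strings passed to apoc.periodic.iterate above.
CALL apoc.periodic.iterate($enumerateQuery, $actionQuery, {batchSize: 1000, parallel: false})
YIELD batches, total, committedOperations, failedOperations, errorMessages
RETURN batches, total, committedOperations, failedOperations, errorMessages;
```

A non-zero failedOperations count or a non-empty errorMessages map would indicate that the second statement failed inside its batches rather than simply matching nothing.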
