Working of aggregations with periodic iterate

aman.negi · April 19, 2023, 10:08am

Hey community,

I am working with a Cypher query as follows:

CALL apoc.load.csv("file:///temp/cms_data/cms_data.csv", {header:true, sep : '|', ignore:['label']}) YIELD map as row
MATCH (patent:PATENT {app_num : row.app_num})
CREATE (file:FILE_NODE {id : row.id})
SET file += apoc.map.clean(row, [], [])
CREATE (patent)-[:HAS_FILE]->(file)
WITH collect(file.document_code) as docs, count(file) as file_count,patent
CREATE (patent)-[:UPDATED_CMS]->(up:UPDATE_META_CMS)
SET up.added_files = file_count , up.added_docs = docs

Now, this works fine as long as the CSV is not too big.
So I wanted to use apoc.periodic.iterate to do this in batches.

Now here is my question:
The last node that I am creating, i.e. UPDATE_META_CMS, takes its values by aggregating the details of the files added to a specific PATENT node, and is then attached to that PATENT node. Will this still work correctly now that the process runs in batches?

NOTE: the CSV is built so that all the files related to a patent come as a cluster, i.e. the files of a patent appear in succession, and once the patent number changes, no further files of the previous patent appear.
I don't think we can use this knowledge to enhance the process, but if we can, then please do share how.

TIA,
Aman

glilienfield · April 19, 2023, 10:54am

In the first query of apoc.periodic.iterate, you could collect all the files for a patent and return the patent together with the collection of its files. Then, in the update query, unwind the files and create the file nodes and the other relationships. This way, you will have all of a patent's files when processing the batch.

aman.negi · April 19, 2023, 11:14am

Hey @glilienfield, can you please give a rough idea of how the query would look?

glilienfield · April 19, 2023, 11:27pm

You can try this. Sorry, I don't have data to test it. I think you will be safe executing in parallel, since each patent is processed together and all a patent's files have been collected with the patent. Anyways, give it a try.

CALL apoc.periodic.iterate(
"
    CALL apoc.load.csv('file:///temp/cms_data/cms_data.csv', 
    {header: true, sep: '|', ignore: ['label']}) YIELD map as row
    WITH row.app_num as patent_num, collect(apoc.map.clean(row, [], [])) as file_data
    MATCH (patent:PATENT {app_num: patent_num})
    RETURN patent, file_data
",
"
    forEach(row in file_data | 
        CREATE (file:FILE_NODE {id : row.id})
        SET file = row
        CREATE (patent)-[:HAS_FILE]->(file)
    )
    CREATE (patent)-[:UPDATED_CMS]->(up:UPDATE_META_CMS)
    SET up.added_files = size(file_data) , up.added_docs = [i in file_data | i.document_code]
",
{batchSize:1000, parallel:true})
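
If you want to sanity-check the result afterwards, something like the query below may help. It is only a sketch that I have not run against real data. It assumes the import was run once, so each patent has at most one UPDATE_META_CMS node and its HAS_FILE relationships all come from this load. It lists patents where the recorded added_files differs from the number of attached FILE_NODE nodes, so an empty result would suggest the aggregation was not split across batches.

```cypher
// Compare the count stored on UPDATE_META_CMS with the files actually attached.
// Assumption: a single import run on data that had no HAS_FILE relationships before.
MATCH (patent:PATENT)-[:UPDATED_CMS]->(up:UPDATE_META_CMS)
WITH patent, up,
     size([(patent)-[:HAS_FILE]->(f:FILE_NODE) | f]) AS actual_files
WHERE up.added_files <> actual_files
RETURN patent.app_num AS app_num,
       up.added_files AS recorded_files,
       actual_files
LIMIT 25
```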

aman.negi · April 21, 2023, 9:36am

Hey @glilienfield, I tested it out. It worked, thank you so much.

Regards,
Aman

glilienfield · April 21, 2023, 7:16pm

You are welcome. FYI, your use of apoc.map.clean does not do anything, since both lists are empty. Calling it like this will return the original map, 'row' in your case.
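
For comparison, here is a minimal sketch of a call where the two lists actually filter something. The map literal and the choice of key and value to drop are made up for illustration and are not taken from your CSV. The second argument lists keys to remove and the third lists values to remove, so I would expect only id and document_code to remain.

```cypher
// Hypothetical input map, used only to illustrate the two filter lists.
RETURN apoc.map.clean(
    {id: '42', document_code: 'ABC', label: 'x', notes: ''},
    ['label'],   // drop the entry with key 'label'
    ['']         // drop any entry whose value is an empty string
) AS cleaned
```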
