I am loading a large data file into a Neo4j database and have run into a problem when calling apoc.merge.node inside apoc.periodic.iterate(). I have run this query several times, and it has never completed. The reason appears to be that apoc.load.csv() keeps running in the background and never passes any data to apoc.merge.node().
The version of Neo4j I'm using is:
Neo4j Desktop - 1.1.17
Version: 3.5.3 Enterprise
Settings:
dbms.memory.heap.initial_size=2G
dbms.memory.heap.max_size=4G
apoc.import.file.enabled=true
apoc.import.file.use_neo4j_config=false
The query uses apoc.periodic.iterate together with two other APOC procedures: apoc.load.csv to read the CSV file and apoc.merge.node to merge the nodes.
CALL apoc.periodic.iterate(
"CALL apoc.load.csv('/data/all_data_now.csv') yield map return map",
"CALL apoc.merge.node(map.NODE_NAME, {uuid:map.NODE_ID},
{indicies:split(map.NODE_INDICIES,','),
data:split(map.NODE_DATA,','), line:map.NODE_LINE})",
{batchSize:10000, iterateList:true, parallel:true})
An earlier version that did not need apoc.merge.node executed fine. For that version I had created separate files, each containing only one type of node, and used a Python script to generate a MERGE with the proper node label. The new version has to take the node label from a column in the CSV file. This is the old call, which worked:
CALL apoc.periodic.iterate( "CALL apoc.load.csv('/data/all_data_now.csv') yield map return map",
" MERGE (n:ICD9{uuid:map.NODE_ID})
SET n.indicies = split(map.NODE_INDICIES,',') ,
n.data = split(map.NODE_DATA,',') ,
n.line = map.NODE_LINE "
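For context, the Python step that generated the per-label MERGE could look roughly like the sketch below. It is not my exact script: the helper name build_merge_call, the label check, and the config map at the end are assumptions made for illustration.

```python
import re


def build_merge_call(label: str, csv_path: str, batch_size: int = 10000) -> str:
    """Hypothetical helper: return an apoc.periodic.iterate call for one node label."""
    # The label is interpolated into the query text, so restrict it
    # to a plain identifier before using it.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", label):
        raise ValueError(f"unsafe label: {label!r}")

    # Outer statement: stream the rows of the CSV file.
    load = f"CALL apoc.load.csv('{csv_path}') yield map return map"

    # Inner statement: the same MERGE as the old call, with the label filled in.
    merge = (
        f"MERGE (n:{label} {{uuid: map.NODE_ID}}) "
        "SET n.indicies = split(map.NODE_INDICIES, ','), "
        "n.data = split(map.NODE_DATA, ','), "
        "n.line = map.NODE_LINE"
    )

    # The config map below is an assumption, not the one from the old call.
    return (
        f'CALL apoc.periodic.iterate("{load}", "{merge}", '
        f"{{batchSize: {batch_size}, iterateList: true}})"
    )
```

Calling build_merge_call("ICD9", path) with the path of a single-label file returns the query text to run for that file.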
Data File:
I created the CSV data file from an open-source file by processing it in Python and writing the result into a single large file. I was able to read the whole file with Pandas and determined that it has four columns and over 124 million rows.
`
- NODE 124091330
- NODE_ID 124091330
- NODE_INDICIES 124091327
- NODE_DATA 49309879
- dtype: int64
`
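The counts above came from Pandas. For anyone who wants to check a similar file without loading it all into memory, here is a minimal standard-library sketch that streams the CSV and counts non-empty values per column. The function names are hypothetical, and the result is only roughly comparable to the Pandas count, since Pandas also treats strings such as "NA" as missing by default.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def count_non_empty(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count the non-empty values in each column of an iterable of row dicts."""
    counts: Counter = Counter()
    for row in rows:
        for column, value in row.items():
            # csv.DictReader fills missing trailing fields with None;
            # empty cells arrive as empty strings.
            if value not in (None, ""):
                counts[column] += 1
    return counts


def count_csv_columns(path: str) -> Counter:
    """Stream a CSV file with a header row and count non-empty values per column."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_non_empty(csv.DictReader(handle))
```

For example, count_csv_columns('/data/all_data_now.csv') would return a Counter keyed by column name.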
The problem shows up when I run "dbms.listQueries()".
The query list shows the 'apoc.load.csv' query running with cypher runtime=slotted. It keeps running, and the main procedure never executes, apparently because it is waiting for the slotted query to complete.
I've utilized apoc.periodic.iterate() many times to load large CSV files, and it has always worked well.
What causes the procedure not to execute?
Brett Taylor