Using `UNWIND` with `apoc.periodic.iterate` to load data into a graph


(Michael McKenzie) #1

I am trying to unwind a list of files so that I can import and merge their data with apoc.periodic.iterate. See my code below:

CALL apoc.periodic.iterate("
UNWIND [
	'file:///AK14.csv',
	'file:///AL14.csv',
	'file:///AR14.csv',
	'file:///AZ14.csv',
	'file:///CA14.csv',
	'file:///CO14.csv',
	'file:///CT14.csv',
	'file:///DC14.csv',
	'file:///DE14.csv',
	'file:///FL14.csv',
	'file:///GA14.csv',
	'file:///HI14.csv',
	'file:///IA14.csv',
	'file:///ID14.csv',
	'file:///IL14.csv',
	'file:///IN14.csv',
	'file:///KS14.csv',
	'file:///LA14.csv',
	'file:///MA14.csv',
	'file:///MD14.csv',
	'file:///ME14.csv',
	'file:///MI14.csv',
	'file:///MN14.csv',
	'file:///MO14.csv',
	'file:///MS14.csv',
	'file:///MT14.csv',
	'file:///NC14.csv',
	'file:///ND14.csv',
	'file:///NE14.csv',
	'file:///NH14.csv',
	'file:///NJ14.csv',
	'file:///NM14.csv',
	'file:///NV14.csv',
	'file:///NY14.csv',
	'file:///OH14.csv',
	'file:///OK14.csv',
	'file:///OR14.csv',
	'file:///PA14.csv',
	'file:///PR14.csv',
	'file:///RI14.csv',
	'file:///SC14.csv',
	'file:///SD14.csv',
	'file:///TN14.csv',
	'file:///TX14.csv',
	'file:///UT14.csv',
	'file:///VA14.csv',
	'file:///VT14.csv',
	'file:///WA14.csv',
	'file:///WI14.csv',
	'file:///WV14.csv',
	'file:///WY14.csv' 
	] AS file
LOAD CSV WITH HEADERS FROM file AS row RETURN row",
"
// Each ON CREATE SET follows the node MERGE it belongs to, so it runs when that node is created
MERGE (bridge:Bridge {id: row.STRUCTURE_NUMBER_008})
ON CREATE SET bridge.name = row.STRUCTURE_NUMBER_008,
			  bridge.latitude = row.LAT_016,
			  bridge.longitude = row.LONG_017,
			  bridge.yearbuilt = row.YEAR_BUILT_027
MERGE (place:Place {id: row.PLACE_CODE_004})
ON CREATE SET place.name = row.PLACE_CODE_004
MERGE (county:County {id: row.COUNTY_CODE_003})
ON CREATE SET county.name = row.COUNTY_CODE_003
MERGE (state:State {id: row.STATE_CODE_001})
ON CREATE SET state.name = row.STATE_CODE_001
MERGE (owner:Owner {id: row.OWNER_022})
ON CREATE SET owner.name = row.OWNER_022
MERGE (maintResp:MaintenanceResp {id: row.MAINTENANCE_021})
ON CREATE SET maintResp.name = row.MAINTENANCE_021
MERGE (bridge)-[:OF_PLACE]->(place)
MERGE (place)-[:OF_COUNTY]->(county)
MERGE (county)-[:OF_STATE]->(state)
MERGE (bridge)-[:OWNED_BY]->(owner)
MERGE (bridge)-[:MAINTAINED_BY]->(maintResp)
",
{batchSize:1000,iterateList:true})
YIELD batches, total 
RETURN batches, total

I will have to repeat this process for many more files like these. I let it run overnight and it crashed. I have adjusted dbms.memory.heap.initial_size=1G and dbms.memory.heap.max_size=2G.

My thought is that I should pull UNWIND outside of the apoc.periodic.iterate, but I have gotten stuck there so far.
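
For reference, pulling the UNWIND outside might look roughly like the sketch below. It assumes the `params` config option of apoc.periodic.iterate is available in the installed APOC version. The parameter name `url` is made up for illustration, the file list is shortened to two entries, and the second statement is trimmed to a single MERGE. Each file then gets its own apoc.periodic.iterate call, and the result reports batches and totals per file.

```cypher
// Sketch only: shortened file list, and only one MERGE from the full import.
UNWIND [
	'file:///AK14.csv',
	'file:///AL14.csv'
] AS file
// One apoc.periodic.iterate call per file; `url` is a hypothetical parameter name
// passed in through the `params` config map.
CALL apoc.periodic.iterate(
	'LOAD CSV WITH HEADERS FROM $url AS row RETURN row',
	'MERGE (bridge:Bridge {id: row.STRUCTURE_NUMBER_008})',
	{batchSize: 1000, iterateList: true, params: {url: file}}
)
YIELD batches, total
RETURN file, batches, total
```

Whether this behaves any better than keeping the UNWIND inside the first statement is untested here; it mainly makes it easier to see which file a failure belongs to.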


(Michael McKenzie) #2

Ok. I forgot to create indices on my nodes (this is the largest data set I have been working with):

CREATE INDEX ON :Bridge(id);
CREATE INDEX ON :Place(id);
CREATE INDEX ON :County(id);
CREATE INDEX ON :State(id);
CREATE INDEX ON :Owner(id);
CREATE INDEX ON :MaintenanceResp(id);

I was then able to run my import without issue.
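
The likely reason this helps: every MERGE has to look up whether a node with that id already exists, and without an index that lookup scans all nodes with the label, which gets slower as the graph grows. A rough way to check that the indexes are in place and being used, assuming a Neo4j 3.x or 4.x server where the `db.indexes()` procedure is available (the id value below is a placeholder, not real data):

```cypher
// List the indexes and their state; each should be ONLINE before the import starts.
CALL db.indexes();

// Show the plan for a single MERGE without running it.
// 'example-id' is a placeholder value.
EXPLAIN MERGE (bridge:Bridge {id: 'example-id'}) RETURN bridge;
```

With the index online, the plan should typically show an index seek on :Bridge(id) rather than a label scan.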

