Is apoc.periodic.iterate not releasing memory and causing a websocket error?

Hi,

I have been dealing with a websocket error as described in this post this post.

Below is the query I have been trying to run when this error has been occuring:

MATCH (file:File)
WHERE NOT (file)-[:CONTAINS]->(:Row)
WITH file, collect(file.url) AS fileURLs
UNWIND fileURLs AS fileURL
CALL apoc.periodic.iterate(
'
LOAD CSV WITH HEADERS FROM $url AS row
RETURN row
','
CREATE (fileRow:Row {fileURl: $url,
					 createdOn: date(),
                     STATE_CODE_001: row.STATE_CODE_001,
                        HIGHWAY_DISTRICT_002: row.HIGHWAY_DISTRICT_002,
                        COUNTY_CODE_003: row.COUNTY_CODE_003,
                        PLACE_CODE_004: row.PLACE_CODE_004,
                        RECORD_TYPE_005A: row.RECORD_TYPE_005A,
                        ROUTE_PREFIX_005B: row.ROUTE_PREFIX_005B,
                        SERVICE_LEVEL_005C: row.SERVICE_LEVEL_005C,
                        ROUTE_NUMBER_005D: row.ROUTE_NUMBER_005D,
                        DIRECTION_005E: row.DIRECTION_005E,
                        FEATURES_DESC_006A: row.FEATURES_DESC_006A,
                        CRITICAL_FACILITY_006B: row.CRITICAL_FACILITY_006B,
                        FACILITY_CARRIED_007: row.FACILITY_CARRIED_007,
                        STRUCTURE_NUMBER_008: row.STRUCTURE_NUMBER_008,
                        LOCATION_009: row.LOCATION_009,
                        MIN_VERT_CLR_010: row.MIN_VERT_CLR_010,
                        KILOPOINT_011: row.KILOPOINT_011,
                        BASE_HWY_NETWORK_012: row.BASE_HWY_NETWORK_012,
                        LRS_INV_ROUTE_013A: row.LRS_INV_ROUTE_013A,
                        SUBROUTE_NO_013B: row.SUBROUTE_NO_013B,
                        LAT_016: row.LAT_016,
                        LONG_017: row.LONG_017,
                        DETOUR_KILOS_019: row.DETOUR_KILOS_019,
                        TOLL_020 : row.TOLL_020,
                        MAINTENANCE_021: row.MAINTENANCE_021,
                        OWNER_022: row.OWNER_022,
                        FUNCTIONAL_CLASS_026: row.FUNCTIONAL_CLASS_026,
                        YEAR_BUILT_027: row.YEAR_BUILT_027,
                        TRAFFIC_LANES_ON_028A: row.TRAFFIC_LANES_ON_028A,
                        TRAFFIC_LANES_UND_028B: row.TRAFFIC_LANES_UND_028B,
                        ADT_029: row.ADT_029,
                        YEAR_ADT_030: row.YEAR_ADT_030,
                        DESIGN_LOAD_031: row.DESIGN_LOAD_031,
                        APPR_WIDTH_MT_032: row.APPR_WIDTH_MT_032,
                        MEDIAN_CODE_033: row.MEDIAN_CODE_033,
                        DEGREES_SKEW_034: row.DEGREES_SKEW_034,
                        STRUCTURE_FLARED_035: row.STRUCTURE_FLARED_035,
                        RAILINGS_036A: row.RAILINGS_036A,
                        TRANSITIONS_036B: row.TRANSITIONS_036B,
                        APPR_RAIL_036C: row.APPR_RAIL_036C,
                        APPR_RAIL_END_036D: row.APPR_RAIL_END_036D,
                        HISTORY_037: row.HISTORY_037,
                        NAVIGATION_038: row.NAVIGATION_038,
                        NAV_VERT_CLR_MT_039: row.NAV_VERT_CLR_MT_039,
                        NAV_HORR_CLR_MT_040: row.NAV_HORR_CLR_MT_040,
                        OPEN_CLOSED_POSTED_041: row.OPEN_CLOSED_POSTED_041,
                        SERVICE_ON_042A: row.SERVICE_ON_042A,
                        SERVICE_UND_042B: row.SERVICE_UND_042B,
                        STRUCTURE_KIND_043A: row.STRUCTURE_KIND_043A,
                        STRUCTURE_TYPE_043B: row.STRUCTURE_TYPE_043B,
                        APPR_KIND_044A: row.APPR_KIND_044A,
                        APPR_TYPE_044B: row.APPR_TYPE_044B,
                        MAIN_UNIT_SPANS_045: row.MAIN_UNIT_SPANS_045,
                        APPR_SPANS_046: row.APPR_SPANS_046,
                        HORR_CLR_MT_047: row.HORR_CLR_MT_047,
                        MAX_SPAN_LEN_MT_048: row.MAX_SPAN_LEN_MT_048,
                        STRUCTURE_LEN_MT_049: row.STRUCTURE_LEN_MT_049,
                        LEFT_CURB_MT_050A: row.LEFT_CURB_MT_050A,
                        RIGHT_CURB_MT_050B: row.RIGHT_CURB_MT_050B,
                        ROADWAY_WIDTH_MT_051: row.ROADWAY_WIDTH_MT_051,
                        DECK_WIDTH_MT_052: row.DECK_WIDTH_MT_052,
                        VERT_CLR_OVER_MT_053: row.VERT_CLR_OVER_MT_053,
                        VERT_CLR_UND_REF_054A: row.VERT_CLR_UND_REF_054A,
                        VERT_CLR_UND_054B: row.VERT_CLR_UND_054B,
                        LAT_UND_REF_055A: row.LAT_UND_REF_055A,
                        LAT_UND_MT_055B: row.LAT_UND_MT_055B,
                        LEFT_LAT_UND_MT_056: row.LEFT_LAT_UND_MT_056,
                        DECK_COND_058: row.DECK_COND_058,
                        SUPERSTRUCTURE_COND_059: row.SUPERSTRUCTURE_COND_059,
                        SUBSTRUCTURE_COND_060: row.SUBSTRUCTURE_COND_060,
                        CHANNEL_COND_061: row.CHANNEL_COND_061,
                        CULVERT_COND_062: row.CULVERT_COND_062,
                        OPR_RATING_METH_063: row.OPR_RATING_METH_063,
                        OPERATING_RATING_064: row.OPERATING_RATING_064,
                        INV_RATING_METH_065: row.INV_RATING_METH_065,
                        INVENTORY_RATING_066: row.INVENTORY_RATING_066,
                        STRUCTURAL_EVAL_067: row.STRUCTURAL_EVAL_067,
                        DECK_GEOMETRY_EVAL_068: row.DECK_GEOMETRY_EVAL_068,
                        UNDCLRENCE_EVAL_069: row.UNDCLRENCE_EVAL_069,
                        POSTING_EVAL_070: row.POSTING_EVAL_070,
                        WATERWAY_EVAL_071: row.WATERWAY_EVAL_071,
                        APPR_ROAD_EVAL_072: row.APPR_ROAD_EVAL_072,
                        WORK_PROPOSED_075A: row.WORK_PROPOSED_075A,
                        WORK_DONE_BY_075B: row.WORK_DONE_BY_075B,
                        IMP_LEN_MT_076: row.IMP_LEN_MT_076,
                        DATE_OF_INSPECT_090: row.DATE_OF_INSPECT_090,
                        INSPECT_FREQ_MONTHS_091: row.INSPECT_FREQ_MONTHS_091,
                        FRACTURE_092A: row.FRACTURE_092A,
                        UNDWATER_LOOK_SEE_092B: row.UNDWATER_LOOK_SEE_092B,
                        SPEC_INSPECT_092C: row.SPEC_INSPECT_092C,
                        FRACTURE_LAST_DATE_093A: row.FRACTURE_LAST_DATE_093A,
                        UNDWATER_LAST_DATE_093B: row.UNDWATER_LAST_DATE_093B,
                        SPEC_LAST_DATE_093C: row.SPEC_LAST_DATE_093C,
                        BRIDGE_IMP_COST_094: row.BRIDGE_IMP_COST_094,
                        ROADWAY_IMP_COST_095: row.ROADWAY_IMP_COST_095,
                        TOTAL_IMP_COST_096: row.TOTAL_IMP_COST_096,
                        YEAR_OF_IMP_097: row.YEAR_OF_IMP_097,
                        OTHER_STATE_CODE_098A: row.OTHER_STATE_CODE_098A,
                        OTHER_STATE_PCNT_098B: row.OTHER_STATE_PCNT_098B,
                        OTHR_STATE_STRUC_NO_099: row.OTHR_STATE_STRUC_NO_099,
                        STRAHNET_HIGHWAY_100: row.STRAHNET_HIGHWAY_100,
                        PARALLEL_STRUCTURE_101: row.PARALLEL_STRUCTURE_101,
                        TRAFFIC_DIRECTION_102: row.TRAFFIC_DIRECTION_102,
                        TEMP_STRUCTURE_103: row.TEMP_STRUCTURE_103,
                        HIGHWAY_SYSTEM_104: row.HIGHWAY_SYSTEM_104,
                        FEDERAL_LANDS_105: row.FEDERAL_LANDS_105,
                        YEAR_RECONSTRUCTED_106: row.YEAR_RECONSTRUCTED_106,
                        DECK_STRUCTURE_TYPE_107: row.DECK_STRUCTURE_TYPE_107,
                        SURFACE_TYPE_108A: row.SURFACE_TYPE_108A,
                        MEMBRANE_TYPE_108B: row.MEMBRANE_TYPE_108B,
                        DECK_PROTECTION_108C: row.DECK_PROTECTION_108C,
                        PERCENT_ADT_TRUCK_109: row.PERCENT_ADT_TRUCK_109,
                        NATIONAL_NETWORK_110: row.NATIONAL_NETWORK_110,
                        PIER_PROTECTION_111: row.PIER_PROTECTION_111,
                        BRIDGE_LEN_IND_112: row.BRIDGE_LEN_IND_112,
                        SCOUR_CRITICAL_113: row.SCOUR_CRITICAL_113,
                        FUTURE_ADT_114: row.FUTURE_ADT_114,
                        YEAR_OF_FUTURE_ADT_115: row.YEAR_OF_FUTURE_ADT_115,
                        MIN_NAV_CLR_MT_116: row.MIN_NAV_CLR_MT_116,
                        FED_AGENCY: row.FED_AGENCY,
                        DATE_LAST_UPDATE: row.DATE_LAST_UPDATE,
                        TYPE_LAST_UPDATE: row.TYPE_LAST_UPDATE,
                        DEDUCT_CODE: row.DEDUCT_CODE,
                        REMARKS: row.REMARKS,
                        PROGRAM_CODE: row.PROGRAM_CODE,
                        PROJ_NO: row.PROJ_NO,
                        PROJ_SUFFIX: row.PROJ_SUFFIX,
                        NBI_TYPE_OF_IMP: row.NBI_TYPE_OF_IMP,
                        DTL_TYPE_OF_IMP: row.DTL_TYPE_OF_IMP,
                        SPECIAL_CODE: row.SPECIAL_CODE,
                        STEP_CODE: row.STEP_CODE,
                        STATUS_WITH_10YR_RULE: row.STATUS_WITH_10YR_RULE,
                        SUFFICIENCY_ASTERC: row.SUFFICIENCY_ASTERC,
                        SUFFICIENCY_RATING: row.SUFFICIENCY_RATING,
                        STATUS_NO_10YR_RULE: row.STATUS_NO_10YR_RULE})
',
{batchSize:1000,parallel:false,params:{url:fileURL}}) YIELD batches, total
RETURN batches, total

I was using Halin to monitor the Heap Size as seen in the following image:

image

It appears as though apoc.periodic.iterate is holding onto memory and possibly leading to the websocket error. I have tried this query with a variety of configurations (batch size, RAM allocations, etc.) and it keeps happening. The Heap always approaches the limit set in the config settings no matter how small of a batch size I use or how much memory I allocate to it.

Can someone explain how apoc.period.iterate uses memory? Is there a better approach I can use or it an issue with apoc.periodic.iterate?

Any insight would be much appreciated.

There's a couple worrying things here.

When you do WITH file, collect(file.url) you're collecting, per file on a row, that file's single url property. So per row you're creating 1-element collections then UNWINDing them. That doesn't seem like it's what you want to do. Maybe you wanted to just collect all file URLs and UNWIND them?

apoc.periodic.iterate() doesn't work well with LOAD CSV, we'd suggest not using that approach. Maybe try using apoc.load.csv() instead.

@andrew.bowman You are correct in that the inclusion of file with collect() is definitely not intentional. Thank you for pointing that out. I have been using LOAD CSV in lieu of apoc.load.csv because I have had issues getting them to work together.

I tried the following query based on the apoc documentation

MATCH (file:File)
WHERE NOT (file)-[:CONTAINS]->(:Row)
WITH collect(file.url) AS fileURLs
UNWIND fileURLs AS fileURL
CALL apoc.periodic.iterate(
'
CALL apoc.load.csv($url) yield map as row
RETURN row
','
CREATE (fileRow:Row) SET p = row
',
{batchSize:5000,parallel:false,params:{url:fileURL}}) YIELD batches, total
RETURN batches, total

which lead me to this error message:

You have communicated with me before on using the apoc.load.csv with apoc.periodic.iterate here.

@andrew.bowman I realized that my ArrayIndexOutOfBoundsException was caused by failing to include the parameter for headers. I changed CALL apoc.load.csv(url) to CALL apoc.load.csv(url,{header:true}) which corrected that issue. PALM TO FACE

I am still getting a websocket error from some of the files and will be digging into those.

In addition to adding {header:true} to the apoc.load.csv function, setting quoteChar: "\u0000" appears to have rectified my websocket errors. Big thanks to help from @mdfrenchman and @jsmccrumb to get this figured out!