Query running endlessly for large input file

Hello,

I am using Neo4j Community 3.5.17. I want to find the closest :Fraud node to a :Person node for each fid (the unique Person identifier) in a list, and am using the following query. Please note that fid is currently stored as a string, whereas the input and output files contain fid as an integer.

profile cypher runtime=interpreted
load csv with headers from 'file:///shortest_path_data/test.csv' as line
with line.fid as fid
match (n:Person) where n.fid = toString(fid)
with n
call apoc.path.expandConfig(n, {labelFilter:'/Fraud', maxLevel:10, optional:true, limit:1}) yield path
return toInteger(n.fid) as fid, length(path)/2 as distance;

This query worked for an input file of 30K fids, but for 300K fids it runs endlessly and triggers GC pauses. Here is the debug log:

2020-04-21 19:55:57.554+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=417944, gcTime=313357, gcCount=11}
2020-04-21 19:59:39.207+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=322338, gcTime=322433, gcCount=10}
2020-04-21 20:05:03.770+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=327987, gcTime=328082, gcCount=9}

The current heap size is 31g and the page cache size is 100g.

Please advise on how to proceed.

Thanks and Regards,
Kevin

If you rerun with USING PERIODIC COMMIT ( https://neo4j.com/docs/cypher-manual/4.0/clauses/load-csv/#load-csv-setting-the-rate-of-periodic-commits ) so that you commit every 5k records, for example, does this provide any improvement?

In addition to Dana's suggestion, make sure you have an index on :Person(fid).
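
For reference, a minimal sketch of how the index could be created and checked on Neo4j 3.5. The CREATE INDEX statement is only needed if no index on :Person(fid) exists yet. The EXPLAIN query is a simplified, illustrative lookup and not the full query from the original post. It relies on LOAD CSV returning every field as a string, so line.fid can be compared with the string-typed fid property directly.

```cypher
// Neo4j 3.5 syntax (4.x and later use: CREATE INDEX FOR (p:Person) ON (p.fid))
CREATE INDEX ON :Person(fid);

// The index on :Person(fid) should be listed with state ONLINE
CALL db.indexes();

// EXPLAIN only plans the query, it does not run it.
// The plan should show NodeIndexSeek rather than NodeByLabelScan.
// LOAD CSV yields strings, so no toString() conversion is needed here.
EXPLAIN
LOAD CSV WITH HEADERS FROM 'file:///shortest_path_data/test.csv' AS line
MATCH (n:Person {fid: line.fid})
RETURN n.fid;
```

If the plan shows a label scan instead, each of the 300K input rows would scan every :Person node, which could explain a large slowdown on its own.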

Hi Team,

I do have an index on :Person(fid). When I use periodic commit, I get the following error:

Cannot use periodic commit in a non-updating query (line 1, column 36 (offset: 35))
"using periodic commit 5000 load csv with headers from 'file:///shortest_path_data/test.csv' as line with line.fid as fid match (n:Person) where n.fid=toString(fid) with n call apoc.path.expandConfig(n,{labelFilter:'/Fraud', maxLevel:10, optional:true, limit:1}) yield path return toInteger(n.fid) as fid,length(path)/2 as distance;"

Thanks and Regards,
Kevin

Ah, you need to prefix the query with :auto for this to work in the browser or cypher-shell.

For the explanation why, see here:

Hi Andrew,

I am getting the following error:

Invalid input ':': expected <init> (line 1, column 36 (offset: 35))
":auto using periodic commit 5000 load csv with headers from 'file:///shortest_path_data/test.csv' as line with line.fid as fid match (n:Person) where n.fid=toString(fid) with n call apoc.path.expandConfig(n,{labelFilter:'/Fraud', maxLevel:10, optional:true, limit:1}) yield path return toInteger(n.fid) as fid,length(path)/2 as distance;"

Regards,
Kevin

How are you running this? Via the Neo4j Browser (if so which version)? Cypher-shell? Client code via a driver?

I am using cypher-shell.