Run MATCH query on a multi-core machine


(M Kiuchi) #1

Hi, comms.

I have an 8-core machine with 32 GB of memory and am going to run a MATCH query as follows, but the query uses only 1 core and takes a long time.

bar = df.to_dict(orient='records') #df is Pandas dataframe and have 1M rows
with n4jses.begin_transaction() as tx:
    result = tx.run("""UNWIND {bar} as d
                       MATCH (a:AD_ID) WHERE a.adid = d.Ad_id RETURN a.adid""",
                    parameters={'bar': bar})
    print(list(result))

Is there any way to run it in parallel?

Regards,
MK


(Stefan Armbruster) #2

It is by design that a single Cypher query runs on one CPU core. You can either split the work into multiple Cypher statements on the client side or use the parallel execution procedures from the APOC library, see https://neo4j-contrib.github.io/neo4j-apoc-procedures/.
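
As a rough sketch of the client-side option, the rows could be cut into chunks and each chunk sent from its own worker thread. This assumes a `driver` object from the official neo4j Python driver, whose sessions are not thread-safe, so every worker opens its own session. The names `chunked`, `match_chunk` and `match_parallel` and the default sizes are made up for illustration. The query uses the `$bar` parameter spelling, since Neo4j 4.0 and later no longer accept the `{bar}` form.

```python
from concurrent.futures import ThreadPoolExecutor

# Same lookup as in the question, written with the $param spelling.
QUERY = """UNWIND $bar AS d
           MATCH (a:AD_ID) WHERE a.adid = d.Ad_id RETURN a.adid AS adid"""


def chunked(rows, size):
    """Yield consecutive slices of rows, each at most size long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def match_chunk(driver, rows):
    """Run the MATCH for one chunk in its own session and return the adids."""
    # One session per call, because sessions must not be shared across threads.
    with driver.session() as session:
        result = session.run(QUERY, bar=rows)
        return [record["adid"] for record in result]


def match_parallel(driver, rows, size=5000, workers=8):
    """Send the chunks through a thread pool and flatten the results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda chunk: match_chunk(driver, chunk),
                         chunked(rows, size))
        return [adid for part in parts for adid in part]
```

With the data from the question this would be called as `match_parallel(driver, bar)`. Threads are enough here because the workers spend their time waiting on the server, which can then run the separate statements on different cores.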


(M Kiuchi) #3

Whoa! Thanks a lot! I split the source dataset into chunks and my query now works fine, like this:

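from datetime import datetime, timedelta

def matchNodes(pbar):
    # Look up one chunk of rows inside its own transaction.
    with n4jses.begin_transaction() as tx:
        # A Cypher query cannot end with MATCH, so RETURN the matched ids.
        result = tx.run("""UNWIND {bar} as d
                           MATCH (a:AD_ID) WHERE a.adid = d.Ad_id RETURN a.adid""",
                        parameters={'bar': pbar})
        return list(result)

start = datetime.now()
print(len(bar))
nbulk = 5000

# Step through bar in slices of nbulk rows; the last slice may be shorter.
for nstart in range(0, len(bar), nbulk):
    # Exclusive end index, so no row is skipped between slices.
    nend = min(nstart + nbulk, len(bar))

    matchNodes(bar[nstart:nend])

    dur = (datetime.now() - start).total_seconds()
    perf = nend / max(dur, 1e-9)  # guard against a zero duration
    est = datetime.now() + timedelta(seconds=int((len(bar) - nend) / perf))
    print("{0} nodes processed({1} ids per sec, est comp {2})".format(nend, int(perf), est))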

APOC is a new world for me, so I'll learn it later. Anyway, thanks again!

MK


(Michael Hunger) #4

You should use this instead:

MATCH (a:AD_ID) WHERE a.adid IN [d IN {bar} | d.Ad_id] RETURN a.adid

or even better just send the IDs in, not the dicts.
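
A minimal sketch of the "just send the IDs" variant, assuming `bar` is the list of row dicts from the first post and `session` is an open driver session such as `n4jses`. The helper names `extract_ids` and `match_ids` are hypothetical, dropping duplicates and missing values before sending is an extra choice of mine, and `$ids` is the newer spelling of a `{ids}` parameter.

```python
# With a plain list of ids the query needs no list comprehension.
QUERY = "MATCH (a:AD_ID) WHERE a.adid IN $ids RETURN a.adid AS adid"


def extract_ids(records, key="Ad_id"):
    """Collect the distinct, non-None ids from the row dicts, keeping order."""
    seen = set()
    ids = []
    for row in records:
        value = row.get(key)
        if value is not None and value not in seen:
            seen.add(value)
            ids.append(value)
    return ids


def match_ids(session, ids):
    """Send only the id list as the parameter and return the matched adids."""
    result = session.run(QUERY, ids=ids)
    return [record["adid"] for record in result]
```

Usage would look like `match_ids(n4jses, extract_ids(bar))`. Sending a flat list keeps the parameter payload much smaller than 1M dicts carrying every dataframe column.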


(M Kiuchi) #5

It looks clean and easy to use ;-). Thanks!