Run MATCH query for multi core machine

m-kiuchi · September 23, 2018, 11:16am

Hi, comms.

I have a machine with 8 cores and 32 GB of memory, and I am running the MATCH query below, but it uses only one core and takes a long time.

# df is a pandas DataFrame with 1M rows; n4jses is an open neo4j session
bar = df.to_dict(orient='records')
with n4jses.begin_transaction() as tx:
    result = tx.run("""UNWIND $bar AS d
                       MATCH (a:AD_ID) WHERE a.adid = d.Ad_id RETURN a.adid""",
                    parameters={'bar': bar})
    print(list(result))

Is there any way to run this query in parallel?

Regards,
MK

stefan.armbruster · September 23, 2018, 7:35pm

By design, a single Cypher query runs on one CPU core. You can either split the work into multiple Cypher statements on the client side or use one of the parallel-execution procedures from the APOC library; see the Neo4j APOC Procedures User Guide.
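A minimal sketch of the client-side option, assuming the official `neo4j` Python driver: the ID list is split into fixed-size batches and each batch is sent as its own query from a thread pool, so the server can work on several queries at once. `chunk`, `run_in_parallel` and `make_neo4j_runner` are hypothetical helper names, not part of the driver. A `Driver` instance can be shared between threads, but sessions cannot, so each batch opens its own session.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunk(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive slices of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_parallel(items: Sequence[T], size: int, workers: int,
                    run_batch: Callable[[Sequence[T]], List[R]]) -> List[R]:
    """Run run_batch on every chunk concurrently and concatenate the
    results in the original chunk order (pool.map preserves order)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_batch, chunk(items, size))
        return [row for batch_result in results for row in batch_result]


def make_neo4j_runner(driver):
    """Hypothetical helper: build a run_batch that opens one session per
    batch, since neo4j sessions are not thread-safe but the driver is."""
    def run_batch(ids):
        with driver.session() as session:
            result = session.run(
                "MATCH (a:AD_ID) WHERE a.adid IN $ids RETURN a.adid AS adid",
                ids=list(ids))
            return [record["adid"] for record in result]
    return run_batch
```

For example, with `driver = GraphDatabase.driver(uri, auth=(user, password))`, calling `run_in_parallel(ids, 5000, 8, make_neo4j_runner(driver))` would keep up to eight queries in flight; the batch size and worker count are assumptions to tune against the server's cores and the driver's connection pool.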

m-kiuchi · September 23, 2018, 11:17pm

Whoa! Thanks so much! I split the source dataset into batches and my query now works fine, like this:

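from datetime import datetime, timedelta

def matchNodes(pbar):
    # One transaction per batch. A Cypher query cannot end with MATCH, so it
    # needs a RETURN clause; the result is consumed before the tx closes.
    with n4jses.begin_transaction() as tx:
        result = tx.run("""UNWIND $bar AS d
                           MATCH (a:AD_ID) WHERE a.adid = d.Ad_id
                           RETURN a.adid AS adid""",
                        parameters={'bar': pbar})
        return [record['adid'] for record in result]

start = datetime.now()
print(len(bar))
nbulk = 5000

# Batches are sent one after another. The slice end is exclusive, so
# bar[nstart:nend] holds nbulk records (the last batch may be shorter)
# and no record is skipped; no separate call for the remainder is needed.
for nstart in range(0, len(bar), nbulk):
    nend = min(nstart + nbulk, len(bar))
    matchNodes(bar[nstart:nend])

    dur = (datetime.now() - start).total_seconds()
    perf = nend / dur  # records processed per second so far
    est = datetime.now() + timedelta(seconds=(len(bar) - nend) / perf)
    print("{0} nodes processed ({1:.0f} ids per sec, est comp {2})".format(nend, perf, est))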

APOC is a new world for me, so I'll learn it later. Anyway, thanks again!

MK

michael.hunger · September 24, 2018, 6:48pm

You should use this instead:

MATCH (a:AD_ID) WHERE a.adid IN [d IN $bar | d.Ad_id] RETURN a.adid

Or, even better, send in just the IDs rather than the dicts.
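Sending only the IDs could look like the sketch below, assuming the row dicts in `bar` carry an `Ad_id` key as in the thread and that `session` is an open neo4j driver session like `n4jses`; `build_id_query` and `fetch_matching_ids` are hypothetical names. The query receives a flat list parameter `$ids`, so less data goes over the wire and the query no longer has to pull `Ad_id` out of each map.

```python
from typing import Any, Dict, List, Tuple

# Query that takes a flat list of IDs instead of a list of dicts.
MATCH_BY_IDS = "MATCH (a:AD_ID) WHERE a.adid IN $ids RETURN a.adid AS adid"


def build_id_query(records: List[Dict[str, Any]],
                   key: str = "Ad_id") -> Tuple[str, Dict[str, Any]]:
    """Keep only the ID value from each row dict and pair it with the query."""
    ids = [row[key] for row in records]
    return MATCH_BY_IDS, {"ids": ids}


def fetch_matching_ids(session, records: List[Dict[str, Any]]) -> List[Any]:
    """Run the ID query on an open session (assumed) and collect the adids."""
    query, params = build_id_query(records)
    return [record["adid"] for record in session.run(query, params)]
```

With pandas, the same list can be taken straight from the column with `df['Ad_id'].tolist()` instead of building `bar` with `to_dict` first.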

m-kiuchi · September 25, 2018, 12:37am

It looks clean and easy to use ;-). Thanks!

Related topics

Topic	Category / tags	Replies	Views	Activity
Optimize Neo4j cypher query on huge dataset	Cypher: optimization, performance, cypher, neo4j	3	369	December 20, 2021
How to fetch data on from data base having around 60 million node of one type?	Cypher: performance, cypher	4	1152	November 12, 2019
Parallel execution of single cypher query	Cypher: apoc, performance, cypher	8	415	June 6, 2023
Best way to run multi-statement cypher query	Cypher	9	11003	January 16, 2019
Parallel Cypher & Apoc	Cypher: apoc, cypher	8	3932	June 19, 2019