Poor performance UNWINDing 2 collections to find possible node relationships

Here's the basic schema:


I have network devices discovered with a network discovery tool. The tenant is nested to satisfy child/sub locations for a client
Our CRM (IT asset management) stores assets as :Crmasset

We are building missing assets within our CRM by using an API into our CRM system when we discover a new device on their network.

I'm using Neo4j to try to track which network discovered items have had assets created, and which are still missing (or orphaned)

I use a number of methods to tie assets to devices (MAC address, serial number,name etc) within a tenants space.

I'm writting a query to find :Auvikdevice nodes without a relationship to :Crmasset, and then find possible :Crmasset nodes within the same company in order to attempt a match (in the example below, using MAC address).
The largest qty for a combination of child tenant to Crmasset list is only 571 Crmassets and 882 Auvikdevice nodes, and there are only 54 total child tenants we are examining.
I'm perplexed on why it takes over a minute to run this, and I'm trying to discover why it's so slow:

MATCH (at:Auviktenant)--(:Auviktenant)--(a:Company) where at.adcount >0
WITH collect(distinct a) as companies
UNWIND companies as a
MATCH (a)--(ca:Crmasset) where exists(ca.mac)
WITH a,collect(distinct ca) as assets
MATCH (at:Auviktenant)--()--(a) where at.adcount >0
WITH assets,a,collect(distinct at) as tenants
UNWIND tenants as at
MATCH (ad:Auvikdevice)--(at)--(:Auviktenant)--(a) where not (ad)--(:Crmasset) and exists(ad.macaddress)
UNWIND assets as ca
WITH ca,ad,"[^a-zA-Z\\d]" as regex
WITH ca,ad,toLower(apoc.text.replace(ca.mac, regex,"")) as camac,toLower(apoc.text.replace(ad.macaddress, regex,"")) as admac
WITH ca,ad where camac=admac
RETURN ad.id,ca.name order by ad.name;

Hi @pdrangeid, did you saw the execution plan?

Please upload for us to understand a litle bit more about your query.

Also did you saw this link: https://neo4j.com/docs/cypher-manual/current/execution-plans/ ?

wasn't sure of the best/easiest way to post the "EXPLAIN" results on the forum, so here's goes with a png attachment:

Did you have considered to create a index, expecialy where you use distinct?

Also try no to use toLower or replace inside this query.

I have indexes on these properties, but the string parsing made a substantial difference. Unfortunately to test this I had to first make some mods to all the datasource inputs. (basically I performed the toLower and regex filtering as I create the nodes now, instead of when trying to compare them.


1 Like