How to best use MATCH across 5 different node labels to collect in a list unique values found on all 5 nodes with the same property label? I'm writing a somewhat functional query here but it's getting quite long so I don't know if I'm taking the right approach.
As a hypothetical scenario, I have these 5 nodes with these uniqueness constraints:
And all of these 5 nodes also contain a business_id as an additional node property for example. I created indexes on these 5 node properties too for faster search performance. Indexes were created as follows:
CREATE INDEX ActorBusID FOR (a:Actor) ON (a.bus_id);
CREATE INDEX DirectorBusID FOR (d:Director) ON (d.bus_id);
CREATE INDEX ProducerBusID FOR (p:Producer) ON (p.bus_id);
CREATE INDEX WriterBusID FOR (w:Writer) ON (w.bus_id);
CREATE INDEX DesignerBusID FOR (dg:Desginer) ON (dg.bus_id);
This is what I'm trying:
MATCH(a:Actor)
WITH a.bus_id AS actorids
ORDER BY actorids ASC
MATCH(d:Director)
WITH d.bus_id AS directorids
ORDER BY directorids ASC
MATCH(p:Producer)
WITH p.bus_id AS producerids
ORDER BY producerids ASC
MATCH(w:Writer)
WITH w.bus_id AS writerids
ORDER BY writerids ASC
MATCH(dg:Desginer)
WITH dg.bus_id AS designerids
ORDER BY designerids ASC
RETURN collect( DISTINCT [actorids, directorids, producerids, writerids, designerids]) AS `Business_IDs`
I'm hoping to return a list of ordered and ascending unique values
My end result should be something like:
Business_IDs
[1, 2, 3, 4, 5]
It seems to me like the query I wrote is super long and it could be a lot shorter or simpler.
Try this:
MATCH(a:Actor)
WITH collect(distinct a.bus_id) AS actorids
MATCH(d:Director)
WITH collect(distinct d.bus_id) AS directorids, actorids
with apoc.coll.sort(apoc.coll.union(actorids, directorids)) as actdir
MATCH(p:Producer)
WITH collect(distinct p.bus_id) AS producerids, actdir
with apoc.coll.sort(apoc.coll.union(actdir, producerids)) as actdirpd
MATCH(w:Writer)
WITH Collect(distinct w.bus_id) AS writerids, actdirpd
with apoc.coll.sort(apoc.coll.union(actdirpd, writerids)) as actdirpdwr
MATCH(dg:Desginer)
WITH collect(distinct dg.bus_id) AS designerids, actdirpdwr
with apoc.coll.sort(apoc.coll.union(actdirpdwr, designerids)) as final
RETURN final
The simplest approach would be to come up with some label that could be applied to all of these nodes, such as :BusinessEntity or something, and add the label to all of those (remember nodes can be multi-labeled) as well as create a corresponding index (which might replace the individual indexes on bus_id on the other labels).
Then your query simplifies to:
MATCH (n:BusinessEntity)
WITH DISTINCT n.bus_id as bus_id
ORDER BY bus_id ASC
RETURN collect(bus_id) as `Business_IDs`
And if you don't want to do that, then you should be able to get Cypher to perform unions of each label scan under the hood with something like:
MATCH (n)
WHERE n:Actor OR n:Director OR n:Producer OR n:Writer OR n:Designer
WITH DISTINCT n.bus_id as bus_id
ORDER BY bus_id ASC
RETURN collect(bus_id) as `Business_IDs`
Use an EXPLAIN of the plan to verify this. You should see a series of branching at the start, with NodeByLabelScan and Union operations. You do NOT want to see AllNodesScan operators.
@ameyasoft Thank you for your reply! I did not know about those two APOC functions. I'll give those a try.
@andrew.bowman
Oh ok, yeah I can definitely add another label to those 5 nodes. These 5 nodes get updated on a daily basis so adding a new label to these nodes when they get updated shouldn't be a problem.
Thank you Neo4j community! I think this is what I was looking for.
Just to make sure, when I create this index on a multi-labeled node, is this the correct syntax so that the index is applied to all of the bus_id on all 5 nodes?
CREATE INDEX BusinessIDs FOR (x:BusinessEntity) ON (x.bus_id);
Then, will this index apply to all of the 5 bus_id properties on each node when I run MATCH(n:BusinessEntity)?
If I'm reading the documentation correctly, it seems like Indexes on multi-labeled nodes are not allowed. No?
A named index on a single property for all nodes that have a particular label can be created with CREATE INDEX index_name FOR (n:Label) ON (n.property) .
I'm trying this out right now, and will report my findings here.