cancel
Showing results for 
Search instead for 
Did you mean: 

Remove isolates within subgraph, only return connected component

julie_fisher
Node Clone

I have a graph that is essentially bimodal, like the marvel database - Heroes connected to Comics and vice versa. No connections can be created between Heroes nor from Comic to Comic.

I'm working on a graph visualization and trying to remove noise. I have a list of heroes and want to return all the comics they're in. However, I only want the ones where the hero is connected via a comic to another hero within the list.

My solution thus far has been a pattern match with two where clauses and an OPTIONAL MATCH:

MATCH (c:Comic)-[:APPEARED_IN]-(:Hero)-[:APPEARED_IN]-(c1:Comic)
WHERE c.name in ["AVF 5", "COC 1", "H2 251", "H2 252", "M/PRM 35", "M/PRM 36", "M/PRM 37", "WI? 9", "AVF 4", "CM 51", "Q 14", "Q 16", "T 208", "T 214", "T 215", "T 216", "T 440", "CA3 36", "SLEEP 19", "SLEEP 1", "SLEEP 2", "W2 52", "W2 53", "XFOR 108", "XFOR 109"]
AND c1.name in ["AVF 5", "COC 1", "H2 251", "H2 252", "M/PRM 35", "M/PRM 36", "M/PRM 37", "WI? 9", "AVF 4", "CM 51", "Q 14", "Q 16", "T 208", "T 214", "T 215", "T 216", "T 440", "CA3 36", "SLEEP 19", "SLEEP 1", "SLEEP 2", "W2 52", "W2 53", "XFOR 108", "XFOR 109"]
AND id(c) <> id(c1) 
WITH distinct c
OPTIONAL MATCH (c)-[r:APPEARED_IN]-(h:Hero)
RETURN c, r, h

This has worked with a list of around 100. However, I tried a list of over 2,000 and killed the connection.

Is there a more elegant, optimized way to do this?

The profile plan looks like this:

3 REPLIES 3

Just to note, the PROFILE plan won't be of much use unless you expand all elements of the plan so we can see the details of each operation.

In addition, your query doesn't seem to have a list of heroes but a list of comics. If you already have the comics you're interested in, then you only need half the pattern: MATCH (c:Comic)-[:APPEARED_IN]-(h:Hero) WHERE c.name in ... which will avoid the cartesian product of heroes appearing in the comics from your previous pattern.

Sorry Andrew, I'm using the Marvel data set because it's public and initially assumed I could do it Hero to Hero. When I was writing the example a hero was connected to too many comics so I had to flip the example. I'm still going for the same concept.

The example code finds comics in a list of names where there is a connection to another comic in the list via a hero, then returns those comics and all heroes that appear in the comic.

So if we had the below graph (blue is a comic, beige is a hero), we'd be trying to exclude 9 and 10 because they're not connected to any of the other comics via a Hero. Then we'd want to return all Heroes that showed up in each of the comics.

2X_d_ddc355c2a09495e4b6530fd37a3dd0c7acc37815.png

Here's the expanded PROFILE plan.

Got it.

In that case we can use the approach I suggested, collecting the comics in question and using that list as a filter:

WITH ["AVF 5", "COC 1", "H2 251", "H2 252", "M/PRM 35", "M/PRM 36", "M/PRM 37", "WI? 9", "AVF 4", "CM 51", "Q 14", "Q 16", "T 208", "T 214", "T 215", "T 216", "T 440", "CA3 36", "SLEEP 19", "SLEEP 1", "SLEEP 2", "W2 52", "W2 53", "XFOR 108", "XFOR 109"] as comicNames // use parameters if possible
MATCH (c:Comic)
WHERE c.name IN comicNames
WITH collect(c) as comics
UNWIND comics as c
MATCH (c)-[r:APPEARED_IN]-(h:Hero)-[:APPEARED_IN]-(c2)
WHERE c2 IN comics
RETURN DISTINCT c, r, h