Difference between relationship negation and node negation


(Rcfro2) #1

I have these 2 queries:

In the first one, I require that two nodes are not equal:

MATCH (tom:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coActor:Person),
         (coActor)-[:ACTED_IN]->()<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom.name = "Tom Hanks" 
        AND NOT (tom)-[:ACTED_IN]->()-[:ACTED_IN]-(coCoActor) 
        and tom<>coCoActor
RETURN distinct coCoActor.name
order by coCoActor.name desc
limit 3

In the second one, I instead negate (filter on) a relationship:

MATCH (tom:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coActor:Person),
         (coActor)-[:ACTED_IN]->(l)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom.name = "Tom Hanks" 
        AND NOT (tom)-[:ACTED_IN]->()<-[:ACTED_IN]-(coCoActor) and not (tom)-[:ACTED_IN]->(l)
RETURN distinct coCoActor.name
order by coCoActor.name desc
limit 3

Both return the same results. Are they fundamentally different, either at an architectural level or in some other way?

Many thanks,


(Andrew Bowman) #2

These queries use slightly different logic to arrive at the same results.

In the first query, the predicate explicitly says coCoActor cannot be Tom.

In the second query, because you ensure Tom can't have acted in l, it is impossible for coCoActor to be Tom, since your pattern requires that coCoActor acted in l.

That said, the operations used by each query are different, and one is clearly better than the other. If you run PROFILE on both queries, you'll see that the second query needs more db hits, because it has to check for a relationship between Tom and l. The first query only needs a node comparison between the already-matched tom and coCoActor, which is done under the hood by comparing their graph ids and is very quick.
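As a minimal sketch of that check, assuming the standard Movies example dataset is loaded, you can prefix either query with PROFILE and compare the db hits reported for each operator in the resulting plan. The first query is shown here; the second can be profiled the same way.

```cypher
// Prefix the query with PROFILE to execute it and get the plan
// with per-operator rows and db hits.
PROFILE
MATCH (tom:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coActor:Person),
      (coActor)-[:ACTED_IN]->()<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom.name = "Tom Hanks"
  AND NOT (tom)-[:ACTED_IN]->()<-[:ACTED_IN]-(coCoActor)
  AND tom <> coCoActor
RETURN DISTINCT coCoActor.name
ORDER BY coCoActor.name DESC
LIMIT 3
```

The exact numbers depend on your data and Neo4j version, so compare the two plans on the same database rather than relying on absolute values.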


(Rcfro2) #3

Ah OK. I realize the logic is different, but is it substantially different, such that the results could ever differ given the Movie database structure? The reason for either of those predicates is to prevent Tom Hanks from being the coCoActor. In short, the results are not the same by coincidence; they will always match. So you can use a relationship negation in place of a node negation, though it's not advised, right?

And the second part, about efficiency, is exactly what I was hoping for, though I could (and should) have run PROFILE on both queries, as you mentioned, to discover that myself.


(Andrew Bowman) #4

I don't believe there would be a case or graph data such that the queries would yield different results.

That said, there is a similar situation (which does not apply to the queries we've been discussing) in which a subtle difference can yield different results: how you specify that Tom didn't coact with coCoActor.

Take the first query as our control query (we know it produces correct answers) and look at this new but similar (and incorrect) query:

MATCH (tom:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coActor:Person),
         (coActor)-[:ACTED_IN]->(l)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom.name = "Tom Hanks" 
        AND NOT (tom)-[:ACTED_IN]->(l)-[:ACTED_IN]-(coCoActor) 
        and tom<>coCoActor
RETURN distinct coCoActor.name
order by coCoActor.name desc
limit 3

We bind l to the last movie here, but the real difference is that we've reused l in what we believe to be the coactor pattern between Tom and coCoActor: NOT (tom)-[:ACTED_IN]->(l)-[:ACTED_IN]-(coCoActor).

But this predicate only ensures that Tom and coCoActor didn't both act in the specific movie l bound by the match. There could be a different movie, other than l, in which coCoActor and Tom coacted.
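To make this concrete, here is a small made-up graph that should separate the two queries. The names "Actor A", "Actor B" and the three movie titles are hypothetical, and it is meant to be run in an empty scratch database rather than alongside the real Movies data.

```cypher
// Hypothetical data: Tom coacts with A in Movie 1 and with B in Movie 2.
// A and B coact with each other in Movie 3, which Tom is not in.
CREATE (tom:Person {name: "Tom Hanks"}),
       (a:Person {name: "Actor A"}),
       (b:Person {name: "Actor B"}),
       (m1:Movie {title: "Movie 1"}),
       (m2:Movie {title: "Movie 2"}),
       (m3:Movie {title: "Movie 3"}),
       (tom)-[:ACTED_IN]->(m1),
       (a)-[:ACTED_IN]->(m1),
       (tom)-[:ACTED_IN]->(m2),
       (b)-[:ACTED_IN]->(m2),
       (a)-[:ACTED_IN]->(m3),
       (b)-[:ACTED_IN]->(m3)
```

On this graph, l can only be Movie 3, and Tom never acted in it, so the l-based predicate is always satisfied and the incorrect query should let both Actor A and Actor B through. The control query should return no rows, since each of them shares some other movie with Tom.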


(Rcfro2) #5

Ah yes, this makes a huge difference! As you correctly stated, there might be other movies that coCoActor and Tom acted in together but coActor did not, so the query would not produce the same results. We are trying to ensure that Tom and coCoActor did not act in ANY movie together :).

Thank you