Negate Variable Length Path Query

rcfro2 · October 1, 2018, 8:10pm

First, thank you so much for going back and forth on this. I have asked other people at Neo and I think I'm getting conflicting information, so I'm a bit confused. But that's neither here nor there.

The real problem is this: you find the path to be TRUE in one case, and while you say the identity of the node doesn't matter, it is because of the identity of a specific node, which identifies a single path, i.e. Charlize ---- That Thing You Do ---- Tom Hanks, that when evaluated against
(p) - [:ACTED_IN] -> () <- [:ACTED_IN] - (x)
it is true. So it is odd that when it is evaluated against a different path, a COMPLETELY different path such as the one where you have
Charlize --- The Devil's Advocate, with nothing for Tom (which is what p refers to), it still removes the row. The whole path includes at least Tom, Charlize, and some intermediate node. In the case of Charlize and The Devil's Advocate, you don't have Tom...

Again apologies and many thanks.

andrew_bowman · October 1, 2018, 8:20pm

More than likely it's not conflicting information so much as alternate approaches. Graph databases, and Cypher, allow multiple ways to do what you want to do, and I've mentioned a couple of approaches already. It's very important to be clear about what you want to do and what output you are trying to capture, because small changes to patterns (such as the presence or absence of a variable in a pattern) can completely change the meaning and behavior of the query. It's important to be precise.

So to address this, at the time that I added the results returning the name of x and the title of m, my goal was to show that m is an element of each row, and so Charlize shows up twice (once per movie that she acted in, because you matched out to movies where x acted in m). It was an example to demonstrate that, and the return explicitly only returned x's name and m's title. That was just to show a bit of what's going on under the hood, and p wasn't an interesting part of what I wanted to show.

But let's look at the original query again:

match (p:Person{name:'Tom Hanks'}) - [:HAS_CONTACT*2] -> (x:Person),
(x) - [:ACTED_IN] -> (m)
where not (p) - [:ACTED_IN] -> () <- [:ACTED_IN] - (x) and p<>x
return distinct x.name

At the point right after the MATCH, but before the WHERE clause filtering, p, x, and m are in scope and are the fields per row. So p is present (as the scope of variables per row hasn't been changed at that point) and is Tom Hanks. You can see this explicitly if I RETURN early and project out the names and titles of the nodes involved:

match (p:Person{name:'Tom Hanks'}) - [:HAS_CONTACT*2] -> (x:Person),
(x) - [:ACTED_IN] -> (m)
return p.name, x.name, m.title
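To make the "one row per (p, x, m)" idea concrete, here is a toy Python model of what that early RETURN would show. It is only an illustration of the row semantics, not how Neo4j executes Cypher, and the data is hypothetical: the HAS_CONTACT edges and the names "Pat", "Lee", and "Movie X" are invented for the example.

```python
# Toy model only: plain Python, not how Neo4j executes Cypher.
# Hypothetical data loosely mirroring the thread's example;
# "Pat", "Lee", and "Movie X" are invented for illustration.
HAS_CONTACT = {
    "Tom Hanks": ["Pat"],
    "Pat": ["Charlize Theron", "Lee"],
}
ACTED_IN = {
    "Tom Hanks": ["That Thing You Do"],
    "Charlize Theron": ["That Thing You Do", "The Devil's Advocate"],
    "Lee": ["Movie X"],
}


def match_rows(p):
    """Return one (p, x, m) row per 2-hop contact x and per movie m that x acted in."""
    rows = []
    for middle in HAS_CONTACT.get(p, []):       # first :HAS_CONTACT hop
        for x in HAS_CONTACT.get(middle, []):   # second :HAS_CONTACT hop
            for m in ACTED_IN.get(x, []):       # (x)-[:ACTED_IN]->(m)
                rows.append((p, x, m))
    return rows


for p_name, x_name, m_title in match_rows("Tom Hanks"):
    print(p_name, x_name, m_title, sep=" | ")
```

In this model Charlize Theron shows up in two rows, one per movie, and p is Tom Hanks in every row.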

rcfro2 · October 4, 2018, 4:07am

Ok. So...I think I get it but it is poorly worded.

The first query (where not (p)-[:ACTED_IN]->(m)) actually checks against a specific movie with respect to p: if that path exists, you remove the row.

The second query checks, with respect to the actor, whether p and x, for example, have acted together, hence why you said the movie doesn't matter. So really, this is much easier to see if you return a list of actors and collect(m.title). At any rate, it checks against the actor, and if that actor has ever acted with this other actor, it removes the row, hence why both movies get removed because the actor no longer exists. Right?

andrew_bowman · October 4, 2018, 7:27am

I think you're almost there.

On the first part, you are correct: because p and m are in the pattern you're checking for, the path must include both of those nodes.

The second query doesn't use m at all, so the only pattern looked for is one involving p and x, and whether there is some path that fits the pattern where they have acted in something together.
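The difference between the two predicates can be sketched in a toy Python model (again hypothetical data and a simplification, not Neo4j's implementation). Each predicate is evaluated once per row; the one that mentions m only looks at that row's movie, while the one without m asks whether p and x share any movie at all.

```python
# Toy model only; hypothetical data mirroring the thread's example.
ACTED_IN = {
    "Tom Hanks": {"That Thing You Do"},
    "Charlize Theron": {"That Thing You Do", "The Devil's Advocate"},
}

# Rows as the MATCH would produce them: (p, x, m)
ROWS = [
    ("Tom Hanks", "Charlize Theron", "That Thing You Do"),
    ("Tom Hanks", "Charlize Theron", "The Devil's Advocate"),
]


def p_acted_in_m(p, x, m):
    # Models (p)-[:ACTED_IN]->(m): tied to this row's m
    return m in ACTED_IN.get(p, set())


def p_and_x_coacted(p, x, m):
    # Models (p)-[:ACTED_IN]->()<-[:ACTED_IN]-(x): m is ignored, any shared movie counts
    return bool(ACTED_IN.get(p, set()) & ACTED_IN.get(x, set()))


def where_not(rows, predicate):
    # WHERE NOT <predicate>, evaluated independently for each row
    return [row for row in rows if not predicate(*row)]


kept_with_m = where_not(ROWS, p_acted_in_m)
kept_without_m = where_not(ROWS, p_and_x_coacted)
print(kept_with_m)
print(kept_without_m)
```

With m in the pattern, only the That Thing You Do row is removed; without m, the predicate is true for both of Charlize's rows, so both are removed.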

As for collections, I do find them easier to work with, and when the collection is small, this can be very efficient. You're referring to this example query I provided above, right?

match (p:Person{name:'Tom Hanks'})-[:ACTED_IN*2]-(coactor)
with p, collect(distinct coactor) as coactors
match (p) - [:HAS_CONTACT*2] -> (x:Person)
where not x in coactors and p<>x and (x)-[:ACTED_IN]->()
return distinct x.name

(I added the and (x)-[:ACTED_IN]->() predicate to ensure the contacts we consider are actors; otherwise non-actors could be recommended.)

Conceptually I think this works better. We find Tom's coactors, and since we collected them we can treat them as a collection (if we never aggregate we can't do that, and that's one of the things that tripped you up before). At that point it is very easy to check whether a potential recommendation was a coactor or not: we just see if they are in that coactor collection, and if they are, the row gets eliminated.
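The collect-first approach can be sketched the same way: build the coactor set once, then test each candidate for membership. This is a toy Python model with hypothetical data ("Pat", "Lee", "Sam", and "Movie X" are invented; "Sam" stands in for a non-actor contact), not what Neo4j does internally.

```python
# Toy model only; hypothetical data.
ACTED_IN = {
    "Tom Hanks": {"That Thing You Do"},
    "Charlize Theron": {"That Thing You Do", "The Devil's Advocate"},
    "Lee": {"Movie X"},
}
HAS_CONTACT = {
    "Tom Hanks": ["Pat"],
    "Pat": ["Charlize Theron", "Lee", "Sam", "Tom Hanks"],  # Sam never acted
}


def coactors_of(p):
    # Roughly MATCH (p)-[:ACTED_IN*2]-(coactor) ... collect(DISTINCT coactor)
    movies = ACTED_IN.get(p, set())
    return {a for a, ms in ACTED_IN.items() if a != p and ms & movies}


def recommend(p):
    coactors = coactors_of(p)  # computed once, like the WITH ... collect(...)
    result = set()
    for middle in HAS_CONTACT.get(p, []):
        for x in HAS_CONTACT.get(middle, []):
            # WHERE NOT x IN coactors AND p <> x AND (x)-[:ACTED_IN]->()
            if x not in coactors and x != p and ACTED_IN.get(x):
                result.add(x)
    return sorted(result)  # RETURN DISTINCT x.name


print(recommend("Tom Hanks"))
```

Charlize is dropped because she is in the coactor set, Sam because he has no ACTED_IN relationship, and Tom himself because of p <> x.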

This would apply even if we added in the additional part of the pattern from x to the movies x has acted in (it doesn't make sense to do this, of course, as it isn't needed to determine whether the two persons are coactors, but I want to make it clear that the entire row gets eliminated, no matter what m is):

match (p:Person{name:'Tom Hanks'})-[:ACTED_IN*2]-(coactor)
with p, collect(distinct coactor) as coactors
match (p) - [:HAS_CONTACT*2] -> (x:Person)-[:ACTED_IN]->(m)
where not x in coactors and p<>x
return distinct x.name

rcfro2 · October 4, 2018, 8:04am

So was I missing something? Again, I think this is easier if it's clear from the beginning that the second query is really asking whether Tom and Charlize have ever worked with one another, and if so, not to include Charlize Theron (which is why both the rows that mentioned her in the previous query are removed: it's not about the movie, it's about whether these two actors have worked together). I think that was the major conceptual issue at the beginning.

rcfro2 · October 5, 2018, 8:52pm

Also, if we replace the specified relationship type with none, does it iterate through all the various relationships?

I mean -
match (p:Person{name:'Tom Hanks'}) - [:HAS_CONTACT*2] -> (x:Person),
(x) - [:ACTED_IN] -> (m)
where not (p) - [] -> () <- [:ACTED_IN] - (x) and p<>x
return distinct x.name

It seems that it should, if it truly considers all relationship paths from p to some other node that x has acted in.

andrew_bowman · October 6, 2018, 2:25am

Yes, I think you've got it.

The overall idea is that operations produce rows of results, and WHERE clauses operate on each row. The predicates in the WHERE clauses may involve all, some, or even none of the variables for those rows, but in any case if the WHERE clause is false, the row is filtered out. Also (and I think this may help you) any path predicate is independent of the path used in the previous MATCH. Maybe you were under the impression that a path predicate could only operate on the path from the MATCH? If so, that misunderstanding could easily explain your line of reasoning.

As for your follow-up question about omitting the relationship type, you're correct, it would expand out to all outgoing relationships from p regardless of type and see if there's any path that matches. You can even use a shorthand of (p)-->() to indicate a relationship when you don't care about the type or properties, and don't need to bind a variable to it.
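The effect of dropping the type on the first hop can be shown with one more toy Python model. The DIRECTED relationship from Tom Hanks to "Movie X", and "Lee" himself, are hypothetical: with the first hop restricted to ACTED_IN, Lee does not match the pattern, but with any relationship type allowed (the (p)-->() form), the path through Movie X exists, so WHERE NOT on the untyped pattern would filter Lee's rows out too.

```python
# Toy model only; relationships as (start, type, end) triples, hypothetical data.
RELS = [
    ("Tom Hanks", "ACTED_IN", "That Thing You Do"),
    ("Tom Hanks", "DIRECTED", "Movie X"),  # hypothetical relationship
    ("Charlize Theron", "ACTED_IN", "That Thing You Do"),
    ("Lee", "ACTED_IN", "Movie X"),
]


def path_exists(p, x, first_type=None):
    """Model (p)-[:T]->()<-[:ACTED_IN]-(x); first_type=None models (p)-->()."""
    # Nodes reachable from p by one outgoing relationship of the allowed type(s)
    middles = {end for start, rtype, end in RELS
               if start == p and (first_type is None or rtype == first_type)}
    # Does x have an ACTED_IN relationship into any of those nodes?
    return any(start == x and rtype == "ACTED_IN" and end in middles
               for start, rtype, end in RELS)


print(path_exists("Tom Hanks", "Lee", "ACTED_IN"))
print(path_exists("Tom Hanks", "Lee"))
```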

rcfro2 · October 6, 2018, 7:38pm

Ah, perfect. Wow. Many thanks. I know this has been a ton of back and forth, but it was supremely helpful and very much appreciated! I look forward to exploring graphs more :)
