Interesting problem that I worked out: Movies with costar of "Tom" but not including "Tom"

Some Cypher newbies might find this interesting.

I was curious to find Movies and Stars that had a co-star with the name that starts with "Tom" but not return Tom.

Of course, with any large enough DB, you're going to get many results, so it's good to restrict your trial and error to a small subset. It makes it easier to debug and to see what's going on. So, I picked the Movie"Top Gun" which has two Tom's in it. (I anticipated having two Tom's would be trouble!)

My first stab is:

// does not return Tom Skerritt and Tom Cruise
MATCH(p1:Person)-[:ACTED_IN]->(m:Movie{title:'Top Gun'})<-[:ACTED_IN]-(p2:Person ) 
   WHERE   p1.name STARTS WITH 'Tom'
  AND  NOT p2.name STARTS WITH 'Tom'
RETURN  m, p2

This is unsatisfying to me because I have to repeat the condition Person:name STARTS WITH 'Tom' (a second time with a negation). There's always the possibility that you might make a copy and paste error OR if the condition changes and you wanted a different first name, then you have to make two edits to change one condition. This violates the DRY principle (Don't Repeat Yourself.)

I thought if I had the id of the Tom's, then I could test for equality of id's.

I suspect that following wouldn't work (and I was right!). The hope is the id of Tom Skerritt won't match id of Tom Cruise so that it Tom's won't show up in the results. However, I suspected this wouldn't work: id(p1) <> id(p2).

// Didn't work: returns Tom Skerritt and Tom Cruise
MATCH(p1:Person)-[:ACTED_IN]->(m:Movie{title:'Top Gun'})<-[:ACTED_IN]-(p2:Person ) 
WHERE  p1.name STARTS WITH 'Tom'
  AND  id(p1) <> id(p2)
RETURN  m, p2

This is the funny thing about Database Declarative Query languages... conceptually things happen in "parallel": e.g. once through for p1= Tom Skerritt and once p1 = Tom Cruise. This can be confusing for people used to Procedural languages (such as Python or Java.)

So, to make this work, you need to first collect all the id's of all the Tom, then second test each actor, to see if their id is in the collection.

So, now this occurs in two steps:

MATCH (p1:Person)-[:ACTED_IN]->(m:Movie{title:'Top Gun'})
WHERE p1.name STARTS WITH 'Tom'
RETURN collect(id(p1))

Another conceptual hurdle with using Cypher better is to understand the WITH statement. This allows you to have multiple steps in your Cypher. So, after getting the persons whose names start with Tom, you can engage in the second step by carrying the variables forward to the next Cypher statement. You also need WITH m otherwise, the match on Movies won't be carried forward to the next Cypher statement.

MATCH (p1:Person)-[:ACTED_IN]->(m:Movie{title:'Top Gun'})
   WHERE p1.name STARTS WITH 'Tom'
WITH COLLECT(id(p1)) AS tomList, m
MATCH (p2:Person)-[:ACTED_IN]->(m) WHERE NOT id(p2)  IN tomList
 RETURN m, p2

Now that I got this to work for Top Gun, then generalizing it to all movies in the DB, I remove the restriction of the Movie title:

MATCH (p1:Person)-[:ACTED_IN]->(m:Movie)
   WHERE p1.name STARTS WITH 'Tom'
WITH COLLECT(id(p1)) AS tomList, m
MATCH (p2:Person)-[:ACTED_IN]->(m) WHERE NOT id(p2)  IN tomList
 RETURN m, p2

I hope this enlightens.

1 Like

Great walk through...thank you! The use of WITH is really great, particularly when you have a large amount of data that would otherwise be sent to the next step, potentially slowing things down. Great example!

1 Like