I have a bi-modal data set similar to the movies database. For the sake of analogy, I'm trying to run metrics on the movies based on the people who acted in the movie. I want to know the number of movies at variable path lengths based on a specific node property. For the analogy we can use genre.
Nodes have the following labels and properties:
Movie:
title: 'Serenity'
genre: 'Sci-fi'
Actor:
name: 'Nathan Filian'
role: 'Malcom Reynolds'
is_hub: 1 // only recorded for actors with over 800 movie credits
Director:
name: 'Joss Whedon'
Writer:
name: 'Joss Whedon'
I can get an overall count with the following:
MATCH (movie:Movie {title: 'Serenity'})-[:ACTED_IN*2]-(m2:Movie {genre: 'action'})
RETURN movie, count(distinct m2) as movie_count
However, some actors have over 800 movie credits, and I want to exclude them from the search: they slow down traversal at longer path lengths (-[*4]- or -[*6]-) and artificially inflate the movie count.
I initially converted the query to:
MATCH (movie:Movie {title: 'Serenity'})--(a:Actor)--(m2:Movie {genre: 'action'})
WHERE a.is_hub is null
RETURN movie, count(distinct m2)
But this gets unwieldy at longer path lengths, where I have to specify 'is_hub is null' for every actor node between the start and end nodes:
MATCH (movie:Movie {title: 'Serenity'})--(a:Actor)--(:Movie)--(a2:Actor)--(:Movie)--(a3:Actor)--(m2:Movie {genre: 'action'})
I was considering something along the lines of:
MATCH (movie:Movie {title: 'Serenity'})-[:ACTED_IN*6]-(m2:Movie {genre: 'action'})
WHERE NOT (movie)--({is_hub: 1})--(m2)
RETURN movie, count(distinct m2) as movie3_count
But because of how null behaves, a node whose is_hub is null is not known to be different from 1. So 'WHERE NOT (movie)--({is_hub: 1})--(m2)' doesn't actually filter anything.
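To make the null behaviour concrete, here is a minimal sketch that needs no data. It shows that comparing null with a value yields null rather than false, and that negating the comparison is still null, which WHERE treats as "no match". The column aliases are only illustrative names.

```cypher
// Three-valued logic: a comparison with null yields null, not false.
RETURN null = 1              AS eq,            // null
       NOT (null = 1)        AS not_eq,        // null
       null IS NULL          AS is_null,       // true
       coalesce(null, 0) = 1 AS with_default   // false
```

This is why an explicit IS NULL / IS NOT NULL test (or a coalesce default) is usually safer than an equality test on a property that is missing from most nodes.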
Is there an easier way to do this?
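One possible direction, offered as a sketch rather than a tested answer: bind the variable-length match to a path variable and filter on every node in the path with a list predicate. The variable name p is introduced here for illustration, and the query assumes, as described above, that is_hub is simply absent on movies and on non-hub actors.

```cypher
// Bind the whole path so that its intermediate nodes can be inspected.
MATCH p = (movie:Movie {title: 'Serenity'})-[:ACTED_IN*6]-(m2:Movie {genre: 'action'})
// Keep only paths on which no node carries the is_hub property.
// IS NULL is used instead of "is_hub <> 1" because a comparison
// against a missing property yields null, not true.
WHERE all(n IN nodes(p) WHERE n.is_hub IS NULL)
RETURN movie, count(DISTINCT m2) AS movie_count
```

Note that a pattern predicate such as (movie)--({is_hub: 1})--(m2) only tests whether some two-hop path exists between movie and m2. It does not inspect the nodes on the six-hop path that was matched, which is why a predicate over nodes(p) is used here. Whether the planner prunes hub nodes during expansion or filters afterwards may depend on the Neo4j version, so it is worth checking with PROFILE.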