Weekly Challenge #1: Movies of "Life"

Let's see what we can learn about movies for the first weekly challenge. Specifically, movies that contain "Life" in the title. Is there a genre or director you think came up with the best movie on this list? Let us know!

Here is what we want you to do:

Create a recommendations sandbox at sandbox.neo4j.com.
For this sandbox, write a query that returns one row for every movie in the graph that contains "Life" in its title.
Use a parameter named testString with a value of "Life" for the predicate.

The rows will be returned in movie year order.
Each row will contain:

  • Movie (title of the movie)
  • Year (the year the movie was released)
  • Genres (list of genre names for the movie)
  • Directors (list of director names for the movie)

Please post your solution code in your response and a screenshot of the solution results. We will be giving away a T-shirt to one of the submissions between 7-29-2022 and 8-5-2022, submit your solution to be included in the drawing!

@Cobra it is a great query. The solution is one where the elapsed ms goes down to as low as 26 ms

Hello @TrevorS :blush:

Here is my query:

:param testString => 'Life';

MATCH (m:Movie) 
WHERE m.title CONTAINS $testString 
RETURN m.title AS title, m.year AS year, [(m)-[:IN_GENRE]->(g) | g.name] AS genres, [(m)<-[:DIRECTED]-(d) | d.name] AS directors 
ORDER BY year;

Here is the screenshot:

Weekly Challenge #1.png

Regards,
Cobra

I was able to reduce to 18ms with this query:

MATCH (m:Movie) 
WHERE m.title CONTAINS $testString 
RETURN m.title AS title, m.year AS year, [(m)-[:IN_GENRE]->(g) | g.name] AS genres, [(m)<-[:DIRECTED]-(d:Director) | d.name] AS directors 
ORDER BY year;

Cobra_0-1659977148844.png

Do you have db hits info?

Regards,
Cobra

I see, do you have any hints?

Your query returns 81 rows because there is a row with no directors. I can get your query down to 23 ms. Total db hits of 2518.

The solution I have has 17ms with total db hits of 2499.
Did you add any indexes to the graph? You should not.

Here is my new query:

MATCH (m:Movie) 
WHERE m.title CONTAINS $testString 
WITH m 
MATCH (m)<-[:DIRECTED]-(d) 
WITH m, collect(d.name) AS directors 
MATCH (m)-[:IN_GENRE]->(g) 
RETURN m.title AS title, m.year AS year, directors, collect(g.name) AS genres
ORDER BY year 

Cobra_1-1660036754940.png

Regards,
Cobra

Great job @Cobra !

Your query performed as well as the solution that we had which is:

PROFILE
MATCH (movie:Movie)-[:IN_GENRE]->(g)
WHERE movie.title CONTAINS $testString
WITH movie, collect(g.name) AS genres
MATCH (movie)<-[:DIRECTED]-(director)
WITH movie, genres, collect(director.name) AS directors
RETURN movie.title AS Movie, movie.year AS Year, genres AS Genres, directors AS Directors ORDER BY movie.year

The query must return movies who have at least 1 director and 1 genre?

What is the expected number of rows for the query: 80, 81 or other?

No, I didn't add any indexes, I just relaunched multiple times the query :blush:

Hi!

I just gave this query a go and while I got an answer quickly I was struggling to understand why I was getting higher DB hits than both of you (not by much though). I widdled it down to this specific difference in the example below

PROFILE

MATCH (m)-[gi:IN_GENRE]->**(**g:Genre)

WHERE m.title CONTAINS $testString

WITH m, collect(g.name) AS genres

RETURN m.title AS Title

Using just (g) vs (g:Genre) results in fewer DB hits (1325 vs 1482). I would have expected the answers to be the same as there aren't any [:IN_GENRE] relationships with anything other than a genre right? I tried to validate this assumption by using the following where I replaced (g:Genre) with (g), both returned the same 20340 count but using (g) results in 228475 db hits where (g:Genre) results in 20400. I'm now left even more confused because this would imply g:genre is the more efficient method, so why is it worse in the weekly challenge query?

Appreciate any insight!
Thanks,
Josh

MATCH (m)-[:IN_GENRE]->(g:Genre)

RETURN count(g)