Hi,
Personal background: I'm mostly a front-end React dev, but I started using Neo4j on a personal project. I have a basic understanding of SQL/Cypher syntax but largely ignorant about nuances of perf implications, especially with Cypher.
I have an app that uses movie data and I want to show the cast for a movie, as well as any other roles the actors from the current movie portrayed in other titles that the current user has watched. So in this example I've seen 4 movies, The Matrix, Memento, John Wick 1 & 2. I have two general approaches for the query:
First approach: "Root" query (without COLLECT subqueries)
MATCH (user:User)
MATCH (title {id:"603", type: "movie"})<-[:IN_TITLE]-(credit:Credit)<-[:PORTRAYED]-(person:Person)
MATCH (character:Character)-[:WAS_PORTRAYED]->(credit)
OPTIONAL MATCH (user)-[:WATCHED]->(relatedTitle)<-[:IN_TITLE]-(relatedCredit:Credit)<-[:PORTRAYED]-(person)
WHERE relatedTitle <> title
OPTIONAL MATCH (relatedCharacter:Character)-[:WAS_PORTRAYED]->(relatedCredit)
RETURN title, credit, person, character, relatedTitle, relatedCredit, relatedCharacter
With the "root" query, I get a record for each related role a person had, and each record includes the title (even though the title
is the same on all records)
Response example:
Each row has "The Matrix" repeated, which is redundant. Also, since Keanu Reeves has more than one related role, "Keanu Reeves" is repeated once as well. However, writing queries this way seems a lot more "natural", i.e. it feels more like the examples in the docs and it feels more like how Cypher is intended to be used.
Second approach: "Nested" query (with COLLECT subqueries)
MATCH (user:User)
MATCH (title {id:"603", type: "movie"})
WITH *, COLLECT {
MATCH (title)<-[:IN_TITLE]-(credit:Credit)<-[:PORTRAYED]-(person:Person)
MATCH (character:Character)-[:WAS_PORTRAYED]->(credit)
WITH *, COLLECT {
MATCH (user)-[:WATCHED]->(relatedTitle)<-[:IN_TITLE]-(relatedCredit:Credit)<-[:PORTRAYED]-(person)
WHERE relatedTitle <> title
MATCH (relatedCharacter:Character)-[:WAS_PORTRAYED]->(relatedCredit)
RETURN {title: relatedTitle, credit: relatedCredit, character: relatedCharacter}
ORDER BY relatedTitle.popularity DESC
} as relatedRoles
RETURN {credit: credit,character: character, person: person, relatedRoles: relatedRoles}
ORDER BY credit.order ASC
} as cast
RETURN title, cast
The "nested" approach doesn't have this duplication though. It returns 1 record, which has a property for title and cast, which each have properties for related roles (or empty array if none are found). There is no redundant data, but the query looks weird(?) and I didn't see any examples like this in the docs.
Response example:
{
"title": "The Matrix",
"cast": [
{
"person": "Keanu Reeves",
"character": "Thomas A. Anderson / Neo",
"relatedRoles": [
{
"title": "John Wick",
"character": "John Wick"
},
{
"title": "John Wick: Chapter 2",
"character": "John Wick"
}
]
},
{
"person": "Laurence Fishburne",
"character": "Morpheus",
"relatedRoles": [
{
"title": "John Wick: Chapter 2",
"character": "Bowery King"
}
]
}
]
}
This is a lot easier to consume for my application as the shape is very similar to what I ultimately want to send to the UI, as opposed to having to iterate through all the records in the root approach and build the response object up that way. They both appear to be roughly the same performance. The subquery one seems more complicated to read, but easier to consume.
I know there might not be a "right way" but I'm curious if any of this is a code smell or if I should be thinking about these problems differently. Am I doing something weird?