Which code is optimized and efficient in Neo4j?
A) MATCH(tom:Person{name:"Tom Hanks"})-[:ACTED_IN]->(mov:Movie)<-[:ACTED_IN]-(co:Person)
RETURN co.name,mov.title
B) MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN coActors.name
I believe that B will be more efficient. We know that every :ACTED_IN relationship points to a Movie node, so the :Movie label is not needed in the pattern. Also, if you don't need to return the title, that is one less property access.
The real answer to your question, however, depends on the data you are querying. The definitive way to find out is to prepend each query with PROFILE. Make sure, however, that you look at the second run of PROFILE for each of them, as the first run needs to compile the query into the query cache.
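As a rough sketch of that comparison, the two queries from the question can be profiled like this. The patterns are copied from the question with only whitespace added; the trailing semicolons assume a client such as Neo4j Browser or cypher-shell that accepts several statements. Run each one twice and compare the db hits from the second run.

```cypher
// Query A: labels on every node, returns two properties
PROFILE
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(mov:Movie)<-[:ACTED_IN]-(co:Person)
RETURN co.name, mov.title;

// Query B: label only on the anchor node, returns one property
PROFILE
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
RETURN coActors.name;
```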
All of this is covered in the course, Cypher Query Tuning in Neo4j 4.x.
Yes, we know that, but does the machine also know, in some smart way, that :ACTED_IN must point only to Movie nodes? If not, and the machine has to scan all nodes to find the relationship, that will add overhead.
Thanks for sharing the option to PROFILE the queries. I will check it.
Query A, run with PROFILE, shows 265 total db hits in 120 ms the first time the query is run after restarting the DB.
B) PROFILE MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN coActors.name
shows 214 total db hits in 82 ms the first time the query is run after restarting the DB
The profile pictures show why (A on the left, B on the right). The query with the extra labels (Movie, and Person for the co-actors) has two extra Filter operators, which are unneeded in the Movie DB because an ACTED_IN relationship always starts at a Person and ends at a Movie. That might not be true in general.
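Before dropping the labels in another database, it may be worth checking which labels actually sit at each end of the relationship. A minimal sketch, assuming only the ACTED_IN relationship type from this thread (the variable and alias names are illustrative):

```cypher
// List every distinct combination of start-node and end-node labels
// found on ACTED_IN relationships.
MATCH (a)-[:ACTED_IN]->(b)
RETURN DISTINCT labels(a) AS startLabels, labels(b) AS endLabels;
```

If more than one label combination comes back, the labels in the pattern change the result set rather than just adding a Filter, so they should stay.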
I will note that having the extra Labels helps a person understand the query better.
I do wonder if this is an opportunity for optimization: could Neo4j keep track of the labels associated with each relationship type, and skip the filter when only one label ever appears at the start or end of that relationship?
It was nice to see the explanation about the difference. Thanks!
However, I am curious to know the results in a huge graph database with many relationship types.