Sure.
Map projection first, that should be easiest.
Map Projection
A map projection is a custom projection from either a map or node properties. Here's an example:
MATCH (p:Person {name:'Keanu Reeves'})-[:ACTED_IN]->(m:Movie)
WITH p, collect(m) as movies
RETURN p {.name, movies, whoa:true} as Keanu
Map projection is used in the RETURN so we get a custom map consisting of a mix of node properties and other values. We project out a map consisting of Keanu's name, then the movies collection from earlier (the key for this in the map will be the variable name 'movies', and the value will be the variable's value), and a custom property whoa, which we set to true.
Map projections allow us to customize the map, using a custom projection. More examples are in the previous documentation.
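A few other projection forms may help show what is possible. This is only a sketch built on the query above: the key movieCount is a name made up for illustration, and the title and released properties on :Movie are assumed to exist (they are the ones mentioned elsewhere in this answer).

```cypher
MATCH (p:Person {name:'Keanu Reeves'})-[:ACTED_IN]->(m:Movie)
WITH p, collect(m) as movies
RETURN p {
  .*,                                            // every property of p
  movieCount: size(movies),                      // computed value under a custom key (hypothetical name)
  movies: [m IN movies | m {.title, .released}]  // nested projection: one small map per movie
} as Keanu
```

The .* selector pulls in all of the node's properties, so it is handy when you want the whole node plus a few extra keys, while listing .name explicitly keeps the result small.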
Node Lookups
We need to look up starting nodes in the graph before we can start traversing and looking for matching patterns.
There are 4 main categories of node lookups:
- Look up a node by id. This is the fastest lookup, but it requires that you know the graph id of the node (or relationship). We don't recommend storing ids outside of Neo4j for later lookup, since you can't tell whether a node has been deleted and recreated in the meantime. If it has, the id may point to nothing (because the node was deleted), or to an entirely different node (since ids are reused after deletion). It's uncommon to do these lookups.
- AllNodesScan. When no label is available in the MATCH pattern, the planner has no choice but to scan through every node in the graph, performing property access (if properties are present in the pattern) to find the nodes that match. For example, MATCH (n {name:'Keanu Reeves'}). No label is present, so every node in the graph must be checked. This is the slowest lookup available, since it slows down as the number of nodes in the graph grows. Unless you're doing a graph-wide operation regardless of label, it is usually evidence of a mistake in the query, telling you that you should at least add a label to the MATCH pattern.
- NodeByLabelScan. If a label is present in the pattern, then the planner only has to look through nodes with that label. For example, MATCH (p:Person {name:'Keanu Reeves'}). There is a :Person label present, so only :Person nodes need to be checked. If there is no index on :Person(name), then this will do a NodeByLabelScan followed by a property Filter operation, performing property access on those nodes until all matching nodes are found. This is a medium-speed lookup, and its cost depends on how many nodes there are for the given label. Usually, when there is a property on which you want to filter quickly, you should add an index so this becomes an index lookup instead.
- NodeIndexSeek. When a label and property are present in a MATCH pattern, and there is an index for that label and property combination, then instead of the previous NodeByLabelScan + filter, the planner can do an index lookup. Take the example from the previous item: MATCH (p:Person {name:'Keanu Reeves'}). Without an index present, this is a NodeByLabelScan followed by a property filter on p.name. But if we create an index for this: CREATE INDEX ON :Person(name), now the planner can use an index lookup instead. It does not access properties, but gets the nodes quickly via the index. So from the Cypher query alone you cannot tell whether a lookup on a starting node will do a label scan and filter or an index lookup. You would need to know what indexes are available in your graph (CALL db.indexes()), and you can also check by running an EXPLAIN on the query, which shows the operations the planner will use to execute it, including which type of lookup will be used to find your starting nodes.
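To make the four categories concrete, here is a rough sequence of statements you could run one at a time to watch the plan change. It is a sketch rather than a recipe: the id 42 is a placeholder, and the index syntax follows the statements used in this answer. Newer Neo4j releases use a different CREATE INDEX form and SHOW INDEXES, so adjust for your version.

```cypher
// 1. Lookup by id (42 is a placeholder, not a real id from any graph)
MATCH (n) WHERE id(n) = 42 RETURN n;

// 2. No label: EXPLAIN shows the plan without executing the query.
//    Expect an AllNodesScan followed by a Filter on the name property.
EXPLAIN MATCH (n {name:'Keanu Reeves'}) RETURN n;

// 3. Label but no index: expect a NodeByLabelScan followed by a Filter.
EXPLAIN MATCH (p:Person {name:'Keanu Reeves'}) RETURN p;

// 4. Create the index, then confirm it is listed.
CREATE INDEX ON :Person(name);
CALL db.indexes();

// 5. Same query as step 3: once the index is online, expect a NodeIndexSeek.
EXPLAIN MATCH (p:Person {name:'Keanu Reeves'}) RETURN p;
```

The query text in steps 3 and 5 is identical. Only the available indexes differ, which is why you need the plan to know which lookup you are getting.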
After starting nodes are found, Neo4j uses traversal and filtering to find the rest of the pattern (as opposed to table joins, which is what a relational database would do). The executor will start from the starting nodes and expand relationships, filter, and perform whatever other operations the Cypher query calls for to find the result.
If this is still tough to understand, consider a real-world example: the approaches you could use to find Keanu Reeves's phone number.
The analog for an AllNodesScan is: MATCH (p {name:'Keanu Reeves'}) RETURN p.phoneNumber. You have no label, so you don't know what type of thing this is. So you start asking every single object in the world if they have the name Keanu Reeves, and if so you ask for their phone number. After you have asked every single object in the entire world, you now have the phone numbers of everything that told you that yes, their name is Keanu Reeves.
The analog for a NodeByLabelScan is: MATCH (p:Person {name:'Keanu Reeves'}) RETURN p.phoneNumber. The difference now is that you know that this must be a person, so you only need to ask every person in the world if their name is Keanu Reeves, and if so you get their phone number. You still have to ask everyone in the world, since there may be several people with that name. This is far more efficient than asking every thing in the world for its name, but it will still take a long time. If instead we had MATCH (p:Actor {name:'Keanu Reeves'}), now we know we only have to ask actors, which cuts down the number of people we need to ask significantly.
But consider if we had some web page where (for people only) we could enter a name and it would tell us the people who had that name anywhere in the world, and then we could get their phone numbers. This is the equivalent of an index on :Person(name), and is very fast (we don't need to ask people for their names, we ask the index instead, which was constructed specifically for looking up people by their name very quickly). Creating an index takes time (usually quick, but can be longer for millions or billions of nodes or more). But once the index is created, it knows how to look up nodes of certain types by a certain property (or properties). The planner is aware of the indexes available, and when it sees a pattern with potential starting nodes that include the label and property (or properties) of an index, it will use an index lookup to find the starting nodes.
Filtering
As for traversal and filtering, consider this query:
MATCH (p:Person {name:'Keanu Reeves'})-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2003
RETURN p, collect(m) as moviesIn2003
Note that this query is equivalent in every way to this one:
MATCH (p:Person {name:'Keanu Reeves'})-[:ACTED_IN]->(m:Movie {released:2003})
RETURN p, collect(m) as moviesIn2003
If we only have indexes on :Person(name) and :Movie(title), then the planner will choose to use p as the starting node, performing an index lookup for :Person nodes with the name 'Keanu Reeves'. Then it will expand outgoing :ACTED_IN relationships and filter on the m node, first performing a label filter to make sure m is a :Movie node, then doing property access to make sure the node's released property equals 2003. Nodes that are not labeled :Movie, that do not have a released property, or whose released property isn't equal to 2003 will be filtered out.
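If you want to see this for yourself, one option is to prefix the query with PROFILE, which runs it and reports the rows and db hits for each operator in the plan. The sketch below assumes the same graph and indexes described above. The exact operators and their order depend on your Neo4j version and data, so treat the plan you get as the authority rather than this description.

```cypher
// Runs the query and shows per-operator rows and db hits.
PROFILE
MATCH (p:Person {name:'Keanu Reeves'})-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2003
RETURN p, collect(m) as moviesIn2003;

// The inline-property form; comparing its plan against the one above
// is a quick way to check that the two queries are planned the same way.
PROFILE
MATCH (p:Person {name:'Keanu Reeves'})-[:ACTED_IN]->(m:Movie {released:2003})
RETURN p, collect(m) as moviesIn2003;
```

Use EXPLAIN instead of PROFILE if you only want the plan and don't want the query to actually execute.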