Writing out performant queries for n-relationship queries on a single node where n > 2

Not a bug, just couldn't find documentation on this.

Starting with a known reference point: 1 node with 2 connections to 2 different nodes:

MATCH (associated_node1)-[:REL1]-(start_node {name: "Start"})-[:REL2]-(associated_node2) RETURN *
Make sense, and it's easy.

Now let's say a node with 5 connections to 5 different nodes, each relationship with a different relationship type.

Do we:

MATCH (associated_node1)-[:REL1]-(start_node {name: "Start"})-[:REL2]-(associated_node2), (associated_node3)-[:REL3]-(start_node)-[:REL4]-(associated_node4), (associated_node5)-[:REL5]-(start_node)
MATCH (associated_node1)-[:REL1]-(start_node {name: "Start"})-[:REL2]-(associated_node2)
MATCH (associated_node3)-[:REL3]-(start_node)-[:REL4]-(associated_node4)
MATCH (associated_node5)-[:REL5]-(start_node)
  1. ?

I am interpreting the question to be "how do get query from a node and get all the nodes it has relationships to without defining each relationship". Created a model that I think is what you're looking for. It connects five different nodes to a single node with five different relationships. The Cypher query pattern here is to anchor off the start node and not define relationship(s). The model has all nodes with the label Node (you should use a label in a query when you can in almost all cases):

  (`0` :Node {name:'start'}) ,
  (`1` :Node {name:"associated_node1"}) ,
  (`2` :Node {name:"associated_node2"}) ,
  (`3` :Node {name:"associated_node3"}) ,
  (`4` :Node {name:"associated_node4"}) ,
  (`5` :Node {name:"associated_node5"}) ,
  (`0`)-[:`REL1` ]->(`1`),
  (`0`)-[:`REL2` ]->(`2`),
  (`0`)-[:`REL3` ]->(`3`),
  (`0`)-[:`REL4` ]->(`4`),
  (`0`)-[:`REL5` ]->(`5`)

The image and rudimentary create code was built using the "arrows" tool found at: http://www.apcjones.com/arrows/

The simplest query would be to start with the node that has the name property value 'start', and get all nodes via any relationship to any other node with the label :Node:

MATCH path=(:Node {name: 'start'})-->(:Node) RETURN path

A more explicit approach would be to define how many relationship 'hops' you want the query to traverse from the start node. In this case there's only one level of relationships:

MATCH path=(:Node {name: 'start'})-[*..1]->(:Node) RETURN path

Or, get anything to any depth related to the node with label :Node with the property name:'start'

MATCH path=(:Node {name: 'start'})-[*]->() RETURN path

As always, you should PROFILE or EXPLAIN your queries to make sure they're efficient, using indexes where it makes sense to have them, etc.

1 Like

@dan.flavin I appreciate you putting in the work to flesh out the answer above, but that is not my question.

The question is about syntax, specifically about the “best” (aka most performant) way to write out the query.

Say we’re looking for all nodes connected via REL1, REL3, REL4, REL5

How should we write that out in cypher?
How about if we’re looking for all but 1 label?

Filter the relationship:

MATCH (n:Node {name:"start"})-[r]-(d)
WHERE type(r) <> "REL2"
RETURN n, d;


To get all:

MATCH (n:Node {name:"start"})-[r]-(d)
//WHERE type(r) <> "REL2"
RETURN n, d;


@ameyasoft That does not seem performant or best-practice.

How would you structure that if there were 10 labels and you wanted 3 of them, or 8?
What if there were 50 connected nodes each of a different type? (yes, I know: reassess the graph structure. 50 is only there as a thought experiment)

As stated above, try something similar to:

MATCH (n)-[r]-(m) 
WHERE type(r) IN ['REL1', 'REL2',  'REL4']
return n,r,m

"Best practice" is in the details since Neo4j graph traversal is very efficient at its core. How any one query behaves can be quantified by looking at the PROFILE output. How it works in your environment is a matter of query complexity, graph structure, and graph size as a function of hardware capabilities. What might not be "best practice" may work just fine in "reality".

But since you brought it up, it is not generally a good idea to anchor a query based on relationships. For example, the original example query in the post had no node labels. That will force a complete scan through the node store because any node has the potential to be part of the result set. You will see a AllNodesScan in the PROFILE output. Add a node label and you'll see the first step being NodeByLabelScan. Much better. Add a property with an index and even better. Note that the main reason for indexes is to find the starting nodes for a query.

It might help to understand the context of the query pattern. It is often questioned if you're starting with a relationship as the anchor for a query pattern, then could it be better served as a property (e.g. think something that modifies the noun that defines a label)?

1 Like

I apologize: I don’t seem to be communicating my request well, and may have clouded the request with specifics.

I suspect that the real answer is going to be found by testing each possible syntax. Maybe Neo4j already optimizes and my question is trival, I’m not sure.

I’ve have only seen official documentation about writing cypher for 3-node paths, and none about finding more complicated paths. Looking to fill in the gaps in my knowledge while avoiding bad habits.

See above:

Unbounded graph traversals are just part of Cypher. But always be mindful of your query pattern. Graphs are highly connected and you can find it easy to inadvertently return a large amount of data.

No, unbounded traversals are not what I’m asking about.

The answer will:

  • Be Cypher
  • Relate to at minimum 4 nodes
  • Have 4 node indicators “()”
  • Not be a simple chain ((a)->(b)->(c)->(d))
  • Instead have a way to indicate this query:

or this query:


Hope this clarifies.
With reapect,

All those query patterns are not a problem with Cypher. I would offer that you create a simple test graph and queries and post your questions based on what you've observed with the cypher code. That will give us a common base to move forward with.

If I'm reading your question right, you have a starting node in the middle, and surrounding nodes connected by a single relationship each, with the type of that relationship being different on each.

If that's the case, you can use | in your relationship pattern to separate the relationships you're interested in. It works as an OR in this case, and is documented here.

An example query

MATCH (start_node:Node {name: "Start"})-[:REL1 | REL2 | REL3 | REL4]-(associatedNode)
RETURN associatedNode

If you only wanted associatedNodes of a certain subset of types, then you could use something like WHERE associatedNode:Person OR associatedNode:Place OR associatedNode:Thing after the match.

Ahhhh, excellent! Thank you @andrew.bowman; that is one piece of the puzzle.

Does this same syntax apply to nodes?
Like: MATCH (a:Person)-[]-(b:Location | c:Time)?

It does not. You'll need to handle that in your WHERE clause, either through the example I provided, where you know the labels ahead of time and can hardcode the check into the query, or with a dynamic check such as something like this:

WHERE any(rel in ['Person', 'Place', 'Thing'] WHERE rel in labels(associatedNode))

The list of course could be supplied as a parameter to the query. In most cases though you will likely know the label or labels of the node you want and can hardcode them into the query.