Default ordering of nodes and relationships

One for the engineering team!

I have noticed that when using a NodeByLabelScan - even with a filter - the nodes are always returned by default in strict ascending id order, and with no gaps - that is I don't need to sort by id to get a sorted list of node ids. I'd like to know if this is a guaranteed default behaviour, or is it just a a co-incidence? As illustration of what I mean, these two queries return identical results.

MATCH (n:T) 
WHERE id(n) >= 100
RETURN id(n) 
ORDER by id(n) 
LIMIT 10
MATCH (n:T) 
WHERE id(n) >= 100
RETURN id(n) 
LIMIT 10

The behavior you've observed is correct - that is how neo4j works -- but as a matter of software contract I think it's not a good idea to rely on that. Here what I think matters is the semantics of the cypher language itself. If you have no ORDER BY clause, then it is not legitimate to assume the order of the results. Cypher would still be "right" if it returned the records in another order. Under the hood in neo4j, other complicated things are happening (like node ID reuse and packing). Your cypher queries shouldn't need to be aware of things like that, or how changes in them over time could affect the outcome of your queries.

Remember that as new versions are introduced and neo4j goes through changes, if you come to rely on that sort of behavior it might cause problems later, because AFAIK that's not part of the contract of how cypher works.

Explicit ORDER BY is better.

tl;dr yes it would work but I don't recommend it. :slight_smile:

@david.allen

Given the scenario where I have a node (label User), I have a property named "status" which is indexed. (But the question I intend to ask this property doesn't matter much)

Based on the images provided, it seems like there will be performance differences between explicit ORDER BY and non-explicit ORDER BY?

If so, how can I avoid it?

I don't think there's any such thing as non-explicit order by. If you don't use the order by clause, you are not guaranteed to get the records in order. Relying on the order of records when not using "order by" should not be done.

In order to do pagination properly, you must do order by. Because the order of the records could change if you don't use order by, your pagination could be very inconsistent if you don't do that.

Thanks for the reply. What I am trying to ask is that, if I do use ORDER BY as seen in the images, there are more records returned 100,000+ rows vs. 100 rows. Will that leads to performance bottleneck as the number of records grows?

It should not result in a performance bottleneck assuming that you are doing ORDER BY on an indexed property (the internal system ID of the node should be OK too).

Cardinalities that you see in the EXPLAIN plan are approximate and probably not exactly right.

I want to avoid the ordering which cypher is doing by default. I want to get the results as per the order in query.
E.g Below is the code:

"MATCH (p:TableName{tabind:'4'})
OPTIONAL MATCH (p)-[:CHILD]->(c:ColumnName) where c.colind IN ['1','2']
RETURN {TableName : p.name, ColumnName : c.column}");

Column 1 is paymentDate and Column 2 is Amount. I want to get paymentDate, amount where as my cypher qury is returning amount, paymentDate.

Thanks in advance for Help

1 Like

Cypher is a declarative language, meaning that you tell it what you want and it goes and gets it. In the query you provided, you did not specify an order, and so cypher guarantees no particular order.

If you want your results in a particular order, always use ORDER BY. Never assume that they will come in any particular order, because this is up to Cypher to decide, according to what the most efficient way of executing the query is. Even if Cypher seems to do the same thing every time, there is no guarantee UNLESS you use ORDER BY.