cancel
Showing results for 
Search instead for 
Did you mean: 

How to speed up ORDER BY

Hello, I have a question about the ORDER BY performance of neo4j.

  • neo4j version: 4.0.6 / neo4j desktop

Node & Relationship

Node - 10,001,203 EA

  1. Article - 10,000,000 EA
  2. ...

Releationship - 11,251,324 EA

  1. Section-[CONSISTS_OF]->Article

Indexes (native-btree-1.0)

| 1 | "article_index" | ["Article"] | ["articleId", "officeId"] | "native-btree-1.0" |
| 2 | "office_index" | ["Article"] | ["officeId"] | "native-btree-1.0" |
| 3 | "order_index" | ["Article"] | ["updateDateTime"] | "native-btree-1.0" |

without order by desc

MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) RETURN a LIMIT 100

with order by desc

MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) RETURN a ORDER BY a.updateDateTime DESC LIMIT 100

The number of 'Article' nodes is about 10 million.
and The number of '(a:Article {officeId: '001'})' is 227,245.

There are many performance differences depending on whether ORDER BY is used or not.

Is there any other way to address ORDER BY performance issues?

Thanks

10 REPLIES 10

soham_dhodapka1
Node Clone

Can you try this?

MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a

@soham.dhodapkar

MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
RETURN a LIMIT 100

The above query itself is slow. it takes about 400ms

Is there a problem with the 'updateDateTime' index?
Or need something like a descending index?

The 'Article' Node properties are as follows.

        "properties": {
          "contents.title": "...",
          "contents.content": "...",
          "officeId": "001",
          "articleId": "0009656943",
          "contents.section": "10464f000",
          "insertDateTime": "20180820035121",
          "updateDateTime": "20181029013007",
          "refinedContent": "..."
        }

Do you have indexes on officeId and updateDateTime ?
Also, can you please profile the query and share the query execution plan?

PROFILE
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a

Indexes

I have indexes on officeId and updateDateTime.

Profiles

PROFILE
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a

And are the results returned correct?
I am trying to think of a way to make this faster than 282ms. Number of (a:Article {officeId: '001'}) is 227,245. How many nodes are (s:Section {sectionId:'104'}) ?

We can try to leverage index-backed ordering, but this requires a means to hint the type of the property, such as in a WHERE clause against a value of the same type. Can you give this a try?

PROFILE
MATCH (a:Article {officeId:'001'})
where a.updateDateTime < dateTime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
RETURN a

@andrew.bowman

PROFILE
MATCH (a:Article {officeId:'001'})
where a.updateDateTime < dateTime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
RETURN a

No result from the above query. I think where a.updateDateTime < dateTime() clause doesn't seem to work.

updateDateTime format

"updateDateTime": "20181029013007"

Profile

Is this right condition between updateDateTime and dateTime()?

Ah, I thought you were using a dateTime type. If you're using the epoch time, then maybe something like WHERE a.updateDateTime < timestamp() for that segment.

I just changed the property 'insertDateTime' to dateTime type with the query below

CALL apoc.periodic.iterate(
    "MATCH (a:Article) RETURN a",
    "SET a.insertDateTime = datetime({epochmillis: apoc.date.parse(a.insertDateTime, 'ms', 'yyyyMMddhhmmss')})",
    {batchSize:1000, parallel:true})

I made a Index on 'insertDateTime' property and I ran the command you recommend

PROFILE
MATCH (a:Article {officeId:'001'})
where a.insertDateTime < datetime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.insertDateTime DESC
LIMIT 100
RETURN a

Below query is also slow ..

PROFILE MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) 
WHERE a.insertDateTime < dateTime()
RETURN a ORDER BY a.insertDateTime DESC LIMIT 100

And this query is faster than both above

PROFILE MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) 
RETURN a ORDER BY a.insertDateTime DESC LIMIT 100

Is leverage index-backed ordering working?

@soham.dhodapkar

MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a

The result of returned is not correct.There are only seven results.

MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})

I think the above query is executed as a result from 'ORDER BY a.updateDateTime DESC LIMIT 100'.

And Number of (s:Section {sectionId:'104'}) is only one node