Hello, I have a question about the ORDER BY performance of neo4j.
- neo4j version: 4.0.6 / neo4j desktop
Node & Relationship
Node - 10,001,203 EA
- Article - 10,000,000 EA
- ...
Releationship - 11,251,324 EA
- Section-[CONSISTS_OF]->Article
Indexes (native-btree-1.0)
| 1 | "article_index" | ["Article"] | ["articleId", "officeId"] | "native-btree-1.0" |
| 2 | "office_index" | ["Article"] | ["officeId"] | "native-btree-1.0" |
| 3 | "order_index" | ["Article"] | ["updateDateTime"] | "native-btree-1.0" |
without order by desc
MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) RETURN a LIMIT 100
with order by desc
MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) RETURN a ORDER BY a.updateDateTime DESC LIMIT 100
The number of 'Article' nodes is about 10 million.
and The number of '(a:Article {officeId: '001'})' is 227,245.
There are many performance differences depending on whether ORDER BY is used or not.
Is there any other way to address ORDER BY performance issues?
Thanks
Can you try this?
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a
@soham.dhodapkar
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
RETURN a LIMIT 100
The above query itself is slow. it takes about 400ms
Is there a problem with the 'updateDateTime' index?
Or need something like a descending index?
The 'Article' Node properties are as follows.
"properties": {
"contents.title": "...",
"contents.content": "...",
"officeId": "001",
"articleId": "0009656943",
"contents.section": "10464f000",
"insertDateTime": "20180820035121",
"updateDateTime": "20181029013007",
"refinedContent": "..."
}
Do you have indexes on officeId
and updateDateTime
?
Also, can you please profile the query and share the query execution plan?
PROFILE
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a
Indexes
I have indexes on officeId and updateDateTime.
Profiles
PROFILE
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a
And are the results returned correct?
I am trying to think of a way to make this faster than 282ms. Number of (a:Article {officeId: '001'})
is 227,245. How many nodes are (s:Section {sectionId:'104'})
?
We can try to leverage index-backed ordering, but this requires a means to hint the type of the property, such as in a WHERE clause against a value of the same type. Can you give this a try?
PROFILE
MATCH (a:Article {officeId:'001'})
where a.updateDateTime < dateTime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
RETURN a
@soham.dhodapkar
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a
The result of returned is not correct.There are only seven results.
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
I think the above query is executed as a result from 'ORDER BY a.updateDateTime DESC LIMIT 100'.
And Number of (s:Section {sectionId:'104'})
is only one node
@andrew_bowman
PROFILE
MATCH (a:Article {officeId:'001'})
where a.updateDateTime < dateTime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
RETURN a
No result from the above query. I think where a.updateDateTime < dateTime()
clause doesn't seem to work.
updateDateTime format
"updateDateTime": "20181029013007"
Profile
Is this right condition between updateDateTime and dateTime()?
Ah, I thought you were using a dateTime type. If you're using the epoch time, then maybe something like WHERE a.updateDateTime < timestamp()
for that segment.
I just changed the property 'insertDateTime' to dateTime type with the query below
CALL apoc.periodic.iterate(
"MATCH (a:Article) RETURN a",
"SET a.insertDateTime = datetime({epochmillis: apoc.date.parse(a.insertDateTime, 'ms', 'yyyyMMddhhmmss')})",
{batchSize:1000, parallel:true})
I made a Index on 'insertDateTime' property and I ran the command you recommend
PROFILE
MATCH (a:Article {officeId:'001'})
where a.insertDateTime < datetime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.insertDateTime DESC
LIMIT 100
RETURN a
Below query is also slow ..
PROFILE MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
WHERE a.insertDateTime < dateTime()
RETURN a ORDER BY a.insertDateTime DESC LIMIT 100
And this query is faster than both above
PROFILE MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a ORDER BY a.insertDateTime DESC LIMIT 100
Is leverage index-backed ordering working?