Are there any plans to support sort order in fulltext search index?

Hi all,

Context

Currently we are evaluating using Full-text search (FTS) index for some of our use cases. In one of the use cases we want to show data matching hashtag and results are sorted by a property (creation time) and we were not able to sort by property and limit the results with FTS index query method (db.index.fulltext.queryNodes).

Here is simplified data model:

MERGE (g:Group {id: string})
MERGE (a:Article {
   content: string, group_id: string,
   created_at: unixtimestamp,
   created_at_str: "20210819T1200"})
MERGE (g)-[:CONTAINS]->(a)

What we have tried

Example case: group1 has 90.000 articles, group2 has 10.000 articles

// index
call db.index.fulltext.createNodeIndex("fts__article",
    ["Article"], ["content", "group_id", "created_at_str"]);

// query most recent articles with very common hashtag
// (lets assume this hashtag is available in all articles,
// neo4j is sorting 100k articles outside of full text search index
with 'group_id: (group1 OR group2) AND content: "#verycommonhashtag"' as query
call db.index.fulltext.queryNodes("fts__article", query)
yield node as a, score
return a
order by a.created_at desc
limit 100

Boosting

Imagine now we have 1M matching articles and query is getting slower and slower because of sorting 1M data.

We have considered boosting results by creation time and limit results to order by doing:

with 'group_id: (group1 OR group2) AND content: "#verycommonhashtag" AND (created_at_str: 202108*^10 OR created_at_str: 202107*^8 OR created_at_str:202*^1)' as query
call db.index.fulltext.queryNodes("fts__article", query, {limit: 1000}
yield node as a, score
return a
order by a.created_at desc
limit 100

Problem with boosting

which boosts articles from last 2 month then limits results by 1000, then we order outside of FTS index. This works fine if distribution of data on each group is almost similar every day or every month across all groups. But we have some groups which are very active in some days and in-active in some other weeks, which makes it difficult to create boosting condition, should boosting criteria be by month (202108^10), by week (2021-week-34^10), by day (20210819^10) or even by hour?

Imagine 1 group had 2.000 articles on a single day and we are boosting by day or week, when we limit by 1.000 we are missing some articles which might be more recent than articles returned as a result of limiting

Feature request

Based on these observations, we are thinking maybe ordering inside Lucene might be helpful for us and faster. Still 1M articles should be sorted somewhere if they match criteria, either by Lucene or by Neo4j, considering that data is already in Lucene it might be faster than (1) fetching matched results from Lucene (2) then order in Neo4j.

Do you guys have plans to add sort order in FTS index?

Thanks a lot for awesome product!

Hi @Ika I have the same issue as you. Did you find a solution for your use case?