Relationships Index for sorting

Hello everyone,

I have a use-case where I have thousands of relationships for a specific relationship type with a numerical attribute. I would like to get the top 10 nodes by sorting on this attribute. There is a B-tree index on the relationship attribute (I am using the latest version of neo4j). My cypher query uses sorting on this specific attribute but it seems that the sorting does not use an index as it goes through all the relationships with many db hits making the query very slow, is that expected? What should I do to tell the engine to use an index for sorting? Thanks a lot

My query is similar to this :

MATCH (a1: Node)-[rel:REL_TYPE]-(a2:Node) WHERE IN ['name1', 'name2'] AND NOT IN ['name3', 'name4'] WITH a1, a2 ORDER BY rel.score DESC WITH as a1_name, collect( as a RETURN a1_name, a[0..10]

The same thing happens with the simpler query:

MATCH (a1: Node)-[rel:REL_TYPE]-(a2:Node) WHERE'name1' RETURN ORDER BY rel.score DESC LIMIT 10

There is already an index here on the node name attribute which it uses correctly. Also, doing an exact match on the relationship attribute works quite well and uses well the index. It seems that the problem happens only with sorting.

Thanks a lot for your help.

The screenshot is for this query:

It performs 10k db hits while i am only asking for 10 results and it doesn't seem to use the index.

Hi @Celes-10 !

This feature was requested long time ago on Node indexes, it's now closed but with no direct use of the index (sort sometimes is out of the planner optimization). Is there any low boundary for the pmi score? Can you add a WHERE condition like

MATCH (a1: Node)-[rel:REL_TYPE]-(a2:Node) 
AND rel.score > 0

Technically, a 10k filter should be easy to handle. How long does it the query takes? Can you send the full PROFILE of the query? I assume you also have an index on the Node(name).

See ya!


Hello @Bennu , Thanks for your answer. Yes i have a unique constraint on the Node(name) and it creates an index on it automatically. Actually my use-case is a bit more than that specific query. I would like to extract the topX of the topX... until an arbitrary depth. For now i do it by step in multiple queries (equal to the depth) where an example of the first query is the one showed above to get the top10. Then I do another query with the results of the first one (the top10 nodes) to get the top10 of the top10 ... This takes time. I am not sure if this is the most optimal way to do it, perhaps I should use one of the algorithms in the apoc suite?

To answer your question, given a depth of 3 with some filtering on the pmi score, it takes 2 to 4 seconds