How to return random results

Neo4j Community server 4.3.3 - Ubuntu 20.04

Is there a way to return random result from a query?

MATCH (n:DOC) RETURN n LIMIT 25 always returns the same (annoying) 25 items.

I'd like to have something like

MATCH (n:DOC) RETURN n LIMIT 25 RANDOM

How can achieve this result?

Thank you

@paolodipietro58

somewhat random though not entirely

MATCH (d:Doc) WHERE rand() <= 0.1 return d limit 3

i.e for each node examined we simply run the rand() function which will return a value from 0.0 to 0.99 and so ae looking for items where rand() retrurns a value less than <0.1. But this is going to start at the first :Doc node recorded. So if you have for example 10,000 :Doc nodes you might be getting more of the early entered :Doc nodes

1 Like

@dana_canzano Very good suggestion!

Then I searched the documentation and found a better solution:

MATCH (n:DOC) RETURN apoc.coll.randomItems(collect(n), 5)

I run it just one hundred times and it never returns the same itemset, and also it returns both old and recent items.

Not sure about the 'efficiency of the solution, but if there is enough memory it shouldn't be a problem!

What do you think?

Try this. Add your match conditions and set your limit.

match (n)
with n, rand() as r
order by r
limit 2
return n

@glilienfield: How can I fit your solution in a more complex case like this?

match (n:Doc)<-[:HAS_REL]-(v:Vol)
with [v,n] as x
return
apoc.coll.randomItems(collect(x), 5) 

This should work:

match (n:Doc)<-[:HAS_REL]-(v:Vol)
with n, v, rand() as r
order by r
limit 5
return n,v

No, this is not a good solution, as it repeat the (n,v) couple n times, with the same value of rand.
To be clear:
n1,v1,r1
n1,v2,r1
n1,v3,r1
...
n2,v11,r2
n2,v12,r2

and so on

The point is to have completely different values of n and v

The rand() function looks like it produces a random value of double type, thus the probability of repeating is extremely small. You can see with the following query what a sequence of 100 random values looks like.

unwind range(1,100) as indexes
return indexes, rand() as random
order by random

I thought you wanted 5 randomly selected paths between DOC and VOL nodes. It sounds like you also don't want any of the paths to have duplicate :Doc or :Vol nodes.

To help you better, can you explain your data? From your example, it looks like you have a one-to-many relationship between DOC nodes and VOL nodes, with each VOL node belongs to only one DOC node. Is this accurate? If this is true and you want 5 randomly selected paths between a DOC node and a VOL node, without repeating the DOC nodes, then you could try:

match(n:Doc)
with n, rand() as r
order by r
limit 5
call {
with n
match (n)<-[:HAS_REL]-(v:Vol)
with v, rand() as r
order by r
limit 1
return v
}
return n, v

The query first finds all the :Doc nodes, and randomly selects 5. It then performs a subquery for each of the 5 :Doc nodes separately to obtain all the :Vol nodes attached to that :Doc node. It then randomly selects one of the :Vol nodes attached to the :Doc node. Finally, it returns the ordered pair of :Doc and :Vol nodes. If I understand your data, you will get 5 randomly selected :Doc nodes, with one randomly selected :Vol node that is attached to the :Doc node; thus, giving you 5 randomly selected paths between :Doc and :Vol nodes, where no two :Doc nodes or :Vol nodes are alike.

If this is not what you are looking for, please describe your data so we can see if we can help.