How to return random results

paolodipietro58 · December 3, 2021, 7:24pm

Neo4j Community server 4.3.3 - Ubuntu 20.04

Is there a way to return random result from a query?

MATCH (n:DOC) RETURN n LIMIT 25 always returns the same (annoying) 25 items.

I'd like to have something like

MATCH (n:DOC) RETURN n LIMIT 25 RANDOM

How can achieve this result?

Thank you

dana_canzano · December 3, 2021, 11:42pm

@paolodipietro58

somewhat random though not entirely

MATCH (d:Doc) WHERE rand() <= 0.1 return d limit 3

i.e for each node examined we simply run the rand() function which will return a value from 0.0 to 0.99 and so ae looking for items where rand() retrurns a value less than <0.1. But this is going to start at the first :Doc node recorded. So if you have for example 10,000 :Doc nodes you might be getting more of the early entered :Doc nodes

paolodipietro58 · December 4, 2021, 6:13pm

@dana_canzano Very good suggestion!

Then I searched the documentation and found a better solution:

MATCH (n:DOC) RETURN apoc.coll.randomItems(collect(n), 5)

I run it just one hundred times and it never returns the same itemset, and also it returns both old and recent items.

Not sure about the 'efficiency of the solution, but if there is enough memory it shouldn't be a problem!

What do you think?

glilienfield · December 9, 2021, 3:30pm

Try this. Add your match conditions and set your limit.

match (n)
with n, rand() as r
order by r
limit 2
return n

paolodipietro58 · December 9, 2021, 11:32pm

@glilienfield: How can I fit your solution in a more complex case like this?

match (n:Doc)<-[:HAS_REL]-(v:Vol)
with [v,n] as x
return
apoc.coll.randomItems(collect(x), 5)

glilienfield · December 10, 2021, 2:43am

This should work:

match (n:Doc)<-[:HAS_REL]-(v:Vol)
with n, v, rand() as r
order by r
limit 5
return n,v

paolodipietro58 · December 12, 2021, 10:52pm

No, this is not a good solution, as it repeat the (n,v) couple n times, with the same value of rand.
To be clear:
n1,v1,r1
n1,v2,r1
n1,v3,r1
...
n2,v11,r2
n2,v12,r2

and so on

The point is to have completely different values of n and v

glilienfield · December 14, 2021, 2:38am

The rand() function looks like it produces a random value of double type, thus the probability of repeating is extremely small. You can see with the following query what a sequence of 100 random values looks like.

unwind range(1,100) as indexes
return indexes, rand() as random
order by random

I thought you wanted 5 randomly selected paths between DOC and VOL nodes. It sounds like you also don't want any of the paths to have duplicate :Doc or :Vol nodes.

To help you better, can you explain your data? From your example, it looks like you have a one-to-many relationship between DOC nodes and VOL nodes, with each VOL node belongs to only one DOC node. Is this accurate? If this is true and you want 5 randomly selected paths between a DOC node and a VOL node, without repeating the DOC nodes, then you could try:

match(n:Doc)
with n, rand() as r
order by r
limit 5
call {
with n
match (n)<-[:HAS_REL]-(v:Vol)
with v, rand() as r
order by r
limit 1
return v
}
return n, v

The query first finds all the :Doc nodes, and randomly selects 5. It then performs a subquery for each of the 5 :Doc nodes separately to obtain all the :Vol nodes attached to that :Doc node. It then randomly selects one of the :Vol nodes attached to the :Doc node. Finally, it returns the ordered pair of :Doc and :Vol nodes. If I understand your data, you will get 5 randomly selected :Doc nodes, with one randomly selected :Vol node that is attached to the :Doc node; thus, giving you 5 randomly selected paths between :Doc and :Vol nodes, where no two :Doc nodes or :Vol nodes are alike.

If this is not what you are looking for, please describe your data so we can see if we can help.

Topic		Replies	Views
Returning a random subset of nodes (ideally repeatable) Cypher	4	4087	June 24, 2019
Hello, How can I go through the paths of a query in different ways? Cypher	2	206	June 24, 2021
WITH rand() change functionality between CALL {} and CALL apoc.cypher.run Cypher apoc , cypher	3	146	March 22, 2023
Cypher never resolves (takes to long and consumes much memory) Cypher	1	307	December 16, 2020
Generating random start and target nodes Cypher	2	250	November 8, 2021

August Summer Fun!

How to return random results

Related topics