WITH rand() behaves differently in CALL {} and CALL apoc.cypher.run

Hello all,

I am seeing strange behavior when I try to randomly sample a few neighbors of particular nodes.

  • neo4j version, desktop version, browser version
    I am using the neo4j:5.5.0-community Docker image on Linux,
    with the following parameters:
      - NEO4J_AUTH=none
      # - NEO4_PLUGINS='["graph-data-science"]'
      - apoc.export.file.enabled=true
      - apoc.import.file.enabled=true
      - apoc.import.file.use_neo4j_config=true
      - NEO4J_PLUGINS=["apoc"]
      - NEO4J_dbms_memory_transaction_total_max=0
      - NEO4J_dbms_memory_heap_initial__size=32g
      - NEO4J_dbms_memory_heap_max__size=32g
  • what kind of API / driver do you use
    I am using Python with the py2neo package.

I tried two queries.
The first uses apoc.cypher.run:

        MATCH (cur_node)
        WHERE ANY (id IN cur_node.ID WHERE id IN $node_id_list)
        WITH cur_node
        CALL apoc.cypher.run(
            'MATCH (cur_node){direction_arrow}(neighbors)
            WITH rand() as r, neighbors
            ORDER BY r
            LIMIT $num_samples
            RETURN collect(r) AS rand, collect(neighbors.ID) AS nn',
            {{cur_node: cur_node, num_samples: $num_samples}}) YIELD value
        RETURN value.rand, value.nn AS neighbors

OUTPUT:
first iteration:

{'value.rand': 
[0.00015447678200963821, 0.0002513570108183538, 0.00036323837330132225, 0.0004571500939272166, 0.0005692942868501527, 0.0006319216646581971, 0.0009800730498202848, 0.0009975206145961257, 0.0012616359436184998, ...], 
'neighbors': [171053, 125765, 89153, 49813, 57168, 140995, 81320, 67133, 216481, ...]}

second iteration:

{'value.rand': 
[0.00015406391020234, 0.00022974141129827874, 0.00027559936540422214, 0.00048246754345004916, 0.0005592177219301275, 0.0008769336556695428, 0.0009803530300780405, 0.0010443457475343143, 0.001150494865988283, ...], 
'neighbors': [76714, 187927, 213515, 166957, 52992, 182661, 73519, 150725, 127881, ...]}

The second uses a CALL {} subquery instead of apoc:

        MATCH (cur_node)
        WHERE ANY (id IN cur_node.ID WHERE id IN $node_id_list)
        CALL {{
            WITH cur_node
            MATCH (cur_node){direction_arrow}(neighbors)
            WITH rand() as r, neighbors
            ORDER BY r
            LIMIT $num_samples
            RETURN collect(r) AS rand, collect(neighbors.ID) AS nn
        }}
        RETURN rand, nn AS neighbors

OUTPUT:
first iteration:

{'rand':
 [0.15242892235650374, 0.15242892235650374, 0.15242892235650374, 0.15242892235650374, 0.15242892235650374, 0.15242892235650374, 0.15242892235650374, 0.15242892235650374, 0.15242892235650374, ...], 
'neighbors': [3202, 232954, 232774, 231676, 231841, 231775, 231566, 231065, 230755, ...]}

second iteration:

{'rand':
 [0.9181689214459184, 0.9181689214459184, 0.9181689214459184, 0.9181689214459184, 0.9181689214459184, 0.9181689214459184, 0.9181689214459184, 0.9181689214459184, 0.9181689214459184, ...], 
'neighbors': [3202, 232954, 232774, 231676, 231841, 231775, 231566, 231065, 230755, ...]}

The issue is that with CALL {}, rand() appears to be evaluated only once, so every row gets the same value and the ORDER BY / LIMIT is meaningless: I get the same neighbors on every iteration. I could use apoc, but my understanding is that CALL {} performs better. (It is also easier to debug with PROFILE.)
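
To illustrate why a single shared random value makes the sampling meaningless, here is a minimal pure-Python sketch (no database involved). It only mimics the "tag each row, sort, truncate" shape of the query; the helper name `sample_by_sort_key` is hypothetical, and the stable sort is a property of Python, not something Neo4j guarantees for ties.

```python
import random


def sample_by_sort_key(neighbors, num_samples, key_for_row):
    """Mimic `WITH <key> AS r, neighbors ORDER BY r LIMIT n`.

    key_for_row is called once per row to produce that row's sort key.
    """
    tagged = [(key_for_row(), n) for n in neighbors]
    # Python's sort is stable, so rows with equal keys keep their input order.
    tagged.sort(key=lambda pair: pair[0])
    return [n for _, n in tagged[:num_samples]]


# The first few neighbor IDs from the CALL {} output above.
neighbors = [3202, 232954, 232774, 231676, 231841, 231775]

rng = random.Random(0)

# Case 1: one random value shared by every row (what the CALL {} output suggests).
shared = rng.random()
constant_key_sample = sample_by_sort_key(neighbors, 3, lambda: shared)

# Case 2: a fresh random value per row (what the apoc output suggests).
per_row_sample = sample_by_sort_key(neighbors, 3, rng.random)

print("shared key :", constant_key_sample)
print("per-row key:", per_row_sample)
```

With a shared key the sort has nothing to distinguish rows by, so the "sample" is simply the first rows in whatever order they arrived, which would match seeing the same neighbors on every iteration.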

The apoc query, by contrast, returns the expected result.

Any idea what I may be doing wrong?

I repeated your CALL subquery code and it worked as expected. I am using 4.4.17.

Thank you for your response.

I changed my Docker image to neo4j:4.4.17-community, but I still get the same output with the CALL subquery.
I am a bit at a loss. Any idea how your setup and mine could differ?
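
One thing that might help narrow down the difference is comparing the exact query text each of us runs. Below is a minimal sketch, assuming the queries above are Python format templates (which the doubled braces and `{direction_arrow}` suggest). The names `QUERY_TEMPLATE` and `render_query` and the value `"-->"` are hypothetical, chosen only for illustration.

```python
# Assumption: "{{" / "}}" are escaped literal braces and {direction_arrow}
# is a placeholder filled in by str.format before the query is sent.
QUERY_TEMPLATE = """
MATCH (cur_node)
WHERE ANY (id IN cur_node.ID WHERE id IN $node_id_list)
CALL {{
    WITH cur_node
    MATCH (cur_node){direction_arrow}(neighbors)
    WITH rand() AS r, neighbors
    ORDER BY r
    LIMIT $num_samples
    RETURN collect(r) AS rand, collect(neighbors.ID) AS nn
}}
RETURN rand, nn AS neighbors
"""


def render_query(direction_arrow: str) -> str:
    """Fill in the relationship pattern; $-parameters are left for the driver."""
    return QUERY_TEMPLATE.format(direction_arrow=direction_arrow)


# "-->" is a hypothetical value for the relationship pattern.
rendered = render_query("-->")
print(rendered)
```

The printed text can be pasted into Neo4j Browser (with `$node_id_list` and `$num_samples` set via `:param`) to check whether the behavior depends on the driver or on the server.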

I am using enterprise in Neo4j Desktop.