Return the full `COUNT` alongside a `COLLECT` and after `LIMIT` and `ORDER BY`

rtkushner · May 22, 2024, 12:47am

Wondering if this is possible and whether i'm getting something in the query wrong. Thanks in advance.

I have a bunch of target nodes (rib:RibosomeStructure) in the example that get filtered based on their own properties and properties of nodes around them(it's not really important how the filtering is done, i think). Once the filtering is done, i'd like to save a full count of target nodes that pass the filters, but then return this full count AND only a subset of them (applygin SKIP/LIMIT).

Concretely, let's say i have 200 `:Apple` nodes, i filter them down to only red apples from Brazil via `WHERE`. I am left with a 90 `:Apple` nodes. I'd like to now return `return count(apples), collect(apples limit 10)` and get `[90, [{apple_object1}, {apple_object2},..., {apple_object10}]]` as a response. That -- without rerunning the filtering logic (`where`) twice. I have been unable to do that so far.

This is what i actually have:

    match (rib:RibosomeStructure) 
    with rib order by rib.rcsb_id desc 

           // ------- filters --------
          where toLower(rib.citation_title) 
                + toLower(rib.pdbx_keywords_text) 
                + apoc.text.join(rib.citation_rcsb_authors, "")  contains "some_string" 
          and rib.citation_year > 2020 
          and rib.resolution < 3
          and ALL(x in ["val1", "val2"] where x in apoc.coll.flatten(collect{match (rib)-[]-(p:Polymer) return p.nomenclature }) )
          and ANY(y in [9606] where y in apoc.coll.flatten(collect{ match (rib)-[:source]-(p:PhylogenyNode)-[:descendant_of*]-(s:PhylogenyNode) return [p.ncbi_tax_id, s.ncbi_tax_id]}) )
         // ------- filters --------
    // Let's say there are 42 objects that passed the filters so far.
    with rib limit 10      // <--- the relevan "limit" here
              // ------- more matches  --------
                optional match (l:Ligand)-[]-(rib) 
                with collect(PROPERTIES(l)) as ligands, rib
                match (rps:Protein)-[]-(rib) 
                with collect(PROPERTIES(rps)) as proteins, ligands, rib
                optional match (rna:RNA)-[]-(rib) 
                with collect(PROPERTIES(rna)) as rnas, proteins, ligands, rib
             // ------- more matches  --------
    with apoc.map.mergeList([{proteins:proteins},{nonpolymeric_ligands:ligands},{rnas:rnas},{other_polymers:[]}]) as rest, rib
    return collect(apoc.map.merge(rib, rest)), count (rib)
// returns: [   [{obj1, obj2, ..., obj10}]  , 10]
// i want  : [   [{obj1, obj2, ..., obj10}]  , 42]

This, naturally returns "10" as count(rib) because i limited it midway through before the optional matches.

But even if with an attempt to "pre-save" the total_count before limiting i can't get it:

    match (rib:RibosomeStructure) 
    with rib
    order by rib.rcsb_id desc 
    where // [....filters as above, redacted for brevity....]
    with count(rib) as total_count, rib // <-- trying to "pre-save" total_count
    with total_count, rib limit 10
        // [....matches as above, redacted for brevity....]
    with apoc.map.mergeList([{proteins:proteins},{nonpolymeric_ligands:ligands},{rnas:rnas},{other_polymers:[]}]) as rest, rib, total_count
    return collect(apoc.map.merge(rib, rest)), count (rib), total_count
// returns:    [ [{obj1, obj2, ..., obj10}] , 10, 1  ]
// expected:   [ [{obj1, obj2, ..., obj10}] , 10, 42 ]

returns the correct array of objects, correct LIMITED count and 1 for total_count.

I need to keep the limit in the middle of the query becuase i want to restrict the amount of optional matches downstream. Is there a way to do this short of running the count with all the filters separately? Thanks

glilienfield · May 22, 2024, 5:59am

You could collect all the rows and count them in a “with” clause after filtering. Then unwind the collection of rows to further process. The count will be appended to each row. Keep passing it through your steps. At the end when you collect your final data, also collect the count using “distinct” and just keep the zeroth element as the original count. Actually, you don’t really need distinct since all the elements are the same and there will always be a zeroth element.

Collect(distinct total_count)[0]

Also, instead of unwinding the full collection of records and limiting the number of rows to ten using “with” and “limit” clauses, you can use list slicing to keep the first 10 elements and unwind the results.

Using your apple example:

Match(n:Apple{color:red})
with collect(n)[..10] as apples, count(n) as total_count 
Unwind apples as apple 
//each row will have the total_count appended to it. Keep passing it along 
//at the end when you do you final collecting, also collect it
Return collect(x), collect(distinct total_count)[0]

rtkushner · May 22, 2024, 9:49pm

Thanks, Gary. Like a charm.

rtkushner · May 22, 2024, 9:53pm

Any idea if the collect(...)[start..end] solution is any more or less performant than SKIP/LIMIT?

glilienfield · May 22, 2024, 10:44pm

They are both useful tools to use when appropriate. Use list slicing when you have a list. Use skip/limit when you have rows of data.

I would not collect rows just to limit the data with slicing, nor would I purposely unwind a list just to use skip/limit.

Topic		Replies	Views
Get COUNT before LIMIT Neo4j Graph Platform performance , cypher	10	3329	November 4, 2024
Pagination on match queries Cypher	1	216	May 2, 2023
Multiple limits in one query Cypher	1	3436	September 5, 2018
How to paginate intermediate nodes Cypher cypher	1	528	December 14, 2019
Avoiding second match while returning total count and limited results Neo4j Graph Platform migrated	0	155	November 27, 2022

Return the full `COUNT` alongside a `COLLECT` and after `LIMIT` and `ORDER BY`

Related topics