Return the full `COUNT` alongside a `COLLECT` and after `LIMIT` and `ORDER BY`

Wondering if this is possible and whether i'm getting something in the query wrong. Thanks in advance.

I have a bunch of target nodes (rib:RibosomeStructure) in the example that get filtered based on their own properties and properties of nodes around them(it's not really important how the filtering is done, i think). Once the filtering is done, i'd like to save a full count of target nodes that pass the filters, but then return this full count AND only a subset of them (applygin SKIP/LIMIT).

Concretely, let's say i have 200 :Apple nodes, i filter them down to only red apples from Brazil via WHERE. I am left with a 90 :Apple nodes. I'd like to now return return count(apples), collect(apples limit 10) and get [90, [{apple_object1}, {apple_object2},..., {apple_object10}]] as a response. That -- without rerunning the filtering logic (where) twice. I have been unable to do that so far.

This is what i actually have:

    match (rib:RibosomeStructure) 
    with rib order by rib.rcsb_id desc 

           // ------- filters --------
          where toLower(rib.citation_title) 
                + toLower(rib.pdbx_keywords_text) 
                + apoc.text.join(rib.citation_rcsb_authors, "")  contains "some_string" 
          and rib.citation_year > 2020 
          and rib.resolution < 3
          and ALL(x in ["val1", "val2"] where x in apoc.coll.flatten(collect{match (rib)-[]-(p:Polymer) return p.nomenclature }) )
          and ANY(y in [9606] where y in apoc.coll.flatten(collect{ match (rib)-[:source]-(p:PhylogenyNode)-[:descendant_of*]-(s:PhylogenyNode) return [p.ncbi_tax_id, s.ncbi_tax_id]}) )
         // ------- filters --------
    // Let's say there are 42 objects that passed the filters so far.
    with rib limit 10      // <--- the relevan "limit" here
              // ------- more matches  --------
                optional match (l:Ligand)-[]-(rib) 
                with collect(PROPERTIES(l)) as ligands, rib
                match (rps:Protein)-[]-(rib) 
                with collect(PROPERTIES(rps)) as proteins, ligands, rib
                optional match (rna:RNA)-[]-(rib) 
                with collect(PROPERTIES(rna)) as rnas, proteins, ligands, rib
             // ------- more matches  --------
    with apoc.map.mergeList([{proteins:proteins},{nonpolymeric_ligands:ligands},{rnas:rnas},{other_polymers:[]}]) as rest, rib
    return collect(apoc.map.merge(rib, rest)), count (rib)
// returns: [   [{obj1, obj2, ..., obj10}]  , 10]
// i want  : [   [{obj1, obj2, ..., obj10}]  , 42]

This, naturally returns "10" as count(rib) because i limited it midway through before the optional matches.

But even if with an attempt to "pre-save" the total_count before limiting i can't get it:

    match (rib:RibosomeStructure) 
    with rib
    order by rib.rcsb_id desc 
    where // [....filters as above, redacted for brevity....]
    with count(rib) as total_count, rib // <-- trying to "pre-save" total_count
    with total_count, rib limit 10
        // [....matches as above, redacted for brevity....]
    with apoc.map.mergeList([{proteins:proteins},{nonpolymeric_ligands:ligands},{rnas:rnas},{other_polymers:[]}]) as rest, rib, total_count
    return collect(apoc.map.merge(rib, rest)), count (rib), total_count
// returns:    [ [{obj1, obj2, ..., obj10}] , 10, 1  ]
// expected:   [ [{obj1, obj2, ..., obj10}] , 10, 42 ] 

returns the correct array of objects, correct LIMITED count and 1 for total_count.

I need to keep the limit in the middle of the query becuase i want to restrict the amount of optional matches downstream. Is there a way to do this short of running the count with all the filters separately? Thanks

You could collect all the rows and count them in a “with” clause after filtering. Then unwind the collection of rows to further process. The count will be appended to each row. Keep passing it through your steps. At the end when you collect your final data, also collect the count using “distinct” and just keep the zeroth element as the original count. Actually, you don’t really need distinct since all the elements are the same and there will always be a zeroth element.

Collect(distinct total_count)[0] 

Also, instead of unwinding the full collection of records and limiting the number of rows to ten using “with” and “limit” clauses, you can use list slicing to keep the first 10 elements and unwind the results.

Using your apple example:

Match(n:Apple{color:red})
with collect(n)[..10] as apples, count(n) as total_count 
Unwind apples as apple 
//each row will have the total_count appended to it. Keep passing it along 
//at the end when you do you final collecting, also collect it
Return collect(x), collect(distinct total_count)[0]
1 Like

Thanks, Gary. Like a charm.

Any idea if the collect(...)[start..end] solution is any more or less performant than SKIP/LIMIT?

They are both useful tools to use when appropriate. Use list slicing when you have a list. Use skip/limit when you have rows of data.

I would not collect rows just to limit the data with slicing, nor would I purposely unwind a list just to use skip/limit.

1 Like