Wondering if this is possible and whether i'm getting something in the query wrong. Thanks in advance.
I have a bunch of target nodes (rib:RibosomeStructure)
in the example that get filtered based on their own properties and properties of nodes around them(it's not really important how the filtering is done, i think). Once the filtering is done, i'd like to save a full count of target nodes that pass the filters, but then return this full count AND only a subset of them (applygin SKIP
/LIMIT
).
Concretely, let's say i have 200 :Apple
nodes, i filter them down to only red apples from Brazil via WHERE
. I am left with a 90 :Apple
nodes. I'd like to now return return count(apples), collect(apples limit 10)
and get [90, [{apple_object1}, {apple_object2},..., {apple_object10}]]
as a response. That -- without rerunning the filtering logic (where
) twice. I have been unable to do that so far.
This is what i actually have:
match (rib:RibosomeStructure)
with rib order by rib.rcsb_id desc
// ------- filters --------
where toLower(rib.citation_title)
+ toLower(rib.pdbx_keywords_text)
+ apoc.text.join(rib.citation_rcsb_authors, "") contains "some_string"
and rib.citation_year > 2020
and rib.resolution < 3
and ALL(x in ["val1", "val2"] where x in apoc.coll.flatten(collect{match (rib)-[]-(p:Polymer) return p.nomenclature }) )
and ANY(y in [9606] where y in apoc.coll.flatten(collect{ match (rib)-[:source]-(p:PhylogenyNode)-[:descendant_of*]-(s:PhylogenyNode) return [p.ncbi_tax_id, s.ncbi_tax_id]}) )
// ------- filters --------
// Let's say there are 42 objects that passed the filters so far.
with rib limit 10 // <--- the relevan "limit" here
// ------- more matches --------
optional match (l:Ligand)-[]-(rib)
with collect(PROPERTIES(l)) as ligands, rib
match (rps:Protein)-[]-(rib)
with collect(PROPERTIES(rps)) as proteins, ligands, rib
optional match (rna:RNA)-[]-(rib)
with collect(PROPERTIES(rna)) as rnas, proteins, ligands, rib
// ------- more matches --------
with apoc.map.mergeList([{proteins:proteins},{nonpolymeric_ligands:ligands},{rnas:rnas},{other_polymers:[]}]) as rest, rib
return collect(apoc.map.merge(rib, rest)), count (rib)
// returns: [ [{obj1, obj2, ..., obj10}] , 10]
// i want : [ [{obj1, obj2, ..., obj10}] , 42]
This, naturally returns "10" as count(rib)
because i limit
ed it midway through before the optional match
es.
But even if with an attempt to "pre-save" the total_count before limit
ing i can't get it:
match (rib:RibosomeStructure)
with rib
order by rib.rcsb_id desc
where // [....filters as above, redacted for brevity....]
with count(rib) as total_count, rib // <-- trying to "pre-save" total_count
with total_count, rib limit 10
// [....matches as above, redacted for brevity....]
with apoc.map.mergeList([{proteins:proteins},{nonpolymeric_ligands:ligands},{rnas:rnas},{other_polymers:[]}]) as rest, rib, total_count
return collect(apoc.map.merge(rib, rest)), count (rib), total_count
// returns: [ [{obj1, obj2, ..., obj10}] , 10, 1 ]
// expected: [ [{obj1, obj2, ..., obj10}] , 10, 42 ]
returns the correct array of objects, correct LIMITED count and 1
for total_count
.
I need to keep the limit in the middle of the query becuase i want to restrict the amount of optional matches downstream. Is there a way to do this short of running the count with all the filters separately? Thanks