Create ranks (limit) over subgroups in Neo4j


(Airamgabriela17) #1

Hello: I have the following query to retrieve the top 2 papers (in terms of the number of citations) of every proceeding for researchers:

MATCH (p:Paper)-[:SUBMITTED_TO]->(a:Proceeding),(p)-[:CITED_BY]->()
WITH p.paperTitle as Paper, size((p)-[:CITED_BY]->()) as numCitations, 
collect(distinct a.proceedingName) as Proceedings
UNWIND Proceedings AS Proceeding
WITH Paper, numCitations, Proceeding
ORDER BY Paper, numCitations, Proceeding
RETURN Paper, numCitations, collect(Proceeding) as Proceedings
ORDER BY Proceedings, numCitations DESC

The output I get from this is:

Paper      numCitations     Proceedings
Title1     4                Proc1
Title2     3                Proc1
Title3     2                Proc1
Title4     7                Proc2
Title5     5                Proc2
Title6     3                Proc2
Title7     8                Proc3
Title8     4                Proc3
Title9     2                Proc3

What I want to do now is to get only the top 2 of all the papers that appear for each proceeding (three for all of them), but if I use LIMIT 2 at the end of the query, I only get the top 2 of everything (not for each proceeding):

Paper      numCitations     Proceedings
Title7     8                Proc3
Title4     7                Proc2

Note that Proc1 was completely discarded. This is not what I am looking for. Any ideas on how to do this in Neo4j? Thanks!


(Michael Hunger) #2

It should be much simpler, try this.

You want to

  1. sort by citations desc for popular papers
  2. group by proceedings, not by papers and collect both paper and citation into a map
  3. take the first two entries from that list
MATCH (p:Paper)-[:SUBMITTED_TO]->(a:Proceeding) 
WHERE (p)-[:CITED_BY]->()
WITH p, a, size((p)-[:CITED_BY]->()) as numCitations
ORDER BY numCitations desc
RETURN a.proceedingName, collect({paper:paper.paperTitle, citations: numCitations})[0..2]

(Airamgabriela17) #3

Thank you Michael! You saved me :slight_smile: