Fast count with count store on entity relations

sunny_pelletier · September 23, 2020, 9:48pm

Hello internet!

My team and I are trying to write a query that counts how many Info nodes are held by more than a given number of people. In Cypher, here's our query:

MATCH (info:Info)
WITH info, size((:Person)-[:HAS_INFO]->(info)) as peopleCount
WHERE peopleCount > 3
RETURN count(info)

We currently have around 150,000 Info nodes in the database, and the query profile looks pretty terrible.

The profile shows Neo4j iterating over all the data just to count it, and indeed this query performs poorly (about 500 ms).

We looked for a different approach that would use the count store, but we can't find a way to make this query faster.
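For reference, here is a sketch of what the count store can answer on its own, as far as I understand it. It keeps precomputed totals per node label and per relationship type with at most one end labelled, so queries like the two below are typically planned as NodeCountFromCountStore and RelationshipCountFromCountStore. A per-node condition such as "held by more than 3 people" is not one of those stored totals, which would explain why the original query has to visit every Info node. Run each statement separately and compare the PROFILE output.

```cypher
// Total number of :Info nodes, answerable from the count store.
PROFILE
MATCH (info:Info)
RETURN count(info) AS totalInfo;

// Total number of HAS_INFO relationships starting at a :Person node.
// Only one end carries a label, so this can also come from the count store.
// It is a global total: it cannot tell how many people hold each Info node.
PROFILE
MATCH (:Person)-[:HAS_INFO]->()
RETURN count(*) AS totalHasInfo;
```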

Is there a magic APOC procedure, or anything else, that would let us speed up this query, given that the number of Info nodes will grow over time?
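One rewrite that may help, sketched under an assumption: if every HAS_INFO relationship starts at a :Person node, the :Person label on the far end is redundant. Without it, Neo4j 4.x can typically plan size((info)<-[:HAS_INFO]-()) as a degree lookup (GetDegree in the profile) instead of expanding each relationship and checking the label of the node at the other end. This still scans every Info node, so the cost grows with their number; it only makes each per-node check cheaper.

```cypher
// Assumption: every HAS_INFO relationship starts at a :Person node,
// so dropping the :Person label does not change the result.
// Without a label on the far end, the planner can read the node's
// HAS_INFO degree instead of expanding its relationships.
// Note: the degree counts relationships, so a person with two HAS_INFO
// relationships to the same Info node would be counted twice.
// On Neo4j 5, where size() over a pattern is no longer allowed (as far as
// I know), the equivalent expression is COUNT { (info)<-[:HAS_INFO]-() }.
PROFILE
MATCH (info:Info)
WITH info, size((info)<-[:HAS_INFO]-()) AS peopleCount
WHERE peopleCount > 3
RETURN count(info) AS infoCount
```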

ameyasoft · September 23, 2020, 11:25pm

Try this:

MATCH (p:Person)-[:HAS_INFO]->(i:Info)
WITH id(i) as ID, count(distinct p) as Cnt where Cnt >= 3
RETURN ID as infoID, Cnt as peopleCount ORDER BY peopleCount DESC LIMIT 20

sunny_pelletier · September 23, 2020, 11:45pm

Thank you for your reply. When I apply your suggestion it is indeed faster (around 150 ms), but it still makes me wonder how it will behave with a much larger number of Info nodes (millions of them).

I had to edit your query to get what I want out of it, so here's what I have:

MATCH (p:Person)-[:HAS_INFO]->(info:Info)
WITH id(info) as ID, count(p) as peopleCount
WHERE (peopleCount >= 3) 
RETURN count(ID)

Also, the actual query is a little bigger than that; I tried to simplify the problem by sharing only part of it. If you want the real query, here it is:

MATCH (info:Info)-[:MATCHES]->(pattern:Pattern)-[:PART_OF]->(patternGroup:PatternGroup)
WHERE ($sha256 = [] OR info.sha256 IN $sha256) AND
      ($patternGroups IS NULL OR patternGroup.id IN $patternGroups) AND
      (info.likelihood >= 0.5)
                  
WITH info, pattern, patternGroup, size((:Person)-[:HAS_INFO]->(info)) as peopleCount
WHERE ($minPeopleCount IS NULL OR peopleCount >= $minPeopleCount) AND
      ($maxPeopleCount IS NULL OR peopleCount < $maxPeopleCount)

RETURN count(info)
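The same substitution applied to the full query, again assuming HAS_INFO always starts at a :Person node. Only the peopleCount expression changes; pattern and patternGroup are dropped from the WITH because nothing later uses them, and a WITH without aggregation keeps the same rows, so the count is unaffected.

```cypher
// Assumption: all HAS_INFO relationships come from :Person nodes.
MATCH (info:Info)-[:MATCHES]->(pattern:Pattern)-[:PART_OF]->(patternGroup:PatternGroup)
WHERE ($sha256 = [] OR info.sha256 IN $sha256) AND
      ($patternGroups IS NULL OR patternGroup.id IN $patternGroups) AND
      (info.likelihood >= 0.5)
// Degree lookup instead of expanding (:Person)-[:HAS_INFO]->(info).
WITH info, size((info)<-[:HAS_INFO]-()) AS peopleCount
WHERE ($minPeopleCount IS NULL OR peopleCount >= $minPeopleCount) AND
      ($maxPeopleCount IS NULL OR peopleCount < $maxPeopleCount)
// count(info) counts matched rows, as in the original query: an Info node
// matching several patterns or groups is counted once per matched path.
// Use count(DISTINCT info) if distinct Info nodes are what you need.
RETURN count(info)
```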

Our problem is with these params:

{ sha256: [], patternGroups: null, minPeopleCount: null, maxPeopleCount: null }

We tried your proposed solution on our actual query, and the performance is roughly the same as with size((:Person)-[:HAS_INFO]->(info)).
