Count store - Where clause workaround

dylan_stouls · April 12, 2022, 11:48am

Hi,
I took a look at the count store documentation and, as I understand it, it's the best way to count a high volume of nodes/relationships, but there is a limitation on the WHERE clause.

It mentioned that workarounds would be discussed at the end of the topic, but I didn't find any answers to my questions, so I'm posting here to ask for your advice.

Let's suppose we have Movies with a unique property "name" and billions of Persons who can like a movie.
If I want to count all the Persons who like matrix1, what's the best approach?

MATCH (:Person)-[r:LIKE]->(:Movie {name : 'matrix1'})
RETURN count(r) as count

I was thinking about 2 workarounds to get a fast like count:

Use the property as a label (which should be used only once).

MATCH (:Person)-[r:LIKE]->(:Matrix1)
RETURN count(r) as count

Store the count in a property. The idea is to increment a likeCounter property on the node each time a relationship is created.
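A minimal sketch of what that second workaround could look like, assuming persons are identified by a hypothetical `id` property and that `$personId` and `$movieName` are query parameters supplied by the application (neither appears in the original model):

```cypher
// Hypothetical parameters: $personId and $movieName; Person.id is an assumption
MATCH (p:Person {id: $personId}), (m:Movie {name: $movieName})
MERGE (p)-[r:LIKE]->(m)
// Only bump the counter when the relationship did not exist before
ON CREATE SET m.likeCounter = coalesce(m.likeCounter, 0) + 1
```

Any query that deletes a :LIKE relationship would need a matching decrement, otherwise the counter drifts away from the real count.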

Do you think one of these solutions works better than the other? Do you have a better idea? I would be glad to know your opinion.

dana_canzano · April 12, 2022, 12:22pm

@dylan_stouls

It's not clear what version of Neo4j you are using, but at least with Neo4j 4.4.3 (and presumably others), if you run

profile MATCH (:Person)-[r:LIKE]->(:Movie {name : 'matrix1'})
RETURN count(r) as count

you will see that one of the blocks is

Expand(All)@neo4j
anon_0, r, anon_1
(anon_0)-[r:LIKE]->(anon_1)

which indicates it's going to expand all relationships of type :LIKE. If you have billions of :Person nodes and each :LIKEs 5 :Movies, this could be quite expensive.

However if you change the query to

profile
MATCH (m:Movie {name : 'matrix1'})
RETURN size(  (m)-[:LIKE]->() )

this will use the countStore: the query simply looks for the node(s) with a name of 'matrix1', then looks at the metadata of those node(s) and asks for the # of outgoing :LIKE relationships. In this case we do not need to iterate over the N :LIKE relationships and count them 1 by 1.
Note the countStore is called upon since the countStore holds the following details:

# of Nodes per Label
# of Nodes per Label and per relationship type and per direction.

The countStore does not include details for

# of Nodes per Label and per relationship type and per direction 'and to destination node label type'.
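As a rough illustration of the distinction above, these are the query shapes I would expect to be answered from the countStore versus the one that is not. This is a sketch, not a guarantee for every version; running each with PROFILE and looking for a NodeCountFromCountStore or RelationshipCountFromCountStore operator is the way to confirm it.

```cypher
// Node count for a single label
MATCH (n:Person) RETURN count(n);

// Relationship count for a single type
MATCH ()-[r:LIKE]->() RETURN count(r);

// Relationship count for a type with a label on one end only
MATCH (:Person)-[r:LIKE]->() RETURN count(r);

// Labels on both ends: expected to fall back to expanding relationships
MATCH (:Person)-[r:LIKE]->(:Movie) RETURN count(r);
```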

As such, if your model is such that a :LIKE relationship type is used to join a :Movie to a :Person but it is also used to join a :Movie to a :SocialMediaPost, then the query of

MATCH (m:Movie {name : 'matrix1'})
RETURN size(  (m)-[:LIKE]->() )

would report the number of :LIKE relationships in aggregate for both :Person and :SocialMediaPost nodes.

Further

MATCH (m:Movie {name : 'matrix1'})
RETURN size(  (m)-[:LIKE]->(:Person) )

would negate the usage of the countStore, and thus we would need to iterate over all :Movie to :Person :LIKE relationships and count them 1 by 1.
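For reference, in the model from the original question the :LIKE relationship points from :Person to :Movie, so on the movie node it is incoming. A sketch of the same lookup with that direction, assuming the Neo4j 4.4 syntax used above (the `likeCount` alias is just an illustrative name):

```cypher
// Assumes (:Person)-[:LIKE]->(:Movie), so :LIKE is incoming on the movie node
PROFILE
MATCH (m:Movie {name: 'matrix1'})
RETURN size((m)<-[:LIKE]-()) AS likeCount
```

On Neo4j 5, where a pattern inside size() is no longer accepted, `COUNT { (m)<-[:LIKE]-() }` should be the equivalent form. Either way, the PROFILE output is what confirms that no Expand(All) step remains.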

glilienfield · April 12, 2022, 12:24pm

The 'count store' concept is new to me. From the link you provided, it looks like neo4j is tracking counts of basic entities, such as nodes and relationships. Because of that, getting counts of these basic entities is a lookup instead of a calculation. It is applicable for a small set of scenarios, basically querying for the count of nodes with/without a single label and relationships with/without a type. The count cannot include any 'where' conditions, as that then restricts the nodes, and the count store is tracking all the nodes.

To answer your question, 'count store' does not seem applicable to your query, where you are restricting your set of nodes to a specific set with a 'where' clause.

I personally don't like the approach of storing 'attribute like' information as a label, so I don't favor your approach of using the label :Matrix1 to identify matrix nodes. You can end up having a very large set of labels to track each movie.

I also don't like the concept of tracking aggregations as you suggest. I feel this is prone to error, as you will have to make sure that every path that adds or removes a node updates this count. It could be fairly complicated if you are tracking specific cypher patterns. How will you ensure that every time a change occurs that matches the pattern, the aggregation will be updated? What about people just doing ad hoc stuff in Neo4j Browser?

I feel the best approach is to let neo4j calculate it dynamically when you need it. That way you ensure the value is correct. The approach I would recommend is the pattern match you defined. I would add an index on the name property for Movie nodes to improve the lookup of the anchor node. I would also remove the :Person label if only a person can like a movie, as cypher will then not have to traverse across the relationship to verify the node's label.
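A sketch of that recommendation, assuming Neo4j 4.4 or later syntax; the index name `movie_name` is an arbitrary choice for illustration:

```cypher
// Index on Movie.name to speed up finding the anchor node
CREATE INDEX movie_name IF NOT EXISTS
FOR (m:Movie) ON (m.name);

// Same pattern as the original query, without the :Person label
MATCH ()-[r:LIKE]->(:Movie {name: 'matrix1'})
RETURN count(r) AS count;
```

Dropping the label is only safe if nothing other than a :Person can have a :LIKE relationship to a :Movie.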

dylan_stouls · April 12, 2022, 4:18pm

Thank you @dana_canzano, your answer is exactly what I'm looking for.
I guess there is a typo in the relationship direction, but I got the point.

@glilienfield thank you for sharing your point of view on my workarounds. I guess you are right, these workarounds could lead to other issues.
