Newbie question: How to aggregate and count identical query results?

arunas.kraujutis · September 8, 2018, 6:02pm

I am trying to do some text analysis in Neo4j and want to write a query that sorts results by how often they occur, in descending order. My data is structured as (Word)-[:NEXT]->(Word)-[:NEXT]->(Word) and so on. I want to find the most popular 3-word combinations, 4-word combinations, etc. I tried this, but it always gives a count of one for every word combination:

MATCH p = (w1:Word)-[r:NEXT]->(w2:Word)-[r2:NEXT]->(w3:Word) 
WITH [w1.name,w2.name,w3.name] AS word_pair 
RETURN COUNT(word_pair) as frequency, word_pair 
ORDER BY frequency DESC LIMIT 50

michael.hunger · September 9, 2018, 12:04am

I think it's your data

This should actually work if you have the w1-w2-w3 pattern appearing in your graph more than once.

Can you share your data or an example?

arunas.kraujutis · September 9, 2018, 11:16am

This is how I loaded the data, so there is only one node for a unique word, but the relationships repeat. I thought there was a way to take an output of a query and count how many times it repeats within the dataset.

WITH split(toLower("My cat eats fish on Saturday")," ") AS text
UNWIND range(0,size(text)-2) AS i
MERGE (w1:Word {name: text[i]})
ON CREATE SET w1.count = 1 ON MATCH SET w1.count=w1.count+1
MERGE (w2:Word {name: text[i+1]})
ON CREATE SET w2.count = 1 ON MATCH SET w2.count=w2.count+1
MERGE (w1)-[r:NEXT]->(w2)
ON CREATE SET r.count = 1 ON MATCH SET r.count=r.count+1
RETURN w1,r, w2

Second statement:

WITH split(toLower("My cat eats cat food on Mondays")," ") AS text
UNWIND range(0,size(text)-2) AS i
MERGE (w1:Word {name: text[i]})
ON CREATE SET w1.count = 1 ON MATCH SET w1.count=w1.count+1
MERGE (w2:Word {name: text[i+1]})
ON CREATE SET w2.count = 1 ON MATCH SET w2.count=w2.count+1
MERGE (w1)-[r:NEXT]->(w2)
ON CREATE SET r.count = 1 ON MATCH SET r.count=r.count+1

michael.hunger · September 9, 2018, 3:38pm

You're both right and wrong.

If there were duplicate nodes or paths in the graph, your query would work.
But your import statements use MERGE, which makes sure there are no duplicate nodes or relationships in the graph, so each word sequence matches exactly once.
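
A quick way to check this, as a sketch rather than part of the original reply: list each stored NEXT relationship together with its counter. The aliases first_word, second_word and times_seen are illustrative names.

```cypher
// Sketch: each word pair is stored once; repeats only show up in r.count.
MATCH (w1:Word)-[r:NEXT]->(w2:Word)
RETURN w1.name AS first_word, w2.name AS second_word, r.count AS times_seen
ORDER BY times_seen DESC
```

Assuming the two import statements above were run against an empty graph, the pairs my -> cat and cat -> eats should show a count of 2 and every other pair a count of 1.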

So what you need to do is take either the word or the relationship frequencies into account and sum them up.

For example:

MATCH p = (w1:Word)-[r:NEXT]->(w2:Word)-[r2:NEXT]->(w3:Word)
RETURN [w1.name,w2.name,w3.name] AS word_pair,
sum(r.count + r2.count) as frequency
ORDER BY frequency DESC LIMIT 50

If you want, you can additionally sum up the counts of the words, or use a proper formula to compute a relevance score.
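
As one possible alternative, not from the original reply: adding r.count and r2.count overstates how often a sequence occurs, because each occurrence is counted once per link. Taking the smaller of the two counts gives an upper bound instead. It is still only a bound, since the merged graph also contains paths that never appeared in any sentence. The alias max_frequency is an illustrative name.

```cypher
// Sketch only: cap each three-word path by its rarer link.
// max_frequency is an upper bound, not an exact occurrence count.
MATCH (w1:Word)-[r:NEXT]->(w2:Word)-[r2:NEXT]->(w3:Word)
WITH [w1.name, w2.name, w3.name] AS words,
     CASE WHEN r.count < r2.count THEN r.count ELSE r2.count END AS max_frequency
RETURN words, max_frequency
ORDER BY max_frequency DESC LIMIT 50
```

Assuming the two sentences above were loaded into an empty graph, "my cat eats" gets 2 here and 4 from the summed version. A path such as "fish on mondays" still appears with 1 even though no sentence contains it, so an exact count would need the occurrences stored in the graph in some other way.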

arunas.kraujutis · September 11, 2018, 9:28am

Thanks, that's really helpful!
