I have two groups of nodes in Neo4j. The Mail nodes have this form:
{
time : year/month/day,
content : Hey Anna Sorry I took awhile getting ...
}
The Word nodes have this form:
{
word : ***
}
I want to count how many times the word of each Word node appears in the content of the Mail nodes. How can I do this? There aren't any relationships between the nodes. For example, if I have
{
word : cat
}
{
time : year/month/day,
content : the cat is on the table
},
{
time : year/month/day,
content : the cat is on the floor
},
{
time : year/month/day,
content : the cat is somewhere over the rainbow
}
then a query like this should return 3 for the word cat:
MATCH (n:Mail)
WHERE n.time > '2001/01/01' AND n.time < '2001/02/01'
MATCH (w:Word)
WHERE ANY(word IN split(n.content, ' ') WHERE word = w.word)
// count(n) groups by w.word: the number of mails containing each word
RETURN w.word, count(n) AS mailCount
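To make the intended result concrete, here is a plain-Python model (not Neo4j code) of what the query above computes. For each word, it counts the mails inside the time window whose space-separated tokens include that word. The dicts stand in for the example nodes, and the concrete dates are hypothetical, since the question only shows year/month/day.

```python
from collections import Counter

# Hypothetical stand-ins for the Mail and Word nodes in the question;
# the concrete dates are invented for illustration.
mails = [
    {"time": "2001/01/10", "content": "the cat is on the table"},
    {"time": "2001/01/15", "content": "the cat is on the floor"},
    {"time": "2001/01/20", "content": "the cat is somewhere over the rainbow"},
]
words = [{"word": "cat"}]


def mails_per_word(mails, words, start, end):
    """Count, per word, the mails in (start, end) whose tokens include it.

    Models n.time > start AND n.time < end, then
    ANY(word IN split(n.content, ' ') WHERE word = w.word),
    then count(n) grouped by w.word.
    """
    counts = Counter()
    for w in words:
        for m in mails:
            # String comparison only works because dates are zero-padded yyyy/mm/dd
            if not (start < m["time"] < end):
                continue
            # Whole-token equality, like the ANY(...) predicate
            if w["word"] in m["content"].split(" "):
                counts[w["word"]] += 1
    return dict(counts)


print(mails_per_word(mails, words, "2001/01/01", "2001/02/01"))  # {'cat': 3}
```

As with the Cypher MATCH, a word that matches no mail simply does not appear in the result rather than showing up with a count of 0.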
I think what @accounts means is: do you want to compute a word count per mail, or a count of the mails in which the specified word was found? The query would need to be adjusted depending on which one you want.
From your original question, I believe you want to compute the number of occurrences of a Word per Mail. Is that right?
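The two readings give different numbers for the same data. A minimal plain-Python sketch (modelling the logic, not Neo4j), using three of the mail texts from the example below and splitting on single spaces as `split(n.content, ' ')` does:

```python
def occurrences_per_mail(contents, word):
    """Reading 1: how many times `word` appears in each mail."""
    return [content.split(" ").count(word) for content in contents]


def mails_containing(contents, word):
    """Reading 2: in how many mails `word` appears at least once."""
    return sum(1 for content in contents if word in content.split(" "))


contents = [
    "this is funny message",
    "this is not so funny message, but this is a funny message",
    "the message was lost in translation",
]
print(occurrences_per_mail(contents, "funny"))  # [1, 2, 0]
print(mails_containing(contents, "funny"))      # 2
```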
create (:Mail{content:'this is a word count mail message'})
create (:Mail{content:'this is not a word count mail message'})
create (:Mail{content:'this is funny message'})
create (:Mail{content:'this is not so funny message'})
create (:Mail{content:'the message was lost in translation'})
create (:Mail{content:'this is a secret message'})
create (:Mail{content:'this is a not so secret message'})
create (:Mail{content:'this is a not so secret message, but this is a secret message'})
create (:Mail{content:'this is not so funny message, but this is a funny message'})
create (:Word{word:'funny'});
match (m:Mail)
match (w:Word)
where m.content contains w.word
merge (w)-[o:OCCURS_IN]->(m)
// store how many times the word occurs in the mail (substring match)
set o.count = size(apoc.text.indexesOf(m.content, w.word, 0, -1));
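Note that both `contains` and `apoc.text.indexesOf` match substrings, not whole words. The plain-Python sketch below approximates that behaviour with `str.find`. It is an assumption-based model, not the APOC implementation, and it counts non-overlapping matches. For "funny" the count is as expected, but "is" is also found inside "this":

```python
def indexes_of(text, lookup):
    """Return the start index of every non-overlapping `lookup` in `text`.

    Rough model of apoc.text.indexesOf(text, lookup, 0, -1): plain
    substring search with no notion of word boundaries.
    """
    if not lookup:
        return []  # avoid an endless loop on an empty search string
    indexes = []
    i = text.find(lookup)
    while i != -1:
        indexes.append(i)
        i = text.find(lookup, i + len(lookup))
    return indexes


mail = "this is not so funny message, but this is a funny message"
print(len(indexes_of(mail, "funny")))               # 2
print(indexes_of("this is funny message", "is"))    # [2, 5]: "is" inside "this" too
print("this is funny message".split(" ").count("is"))  # 1 as a whole word
```

If only whole words should count, splitting into tokens (as in the question's `split(n.content, ' ')`) avoids the "this"/"is" problem, at the cost of treating "message," and "message" as different tokens.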
match (fun:Word{word:'funny'})-[o:OCCURS_IN]->(m:Mail)
return fun, o, m
Try this:
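// collect every distinct search word into one list
MATCH (w:Word)
with collect(distinct w.word) as w1
// split each mail into space-separated tokens
MATCH (a:Mail)
with split(a.content, " ") as s1, w1, a
// remove the mail's tokens from the word list; what remains are the words NOT found
with a, s1, w1, apoc.coll.removeAll(w1, s1) as rmvd
// wordCount = tokens in the mail, matchedWords = distinct search words found in it
return a.content, size(s1) as wordCount, (size(w1) - size(rmvd)) as matchedWords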
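A plain-Python model of this query for a single mail can make the result easier to read. It assumes `apoc.coll.removeAll(first, second)` returns `first` with every element of `second` removed. Under that assumption, `matchedWords` is the number of distinct search words present in the mail, not the number of occurrences. The word list here is hypothetical.

```python
def removeall_counts(content, words):
    """Model of the collect/split/apoc.coll.removeAll query for one mail.

    Returns (wordCount, matchedWords). Assumes removeAll(w1, s1) keeps the
    elements of w1 that are not in s1.
    """
    w1 = sorted(set(words))                 # collect(distinct w.word)
    s1 = content.split(" ")                 # split(a.content, " ")
    rmvd = [w for w in w1 if w not in s1]   # apoc.coll.removeAll(w1, s1)
    return len(s1), len(w1) - len(rmvd)     # wordCount, matchedWords


# Hypothetical Word nodes: secret, message, funny
search = ["secret", "message", "funny"]
print(removeall_counts("this is a secret message", search))  # (5, 2)
print(removeall_counts(
    "this is a not so secret message, but this is a secret message", search))  # (13, 2)
```

In the second mail "secret" occurs twice, yet `matchedWords` is still 2, because the query checks presence per word. For per-word occurrence counts, the OCCURS_IN approach with `o.count` is the better fit.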