Removing stop words using a text file


(Prashanth) #1

I loaded a CSV file into Neo4j successfully, and now I want to remove stop words from the data set.
I have a separate stop word list in a text file. I found some example code that removes stop words, but I want to replace its list with my own. How should I proceed? Can we load two data sets in one query?
Thanks in advance!


(Mark Needham) #2

If you had a stop word file that looked like this:

stopWord
the
and
my
they

You could remove those words like this (assuming you put that file in the import directory of your Neo4j installation):

LOAD CSV WITH HEADERS FROM "file:///stopWords.csv" AS row
MATCH (w:Word {name: row.stopWord})
DETACH DELETE w
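To make the mechanics of this answer concrete, here is a small pure-Python model (not Neo4j code) of what the query does: read the file using its header row, take the `stopWord` column, and delete each matching Word node together with every NEXT relationship attached to it. Removing the attached relationships is what DETACH DELETE adds over a plain DELETE, which refuses to delete a node that still has relationships. The in-memory `words` dict and `next_rels` set are hypothetical stand-ins for the graph.

```python
import csv
import io

# Hypothetical stand-in for the graph: Word nodes keyed by their `name`
# property, NEXT relationships stored as (from_name, to_name) pairs.
words = {"the": {"count": 1}, "bank": {"count": 1}, "loan": {"count": 1},
         "and": {"count": 1}, "rates": {"count": 1}}
next_rels = {("the", "bank"), ("bank", "loan"), ("loan", "and"), ("and", "rates")}

# Same layout as stopWords.csv above: a `stopWord` header, then one word per line.
stop_word_file = io.StringIO("stopWord\nthe\nand\nmy\nthey\n")

def remove_stop_words(words, next_rels, stop_word_file):
    """Mimic LOAD CSV WITH HEADERS + MATCH + DETACH DELETE on the model graph."""
    for row in csv.DictReader(stop_word_file):
        name = row["stopWord"]
        if name in words:            # MATCH (w:Word {name: row.stopWord})
            del words[name]          # DELETE the node...
            # ...and DETACH: drop every NEXT relationship that touches it.
            next_rels -= {r for r in next_rels if name in r}
    return words, next_rels

remove_stop_words(words, next_rels, stop_word_file)
print(sorted(words))  # ['bank', 'loan', 'rates']
print(next_rels)      # {('bank', 'loan')}
```

Stop words that never occur in the graph ("my", "they") simply match nothing, just as the MATCH clause produces no rows for them.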

(Prashanth) #4

My Cypher query is:
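// Each line of the text file is split into sentence fields on "."
LOAD CSV FROM "file:///kbv410000.txt" AS row FIELDTERMINATOR "."
WITH row
UNWIND row AS text
// Lowercase each sentence and strip punctuation
WITH reduce(t = toLower(text), delim IN [",", ".", "!", "?", '"', ":", ";", "'", "-"] | replace(t, delim, "")) AS normalized
// Split into words, dropping the empty strings left by leading or repeated spaces
WITH [w IN split(trim(normalized), " ") WHERE w <> ""] AS words
UNWIND range(0, size(words) - 2) AS idx
MERGE (w1:Word {name: words[idx]})
ON CREATE SET w1.count = 1
ON MATCH SET w1.count = w1.count + 1
// Count the second word only when it ends the sentence; otherwise it is counted as w1 on the next row
MERGE (w2:Word {name: words[idx + 1]})
ON CREATE SET w2.count = CASE WHEN idx = size(words) - 2 THEN 1 ELSE 0 END
ON MATCH SET w2.count = w2.count + (CASE WHEN idx = size(words) - 2 THEN 1 ELSE 0 END)
MERGE (w1)-[r:NEXT]->(w2)
ON CREATE SET r.count = 1 ON MATCH SET r.count = r.count + 1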
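Here is a pure-Python sketch of what the import query is meant to compute. It can be useful for checking word and NEXT counts on a small sample before running against the database. It assumes that `csv.reader(..., delimiter=".")` is a fair stand-in for `FIELDTERMINATOR "."` (both treat `"` as the quote character by default), and that each word occurrence should be counted exactly once. `str.split()` with no argument discards empty tokens from repeated spaces, so no word with an empty name is counted. `import_text` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Same punctuation list as the Cypher reduce(...) above.
PUNCTUATION = [",", ".", "!", "?", '"', ":", ";", "'", "-"]

def import_text(text_file):
    """Count word occurrences and adjacent word pairs (NEXT relationships)."""
    word_counts = Counter()
    next_counts = Counter()
    # Like FIELDTERMINATOR ".": each line becomes a list of sentence fields.
    for row in csv.reader(text_file, delimiter="."):
        for sentence in row:  # UNWIND row AS text
            normalized = sentence.lower()
            for delim in PUNCTUATION:
                normalized = normalized.replace(delim, "")
            words = normalized.split()
            if len(words) < 2:
                continue  # range(0, size(words) - 2) is empty: nothing is merged
            word_counts.update(words)
            next_counts.update(zip(words, words[1:]))
    return word_counts, next_counts

words, pairs = import_text(io.StringIO("The bank is open. The bank is closed.\n"))
print(words["bank"], words["the"])  # 2 2
print(pairs[("the", "bank")])       # 2
```

Note that single-word sentences produce no nodes at all, both here and in the Cypher, because the UNWIND over `range(0, size(words) - 2)` yields no rows for them.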

After running the query above, I tried to remove the stop words with this query:

LOAD CSV WITH HEADERS FROM "file:///ThaiStopWords_Banking_V.1.0.0_20180927.txt" AS row
MATCH (w:Word {word: row.stopWord})
DETACH DELETE w

But nothing changed.
Did I do something wrong? Please let me know.
Thanks in advance!
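Two details in the delete query could each explain why nothing changed. First, the import query stores each word in the `name` property, while the delete query matches `{word: row.stopWord}`, so no node qualifies. Matching on `name`, as in the earlier reply, avoids that. Second, `LOAD CSV WITH HEADERS` treats the first line of the file as the header, so `row.stopWord` is only set if that line is literally `stopWord`. Whether the Thai stop word file has such a header is an assumption here. For a plain one-word-per-line file, `LOAD CSV FROM "file:///ThaiStopWords_Banking_V.1.0.0_20180927.txt" AS row` with `MATCH (w:Word {name: row[0]})` reads the first field of each line instead. The pure-Python model below (hypothetical helpers, not Neo4j's API) reproduces both failure modes.

```python
import csv
import io

# Word nodes as the import query creates them: the text is in `name`.
nodes = [{"name": "the", "count": 3}, {"name": "bank", "count": 2}, {"name": "and", "count": 1}]

def read_stop_words(text, column="stopWord"):
    """Mimic LOAD CSV WITH HEADERS: the first line becomes the header row."""
    rows = csv.DictReader(io.StringIO(text))
    # A column that is not in the header comes back as None, like null in Cypher.
    return {row.get(column) for row in rows}

def match_words(nodes, key, stop_words):
    """Mimic MATCH (w:Word {<key>: value}) for each stop word value."""
    # Nodes lacking the property, and null values, never match.
    return [n for n in nodes if n.get(key) is not None and n.get(key) in stop_words]

with_header = read_stop_words("stopWord\nthe\nand\n")
without_header = read_stop_words("the\nand\n")  # hypothetical headerless list

print(len(match_words(nodes, "word", with_header)))     # 0: no node has a `word` property
print(len(match_words(nodes, "name", with_header)))     # 2
print(len(match_words(nodes, "name", without_header)))  # 0: every stopWord value is None
```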


(Mark Needham) #5

Does that one need the field terminator bit as well?
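Whether the stop word query needs `FIELDTERMINATOR` depends only on how each line should be split into fields. As a rough analogy (Python's `csv` module, not Neo4j's own parser), `delimiter` plays the role of `FIELDTERMINATOR`. The hypothetical `load_csv` helper below shows that "." splits the import file into sentence fields, including a leading space and a trailing empty field, while a one-word-per-line list parses identically with either terminator, assuming no stop word contains `,` or `.`.

```python
import csv
import io

def load_csv(text, field_terminator=","):
    """Rough analogue of LOAD CSV FROM ... FIELDTERMINATOR: one list per line."""
    return list(csv.reader(io.StringIO(text), delimiter=field_terminator))

# The import file: "." splits each line into sentence fields.
print(load_csv("I like banks. They are safe.\n", "."))
# [['I like banks', ' They are safe', '']]

# A one-word-per-line stop word list parses the same with either terminator.
print(load_csv("the\nand\n") == load_csv("the\nand\n", "."))  # True
```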