
Join the community at Nodes 2022, our free virtual event on November 16 - 17.

Is it Possible to perform POS Tagging in sentence nodes in neo4j graph

jatinjaitleypro

Hi all, I am new to the graph world. I am trying to generate a graph dynamically, using spaCy to do the tokenization, and to attach the part-of-speech (POS) tag of each word as a property on each node. What is the best way to approach this kind of problem?

Suppose I have two sentences that I have loaded using the code below:

// Sentence 1: link each word to the word that follows it
WITH split(toLower("His dog eats turkey on Tuesday"), " ") AS text
UNWIND range(0, size(text) - 2) AS i
MERGE (w1:Word {name: text[i]})
MERGE (w2:Word {name: text[i+1]})
MERGE (w1)-[:NEXT]->(w2)
RETURN w1, w2;

// Sentence 2: run as a separate statement
WITH split(toLower("My cat eats fish on Saturdays"), " ") AS text
UNWIND range(0, size(text) - 2) AS i
MERGE (w1:Word {name: text[i]})
MERGE (w2:Word {name: text[i+1]})
MERGE (w1)-[:NEXT]->(w2)
RETURN w1, w2;

Original question on DS exchange: nlp - In Neo4j is it possible to Dynamically generate the graph given spacy to do the tokenization a...

10 Replies

Hello @jatinjaitleypro and welcome to the Neo4j Community!

I am not sure how you define POS for a particular node?

The POS is based upon the sentence.

I am not familiar with spacy and how it defines POS.

Elaine

Elaine, thanks for getting back to me.

Basically, I want to add part-of-speech tags as properties to a node in Neo4j.

spaCy example: Part-Of-Speech (POS) Tagging in Natural Language Processing using spaCy | Asquero

I want to attach these tags as node properties in Neo4j. Let me know if you need more clarification. Any help or leads would be highly appreciated.

andy_hegedus

Hi,

My workflow with spaCy and Neo4j is to pass text from a node (such as an abstract on a document node) to spaCy. I run spaCy to get features of interest such as noun chunks or words. I then create word nodes and attach relationships to the originating document. The challenge for your use case is that the POS of a given word can take very different values. For example, this is an issue I have when multiple documents share the same word nodes:
"A process to etch a wafer.."
"A process chamber to etch a wafer.."
"Process the wafer according.."
The POS of "process" is noun, adjective, and verb respectively, so attaching it to the node "process" is problematic. That leaves the relationship, but since one relationship connects two nodes, you will need a convention for where to put the value.
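To make the ambiguity concrete, here is a minimal Python sketch that groups the observed tags by word. The tags are hand-assigned to follow the description above rather than produced by spaCy, and `tags_by_word` is a hypothetical helper name.

```python
from collections import defaultdict

# Hand-tagged (word, POS) pairs for the three example sentences.
# Assumption: tags follow the description in this post, not real spaCy output.
tagged_sentences = [
    [("a", "DET"), ("process", "NOUN"), ("to", "PART"),
     ("etch", "VERB"), ("a", "DET"), ("wafer", "NOUN")],
    [("a", "DET"), ("process", "ADJ"), ("chamber", "NOUN"), ("to", "PART"),
     ("etch", "VERB"), ("a", "DET"), ("wafer", "NOUN")],
    [("process", "VERB"), ("the", "DET"), ("wafer", "NOUN"), ("according", "VERB")],
]

def tags_by_word(sentences):
    """Collect every POS tag observed for each lowercase word."""
    seen = defaultdict(set)
    for sentence in sentences:
        for word, pos in sentence:
            seen[word.lower()].add(pos)
    return seen

tags = tags_by_word(tagged_sentences)
print(sorted(tags["process"]))  # ['ADJ', 'NOUN', 'VERB']
```

A single `pos` property on the "process" node could hold only one of these three values, which is why the tag has to live somewhere that is specific to each occurrence.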

Perhaps you can elaborate on what you need to do with the value and on your workflow.

Andy

Hi Andy,

You have a good point, but my use case is different: I want to perform concordance analysis.

https://orange3-text.readthedocs.io/en/latest/widgets/concordance.html

Concordance finds the queried word in a text and displays the context in which this word is used.
The idea is to implement it through a graph, because graph traversal would make this easy and effective.
So I can query a word and see in what context that word has been used. Since I am still very new to graphs, I am not sure what the best practice is for this exercise. But my question remains the same.

Any help or pseudocode would be highly appreciated. Thanks a million.
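As a rough illustration of what a concordance lookup over the word graph would return, the sketch below uses plain Python dictionaries as a stand-in for the (:Word)-[:NEXT]->(:Word) structure from the question. It is not Neo4j code, and `concordance` is a hypothetical helper that only looks one hop in each direction.

```python
from collections import defaultdict

sentences = [
    "His dog eats turkey on Tuesday",
    "My cat eats fish on Saturdays",
]

# In-memory stand-in for the NEXT relationships between merged word nodes.
next_words = defaultdict(set)  # word -> words that follow it
prev_words = defaultdict(set)  # word -> words that precede it
for sentence in sentences:
    tokens = sentence.lower().split(" ")
    for left, right in zip(tokens, tokens[1:]):
        next_words[left].add(right)
        prev_words[right].add(left)

def concordance(word):
    """Return the one-hop left and right context of a word, each sorted."""
    word = word.lower()
    return sorted(prev_words[word]), sorted(next_words[word])

print(concordance("eats"))  # (['cat', 'dog'], ['fish', 'turkey'])
```

Because the word nodes are merged by name, the result does not say that "dog" goes with "turkey" and "cat" with "fish". A wider context window would need something extra, such as a sentence identifier stored on each relationship.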

Hi,

You can put the POS tag into the relationship that connects the words. For example, if you are looking at "doctor" as a word node, and the word that follows it is connected through the relationship "next", you could put the POS into that relationship. Since the relationship touches two words, you might want two properties on "next", such as "POS_from" and "POS_to". So in spaCy you would capture the POS tags of the words, then create the relationship from the Python client and set the properties on that relationship.
Andy

Thanks @andy.hegedus for your response. Could you please share the pseudocode if possible? I am still not able to get my head around it, as I am new to Neo4j. What I have done in Python NLP is below; how do I now get this into a Neo4j graph?

Hi,

Your original structure had word nodes connected by a relationship, "next".
Since the "next" relationship has a direction, I would suggest that you attach the POS values to the specific relationships. To that end, within Python I would create a data table with the following columns (setting all the words to lowercase):
Word1,Word2,POS1,POS2
In your example:
his,dog,PRON,NOUN
dog,eats,NOUN,VERB
eats,turkey,VERB,PROPN
turkey,on,PROPN,ADP
on,tuesday,ADP,PROPN
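One way to produce such a table in Python is sketched below. The (word, POS) pairs are hard-coded to match the example rows, the column names are chosen to match the `row.POS1` and `row.POS2` references used in the Cypher, and `to_pair_rows` is a hypothetical helper name. With spaCy, the pairs would typically come from something like `[(t.text.lower(), t.pos_) for t in nlp(sentence)]`.

```python
import csv
import io

# (word, POS) pairs for the first example sentence.
# Assumption: hard-coded here so the sketch runs without spaCy installed.
tagged = [
    ("his", "PRON"), ("dog", "NOUN"), ("eats", "VERB"),
    ("turkey", "PROPN"), ("on", "ADP"), ("tuesday", "PROPN"),
]

def to_pair_rows(tokens):
    """Pair each token with its successor: one row per NEXT relationship."""
    return [
        {"Word1": w1, "Word2": w2, "POS1": p1, "POS2": p2}
        for (w1, p1), (w2, p2) in zip(tokens, tokens[1:])
    ]

rows = to_pair_rows(tagged)

# Write the rows as CSV text; an in-memory buffer stands in for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Word1", "Word2", "POS1", "POS2"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```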

I would then create the word nodes, with term as the unique property.
Assuming you are going to bring the table in through a CSV file (I find it faster in Python to create the CSV and then pass the Cypher commands, as opposed to going line by line in Python):
// The file name below is a placeholder; point it at your own CSV.
LOAD CSV WITH HEADERS FROM 'file:///word_pairs.csv' AS row
MERGE (w1:word {term: row.Word1})
MERGE (w2:word {term: row.Word2})
MERGE (w1)-[r:Next]->(w2)
SET r.start = row.POS1
SET r.end = row.POS2

Then the words are connected and you have the POS in the relation properties.
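If you would rather skip the CSV file, the same rows can be sent from Python as a single query parameter. The sketch below assumes the official `neo4j` Python driver and a reachable server; the connection details in the example call are placeholders, `load_rows` is a hypothetical helper name, and the labels and property names simply mirror the statements above.

```python
# Cypher that receives the whole table as one $rows parameter.
# Assumption: labels and property names mirror the statements in this post.
QUERY = """
UNWIND $rows AS row
MERGE (w1:word {term: row.Word1})
MERGE (w2:word {term: row.Word2})
MERGE (w1)-[r:Next]->(w2)
SET r.start = row.POS1, r.end = row.POS2
"""

rows = [
    {"Word1": "his", "Word2": "dog", "POS1": "PRON", "POS2": "NOUN"},
    {"Word1": "dog", "Word2": "eats", "POS1": "NOUN", "POS2": "VERB"},
]

def load_rows(uri, user, password, rows):
    """Send all rows in one query; needs the neo4j package and a running server."""
    from neo4j import GraphDatabase  # imported lazily so the sketch loads without it

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # Parameters are passed as a dict mapping names to values.
            session.run(QUERY, {"rows": rows}).consume()

# Example call (placeholder connection details):
# load_rows("bolt://localhost:7687", "neo4j", "password", rows)
```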
Andy

Hi Andy, thanks for providing clarity. However, I tried doing the exercise using a pandas DataFrame, and it doesn't seem to work for me. I am getting the error 'ValueError: dictionary update sequence element #0 has length 3; 2 is required'. Maybe I am not passing the parameters correctly.
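Without the failing code the cause cannot be confirmed, but this message is what Python raises when something that expects a dict is given a sequence of 3-item rows instead. A minimal sketch of that situation and one way around it follows; the column names are hypothetical and the tuples stand in for DataFrame rows.

```python
# Rows as plain tuples, e.g. what iterating over DataFrame rows might give.
rows_as_tuples = [("his", "dog", "PRON"), ("dog", "eats", "NOUN")]

try:
    dict(rows_as_tuples)  # dict() expects (key, value) pairs, not 3-tuples
except ValueError as err:
    message = str(err)
    print(message)

# Wrapping the rows in one named parameter gives the query a proper dict.
columns = ["Word1", "Word2", "POS1"]
params = {"rows": [dict(zip(columns, row)) for row in rows_as_tuples]}
print(params["rows"][0])
```

With pandas, `df.to_dict("records")` produces the same list-of-dicts shape, which can then be passed as the value of a single `rows` parameter.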

Hello Andy,

I suggest you post your question in the NLP discussion area.

Have you explored Hume by GraphAware?

Elaine

Hi Elaine,

Thank you for your response. I am looking for an open-source solution, and I am not sure whether Hume by GraphAware is open source.
