From Python notebook to Neo4j graph via Cypher query

jatinjaitleypro
Node Link

I have a dataframe that I wish to transfer from Python to Neo4j. My dataframe looks like the one below.

[Image: dataframe with the columns text and POS]

I want the words in the text column to be connected via a NEXT relationship, something like below.

I know the Cypher query. My requirement is that the value in the POS column be attached as a property to each word. For example, the node Dog has POS NOUN, so NOUN should be attached as a property to that node, and the NEXT relationship should be maintained as shown above.

How can I write the query in a Python notebook and see the same results in the Neo4j graph? Please assist me with the syntax, as I am pretty new to Neo4j and Cypher.

1 ACCEPTED SOLUTION

load_df will not work for you in this specific use case. I solved your problem as follows. Note that you need to install the APOC library to make it work:

import neointerface
import pandas as pd
import spacy

# Connection as in the neointerface reply in this thread; adjust host and password.
db = neointerface.NeoInterface(host="neo4j://localhost:7687", credentials=("neo4j", "YOUR_NEO4J_PASSWORD"))
nlp = spacy.load("en_core_web_sm")

def sentence_to_df(text):
    # One row per token: the word and its part-of-speech tag.
    rows = [[t.text, t.pos_] for t in nlp(text)]
    return pd.DataFrame(rows, columns=("text", "POS"))

df = sentence_to_df("The wild is dangerous")

# Clean-up: remove existing Word nodes, then index the text property that MERGE matches on.
db.query("MATCH (w:Word) DETACH DELETE w")
db.create_index("Word", "text")

# MERGE keeps one node per distinct text; POS accumulates as a list of distinct tags.
# apoc.coll.pairsMin turns the ordered node list into adjacent pairs, which become NEXT relationships.
q = """
UNWIND $data AS row
MERGE (w:Word {text: row.text})
ON CREATE SET w.POS = [row.POS]
ON MATCH SET w.POS = CASE WHEN row.POS IN w.POS THEN w.POS ELSE w.POS + [row.POS] END
WITH collect(w) AS coll
WITH apoc.coll.pairsMin(coll) AS pairs
UNWIND pairs AS pair
WITH pair[0] AS node1, pair[1] AS node2
MERGE (node1)-[:NEXT]->(node2)
"""

db.query(q, {'data': df.to_dict(orient='records')})

# A second sentence reuses the existing nodes for "The", "is" and "wild".
df_2 = sentence_to_df("The rockstar is wild")
db.query(q, {'data': df_2.to_dict(orient='records')})
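
For readers who do not use neointerface, the same parameterised query can probably be sent with the official neo4j Python driver. This is only a sketch under assumptions, not the answerer's code: the helper names to_records and run_word_query are hypothetical, the URI and credentials are placeholders, and the query text is the one from the answer above, so APOC is still required.

```python
# Same Cypher as in the answer above; $data is a list of {"text": ..., "POS": ...} maps.
WORD_QUERY = """
UNWIND $data AS row
MERGE (w:Word {text: row.text})
ON CREATE SET w.POS = [row.POS]
ON MATCH SET w.POS = CASE WHEN row.POS IN w.POS THEN w.POS ELSE w.POS + [row.POS] END
WITH collect(w) AS coll
WITH apoc.coll.pairsMin(coll) AS pairs
UNWIND pairs AS pair
WITH pair[0] AS node1, pair[1] AS node2
MERGE (node1)-[:NEXT]->(node2)
"""


def to_records(rows, columns=("text", "POS")):
    """Build the same list of dicts that df.to_dict(orient='records') returns."""
    return [dict(zip(columns, row)) for row in rows]


def run_word_query(uri, user, password, records):
    """Send all rows in one call (hypothetical helper; needs a running Neo4j with APOC)."""
    # Imported here so that to_records can be used without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # Auto-commit transaction; consume() waits for the query to finish.
            session.run(WORD_QUERY, {"data": records}).consume()
```

With a dataframe like the one in the answer, a call could look like run_word_query("neo4j://localhost:7687", "neo4j", "YOUR_NEO4J_PASSWORD", df.to_dict(orient="records")). Sending all rows as one $data parameter avoids opening a transaction per row.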

7 REPLIES

jatinjaitleypro
Node Link

Here is what I have done so far.
I have py2neo and neo4j installed on my PC.
I wish to run the Cypher query from my Python notebook and have the changes reflected in the Neo4j graph.
I already know some basics, like:

import pandas as pd
from py2neo import Graph,Node,Relationship
from neo4j import GraphDatabase, basic_auth

# py2neo connects over Bolt; the /browser/ address is only the web UI, not a database endpoint.
graph = Graph("bolt://localhost:7687", auth=("neo4j", "*****"))

# One transaction per dataframe row.
for index, row in df.iterrows():
    tx = graph.begin()
    tx.evaluate('''cypher query goes here''')
    tx.commit()

From the Python notebook, using a dataframe, I want to store the value of the second column (POS) as a property on each word while maintaining the NEXT relationship between the words in the first column, as shown above.

Hi @jatinjaitleypro

How about this?
I just added pos.

WITH split(tolower("His dog eats turkey on Tuesday")," ") AS text,
     split("PRON NOUN VERB PROPN ADP PROPN"," ") AS pos
UNWIND range(0,size(text)-2) AS i
MERGE (w1:Word {name: text[i], pos: pos[i]})
MERGE (w2:Word {name: text[i+1], pos: pos[i+1]})
MERGE (w1)-[:NEXT]->(w2)
RETURN w1, w2

@koji No, this is not what I am looking for. The challenge I am facing is how to perform the same operation from a pandas dataframe in a Jupyter notebook.

paltusplintus
Node Clone

The neointerface package has a load_df method; however, to create NEXT relationships between words you need to load the dataframe with its index column and run an additional query. Try something like:

# pip install neointerface
import neointerface
import pandas as pd

db = neointerface.NeoInterface(host="neo4j://localhost:7687", credentials=("neo4j", "YOUR_NEO4J_PASSWORD"))
df = pd.DataFrame(...)
# reset_index() adds an "index" column, so each Word node stores its row position
db.load_df(df.reset_index(), label="Word", merge=False)
db.create_index("Word", "index")
# link each word to the word at the next row position
db.query("MATCH (w1:Word), (w2:Word) WHERE w2.index = w1.index + 1 MERGE (w1)-[:NEXT]->(w2)")
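
To see what the w2.index = w1.index + 1 condition does, the pairing can be sketched in plain Python with no database involved. The helper name next_pairs is hypothetical and the word lists are sample data. The sketch also illustrates a caveat, assuming each dataframe's index restarts at 0: if a second sentence is loaded the same way, the condition also pairs words across sentences.

```python
def next_pairs(nodes):
    """Pair every node with each node whose index is exactly one higher."""
    return [(a["text"], b["text"])
            for a in nodes
            for b in nodes
            if b["index"] == a["index"] + 1]


# One sentence: each word is linked to the word that follows it.
first = [{"index": i, "text": t}
         for i, t in enumerate(["The", "wild", "is", "dangerous"])]
single_sentence_pairs = next_pairs(first)

# Two sentences whose indexes both start at 0: index values repeat,
# so words of one sentence are also paired with words of the other.
second = [{"index": i, "text": t}
          for i, t in enumerate(["The", "rockstar", "is", "wild"])]
two_sentence_pairs = next_pairs(first + second)

print(single_sentence_pairs)
print(len(two_sentence_pairs))
```

Keeping a running offset or a sentence identifier on each node would be one way to avoid the cross-sentence pairs; which fits best depends on the data.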

@paltusplintus

First of all, thank you for your response. I was able to use neointerface,
but my goal is still not achieved. Here is what I have tried.

Then

Output:

Ideally, nodes such as "The" should have been created once, but they were created twice.
example

and

What I am looking for is that there should be no duplicate text nodes.
The POS values for each word should be stored as a list, an array, or any other collection.

Example: "text" : "wild" (should only be one node)
"POS": ["NOUN", "ADJ"]

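The requested outcome can be stated precisely with a small in-memory sketch in plain Python, with no database involved: one node per distinct text, a list of distinct POS tags per node, and a NEXT edge between adjacent words. The function name merge_sentence is hypothetical, and the POS tags in the sample rows are assumptions for illustration, not actual spaCy output.

```python
def merge_sentence(nodes, edges, rows):
    """Merge one sentence's (text, POS) rows into the node dict and NEXT edge set."""
    for text, pos in rows:
        tags = nodes.setdefault(text, [])  # one node per distinct text
        if pos not in tags:                # keep each POS tag only once
            tags.append(pos)
    words = [text for text, _ in rows]
    edges.update(zip(words, words[1:]))    # adjacent words become NEXT edges
    return nodes, edges


nodes, edges = {}, set()
merge_sentence(nodes, edges, [("The", "DET"), ("wild", "NOUN"),
                              ("is", "AUX"), ("dangerous", "ADJ")])
merge_sentence(nodes, edges, [("The", "DET"), ("rockstar", "NOUN"),
                              ("is", "AUX"), ("wild", "ADJ")])

print(nodes["wild"])   # tags collected for the single "wild" node
print(sorted(edges))   # NEXT edges across both sentences
```

A graph query that reproduces this behaviour needs a match-or-create step on text plus a conditional append to the POS list, rather than a plain insert per dataframe row.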
Thank you so much, the solution works for my use case. I appreciate your help, @paltusplintus.