- Neo4j 4.4
- Spark (PySpark) 3.3.1
I have a Cypher query that works well in the Neo4j Browser: it selects two nodes (ID 188 and ID 189) and creates a new edge between them with the property type='test':
match (n:entity) where id(n)=188 with n
match (m:entity) where id(m)=189 with m, n
create (n)-[r:relation {type:'test'}]->(m)
Now I am trying to do the same through PySpark. First, I read these two nodes using the 'query' option:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
result = spark.read \
    .format("org.neo4j.spark.DataSource") \
    .option("url", "bolt://localhost:7687") \
    .option("query", "match (n:entity) where id(n)=188 with n match (m:entity) where id(m)=189 return m, n") \
    .load().toDF('m', 'n')
This returns the two nodes as expected:
+---------------------+---------------------+
| m| n|
+---------------------+---------------------+
|{189, [entity], G...|{188, [entity], G...|
+---------------------+---------------------+
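Each node column comes back as a struct; assuming the connector's documented node schema (an '<id>' field holding the internal node id, a '<labels>' field, then the properties), I can pull the internal ids out into plain columns (the src/dst names are just mine):

from pyspark.sql import functions as F

# extract the internal Neo4j ids from the node structs
# ('<id>' is the connector's field name for the internal node id)
ids = result.select(
    F.col("n").getField("<id>").alias("src"),
    F.col("m").getField("<id>").alias("dst"),
)
ids.show()  # expect one row with src=188, dst=189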
Then I tried to use m and n to write the new edge, again through the 'query' option:
result.write \
    .format("org.neo4j.spark.DataSource") \
    .mode("Overwrite") \
    .option("url", "bolt://localhost:7687") \
    .option("query", "create (n)-[r:relation {type:'test'}]->(m)") \
    .save()
However, this script creates two brand-new nodes rather than connecting the existing m and n: since n and m are unbound in the write query, CREATE makes fresh anonymous nodes for every row.
How can I pass m and n from the read DataFrame into the write query so the edge is created between the existing nodes?
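My understanding from the connector docs is that a write-time query is executed once per incoming row, with the row exposed as event (the connector prepends UNWIND $events AS event). If that is right, the fix would be to re-match the nodes by id from the row data instead of creating unbound n and m. Here is a sketch using the hypothetical src/dst columns from above, though I'm not sure this is the intended approach:

# re-match the existing nodes by internal id, then create the edge;
# 'event.src' / 'event.dst' refer to the DataFrame columns, row by row
ids.write \
    .format("org.neo4j.spark.DataSource") \
    .mode("Overwrite") \
    .option("url", "bolt://localhost:7687") \
    .option("query",
            "match (n:entity) where id(n) = event.src "
            "match (m:entity) where id(m) = event.dst "
            "create (n)-[r:relation {type:'test'}]->(m)") \
    .save()

Is this the right way to bind the existing nodes, or is there a more idiomatic way to pass them through?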