Deleting nodes/relationships with the Spark connector

What is the recommended way to delete nodes in a Neo4j database using the Spark connector?
Documentation on this topic seems to be lacking. The following code is rejected as an invalid write query by neo4j_connector_apache_spark_2_12_5_1_0_for_spark_3.jar. Guidance on best practices would be welcome.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

empty_df = spark.createDataFrame(
    [], 
    schema=StructType([
        StructField("boilerplate", IntegerType(), nullable=True),
    ])
)

(empty_df
     .write
     .format("org.neo4j.spark.DataSource")
     .option("url", f"bolt://{NEO_URL}:{NEO_PORT}")
     .option("authentication.basic.username", NEO_USER)
     .option("authentication.basic.password", NEO_PWD)
     .mode("overwrite")
     .option("query","""
             :auto
             MATCH (n)
             CALL { 
                WITH n
                DETACH DELETE n
            } IN TRANSACTIONS OF 10000 ROWS;
            """)
     .save())

You should use the Neo4j Java driver directly here instead.
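
For the delete itself, here is a minimal sketch using the official Neo4j Python driver (the neo4j package); the same pattern applies with the Java driver. The connection variables are the ones from your snippet, and the CALL { ... } IN TRANSACTIONS batching is kept as-is:

from neo4j import GraphDatabase

uri = f"bolt://{NEO_URL}:{NEO_PORT}"

with GraphDatabase.driver(uri, auth=(NEO_USER, NEO_PWD)) as driver:
    with driver.session() as session:
        # CALL { ... } IN TRANSACTIONS requires an implicit (auto-commit)
        # transaction, which session.run() provides. Note there is no ":auto"
        # prefix: that is a Neo4j Browser / cypher-shell directive, not Cypher.
        session.run("""
            MATCH (n)
            CALL {
                WITH n
                DETACH DELETE n
            } IN TRANSACTIONS OF 10000 ROWS
        """).consume()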

The Spark connector for Neo4j is useful when you want either to create a DataFrame from Neo4j data or to persist a DataFrame to Neo4j.
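
For instance, a typical read with the connector looks like this (a sketch reusing your connection settings; the "Person" label is illustrative):

people_df = (spark.read
    .format("org.neo4j.spark.DataSource")
    .option("url", f"bolt://{NEO_URL}:{NEO_PORT}")
    .option("authentication.basic.username", NEO_USER)
    .option("authentication.basic.password", NEO_PWD)
    .option("labels", "Person")  # load all nodes with this label into a DataFrame
    .load())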

In your example, the DataFrame plays no meaningful role, so the Spark connector won't really help.
