I have to create a PySpark application with Neo4j integration.
Below I have written some simple code to set things up.
from pyspark.sql import SparkSession
url = "neo4j://localhost:port"
username = "neo4j"
password = "password"
dbname = "neo4j"
spark = (
    SparkSession.builder
    .config("neo4j.url", url)
    .config("neo4j.authentication.basic.username", username)
    .config("neo4j.authentication.basic.password", password)
    .config("neo4j.database", dbname)
    .getOrCreate()
)
query = "MATCH (n) RETURN n LIMIT 10"
neo4j_df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("query", query)
    .load()
)
But I am getting the following error:
Py4JJavaError: An error occurred while calling o245.load.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: org.neo4j.spark.DataSource. Please find packages at https://spark.apache.org/third-party-projects.html.
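My guess is that the org.neo4j.spark.DataSource class cannot be found because the Neo4j Spark connector jar is not on the classpath. Below is a sketch of what I would try: supplying the connector through spark.jars.packages when building the session. The Maven coordinate and version in it are placeholders that I have not verified against my Spark and Scala versions.

from pyspark.sql import SparkSession

url = "neo4j://localhost:port"
username = "neo4j"
password = "password"
dbname = "neo4j"

# Placeholder coordinate -- the artifact and version have to match
# the Scala and Spark versions of the cluster; this one is a guess.
neo4j_connector = "org.neo4j:neo4j-connector-apache-spark_2.12:5.3.0_for_spark_3"

spark = (
    SparkSession.builder
    # Ask Spark to download the connector and put it on the classpath.
    .config("spark.jars.packages", neo4j_connector)
    .config("neo4j.url", url)
    .config("neo4j.authentication.basic.username", username)
    .config("neo4j.authentication.basic.password", password)
    .config("neo4j.database", dbname)
    .getOrCreate()
)

# With the connector available, this read should no longer fail with DATA_SOURCE_NOT_FOUND.
neo4j_df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("query", "MATCH (n) RETURN n LIMIT 10")
    .load()
)

If the job is launched with spark-submit, I believe the same coordinate could instead be passed with --packages. Is this the right way to fix the error?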
Please help me resolve this error.
Thanks :)