Spark UDF calling Neo4j

idayan · April 16, 2020, 9:19am

Hi,

My app has data for 5 million users, which I loaded into Neo4j as a graph.

I also processed the data in Spark, and for each user I need to query Neo4j.

I did this with a Spark UDF that calls the Neo4j server over HTTP.

It took too long, and I got connection errors.

What is a better way to run 5M queries against Neo4j?

david_allen · April 16, 2020, 10:28pm

The better way to do 5 million queries is to not do 5 million queries against Neo4j. :)

The better way would be to use something like the Neo4j Spark Connector. Use one Cypher query to pull all of the data you need from Neo4j into a single DataFrame, and then use standard Spark SQL to join the resulting DataFrame to the data that you already have.

That's one big query pulling 5 million results, which you can then further partition and join in Spark.
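As a rough illustration of that approach, here is a minimal PySpark sketch. It assumes the DataSource API of newer Neo4j Spark Connector releases (the `org.neo4j.spark.DataSource` format with the `url`, `authentication.basic.*`, and `query` options), so check the option names against the connector version you actually run. The `User` label, `FRIEND` relationship, `id` property, and `user_id` join key are hypothetical placeholders, not taken from the original data model.

```python
# Sketch only: option names follow the Neo4j Spark Connector DataSource API;
# verify them against the connector version you have installed.

# Hypothetical query: the User label, FRIEND relationship and `id` property
# are placeholders for your own graph model.
EXAMPLE_QUERY = (
    "MATCH (u:User)-[:FRIEND]->(f:User) "
    "RETURN u.id AS user_id, count(f) AS friend_count"
)


def neo4j_read_options(url, user, password, query):
    """Build the reader options for a single bulk Cypher read."""
    return {
        "url": url,
        "authentication.basic.username": user,
        "authentication.basic.password": password,
        "query": query,
    }


def load_from_neo4j(spark, options):
    """Run one Cypher query and return its result as a Spark DataFrame."""
    return (
        spark.read.format("org.neo4j.spark.DataSource")
        .options(**options)
        .load()
    )


def enrich_users(users_df, graph_df, key="user_id"):
    """Left-join the already processed Spark data to the Neo4j result."""
    return users_df.join(graph_df, on=key, how="left")
```

With an existing `SparkSession` named `spark` and a processed DataFrame `users_df`, you would build the options once, call `load_from_neo4j(spark, options)`, and pass the result to `enrich_users`. The join then runs inside Spark instead of as one network call per user.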

idayan · April 20, 2020, 9:28am

Thanks for your response.

Now I need to figure out how to craft a query that asks 5M questions without running out of memory :). I will open a new topic if needed.
