Spark UDF calling Neo4j


My app has data on 5 million users, which I load into Neo4j as a graph.

I also process the data in Spark, and for each user I need to query Neo4j.

I did this with a Spark UDF that calls the Neo4j server over HTTP.

It took too long, and I got connection errors.

What is a better way to run 5M queries against Neo4j?

The better way to do 5 million queries is to not do 5 million queries to Neo4j. :)

The better way would be to use something like the Neo4j Spark connector. Use one Cypher query to pull all of the data you need from Neo4j into a single DataFrame, and then use standard Spark SQL to join the resulting DataFrame to the data that you have.

That's one big query pulling 5 million results, which you can then further partition and join in Spark.
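
As a rough sketch of what that could look like in PySpark: this assumes a recent Neo4j Connector for Apache Spark is on the Spark classpath (the format name and option keys below are taken from its documentation and should be checked against your version). The URL, credentials, the `User` label, the `FRIEND` relationship, and the `user_id` column are all made-up placeholders, not anything from your setup.

```python
# Sketch only: one bulk read from Neo4j, then a normal Spark join.
# All names (label, relationship, columns, URL) are hypothetical placeholders.

# Hypothetical query: returns one row per user, instead of one query per user.
USER_QUERY = (
    "MATCH (u:User)-[:FRIEND]-(f:User) "
    "RETURN u.id AS user_id, count(f) AS friend_count"
)


def neo4j_read_options(url, user, password, query):
    """Build the reader options for a single bulk Cypher read."""
    return {
        "url": url,
        "authentication.basic.username": user,
        "authentication.basic.password": password,
        "query": query,
    }


def load_and_join(spark, users_df, options, key="user_id"):
    """Read the Cypher result into a DataFrame and left-join it to users_df."""
    graph_df = (
        spark.read.format("org.neo4j.spark.DataSource")
        .options(**options)
        .load()
    )
    # A left join keeps users that have no matching row in the graph result.
    return users_df.join(graph_df, on=key, how="left")
```

You would call it with your existing session and DataFrame, for example `load_and_join(spark, users_df, neo4j_read_options("bolt://localhost:7687", "neo4j", "secret", USER_QUERY))`. The point is that Neo4j sees one read and Spark handles the per-user matching through the join.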

Thanks for your response.

Now I need to figure out how to craft a query that asks 5M questions without running out of memory :). I will open a new topic if needed.