What is the current approach to getting a GraphX graph from Neo4j?

mcsoini
Node

(already asked on SO)

I understand that the previous (now deprecated) Neo4j Spark Connector could generate Spark Graphs and GraphFrames through the corresponding methods of the org.neo4j.spark.Neo4j class. With the Neo4j class gone, the only examples I have found for the new approach generate DataFrames and are based on something like:

spark.read.format("org.neo4j.spark.DataSource")
    .option("url", "bolt://localhost:7687")
    .option("query", "...")
    .load()

How do I get GraphX Graph instances directly using the current Neo4j Connector for Apache Spark? Or do I need to combine separate DataFrames for nodes and edges?

1 REPLY

conker84
Graph Voyager

Hi @mcsoini, you can easily transform your DataFrame into an RDD by invoking df.rdd, and then map each row to the vertex or edge type that GraphX expects.

For example, given a graph like (Person)-[:KNOWS]->(Person):

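import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Read all :Person nodes and map each row to (vertex id, (name, surname))
val persons: RDD[(VertexId, (String, String))] = spark.read.format("org.neo4j.spark.DataSource")
    .option("url", "bolt://localhost:7687")
    .option("labels", ":Person")
    .load()
    .rdd
    .map(row => (row.getAs[Long]("<id>"), (row.getAs[String]("name"), row.getAs[String]("surname"))))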

// Read KNOWS relationships; with relationship.nodes.map=false the node ids
// come back as flat <source.id> / <target.id> columns
val knows: RDD[Edge[String]] = spark.read.format("org.neo4j.spark.DataSource")
    .option("url", "bolt://localhost:7687")
    .option("relationship.nodes.map", "false")
    .option("relationship", "KNOWS")
    .option("relationship.source.labels", "Person")
    .option("relationship.target.labels", "Person")
    .load()
    .rdd
    // Row.getAs looks up the exact column name, so no backticks; <rel.type> is a String
    .map(row => Edge(row.getAs[Long]("<source.id>"), row.getAs[Long]("<target.id>"), row.getAs[String]("<rel.type>")))

// and then build the GraphX graph from the vertex and edge RDDs
val graph = Graph(persons, knows)
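
To check the shape of the result without a running Neo4j instance, here is a minimal sketch that builds the same kind of graph from in-memory data and prints its triplets. The object name GraphXSketch, the helper buildSampleGraph, and the sample people are all made up for illustration; the only assumption is that the two RDDs have the same element types as the ones read from Neo4j above.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object GraphXSketch {

  // Hypothetical stand-in data with the same element types as the
  // persons/knows RDDs that would be read through the Neo4j connector.
  def buildSampleGraph(sc: SparkContext): Graph[(String, String), String] = {
    val persons: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
      (1L, ("Alice", "Smith")),
      (2L, ("Bob", "Jones")),
      (3L, ("Carol", "Lee"))
    ))
    val knows: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "KNOWS"),
      Edge(2L, 3L, "KNOWS")
    ))
    Graph(persons, knows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("graphx-sketch")
      .getOrCreate()

    val graph = buildSampleGraph(spark.sparkContext)

    // Each triplet joins an edge with the attributes of both endpoints
    graph.triplets
      .map(t => s"${t.srcAttr._1} -[${t.attr}]-> ${t.dstAttr._1}")
      .collect()
      .foreach(println)

    println(s"vertices=${graph.numVertices}, edges=${graph.numEdges}")
    spark.stop()
  }
}
```

If the counts or triplets look wrong on real data, the first thing to inspect is usually the DataFrame schema (df.printSchema()) to confirm the column names and types before mapping rows.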