What is the current approach to getting a Graphx graph from Neo4j?

(already asked on SO)

I understand that the previous (deprecated) Neo4j Spark Connector allowed for the generation of GraphX Graphs and GraphFrames using the corresponding methods of the org.neo4j.spark.Neo4j class. With the Neo4j class gone, the only examples I found using the new approach generate DataFrames with something like:

    spark.read.format("org.neo4j.spark.DataSource")
        .option("url", "bolt://localhost:7687")
        .option("query", "...")
        .load()

How do I get Graphx Graph instances directly using the current Neo4j Connector for Apache Spark? Or would I need to combine separate DataFrames with edges and nodes?

Hi @mcsoini, you can easily transform your DataFrame into an RDD by invoking `df.rdd`.

E.g., given a graph like this: (Person)-[:KNOWS]->(Person):

    import org.apache.spark.graphx.{Edge, Graph, VertexId}
    import org.apache.spark.rdd.RDD

    // Vertices: one entry per Person node, keyed by the internal Neo4j id
    val persons: RDD[(VertexId, (String, String))] = spark.read.format("org.neo4j.spark.DataSource")
        .option("url", "bolt://localhost:7687")
        .option("labels", ":Person")
        .load()
        .rdd
        .map(row => (row.getAs[Long]("<id>"), (row.getAs[String]("name"), row.getAs[String]("surname"))))

    // Edges: one Edge[String] per KNOWS relationship, carrying the type as attribute
    val knows: RDD[Edge[String]] = spark.read.format("org.neo4j.spark.DataSource")
        .option("url", "bolt://localhost:7687")
        .option("relationship.nodes.map", "false")
        .option("relationship", "KNOWS")
        .option("relationship.source.labels", ":Person")
        .option("relationship.target.labels", ":Person")
        .load()
        .rdd
        .map(row => Edge(row.getAs[Long]("<source.id>"), row.getAs[Long]("<target.id>"), row.getAs[String]("<rel.type>")))

    // and then
    val graph = Graph(persons, knows)
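Once you have the two RDDs, the resulting Graph supports the full GraphX API. A minimal self-contained sketch (hand-built RDDs stand in for the Neo4j reads above, and the local SparkSession setup is an assumption for demonstration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph, VertexId}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("graphx-sketch")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical data with the same shapes as the Neo4j-backed RDDs
val persons = sc.parallelize(Seq(
  (1L: VertexId, ("Alice", "Smith")),
  (2L: VertexId, ("Bob", "Jones"))
))
val knows = sc.parallelize(Seq(Edge(1L, 2L, "KNOWS")))

val graph = Graph(persons, knows)

// Any GraphX algorithm now applies, e.g. PageRank
val ranks = graph.pageRank(0.0001).vertices
ranks.collect().foreach(println)

spark.stop()
```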