Hi Graphistas!
I'm experimenting with Fabric and am curious about enterprise-scale query federation and symbolic linking across shards. I'm using a configuration like this:
fabric.database.name=neo4jfabric
fabric.graph.0.uri=neo4j://localhost:7687
fabric.graph.0.database=neo4jshard1
fabric.graph.0.name=neo4jshard1
fabric.graph.1.uri=neo4j://localhost:7687
fabric.graph.1.database=neo4jshard2
fabric.graph.1.name=neo4jshard2
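As I understand the config, each `fabric.graph.N.*` group declares one graph, and the index N is the graph id that `neo4jfabric.graphIds()` returns and `neo4jfabric.graph(N)` accepts in the queries below. Here is a minimal Python sketch that groups the settings by that index. The helper `parse_fabric_graphs` is hypothetical and only illustrates the mapping; it is not part of Neo4j.

```python
import re

# The Fabric settings from above, as plain "key=value" lines.
CONFIG = """\
fabric.database.name=neo4jfabric
fabric.graph.0.uri=neo4j://localhost:7687
fabric.graph.0.database=neo4jshard1
fabric.graph.0.name=neo4jshard1
fabric.graph.1.uri=neo4j://localhost:7687
fabric.graph.1.database=neo4jshard2
fabric.graph.1.name=neo4jshard2
"""

# Matches keys of the form fabric.graph.<index>.<setting>
GRAPH_KEY = re.compile(r"^fabric\.graph\.(\d+)\.(\w+)$")


def parse_fabric_graphs(text):
    """Hypothetical helper: group fabric.graph.N.* settings by graph index N."""
    graphs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        match = GRAPH_KEY.match(key)
        if match:
            index, setting = int(match.group(1)), match.group(2)
            graphs.setdefault(index, {})[setting] = value
    return graphs


graphs = parse_fabric_graphs(CONFIG)
for graph_id in sorted(graphs):
    # One line per graph: id, name, and the URI it is reached through.
    print(graph_id, graphs[graph_id]["name"], graphs[graph_id]["uri"])
```

Both graphs point at the same URI, so in this setup the two "shards" are simply two databases on one instance.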
After loading the whole :play-movies graph into each shard, I split the movie graph into two shards: one with only the ACTED_IN relationships, and one with all the other relationships:
//only (:Movie)<-[:ACTED_IN]-(:Person)
:use neo4jshard1;
MATCH (n:Person) WHERE NOT (n)-[:ACTED_IN]->() DETACH DELETE n;
MATCH (n:Person)-[r:DIRECTED|WROTE|PRODUCED]->() DELETE r;
MATCH (n:Movie) WHERE NOT (n)<--() DETACH DELETE n;
//every other type of role for (Movie)<-[*]-(Person), except ACTED_IN
:use neo4jshard2;
MATCH (n:Person) WHERE NOT (n)-[:DIRECTED|WROTE|PRODUCED]->() DETACH DELETE n;
MATCH (n:Person)-[r:ACTED_IN]->() DELETE r;
MATCH (n:Movie) WHERE NOT (n)<--() DETACH DELETE n;
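The net effect of those statements can be modelled in a few lines of Python: each shard keeps only the relationships of its types plus the people and movies they touch. This is a minimal in-memory sketch, not the Cypher itself. The sample edges are a small illustrative subset of the movie data, and the model ignores the FOLLOWS and REVIEWED relationships.

```python
# Sample edges (person, rel_type, movie): a small, illustrative subset of
# the movie data, not the full dataset.
EDGES = [
    ("Keanu Reeves", "ACTED_IN", "The Matrix"),
    ("Carrie-Anne Moss", "ACTED_IN", "The Matrix"),
    ("Lana Wachowski", "DIRECTED", "The Matrix"),
    ("Joel Silver", "PRODUCED", "The Matrix"),
]

ROLE_TYPES = {"DIRECTED", "WROTE", "PRODUCED"}


def shard(edges, keep_types):
    """Keep only edges whose type is in keep_types, plus the nodes they touch.

    Mirrors the intent of the DELETE statements: relationships of other
    types are dropped, and people or movies left without a matching
    relationship are removed."""
    kept = [edge for edge in edges if edge[1] in keep_types]
    people = {person for person, _, _ in kept}
    movies = {movie for _, _, movie in kept}
    return kept, people, movies


shard1 = shard(EDGES, {"ACTED_IN"})
shard2 = shard(EDGES, ROLE_TYPES)

# The same movie ends up in both shards as two separate physical nodes.
print(shard1[2] & shard2[2])
```

The overlap in the movie sets is exactly the duplication that the rest of this post is about.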
I can run plenty of queries that inspect these shards, for example:
UNWIND neo4jfabric.graphIds() AS graphId
CALL {
USE neo4jfabric.graph(graphId)
MATCH (m:Movie {title: 'The Matrix'})<-[r]-(p:Person)
RETURN m,r,p
}
RETURN *
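Conceptually, the UNWIND/CALL pattern runs the same subquery once per graph id and concatenates the rows; because `graphId` is in scope, `RETURN *` also tells you which shard each row came from. So The Matrix comes back once per shard rather than once overall. The plain-Python model below illustrates that fan-out; the shard contents and the `fan_out` helper are illustrative assumptions, not how Fabric is implemented.

```python
# Illustrative per-shard data, keyed by Fabric graph id; each row mimics the
# (m, type(r), p) values that the subquery returns.
SHARDS = {
    0: [("The Matrix", "ACTED_IN", "Keanu Reeves")],
    1: [("The Matrix", "DIRECTED", "Lana Wachowski"),
        ("The Matrix", "PRODUCED", "Joel Silver")],
}


def fan_out(graph_ids, subquery):
    """Run subquery once per graph id and concatenate the rows, prefixing
    each row with its graph id (like UNWIND graphIds() + CALL { USE ... })."""
    rows = []
    for graph_id in graph_ids:
        for row in subquery(graph_id):
            rows.append((graph_id,) + row)
    return rows


def matrix_roles(graph_id):
    # Stand-in for: MATCH (m:Movie {title: 'The Matrix'})<-[r]-(p:Person)
    return [row for row in SHARDS[graph_id] if row[0] == "The Matrix"]


rows = fan_out(sorted(SHARDS), matrix_roles)
for row in rows:
    print(row)
```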
But what I'd really like to do is create a new virtual graph that recognizes that the two instances of (m:Movie {title: 'The Matrix'}) are in fact (semantically) the same node, by creating a virtual node that bridges the two actual nodes: (movie_shard1)-(movie_virtual)-(movie_shard2). This would open the door to some really advanced federated queries using Fabric.
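To pin down what I mean by bridging, here is a minimal in-memory sketch in Python (not Fabric or APOC code): every physical node is identified by the Fabric graph it lives in, and shard nodes that share an identity key are collected under one virtual node. The class names, the choice of title as the identity key, and the SAME_AS relationship type are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ShardNode:
    """A physical node in one shard: its Fabric graph id, labels and title."""
    graph_id: int
    labels: tuple
    title: str


@dataclass
class VirtualNode:
    """One virtual node per semantic entity, remembering every shard copy."""
    key: str
    members: list = field(default_factory=list)


def bridge(nodes, key=lambda node: node.title):
    """Group shard nodes into virtual nodes by an identity key.

    Using the title as the key is an assumption for this example; any
    property that identifies the same real-world movie would do."""
    virtual = {}
    for node in nodes:
        k = key(node)
        if k not in virtual:
            virtual[k] = VirtualNode(k)
        virtual[k].members.append(node)
    return virtual


nodes = [
    ShardNode(graph_id=0, labels=("Movie",), title="The Matrix"),
    ShardNode(graph_id=1, labels=("Movie",), title="The Matrix"),
]
bridged = bridge(nodes)

# Each member stands for one (movie_virtual)-[:SAME_AS]->(movie_shardN) link;
# SAME_AS is a made-up relationship type for illustration.
for member in bridged["The Matrix"].members:
    print(f"(movie_virtual)-[:SAME_AS]->(movie in graph {member.graph_id})")
```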
I've experimented with creating nodes in the Fabric graph, which failed:
UNWIND neo4jfabric.graphIds() AS graphId
CALL {
USE neo4jfabric.graph(graphId)
MATCH (m:Movie {title: 'The Matrix'})<-[r]-(p:Person)
RETURN m,r,p
}
WITH DISTINCT m.title as title
CREATE (:Movie_V {title: title})
Invalid combination of query execution types: READ_WRITE, EXPLAIN:WRITE
I've also done some initial testing with apoc.create.virtual.fromNode(node, [propertyNames]), but it fails to load the Fabric node:
Failed to invoke function `apoc.create.virtual.fromNode`: Caused by: org.neo4j.internal.kernel.api.exceptions.EntityNotFoundException: Unable to load NODE with id 1125899906842625.
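For what it's worth, 1125899906842625 is exactly 2**50 + 1, far larger than any id a small local store would hand out. My guess (an assumption I have not confirmed in the Fabric documentation) is that Fabric packs a per-graph tag into the high bits of the ids it returns so that they stay unique across shards, which would explain why APOC, looking the id up in the local store, cannot find the node. A quick Python check that splits the value at bit 50 under that assumption:

```python
# ASSUMPTION: the low 50 bits hold the shard-local id and the bits above
# hold a per-graph tag. This split is a hypothesis, not documented behavior.
LOCAL_ID_BITS = 50


def split_fabric_id(fabric_id, local_bits=LOCAL_ID_BITS):
    """Split an id into (high-bit tag, low-bit local id)."""
    mask = (1 << local_bits) - 1
    return fabric_id >> local_bits, fabric_id & mask


tag, local_id = split_fabric_id(1125899906842625)
print(tag, local_id)  # -> 1 1
```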
However, if I completely take the nodes and relationships apart, I can rebuild a workable virtual graph:
UNWIND neo4jfabric.graphIds() AS graphId
CALL {
USE neo4jfabric.graph(graphId)
MATCH g = (m:Movie {title: 'The Matrix'})<-[r]-(p:Person)
RETURN
labels(m) AS m_lbl,
properties(m) AS m_prop,
type(r) AS r_type,
properties(r) AS r_prop,
labels(p) AS p_lbl,
properties(p) AS p_prop
}
WITH m_lbl,m_prop,COLLECT([p_lbl,p_prop,r_type,r_prop]) AS rows
CALL apoc.create.vNode(m_lbl,m_prop) YIELD node AS movie
UNWIND rows AS row
CALL apoc.create.vNode(row[0],row[1]) YIELD node AS person
CALL apoc.create.vRelationship(person,row[2],row[3],movie) YIELD rel
RETURN movie,rel,person
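As far as I can tell, what makes this version work is the `WITH m_lbl, m_prop, COLLECT(...)` step. Both copies of The Matrix came from the same :play-movies load, so their labels and properties are identical, and the aggregation folds the rows from both shards into one group that yields a single virtual movie. A plain-Python model of that grouping follows; the sample rows are illustrative, with properties trimmed to the title and name.

```python
# Flattened rows as the subquery returns them:
# (m_lbl, m_prop, p_lbl, p_prop, r_type, r_prop). Sample values only.
ROWS = [
    (["Movie"], {"title": "The Matrix"}, ["Person"], {"name": "Keanu Reeves"},
     "ACTED_IN", {"roles": ["Neo"]}),
    (["Movie"], {"title": "The Matrix"}, ["Person"], {"name": "Lana Wachowski"},
     "DIRECTED", {}),
    (["Movie"], {"title": "The Matrix"}, ["Person"], {"name": "Joel Silver"},
     "PRODUCED", {}),
]


def freeze(props):
    """Make a property map hashable so it can serve as a grouping key."""
    return tuple(sorted(props.items()))


def group_by_movie(rows):
    """Model of WITH m_lbl, m_prop, COLLECT([...]): one group per distinct
    (labels, properties) pair, regardless of which shard a row came from."""
    groups = {}
    for m_lbl, m_prop, p_lbl, p_prop, r_type, r_prop in rows:
        key = (tuple(m_lbl), freeze(m_prop))
        groups.setdefault(key, []).append((p_lbl, p_prop, r_type, r_prop))
    return groups


groups = group_by_movie(ROWS)
print(len(groups))  # one group, i.e. one virtual movie for both shards
```

The same reasoning does not cover the people: `apoc.create.vNode` runs once per row, so someone with roles in both shards would show up as two separate virtual nodes.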
So is there a more efficient way to accomplish this directly from the Fabric nodes?
Any tips or suggestions would be welcome!
Thanks, Michael