I'm reading about virtual nodes and relationships, but I'm not sure what the use case for them would be. Can someone tell me more about them and where they would be used?
Virtual nodes and relationships are used for quite a number of things.
You can create a visual representation of data that is not in the graph, e.g. from other databases; that's also what the apoc.bolt.load procedure supports.
Also the virtual nodes and relationships are used by
You can use them to visually project data, e.g. aggregate several relationships into one, or collapse intermediate nodes into virtual relationships. For instance, you can project a citation graph into a virtual author-author or paper-paper graph with aggregated relationships between them.
Or turn Twitter data into a user-user mention graph.
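To make the citation example concrete, here is a minimal sketch. The schema (Author, Paper, WROTE, CITES) and the CITED relationship type are assumptions for illustration, not something from a real dataset.

```cypher
// Assumed schema: (:Author)-[:WROTE]->(:Paper)-[:CITES]->(:Paper)<-[:WROTE]-(:Author)
MATCH (a1:Author)-[:WROTE]->(:Paper)-[:CITES]->(:Paper)<-[:WROTE]-(a2:Author)
WHERE a1 <> a2
// Collapse every citing path between the same pair of authors into one count
WITH a1, a2, count(*) AS citations
// Virtual relationship between two real nodes; nothing is written to the graph
RETURN a1, a2,
       apoc.create.vRelationship(a1, 'CITED', {count: citations}, a2) AS rel
```

The intermediate Paper nodes never reach the visualization; only the authors and one aggregated relationship per pair are returned.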
This is already automated in apoc.nodes.group, which groups nodes and relationships by grouping properties; read more about that here.
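A short sketch of such a grouping call, assuming User nodes with country and gender properties (both the label and the property names are made up for the example):

```cypher
// Assumed data: (:User {country, gender}) nodes connected by arbitrary relationships
// Groups all :User nodes by (country, gender) and aggregates the relationships between groups
CALL apoc.nodes.group(['User'], ['country', 'gender'])
YIELD node, relationship
RETURN node, relationship
```

Each returned node is a virtual node standing for one (country, gender) combination, and each returned relationship is a virtual one summarizing the real relationships between two groups.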
You can combine real and virtual entities, e.g. connecting two real nodes with a virtual relationship or connecting a virtual node via a virtual relationship to a real node.
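For example, a virtual node can be attached to real nodes like this. It is only a sketch: the Person label, the city property, and the LIVES_IN type are assumptions.

```cypher
// Assumed data: (:Person {name, city})
MATCH (p:Person)
WHERE p.city IS NOT NULL
WITH p.city AS city, collect(p) AS people
// One virtual :City node per distinct city value
WITH apoc.create.vNode(['City'], {name: city, residents: size(people)}) AS cityNode, people
UNWIND people AS p
// Real Person node -> virtual relationship -> virtual City node
RETURN p, cityNode,
       apoc.create.vRelationship(p, 'LIVES_IN', {}, cityNode) AS rel
```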
APOC already has some means to create "virtual graphs", which can also be used for export.
Some more uses of virtual entities:
- return only a few properties of nodes/rels to the visualization, e.g. if you have huge text properties
- visualize clusters found by graph algorithms
- aggregate information to a higher level of abstraction
- skip intermediate nodes in a longer path
- hide away properties or intermediate nodes/relationships for security reasons
- graph grouping
- visualization of data from other sources (computation, RDBMS, document-dbs, CSV, XML, JSON) as graph without even storing it
- projecting partial data
You can also create them yourself, e.g. for projections.
One thing to keep in mind: since you cannot look up already created virtual nodes from the graph, you have to keep them in your own lookup structure. Something that works well for this is apoc.map.groupBy(Multi), which creates a map from a list of entities, keyed by the string value of a given property.
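A rough sketch of that lookup pattern, assuming Person nodes with a string country property and KNOWS relationships between them (all hypothetical names):

```cypher
// Assumed data: (:Person {country})-[:KNOWS]->(:Person {country}), country being a string
MATCH (p:Person)
WHERE p.country IS NOT NULL
WITH DISTINCT p.country AS country
// Create one virtual node per country and collect them into a list
WITH collect(apoc.create.vNode(['Country'], {name: country})) AS countries
// Lookup structure: map from the 'name' property value to the virtual node
WITH apoc.map.groupBy(countries, 'name') AS byName
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.country <> b.country
WITH byName, a.country AS fromCountry, b.country AS toCountry, count(*) AS links
// Fetch the previously created virtual nodes from the map instead of from the graph
RETURN byName[fromCountry] AS from, byName[toCountry] AS to,
       apoc.create.vRelationship(byName[fromCountry], 'KNOWS', {count: links}, byName[toCountry]) AS rel
```

The map is what makes it possible to reuse the same virtual Country node on both ends of many relationships, since a MATCH would never find it.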
So far, virtual entities work across all surfaces: Neo4j Browser, Bloom, neovis, and all the drivers, which is really cool, even if it was not intended.
They are mainly used for visualization as Cypher itself can't access them. That's why I added a bunch of functions to access their properties, labels, and rel-types.
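As a small illustration of those accessor functions (the Person label, the names, and the KNOWS type are just sample values):

```cypher
// Build two virtual nodes and a virtual relationship purely in memory
WITH apoc.create.vNode(['Person'], {name: 'Ann'}) AS a,
     apoc.create.vNode(['Person'], {name: 'Bob'}) AS b
WITH a, b, apoc.create.vRelationship(a, 'KNOWS', {since: 2020}, b) AS r
// Read them back through the APOC accessor functions
RETURN apoc.node.labels(a)          AS labels,
       apoc.any.property(a, 'name') AS name,
       apoc.any.properties(r)       AS relProps,
       apoc.rel.type(r)             AS relType
```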
At some point in the future, they might be subsumed by graph views, the ability to return graphs, and composable Cypher queries in Cypher 10.
We also allow graph projections to be used as inputs for graph algorithms, so you don't actually have to change your data to run an algorithm on a different shape; you can just specify the node and relationship lists with two Cypher statements.
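A sketch of what that looks like, assuming the legacy Graph Algorithms library (the algo.* procedures); newer Graph Data Science releases expose Cypher projections through different procedures. The Author/Paper schema is again hypothetical.

```cypher
// First statement returns node ids, second returns source/target pairs.
// The algorithm runs on the projected author-author graph; stored data is unchanged.
CALL algo.pageRank.stream(
  'MATCH (a:Author) RETURN id(a) AS id',
  'MATCH (a1:Author)-[:WROTE]->(:Paper)-[:CITES]->(:Paper)<-[:WROTE]-(a2:Author) RETURN id(a1) AS source, id(a2) AS target',
  {graph: 'cypher', iterations: 20}
)
YIELD nodeId, score
MATCH (a:Author) WHERE id(a) = nodeId
RETURN a.name AS author, score
ORDER BY score DESC
LIMIT 10
```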
This turned out to be much more than I originally intended, almost a blog post :)
Looking forward to having projections on the physical graph in Neo4j as first-class citizens in Cypher. That would be helpful for blending graphs persisted in Neo4j with other data retrieved on the fly from other repositories, without persisting that data in Neo4j. I am thinking about our future data catalog for the different data stores (data lake, DWH, RDBMS) used in my company, and how we could blend the metadata from these different sources together via Neo4j to present it to end users via a GraphQL-based web front end.
I also wonder whether Cypher for virtual views could contribute to building ontologies on the label/property graph, like OWL does for RDF graphs. I constantly get pushback from colleagues who favour RDF over label/property graphs like Neo4j because they have fallen in love with OWL.