Processing after creating a 'virtual' graph/sub-graph

This is a general question about what processing is possible after returning a virtual graph/sub-graph.

Our data model includes nodes/rels needed for data lineage and other 'admin' purposes. To run analysis on the 'real' nodes/rels we are creating virtual sub-graphs. I'm trying to understand the implications of taking this approach...

  • Is it feasible to 'virtualise' an entire graph? In our case that would mean lifting out 18M nodes from 100M and maybe 50M rels from 230M.
  • Is there any Cypher functionality that is NOT able to run on virtual graphs?
  • Can the algorithms run on virtual graphs without restriction?

Another approach could be to dynamically build a second db from the master db, limiting the data model to only the real nodes/rels.

Thanks in advance.
Mike

What kind of functionality would you want to run?

The existing virtual nodes and rels in APOC are mainly meant for visualization purposes.
There are a bunch of functions in apoc to allow you to access their properties, labels, type, id etc.
Cypher itself uses the lower-level APIs, so those virtual nodes, which have negative ids, don't exist for it.
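
To illustrate the point about the accessor functions, here is a minimal sketch, assuming APOC is installed. The label Person and the property name are made up for the example, and the exact set of helper functions may vary between APOC versions.

```cypher
// Create a virtual node on the fly; it is never written to the store.
// 'Person' and 'name' are hypothetical, for illustration only.
CALL apoc.create.vNode(['Person'], {name: 'Alice'}) YIELD node
// Read it back through APOC helper functions instead of plain Cypher accessors.
RETURN apoc.node.id(node)              AS id,
       apoc.node.labels(node)          AS labels,
       apoc.any.property(node, 'name') AS name,
       apoc.any.properties(node)       AS props
```

For virtual relationships, apoc.rel.id and apoc.rel.type play the same role.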

While virtualization would probably work at this scale, I'd recommend just using aggregation as needed and working on that aggregated data with regular Cypher queries.

For graph algorithms it should work fine to e.g. use algo.graph.load to load your projection into a named graph and then run multiple algorithms on it, and either consume the results in a client or write the computations back to the graph.
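
A rough sketch of that workflow, assuming the Neo4j Graph Algorithms library (the algo.* procedures) is installed. The graph name real-graph, the label Real and the relationship type LINKS are placeholders for your own model, and the config options may differ between library versions.

```cypher
// Load only the 'real' part of the graph into a named in-memory graph.
// 'real-graph', 'Real' and 'LINKS' are hypothetical names.
CALL algo.graph.load('real-graph', 'Real', 'LINKS', {graph: 'heavy'})
YIELD name, nodes, loadMillis;

// Run an algorithm against the named graph and stream results to the client.
CALL algo.pageRank.stream(null, null, {graph: 'real-graph'})
YIELD nodeId, score
RETURN nodeId, score
ORDER BY score DESC LIMIT 10;

// Release the in-memory graph once all algorithms have run.
CALL algo.graph.remove('real-graph');
```

Loading once and reusing the named graph avoids rebuilding the projection for every algorithm. The non-streaming variants (e.g. algo.pageRank) write the results back to the graph instead.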

Thanks Michael,

"There are a bunch of functions in apoc to allow you to access their properties, labels, type, id etc."

Is this the case for the output of all APOC procedures? For example, I am using apoc.path.expand and its derivatives and want to access properties of the nodes/rels in the returned paths.

"What kind of functionality would you want to run?"

Yes, we want to run algos. Thanks for the algo.graph.load tip.

It's only the virtual nodes that APOC creates on the fly that Cypher/Kernel-SPI cannot access.

For expand etc. it's real nodes from the DB that are returned.
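
So something like the following should work with plain Cypher functions. This is only a sketch: the label Real, the relationship type LINKS, the name property and the depth limits are made-up placeholders.

```cypher
// 'Real', 'LINKS' and 'name' are hypothetical, for illustration only.
MATCH (start:Real {name: 'A'})
CALL apoc.path.expand(start, 'LINKS>', '+Real', 1, 3) YIELD path
// The path holds real nodes and relationships, so regular accessors apply.
RETURN [n IN nodes(path) | n.name]                AS nodeNames,
       [r IN relationships(path) | type(r)]       AS relTypes,
       [r IN relationships(path) | properties(r)] AS relProps
```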