Simplify graph by removing "unnecessary" nodes from graph and result


I'm having serious fun exploring the possibility of using Neo4j for my company's analytical needs.

I'm currently working on the following problem. I have a somewhat polluted topology. Some paths between two nodes of interest look like:

(**node of interest 1**) - [:links to] -> (polluted node 1) - [:links to] -> (polluted node 2) - ... -> (**node of interest 2**)

What is the best way to filter out this pollution and obtain a result like:

(**node of interest 1**) - [:links to] -> (**node of interest 2**)

In the physical world that my graph represents, the polluted nodes are nothing; they are an artifact of data entry.

In the physical world, all the links along such a path, between the fictional and the 'real' nodes, are one and the same thing. With this pollution in place, any analytics on the graph (total distance in shortest-path finding, for example) would yield garbage.

You can probably use APOC's virtual relationships.

Thanks! I've watched the included video, and it looks interesting!

I cannot see directly how to use virtual Nodes and Relationships to 'aggregate' these redundant nodes away for now. It looks like adding vNodes will only add more nodes to the graph.

Can you push me in the right direction with an example Cypher query? Or do you need an example dataset to work with? If so, I can try to add one.
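
For reference, a minimal sketch of how a virtual relationship could collapse such a chain. Everything about the schema here is an assumption, since the thread does not give one: real nodes are labeled `:Interest`, data-entry artifacts `:Polluted`, links have the type `LINKS_TO`, and each link carries a numeric `distance` property. The only library call used is the APOC function `apoc.create.vRelationship(from, type, props, to)`.

```cypher
// Assumed schema (hypothetical, not from the thread):
//   (:Interest) real nodes, (:Polluted) data-entry artifacts,
//   [:LINKS_TO {distance: <number>}] links between them.
MATCH p = (a:Interest)-[:LINKS_TO*]->(b:Interest)
// Keep only paths whose intermediate nodes are all polluted,
// so one real node is never skipped over on the way to another.
WHERE all(n IN nodes(p)[1..-1] WHERE n:Polluted)
WITH a, b,
     // Sum the per-link distances along the chain.
     reduce(total = 0.0, r IN relationships(p) | total + r.distance) AS distance
// Build a relationship that exists only in the query result;
// nothing is written to the database.
RETURN a, b,
       apoc.create.vRelationship(a, 'LINKS_TO', {distance: distance}, b) AS rel
```

Note that no virtual nodes are needed: the query returns the two existing nodes of interest plus one virtual relationship between them, so the polluted nodes simply drop out of the result. The unbounded `*` can get expensive on a large graph, so an upper bound such as `*..10` may be worth adding. Because a virtual relationship lives only in the returned result, graph queries such as shortest path will not see it; if the simplified topology has to be queryable, creating a real relationship instead is probably the way to go.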

In the video, it was hard to see what was being typed. And the presenter made a few stumbles which made it harder to follow.

I'm still puzzled by virtual nodes and relationships. The documentation is a bit sketchy.

I did find this well-worked-out example more helpful, but I'm going to have to think about this more: