Any plans to make Path and Graph computable?


(Aanastasiou) #1

I have a directed Multigraph which is basically a number of directed graphs (of the same type), overlapped. There are quite a few queries in my use case that slice and dice this multigraph looking for shortest paths.

In general, this works. But... the return type of allShortestPaths does not remain a Path. It is basically a List<Relationship> and this is the only way to access it as.

I think that I may have come across a use case where this is a bit limiting and it would probably be worth adding some functionality for Path and Graph to be able to propagate within CYPHER queries.

The majority of my queries return sensible paths. However, it is possible that with absurd parameters, the paths that allShortestPaths returns have more than one components. In this case, I have no way (or so I think at least, from what I have tried so far) of knowing if the path is continuous or not.

In other words, allShortestPaths should be applied at the induced subgraph that results from a query on the stored graph. This query (that produces the induced subgraph) is the one that determines if the solutions to allShortestPaths will be valid or not. Right now, the filtering query is applied ON the allShortestPaths solution.

Knowing if there is a path between source and sink is very important to decide if the query results are meaningful.

This is because, if you assume that what allShortestPaths returns is indeed a Path, then you can safely iterate its relationships (and to an extent, it doesn't matter how you do that).

But when allShortestPaths returns "Paths" that are disjoint, then List<Relationship> contains more than one Paths that it is impossible to separate.

Since Path is a specialisation of Graph, maybe it would be worth having named patterns return stuff as Graph with the ability to apply CYPHER READ clauses further down the query. This could extend CYPHER logic operators towards things like MATCH p=[...] WHERE p IS connected WITH filter(e IN relationships(p) where [conditions]) as f_e ...., or MATCH G=(a)-[r]-(b) WHERE [conditions of subgraph] RETURN path_length(p) as L, clustering_coefficient(p) as C and so on.

The point is to make Path and/or Graph computable within CYPHER so that it is possible to pass them down the query chain from big towards smaller graphs.

This might make sense from the point of view of performance as well. At the end of the day, allShortestPaths would run faster in the induced Graph because the induced graph would have vastly less edges than the stored graph.

Some of this stuff is probably done (still in a limiting way) via APOC if you filter the nodes and relationships you want, but contrast this with something like MATCH U=()-[]-() where U is now a Graph and ()-[]-() defines the condition of creating an induced subgraph (which could have any sort of shape and form). And then, once you have your U, you could do something like return path_length(U) as L, clustering_coefficient(U) as C...Or even return U is connected as is_connected...or even CASE WHEN U IS CONNECTED.... and so on.

Is there a way to decide if the Path is continuous at the level of CYPHER currently? I may of course have been approaching this consistently from the wrong angle and therefore cannot see the "easy" solution :confused:

(Alastair Green) #2

Very perceptive questions. I'd like to comment on the general point about returning a sub-graph (or more generally, projecting a new graph) from a query.

There has been a lot of work on this subject in the openCypher community in the last eighteen months, and this feature has been implemented in the Cypher for Apache Spark project.

In Spark you have immutable data sets so you have to have support for multiple graphs.You can also create virtual graphs (views) over more than one input graph. These features make Cypher into a composable language like SQL, but with more flexible and complex graph objects, rather than tables.

Multiple named graphs can be stored in a PropertyGraph catalog in the Spark project, and there is also support for defining graph types with specific schemas to constrain permitted node types and relationship types and edge constraints.

In Neo4j we're discussing how and when to start bringing these powerful features into the Neo4j database, but there are no definite plans yet.

(Aanastasiou) #3

Hello Alastair

Thank you very much for your reply.

This is very interesting for another aspect of the project we are working on.

Is this handled at the programming language level via Spring (for example)? Another way to ask this is: Are these features "implemented" in a separate Java (possibly) library that is based on the current Neo4j implementation or are they "emulated" via translation of specific functionality to CYPHER queries?