I would like to run some Graph Data Science (GDS) algorithms on a spanning sub-tree of the nodes and relationships of a larger graph. To get the spanning tree, I use the APOC library:
MATCH (e:Entity)
WHERE e.property IN ["foo", "bar"] // some property condition
CALL apoc.path.spanningTree(e, {minLevel: 0, maxLevel: 3})
YIELD path
WITH collect(path) as paths
CALL apoc.graph.fromPaths(paths, "tree", null)
YIELD graph
RETURN *;
I'm now at a loss as to how to get that subgraph into a GDS graph, though. I don't think a native projection can capture the complex substructure of a spanning sub-tree grown from specific start nodes.
The only way I can currently see is to use a Cypher projection. I would need to unwind the node and relationship ids and add a MATCH for each of them, but that approach seems very inelegant.
The easiest way would probably be to pass the node and relationship ids directly to gds.graph.create, but that doesn't seem to be possible. Maybe I am missing something obvious; any and all suggestions would be appreciated.
I tinkered with this a bit. As best I can tell, the GDS procedures only operate on the main database store; I couldn't get them to recognize virtual nodes, virtual relationships, or a virtual graph. That is what I expected, but I gave it a try anyway. At the moment, I can see two ways to do this.
Use a Cypher projection and simply run the query twice. You'll need to adjust it for your dataset and use case, but here is my test query against my own data; it seems to work OK. Yes, this runs the spanning-tree query twice, but that is how GDS Cypher projection is designed: one query for the nodes and another for the relationships.
// Both queries re-run the same spanning tree; DISTINCT removes the nodes and
// relationships repeated across paths that share a prefix
CALL gds.graph.create.cypher(
  'my-cypher-graph',
  'MATCH (e:Gene {name: "SNCA"})
   CALL apoc.path.spanningTree(e, {minLevel: 0, maxLevel: 3, limit: 25}) YIELD path
   UNWIND nodes(path) AS n
   RETURN DISTINCT id(n) AS id',
  'MATCH (e:Gene {name: "SNCA"})
   CALL apoc.path.spanningTree(e, {minLevel: 0, maxLevel: 3, limit: 25}) YIELD path
   UNWIND relationships(path) AS r
   WITH DISTINCT r
   RETURN id(startNode(r)) AS source, id(endNode(r)) AS target, type(r) AS type'
)
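Once the projection exists, algorithms run against it by graph name. A minimal sketch, assuming the GDS 1.x procedure names used above and picking PageRank purely as an example algorithm; the `name` property comes from the test data and will be null for nodes that lack it:

```cypher
// Stream PageRank scores for the nodes of the projected sub-graph
CALL gds.pageRank.stream('my-cypher-graph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC;

// Release the in-memory projection when done
CALL gds.graph.drop('my-cypher-graph');
```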
Alternatively, you could use tagging: mark the subgraph by adding a new label to all of its nodes (caveat: this modifies the graph!). I believe you could then use a native projection (or a Cypher projection) to extract the subgraph easily. If identifying the subgraph required a complicated or compute-intensive process, this approach might be worth exploring, but probably not for a simple spanning tree.
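A minimal sketch of the tagging approach, reusing the placeholder start-node condition from the question. The `SubTree` label and the `subtree-graph` name are hypothetical, and the native projection assumes the GDS 1.x `gds.graph.create` procedure. Note that a native projection loads every relationship between tagged nodes, so it yields the induced subgraph rather than only the tree edges; if only the tree relationships should be kept, a Cypher projection is the better fit.

```cypher
// Tag every node reached by the spanning tree (this WRITES to the database);
// SubTree is a hypothetical label name
MATCH (e:Entity)
WHERE e.property IN ["foo", "bar"]
CALL apoc.path.spanningTree(e, {minLevel: 0, maxLevel: 3}) YIELD path
UNWIND nodes(path) AS n
WITH DISTINCT n
SET n:SubTree;

// Native projection of the tagged nodes; '*' loads all relationship types,
// but only relationships whose endpoints are both tagged end up in the graph
CALL gds.graph.create('subtree-graph', 'SubTree', '*')
YIELD graphName, nodeCount, relationshipCount;

// Remove the temporary label once the projection exists
MATCH (n:SubTree) REMOVE n:SubTree;
```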
I have just been looking into a similar question and found that, using the `parameters` option of a Cypher projection, you can pass in the node and relationship ids (GDSL docs: projection parameters):
// Collect the paths of interest into a virtual graph
MATCH path = (:Person)-[:ACTED_IN]->(:Movie)
WITH collect(path) AS paths
CALL apoc.graph.fromPaths(paths, 'test', {})
YIELD graph AS g
// Extract node ids and [source, target] id pairs from the virtual graph
WITH [node IN g.nodes | id(node)] AS nodeIds,
     [rel IN g.relationships | [id(startNode(rel)), id(endNode(rel))]] AS relIds
// Pass both lists to the Cypher projection as query parameters
CALL gds.graph.create.cypher(
  'test-param-input',
  'UNWIND $nodes AS id RETURN id',
  'UNWIND $relationships AS rel RETURN rel[0] AS source, rel[1] AS target',
  {
    parameters: { nodes: nodeIds, relationships: relIds }
  }
) YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
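Applied to the spanning tree from the question, the parameter approach might look like the sketch below. It assumes the placeholder `Entity`/`property` condition from the question and a hypothetical graph name `entity-tree`; in contrast to the two-query Cypher projection, the spanning tree is computed only once and the projection queries merely unwind the passed-in lists.

```cypher
// Build the spanning trees once and turn them into a virtual graph
MATCH (e:Entity)
WHERE e.property IN ["foo", "bar"]
CALL apoc.path.spanningTree(e, {minLevel: 0, maxLevel: 3}) YIELD path
WITH collect(path) AS paths
CALL apoc.graph.fromPaths(paths, 'tree', {})
YIELD graph AS g
WITH [node IN g.nodes | id(node)] AS nodeIds,
     [rel IN g.relationships | [id(startNode(rel)), id(endNode(rel))]] AS relIds
// 'entity-tree' is a hypothetical graph name
CALL gds.graph.create.cypher(
  'entity-tree',
  'UNWIND $nodes AS id RETURN id',
  'UNWIND $relationships AS rel RETURN rel[0] AS source, rel[1] AS target',
  {parameters: {nodes: nodeIds, relationships: relIds}}
) YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;
```

Relationship types are not passed here, so all tree relationships end up under a single type in the projection; add `type(rel)` to the pairs and a `type` column to the relationship query if the types matter.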