Running GDS on a subset of a graph projection based on a list of nodes

bjoernoesth

I would like to run different algorithms, especially centrality measures, on a subset of my data, specified by a list of nodes and edges. I know I can do certain things with a Cypher projection, but is it possible to provide a specific list of nodes/edges?

For example, for pageRank I can provide a set of "sourceNodes" along with my projected graph. This means I can use apoc.path.subgraphAll and then generate a personalized pageRank. However, most of the algorithms do not seem to allow this (and gds.alpha.eigenvector.stream just uses sourceNodes as start nodes and still seems to traverse the whole projected_graph).
Would I need to create an anonymous graph using a Cypher projection each time? (You mention that Cypher projections are not recommended in production systems.) Or is there some way to run, for example, "Degree Centrality" on a subgraph specified by a set of nodes (and edges)?
Or are there any other good ways of dealing with this?

For pageRank I can do

MATCH (p:Node {name: "mynode "})
CALL apoc.path.subgraphAll(p, {
  relationshipFilter: "KNOWS",
  minLevel: 0,
  maxLevel: 2
})
YIELD nodes
WITH nodes AS sourceNodes
CALL gds.pageRank.stream('projected_graph', {
  maxIterations: 20,
  sourceNodes: sourceNodes
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
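
For comparison, the anonymous Cypher projection asked about above might look roughly like the following. This is only a sketch, assuming GDS 1.x, where the projection queries can be passed directly in the algorithm configuration via nodeQuery, relationshipQuery and parameters. The Node label, KNOWS type and node name are taken from the query above; everything else is illustrative. The graph is rebuilt on every call, which is the cost the question is worried about.

```cypher
// Collect the ids of the nodes that make up the subset.
MATCH (p:Node {name: "mynode"})
CALL apoc.path.subgraphAll(p, {
  relationshipFilter: "KNOWS",
  minLevel: 0,
  maxLevel: 2
})
YIELD nodes
WITH [n IN nodes | id(n)] AS ids
// Anonymous Cypher projection restricted to those ids (assumed GDS 1.x syntax).
CALL gds.pageRank.stream({
  nodeQuery: 'MATCH (n) WHERE id(n) IN $ids RETURN id(n) AS id',
  relationshipQuery: 'MATCH (a)-[:KNOWS]->(b) WHERE id(a) IN $ids AND id(b) IN $ids RETURN id(a) AS source, id(b) AS target',
  parameters: {ids: ids},
  maxIterations: 20
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
```

Here pageRank ranks only the nodes and relationships inside the subset, rather than personalizing a run over the whole projected_graph.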


Good question! We're actually introducing the ability to create subgraph projections, based on node or relationship properties, in the 1.6 release of the Graph Data Science library; see "Create a Subgraph in the Catalog" in the documentation.

If you want to test out the feature, you can pull a pre-release version: GDS 1.6-alpha04. GDS 1.6 will be GA on May 27.
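
As a rough sketch of what such a call could look like in the 1.6 pre-release: the procedure name gds.beta.graph.create.subgraph comes from this thread, while the graph name sub_graph and the node property inSubset are hypothetical and only stand in for whatever property marks the subset. The filter can only see properties that were loaded into the in-memory graph, and the literal may need to match the property's type.

```cypher
// Sketch only: 'sub_graph' and the 'inSubset' property are hypothetical.
// The source graph 'projected_graph' must already exist in the catalog
// and must have been projected with the 'inSubset' node property.
CALL gds.beta.graph.create.subgraph(
  'sub_graph',          // name of the new, filtered graph
  'projected_graph',    // existing in-memory graph to filter
  'n.inSubset = 1',     // node filter on an in-memory node property
  '*'                   // keep every relationship between the kept nodes
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```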

bjoernoesth

That's what I call perfect timing, it looks great. I just tried it, but I get errors along the lines of

"Failed to invoke procedure gds.beta.graph.create.subgraph: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 18466 out of bounds for length 1"

for the example from the documentation (and also if I use my own data), running it with Neo4j 4.2.5.

One question: is there any chance this could be available as an anonymous graph or some kind of streaming, so I could avoid having to store and delete a subgraph each time? My use case is: get a subgraph, run an algorithm, return the results; there is no need to store it.
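
Until something like that exists, the store/delete cycle would presumably look something like this. It is only a sketch, run as two separate statements, and it assumes a filtered graph has already been created in the catalog under the hypothetical name sub_graph.

```cypher
// Run the algorithm on the temporary subgraph ('sub_graph' is a hypothetical name).
CALL gds.pageRank.stream('sub_graph', {maxIterations: 20})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10;

// Remove the temporary subgraph from the catalog once the results are returned.
CALL gds.graph.drop('sub_graph')
YIELD graphName
RETURN graphName;
```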

Yikes - and this is what pre-releases are for!

Can you share the commands you tried to run, the graph.list() info from the in-memory graph you're trying to project from, and the results of CALL gds.debug.sysInfo() ? Feel free to DM me if that's easier.

RE your feature request - what would you expect the surface to look like? Basically pushing the subgraph projection commands into the algorithm call? It's not possible now, but certainly something we can consider for future releases.

bjoernoesth

@alicia.frame1 Regarding what the surface could look like: I would imagine something like passing the graph projection together with a list of nodes or a filter directly to the algorithm call, for example something like the snippet below. But I'm not picky. I would just love to be able to throw in a list of nodes that I get from somewhere else and subset the graph on that.

CALL gds.<algo>.<mode>(
  'new-graph-name',
  subgraph: {
    nodeFilter,
    edgeFilter
  },
  {
    // algorithm configuration
  }
)