I would like to run different algorithms, especially centrality measures, on a subset of the data specified by a list of nodes and edges. I know I can do certain things with a Cypher projection, but is it possible to provide a specific list of nodes/edges?
For example, for pageRank I can provide a set of "sourceNodes" as well as my projected graph. This means I can use apoc.path.subgraphAll and then generate a personalized pageRank. However, most of the algorithms do not seem to allow this (and gds.alpha.eigenvector.stream just uses sourceNodes as start nodes and still seems to traverse the whole projected_graph).
Would I need to create an anonymous graph using a Cypher projection each time? (You mention that Cypher projections are not recommended in production systems.) Or is there some way I can run, for example, “Degree Centrality” on a subgraph specified by a set of nodes (and edges)?
Or are there any other good ways of dealing with this?
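To make the anonymous-graph option concrete, this is roughly what I have in mind. It is only a sketch: it assumes the GDS 1.x anonymous Cypher projection syntax (`nodeQuery`, `relationshipQuery`, `parameters`), and that Degree Centrality is exposed as `gds.degree.stream` (on older releases I believe it is `gds.alpha.degree.stream`). The `Node` label, `name` property and `KNOWS` type are just my example data.

```cypher
// Collect the ids of the nodes that should form the subgraph.
MATCH (p:Node {name: "mynode"})
CALL apoc.path.subgraphAll(p, {
  relationshipFilter: "KNOWS",
  minLevel: 0,
  maxLevel: 2
})
YIELD nodes
WITH [n IN nodes | id(n)] AS ids
// Anonymous Cypher projection restricted to those ids;
// nothing is stored in the graph catalog.
CALL gds.degree.stream({
  nodeQuery: 'MATCH (n) WHERE id(n) IN $ids RETURN id(n) AS id',
  relationshipQuery: 'MATCH (a)-[:KNOWS]->(b) WHERE id(a) IN $ids AND id(b) IN $ids RETURN id(a) AS source, id(b) AS target',
  parameters: {ids: ids}
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
```

This works per query, but it re-projects from the database on every call, which is the part I would like to avoid.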
For pageRank I can do:
MATCH (p:Node {name: "mynode"})
CALL apoc.path.subgraphAll(p, {
  relationshipFilter: "KNOWS",
  minLevel: 0,
  maxLevel: 2
})
YIELD nodes
WITH nodes AS sourceNodes
CALL gds.pageRank.stream(
  'projected_graph',
  {
    maxIterations: 20,
    sourceNodes: sourceNodes
  }
)
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
Good question! We're actually introducing the ability to create subgraph projections, based on node or relationship properties, in the 1.6 release of the Graph Data Science library; see "Create a Subgraph in the Catalog" in the documentation.
If you want to test out the feature, you can pull a pre-release version: GDS 1.6-alpha04. GDS 1.6 will be GA on May 27.
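As a rough illustration of the new procedure, here is a minimal sketch. It assumes that 'projected_graph' already exists in the catalog, that its nodes were projected with a numeric property called `community` (a hypothetical name), and that it contains a `KNOWS` relationship type; 'my_subgraph' is likewise just an example name.

```cypher
// Filter an existing in-memory graph into a new catalog graph.
CALL gds.beta.graph.create.subgraph(
  'my_subgraph',       // name of the new in-memory graph (example name)
  'projected_graph',   // existing graph to filter from
  'n.community = 1',   // node filter; `community` is an assumed property
  'r:KNOWS'            // relationship filter; '*' keeps all relationships
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
RETURN graphName, fromGraphName, nodeCount, relationshipCount
```

Note that the filters can only refer to labels, types and properties that were loaded into the in-memory graph, not to anything that exists only in the database.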
That's what I call perfect timing, it looks great. I just tried it, but I get errors like
"Failed to invoke procedure gds.beta.graph.create.subgraph: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 18466 out of bounds for length 1"
for the example (and also if I use my own data), running it with Neo4j 4.2.5.
One question: is there any chance this would be available as an anonymous graph or some kind of streaming, so I could avoid having to store/delete a subgraph each time? (My use case would be: get a subgraph, run an algorithm, return the results. There is no need to store it.)
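For reference, with the catalog-based procedure the round trip I would have to repeat for every request looks roughly like this. It is a sketch only: 'tmp_subgraph' and the `community` filter are made-up placeholders, and I am assuming `gds.degree.stream` as the algorithm.

```cypher
// 1. Store a filtered copy of the projected graph in the catalog.
CALL gds.beta.graph.create.subgraph(
  'tmp_subgraph', 'projected_graph', 'n.community = 1', '*'
)
YIELD graphName
RETURN graphName;

// 2. Run the algorithm on the stored subgraph and return the results.
CALL gds.degree.stream('tmp_subgraph', {})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC;

// 3. Remove the subgraph again so it does not pile up in memory.
CALL gds.graph.drop('tmp_subgraph')
YIELD graphName
RETURN graphName;
```

Steps 1 and 3 are pure bookkeeping for my use case, which is why an anonymous variant would be nice.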
Can you share the commands you tried to run, the gds.graph.list() info for the in-memory graph you're trying to project from, and the results of CALL gds.debug.sysInfo()? Feel free to DM me if that's easier.
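Something along these lines should collect that information; this is a sketch that assumes the source graph is named 'projected_graph', as in your pageRank example.

```cypher
// Catalog details for the graph the subgraph is created from.
CALL gds.graph.list('projected_graph')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName, nodeCount, relationshipCount, schema;

// Version and system details (returned as key/value rows).
CALL gds.debug.sysInfo();
```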
RE your feature request - what would you expect the surface to look like? Basically pushing the subgraph projection commands into the algorithm call? It's not possible now, but certainly something we can consider for future releases.
@alicia_frame1 Regarding what the surface could look like: I would imagine something like pushing the graph projection together with a list of nodes/filter directly. For example, something like below. But I'm not picky. I would just love to be able to throw in a list of nodes that I get from somewhere else and subset the graph on that.