Yen's Shortest Path throws error when applied to projection without parallel edges

I'm trying to get the top x unique shortest node paths. If I use a simple projection:

CALL gds.graph.project("graphSingleType", "*", {TYPE: {type: "*"}})

Yen's will return the same node path multiple times having taken different parallel edges between the nodes.

So I make a graph projection without parallel edges:

CALL gds.graph.project("graphSingleType_NP", "*", {TYPE:{type:'*', aggregation:"SINGLE"}})

The problem is that Yen's then only works when k=1 (ie when Dijkstra's is used). For all k>1, The following error is thrown:

Failed to invoke procedure `gds.shortestPath.yens.stream`: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0

I found the issue in GDS version 2.1 so I updated to 2.3 but the problem is still there.

Am I not understanding something or is this an actual bug? Alternatively, is there another way I can get the top x unique node paths?

To reproduce:

CREATE (a:CITY),
		(b:CITY),
		(c:CITY),
		(d:CITY),
		(e:CITY),
		(f:CITY),
		(a)-[:ROAD]->(b),
		(a)-[:ROAD]->(b),
		(b)-[:ROAD]->(c),
		(b)-[:ROAD]->(d),
		(c)-[:ROAD]->(f),
		(d)-[:ROAD]->(e),
		(e)-[:ROAD]->(c),
		(e)-[:ROAD]->(f),
		(a)-[:PATH]->(b),
		(d)-[:PATH]->(e),
		(d)-[:PATH]->(e)
MATCH (source), (target) WHERE id(source)=0 AND id(target)=5
CALL gds.shortestPath.yens.stream("graphSingleType_NP",
                              {sourceNode:source, 
                               targetNode:target,
                               k:3})

Thank you for the nice write up and reproducible example. This looks like a bug to me, one that is related to the parallel aggregation. I will go ahead and report it.

For future reference, we consider most java.lang.* exceptions to be bugs. Feel free to report them here: https://github.com/neo4j/graph-data-science/issues .

You are correct that native projections are significantly faster for large graphs than Cypher projections.

I found a work around: use a cypher projection rather than a native projection.

From the example above, I get the following graph:

I then create my graph projection, "graphSingleNodeType_NP", a single graph without parallel edges.

CALL gds.graph.project.cypher(
  'graphSingleType_NP',
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (n)-[r]->(m) RETURN DISTINCT id(n) AS source, id(m) AS target'
)

The projection removes all parallel edges, in the same way the native projection did.

I then run Yen's algorithm.

MATCH (source), (target) WHERE id(source)=0 AND id(target)=5
CALL gds.shortestPath.yens.stream("graphSingleType_NP",
                              {sourceNode:source, 
                               targetNode:target,
                               k:10}) YIELD nodeIds RETURN nodeIds

And get the desired results: in this case the top 10 unique node paths. However, since no more than 3 unique node paths exist between nodes 0 and 5, only 3 paths are returned.

2023-02-03_12h59_40.png

It's probably worth mentioning that I found Native projections to be notably faster than Cypher Projections when used on larger Neo4j databases.

Also, I found this reference to be very helpful in understanding GDS graph projections https://tbgraph.wordpress.com/2020/03/30/analyzing-multigraphs-in-neo4j-graph-data-science-library for

What I have actually ended up doing:

So for my real database, a cypher projection is too slow (I gave up after 40 minutes). However, GDS 2.3 has added a handy function "toUndirected". Since I want the option of finding the shortest directed or undirected paths, this function handles half the battle for me. For now, I'll just use Dijkstra's for the directed option.

#1) Make a single graph projection with one edge type (one rel type, no parallel edges)

CALL gds.graph.project("graphSingleType_NP", "*", {TYPE:{type:'*', aggregation:"SINGLE"}})

#2) Add and "undirected" edge type to the projection

CALL gds.beta.graph.relationships.toUndirected(
  'graphSingleType_NP',
  {relationshipType: 'TYPE', mutateRelationshipType: 'TYPE_UD'}
)
YIELD inputRelationships, relationshipsWritten
RETURN inputRelationships, relationshipsWritten

#3) Calculating Yen's on the undirected edge type works

MATCH (source), (target) WHERE id(source)=31 AND id(target)=26
CALL gds.shortestPath.yens.stream("graphSingleType_NP",
							  {sourceNode:source, 
							   targetNode:target,
							   k:3,
							   relationshipTypes:['TYPE_UD']
							}) YIELD nodeIds RETURN nodeIds
							
#4) Calculating Yen's on the original type still fails:

MATCH (source), (target) WHERE id(source)=31 AND id(target)=26
CALL gds.shortestPath.yens.stream("graphSingleType_NP",
							  {sourceNode:source, 
							   targetNode:target,
							   k:3,
							   relationshipTypes:['TYPE']
							}) YIELD nodeIds RETURN nodeIds

And thanks for the responses. Nice knowing this will probably be fixed in the future. Cheers!

Hi @Jammer ,

Thank you for bringing this issue to our attention. We have identified the issue, we are working on a fix. We will let you know when a patch with the fix is released.

Best regards,

Ioannis.

Hi @Jammer again,

The fix is contained in GDS 2.3.1 which was released just a few days ago.

Thank you again for bringing this to our attention :)