Data science DFS returns nonsense

I'm running the code from the docs example here directly:

https://neo4j.com/docs/graph-data-science/current/algorithms/dfs/

It returns A->C->D->E->B.
Running the DFS again returns A->E->D->C->B.

These two results differ from one another, and both are incorrect.

I know it's an alpha feature, but I'm just making a note that even the most basic version, copy-pasted from the docs, doesn't work.

Hi Keith, Welcome!

Could you provide additional information about your environment (e.g. database and plugin versions)? I guess it is a version-specific issue. I'm running neo4j:3.5.14-enterprise, and when I run the example you reference it returns the same, expected results every time for:

MATCH (a:Node{tag:'a'})
WITH id(a) AS startNode
CALL gds.alpha.dfs.stream('myGraph', {startNode: startNode})
YIELD path
UNWIND [ n in nodes(path) | n.tag ] AS tags
RETURN tags
ORDER BY tags

tags
"a"
"b"
"c"
"d"
"e"

Thanks for the reply!

Sorry, I should have been a bit more specific: I removed "ORDER BY tags". One will of course get the same answer, abcde, every time when ordering by tags. This only shows that the nodes are reachable, which is true, but the actual path returned is not a DFS. It is understandable that two searches may produce different results (child choice ordering etc.), but either way the returned paths are not valid DFS orders.

As to versions, I have tried a number of them, most pointedly the sandbox data science version. Looking under the hood, it's still v3, which is perhaps telling?

Neo4j Browser version: 3.2.20
Neo4j Server version: 3.5.11 (enterprise)

I'm not sure which version of the plugin the sandbox is running. I just figured the sandbox demo would be the most representative and the most likely to work.

That being said, I've tried locally on 4.0.4 Enterprise with GDS 1.2.1 with the same results.

Just to confirm, I have tried setting concurrency to 0 or 1 as well.

For reference, here is what the example graph looks like:
[image: example graph]

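MATCH (a:Node {tag: 'a'})
WITH id(a) AS startNode
CALL gds.alpha.dfs.stream('myGraph', {startNode: startNode})
YIELD nodeIds
// Caution: this MATCH does not keep the order of nodeIds,
// so the rows are not guaranteed to be in traversal order.
MATCH (n)
WHERE id(n) IN nodeIds
RETURN n.tag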

n.tag
"c"
"e"
"d"
"b"
"a"

Agree, I don't see a DFS either.

alpha label...

Thank you for the follow-up! Again, no worries about alpha. I'm using APOC path expansion to solve the issue for the time being, but just a heads up.

From my experience with that, however, one thing that could be nice as a feature for DFS here would be an option to return a "full" traversal path, including the steps back up to a parent. Imagining we add a relation b->z above, this would be

a -> b -> e -> b -> z -> b -> a -> c -> d

with (e)-[:TO_PARENT]->(b) or somesuch.

These relationships could easily be filtered out, resulting in the same as the current schema. Adding [:IS_BACKEDGE] for cycle detection might be helpful too.
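
To make the "full" traversal idea concrete, here is a rough Python sketch (not GDS or APOC code). The edge list is an assumption: it is the docs example graph as implied by the path above (a->b, a->c, b->e, c->d) plus the hypothetical b->z relation. The step labels TO_CHILD and TO_PARENT are made up for illustration.

```python
# Assumed edges: the docs example graph plus the hypothetical b -> z relation.
GRAPH = {
    "a": ["b", "c"],
    "b": ["e", "z"],
    "c": ["d"],
    "d": [],
    "e": [],
    "z": [],
}


def full_traversal(graph, start):
    """DFS that records every step, including the steps back to a parent.

    Each step is (source, target, kind), where kind is "TO_CHILD" or
    "TO_PARENT" (hypothetical labels, not part of any Neo4j API).
    """
    steps = []
    visited = {start}

    def visit(node):
        for child in graph[node]:
            if child in visited:
                continue
            visited.add(child)
            steps.append((node, child, "TO_CHILD"))
            visit(child)
            # Record the walk back up once the child's subtree is done.
            steps.append((child, node, "TO_PARENT"))

    visit(start)
    # Drop the final walk back up to the start node.
    while steps and steps[-1][2] == "TO_PARENT":
        steps.pop()
    return steps


steps = full_traversal(GRAPH, "a")
full_path = ["a"] + [target for _, target, _ in steps]
plain_order = ["a"] + [target for _, target, kind in steps if kind == "TO_CHILD"]

print(" -> ".join(full_path))    # a -> b -> e -> b -> z -> b -> a -> c -> d
print(" -> ".join(plain_order))  # a -> b -> e -> z -> c -> d
```

Filtering out the TO_PARENT steps gives back the plain visit order, which is the "same as the current schema" part.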

I had no idea they published a Graph Data Science manual, thanks for the link!

https://neo4j.com/docs/graph-data-science/current/

This looks new... and great!

Agreed, I'm glad to see good documentation around it, and also the new capabilities. I'm particularly excited about the in-memory graph functions; there is great potential if they continue to be enhanced and expanded.

Looks like a bug - I get the same results as you. I think the valid paths should be:

a -> b -> e -> c -> d
a -> c -> d -> b -> e

but I'm also getting some paths that return 'd' in 2nd place, which I don't understand.
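
As a sanity check on the list of valid paths, here is a small Python sketch (plain Python, nothing to do with the GDS implementation) that enumerates every visit order a DFS from 'a' can produce, one per choice of child ordering. The edge list is an assumption, taken from the docs example as implied by the two paths above: a->b, a->c, b->e, c->d.

```python
# Assumed edges of the docs example graph (the image is not reproduced here).
GRAPH = {
    "a": ["b", "c"],
    "b": ["e"],
    "c": ["d"],
    "d": [],
    "e": [],
}


def dfs_orders(graph, start):
    """Return every preorder a DFS from `start` can produce,
    one per possible choice of child ordering."""
    results = set()

    def explore(stack, visited, order):
        # Backtrack: drop nodes that have no unvisited neighbours left.
        while stack and all(n in visited for n in graph[stack[-1]]):
            stack = stack[:-1]
        if not stack:
            results.add(tuple(order))
            return
        # Branch on each unvisited neighbour of the deepest open node.
        for nxt in graph[stack[-1]]:
            if nxt not in visited:
                explore(stack + [nxt], visited | {nxt}, order + [nxt])

    explore([start], {start}, [start])
    return sorted(results)


valid = dfs_orders(GRAPH, "a")
for order in valid:
    print(" -> ".join(order))

# The two orders reported in the first post.
reported = [("a", "c", "d", "e", "b"), ("a", "e", "d", "c", "b")]
for order in reported:
    print(" -> ".join(order), "| valid DFS order:", order in valid)
```

Under that assumption only the two orders listed above come out, 'd' never lands in second place, and neither of the originally reported orders is a valid DFS.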

Can you create an issue on the repository? https://github.com/neo4j/graph-data-science/issues

You can just paste the contents of your first post on there.

Thanks, I just put an issue up!

https://github.com/neo4j/graph-data-science/issues/44