Depth First is going from target to root instead of start to target

Hi,

I have a large tree structure and my goal is to find the first intersection between to points.

In the test here I start a point and following a path to another node. Hopefully you can see the picture.
The query returns a path from the target all the way back to base root. The start point is not included on the path. Returned path
class
"A01D41/1276"
"A01D41/127"
"A01D41/12"
"A01D41/00"
"A01D34/00"
"A01D"
"A01"
"A"

Notice the start node, 'A01D41/10', is not on the list.
Second part: There is property key, level (int), associated with each node. How do I return the minimum level (lowest common node) and its ID?

So what is wrong with this query:
MATCH (start:cpc{cpcClass:'A01D41/10'}),(target:cpc{cpcClass:'A01D41/1276'})
CALL algo.dfs.stream('cpc', 'Reports_to', 'start', Id(target)) YIELD nodeIds
UNWIND nodeIds as nodeId
RETURN algo.asNode(nodeId).cpcClass as class

Which version of the graph algos library are you using? Also I don't see any place you've provided the start node or its id, you've given a string 'start' which isn't associated with your start variable.

You should review the signature of the procedure:

call dbms.procedures() yield name, signature
where name contains 'dfs.stream'
return name, signature

I have Graph Algorithms vers 3.5.14.0 installed.

The documentation gives this
algo.dfs.stream

CALL algo.dfs.stream(label:String, relationshipType:String, startNodeId:long, direction:Direction, {writeProperty:String, target:long, maxDepth:long, weightProperty:String, maxCost:double}) YIELD nodeId

From that I am guessing that I need to pass:
label which is 'cpc as string
realationshipType which is 'Reports_to' as string
Start Node ID as long which I try as ID(start) but this returns error.
direction is a bit unclear since I want both ways though the relationship Reports_to is directional

I can't find a definition for write property so I am at bit of a loss?

The signature returned is which is even more cryptic and does not match the documentation.

algo.dfs.stream" "algo.dfs.stream(label :: STRING?, relationshipType :: STRING?, direction :: STRING?, startNodeId :: INTEGER?, config = {} :: MAP?) :: (startNodeId :: INTEGER?, nodeIds :: LIST? OF INTEGER?, path :: PATH?)"

Note: I am learning at this and unfortunately the "Graph Algorithms" Book NEO4J recommends does not have an example of this algorithm. It talks about it but does not give example which would help with the exact syntax needed.
Andy

I think you gave the version of Neo4j, not the graph algos version, can you check again?

For direction, there are a number of accepted values, but "OUTGOING", "INCOMING" or "BOTH" will work.

I'll check with the algos team about the signature and documented param differences, that will require a documentation update, but what you're seeing from the query I provided should be the source of truth for the ordering.

See if this will work for you:

MATCH (start:cpc{cpcClass:'A01D41/10'}),(target:cpc{cpcClass:'A01D41/1276'})
CALL algo.dfs.stream('cpc', 'Reports_to', 'BOTH', id(start), {targetNodes:[id(target)]}) YIELD nodeIds
UNWIND nodeIds as nodeId
RETURN algo.asNode(nodeId).cpcClass as class

Hi Andrew,

I gave it a try and it did work thank you.

The results are bit different then my naive expectation, but that is on me.

Andy

If you could clarify again exactly what you're trying to find, that may help us give you a better solution. Simply finding a dfs path between the nodes seemed like a first step and not the end result.

Also a follow-up, the graph algos library is now deprecated, take a look at the Graph Data Science library instead.

Algos should have been ported to this new library, with docs and bug fixes applied, though you should review the documentation as some of the procs and usage have changed.

Hi Andrew,

What I am looking for is the actually the deepest node. Without trying to give too much away. I am looking at patents and there is prior work on the topic though I think it did not fully exploit graph theory and my goal is strategies that use other ML techniques to create additional meta data that create new and additional relationships and additional strength to the graph. So with that premise.

I have two master nodes A & B and each of them will have certain classifications assigned to each of them with the number of classifications varying from 1 to as much as 200 though typically in the range of 2-10. The classifications are a pure tree graph. So for any two pair of master nodes what is the deepest deepest node for any pair of classification nodes. I actually don't need the whole path just the deepest node. The end point is a table of classification nodes for each pair and deepest node. I could impose a limit depth of 3 or from each side returning null if there is no common node within that number of levels. This will be input into other analysis tools.
Endpoint is a table for export.
Master A Master B Deepest
XXLB XXLA XXL
XXLBD XXLBF XXLB
XXLBD XXLA XXL

On Top of this is that the master nodes come from two holding groups and I would then do the pairwise look on each pair from the holding groups. There could be a fair amount of redundancy in the classification pairs so if there is a strategy of once the classification pair is identified with its deepest common node that result could be copied that may be a faster search strategy. But at the moment I working of looking if this search algorithm brings value.

Andy

I see, so in your image, C would be the deepest common node for your start and target nodes.

If your query is only finding the common node for two input nodes, then this is easy: first match the entire path for one of the nodes and collect all nodes in that path as a list, then match from the other node such that the end node must be in that list, and limit to a single result. Cypher expansions by default use depth first expansion, so (assuming no cycles in the graph for this relationship) the first hit will be the deepest common node.

For your query:

MATCH (start:cpc{cpcClass:'A01D41/10'})-[:Reports_to*0..]->(ancestor)
WITH collect(ancestor) as ancestors
OPTIONAL MATCH (target:cpc{cpcClass:'A01D41/1276'})-[:Reports_to*0..]->(ancestor)
WHERE ancestor IN ancestors
RETURN ancestor
LIMIT 1

Hi,

Thank you it seems to work. Now I need to work to get all the cpc codes for a given patent.
To give you a picture of what is happening.


2 patents
Patent 1 has three codes
Patent 2 has two codes

Code C is shared by both fine.
The A-D path yield one a common ancestor while the B-D path has a more distant common ancestor. The paths A-C and D-C do not have a common ancestor.

So the challenge now is craft a query that returns all the pairs with NULL if no common ancestor exists.

Thank you again.

Andy