How apoc.path.subgraphNodes travers the graph?

axel · October 13, 2020, 4:53pm

Hi!

We are trying to implement a very simple content based recommendation query for our database.

The database for now only includes the following nodes and relationships:

Nodes & properties:

(Course {course_id})
(Game {game_id})
(Coach {coach_id})

Relationships:

(:Course)-[:BELONGSTO]->(:Game)
(:Coach)-[:PUBLISHED]->(:Course)

And given a course_id we want to travers the graph to RETURN 20 other courses to recommend to the user.

After researching, testing and evaluating the profile(db hits, etc) with different approaches we got the following query:

Query
MATCH (c:Course {course_id: $course._id}) CALL apoc.path.subgraphNodes(c, {limit: 50,minLevel: 1,maxLevel: 2}) YIELD node WITH node WHERE labels(node)=["Course"] RETURN node LIMIT 20

The limit:50 for the apoc path expander is more a hack to prevent traversing all possibilities, but we cannot limit 20 at that point since it's taking into account all nodes involved (Game, Coach and Course).

So the "problem" for us is that, doing it this way subgraphNodes looks like is trying first one path which in our case is always the one that relates the specific course with the coach: (:Coach)-[:PUBLISHED]->(:Course) and therefore is giving back only courses from that specific coach.
We never get to the point where it recommends courses from the same game but a different coach.

Why subgraphNodes is always using that first path, is it related with the order of the id's?

And we would like to get roughly the same amount per each possible path, but we couldn't find a way for that.

If anyone knows a better approach we would be much appreciated and thank you so much for your time

Neo4j version in Aura: 4
JavaScript driver: 4.1.2

ameyasoft · October 15, 2020, 4:53am

I created a simple graph with your nodes and relationships.

MERGE (c:Course {course_id: toInteger(20)})
MERGE (c1:Course {course_id: toInteger(20)})
MERGE (c2:Course {course_id: toInteger(30)})
MERGE (c3:Course {course_id: toInteger(40)})
MERGE (c4:Course {course_id: toInteger(45)})
MERGE (c5:Course {course_id: toInteger(50)})

MERGE (g:Game {game_id: toInteger(120)})

MERGE (b:Coach {coach_id: toInteger(1000)})
MERGE (b1:Coach {coach_id: toInteger(1100)})

MERGE (c)-[:BELONGSTO]->(g)
MERGE (c1)-[:BELONGSTO]->(g)
MERGE (c2)-[:BELONGSTO]->(g)
MERGE (c3)-[:BELONGSTO]->(g)
MERGE (c4)-[:BELONGSTO]->(g)
MERGE (c5)-[:BELONGSTO]->(g)

MERGE (b)-[:PUBLISHED]->(c)
MERGE (b)-[:PUBLISHED]->(c1)
MERGE (b)-[:PUBLISHED]->(c2)
MERGE (b)-[:PUBLISHED]->(c3)

MERGE (b1)-[:PUBLISHED]->(c4)
MERGE (b1)-[:PUBLISHED]->(c5)

Result:

Screen Shot 2020-10-14 at 9.15.03 PM

Since the focus is on a particular game and associated courses and coaches:

MATCH (g:Game) 
CALL apoc.path.subgraphNodes(g, {}) 
YIELD node
RETURN node

Result: as above picture
In apoc.path.subgraphNodes you can add labelFilter and relationshipFilter:

MATCH (g:Game) 
CALL apoc.path.subgraphNodes(g, {labelFilter:'Course'}) 
YIELD node
RETURN node

Result:
Screen Shot 2020-10-14 at 9.15.59 PM

axel · October 15, 2020, 8:02am

Hello @ameyasoft and thank you so much for your answer.

Our problem is more the following (sorry maybe I didn't explain well before):
This would maybe be a more realistic graph:

MERGE (c:Course {course_id: toInteger(20)})
MERGE (c1:Course {course_id: toInteger(20)})
MERGE (c2:Course {course_id: toInteger(30)})
MERGE (c3:Course {course_id: toInteger(40)})
MERGE (c4:Course {course_id: toInteger(45)})
MERGE (c5:Course {course_id: toInteger(50)})
MERGE (c6:Course {course_id: toInteger(55)})

MERGE (g:Game {game_id: toInteger(120)})
MERGE (g1:Game {game_id: toInteger(130)})

MERGE (b:Coach {coach_id: toInteger(1000)})
MERGE (b1:Coach {coach_id: toInteger(1100)})

MERGE (c)-[:BELONGSTO]->(g)
MERGE (c1)-[:BELONGSTO]->(g)
MERGE (c2)-[:BELONGSTO]->(g)
MERGE (c3)-[:BELONGSTO]->(g)
MERGE (c4)-[:BELONGSTO]->(g)
MERGE (c5)-[:BELONGSTO]->(g)
MERGE (c6)-[:BELONGSTO]->(g1)

MERGE (b)-[:PUBLISHED]->(c)
MERGE (b)-[:PUBLISHED]->(c1)
MERGE (b)-[:PUBLISHED]->(c2)
MERGE (b)-[:PUBLISHED]->(c3)
MERGE (b)-[:PUBLSIHED]->(c6)

MERGE (b1)-[:PUBLISHED]->(c4)
MERGE (b1)-[:PUBLISHED]->(c5)

And our start point would have to be a specific course instead of the game, using the course_id property. This would be I think the best query for the path with max of two hops, considering there would be more games, coaches and courses.

MATCH (g:Course {course_id: 40}) 
CALL apoc.path.subgraphNodes(g, {maxLevel: 2}) 
YIELD node
RETURN node

With the following graph:
graph

Now if we wanted to stop once we got 3 other courses how do you think we could do it? Instead of having to travers all the subgraph and then limit.

And if we try with the real database the following query:

MATCH (g:Course {course_id: "5ece57514bafe30004054b70"}) 
CALL apoc.path.subgraphNodes(g, {maxLevel: 2, limit:20}) 
YIELD node
RETURN node

We get:

Which only has courses from the coach it was published but not courses from other coaches that would come from the game. Why is that the behaviour?

Thank you for your time :)

ameyasoft · October 15, 2020, 4:41pm

Try this:
MATCH (g:Course {course_id: "5ece57514bafe30004054b70"}) 
CALL apoc.path.subgraphNodes(g, {labelFilter:'Coach|Game|Course', maxLevel: 2, limit:20}) 
YIELD node
RETURN node

This should show all the courses connected to Game node.

ameyasoft · October 15, 2020, 4:47pm

You can get same results with this Cypher:

match (c:Coach)-[]-(b:Course)-[]-(g:Game)
where b.course_id = 40
match (d:Coach)-[]-(e:Course)-[]-(g)
Where d.coach_id <> c.coach_id and e.course_id <> b.course_id
return c, b, g, d, e

andrew_bowman · October 16, 2020, 12:05am

When using subgraphNodes(), it's using NODE_GLOBAL uniqueness during traversal, meaning that once you find a node in the graph, that node will never be returned from any other path. Once it's visited, it will not be visited by another means.

So once courses are visited via one path (from a specific coach), then it won't ever be returned by any other path. The first one wins.

It also uses breadth-first expansion by default, so the closest nodes will be found before those at a longer distance away. You can use bfs:false in the config map to change that to a depth-first expansion, which may find nodes at a longer distance away (though this is not guaranteed).

axel · October 16, 2020, 9:28am

Thank you so much @ameyasoft and @andrew_bowman for your time. I think now we are very close to the outcome we were looking for.

Thanks again for your help

axel · October 20, 2020, 3:36pm

@ameyasoft @andrew_bowman
At the end we use the following query, also it's a bit different since we added the [:VIEWED] relationship:

MATCH (c:Course {course_id: "$course_id"}) MATCH (u:Player {player_id: "$player_id"})-[:VIEWED]-(v:Course) CALL apoc.path.subgraphNodes(c, {maxLevel: 2, limit:20, relationshipFilter: "VIEWED|PUBLISHED|BELONGSTO", blacklist: [v], labelFilter:">Course"}) YIELD node RETURN node

We still have some doubts though but we are closer.

Topic		Replies	Views
Apoc: apoc.path.subgraphAll Terminate Node Option Procedures & APOC apoc	7	348	April 1, 2023
Cypher query for getting a subgraph by multiple relationship paths Cypher	8	2688	May 27, 2022
subgraphAll problem Procedures & APOC	1	349	October 7, 2021
Difference in querying relationship depth with/without apoc Newbie Questions apoc , performance , relationship	2	435	January 11, 2021
Help re-writing query Cypher performance , cypher	7	1078	March 11, 2019

How apoc.path.subgraphNodes travers the graph?

Related topics