Using max function in a Cypher query causes two different paths of the same length to merge in the output

Hello to anyone reading this! I am a Neo4j beginner and I've stumbled upon a problem as many before me :slight_smile: This is an example description of my database (not optimized and very basic):

  • nodes Destination (with property 'name'): A, B, C, D and E
  • nodes Experience (with properties 'name' and 'rating'): A1, B1, C1 and D1
  • relationships DISTANCE between Destination nodes (with property 'units')
  • relationships IS_OFFERED_AT from Experience nodes to Destination nodes (with property 'units')

What I would like to do here is the following:

  • choose a starting Destination node (named 'A')
  • find all possible paths from A with limiting no. of visited relationships to 2 (A->B->C, A->B->D, and so on)
  • for every resulting path find one experience that IS_OFFERED_AT each node in the path and has the highest rating
    (one experience node per one destination node) and calculate average rating of respective experiences
  • the final query result should be an itinerary that looks something like this:

Path avgRating units

["C", "C1", "B", "B1", "D", "D1"] 9.33 6.0
["C", "C1", "B", "B1", "A", "A1"] 9.0 8.0

So a list of [Destination1-Experience1-Destination2-Experience2-Destination3-Experience3].

--------My_problem--------
I noticed an annoying problem when dealing with paths that are of the same length (sum of property 'units').
In this example case, when starting at Destination node A with limiting no. of visited relationships to 2,
there should be two paths: A->B->C and A->B->D (both of length 8 units).
So the final query result for itinerary should be: [A, A1, B, B1, C, C1] and [A, A1, B, B1, D, D1].

However, what I am getting is:

Seems like when generated paths are of the same length, my query joins the two paths together by adding the last node of the second path to the first path.
With my limiting knowledge I managed to pinpoint where in my query things start to go wrong (it happens when using max function - I marked it in the query below),
however I cannot manage to find a solution.



MATCH p = (start:Destination {name: 'A'})-[r:DISTANCE*2..2]-(end:Destination)
WHERE size(nodes(p)) = size(apoc.coll.toSet(nodes(p)))
WITH EXTRACT (r in rels(p)|r.units) AS distances, p
WITH EXTRACT(n in nodes(p)|n.name) AS loc, apoc.coll.sum(distances) AS totalKm, p

UNWIND nodes(p) AS pathAll
MATCH pathEx = ((exp:Experience)-[r:IS_OFFERED_AT {units: 0}]-(pathAll)) 
WITH pathAll, max(exp.rating) as max, totalKm  //at this point pathAll changes
MATCH pathExp = (exp:Experience)-[r:IS_OFFERED_AT]-(pathAll)                 
WHERE exp.rating = max      
WITH pathAll.name AS location, COLLECT(exp.name)[..1] AS experience, COLLECT(exp.rating)[..1] AS rating, totalKm
UNWIND experience AS exp
UNWIND rating AS rat

WITH COLLECT(location) AS Loc, COLLECT(exp) AS Exp, COLLECT(rat) AS Rat, totalKm
WITH apoc.coll.flatten(apoc.coll.zip(Loc, Exp)) AS Path, apoc.coll.avg(Rat) AS avgRating, Exp, totalKm
MATCH (exp:Experience) WHERE exp.name in Exp
WITH COLLECT(exp.cost) AS cost, Path, avgRating, totalKm
RETURN Path, ROUND(100*avgRating)/100 AS avgRating, totalKm AS units ORDER BY units

I realize this is probably not the best written query as I am a beginner.
So if you have any advice, it would be much appreciated.
Thank you in advance for all help!

Hi Kulan,

I think your problem might be that a path doesn't normally have branches, and you want a branching pattern. Do you always want an itinerary with three stops? If so, you could make your match very explicit like this.

MATCH 
(dest1:Destination {name: 'A'})->[dist1:DISTANCE]->(dest2:Destination)-[dist2:DISTANCE]->(dest3:Destination),
(exp1:Experience)-[:IS_OFFERED_AT]->(dest1),
(exp2:Experience)-[:IS_OFFERED_AT]->(dest2),
(exp3:Experience)-[:IS_OFFERED_AT]->(dest3)
RETURN dest1, exp1, dest2, exp2, dest3, exp3, 
(exp1.rating + exp2.rating + exp3.rating)/3 AS avgRating, 
dist1.units + dist2.units AS totalKm
ORDER BY avgRating DESC
LIMIT 1

One important thing to note about this query, at the WITH clause here:

WITH pathAll, max(exp.rating) as max, totalKm  //at this point pathAll changes

you're no longer working with paths at all, since at this point these will be the only variables in scope. pathAll isn't a path, it's an individual node per row from your UNWIND. Per pathAll node and the totalKm sum of distances from your very first MATCH pattern, you're getting the max rating of experiences offered at that particular pathAll node. Any context you had about that node as part of a path is gone, so you've lost any way to group or identify those nodes as part of that original path p.