Yes, I know, but this is part of a much bigger Cypher query (54 lines) that gets all applicants for a vacancy and a lot of related data as nested objects. In some cases there are about 200 applications, each has 1-2 CVs, and each CV has about 5-20 work phases. I have to select the most recent work phase before collecting the applications, as this info is part of the application map projection.
Before pattern comprehension was introduced, I would have used an OPTIONAL MATCH, ordered the applications and work phases, collected the work phases, and cut off the first one with head(). But this seems quite slow compared to pattern comprehension. Using pattern comprehension but then UNWIND, sorting (with all objects in the WITH clause) and collecting the work phases again also doesn't seem straightforward to me, and it probably nullifies the speed advantage of using pattern comprehension (just a guess).
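For illustration, that old approach would look roughly like this (just a sketch using the labels from my actual query; the descending sort is my assumption for getting the most recent work phase first):

```
// Old approach: OPTIONAL MATCH, sort, collect, take the head
MATCH (application:Application)-[:APPLIED_FOR]->(positionNode:Position {uuid: "xyz"})
OPTIONAL MATCH (application)-[:HAS_CV]->(:CV)-[:WORKED_AT]->(wp:WorkPhase)
WHERE wp.validEmployment
WITH positionNode, application, wp
ORDER BY wp.dateTo DESC, wp.dateFrom DESC
// head() of the sorted collection gives the most recent work phase
WITH positionNode, application, head(collect(wp)) AS lastWorkPhase
WITH positionNode, collect(application { .*, LastEmployer: lastWorkPhase }) AS applications
RETURN positionNode { .*, applications: applications }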
So I came to the following approach, which works quite fast, but is not easy to read:
```
MATCH (application:Application)-[:APPLIED_FOR]->(positionNode:Position {uuid: "xyz"})
WITH positionNode, application { .*, LastEmployer: apoc.coll.sortMulti([(application)-[:HAS_CV]->(:CV)-[:WORKED_AT]->(wp:WorkPhase) WHERE wp.validEmployment | wp], ['dateTo', 'dateFrom'])[0] } AS application
WITH positionNode { .*, applications: collect(application) } AS positionNode
RETURN positionNode
```
It would be nicer if the pattern comprehension could return sorted output directly, something like:
```
[(application)-[:HAS_CV]->(:CV)-[:WORKED_AT]->(wp:WorkPhase) | wp ORDER BY wp.dateTo, wp.dateFrom]
```
Good to know it's on the backlog, thanks for that information.
Just to mention: I just had a case that throws an error with the pattern comprehension wrapped in apoc.coll.sortMulti:
```
Failed to invoke function `apoc.coll.sortMulti`: Caused by: java.lang.NullPointerException
```
I thought pattern comprehension always returns a list, possibly an empty one, but it seems it returns null in some cases. Wrapping it in `coalesce( ... , [])` works.
It occurs when the path in the pattern comprehension is built upon a node variable that is null (from an OPTIONAL MATCH with no result). In my example above it happens when application is null. I assume this is by design and not a bug.
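Applied to my query above, the workaround just wraps the sortMulti call (a sketch of the relevant line, reformatted for readability):

```
WITH positionNode, application { .*,
  // coalesce guards against the comprehension yielding null;
  // [0] on an empty list then simply yields null instead of throwing
  LastEmployer: coalesce(
    apoc.coll.sortMulti(
      [(application)-[:HAS_CV]->(:CV)-[:WORKED_AT]->(wp:WorkPhase) WHERE wp.validEmployment | wp],
      ['dateTo', 'dateFrom']
    ),
    []
  )[0]
} AS application
```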
You can try it with the movie graph and the following Cypher query. With the condition released = 2018, movie is null, and so is the result of the pattern comprehension.
```
MATCH (tom:Person {name: "Tom Hanks"})
OPTIONAL MATCH (tom)-[:ACTED_IN]->(movie:Movie)
WHERE movie.released = 2018
RETURN tom, movie.title, movie.released,
       [(movie)<-[:ACTED_IN]-(coActor:Person) WHERE coActor.name <> tom.name | coActor.name] AS coActors
```
But apoc.coll.sortMulti should accept null as an input parameter and return null instead of throwing this error, as apoc.coll.sort and apoc.coll.min do.
Do we have any updates on this feature request? As Reiner mentioned, allowing LIMIT and ORDER BY would be a big help with complex queries. My queries are getting borderline unreadable without comprehensions!
Sorry, it remains on the backlog for consideration, no movement yet.
In the meantime, you may want to use subqueries to scope the expansion, limiting, and sorting of results before collecting. It is more verbose than a pattern comprehension, but it should perform the same operations and be scoped per row like a pattern comprehension, and the subquery can encapsulate your logic fairly well.
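Using the query from earlier in this thread as an example, the subquery version might look roughly like this (a sketch; assumes Neo4j 4.x CALL subquery syntax, and a descending sort to pick the most recent work phase):

```
MATCH (application:Application)-[:APPLIED_FOR]->(positionNode:Position {uuid: "xyz"})
CALL {
  WITH application
  OPTIONAL MATCH (application)-[:HAS_CV]->(:CV)-[:WORKED_AT]->(wp:WorkPhase)
  WHERE wp.validEmployment
  // sort and limit inside the subquery, scoped per incoming row
  RETURN wp AS lastWorkPhase
  ORDER BY wp.dateTo DESC, wp.dateFrom DESC
  LIMIT 1
}
WITH positionNode, collect(application { .*, LastEmployer: lastWorkPhase }) AS applications
RETURN positionNode { .*, applications: applications }
```

The OPTIONAL MATCH inside the CALL ensures each application still produces a row (with lastWorkPhase as null) when it has no matching work phases.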
Thanks for the update. I have been using subqueries to maintain the scope and that has helped. But they are definitely more verbose than patterns, which seems like an overhead if I don't plan to reuse those subqueries.
Sorry to go on a tangent here, but can you confirm whether there is any performance impact from splitting my logic into multiple subqueries (most of them use APOC functions)? I have not read anything along those lines or profiled my code yet; I am just wondering whether the new variables being created and passed back and forth with subqueries have any impact. I can add this as a new question if that's cleaner.
Depends on which APOC functions. If the function has to perform an entirely new query under the hood, then the overhead of having to do that (per row) can have an impact, and native subqueries would often perform better.