Multiple hops query

Hi everyone,

I have the following situation (as shown in the image below), where I have to return always the latest tagged Person node on the path while having to traverse an undefined number of hops in the HAS_PARENT relationship type.

UseCase

Anyone any idea on a performant query that would satisfy all the cases? Thanks!

Hello @busymo16 :slightly_smiling_face:

This query should do what you asked (you may have to change some config parameters or the WHERE clause)

MATCH (family:Family) 
CALL apoc.path.expandConfig(family, {
    relationshipFilter: "<BELONGS_TO|HAS_PARENT>", 
    labelFilter: "+Person", 
    filterStartNode: false, 
    uniqueness: "RELATIONSHIP_PATH", 
    bfs: false
}) 
YIELD path 
WHERE last(nodes(path))).tagged = true 
WITH family, path, length(path) AS hops 
ORDER BY hops DESC LIMIT 1 
RETURN family, path, hops

Regards,
Cobra

Hi @Cobra,

So basically your suggested query returns the family and also as a path is returning the family as well. I extended it to this one:

MATCH (family:Family)<-[:BELONGS_TO]-(person:Person)
CALL apoc.path.expandConfig(person, {
    relationshipFilter: "<HAS_PARENT>", 
    labelFilter: "+Person", 
    filterStartNode: true, 
    uniqueness: "RELATIONSHIP_PATH", 
    bfs: false
}) 
YIELD path
WHERE last(nodes(path)).tagged = true
RETURN last(nodes(path))

which works fine for CASE III, but not for CASE I and II. The reason for it is that I get also the upper tagged Person nodes, where I am interested only in the last tagged one. Would filtering on uniqueness: "RELATIONSHIP_PATH|LABEL" satisfy CASE I and II as well? Thanks!

Hi @Cobra

Thanks for your answer. Actually, there might be another thing important to mention. It exists also the possibility that

(family: Family)-[:HAS_PARENT]->(parentFamily: Family)

so in this case I would also get the results for parentFamily, right? Also, I am not interested in the path of it but in the node itself, as I use it later on in a subsequent query. Is there a way to set the limit: * (just as an unknown depth)?

Regards,

Busymo

uniqueness setting only accept one value so it's not possible. Can you provide some Cypher queries to create your example please?

Hi @busymo16,

It will not get info from the parentFamily due to Family not being whitefiltered.

Please consider that this approach will first open the entire tree and just after retrieving all of the, it will prune non valid paths.

You have two options, you can use the property tagged as a Label, so then you can add it to the labelFilter as a Terminator. Or, if you don't want or can't modify your model, you may need to check the traversal api framework.

Bennu

Hi @Cobra
Sure. The following query creates some fake data with the model.

MERGE (p:Person{id:5})
MERGE (p1:Person{id: 6})
MERGE (p2:Person{id: 7})
MERGE (p3:Person{id: 8})
MERGE (p4:Person{id: 9})
MERGE (p5:Person{id: 10})
MERGE (p6:Person{id: 11})
MERGE (p7:Person{id: 12})
MERGE (p8:Person{id: 13})
SET p.tagged = true,
p1.tagged = false,
p2.tagged = true,
p3.tagged = false,
p4.tagged = false,
p5.tagged = true,
p6.tagged = false,
p7.tagged = true,
p8.tagged = false
MERGE (p)-[:BELONGS_TO]->(f:Family{id:1})
MERGE (p1)-[:HAS_PARENT]->(p)
MERGE (p2)-[:HAS_PARENT]->(p1)
MERGE (p3)-[:BELONGS_TO]->(f1:Family{id:2})
MERGE (p4)-[:HAS_PARENT]->(p3)
MERGE (p5)-[:HAS_PARENT]->(p4)
MERGE (p6)-[:BELONGS_TO]->(f2:Family{id:3})
MERGE (p7)-[:BELONGS_TO]->(f3:Family{id:4})
MERGE (p8)-[:HAS_PARENT]->(p7)

If you then see the model it will look like this:

Screenshot 2022-11-09 at 16.09.38.pngScreenshot 2022-11-09 at 16.11.10.png

While executing the query below:

MATCH (family:Family)<-[:BELONGS_TO]-(person:Person)
CALL apoc.path.expandConfig(person, {
    relationshipFilter: "<HAS_PARENT>", 
    labelFilter: "+Person", 
    filterStartNode: true, 
    uniqueness: "RELATIONSHIP_PATH", 
    bfs: false
}) 
YIELD path
WHERE last(nodes(path)).tagged = true
RETURN last(nodes(path)).id

The result is nodes with id: 5, 7, 10, 12

The expected result would be 7, 10, 12

Hi @bennu_neo,

Thanks for the explanation. The point is that the model cannot change at the moment speaking but also what I think happens, in this case, is that even the shorter paths are considered valid paths. The only issue is that I am interested in the last-tagged person and not the entire path above him. So basically, I only am interested in the last Person node that is tagged, so considering the depth-first search, it would be:

go until the last Person node in the path,
check if it is tagged
         if yes, return it and break
         if not, continue up the tree until (line 3) is the case.
         if (line 3) is not possible, go to the next path

Here is a solution:

MATCH (family:Family)<-[:BELONGS_TO]-(person) 
CALL apoc.path.expandConfig(person, {
    relationshipFilter: "HAS_PARENT", 
    labelFilter: "+Person", 
    filterStartNode: true, 
    uniqueness: "RELATIONSHIP_PATH", 
    bfs: false
}) 
YIELD path 
WHERE last(nodes(path)).tagged = true 
WITH family, nodes(path) AS nodes 
WITH family, apoc.coll.indexOf(nodes, last(nodes)) AS index, last(nodes) AS node 
WITH family, apoc.coll.sortMaps(collect({index: index, node: node}), "index")[0] AS result 
RETURN family.id AS family, result.node.id AS node

Regards,
Cobra

Hi @busymo16 !

I got your point. This fairly more complex than expected. I'll think about it.

B

@busymo16 Yes, you're right. I'm missing some nuance of the requirements that I cannot see no matter how many times I read the thread. I've been testing with your data - what am I missing that is eliminating node 7 from the results?

@busymo16 I did some testing with the new k-hop optimizations in v5. There are some requirements for it to kick in (hence the 0..999 and the distinct in my example), but this seems to work, if I'm understanding your requirements correctly:

match (f:Family)<-[:BELONGS_TO]-(:Person)-[:HAS_PARENT*0..999]->(p:Person {tagged:true})
return distinct p

@steggy No worries. Yes, eliminating nodes 7 and 10 and adding 5 instead. So the result to be correct with the requirements should be 7, 10, 12.

@steggy Not quite what I would like to achieve according to the requirements. If you would create the graph with the above (also in this comment link: https://community.neo4j.com/t5/neo4j-graph-platform/multiple-hops-query/m-p/61907/highlight/true#M36592), the expected result would be 7, 10, 12 as ids to be returned. Meanwhile the query you suggest also in v.5.1 returns only 5, and 12 which is not what I expect.

I tried with neo4j:5.1.0-enterprise, as per the official docker image. Did you tune/add any parameters in the configuration?

Hi @busymo16 ,

Am I missing something? I thought the answer for your example was only 7 and 10. 5 is not the last one on a possible path.

Hi @bennu_neo,

Sorry my mistake in fast typing, it should be: 7, 10, 12

Hi @busymo16 ,

Do you trust me? If you do, please put this jar (https://drive.google.com/drive/folders/1hC2PSUjvmcrdTfKcRSz15WnT4vG8DuRE?usp=sharing) on the plugin folder and execute the query as follows:

MATCH (f:Family)<-[BELONGS_TO]-(n) 
WITH f, collect(n) as nn
CALL bennu.test(nn) yield node
return f, node

Hi @Cobra,

Thanks for your answer. Logically this is the expected behavior but it is quite not performant for bigger use cases. I will have to use it in a graph that is going over through 100mil nodes more or less. I have also part of the query before this logic, and parts of it after the query. Any idea on speeding it up?

Thanks,

As said by @bennu_neo, you should change a bit your data model. For now, I don't have other ideas to speed-up the query :frowning: