Is it possible to properly rewrite `apoc.path.subgraphNodes` query using Cypher?

oleksandr.dashkov · September 6, 2024, 1:01pm

Hello,

Following up on my previous question, I'm looking for a way to rewrite an APOC query in pure Cypher.

At this stage my main query looks like this:

MATCH (a: A {id: $id})
CALL apoc.path.subgraphNodes(
    a,
    {
        filterStartNode: true,
        maxLevel: -1,
        labelFilter: ">A|L1|L2|L3|L4|-Excluded",
        relationshipFilter: "USED_L1|USED_L2|USED_L3|USED_L4",
        limit: 15000
    }
)
YIELD node
RETURN node.id

This query works well and returns a response in milliseconds. However, it has a major limitation: I can't, for example, filter by a relationship property.
So I started rewriting it in Cypher and got a more or less working version that supports filtering by properties, but the query performs very poorly.

MATCH (a:A {id: 12})
MATCH (a)
    (
        (a_i:A&!Excluded)
            -[r_in:USED_L1|USED_L2|USED_L3|USED_L4
                WHERE r_in.excluded IS NULL OR r_in.excluded <> true]
            ->(n:(L1&!Excluded)|(L2&!Excluded)|(L3&!Excluded)|(L4&!Excluded))
            <-[r_out:USED_L1|USED_L2|USED_L3|USED_L4
                WHERE r_out.excluded IS NULL OR r_out.excluded <> true]
            -(a_j:A&!Excluded)
    )* (a_end:A&!Excluded)
RETURN DISTINCT a_end.id LIMIT 15000;

This query removes all the filtering limitations I have with APOC, but its performance is very bad. It works on medium-sized clusters if I limit it to 3 hops, but from 4 hops onward it becomes unusable.

I tried using the WHERE COUNT {} syntax to reduce duplicates, but it seems to make things even worse in this case.
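
One possible direction, untested against this data set: a plain MATCH over a quantified path pattern returns every matching path, so many paths end at the same node, whereas apoc.path.subgraphNodes visits each node only once. The sketch below assumes a Neo4j version that supports the SHORTEST path selector, which keeps a single path per distinct pair of start and end nodes. Whether it actually performs better here is an assumption that would need to be checked with PROFILE.

```cypher
// Sketch only: assumes the SHORTEST path selector is available.
// SHORTEST 1 keeps one path per distinct (a, a_end) pair instead of every path.
// Note: with + instead of *, the start node itself is only returned
// if some cycle leads back to it.
MATCH (a:A {id: 12})
MATCH p = SHORTEST 1 (a)
    (
        (:A&!Excluded)
            -[r_in:USED_L1|USED_L2|USED_L3|USED_L4
                WHERE r_in.excluded IS NULL OR r_in.excluded <> true]
            ->(:(L1|L2|L3|L4)&!Excluded)
            <-[r_out:USED_L1|USED_L2|USED_L3|USED_L4
                WHERE r_out.excluded IS NULL OR r_out.excluded <> true]
            -(:A&!Excluded)
    )+ (a_end:A&!Excluded)
RETURN a_end.id
LIMIT 15000;
```

The relationship property predicates are kept as they are, so this keeps the filtering that the APOC version cannot express.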

My data model looks like this:
Each node A is related to nodes L(1-4), and through one or more of those L(1-4) nodes it is related to other nodes labelled A. For example:

Aa  --> L1 <-- Ab --> L2 <-- Ac
    \-> L2 <-/    \-> L3 <-/

The main goal of the query is to get all the A nodes connected to the current node A.

Thanks for your help!

glilienfield · September 6, 2024, 10:08pm

You've got a lot going on here. The property predicates will slow execution, since the properties have to be retrieved; label and type predicates will be faster. How about using a label like A_Include instead of Excluded, so that your label predicate A&!Excluded can be replaced with A_Include? You can do the same for the other labels you want to include or exclude. The same goes for relationship types, for example USED_L1 and USED_L1_EXCLUDE. That way you can eliminate the relationship property excluded and the corresponding predicates.
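
As a rough sketch of the relationship-type idea: assuming the excluded relationships are currently marked with excluded = true, as in the query above, something like the following could move them to a separate type. It relies on the apoc.refactor.setType procedure from APOC; USED_L1_EXCLUDE is just the example name used here, and the same statement would have to be repeated for USED_L2, USED_L3 and USED_L4.

```cypher
// Assumption: excluded relationships carry r.excluded = true.
// apoc.refactor.setType replaces each relationship with one of the new type
// and keeps its properties.
MATCH (:A)-[r:USED_L1]->(:L1)
WHERE r.excluded = true
CALL apoc.refactor.setType(r, 'USED_L1_EXCLUDE')
YIELD output
RETURN count(output) AS converted;
```

After that, a relationshipFilter that lists only USED_L1|USED_L2|USED_L3|USED_L4 no longer traverses the excluded relationships, and they still exist in the graph under the new type.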

I do wonder about your data model though. It makes your queries complicated.

oleksandr.dashkov · September 11, 2024, 7:15am

@glilienfield, thanks for your answer.
Yes, that's a possible approach. If I use relationship types like USED_L1 and USED_L1_EXCLUDE, I can keep using the apoc.path.subgraphNodes function, which works efficiently with the existing label system.

So am I right that I shouldn't design the data model to filter graph traversal on relationship properties, because it will be inefficient? (I have another future use case that filters on a timestamp stored on the relationship, so it looks like that won't be possible.)

I can't share my data model completely, but in a nutshell the process works like this:

An event creates a node A in the database.
It checks whether the nodes L1, L2, L3 and so on already exist in the database.
If they do, it creates relationships between the newly added A and the L1..n nodes.
Otherwise, it creates the L1..n nodes and then creates the relationships.
The same process repeats for every new event.
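
A simplified sketch of these steps in Cypher, not the actual ingestion code: the parameter names and the id key on the L nodes are assumptions made for illustration, and only two L nodes are shown.

```cypher
// Hypothetical parameters: $id is the new A node's id; $l1Id and $l2Id are
// the keys of the L nodes the event refers to (key property assumed to be id).
CREATE (a:A {id: $id})
// MERGE reuses an existing L node, or creates it if it is missing.
MERGE (l1:L1 {id: $l1Id})
MERGE (l2:L2 {id: $l2Id})
// The A node is new, so its relationships can simply be created.
CREATE (a)-[:USED_L1]->(l1)
CREATE (a)-[:USED_L2]->(l2);
```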

Another process runs queries to get all the A nodes related to the newly added node A and then executes some logic on the resulting cluster.
However, there can be false-positive relationships between An and L1..4, and I want to exclude them from the clusters. So far the exclusion is node-based only, using (L1-4:Excluded), but I'd like to be able to exclude based on relationships, since a relationship can be a false positive for A1 but a true positive for An. Finally, I don't want to remove the relationship completely: I want to keep the initial picture and only mark the relationship as excluded.

I hope it makes things clearer for you.

Thanks for your help
