Optimising variable length path matches?


(Joe) #1

I've got a variable-length path search that I'm trying to optimise. It currently takes about 10 seconds to run, because it does a lot of db hits (627,732 in total):

    neo4j-sh (?)$ profile match (f:Feature)<-[:RELATED_FEATURE]-(:Layer0)<-[*0..3]-(s:SalesOrder) return count(s);
    +----------+
    | count(s) |
    +----------+
    | 27860    |
    +----------+
    1 row
    9891 ms

    Compiler CYPHER 2.3

    Planner COST

    Runtime INTERPRETED

    +-----------------------+----------------+--------+---------+------------------------------------+------------------------------------------------------------------------+
    | Operator              | Estimated Rows | Rows   | DB Hits | Identifiers                        | Other                                                                  |
    +-----------------------+----------------+--------+---------+------------------------------------+------------------------------------------------------------------------+
    | +ProduceResults       |             98 |      1 |       0 | count(s)                           | count(s)                                                               |
    | |                     +----------------+--------+---------+------------------------------------+------------------------------------------------------------------------+
    | +EagerAggregation     |             98 |      1 |       0 | count(s)                           |                                                                        |
    | |                     +----------------+--------+---------+------------------------------------+------------------------------------------------------------------------+
    | +Filter               |           9524 |  27860 |   27860 | anon[18], anon[39], anon[48], f, s | Ands(none(anon[48] in anon[48] where anon[18] == anon[48]), f:Feature) |
    | |                     +----------------+--------+---------+------------------------------------+------------------------------------------------------------------------+
    | +Expand(All)          |          12698 |  27860 |   55720 | anon[18], anon[39], anon[48], f, s | ()-[:RELATED_FEATURE]->(f)                                             |
    | |                     +----------------+--------+---------+------------------------------------+------------------------------------------------------------------------+
    | +Filter               |          12698 |  27860 |  231934 | anon[39], anon[48], s              | anon[39]:Layer0                                                        |
    | |                     +----------------+--------+---------+------------------------------------+------------------------------------------------------------------------+
    | +VarLengthExpand(All) |          12698 | 231934 |  305868 | anon[39], anon[48], s              | (s)<-[:*]-()                                                           |
    | |                     +----------------+--------+---------+------------------------------------+------------------------------------------------------------------------+
    | +NodeByLabelScan      |           6349 |   6349 |    6350 | s                                  | :SalesOrder                                                            |
    +-----------------------+----------------+--------+---------+------------------------------------+------------------------------------------------------------------------+

    Total database accesses: 627732

Grouping by the relationship types along the path gives a bit of info on the paths that match:

    neo4j-sh (?)$ profile match (f:Feature)<-[:RELATED_FEATURE]-(:Layer0)<-[r*0..3]-(s:SalesOrder) return count(s), extract(x in r | type(x));
    +-----------------------------------------------------------------------+
    | count(s) | extract(x in r | type(x))                                  |
    +-----------------------------------------------------------------------+
    | 3028     | ["RELATED_PROPERTY","billingAddress","HAS_ACCOUNT"]        |
    | 6349     | ["WITH_POSTCODE","INSTALLED_AT_ADDRESS"]                   |
    | 3036     | ["RELATED_PROPERTY","INSTALLED_AT_ADDRESS"]                |
    | 6066     | ["HAS_BLPU","RELATED_ADDRESS","INSTALLED_AT_ADDRESS"]      |
    | 6349     | ["WITH_POSTCODE","billingAddress","HAS_ACCOUNT"]           |
    | 3032     | ["RELATED_BLPU","RELATED_PROPERTY","INSTALLED_AT_ADDRESS"] |
    +-----------------------------------------------------------------------+
    6 rows
    10580 ms

There are many other candidate paths here that don't match (the VarLengthExpand step produces 231,934 rows, of which only 27,860 end at a Layer0 node), and I'd like to cull those.

It occurs to me that I could restrict the relationship types allowed in r, but as it's a variable-length relationship I'm not sure what the best way of doing this might be.


(Stefan Armbruster) #2

You can either do [r:TYPE1|:TYPE2|:TYPE3*0..3] or use apoc.path.expand, which gives you finer-grained control.
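
As a rough sketch of the first option, here is the original query with r restricted to the eight relationship types that appear in the matching paths listed in #1. Whether this list is complete for the real data is an assumption; it is taken only from that output.

    profile match (f:Feature)<-[:RELATED_FEATURE]-(:Layer0)
      <-[r:RELATED_PROPERTY|:RELATED_BLPU|:RELATED_ADDRESS|:HAS_BLPU|:WITH_POSTCODE|:INSTALLED_AT_ADDRESS|:billingAddress|:HAS_ACCOUNT*0..3]-(s:SalesOrder)
    return count(s);

The six rows in #1 add up to 27860, the same as the total count, so every current match has two or three hops. Changing *0..3 to *2..3 should therefore return the same count on this data, although it would miss any shorter paths added later.

The second option might look roughly like the following. This assumes the APOC plugin is installed, which needs Neo4j 3.0 or later, so it would not run on the 2.3 shell shown above. The ">" suffix means "outgoing from the start node". The null label filter and the Layer0 check on the end node are one way to write it, not the only one, and the result is not guaranteed to match the pure Cypher count exactly.

    match (s:SalesOrder)
    // arguments: start node, relationship filter, label filter, min hops, max hops
    call apoc.path.expand(s,
      "INSTALLED_AT_ADDRESS>|HAS_ACCOUNT>|billingAddress>|WITH_POSTCODE>|RELATED_PROPERTY>|RELATED_ADDRESS>|HAS_BLPU>|RELATED_BLPU>",
      null, 2, 3) yield path
    // keep only paths that end at a Layer0 node
    with s, last(nodes(path)) as l
    where l:Layer0
    match (l)-[:RELATED_FEATURE]->(f:Feature)
    return count(s);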