Query optimization with relationship depth 1

Hi,

I have a graph with about 6,000 nodes labeled Package. These nodes have DEPENDS_ON relationships, some of which form cycles. For each package, I want to know how many other packages it depends on, directly or transitively. My current Cypher query is:

MATCH (p:Package)
OPTIONAL MATCH (p)-[:DEPENDS_ON*1..]->(dp:Package)
WHERE p <> dp
WITH p, coalesce(count(DISTINCT dp), 0) + 1 AS dependsUpon
RETURN p.fqn, dependsUpon

Neo4j Community Version
Cypher 3.4
planner COST
runtime INTERPRETED

Sorry, I can't offer a screenshot because the database is on another device.

Regards,
Jürgen

Try this:

I created this scenario:

[image: jdufner1]

Run this query:

MATCH (p:Package)
WITH p,
SIZE ((p)-[:DEPENDS_ON]->()) as Depends
RETURN p.name as Package, SUM(Depends) as TotalDepends

Result:

[image: jdufner2]

Queries like this one, which count distinct nodes along variable-length paths, filter by label, and need to cope with cycles without recounting nodes or paths that were already visited, are a good fit for the APOC path expander procedures.

Try this with APOC installed:

MATCH (p:Package)
CALL apoc.path.subgraphNodes(p, {relationshipFilter:'DEPENDS_ON>', labelFilter:'Package'}) YIELD node
WITH p, count(node) as dependsUpon
RETURN p.fqn as fqn, dependsUpon

This procedure uses NODE_GLOBAL uniqueness, so each distinct node is visited only once during traversal. The count also includes the originating node, so packages that don't have any DEPENDS_ON relationships get a count of 1.
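
To illustrate what visiting each node only once means for the count, here is a rough Python sketch of a breadth-first traversal with a global visited set. It is only an analogy for the behavior described above, not APOC's actual implementation, and the graph and package names (a to e) are made up for the example:

```python
from collections import deque


def count_reachable(graph, start):
    """Count nodes reachable from start, visiting each node at most once.

    The start node itself is included in the count.
    """
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, ()):
            if dep not in visited:  # node already seen: do not expand it again
                visited.add(dep)
                queue.append(dep)
    return len(visited)


# Hypothetical DEPENDS_ON graph containing a cycle: a -> b -> c -> a.
graph = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
    "e": [],
}

counts = {pkg: count_reachable(graph, pkg) for pkg in graph}
print(counts)
```

Because every node enters the queue at most once, the cycle doesn't cause repeated work, and a package with no outgoing relationships (d or e here) still gets a count of 1 for itself. If you want the number of other packages only, subtract 1 from the result.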

Thank you. Wow, it's incredibly fast.