Optimizing apoc.path.spanningTree and other Cypher-related issues

graph_user_057 · February 28, 2025, 10:58am

This is a Cypher query I have written against Neo4j's "movies" dataset. Even though the dataset is a simple one, it serves for the use case. Please read the query below:

CALL {
  MATCH (m:Movie {title: $movie_name})
  MATCH (actors:Person)-[:ACTED_IN]->(m)
  MATCH (p:Person) WHERE p.name IN $person_names
  // Expand from each actor along incoming FOLLOWS relationships, up to 4 hops,
  // and only return paths that end at the input person p.
  CALL apoc.path.spanningTree(actors, {
    labelFilter: "+Person",
    relationshipFilter: "<FOLLOWS",
    maxLevel: 4,
    terminatorNodes: [p]  // this config key is documented as a list of nodes
  }) YIELD path
  RETURN [node IN nodes(path) | node.name] AS personNames
}
CALL {
  WITH personNames
  WITH reverse(personNames) AS reversedPathOfPersonName
  RETURN reversedPathOfPersonName[1] AS firstPersonContact
}
CALL {
  WITH personNames
  WITH reverse(personNames) AS reversedPathOfPersonName
  RETURN reversedPathOfPersonName[2] AS SecondPersonContact
}
RETURN firstPersonContact, SecondPersonContact

Just to give you some context: I am using APOC's built-in procedure "apoc.path.spanningTree" to find the connection between a "person" (or an array of "persons") and the actors who acted in a specific movie, up to 4 hops. That is, the input "person" can be connected to a movie actor anywhere within 4 hops.

After finding such paths, we reverse them (not necessary, just for my convenience) and find their first contact, second contact, and so on. Think of it as finding the chain of contacts in a LinkedIn contact list.
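As a small illustration of the reversing and indexing step, the sketch below runs it on a hard-coded list instead of a real path. The two names are hypothetical placeholders, not results from the dataset. Cypher returns null for a list index that is out of range, so a two-element path yields a first contact and a null second contact rather than an error.

```cypher
// Hypothetical path of names, from an actor to the input person.
WITH ["Actor A", "Person P"] AS personNames
// After reversing, index 0 is the input person and index 1 is their first contact.
WITH reverse(personNames) AS reversedPathOfPersonName
RETURN reversedPathOfPersonName[1] AS firstPersonContact,   // "Actor A"
       reversedPathOfPersonName[2] AS SecondPersonContact   // null: index out of range
```

Both values come from a single WITH here, so this step does not strictly need two separate CALL blocks.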

There are a couple of issues that I want a solution for:

Optimization of the spanning tree function - APOC's spanning tree procedure works well for my use case, but when the number of nodes and edges in my graph increases, it takes a considerable amount of time to fetch results, because we are fetching connections for each record serially.

Just for the sake of example: if we have 300,000 names to check the contacts for, and each name takes 4-5 seconds on average, then 300,000 x 4 s = 1,200,000 s, which is roughly 14 days (and about 17 days at 5 s each).

So any suggestions on how to reduce the time for such computations, how to optimize the query, or even how to tune the APOC procedures are very welcome.
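One direction that may be worth testing, sketched here under assumptions rather than measured: collect all input persons once and pass them as a single list, so that each actor is expanded once instead of once per (actor, person) pair. It uses only the config keys from the query above. The semantics differ slightly: with terminatorNodes the expansion stops at the first input person reached on a branch, so a person sitting behind another input person will not be found. APOC's endNodes key is the variant that keeps expanding past a match. Whether this helps at 300,000 names depends on the graph and has not been profiled.

```cypher
// Collect every input person into one list up front.
MATCH (p:Person) WHERE p.name IN $person_names
WITH collect(p) AS targets
MATCH (m:Movie {title: $movie_name})<-[:ACTED_IN]-(actor:Person)
// One expansion per actor; paths may end at any of the collected persons.
CALL apoc.path.spanningTree(actor, {
  labelFilter: "+Person",
  relationshipFilter: "<FOLLOWS",
  maxLevel: 4,
  terminatorNodes: targets
}) YIELD path
RETURN actor.name AS actorName,
       [node IN nodes(path) | node.name] AS personNames
```

The last element of each personNames list identifies which input person the path reached, so the per-person results can still be separated afterwards.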

Working of subqueries in Cypher - In my observation, the "CALL" subquery in Cypher does not always work as intended. As an example, we return "personNames" from the first subquery, and "firstPersonContact" and "SecondPersonContact" from the other two.

Say one of the subqueries returns "NULL", or there is no data in the database matching our filters. Then all the other values become null or empty in the final return statement, i.e. if "SecondPersonContact" is null (has no data), "firstPersonContact" will also be empty even if there is some data for "firstPersonContact".

The above example might not be the most suitable one, as there is no way for "firstPersonContact" to have data when "SecondPersonContact" is empty, but I am just trying to give a general example.

We can also use the following as an example:

CALL {
  MATCH (node1:LABEL_1)
  WHERE node1.name = 'something'
  RETURN node1
}
CALL {
  MATCH (node2:LABEL_2)
  WHERE node2.name = 'anything'
  RETURN node2
}
RETURN node1, node2

If either node1 or node2 is empty (i.e. no data for the given filter), the final return statement "return node1,node2" will return nothing. Any suggestion on how to solve this would be a big help.
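The behaviour described here follows from how a returning CALL subquery is evaluated: it runs once per incoming row, and a subquery that produces zero rows removes that incoming row from the result. A minimal sketch of one way around it, assuming it is acceptable to get lists back instead of single nodes, is to aggregate inside each subquery. An aggregation without grouping keys always returns exactly one row, with an empty list when nothing matched. The names nodes1 and nodes2 are introduced here for illustration only.

```cypher
CALL {
  MATCH (node1:LABEL_1)
  WHERE node1.name = 'something'
  // collect() yields one row even when no node matched (an empty list).
  RETURN collect(node1) AS nodes1
}
CALL {
  MATCH (node2:LABEL_2)
  WHERE node2.name = 'anything'
  RETURN collect(node2) AS nodes2
}
RETURN nodes1, nodes2
```

With this shape, an empty nodes1 no longer hides the contents of nodes2, and vice versa.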

NOTE: The provided queries are just prototypes and I can't provide any query planner output such as "PROFILE" or "EXPLAIN", but I am confident that I have stated my problem correctly, and any changes to the query that solve it are welcome.

Also, the necessary nodes and their properties are already indexed, so please skip that suggestion.

joshcornejo · February 28, 2025, 11:34am

This has syntax errors, and even when corrected my movies DB returns nulls

This bit might need more clarification: why are you performing two separate queries that have no relationship to each other in one single statement?

It is easily fixed anyway:

CALL {
  OPTIONAL MATCH (node1:Movie)
    WHERE node1.title = 'Th Matrix'
    RETURN node1
}
CALL{
  OPTIONAL MATCH (node2:Person)
    WHERE node2.name = 'Keanu Reeves'
    RETURN node2
}
RETURN node1, node2
