Performance of parallel query execution

michael_horak

Hello,

I have the Cypher query below, which retrieves user data filtered by the user's permissions.

We use role-based access control: every user has one or more roles, and each role has specific permissions on a group of nodes. A role can extend another role, and a group can extend another group, so both relationships require variable-length paths.
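The permission resolution described above amounts to a reachability computation. The Python below is a minimal in-memory model, not Neo4j code. The dictionaries stand in for the `:hasRole`, `:extendRole`, `:permission`, `:group`, `:groupExtend` and `:nodes` relationships, and every function and variable name (`reachable`, `readable_nodes`, the sample ids) is hypothetical. Traversal covers zero or more hops to mirror the `*0..` patterns, so a role or group always includes itself.

```python
def reachable(start, edges):
    """All nodes reachable from `start` in zero or more hops over `edges`,
    like a Cypher [:rel*0..] pattern; the visited set makes cycles safe."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def readable_nodes(user, has_role, extend_role, permissions, perm_groups,
                   extended_by, members, perm_type="READ"):
    """Hypothetical model of the second UNION branch of the query.

    has_role:    user -> roles assigned directly to the user
    extend_role: role -> roles it extends
    permissions: role -> list of (perm_id, perm_type)
    perm_groups: perm_id -> groups the permission applies to
    extended_by: group -> groups that extend it
    members:     group -> nodes attached to the group via :nodes
    """
    roles = set()
    for direct in has_role.get(user, ()):
        roles |= reachable(direct, extend_role)

    groups = set()
    for role in roles:
        for perm_id, kind in permissions.get(role, ()):
            if kind == perm_type:
                for group in perm_groups.get(perm_id, ()):
                    groups |= reachable(group, extended_by)

    return {node for group in groups for node in members.get(group, ())}


# Sample data: "editor" extends "viewer"; only "viewer" holds READ.
has_role = {"u1": ["editor"]}
extend_role = {"editor": ["viewer"]}
permissions = {"viewer": [("p1", "READ")], "editor": [("p2", "WRITE")]}
perm_groups = {"p1": ["docs"], "p2": ["drafts"]}
extended_by = {"docs": ["docs_eu"]}
members = {"docs": ["n1"], "docs_eu": ["n2"], "drafts": ["n3"]}

result = readable_nodes("u1", has_role, extend_role, permissions,
                        perm_groups, extended_by, members)
print(sorted(result))  # ['n1', 'n2']
```

The visited set expands each role and group only once. A Cypher MATCH, by contrast, yields one row per matched path, so a node reachable through several roles or groups appears in several rows.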

The query performs quite well for a single user, but when 4 users execute it in parallel, the execution time increases roughly fourfold.

Since we are running the test on a machine with 12 CPUs and 16 GB of RAM, we expected Neo4j to execute the read queries in parallel, so the execution time should stay close to that of a single execution.
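To compare single and concurrent latency on equal terms, a small harness like the one below could be used. It is a sketch that assumes `run_query` is a hypothetical zero-argument callable that executes the Cypher query through your own client code. Each call should use its own connection or session, because sharing one across threads may serialize the requests. Timing happens inside each worker, so the concurrent figure is the latency each client observes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed(fn):
    """Wall-clock duration of one call to fn, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def compare_latency(run_query, concurrency=4, rounds=3):
    """Return (mean latency alone, mean latency with `concurrency` callers).

    Threads suit I/O-bound database calls because blocking socket I/O
    releases the GIL, so the requests can overlap on the server side.
    """
    single = mean(timed(run_query) for _ in range(rounds))

    samples = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(rounds):
            futures = [pool.submit(timed, run_query) for _ in range(concurrency)]
            samples.extend(f.result() for f in futures)
    return single, mean(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with a function that runs the real query.
    single, parallel = compare_latency(lambda: time.sleep(0.05))
    print(f"alone: {single:.3f}s  with 4 callers: {parallel:.3f}s")
```

If the concurrent figure is close to four times the single one, as reported above, four queries take as long as four sequential runs, so throughput does not improve with more callers.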

So I would like to ask whether something is wrong with the query, or how we can improve this result.

Thanks a lot,

Michael

Neo4j Version: 3.5.3
Driver: neo4j-jdbc-driver (version: 3.1.0)

// Branch 1: nodes owned by the user that have no outgoing :permission relationship
MATCH (user:N {userId: "1234"})
OPTIONAL MATCH (user)<-[:owner]-(i:N) WHERE NOT (i)-[:permission]->()
RETURN COLLECT(i.t) AS nodes, COLLECT((i)-->()) AS relations
UNION
// Branch 2: nodes in groups (or groups extending them) on which the user's roles,
// including extended roles, hold a READ permission
MATCH (user:N {userId: "1234"})
OPTIONAL MATCH (user)<-[:hasRole]-(:N:Role)-[:extendRole*0..]->(r:N:Role)
OPTIONAL MATCH (r)<-[:permission]-(p:N:Perm) WHERE p.perm = "READ"
WITH p
MATCH (p)<-[:group]-(:N:Group)<-[:groupExtend*0..]-(:N:Group)<-[:nodes]-(i:N)
RETURN COLLECT(i.t) AS nodes, COLLECT((i)-->()) AS relations

Execution plan: [image not available]


eric13013

After some investigation, we noticed that Neo4j was retrieving many relationships that we do not need, so we modified the query as follows:

// Branch 1: nodes owned by the user that have no outgoing :permission relationship
MATCH (user:N {userId: "1234"})
OPTIONAL MATCH (user)<-[:owner]-(i:N) WHERE NOT (i)-[:permission]->()
MATCH (i)-[r]->()
RETURN COLLECT(DISTINCT i) AS nodes, COLLECT(DISTINCT r) AS relations
UNION
// Branch 2: nodes reachable through the user's roles and READ permissions
MATCH (user:N {userId: "1234"})
OPTIONAL MATCH (user)<-[:hasRole]-(:N:Role)-[:extendRole*0..]->(r:N:Role)
OPTIONAL MATCH (r)<-[:permission]-(p:N:Perm) WHERE p.perm = "READ"
WITH p
MATCH (p)<-[:group]-(:N:Group)<-[:groupExtend*0..]-(:N:Group)<-[:nodes]-(i:N)
// WITH p dropped the role variable r, so r is reused here for relationships
MATCH (i)-[r]->()
RETURN COLLECT(DISTINCT i) AS nodes, COLLECT(DISTINCT r) AS relations

The key change is COLLECT(DISTINCT r): it appears to return the same number of relationships as before, but runs faster than the COLLECT((i)-->()) we used previously.
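One way to see why the two forms can differ in cost is to compare the shapes they build. The Python below is a simplified model, not Neo4j behaviour verified against the PROFILE. Relationships are plain ids instead of paths, and the rows and ids are hypothetical. Because MATCH emits one row per matched path, a node reachable through several roles or groups appears in several rows. In this model the pattern expression `(i)-->()` builds a fresh list for each of those rows, whereas `COLLECT(DISTINCT ...)` keeps each node and relationship once. This is only a hypothesis for the speed difference.

```python
# Hypothetical output of the MATCH clauses: node "a" is reached through two
# different role/group paths, so it appears in two rows.
rows = ["a", "a", "b"]
outgoing = {"a": ["r1", "r2"], "b": ["r3"]}  # node -> its outgoing relationships


def collect_pattern(rows, outgoing):
    """Model of COLLECT((i)-->()): one nested list per row, duplicates kept."""
    return [list(outgoing[i]) for i in rows]


def collect_distinct(rows, outgoing):
    """Model of MATCH (i)-[r]->() followed by
    COLLECT(DISTINCT i), COLLECT(DISTINCT r): one row per (node, relationship)
    pair, then order-preserving deduplication."""
    expanded = [(i, r) for i in rows for r in outgoing[i]]
    nodes = list(dict.fromkeys(i for i, _ in expanded))
    rels = list(dict.fromkeys(r for _, r in expanded))
    return nodes, rels


print(collect_pattern(rows, outgoing))   # [['r1', 'r2'], ['r1', 'r2'], ['r3']]
print(collect_distinct(rows, outgoing))  # (['a', 'b'], ['r1', 'r2', 'r3'])
```

Here the nested form holds five relationship entries and the distinct form three. With no repeated rows, both would contain the same relationships and differ only in nesting.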

Here is the query PROFILE: [image not available]

There appears to be one fewer branch for the COLLECT.

Could someone from the Neo4j staff explain this?

Thanks.