Query Optimization - We wanna exclude the "distinct" part

amaier · January 31, 2019, 11:03am

Hello community!

We are optimizing some Cypher queries, and so far everything is going well except for one thing. Here is our query:

MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN DISTINCT node_traces

If we remove the "distinct" part of the return statement...

MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN node_traces

We expected to receive the same output, but some paths are matched twice. Our datasets are large, so we would like to avoid "distinct".

On the other hand, for smaller TIMs (<4) we got the same output. For example, the two queries below return the same number of paths:

MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN node_traces

MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN DISTINCT node_traces

Can anyone explain that phenomenon?

michael.hunger · January 31, 2019, 11:32am

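There might be several Link relationships between the same two nodes. Each of them produces a different path, because uniqueness applies to the relationships, not the nodes: two paths through the same nodes but over different relationships count as separate matches.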
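
To illustrate, here is a minimal sketch on a hypothetical toy graph (not your data), assuming an otherwise empty database. It creates two parallel Link relationships between the same A1 and A2 nodes and then returns the relationship ids next to the node ids. Run the two statements separately if your client does not accept multiple statements.

```cypher
// Statement 1: hypothetical toy graph with a duplicated A1 -> A2 link
CREATE (s:Source)-[:Link]->(a1:A1),
       (a1)-[:Link]->(a2:A2),
       (a1)-[:Link]->(a2),      // second, parallel relationship
       (a2)-[:Link]->(:Sink);

// Statement 2: the same pattern as in the question, plus the relationship ids
MATCH path=(:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
RETURN [n IN nodes(path) | id(n)] AS node_traces,
       [r IN relationships(path) | id(r)] AS rel_traces;
```

On this toy graph the query should return two rows with identical node_traces but different rel_traces. If your real data shows the same picture, parallel relationships are the likely cause, and that would also fit the shorter pattern being unaffected if the duplicates only sit on a hop that the shorter pattern does not traverse.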

How much does the distinct really affect your query time?
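
One way to check this, as a sketch: prefix the query with PROFILE, run it once with and once without DISTINCT, and compare the rows and db hits reported for each operator in the plan.

```cypher
// PROFILE executes the query and returns the plan with per-operator statistics
PROFILE
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [n IN nodes(path) | id(n)] AS node_traces
RETURN DISTINCT node_traces
```

If the Distinct operator accounts for only a small share of the total work, removing it will not gain much.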

Did you try:

MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH DISTINCT nodes(path) AS nodes
RETURN [s IN nodes | id(s)] AS node_traces

I don't think there is a built-in path-uniqueness operation right now, as it would still require keeping past paths in a data structure to compare against.

Are you using Enterprise with the slotted runtime?
