Query Optimization - We wanna exclude the "distinct" part

(Amaier) #1

Hello community!

We are optimizing some Cypher queries, and so far everything is fine except for one thing. Below is our Cypher query:

``````
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN DISTINCT node_traces
``````

If we remove the "distinct" part of the return statement...

``````
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN node_traces
``````

We expected to receive the same output, but some paths are matched twice. We have large datasets and we don't want to use "distinct".

On the other hand, for smaller TIMs (<4) we got the same output. For example, the two queries below return the same number of paths:

``````
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN node_traces
``````

``````
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN DISTINCT node_traces
``````

Can anyone explain that phenomenon?

(Michael Hunger) #2

There might be several Link relationships between the same two nodes. Each of them produces a separate path, even though the node sequence is identical, because uniqueness within a match is enforced on relationships, not on nodes.
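
One way to check whether this is what is happening is to look for node pairs that are connected by more than one relationship of that type. This is only a sketch: it assumes the relationship type is `Link` as in the queries above, and the aliases `from_id`, `to_id` and `link_count` are made up for illustration.

``````
// Find pairs of nodes joined by more than one Link relationship
// in the same direction (parallel relationships).
MATCH (a)-[r:Link]->(b)
WITH a, b, count(r) AS link_count
WHERE link_count > 1
RETURN id(a) AS from_id, id(b) AS to_id, link_count
ORDER BY link_count DESC
LIMIT 25
``````

If this returns rows, every path that passes through one of those pairs will show up once per parallel relationship, which would explain the duplicate `node_traces`.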

How much does the distinct really affect your query time?
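
To measure that, one option is to prefix the query with `PROFILE`, which runs it and reports rows and db hits per operator. A minimal sketch, reusing the query from the first post:

``````
// Run once with DISTINCT and once without, then compare the plans.
PROFILE
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [n IN nodes(path) | id(n)] AS node_traces
RETURN DISTINCT node_traces
``````

The interesting part of the plan is how many rows enter the Distinct operator versus how many leave it, compared with the cost of the expand steps before it.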

Did you try:

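``````
// De-duplicate on the node lists first, then map to ids
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH DISTINCT nodes(path) AS nodes
RETURN [s IN nodes | id(s)] AS node_traces
``````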

I don't think there is a built-in path-uniqueness operation right now, as it would still require keeping previously seen paths in a data structure to compare against.

Are you using Enterprise with the slotted runtime?