Hello community!
We are optimizing some Cypher queries, and so far everything works well except for one thing. Here is our query:
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN DISTINCT node_traces
If we remove the "distinct" part of the return statement:
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN node_traces
We expected to receive the same output, but some paths are matched twice. Our datasets are large, so we would like to avoid "distinct".
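A diagnostic sketch that might help narrow this down. It assumes (unconfirmed) that the duplicates come from parallel :Link relationships between the same pair of nodes: Cypher would then return one path per relationship combination, even though the node id lists are identical. The names rel_traces, matches and rel_variants are only illustrative.

```cypher
// Collect both the node ids and the relationship ids of every matched path.
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [n IN nodes(path) | id(n)] AS node_traces,
     [r IN relationships(path) | id(r)] AS rel_traces
// Group by node sequence and keep only sequences that were matched more than once.
WITH node_traces, count(*) AS matches, collect(rel_traces) AS rel_variants
WHERE matches > 1
RETURN node_traces, matches, rel_variants
```

If rel_variants shows different relationship ids for the same node_traces, the repeated rows are distinct paths over parallel relationships rather than the same path matched twice.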
On the other hand, for smaller TIMs (<4) we get the same output. For example, the following two queries return the same number of paths:
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN node_traces
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:Sink)
WITH [s IN nodes(path) | id(s)] AS node_traces
RETURN DISTINCT node_traces
Can anyone explain that phenomenon?