Need help in understanding UNWIND and WITH clause

Hello Team,

I need your help in understanding the concept of UNWIND and WITH.

I tried few examples of WITH and UNWIND but did not able to understand the concept and use of them.

Below are my examples:

QUERY 1

WITH [1,2,2,3,4,5,6,6] as X
RETURN DISTINCT(X) as Y

OUTPUT 1

Y
[1, 2, 2, 3, 4, 5, 6, 6]

OBSERVATION 1

Duplicates are available in output even though distinct is used.

QUERY 2

UNWIND [1,2,2,3,4,5,6,6] as X
RETURN DISTINCT (X) as Y

OUTPUT 2

Y
1
2
3
4
5
6

OBSERVATION 2

Duplicates are NOT available in output even though distinct is used.

QUERY 3:

UNWIND [1,2,2,3,4,5,6,6] as X
RETURN DISTINCT (X) as Y
WITH collect(Y)

OUTPUT 3:

Neo.ClientError.Statement.SyntaxError

Neo.ClientError.Statement.SyntaxError: RETURN can only be used at the end of the query (line 2, column 1 (offset: 32)) "RETURN DISTINCT (X) as Y" ^

Kindly help in telling when to use WITH and UNWIND.

Regards
AK

hi,
UNWIND is like an iterator , that is when you unwind a list ,
then next lines in cypher will be run for each of those values in the list ,

WITH is not an iterator , its like variable initiator or for aggregating values up to certain point in cypher query ,

in query-1
WITH [1,2,2,3,4,5,6,6] as X
RETURN DISTINCT(X) as Y
here ,you are saying the cypher to declare that [1,2,2,3,4,5,6,6] as X , i.e , X = [1,2,2,3,4,5,6,6]
when you did DISTINCT(X) , distinct is applied on all the rows , in this query you have only one row ,i.e , [1,2,2,3,4,5,6,6]
so distinct of one row will be itself ..note: here distinct is considering the whole list as a single value

in query-2 :
UNWIND [1,2,2,3,4,5,6,6] as X
RETURN DISTINCT (X) as Y
UNWIND [1,2,2,3,4,5,6,6] as X means , you are iterating through the and list and each value is X , (its like for(X in [1,2,2,3,4,5,6,6]) )
so you have total 8 rows of X passed to DISTINCT operator ,,
the 6 unique values are returned by distinct operator , note : here 'X' is the integer value inside the list

1 Like

Hi Ganesan,

Thanks a lot for quick response and explanation.

Please also consider this query:

QUERY 3:

UNWIND [1,2,2,3,4,5,6,6] as X
RETURN DISTINCT (X) as Y
WITH collect(Y)

OUTPUT 3:

Neo.ClientError.Statement.SyntaxError

Neo.ClientError.Statement.SyntaxError: RETURN can only be used at the end of the query (line 2, column 1 (offset: 32)) "RETURN DISTINCT (X) as Y" ^

you cannot end a cypher query with WITH statement ,

you can do the following ,

UNWIND [1,2,2,3,4,5,6,6] as X
WITH DISTINCT (X) as Y
RETURN collect(Y) as yList

this query does the same job as query2 , instead of streaming the results in column _ Y .... we are collecting the result as list

Above image is for the outcome of the query (3) you shared where i added EXPLAIN.

Above image is for the outcome of the query (3) you shared where i added PROFILE.

I can see that there is difference in rows of incase of UNWIND.

Any idea regarding this?

Regards
Akshat.

EXPLAIN only has estimated rows, and in most cases those are for the planner's use when using estimates, and not useful for us.

To see actual rows use PROFILE, which will actually run the query.

Also, to clarify a bit, UNWIND isn't exactly an iteration structure, though the result is often identical to one.

The key concepts here are:

  1. Cypher operations result in rows
  2. Most Cypher operations execute per row

UNWIND is like a reverse collect(): For every element in the list, you will get a row with that element.
And because Cypher operators execute per row, whatever you do next in the query (MATCH, filter, CREATE, procedure call, whatever) will execute for each of the those elements (since you have a separate element per row).

DISTINCT is a keyword (not a function, you can drop those parenthesis) that operates across rows, to ensure that each row has distinct values from any other row. That's why when you only had one row with a list that it didn't do anything. The single row was already distinct. When you did your UNWIND, then you had an element per row, and that allowed DISTINCT to do its magic to ensure all rows were distinct with no duplicates.

1 Like

Hi AK,

I typically use UNWIND, WITH & also COLLECT to create a matrix/scaffolding to join statistics back into. Because of the NoSQL nature of Neo you don't necessary have samples/events/fields available for the full range of values you expect.

Example: say you want to know the average count of something by the day-of-the-week and hour-of-the-day. If you are missing some values e.g. 23 o'clock Wednesday then your data will have no "Wed" "23" and also won't even have a zero in that reference. By creating the matrix/scaffolding you can join you sample data back into the blank/zeros.

Here is the Cypher to create the matrix:

WITH range(0,23) as hours, ["Mon", "Tue","Wed","Thu","Fri","Sat","Sun"] as days
UNWIND hours as TimeOfDay
UNWIND days as DayOfWeek
WITH toString(TimeOfDay)+":"+DayOfWeek as key, 0 as numEmails, TimeOfDay, DayOfWeek
RETURN COLLECT({key: key, numEmails: numEmails, TimeOfDay: TimeOfDay, DayOfWeek: DayOfWeek}) as rows

Basically we have two lists (one created by ranges and the other explicitly defined) and we UNWIND them. The zeroed numEmails is so we can use it as a place holder for a subsequent counts we want do e.g. the total number of emails send between MERGE (a:Person {personName: "Tarzan"})-[:emailed {DayOfWeek: "Mon", TimeOfDay: 11}]->(b:Person {personName: "Jane"})

To do this from the top:
WITH range(0,23) as hours, ["Mon", "Tue","Wed","Thu","Fri","Sat","Sun"] as days
UNWIND hours as TimeOfDay
UNWIND days as DayOfWeek
WITH toString(TimeOfDay)+":"+DayOfWeek as key, 0 as numEmails, TimeOfDay, DayOfWeek
WITH COLLECT({key: key, numEmails: numEmails, TimeOfDay: TimeOfDay, DayOfWeek: DayOfWeek}) as rows
MATCH (a:Person {personName: "Tarzan"})-[l:emailed {DayOfWeek: "Mon", TimeOfDay: 11}]->(b:Person {personName: "Jane"})
WITH toString(l.TimeOfDay)+":"+l.DayOfWeek as key, 1 as numEmails, l.TimeOfDay as TimeOfDay, l.DayOfWeek as DayOfWeek, rows
WITH rows+COLLECT({key: key, numEmails: numEmails, TimeOfDay: TimeOfDay, DayOfWeek: DayOfWeek}) as rows
UNWIND rows as row
WITH row.key as key, row.numEmails as numEmails, row.TimeOfDay as TimeOfDay, row.DayOfWeek as DayOfWeek
RETURN sum(numEmails) as aveEmails, TimeOfDay, DayOfWeek

Regards, Mike

2 Likes