There are a few things I think are happening here. Let’s start with reviewing aggregation. When you have a line such as the following, cypher will group rows by each pair of distinct values of x and y, then collect the values of z for each unique pair of x and y.
with x, y, collect(z)
This is relevant because your first query's 'with' statement included 'm' with 'n'. The result of this will be that the collecting will be done over unique combinations of 'n' and 'm'. As a result, the collection 'ConnectedTo' has one value in it (which is the m.name for the node ''m'). Since each 'n' node has two 'm' nodes, your data after row 2 should look like the following:
Note: this is one cause of your 'multiple rows' issue you have in query #2. We will discuss that later.
You did not see the issue above because your return statement is now aggregating over just 'n.name' and not 'm' too. As a result, you recollect and get the result you desire:
Note: the first collect is irrelevant and is ignored.
Your first query is overly complicated for the result you want. The following query shows how I would have approached it. I put the 'with' in to collect over the node 'n' instead of n.name. Then I returned n.name and the list. If you return 'n.name, collect(m.name)', then it will group the 'n' nodes if they had the same name.
Note: if you want to get a distinct list of items when collecting, you can use 'Distinct' during collection. You don't need to use the apoc.coll.toSet() function. It would look like the following:
match(n:Label2)-->(m:Label1)
return n.name, collect(distinct m.name)
The other cause of the behavior you did not expect is the reuse of the same variable (ConnectedTo) in your second query. After line 2 in the second query, you will have the result I showed in the first screenshot. Note, each row has a ConnectedTo list with a single value. There are two rows for 'Tom' and two rows for 'Alen'.
Now, on line 5 of the second query you recollect and assign the result to 'ConnectedTo'. You then use the 'ConnectTo' value in your 'case' logic. This is where the second mistake is made. The value of 'ConnectTo' used in the 'case' statement will be the one created one line 2, not the one on line 5. This is because the one on line 5 is not know yet to be used in the 'case' statement. You can't define a variable on a 'with' or 'return' statement and then refer to that value in an expression on the same 'with' or 'return' statement, as it is not defined until after the 'with' or 'return' statement concludes.
Your 'return' statement in this second query has both 'n.name' and the calculated 'checks' values as the pair being aggregated over. Because of this and the fact that the 'case' statement is using the value of 'ConnectedTo' from line 2, the data without aggregating is the following:
Bringing it home, when you add the aggregation of the rows above, you get the result you did not want. This is because 'Tom' had two different rows (one with 'True' and one with 'False), while 'Alen' had two identical rows. The result of aggregating over the pair 'Name' and 'checks' gives the following result:
I think you are trying to determine if the list of label2 nodes associated with label1 node has names that are from a given list of names. The following query will do that, where the list ['Alex', 'Greg'] is the given reference list to compare against. In a production implementation, the list would be passed as a parameter.
Note: 'any' being a predicate, you can return its value directly and don't need a 'case' statement if you want the values 'true' or 'false'
match(n:Label2)-->(m:Label1)
with n, collect(m.name) as ConnectedTo
return n.name, ConnectedTo, any(x in ConnectedTo where x in ['Alex', 'Greg'] ) as check
Is this what you were looking for?