Neo4j case expression compare 2 list

Hie,

I have my data like the below
image
Here Blue nodes are label2 and Yellow nodes are label1.

Below query I am trying to fetch all nodes(label1) connected to my label2 and I get the expected output.
image

But when I try to add a case expression to check if the label1 connected to label2 is present in a list is when the problem happens.


As you can see the from the above image, the output is now split into 3, which I can understand why, but I am trying to get only 2 output like the previous.

Is there a way I can achieve this? Thanks in advance.

Update: Another query I tried gives the same output.

I am expecting only 2 rows something like the below

Name              ConnectedTo                       Checks
Tom               ["Alex", "Sam"]                    True
Alan              ["Gabriel", "Bobby"]               False

There are a few things I think are happening here. Let’s start with reviewing aggregation. When you have a line such as the following, cypher will group rows by each pair of distinct values of x and y, then collect the values of z for each unique pair of x and y.

with x, y, collect(z)

This is relevant because your first query's 'with' statement included 'm' with 'n'. The result of this will be that the collecting will be done over unique combinations of 'n' and 'm'. As a result, the collection 'ConnectedTo' has one value in it (which is the m.name for the node ''m'). Since each 'n' node has two 'm' nodes, your data after row 2 should look like the following:

Note: this is one cause of your 'multiple rows' issue you have in query #2. We will discuss that later.

You did not see the issue above because your return statement is now aggregating over just 'n.name' and not 'm' too. As a result, you recollect and get the result you desire:

Note: the first collect is irrelevant and is ignored.

Your first query is overly complicated for the result you want. The following query shows how I would have approached it. I put the 'with' in to collect over the node 'n' instead of n.name. Then I returned n.name and the list. If you return 'n.name, collect(m.name)', then it will group the 'n' nodes if they had the same name.

Note: if you want to get a distinct list of items when collecting, you can use 'Distinct' during collection. You don't need to use the apoc.coll.toSet() function. It would look like the following:

match(n:Label2)-->(m:Label1)
return n.name, collect(distinct m.name)

The other cause of the behavior you did not expect is the reuse of the same variable (ConnectedTo) in your second query. After line 2 in the second query, you will have the result I showed in the first screenshot. Note, each row has a ConnectedTo list with a single value. There are two rows for 'Tom' and two rows for 'Alen'.

Now, on line 5 of the second query you recollect and assign the result to 'ConnectedTo'. You then use the 'ConnectTo' value in your 'case' logic. This is where the second mistake is made. The value of 'ConnectTo' used in the 'case' statement will be the one created one line 2, not the one on line 5. This is because the one on line 5 is not know yet to be used in the 'case' statement. You can't define a variable on a 'with' or 'return' statement and then refer to that value in an expression on the same 'with' or 'return' statement, as it is not defined until after the 'with' or 'return' statement concludes.

Your 'return' statement in this second query has both 'n.name' and the calculated 'checks' values as the pair being aggregated over. Because of this and the fact that the 'case' statement is using the value of 'ConnectedTo' from line 2, the data without aggregating is the following:

Bringing it home, when you add the aggregation of the rows above, you get the result you did not want. This is because 'Tom' had two different rows (one with 'True' and one with 'False), while 'Alen' had two identical rows. The result of aggregating over the pair 'Name' and 'checks' gives the following result:

I think you are trying to determine if the list of label2 nodes associated with label1 node has names that are from a given list of names. The following query will do that, where the list ['Alex', 'Greg'] is the given reference list to compare against. In a production implementation, the list would be passed as a parameter.

Note: 'any' being a predicate, you can return its value directly and don't need a 'case' statement if you want the values 'true' or 'false'

match(n:Label2)-->(m:Label1)
with n, collect(m.name) as ConnectedTo
return n.name, ConnectedTo, any(x in ConnectedTo where x in ['Alex', 'Greg'] )  as check

Is this what you were looking for?

2 Likes

Hie, thank you for the detailed response and the pointer you gave it was helpful and yeah the last query you provided did it. Thank you so much for it.

1 Like