Collect values if column values are the same

Hi,

I have the following table:

c1.name	c2.name	w1	w2	precision
"Movie1"	"Movie2"	0.2	0.21	2
"Movie1"	"Movie2"	0.4	0.35	2
"Movie2"	"Movie6"	0.21	0.45	2
"Movie1"	"Movie6"	0.2	0.45	2
"Movie1"	"Movie6"	0.28	0.42	2
"Movie1"	"Movie3"	0.4	0.24	2
"Movie2"	"Movie3"	0.35	0.24	2

where I want to collect the w1 and w2 columns if the first two column values are the same. In other words, I want

c1.name	c2.name	w1s	w2s	precision
 "Movie1"	"Movie2"	[0.2 0.4]	[0.21 0.35]	2
"Movie2"	"Movie6"	[0.21 0.2]	[0.45 0.45]	2
"Movie1"	"Movie6"	[0.2	0.28] [0.45 0.42]	2
"Movie1"	"Movie3"	[0.4]	[0.24]	2
"Movie2"	"Movie3"	[0.35] [0.24]	2

How can I achieve this by modifying the following script?

MATCH  ((c1:citation) -[p1:has]-> (:BIOTERM) <-[p2:has]- (c2:citation))
	WHERE id(c1) < id(c2) 
	return c1.name, c2.name, toFloat(p1.probability) as w1, toFloat(p2.probability) as w2, 2 as precision

Thanks a lot!

You would use a collect() aggregation on the variables in question:

MATCH  (c1:citation) -[p1:has]-> (:BIOTERM) <-[p2:has]- (c2:citation)
WHERE id(c1) < id(c2) 
RETURN c1.name as c1Name, c2.name as c2Name, collect(toFloat(p1.probability)) as w1, collect(toFloat(p2.probability)) as w2, 2 as precision

If names are unique per citation node, then you can aggregate with respect to the nodes first and save the projection for later:

MATCH  (c1:citation) -[p1:has]-> (:BIOTERM) <-[p2:has]- (c2:citation)
WHERE id(c1) < id(c2) 
WITH c1, c2, collect(toFloat(p1.probability)) as w1, collect(toFloat(p2.probability)) as w2
RETURN c1.name as c1Name, c2.name as c2Name, w1, w2, 2 as precision

@andrew.bowman Thanks! I am additionally wanting to find the max of w1 and min of w2. But I am not able to apply max on the aggregate list:

MATCH  (c1:citation) -[p1:has]-> (:BIOTERM) <-[p2:has]- (c2:citation)
WHERE id(c1) < id(c2) 
WITH c1, c2, collect(toFloat(p1.probability)) as w1, collect(toFloat(p2.probability)) as w2
RETURN c1.name as c1Name, c2.name as c2Name, w1, w2, max(w1) as max_w1,min(w2) as min_w2, 2 as precision

returns

c1Name	c2Name	w1	w2	max_w1	min_w2	precision
"Movie1"	"Movie2"	[0.2, 0.4]	[0.21, 0.35]	[0.2, 0.4]	[0.21, 0.35]	2
"Movie2"	"Movie6"	[0.21]	[0.45]	[0.21]	[0.45]	2
"Movie1"	"Movie6"	[0.2, 0.28]	[0.45, 0.42]	[0.2, 0.28]	[0.45, 0.42]	2
"Movie1"	"Movie3"	[0.4]	[0.24]	[0.4]	[0.24]	2
"Movie2"	"Movie3"	[0.35]	[0.24]	[0.35]	[0.24]	2
"Movie6"	"Movie3"	[0.23]	[0.23]	[0.23]	[0.23]	2
"Movie3"	"Movie7"	[0.42, 0.23]	[0.65, 0.85]	[0.42, 0.23]	[0.65, 0.85]	2
"Movie6"	"Movie7"	[0.23, 0.12, 0.22]	[0.85, 0.75, 0.95]	[0.23, 0.12, 0.22]	[0.85, 0.75, 0.95]	2
"Movie2"	"Movie4"	[0.21]	[0.26]	[0.21]	[0.26]	2
"Movie3"	"Movie5"	[0.42]	[0.62]	[0.42]	[0.62]	2
"Movie7"	"Movie5"	[0.65, 0.75]	[0.62, 0.92]	[0.65, 0.75]	[0.62, 0.92]	2
"Movie6"	"Movie5"	[0.12]	[0.92]	[0.12]	[0.92]	2

I am trying to both list the weights w1 and w2 and find max among w1 and min among w2 and provide it them all in the same table ...

max() and min() are aggregation functions, just like collect() and count(), so they operate over multiple rows, not lists. You would use them at the same time that you collect() (in the WITH, not the RETURN), though it will only return the max and min values for the same c1 and c2 rows, and not for all rows of data.

1 Like

Thanks - got it working. I was just feeling lazy and was hoping max and min would work for lists :)

MATCH  (c1:citation) -[p1:has]-> (:BIOTERM) <-[p2:has]- (c2:citation)
WHERE id(c1) < id(c2) 
return c1.name, c2.name, collect(toFloat(p1.probability)) as w1, collect(toFloat(p2.probability)) as w2, max(toFloat(p1.probability)) as max_w1, min(toFloat(p2.probability)) as min_w2
|c1.name|c2.name|w1|w2|max_w1|min_w2|
|---|---|---|---|---|---|
|"Movie1"|"Movie2"|[0.2, 0.4]|[0.21, 0.35]|0.4|0.21|
|"Movie2"|"Movie6"|[0.21]|[0.45]|0.21|0.45|
|"Movie1"|"Movie6"|[0.2, 0.28]|[0.45, 0.42]|0.28|0.42|
|"Movie1"|"Movie3"|[0.4]|[0.24]|0.4|0.24|
|"Movie2"|"Movie3"|[0.35]|[0.24]|0.35|0.24|
|"Movie6"|"Movie3"|[0.23]|[0.23]|0.23|0.23|
|"Movie3"|"Movie7"|[0.42, 0.23]|[0.65, 0.85]|0.42|0.65|
|"Movie6"|"Movie7"|[0.23, 0.12, 0.22]|[0.85, 0.75, 0.95]|0.23|0.75|
|"Movie2"|"Movie4"|[0.21]|[0.26]|0.21|0.26|
|"Movie3"|"Movie5"|[0.42]|[0.62]|0.42|0.62|
|"Movie7"|"Movie5"|[0.65, 0.75]|[0.62, 0.92]|0.75|0.62|
|"Movie6"|"Movie5"|[0.12]|[0.92]|0.12|0.92|

@andrew.bowman Now I am facing an error with

MATCH  (c1:citation) -[p1:has]-> (:BIOTERM) <-[p2:has]- (c2:citation)
WHERE id(c1) < id(c2) 
WITH c1.name, c2.name, collect(toFloat(p1.probability)) as w1, collect(toFloat(p2.probability)) as w2, max(toFloat(p1.probability)) as max_w1, min(toFloat(p2.probability)) as min_w2
return c1.name, c2.name, w1, w2, max_w1, min_w2, max_w1/min_w2 as weight

when I just wanted to compute the fraction max_w1/min_w2:

#### Neo.ClientError.Statement.SyntaxError

Neo.ClientError.Statement.SyntaxError: Expression in WITH must be aliased (use AS) (line 3, column 6 (offset: 98)) "WITH c1.name, c2.name, collect(toFloat(p1.probability)) as w1, collect(toFloat(p2.probability)) as w2, max(toFloat(p1.probability)) as max_w1, min(toFloat(p2.probability)) as min_w2" ^

 Neo.ClientError.Statement.SyntaxError: Expression in WITH must be aliased (use AS) (line 3, column 6 (offset: 98)) "WITH c1.name, c2.name, collect(t

with always expects an alias , you must use it as alias and can rename it in next as you want

WITH c1.name as c1_name, c2.name as c2_name, collect(toFloat(p1.probability)) as w1, collect(toFloat(p2.probability)) as w2, max(toFloat(p1.probability)) as max_w1, min(toFloat(p2.probability)) as min_w2

@Kailash Thanks! At last, I get it.

1 Like

Happy Graphing :)
Graph is everywhere!

This works. I was testing the same!

Some scaler functions for these would make sense, it is in the wishlist.

In the meantime, if you have APOC Procedures installed, you can use apoc.coll.min() and apoc.coll.max() to get the min and max values from a given list.

2 Likes