Cartesian product from array


(Hagen Ivar) #1

Hi!

I have an array of unknown length that I want to create the cartesian product for. In the example I have three arrays with 3, 1 and 2 elements each. So I use three UNWINDs, but can this be done for X number of elements in the array with one statement?

WITH [["A1","A2","A3"],["B1"],["C1","C2"]] AS input

UNWIND input[0] AS uno
UNWIND input[1] AS dos
UNWIND input[2] AS tres
WITH [uno,dos,tres] AS cartesian
RETURN *

The result is :

cartesian
["A1", "B1", "C1"]
["A1", "B1", "C2"]
["A2", "B1", "C1"]
["A2", "B1", "C2"]
["A3", "B1", "C1"]
["A3", "B1", "C2"


(Michael Hunger) #2

Probably not trivially, you could generate the Cypher from Cypher and use apoc.cypher.run()

I was thinking about using reduce instead of UNWIND but then realized that it doesn't give you arbitrary stack depth.


(Andrew Bowman) #3

This is a tough one, to do this properly we'd probably need to create a new function to handle this.

You can do this (though not as efficiently) by utilizing apoc.coll.combinations(), which produces all possible combinations, given a collection of things and the number of items desired in each combination. This is currently designed to work with a 1-dimensional list, not 2-dimensional as in your use case, so this will produce many unnecessary results that we'll need to filter out (as we never want more than one entry from each bucket).

If we use apoc.coll.combinations() on the possible buckets (1st level indexes) (where each entry is an [bucket, subIndex] pair), and filter on those so that we only keep groupings where each bucket in the combinations grouping is unique, then we'll have the coordinates we need for the proper results, and we can grab them from the input list.

Try this out, this should work with any 2-dimensional list of inputs, though it may grow slower with the number of values:

WITH [["A1","A2","A3"],["B1"],["C1","C2"],["d1","d2","d3","d4"]] as input // parameterize when done testing
UNWIND range(0, size(input)-1) as bucket
UNWIND range(0, size(input[bucket])-1) as subIndex
WITH input, collect([bucket, subIndex]) as coords
WITH input, apoc.coll.combinations(coords, size(input)) as combos
UNWIND combos as combo
WITH input, combo, size(apoc.coll.toSet([coord in combo | coord[0]])) as bucketsUsed
WHERE bucketsUsed = size(input)
RETURN [coord in combo | input[coord[0]][coord[1]]] as combo

As for what's going on in the last WITH clause, we're counting the number of unique buckets per combination (by collecting the buckets per combination, getting the distinct set, then getting the size of that resulting set) and then making sure that's exactly the number of buckets in the input (so for example, we aren't pulling A1 and A2 in the combo, since they're from the same bucket. We need exactly one entry from each bucket).