# Cartesian product from array

(Hagen Ivar) #1

Hi!

I have an array of unknown length that I want to create the cartesian product for. In the example I have three arrays with 3, 1 and 2 elements each. So I use three UNWINDs, but can this be done for X number of elements in the array with one statement?

``` WITH [["A1","A2","A3"],["B1"],["C1","C2"]] AS input UNWIND input[0] AS uno UNWIND input[1] AS dos UNWIND input[2] AS tres WITH [uno,dos,tres] AS cartesian RETURN * ```

The result is :
``` cartesian ["A1", "B1", "C1"] ["A1", "B1", "C2"] ["A2", "B1", "C1"] ["A2", "B1", "C2"] ["A3", "B1", "C1"] ["A3", "B1", "C2" ```

(Michael Hunger) #2

Probably not trivially, you could generate the Cypher from Cypher and use apoc.cypher.run()

I was thinking about using `reduce` instead of `UNWIND` but then realized that it doesn't give you arbitrary stack depth.

(Andrew Bowman) #3

This is a tough one, to do this properly we'd probably need to create a new function to handle this.

You can do this (though not as efficiently) by utilizing `apoc.coll.combinations()`, which produces all possible combinations, given a collection of things and the number of items desired in each combination. This is currently designed to work with a 1-dimensional list, not 2-dimensional as in your use case, so this will produce many unnecessary results that we'll need to filter out (as we never want more than one entry from each bucket).

If we use `apoc.coll.combinations()` on the possible buckets (1st level indexes) (where each entry is an `[bucket, subIndex]` pair), and filter on those so that we only keep groupings where each bucket in the combinations grouping is unique, then we'll have the coordinates we need for the proper results, and we can grab them from the input list.

Try this out, this should work with any 2-dimensional list of inputs, though it may grow slower with the number of values:

``````WITH [["A1","A2","A3"],["B1"],["C1","C2"],["d1","d2","d3","d4"]] as input // parameterize when done testing
UNWIND range(0, size(input)-1) as bucket
UNWIND range(0, size(input[bucket])-1) as subIndex
WITH input, collect([bucket, subIndex]) as coords
WITH input, apoc.coll.combinations(coords, size(input)) as combos
UNWIND combos as combo
WITH input, combo, size(apoc.coll.toSet([coord in combo | coord[0]])) as bucketsUsed
WHERE bucketsUsed = size(input)
RETURN [coord in combo | input[coord[0]][coord[1]]] as combo
``````

As for what's going on in the last WITH clause, we're counting the number of unique buckets per combination (by collecting the buckets per combination, getting the distinct set, then getting the size of that resulting set) and then making sure that's exactly the number of buckets in the input (so for example, we aren't pulling `A1` and `A2` in the combo, since they're from the same bucket. We need exactly one entry from each bucket).