Cartesian product from array

hagen.ivar · October 8, 2018, 7:39am

Hi!

I have an array of unknown length that I want to create the cartesian product for. In the example I have three arrays with 3, 1 and 2 elements each. So I use three UNWINDs, but can this be done for X number of elements in the array with one statement?


WITH [["A1","A2","A3"],["B1"],["C1","C2"]] AS input
UNWIND input[0] AS uno

UNWIND input[1] AS dos

UNWIND input[2] AS tres

WITH [uno,dos,tres] AS cartesian

RETURN *

The result is :
cartesian ["A1", "B1", "C1"] ["A1", "B1", "C2"] ["A2", "B1", "C1"] ["A2", "B1", "C2"] ["A3", "B1", "C1"] ["A3", "B1", "C2"

michael.hunger · October 9, 2018, 10:06pm

Probably not trivially, you could generate the Cypher from Cypher and use apoc.cypher.run()

I was thinking about using reduce instead of UNWIND but then realized that it doesn't give you arbitrary stack depth.

andrew_bowman · October 9, 2018, 11:28pm

This is a tough one, to do this properly we'd probably need to create a new function to handle this.

You can do this (though not as efficiently) by utilizing apoc.coll.combinations(), which produces all possible combinations, given a collection of things and the number of items desired in each combination. This is currently designed to work with a 1-dimensional list, not 2-dimensional as in your use case, so this will produce many unnecessary results that we'll need to filter out (as we never want more than one entry from each bucket).

If we use apoc.coll.combinations() on the possible buckets (1st level indexes) (where each entry is an [bucket, subIndex] pair), and filter on those so that we only keep groupings where each bucket in the combinations grouping is unique, then we'll have the coordinates we need for the proper results, and we can grab them from the input list.

Try this out, this should work with any 2-dimensional list of inputs, though it may grow slower with the number of values:

WITH [["A1","A2","A3"],["B1"],["C1","C2"],["d1","d2","d3","d4"]] as input // parameterize when done testing
UNWIND range(0, size(input)-1) as bucket
UNWIND range(0, size(input[bucket])-1) as subIndex
WITH input, collect([bucket, subIndex]) as coords
WITH input, apoc.coll.combinations(coords, size(input)) as combos
UNWIND combos as combo
WITH input, combo, size(apoc.coll.toSet([coord in combo | coord[0]])) as bucketsUsed
WHERE bucketsUsed = size(input)
RETURN [coord in combo | input[coord[0]][coord[1]]] as combo

As for what's going on in the last WITH clause, we're counting the number of unique buckets per combination (by collecting the buckets per combination, getting the distinct set, then getting the size of that resulting set) and then making sure that's exactly the number of buckets in the input (so for example, we aren't pulling A1 and A2 in the combo, since they're from the same bucket. We need exactly one entry from each bucket).

Topic		Replies	Views
Trying to create Cartesian product Cypher cypher	13	147	November 20, 2024
Cartesian results General migrated	4	129	October 21, 2022
Avoid cartesian product when create relationships Cypher cypher	10	5159	July 18, 2024
Distinct for sub-arrays Cypher	4	296	December 11, 2020
Cartesian Product of Paths Produces Unexpected Result Cypher	2	513	December 22, 2020

Take the Course Then Join The Aura Agent Hackathon

Cartesian product from array

Related topics

Take the Course Then Join
The Aura Agent Hackathon