Programmers: Is this puzzle a "named problem?" and can you help me with it?

Asked by gorillapaws (31099

) January 22nd, 2019

I’m attempting to write a function that takes as an input an array of key-value pairs of arbitrary number, with the keys being strings and the values being an array of strings for each key, also of arbitrary number. e.g.

[“FOO” : [“a”, “b”, “c”, “d”…],
“BAR” : [“X”, “Y”, “Z”, “Q”, …],
“BAZ” : [“3”, “7”, “13”, “42”, “0”, …], …]

The output would be an array of arrays of key-value pairs (aka an array of dictionaries) that represent all possible combinations of keys/values from the input. For example:

[[“FOO” : “a”, “BAR” : “X”, “BAZ” : “3”],
[“FOO” : “b”, “BAR” : “X”, “BAZ” : “3”],
[“FOO” : “c”, “BAR” : “X”, “BAZ” : “3”],
[“FOO” : “d”, “BAR” : “X”, “BAZ” : “3”],
[“FOO” : “a”, “BAR” : “Y”, “BAZ” : “3”], …]

One way to think of the problem is you have a briefcase with an arbitrary number of tumbler locks that each have an arbitrary number of symbols on each tumbler, write a function that will produce an exhaustive list of all possible combinations to the locks.

Is this puzzle a “named” problem like the “traveling salesman” problem?

When trying to think of the solution, I was considering using a tree structure to model the data with the number of entries in the array for each key representing branching width and each key representing depth of the tree, and then using recursion to walk up the tree and assemble the solution arrays. My solution started to become a lot more complicated though and I suspect there’s a much simpler approach to this that I’m not seeing.

Thanks for any help you can offer!

Observing members: 0

Composing members: 0

16 Answers

I’d call it “enumerating a list of all combinations”. There’s a field of math called “combinatorics” which I did not study but I assume probably has a word for this.

It doesn’t seem like a puzzle so much as a task.

It quickly becomes a massive amount of work with massive output if you have more and more values, because they all multiply together to get the number of possible combinations.

i.e. nCombinations = nFOOs x nBARs x nBAZs…

Your approach sounds fundamentally correct. The tree analogy is a nice visual metaphor, but I think it will be simpler to program it without thinking of it.

That is, I think you can “just” do a recursion. Make a function that takes such a list as input, and what it does is goes through every value for the first key, writes the key and each value in turn, followed by the output of a call to that same function, called with a list of all the key/values except the first one that is being iterated by the current execution. ...

Er… I suppose you also need to take a parameter for the previous text to repeat for each iteration, and tweak it to get the output format you want, but that’s essentially it, I think.

Zaku (30810

)“Great Answer” (3

) Flag as…

I am sure problems like this have been solved, but I can’t think of a specific name. As @Zaku says, it is a problem in the field of combinatorics.

@Zaku ‘s approach is probably the simplest way to tackle the problem. Are you familiar with recursion? You have a function with a parameter that is the index of the outer array. The function loops through its string array entry and after adding an entry, it fills in the rest of the output row by calling itself with the input parameter bumped by 1. When the input parameter goes past the end of the outer array, add the entry to the output or just write it out.

The only problem with recursion is that it is inefficient. It takes both time and memory for each level of function call. If the array is really large that could pose a problem. Still, if you don’t use recursion, you still need to keep track of state, which would mean keeping an index pointer for each entry in the outer array. You could, for example, initialize all these pointers to 1 and then take entries from the last array entry until you got to the end. Then increment the next to last index and set the last index to 1 and so on. The inefficiency in this case is keeping track of which pointers at the back of the array have gone to the end.

LostInParadise (32320

)“Great Answer” (3

) Flag as…

Yes. Another description of what @LostInParadise wrote (in case that’s helpful) might be:

Get a list of the number of values for each key.

Then track a cursor position through all the combinations of that map, so that you don’t have the overhead of recursion. (You do have to write the navigation, but it’s not that complicated.)

Zaku (30810

)“Great Answer” (1

) Flag as…

Thanks for the help guys.

So should I be building each complete combination one at a time and then appending them to my output, or generate all combinations in an incomplete state and then update them all through each pass while trying to keep track of where I am? Is this an O^n problem?

Also, is this a problem that could be addressed with matrices, or is that not really the correct way to approach it?

gorillapaws (31099

)“Great Answer” (0

) Flag as…

I would suggest you do the output as you go, so you only do one pass, and so you don’t accumulate a huge amount of data before outputting it if you get passed a big input.

Either recursion or cursor navigation should let you naturally do the output as you go, since you are just walking through the combinations – at each combination you hit, the data should already be there for what you would write to output.

The list of numbers for cursor navigation is just an efficient description of your irregular matrix of input values. I don’t think you need any more matrix that the input itself already is.

I can describe the pseudocode for the matrix navigation if you like – it’s pretty simple but maybe you’d enjoy figuring it out yourself?

Zaku (30810

)“Great Answer” (1

) Flag as…

Thanks Zaku. Thinking of it as moving a cursor (technically multiple cursors) is so much easier than building nodes with parents and children, roots and leaves, etc. Plus the data structure can remain compact.

gorillapaws (31099

)“Great Answer” (0

) Flag as…

It seems most natural to return one row at a time. That way the caller can decide how to process the output. This is how I would handle the output in Python. I would have a shell function to contain the function that does the work. The shell program could be done without parameters, or to make it more general purpose, you could have the input as a parameter.

Suppose the inner function is f and we use the recursive approach. The shell function would have a single array for the output, whose length would of course be the length of the outer input array.
output = [None]*len(input)

Then it would be:
for _ in f(0):
....yield output

The function f(level) would be:
...if level > length(output):
.......yield
else:
...for i in range(length(input[level])):
......output[level] = input[level][i]
..... for _ in f(level+1):
.. .....yield

LostInParadise (32320

)“Great Answer” (1

) Flag as…

Correction: f(level) should start
if level == length(output):
...yield

LostInParadise (32320

)“Great Answer” (1

) Flag as…

SQL can produce the output with a single query, cross joining the 3 lists (FOO, BAR, BAZ)

— Create a table @t with two columns (keys and vals)
declare @t table (keys varchar(3), vals varchar(2))

—Populate table
insert @t
select ‘BAR’ keys, ‘X’ vals
union all select ‘BAR’ keys, ‘Y’ vals
union all select ‘BAR’ keys, ‘Z’ vals
union all select ‘BAR’ keys, ‘Q’ vals
union all select ‘BAZ’ keys, ‘3’ vals
union all select ‘BAZ’ keys, ‘7’ vals
union all select ‘BAZ’ keys, ‘13’ vals
union all select ‘BAZ’ keys, ‘42’ vals
union all select ‘BAZ’ keys, ‘0’ vals
union all select ‘FOO’ keys, ‘a’ vals
union all select ‘FOO’ keys, ‘b’ vals
union all select ‘FOO’ keys, ‘c’ vals
union all select ‘FOO’ keys, ‘d’ vals

—Cross join BAR rows, BAZ rows and FOO rows
select distinct foos.keys fookey, foos.vals fooval, bars.keys barkey, bars.vals barval, bazs.keys bazkey, bazs.vals bazval
from
(select keys, vals from @t where keys = ‘FOO’) as foos
cross join
(select keys, vals from @t where keys = ‘BAR’) as bars
cross join
(select keys, vals from @t where keys = ‘BAZ’) as bazs

Here are the first 10 results (of 80 total):

fookey fooval barkey barval bazkey bazval
FOO a BAR Q BAZ 0
FOO a BAR Q BAZ 13
FOO a BAR Q BAZ 3
FOO a BAR Q BAZ 42
FOO a BAR Q BAZ 7
FOO a BAR X BAZ 0
FOO a BAR X BAZ 13
FOO a BAR X BAZ 3
FOO a BAR X BAZ 42
FOO a BAR X BAZ 7

Call_Me_Jay (14079

)“Great Answer” (1

) Flag as…

For future reference and anyone finding this question in a search, the problem I’m solving is calculating the “Cartesian Product of any number of sets.” Thanks @Call_Me_Jay for the SQL reference. This is a rather nice approach to the problem here.

gorillapaws (31099

)“Great Answer” (1

) Flag as…

Yep, that’s the recursive method.

Zaku (30810

)“Great Answer” (0

) Flag as…

@Zaku, It is not recursive.
I don’t know why I did not think of Cartesian product.
@gorillapaws, It is a nice compact solution. The only problem is that it gives the full result instead of one line at a time, which is okay if that is what you want and have sufficient memory to hold everything.

LostInParadise (32320

)“Great Answer” (1

) Flag as…

@LostInParadise Oh, right, I misread the function Cartesian as calling itself.

Zaku (30810

)“Great Answer” (0

) Flag as…

As a programming exercise, I wanted to see how compactly I could write the program @gorillapaws listed, as a recursive program. I can do it in 7 lines of code,not including the input and call to the function. Probably not very efficient.

input =[[1,2,3],[‘a’,‘b’],[4,5,6]]
def cartesian(n):
......result =[]
......if n ==1:.
............return [[i] for i in input[0]]
......for x in cartesian(n-1):
............for y in input[n-1]:
..................result.append(x+[y])
......return result

print(cartesian(3))

LostInParadise (32320

)“Great Answer” (2

) Flag as…

Technically, the program can be written with 3 lines of code, but the last line is a whopper.

def cartesian(n):
......if n ==1:
..........return [[i] for i in input[0]]
......return [x+[y] for x in cartesian(n-1) for y in input[n-1]]

LostInParadise (32320

)“Great Answer” (1

) Flag as…

This kept bugging me. I figured there must be a simple way of doing this in a single function without recursion. This is what I came up with. I promise this the end of it.

def cartesian(input):
......result=[[]]
......for table in input:
............newResult=[]
............for value in table:
..................for row in result:
........................rowCopy = list(row)
........................rowCopy.append(value)
........................newResult.append(rowCopy)
............result = newResult
......return result

LostInParadise (32320

)“Great Answer” (1

) Flag as…