Playing Hard Mastermind Games with a SATbased AI
Back in the day, Mastermind was a popular two player codebreaking game, and many variations thereof still exist as both standalone games and puzzles within other games. Although it is difficult for a human player to make optimal guesses of the secret code, or at least guesses that do not conflict with the provided clues, the setting is usually simple enough for an AI to find such candidates via explicit exploration of the game tree.
However, such approaches become unfeasible when the number of possibilities for secret codes grows into the millions. This post illustrates the problem with standard approaches, and how finding consistent candidates can be approached with SAT solving – yielding an AI that can handle ordersofmagnitude harder Mastermind instances.
Introduction
Every now and then, my friends and I gather to play Mansions of Madness – a cooperative boardgame. In line with its Lovecraftian setting, the game is often unforgiving so we try to optimise our every move. Besides the core gameplay, solving various puzzles is often needed to advance. One of these puzzles is in fact a Mastermind variant, and, as some players argue, the instances may be rather difficult to reason about – resulting in wasted turns and decreased chance of winning the scenario. This is how the problem of making educated guesses in Mastermind piqued my interest.
Rules of Mastermind
In the classic Mastermind game, a secret code $s$ is a row of four pegs, where each peg is of one of six colours. For example given the colours $ \textcolor{#272822}● \textcolor{#f92672}● \textcolor{#a6e22e}● \textcolor{#f4bf75}● \textcolor{#66d9ef}● \textcolor{#ae81ff}● $, the following sequence would be a possible secret code:
$$ \tag{1} s=\textcolor{#a6e22e}●\textcolor{#272822}{●●}\textcolor{#f92672}{●} $$This secret is designed by and only known to the codemaker. In every round, the codebreaker makes a guess, and receives a pair of numbers as feedback/clue from the codemaker:
 The number of full matches states how many of the pegs coincide with the secret.
 The number of partial or symbol matches states how many additional full matches can be achieved by reordering the pegs.
For example let $s$ be the secret from $(1)$ and $g=\textcolor{#a6e22e}{●}\textcolor{#f92672}{●●}\textcolor{#ae81ff}{●}$ be the codebreaker’s guess. The codemaker’s feedback to that guess would be $(1, 1)$, since
 only one peg matches exactly (the green one in the first position)
 even though there are two pink pegs, only one of them can become a full match if put in the last position – we can’t put both there
The following code computes such feedback in two phases.
First, we get rid of all full matches from both the guess
and secret
, and then try to match the remaining symbols:


Example Playthrough
A full playthrough of classic Mastermind, i.e. with codes of length four, and six colours/symbols $\Sigma=\{ \textcolor{#272822}●, \textcolor{#f92672}●, \textcolor{#a6e22e}●, \textcolor{#f4bf75}●, \textcolor{#66d9ef}●, \textcolor{#ae81ff}● \}$ to choose from, may look as follows.
The game starts with the codemaker picking a secret $s\in\Sigma^4$. To keep things simple, let this be the familiar combination $s=\textcolor{#a6e22e}●\textcolor{#272822}{●●}\textcolor{#f92672}{●}$ from $(1)$. Without questioning the reasoning of the codebreaker, the following table outlines possible guesses and received feedback of every round.
Round  Guess  Feedback 

1  $\textcolor{#272822}●\textcolor{#ae81ff}{●●}\textcolor{#66d9ef}{●}$  $(0, 1)$ 
2  $\textcolor{#f92672}●\textcolor{#f4bf75}{●}\textcolor{#272822}{●●}$  $(1, 2)$ 
3  $\textcolor{#f4bf75}●\textcolor{#a6e22e}{●}\textcolor{#272822}{●}\textcolor{#f92672}{●}$  $(2, 1)$ 
4  $\textcolor{#a6e22e}●\textcolor{#272822}{●●}\textcolor{#f92672}{●}$  $(4, 0)$ 
This is in fact the log of an actual game played by the SATbased AI that we will end up with in the end of the post.
The corresponding game loop amounts to few lines of code.
It requires a potential player or AI to be able to give a first guess at the beginning of the game (initial_guess()
), and incorporate feedback for subsequent guesses (make_guess(feedback)
):


Where’s the Difficulty?
With four positions and six colours there are only $6^4=1\,296$ possible combinations to choose from, so keeping the set of all combinations in memory and working with it is perfectly feasible when solving classic Mastermind instances. However, there is nothing hindering us from using even longer sequences or introducing further colours. The code length $s$ and set of colours/symbols $\Sigma$ are exactly the parameters that are typically varied to arrive at the different variations of Mastermind.
For example, Grand Mastermind keeps the secret length of the original but uses $25$ symbols ($5$ shapes, $5$ colours) which allows for $25^4=390\,625$ different codes. Overall, most Mastermind variants feature a number of combinations in this order of magnitude (thousands), rendering them amenable to analysis via methods that explicitly explore the game tree. At the time of writing, the Wikipedia page lists only one exception: Mastermind Secret Search uses $26$ letters as symbols to form words up to a length of six characters, which gives rise to $26^6=308\,915\,776$ possible secrets.
But how to approach playing Mastermind variants with $6^{16}=2\,821\,109\,907\,456$ or more combinations without resorting to blind guessing?
Just to give you an idea: even iterating through all these combinations will take forever
and is out of question for a game AI.
Therefore, even when not aiming for optimal play but merely reasonable guesses, explicit and enumerative methods without backtracking or learning from conflicts won’t get us far.
However, they work well for most practical Mastermind variants and are a good place to start getting a feel for the problem domain.
Explicit Approaches
In contrast to symbolic methods for search and reachability checking, which work with implicit representations of state sets, explicit methods actually construct each state (here: secret codes) separately and store each in an accessible form in memory.
In the following implementations, we represent secrets as tuples of integers.
That is, the symbol set
$\Sigma=\{
\textcolor{#272822}●,
\textcolor{#f92672}●,
\textcolor{#a6e22e}●,
\textcolor{#f4bf75}●,
\textcolor{#66d9ef}●,
\textcolor{#ae81ff}●
\}$
is treated as $\Sigma=\{0,1,2,3,4,5\}$, and the secret $s=\textcolor{#a6e22e}●\textcolor{#272822}{●●}\textcolor{#f92672}{●}$ is just the tuple $s=(2,0,0,1)$ to us.
Given $s$ and $\Sigma$, we can now easily enumerate all possible secrets with Python’s product
function:


Refinement of Consistent Candidates
The simplest approach to playing Mastermind is probably to iterate over all_secrets()
and commit each element, since we must eventually encounter the code chosen by the codemaker.
However, this approach completely ignores the feedback received from the codemaker, and will obviously lead to many wasted rounds for the codebreaker.
The next best thing we can try is to avoid making guesses that are inconsistent with the accumulated feedback.
For this, we must merely check that whatever we assume secret
to be, it generates the same feedback for the previous guesses as the actual secret.
For every guessfeedback pair from previous rounds the following function must return $\mathit{true}$:


This is exactly the approach suggested by Shapiro in 1983, and a naïve implementation is provided with ExplicitConsistentAi
:


self.candidates
stores all candidates that are consistent with the received feedback.
Prior to the first guess there is no feedback, so initially all possible secrets are consistent.
Although the implementation always returns the first element, there is nothing wrong with picking a different one.
Unless a guess happens to match the secret exactly, we will receive feedback $f$ that renders some elements of self.candidates
inconsistent with $f$.
Therefore, after every round, such candidates and the last (wrong) guess are removed from self.candidates
.
Even though the approach does not result in optimal play, and many instances can be won in less turns, it is computationally cheap and solves a classic Mastermind instance in $5.765$ turns on average. This is surprisingly close to the theoretical minimum of $4.34$ turns.
Lazy Enumeration of Consistent Candidates
A downside of the above method is that it keeps a list of all consistent candidates in memory. Since the number of possible secrets grows exponentially in the admissible length of secrets, this may already hinder us from solving slightly harder Mastermind variations, e.g. with $6^8$ possible secrets.
LazyExplicitConsistentAi
is a more practical implementation of the previous approach.
It enumerates the consistent combinations but only needs one candidate to be explicitly present in memory at a time, which results in a small memory footprint:


Unlike the previous approach, which incrementally refines the set of candidates w.r.t. the latest feedback, a lazily generated candidate is checked w.r.t. all of the received feedback:


Minimaxbased Guessing
Since my original motivation for looking into the topic was to learn how to play the Mastermind games in Mansions of Madness optimally, I feel like I should outline how to do this, too. Even though the central topic of this post is the (symbolic) computation of consistent guesses for hard Mastermind instances.
Instead of just picking some consistent candidate, one can also analyse how promising the various candidates are and pick the best
one – depending on some heuristic or notion of quality.
The first and probably most popular way for picking a “good” candidate was suggested by Knuth in 1977. The general idea is that the best guess should minimise the set of consistent candidates – no matter the feedback. By assuming the least helpful feedback to be returned for each guess, and picking the (not necessarily consistent) guess in this setting that will yield the smallest set of consistent candidates, Knuth effectively implements a shallow Minimax rule. While a treelike illustration is most helpful for Minimax on longer games, I found the tableoriented argument from the overview to be more apt in the case of Mastermind.
Consider committing some candidate $c$ and, in turn, receiving some feedback $f$. The following table illustrates the possible outcomes depending on the chosen $c$ and the codemaker’s feedback $f$ for the very first guess:
$(0,0,0,0)$  $(0,0,0,1)$  $(0,0,1,0)$  $(0,0,1,1)$  $\dots$  

$(0,0)$  $625$  $256$  $256$  $256$  $\dots$ 
$(0,1)$  $0$  $308$  $308$  $256$  $\dots$ 
$(0,2)$  $0$  $61$  $61$  $96$  $\dots$ 
$(0,3)$  $0$  $0$  $0$  $16$  $\dots$ 
$(0,4)$  $0$  $0$  $0$  $1$  $\dots$ 
$(1,0)$  $500$  $317$  $317$  $256$  $\dots$ 
$(1,1)$  $0$  $156$  $156$  $208$  $\dots$ 
$(1,2)$  $0$  $27$  $27$  $36$  $\dots$ 
$(1,3)$  $0$  $0$  $0$  $0$  $\dots$ 
$(2,0)$  $150$  $123$  $123$  $114$  $\dots$ 
$(2,1)$  $0$  $24$  $24$  $32$  $\dots$ 
$(2,2)$  $0$  $3$  $3$  $4$  $\dots$ 
$(3,0)$  $20$  $20$  $20$  $20$  $\dots$ 
$(4,0)$  $1$  $1$  $1$  $1$  $\dots$ 
For each possible guess, there is a column which states the sizes of the resulting sets of consistent candidates for each possible feedback. To make it more clear, consider picking $c=(0,0,0,0)$ for the first guess. The worst that could happen next would be receiving the feedback $(0,0)$, since this would still leave us with a set of $625$ candidates that are consistent with the feedback. In contrast, the best feedback would clearly be $(4,0)$, as this would imply that our guess was equal to the secret code and we just won.
To develop the idea further, finding the best
guess amounts to constructing such a table, identifying the worst case of each column, and picking the column/candidate with the smallest worst case.
According to the provided table, a best initial guess would be $(0,0,1,1)$, which guarantees to bring the set of candidates down to a size of at most $256$.
ExplicitMinimaxAi
implements this method to determine the best guess:


Note that the best move may even be to gather further information and purposefully make a guess that is inconsistent with the received feedback.
However, if there is both a consistent and an inconsistent candidate that are best
, one should of course prefer the consistent one – it might match the secret after all.
That’s why we also keep the inconsistent candidates (self.not_candidate
) around.
The Minimaxbased approach solves any classic Mastermind instance within five guesses, and needs $4.476$ rounds on average. Unfortunately, the runtime complexity of this and similar methods that optimise rigorously is quadratic in the number of candidates. Considering that even iterating over all possible secrets of a hard Mastermind instance takes too long for a game AI, aiming for optimal guesses is quite a stretch.
SATbased Approach
So far we’ve seen that the Minimaxbased approach is wellsuited for Mastermind instances with small numbers of possible secrets, and that the lazy enumeration of consistent candidates is not optimal, but cheaper to compute, and on average not much worse. However, despite the small memory footprint, the core problem is that enumerative methods don’t scale to larger Mastermind instances.
It would be nice if we could at manage to compute consistent candidates for the hard instances, but even this problem is known to be NPcomplete. Luckily, we can reduce the problem to another wellstudied NPcomplete problem – the Boolean satisfiability problem (SAT) – and leverage the highlytuned solvers existing for it.
All we have to do is devise a logical characterisation of the candidates that are consistent with the feedback received so far and use a SAT solver to acquire such a candidate. Let’s see what kind of variables and constraints we need for this.
Encoding the Symbol Mapping
Since we want to guess some secret $s$, we first of all need variables that indicate which symbol occurs at which position. To this end, we introduce the Boolean variables $$ s_{\mathit{pos},\mathit{sym}} $$ where $\mathit{pos}\in \{0,\dots,s1\}, \mathit{sym}\in\Sigma$.
The intended semantics is that an assignment $s_{1,2}\mapsto\mathit{true}$ denotes symbol $2$ to be at index $1$ of $s$. However, since there are no constraints yet, a solver might just as well choose $s_{1,2}$ and $s_{1,3}$ to be $\mathit{true}$, which suggests both $2$ and $3$ to be at index $1$ of the secret.
To characterise that each position of the tuple $s$ has exactly one symbol, we need add an according cardinality constraint for each position. Assuming the classic Mastermind setting, i.e. $s=4$ and $\Sigma=6$, we add the constraints:
$$ \begin{aligned} s_{0,0} + s_{0,1} + s_{0,2} + s_{0,3} + s_{0,4} + s_{0,5} &= 1,\\ s_{1,0} + s_{1,1} + s_{1,2} + s_{1,3} + s_{1,4} + s_{1,5} &= 1,\\ s_{2,0} + s_{2,1} + s_{2,2} + s_{2,3} + s_{2,4} + s_{2,5} &= 1,\\ s_{3,0} + s_{3,1} + s_{3,2} + s_{3,3} + s_{3,4} + s_{3,5} &= 1\\ \end{aligned} $$If we were to query the satisfiability of our SAT instance now, we would get a satisfying assignment with four of the $s_{\mathit{pos},\mathit{sym}}$ variables set to $\mathit{true}$ – one for each $\mathit{pos}$.
Using the Python bindings for Z3 – a popular solver for SAT and SMT instances – the initialisation of SymbolicConsistentAi
creates the discussed variables and constraints:


Encoding Full Matches
Initially, there is no feedback to constrain the set of consistent candidates, and picking a random candidate is unproblematic. Any subsequent guess, however, should at least be consistent with the feedback accumulated up to that point. To characterise such candidates, we next consider which constraints some feedback $f$ for a committed candidate $c$ imposes.
Consider the first part of the feedback: the number of full matches. This is clearly another cardinality constraint, but this one does not constrain the $s_{\mathit{pos},\mathit{sym}}$ variables directly. Instead we need some auxiliary variables $$ \mathit{fm}_\mathit{pos} $$ where $\mathit{pos}\in \{0,\dots,s1\}$, to identify in which position we have a full match and count the number of matches. These are the variables for which we create the constraint. Assuming the classic Mastermind setting and some feedback $f=(2,1)$, we simply add $$ \mathit{fm_0} + \mathit{fm_1} + \mathit{fm_2} + \mathit{fm_3} = 2 $$
Although the $\mathit{fm}_\mathit{pos}$ variables are now constrained, they are not related to any $s_{\mathit{pos},\mathit{sym}}$ yet. Intuitively, a variable $\mathit{fm}_\mathit{pos}$ should be $\mathit{true}$, if and only if the symbol at index $\mathit{pos}$ of a committed candidate has the same symbol as the secret at index $\mathit{pos}$ – that’s what a full match is. With this in mind, and assuming the candidate to have been $c=(3,2,0,1)$, the following constraints establish the missing link:
$$ \begin{aligned} s_{0,3} &= \mathit{fm}_0,\\ s_{1,2} &= \mathit{fm}_1,\\ s_{2,0} &= \mathit{fm}_2,\\ s_{3,1} &= \mathit{fm}_3 \end{aligned} $$The first part of make_guess
mirrors the discussed idea, but introduces fresh variables $\mathit{fm}_\mathit{pos}$ in every iteration since the feedback of different rounds is unrelated:


Encoding Symbol Matches
What remains is the creation of constraints which enforce consistency w.r.t. the second part of the feedback: the number of symbol matches.
Unlike the full match flag
that we introduced for every position, a symbol match relates two positions – one from the guess with one from the secret.
To this end, we introduce the Boolean variables
$$
\mathit{sm}_{\mathit{src},\mathit{dst}}
$$
where $\mathit{src},\mathit{dst}\in \{0,\dots,s1\}$.
A true literal $\mathit{sm}_{2,3}$ shall denote that the symbol at index $2$ of the guess can be moved to index $3$ to achieve a full match.
With this in mind, we can easily construct an according cardinality constraint for some feedback $f=(2,1)$:
$$ \mathit{sm}_{0,1} +\mathit{sm}_{0,2} + \dots + \mathit{sm}_{3,2} = 1 $$Note that no constraint should have a variable of the form $\mathit{sm}_{\mathit{pos},\mathit{pos}}$, as such a case would actually describe a full match – not a symbol match.
To characterise the intended relation between the $\mathit{sm}_{\mathit{src},\mathit{dst}}$ variables and the $\mathit{s}_{\mathit{pos},\mathit{sym}}$ variables, we define a matching scheme that takes existing full matches into consideration and prioritises low indices. A variable $\mathit{sm}_{\mathit{src},\mathit{dst}}$ should be $\mathit{true}$, if and only if all of the following conditions are met:
 There is no full match at position $\mathit{src}$: $$ \neg \mathit{fm}_\mathit{src} $$
 The symbol $\mathit{sym}$ at index $\mathit{src}$ of the guess is the same as the symbol at index $\mathit{dst}$ of the secret: $$ s_{\mathit{dst},\mathit{sym}} $$
 There is no full match at position $\mathit{dst}$: $$ \neg \mathit{fm}_\mathit{dst} $$
 No smaller (higher priority) position has matched the symbol at $\mathit{dst}$: $$ \bigwedge_{\substack{0\leq\mathit{prev}<\mathit{src}\\ \mathit{prev} \neq \mathit{dst}}} \neg \mathit{sm}_{\mathit{prev},\mathit{dst}} $$
 The symbol at index $\mathit{src}$ has not matched any symbol on a smaller (higher priority) position: $$ \bigwedge_{\substack{0\leq\mathit{prev}<\mathit{dst}\\ \mathit{prev} \neq \mathit{src}}} \neg \mathit{sm}_{\mathit{src},\mathit{prev}} $$
The rest of the make_guess
method – started in the previous section – implements the proposed encoding:


Note that we treat the $\mathit{sm}_{\mathit{src},\mathit{dst}}$ variables similar to the feedbackspecific $\mathit{fm}_\mathit{pos}$ variables: we introduce fresh ones for every feedback.
Computing a Guess
We now have all the constraints together for the $s_{\mathit{pos},\mathit{sym}}$ variables to characterise the code sequences that are consistent with the received feedback. To actually compute a guess, we just need to find an assignment that satisfies these constraints. Luckily, we can delegate this task to a SAT solver.
To construct the guess, we identify the $s_{\mathit{pos},\mathit{sym}}$ variables that are set to $\mathit{true}$:


That’s all there is to it.
Feel free to experiment with the provided implementation of the discussed methods.
Using the SymbolicConsistentAi
you can now easily play Mastermind games with huge numbers of secret code combinations.
On average, it takes only $10.51$ guesses to find the secret within $2\,821\,109\,907\,456$ possible combinations of the classic Mastermind variant with codes of length $16$ (see experiments below).
Experiments
For reference, I’ve run some experiments to measure the performance on an i77700K CPU, using Python 3.8 and Z3 v4.8.7.
The following table provides both a qualitative and quantitative comparison of the lazy enumeration AI and the SATbased AI, with measurement cells formatted as (Explicit / Symbolic)
:
$s,\Sigma$  Games Played  Avg. Turns  Max. Turns  Avg. Game Time 

4, 6  all 1 296  5.765 / 4.659  9 / 7  2ms / 17ms 
5, 6  all 7 776  6.218 / 5.100  11 / 8  11ms / 34ms 
6, 6  all 46 656  6.735 / 5.572  12 / 9  71ms / 60ms 
7, 6  10 000  7.301 / 6.072  11 / 9  466ms / 101ms 
8, 6  1 000  7.874 / 6.560  12 / 10  3s / 161ms 
9, 6  10 000  ? / 7.237  ? / 11  ? / 263ms 
10, 6  10 000  ? / 7.901  ? / 12  ? / 413ms 
16, 6  1 000  ? / 10.51  ? / 15  ? / 6.6s 
As to be expected, the lazy enumeration works well for Mastermind variants with small numbers of possible secrets but becomes unusably slow on harder instances.
Such measurements were aborted and are denoted by ?
in the table.
Furthermore, once the average game times went into the seconds, I switched from evaluating the performance for each possible secret to a fixed number of secrets that are evenly spaced over all possible combinations.
Accordingly, for those nonexhaustive runs, the measured maximal number of turns is not a guaranteed worst case for all instances.
Although both approaches implement the same idea, i.e. make guesses that are consistent with the feedback, the qualitative results differ. This is because our implementation of Shapiro’s idea picks the smallest consistent candidate but the SATbased approach picks some consistent candidate. The latter seems to be advantageous.
Do Try This at Home!
As usual, there is a lot of room for further improvement and experimentation. Here are some points you might want to explore on your own (easiest first):
 There are many low hanging fruits regarding performance improvements. Python is convenient for prototyping but its interpreted nature slows down the execution. Try alternative Python implementations like Pypy to get a significant speedup of the explicit approaches for free, or port the approaches to your favorite compiled language. Furthermore, even though Z3 has tons of features and is a joy to use, I expect dedicated SAT solvers like Lingeling to be even faster at solving the resulting constraints.
 Try to devise a SAT/SMTbased solver for a Mastermind variation with different forms of feedback. For example, Number Mastermind is played with numbers and the secret’s sum is an additional clue.
 The proposed logical characterisation is just what seemed most straightforward to me. Try reformulating the encoding to achieve even better solver performance – I might have overlooked a thing or two.
 Investigate whether it is feasible to implement the Minimaxstyle approach (or some other heuristic) symbolically. Depending on the chosen approach, it might help to look into MAXSAT or (approximate) model counting.