A Path to a New Sudoku Algorithm

08/11/2021

Warning: This one's a pretty long read. I might add a table of contents to my blog generator one day, but don't hold out hope.
Sorry.

You're all familiar with Sudoku, the puzzle game? You're given a partially filled in grid of numbers, something like this:

+-----+-+-----+-+-----+
|     | |9    | |2 1 8|
|  7  | |    4| |     |
|2    | |  3  | |  4  |
+-----+-+-----+-+-----+
|9 2  | |     | |8    |
|  1  | |  6  | |  7  |
|    8| |     | |  9 4|
+-----+-+-----+-+-----+
|  3  | |  5  | |    9|
|     | |7    | |  2  |
|5 9 4| |    2| |     |
+-----+-+-----+-+-----+

And then you need to turn it into a fully filled in grid of numbers, like this:

+-----+-+-----+-+-----+
|3 4 5| |9 7 6| |2 1 8|
|8 7 9| |2 1 4| |3 5 6|
|2 6 1| |5 3 8| |9 4 7|
+-----+-+-----+-+-----+
|9 2 7| |1 4 5| |8 6 3|
|4 1 3| |8 6 9| |5 7 2|
|6 5 8| |3 2 7| |1 9 4|
+-----+-+-----+-+-----+
|7 3 2| |4 5 1| |6 8 9|
|1 8 6| |7 9 3| |4 2 5|
|5 9 4| |6 8 2| |7 3 1|
+-----+-+-----+-+-----+

Where the filled-in grid follows a few rules.

Each 3×3 box in the Sudoku needs to contain the digits 1–9;
Each row in the Sudoku needs to contain the digits 1–9; and
Each column in the Sudoku needs to contain the digits 1–9.

Sudoku is a very popular puzzle game, and a lot of very smart people have come up with algorithms to solve Sudokus either by hand or programmatically. When I decided I wanted to make my own solver, I started looking into those algorithms a lot. I nearly ended up just re-implementing an existing algorithm, but instead I ended up having a very interesting idea. I can't say that the idea I had was a particularly good one, and I haven't noticed anything spectacular in the performance of the solver that it led to, but it was a fun adventure.

What I'm about to describe isn't something I've seen before in other solvers, and so I hope that means I've come up with something unique. However, Sudoku solving algorithms are hardly a small or unexplored field, so it's quite possible that everything I'm about to do has already been done by someone else. If you can help direct me to a blog post or repo or person that's done this before, by all means go ahead, I'd be keen to see it.

Mainstream Solvers

Before we get into what I came up with, I want to quickly summarise how regular Sudoku solving algorithms work. This is partly so I can write some comparisons with them later⁰, but it's also because I did a lot of research into this before swapping over to my current algorithm and I don't want that knowledge to go to waste. So with that said, and at risk of sounding a lot like my algorithms lecturer from uni, there are two key facts about Sudoku as a puzzle that influence how most solvers are designed.

Sudoku (with an N×N grid, instead of 9×9) fits into the "Nondeterministic Polynomial" ( $NP$ ) complexity class of problems. That is to say, if you had a computer with infinite CPU cores¹, you could solve a Sudoku puzzle in polynomial time.
Any other $NP$ problem can be transformed into a Sudoku (again the N×N variant) and back again in polynomial time, with the solution to the Sudoku being transformed into a solution for the original problem.

The first of these properties is reasonably simple, at least compared to the other — you could write a polynomial time algorithm to see if an assignment of digits is valid, and then just get each core to pick a different permutation of the 81 digits to test, reporting back when it's done.

The second property is incredibly complex. There is a small group of problems that have been discovered with this property, but it's only ever been directly proven for one problem (that I know of). Everything else, including N×N Sudoku, is shown to have this property transitively: you can reduce any $NP$ problem to $x$ , and you can reduce $x$ to an N×N Sudoku, therefore you can reduce any $NP$ problem to an N×N Sudoku.

If you know both of these properties about a problem, it gives you a shortcut solution that you can use instead of writing your own algorithm. A problem with both of these properties is called " $NP$ Complete" and any $NP$ Complete problem can be solved by transforming it into a different $NP$ Complete problem without any real performance loss.

Property 1 of Sudoku, combined with property 2 of our target $NP$ Complete problem, shows that the transformation is possible in polynomial time.
Property 2 of Sudoku, combined with property 1 of our target $NP$ Complete problem, shows that the solution for our target problem is most likely at least as fast as the solution for any direct Sudoku algorithm, as you could always solve it by transforming it back at only polynomial cost.
The unproved yet quite likely hypothesis that $P != NP$ implies that the actual cost of solving the $NP$ Complete problem is way higher than the cost of the polynomial transformation between the problems, so the overhead isn't worth considering.

And all of this combines with the fact that the poster child of $NP$ Complete problems, Boolean Satisfiability (SAT), has been solved by dozens of hyper-optimised engines that are almost always going to be faster than anything you write for yourself, and you're left with a neat solution to any $NP$ Complete problem you're stuck with. You transform it into a SAT problem, you run the off-the-shelf SAT solver on it, and you transform your solution back.

And as far as I can tell, this is how most Sudoku-solving algorithms work today. There's some caveats where you might try to write a custom SAT solver that's more tuned to the exact type of SAT problems your Sudoku turns into, but it's all essentially the same underlying SAT algorithms underneath.

The First Breakthrough

So for a while, I was on a nice trajectory from reading how SAT algorithms work to implementing my own custom SAT solver. It was going to be a nice learning experience, and I'd have a shiny project to put on my GitHub. Where did it all go wrong?

What happened is, I was browsing the SAT-solver algorithm / Sudoku corner of Wikipedia, when I stumbled across a lovely page: Mathematics of Sudoku — Solutions. The discussion here is around the number of different ways to fill out a Sudoku grid. The first number they give is $6.67 \times 10^{21}$ , which is ridiculous. But later down on the page, they give a number for how many "essentially different" Sudoku solutions are available (essentially different meaning that isn't just another Sudoku with some rows swapped around or something like that). And that number is just under five and a half billion. This is where the breakthrough happened, and the downward spiral began. Because, when I saw that number on my screen, I had a wonderful thought:

That sounds like it would fit in ram!

Now there are some caveats here: you'd need a lot of ram. Unfortunately my first estimate of "5.5 billion solutions" is roughly "5.5 GigaBytes" was way off, because you'd need more than one byte per Sudoku puzzle. There's also generally some overhead involved with indexing the data for fast lookups, but the bottom line is still the same. It might not fit in the ram on my computer, but the GCP instance I'd need to hire to fit it all in RAM would probably not break the bank if I didn't leave it running 24/7. I started coming up with a rough algorithm:

Parse the Sudoku;
Perform a series of flips, row/column swaps and number-reassignments to get the board into a "canonical representation" that matches my database;
Perform a (hopefully) $\log{N}$ time lookup through the database to see which puzzle solution matched the hints I was provided with;
Apply the reverse of the transformations I did in step 2 to the canonical-form solution; and finally
Print the answer.

When you write it out like that, it honestly seems pretty simple. There are only two teeny tiny issues that I completely underestimated.

Indexing the solution database for fast lookups
Inventing a canonical representation of a solved Sudoku that I could transform partial puzzles into.

Everything Going Wrong

I promise, I was going to write something here. But by the time I got to the end of the blog post and was ready to come back to it, everything was already way too long. In summary, I couldn't find a solution to either problem. I also almost think it might not be possible to solve the second problem, but I'm not sure there.

Moving on…

A Path Forward

I went back to the drawing board, and started playing with some easy Sudokus to clear my mind while I thought. And as I was playing, I realised that the way I looked at Sudokus while I was playing was different to how I was thinking about them while writing code. I came up with a (very preliminary) brand new idea that would fit a few fun criteria that I was looking for in my algorithm:

Still involve generating a giant database in memory and searching it;
Stay close to how I think about Sudokus in my head while solving them; and
Not be too complicated to implement (or hopefully optimise).

And it all revolves around a concept of Paths (made-up term, because I couldn't find anything online of people taking the same approach as me). So let's get into this, with a small example from the easy-level Sudoku I was working on at the time.

+-----+-+-----+-+-----+
|  3  | |    4| |  7 8|
|     | |9 3  | |1    |
|2    | |1    | |  9 3|
+-----+-+-----+-+-----+
|3   9| |4    | |    7|
|  1  | |3 8 9| |5 4  |
|  2 8| |     | |  3  |
+-----+-+-----+-+-----+
|  9 7| |8 4 3| |    6|
|6 4 3| |5   1| |7    |
|  8  | |6    | |3 5  |
+-----+-+-----+-+-----+

At this point I've solved all the 3s, and I want to get started on the 1s. Let's take some of the clutter away from the board for a second, and have a new look.

+-----+-+-----+-+-----+
|? x ?| |    x| |  x x|
|     | |x x  | |1    |
|x    | |1    | |  x x|
+-----+-+-----+-+-----+
|x   x| |x ?  | |  ? x|
|  1  | |x x x| |x x  |
|  x x| |  ?  | |  x ?|
+-----+-+-----+-+-----+
|? x x| |x x x| |  ? x|
|x x x| |x   1| |x    |
|? x ?| |x    | |x x ?|
+-----+-+-----+-+-----+

I've put a 1 in all the squares that contain a 1, an x in all the squares that contain a 2-9, and a ? in all the squares that appear legally able to contain a 1 at this point in time. There are eleven question marks on the board so far, and of those five need to become 1s before the puzzle is completed. Naively, this is looking like (checks combinatorics notes that I haven't used in many years) 462 possible ways of filling in the question marks.

By thinking in Paths, we can reduce this to just three possibilities. I'm going to start by zooming in on the middle box, and the box immediately to its right.

+-----+-+-----+
|x ?  | |  ? x|
|x x x| |x x  |
|  ?  | |  x ?|
+-----+-+-----+

On their own, these two boxes have four question marks, and need two 1s to be complete. The maths says there's six ways this could go down, but practically you can look at the Sudoku puzzle and narrow it down to two possibilities. I'll redraw the two boxes using L to show where the 1s would go for one of the possibilities, and using R to show where they go for the other possibility.

+-----+-+-----+
|x R  | |  L x|
|x x x| |x x  |
|  L  | |  x R|
+-----+-+-----+

Any other way of choosing how to allocate the 1s aside from these two would lead to conflicts. Now that we know that the middle box depends entirely on the box to its right (or vice versa) I'm going to hide the middle box for a while, and focus on the center-right box and the bottom-right box.

+-----+
|  L x|
|x x  |
|  x R|
+-----+
|  ? x|
|x    |
|x x ?|
+-----+

The two possibilities we had from our previous exercise carry across here: we can assign L and R to the question marks in this new box as well. The problem of selecting three 1s from six ?s is ultimately still down to a binary choice between the L assignment and the R assignment.

+-----+-+-----+
|x L  | |  R x|
|x x x| |x x  |
|  R  | |  x L|
+-----+-+-----+
        |  L x|
        |x    |
        |x x R|
        +-----+

Once you add in the next connected box, the one on the bottom left (which shares rows with the bottom-right box, that you can use to cancel things out), things get more complicated. We go from two valid assignments of 1s among the question marks to three, and some of the assignments overlap, so I'm not going to be able to draw them in the same diagram any more. However, that makes this a good time to swap from talking about assignments to talking about Paths.

What is a Path?

A Path is a series of nine locations across the Sudoku grid, such that one location from the path is present in each row, column and box of the overall grid. In a solved Sudoku, you could assign one path to each digit, making the locations in the path the locations on the grid that that digit is present. Any set of nine paths that don't overlap with each other, and therefore fill up the board, would constitute a solution to the Sudoku.

Here's an example Path, which is the Path that I filled with 3s when I first started solving the example Sudoku. The exclamation marks show which locations in the grid are part of the Path.

+-----+-+-----+-+-----+
|  !  | |     | |     |
|     | |  !  | |     |
|     | |     | |    !|
+-----+-+-----+-+-----+
|!    | |     | |     |
|     | |!    | |     |
|     | |     | |  !  |
+-----+-+-----+-+-----+
|     | |    !| |     |
|    !| |     | |     |
|     | |     | |!    |
+-----+-+-----+-+-----+

My next breakthrough, and what actually led to the workable solver that I have now, was deciding to throw away the "fill the grid with numbers" problem and replace it with a "find 9 Paths that don't overlap" problem. And that brings us to coding.

Feasibility of Paths
I did stop to think, before diving straight into coding, about whether it would be feasible to solve path-based problems. I had an idea that my algorithm would probably involve looking at all possible different paths, so I did a quick calculation to see how many there were. I figured that any given Path would need one spot in each row, and those spots would need to be arranged so they each fell into a different column. The actual number is lower than this, because not all permuations are valid, but there are $9!$ ways to permutate the "put each position into a different column" problem, which means we have an upper bound of 362,880 different Paths. Definitely a small enough number to store in memory or loop through, so we move on.

Representing States (the Glue Code)

Before I could get hacking on anything like an algorithm, I needed some glue code and a bit of a foundation to build with. I started with these two abstractions:

A Bitfield abstraction to make all my "does this intersect" maths faster, and give me a compact way of storing things in-memory.
A generate_paths() function which lists all possible Paths through a Sudoku, because I'm going to need it sooner or later surely.

The code for these two steps is here. In particular look at src/bitfield.rs and src/path.rs. Fun fact, I did the testing and the upper bound of 362,880 valid Paths from before was very conservative. The actual number is only 46,656.

And then after that I needed a way to build and represent a board. A completed Sudoku could be represented as 9 Paths, one for each digit, but an incomplete Sudoku I don't have enough information to do that. Instead, I decided to represent it as 9 Bitfields — the idea being that I could turn them into Paths (or lists of potential Paths) easily enough with some bitwise arithmetic.

You can check out the board representation here. The important thing (to me) is that I can write code like this:

// Find a list of candidates for the Path that the digit 3 follows through the board
let potential_three_paths = generate_paths()
    .filter(|&path| path.contains(board[Digit::_3]))
    .filter(|&path| (path & board[Digit::_1]).is_empty())
    .filter(|&path| (path & board[Digit::_2]).is_empty())
    .filter(|&path| (path & board[Digit::_4]).is_empty())
    .filter(|&path| (path & board[Digit::_5]).is_empty())
    // and so on
    .filter(|&path| (path & board[Digit::_9]).is_empty())
    .collect::<Vec<_>>();

Or this:

// We found a new cell to fill in
board[Digit::_3] |= Bitfield::new(5, 5);

And I'm hopeful that those two operations are all I'll really need to write a Sudoku solver. Oh, I also have helper code to parse and pretty-print boards, which is pretty important. So let's jump right in and start solving!

Some Experimentation with Real Sudokus

We're at the point where I'm gonna need to start looking at concrete Sudokus now, and so I've put together a small list of boards to test with. Here they are:

I've got a Sudoku app on my phone. It's called Sudoku Master Edition and it's available here. Most of the puzzles I've started testing with came straight from the generator in this app. This one comes from the "Easy" difficulty.

+-----+-+-----+-+-----+
|2   3| |    5| |8   1|
|  4 6| |     | |  3 2|
|1 5  | |    4| |7    |
+-----+-+-----+-+-----+
|    1| |8 4 7| |9 5  |
|    5| |  1  | |  2  |
|4    | |  2 3| |     |
+-----+-+-----+-+-----+
|     | |     | |1   5|
|     | |  6 9| |     |
|9 8  | |4 5  | |  7  |
+-----+-+-----+-+-----+

Moderate Sudoku

Same app, next difficulty up: "Moderate". This is the difficulty that I spent most of my hours grinding away Sudokus at when I was first getting into the game.

+-----+-+-----+-+-----+
|5    | |8   7| |2    |
|     | |9 2  | |6    |
|7 4 2| |     | |  8  |
+-----+-+-----+-+-----+
|     | |  9 5| |    6|
|9 3 5| |7 6  | |    2|
|1 7  | |4   2| |    8|
+-----+-+-----+-+-----+
|     | |     | |     |
|2 9  | |     | |4    |
|3   1| |    6| |     |
+-----+-+-----+-+-----+

Master Sudoku

This is the highest difficulty offered by the app. I skipped a few difficulties in the middle here, as I'm trying to keep the dataset down to just a few Sudokus so I can look at results by hand.

+-----+-+-----+-+-----+
|    6| |2    | |    9|
|  4  | |6 7  | |2   3|
|     | |    9| |    4|
+-----+-+-----+-+-----+
|4    | |     | |  2  |
|    5| |  8  | |1    |
|  7  | |     | |    6|
+-----+-+-----+-+-----+
|2    | |9    | |     |
|6   1| |  3 4| |  7  |
|5    | |    6| |4    |
+-----+-+-----+-+-----+

Now that I've got these puzzles, I wanted to get a feel for how helpful the Path approach actually is. Using all the glue code from before, I put together this little script:

fn analyse_board(board: &Board) {
    println!("{}", board);
    let total_clues = Digit::iter()
        .map(|digit| board[digit])
        .fold(Bitfield::default(), BitOr::bitor);

    for digit in Digit::iter() {
        let our_clues = board[digit];
        let opposing_clues = total_clues & !our_clues;

        let possible_paths = sudoku::generate_paths()
            .filter(|&path| path.contains(our_clues))
            .filter(|&path| (path & opposing_clues).is_empty())
            .count();

        println!("There are {} possible Paths for {}", possible_paths, digit)
    }
}

And so let's check out how many possible Paths there are for each digit across the sample Sudokus.

Easy Sudoku:

1 possible Paths for 1
3 possible Paths for 2
18 possible Paths for 3
2 possible Paths for 4
1 possible Paths for 5
12 possible Paths for 6
15 possible Paths for 7
5 possible Paths for 8
1 possible Paths for 9

Moderate Sudoku:

55 possible Paths for 1
2 possible Paths for 2
55 possible Paths for 3
2 possible Paths for 4
20 possible Paths for 5
1 possible Paths for 6
11 possible Paths for 7
12 possible Paths for 8
3 possible Paths for 9

Master Sudoku:

42 possible Paths for 1
2 possible Paths for 2
56 possible Paths for 3
2 possible Paths for 4
42 possible Paths for 5
2 possible Paths for 6
8 possible Paths for 7
154 possible Paths for 8
8 possible Paths for 9

All in all, this is awesome! The worst scenario is 154 possible paths for 8, in the master difficulty Sudoku — but looping over all 154 times… and looping over the total number of Paths for all the other digits… is still only… okay it's around seven billion iterations. Still, that means that the master Sudoku is gonna get solved in a handful of seconds, which is way faster than I can solve it by hand.

Just to confirm that this hypothesis that I can loop through possible path combinations to solve a Sudoku, I wrote up the following little solver:

fn solver_helper(mask: Bitfield, paths: &[Vec<Bitfield>]) -> Option<Vec<Bitfield>> {
    match paths {
        [] => panic!(),
        [one] => {
            let &choice = one.iter().find(|&&path| (mask & path).is_empty())?;
            Some(vec![choice])
        },
        [head, tail @ ..] => {
            head.iter().filter(|&&path| (mask & path).is_empty()).find_map(|&path| {
                let mut tail_solution = solver_helper(mask | path, tail)?;
                tail_solution.insert(0, path);
                Some(tail_solution)
            })
        }
    }
}

let solution = solver_helper(Bitfield::default(), &possible_paths_per_digit).unwrap();
for (digit, bits) in Digit::iter().zip(solution) {
    board[digit] = bits;
}

And then I ran it. Good news and bad news: It did work! But it took six to seven seconds for all three Sudokus.

Hey, what if you tried cargo run --release instead of cargo run?

We are now down to 260ms, 291ms, and 288ms per Sudoku². This has been yet another reminder that LLVM is magical, and that I should always run my code through its optimisers before I do anything.

Issues with the compiler aside, though, we've now got a proof of concept that Paths are viable as an option. Now it's time to go back to the drawing board and see if there's a way we can speed this up further.

Easy Wins

I was halfway through writing the section after this one, on the algorithmic improvements I figured would be the next step in making the solver fast, when I realised something: I forgot to actually profile my code. Tragically, clicking the "Profile" button in my IDE didn't help much — I am being too fancy with lazy iterators for the profiler to know where I'm actually spending time beyond "inside the iterator.next function" or "inside the Vec::from_iter function". However, that's not going to stop me, because I have cheap hacks on my side!

let start = Instant::now();

// run part of the code

let first_part_time = start.elapsed();

// run next part of the code

let second_part_time = start.elapsed() - first_part_time;
println!("First part took {:?}", first_part_time);
println!("Second part took {:?}", second_part_time);

And here's what I learned:

Easy Sudoku:

Finding total clues: 76ns
Finding possible paths: 279.055611ms
Solving: 20.696µs

Moderate Sudoku:

Finding total clues: 74ns
Finding possible paths: 307.142538ms
Solving: 86.654µs

Master Sudoku:

Finding total clues: 76ns
Finding possible paths: 317.606837ms
Solving: 85.373µs

I had really thought that the actual solving would take the bulk of the program time. Apparently that was wrong, and I guess that just goes to show why you should never trust your hunches about performance in code.

Alright so let's look at the "finding possible paths" code — and apply the traditional (and completely trustworthy) method of "staring at it until I think I can guess what's taking so long".

let possible_paths = Digit::iter()
    .map(|digit| {
        let &our_clues = &board[digit];
        let opposing_clues = total_clues & !our_clues;

        sudoku::generate_paths()
            .filter(|&path| path.contains(our_clues))
            .filter(|&path| (path & opposing_clues).is_empty())
            .collect::<Vec<_>>()
    })
    .collect::<Vec<_>>();

So my first thought is "why am I calling generate_paths() every single time?" Let's calculate that once at the start — cache it — and go again. It's worth noting here that I am still just guessing, which isn't a great way to go about optimisations, but at least I'm guessing and measuring. It's a step up. Anyway, here's our results:

Generation of all Paths: 32.999076ms

# Possible path times
Easy Sudoku: 279ms -> 656µs
Moderate Sudoku: 307ms -> 669µs
Master Sudoku: 317ms -> 712µs

On the bright side, I've cut down the slowest part of my solving process by a factor of approximately 450! On the confusing side, generating the path list on its own is ten times faster than generating it and immediately filtering it down. This is gonna have to be one of those "out of scope" things, I think. And now I think we've hit the "Neil, this blog post is too long" point, so let's wrap up where the solver is at.

State of the Solver

So the code is here. I've built it into a little command-line tool, that takes the Sudoku pattern in text representation as a command line argument and prints it out on screen once it's solved. The whole thing looks like this:

λ sudoku .......49.....38..7.6.2.1.....3...6.6..784..5.9...1.....2.5.4.8..84.....37.......
+-----+-+-----+-+-----+
|8 2 3| |1 7 5| |6 4 9|
|5 1 9| |6 4 3| |8 7 2|
|7 4 6| |8 2 9| |1 5 3|
+-----+-+-----+-+-----+
|4 8 5| |3 9 2| |7 6 1|
|6 3 1| |7 8 4| |9 2 5|
|2 9 7| |5 6 1| |3 8 4|
+-----+-+-----+-+-----+
|1 6 2| |9 5 7| |4 3 8|
|9 5 8| |4 3 6| |2 1 7|
|3 7 4| |2 1 8| |5 9 6|
+-----+-+-----+-+-----+
Solution took 845.83µs

Based on the limited testing I've done, most of the time taken by the solver is spent in one of two places:

Generating a list of Paths that fit the initial set of clues; and
Testing all selections of those Paths to see which set we can use to fill up the board.

Of those two, I'm very surprised to report that the first step takes much longer than the second. I'll need to look into that… And that brings me to my next point: future posts! I've never written a series of blog posts before, so I don't know what this is going to look like. However, here are some things I'd like to investigate in future. If I get the chance, I'll update the list below with some hyperlinks as we progress.

How does this Sudoku solver perform against larger (and harder) sets of puzzles? Was I just really lucky to find some testing Sudokus that work really well with my algorithm?
Can I index the Path database in a way that makes searching it for Paths that match the clues faster?
Can I do better than a brute-force search through the possible paths for the final part of my algorithm? Do I need to, to speed things up? And how come it isn't prohibitively slow at the moment?
How does this algorithm compare to mainstream solvers? From a performance perspective and a simplicity perspective.

⁰Yeah so this didn't happen. Maybe in a future post?return

¹This isn't technically what $NP$ means. There are differences between a computer with infinite CPU cores and a nondeterministic Turing Machine (which is what the actual $NP$ definition references), and what each of them can do in polynomial time. However, those differences are too subtle to be in scope for this post.return

²This is actually faster than I'd expected it would be, especially for the Master Sudoku. I think I underestimated just how many combinations of Paths would be cancelled out before I got too far into recursing.return