A Su Doku Solver in C

Author: Bill DuPree
Name: sudoku_solver.c
Version: 1.20 (2008-08-17)
Language: C
Inception: Feb. 25, 2006
License: GPL
5   6     7     4
        6   7    
  7     3 5     2
1   3            
  8 5   9   3 1  
            5   6
4     2 1     7  
    1   4        
8     6     2   1
Difficulty: one and a half stars (score: 108)
Fig. 1: Sample Sudoku Game

Introduction:

This is a console-based Linux program, written in C, that solves Su Doku puzzles (aka Sudoku, Number Place, etc.; see figure 1) using deductive logic. It resorts to trial-and-error and backtracking only after exhausting its deductive moves.

Puzzles must be of the standard 9x9 variety using the (ASCII) characters 1 through 9 for the puzzle symbols. Puzzles should be submitted as 81-character strings which, when read left-to-right, fill a 9x9 Sudoku grid from left-to-right and top-to-bottom. In the puzzle specification, the characters 1 - 9 represent the puzzle "givens" or clues. Any other non-blank character represents an unsolved cell.
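The input format described above can be illustrated with a small parser. This is only a sketch and is not taken from sudoku_solver.c: the function name parse_puzzle and the convention of storing unsolved cells as 0 are assumptions made for illustration, and blanks are simply treated like any other non-digit character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: parse an 81-character puzzle string into
 * grid[row][col]. Characters '1'..'9' are givens; every other
 * character is stored as 0, meaning "unsolved". Returns the number
 * of givens, or -1 if the string is not exactly 81 characters long. */
int parse_puzzle(const char *text, int grid[9][9])
{
    int givens = 0;

    if (text == NULL || strlen(text) != 81)
        return -1;

    for (int i = 0; i < 81; i++) {
        char ch = text[i];
        int value = (ch >= '1' && ch <= '9') ? ch - '0' : 0;

        /* Row-major order: left-to-right, then top-to-bottom. */
        grid[i / 9][i % 9] = value;
        if (value != 0)
            givens++;
    }
    return givens;
}
```

Character i of the string lands in row i / 9 and column i % 9, which is all the "left-to-right and top-to-bottom" rule amounts to.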

The puzzle solving algorithm is "home grown." I did not borrow any of the usual techniques from the literature, e.g. Donald Knuth’s "Dancing Links." Instead I "rolled my own" from scratch. As such, its performance can only be blamed on yours truly. Still, I feel it is quite fast. On a 333 MHz Pentium II Linux box it solves typical medium force puzzles in approximately 1100 microseconds or about 900 puzzles per second, give or take. On an Athlon 64 4000+ (2.4 GHz San Diego core) it solves about 7,100 puzzles per sec.

Description of Algorithm:

The puzzle algorithm initially assumes every unsolved cell can assume every possible value. It then uses the placement of the givens to refine the choices available to each cell. This is the markup phase.
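A markup pass of this kind might look as follows. This is a minimal sketch, not the author's code: it assumes candidates are kept as 9-bit masks (bit d - 1 set means digit d is still possible), that unsolved cells are stored as 0 in the grid, and the names markup and strike_from_peers are hypothetical.

```c
#include <stdint.h>

#define ALL_CANDIDATES 0x1FFu   /* bit (d - 1) set: digit d is still possible */

/* Remove the candidate `bit` from every unsolved peer of cell (r, c). */
static void strike_from_peers(int grid[9][9], uint16_t cand[9][9],
                              int r, int c, uint16_t bit)
{
    int br = r - r % 3, bc = c - c % 3;       /* top-left cell of the box */
    uint16_t keep = (uint16_t)~bit;

    for (int i = 0; i < 9; i++) {
        int rr = br + i / 3, cc = bc + i % 3; /* i-th cell of the box */

        if (grid[r][i] == 0)   cand[r][i]   &= keep;   /* same row */
        if (grid[i][c] == 0)   cand[i][c]   &= keep;   /* same column */
        if (grid[rr][cc] == 0) cand[rr][cc] &= keep;   /* same box */
    }
}

/* Start every unsolved cell with all nine candidates, then let each
 * given remove its own value from the cells that share its row,
 * column, or box. Solved cells keep a single bit for their value. */
void markup(int grid[9][9], uint16_t cand[9][9])
{
    for (int r = 0; r < 9; r++)
        for (int c = 0; c < 9; c++)
            cand[r][c] = grid[r][c] ? (uint16_t)(1u << (grid[r][c] - 1))
                                    : (uint16_t)ALL_CANDIDATES;

    for (int r = 0; r < 9; r++)
        for (int c = 0; c < 9; c++)
            if (grid[r][c] != 0)
                strike_from_peers(grid, cand, r, c, cand[r][c]);
}
```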

After markup completes, the algorithm then looks for singleton cells with values that, due to constraints imposed by the row, column, or 3x3 box, may only assume one possible value. Once these cells are assigned values, the algorithm returns to the markup phase to apply these changes to the remaining candidate solutions. The markup/singleton phases alternate until either no more changes occur, or the puzzle is solved. I call the markup/singleton elimination loop the "Simple Solver" because in a majority of cases it solves the puzzle.
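The singleton search can be sketched on a single unit, that is, the nine candidate masks of one row, column, or box. The code below is an illustration under the same bitmask assumption (bit d - 1 stands for digit d); naked_single and hidden_single are hypothetical names, and the real solver may organize this search differently.

```c
#include <stdint.h>

/* Returns the digit 1..9 if `mask` holds exactly one candidate,
 * otherwise 0. */
int naked_single(uint16_t mask)
{
    if (mask == 0 || (mask & (mask - 1)) != 0)
        return 0;                    /* zero or several bits set */

    int digit = 1;
    while ((mask >>= 1) != 0)
        digit++;
    return digit;
}

/* Looks for a digit that fits in exactly one cell of the unit while
 * that cell still has other candidates (a "hidden" singleton).
 * On success stores the digit in *digit and returns the cell index
 * 0..8; returns -1 if the unit has no such digit. */
int hidden_single(const uint16_t unit[9], int *digit)
{
    for (int d = 1; d <= 9; d++) {
        uint16_t bit = (uint16_t)(1u << (d - 1));
        int count = 0, where = -1;

        for (int i = 0; i < 9; i++) {
            if (unit[i] & bit) {
                count++;
                where = i;
            }
        }
        /* Skip cells that already hold a single candidate. */
        if (count == 1 && naked_single(unit[where]) == 0) {
            *digit = d;
            return where;
        }
    }
    return -1;
}
```

A caller would assign the value found, rerun the markup for the affected row, column, and box, and repeat until neither function reports anything new.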

If the simple solver portion of the algorithm doesn’t produce a solution, then more advanced deductive rules are applied. I’ve implemented two additional rules as part of the deductive puzzle solver. The first is "naked/hidden" subset elimination wherein a row/column/box is scanned for X number of cells with X number of matching candidate solutions. If such subsets are found in the row, column, or box, then the candidate values from the subset may be eliminated from all other unsolved cells within the row, column, or box, respectively.
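To make the subset rule concrete, here is a brute-force sketch for one unit. It is not the solver's actual routine: it assumes 9-bit candidate masks, treats cells with a single candidate as solved, and simply tries every group of cells. It handles naked subsets only; a hidden subset shows up as the complementary naked subset among the remaining unsolved cells. The names eliminate_naked_subsets and bits_set are hypothetical.

```c
#include <stdint.h>

/* Number of set bits in v. */
static int bits_set(unsigned v)
{
    int n = 0;
    while (v != 0) {
        v &= v - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* unit[i] is the candidate mask of cell i in one row, column, or box.
 * For every group of X unsolved cells whose candidates together span
 * exactly X digits, those digits are removed from the other unsolved
 * cells. Returns the number of candidates removed. */
int eliminate_naked_subsets(uint16_t unit[9])
{
    int removed = 0;

    for (unsigned subset = 1; subset < 0x1FF; subset++) {
        uint16_t joined = 0;
        int usable = 1;

        for (int i = 0; i < 9; i++) {
            if (subset & (1u << i)) {
                if (bits_set(unit[i]) < 2) {   /* solved cell: skip group */
                    usable = 0;
                    break;
                }
                joined |= unit[i];
            }
        }
        if (!usable || bits_set(joined) != bits_set(subset))
            continue;

        for (int i = 0; i < 9; i++) {
            if (!(subset & (1u << i)) && bits_set(unit[i]) > 1) {
                removed += bits_set(unit[i] & joined);
                unit[i] &= (uint16_t)~joined;
            }
        }
    }
    return removed;
}
```

For example, if two cells of a row both hold only the candidates {1, 2}, then 1 and 2 are struck from every other unsolved cell of that row.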

The second advanced deductive rule scans boxes looking for candidate values that exclusively align themselves along rows or columns within the boxes, aka chutes. If candidate values are found aligning within a set of N chutes aligned within N boxes, then those candidates may be eliminated from aligned chutes in boxes outside of the set of N boxes.
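The simplest instance of this rule, N = 1 along rows, can be sketched as follows. The code is an illustration only: it assumes a 9x9 array of 9-bit candidate masks, the name box_row_alignment is hypothetical, and the column case and the N > 1 cases described above are left out.

```c
#include <stdint.h>

/* If, inside one box, a digit's candidates all lie in a single row,
 * that digit cannot appear elsewhere in that row, so it is removed
 * from the row's cells outside the box. Returns the number of
 * candidates removed. */
int box_row_alignment(uint16_t cand[9][9])
{
    int removed = 0;

    for (int box = 0; box < 9; box++) {
        int br = (box / 3) * 3, bc = (box % 3) * 3;   /* box origin */

        for (int d = 0; d < 9; d++) {
            uint16_t bit = (uint16_t)(1u << d);
            int rows_hit = 0;   /* bit k set: digit occurs in row br + k */

            for (int r = 0; r < 3; r++)
                for (int c = 0; c < 3; c++)
                    if (cand[br + r][bc + c] & bit)
                        rows_hit |= 1 << r;

            /* Proceed only if exactly one row of the box is hit. */
            if (rows_hit != 1 && rows_hit != 2 && rows_hit != 4)
                continue;
            int row = br + (rows_hit == 1 ? 0 : rows_hit == 2 ? 1 : 2);

            for (int c = 0; c < 9; c++) {
                if ((c < bc || c >= bc + 3) && (cand[row][c] & bit)) {
                    cand[row][c] &= (uint16_t)~bit;
                    removed++;
                }
            }
        }
    }
    return removed;
}
```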

Note that each of the advanced deductive rules calls all preceding rules, in order, if that advanced rule has effected a change in puzzle markup.

Finally, if no solution is found after iteratively applying all deductive rules, then we begin trial-and-error using recursion for backtracking. A working copy is created from our puzzle, and using this copy the first cell with the smallest number of candidate solutions is chosen. One of the candidate values is assigned to that cell, and the solver algorithm is called using this working copy as its starting point. Eventually, either a solution or an impasse is reached.

If we reach an impasse, the recursion unwinds and the next trial solution is attempted. If a solution is found (at any point) the values for the solution are added to a list. Again, so long as we are examining all possibilities, the recursion unwinds so that the next trial may be attempted. It is in this manner that we enumerate puzzles with multiple solutions.
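The trial-and-error phase can be sketched in isolation. The version below is deliberately simplified and is not the author's code: it skips the deductive rules entirely, counts solutions instead of storing them in a list, and assumes the givens do not already conflict with each other. The names count_solutions, legal_digits, and the limit parameter are hypothetical.

```c
#include <string.h>

/* Bitmask of digits that may still be placed at (r, c):
 * bit (d - 1) set means digit d does not clash with any peer. */
static unsigned legal_digits(int grid[9][9], int r, int c)
{
    unsigned used = 0;
    int br = r - r % 3, bc = c - c % 3;

    for (int i = 0; i < 9; i++) {
        int in_box = grid[br + i / 3][bc + i % 3];

        if (grid[r][i]) used |= 1u << (grid[r][i] - 1);
        if (grid[i][c]) used |= 1u << (grid[i][c] - 1);
        if (in_box)     used |= 1u << (in_box - 1);
    }
    return ~used & 0x1FFu;
}

/* Counts solutions of `grid` (0 = unsolved), stopping at `limit`. */
int count_solutions(int grid[9][9], int limit)
{
    int best_r = -1, best_c = -1, best_n = 10;
    unsigned best_mask = 0;

    /* Pick the first unsolved cell with the fewest candidates. */
    for (int r = 0; r < 9; r++) {
        for (int c = 0; c < 9; c++) {
            if (grid[r][c] != 0)
                continue;
            unsigned mask = legal_digits(grid, r, c);
            int n = 0;
            for (unsigned m = mask; m != 0; m &= m - 1)
                n++;
            if (n == 0)
                return 0;            /* impasse: unwind the recursion */
            if (n < best_n) {
                best_n = n; best_r = r; best_c = c; best_mask = mask;
            }
        }
    }
    if (best_r < 0)
        return 1;                    /* no unsolved cells left: a solution */

    int found = 0;
    for (int d = 1; d <= 9 && found < limit; d++) {
        if (!(best_mask & (1u << (d - 1))))
            continue;
        int trial[9][9];             /* working copy for this trial */
        memcpy(trial, grid, sizeof trial);
        trial[best_r][best_c] = d;
        found += count_solutions(trial, limit - found);
    }
    return found;
}
```

Calling count_solutions(grid, 1) corresponds in spirit to stopping at the first solution, while count_solutions(grid, 2) is enough to tell a unique solution from multiple ones.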

Note that it is certainly possible to add to the list of applied deductive rules. The techniques known as "X-Wing" and "Swordfish" come to mind. On the other hand, adding these additional rules will, in all likelihood, slow the solver down by adding to the computational burden while producing very few results.

Program Invocation:

This program is a console (or command line) based utility and has the following usage:

$ sudoku_solver -?
sudoku_solver version 1.20
Usage:
        sudoku_solver {-p puzzle | -f <puzzle_file>} [-o <outfile>]
                [-r <reject_file>] [-1][-a][-c][-d][-G][-g][-m][-n][-s]
where:
        -1      Search for first solution, otherwise all solutions are returned
        -a      Requests that the answer (solution) be printed
        -c      Print a count of solutions for each puzzle
        -d      Print the recursive trial depth required to solve the puzzle
        -e      Print a step-by-step explanation of the solution(s)
        -f      Takes an argument which specifes an input file
                containing one or more unsolved puzzles (default: stdin)
        -G      Print the puzzle solution(s) in a 9x9 grid format
        -g      Print the number of given clues
        -m      Print an octal mask for the puzzle givens
        -n      Number each result
        -o      Specifies an output file for the solutions (default: stdout)
        -p      Takes an argument giving a single inline puzzle to be solved
        -r      Specifies an output file for unsolvable puzzles
                (default: stderr)
        -s      Print the puzzle's score or difficulty rating
        -?      Print usage information

The return code is zero if all puzzles had unique solutions,
(or have one or more solutions when -1 is specified) and non-zero
when no unique solution exists.

A sample invocation is shown below to illustrate command line usage. The invocation uses the -G and -p options. The -p option allows the entry of the puzzle data on the console command line as an inline argument. The -G option formats the resulting answer in the familiar 9x9 layout. Thus, to solve the puzzle:

                 
8     3   5     2
    6       9    
  4   5   6   8  
7   1       4   9
      9   1      
9 7     6     3 5
    3       1    
    4   2   7    

One would enter the command:

$ sudoku_solver -Gp .........8..3.5..2..6...9...4.5.6.8.7.1...4.9...9.1...97..6..35..3...1....4.2.7..
sudoku_solver version 1.20

+---+---+---+
|425|697|318|
|897|315|642|
|136|482|957|
+---+---+---+
|349|576|281|
|751|238|469|
|268|941|573|
+---+---+---+
|972|164|835|
|683|759|124|
|514|823|796|
+---+---+---+

Puzzles: 1, Solved: 1, Unsolved: 0, Bogus: 0

Puzzle Scoring:

A word about puzzle scoring, i.e. rating a puzzle’s difficulty, is in order. Rating Sudoku puzzles is a rather subjective thing, and thus it is difficult to really develop an objective puzzle rating system. I, however, have attempted this feat (several times with varying degrees of success ;-) and I think the heuristics I’ve developed closely track the relative difficulty of solving a puzzle.

The following is a brief rundown of how it works. The initial puzzle markup is a "free" operation, i.e. no points are scored for the first markup pass. I feel this is appropriate because a person solving a puzzle will always have to do their own eyeballing and scanning of the puzzle. Subsequent passes are scored at one point per candidate eliminated because these passes indicate that more deductive work is required. Secondly, the reward for solving a cell is set to one point, and as long as the solution does not require bifurcation, i.e. trial-and-error or recursion, then this level of reward remains unchanged. Also, puzzle "bottlenecks," wherein there are very few singletons (three or fewer) available after a round of deduction, are scored using the formula:

(20 * unsolved_cells_before_pass) / (solved_cells * (solved_this_pass + 1)) + 1

where the minimum resulting value is constrained to be zero.
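As a worked illustration, the bottleneck formula can be written as a small function. This is a sketch under stated assumptions: integer division is assumed (as C would do with int operands), solved_cells is read as the number of cells solved so far, and the guard against a zero divisor is an addition of this example rather than something the text specifies. The name bottleneck_penalty is hypothetical.

```c
/* Penalty for a pass that produced three or fewer singletons. */
int bottleneck_penalty(int unsolved_cells_before_pass, int solved_cells,
                       int solved_this_pass)
{
    if (solved_this_pass > 3)
        return 0;                    /* not a bottleneck */
    if (solved_cells <= 0)
        return 0;                    /* assumption: avoid dividing by zero */

    int penalty = (20 * unsolved_cells_before_pass)
                / (solved_cells * (solved_this_pass + 1)) + 1;

    return penalty < 0 ? 0 : penalty;   /* result is never below zero */
}
```

With 40 unsolved cells, 41 solved cells, and one singleton found in the pass, this gives 800 / 82 + 1 = 9 + 1 = 10 points.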

Other bottleneck penalties are assessed for the various deductive techniques applied during the solving process. They are computed as follows:

Tuple elimination bottlenecks occur if such an elimination occurred but there were fewer than four changes to the puzzle markup. These are scored as:

20 - 5 * number_of_markup_changes

Box-line alignment bottlenecks occur if this rule was successfully applied but there were fewer than three changes to markup. These are scored as:

15 - 5 * number_of_markup_changes
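The two technique bottlenecks above translate directly into code. The function names below are hypothetical, and returning zero when a rule made no changes at all is an assumption based on the wording that the rule must have been applied successfully.

```c
/* Tuple elimination: penalized when it made 1..3 markup changes. */
int tuple_bottleneck(int number_of_markup_changes)
{
    if (number_of_markup_changes < 1 || number_of_markup_changes >= 4)
        return 0;
    return 20 - 5 * number_of_markup_changes;    /* 15, 10, or 5 */
}

/* Box-line alignment: penalized when it made 1..2 markup changes. */
int box_line_bottleneck(int number_of_markup_changes)
{
    if (number_of_markup_changes < 1 || number_of_markup_changes >= 3)
        return 0;
    return 15 - 5 * number_of_markup_changes;    /* 10 or 5 */
}
```

In both cases the penalty shrinks as the rule becomes more productive, so a technique that barely moves the puzzle forward costs the most.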

In addition, box-line rules assess a penalty of five points per application, and naked tuple eliminations use the following calculation for their penalty:

10 + 2 * (5 - abs(5 - number_of_members_in_tuple))

The reason for such a calculation is that naked tuples form disjoint subsets with hidden tuples. Thus when the number of members in a naked subset is low, the difficulty is low. As the number increases so does the relative difficulty of finding the naked subset. However, for larger values of membership (five or better) it turns out that what we are really detecting are smaller and smaller hidden subsets, thus the penalty decreases.
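Evaluating the naked tuple formula for each subset size shows the rise and fall described above. The function name naked_tuple_penalty is hypothetical; the arithmetic is the formula as given.

```c
#include <stdlib.h>

/* Penalty for a naked tuple with the given number of members.
 * Peaks at five members and falls off symmetrically on either side:
 *   members:  2   3   4   5   6   7   8
 *   penalty: 14  16  18  20  18  16  14 */
int naked_tuple_penalty(int number_of_members_in_tuple)
{
    return 10 + 2 * (5 - abs(5 - number_of_members_in_tuple));
}
```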

If a trial-and-error approach is called for, then the "reward" is set to the recursive depth times ten. Trial solutions are also penalized by a weighting factor that is computed as follows:

unsolved_cells * trial_depth * trial_depth * alternatives_to_be_tried * 5
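The trial-and-error scoring can be written out the same way. The function names are hypothetical, and the use of long for the penalty is only a precaution taken in this sketch.

```c
/* Reward per solved cell once trial-and-error is in play. */
int trial_reward(int trial_depth)
{
    return trial_depth * 10;
}

/* Weighting penalty applied to a trial solution. */
long trial_penalty(int unsolved_cells, int trial_depth,
                   int alternatives_to_be_tried)
{
    return (long)unsolved_cells * trial_depth * trial_depth
           * alternatives_to_be_tried * 5;
}
```

For instance, 50 unsolved cells at depth 2 with 2 alternatives gives 50 * 2 * 2 * 2 * 5 = 2000 points, and because the depth is squared the penalty grows quickly with deeper recursion.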

(I’ve seen pathological puzzles from Gordon Royle’s site, Minimum Sudoku, require 16 levels of recursion and score hundreds of thousands of points using this scoring system!)

And that brings me to this topic: What do all these points mean?

Well, who knows? This is subjective, and the weighting system I’ve chosen for point scoring is based upon my own heuristics. But based upon feedback from a number of individuals, a rough scale of difficulty plays out as follows:

Degree of Difficulty       Score
Trivial (half star)        80 points or less
Easy (1 star)              81 - 125 points
Medium (2 stars)           126 - 225 points
Hard (3 stars)             226 - 350 points
Very Hard (4 stars)        351 - 760 points
Diabolical (5 stars)       761 and up
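The scale above maps onto a simple lookup. This sketch only restates the table; the enum and the function names rate_score and rating_name are hypothetical and do not come from the solver's source.

```c
enum rating { TRIVIAL, EASY, MEDIUM, HARD, VERY_HARD, DIABOLICAL };

/* Map a puzzle score onto the rough difficulty scale. */
enum rating rate_score(int score)
{
    if (score <= 80)  return TRIVIAL;
    if (score <= 125) return EASY;
    if (score <= 225) return MEDIUM;
    if (score <= 350) return HARD;
    if (score <= 760) return VERY_HARD;
    return DIABOLICAL;
}

/* Human-readable label for a rating. */
const char *rating_name(enum rating r)
{
    static const char *const names[] = {
        "Trivial", "Easy", "Medium", "Hard", "Very Hard", "Diabolical"
    };
    return names[r];
}
```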

Experience shows that the Diabolical puzzles will likely require one or more levels of trial-and-error (bifurcation), so why waste your time? These are best left to masochists, savants and automated solvers. YMMV.

Change Log:

Revision  Date        Initial  Description
1.00      2006-02-25  WD       Initial release
1.01      2006-03-13  WD       Bug fix for return code. Added sign-on message.
1.10      2006-03-20  WD       Added explain option and more speed optimizations
1.11      2006-03-25  WD       Cleanups and speed optimizations
1.20      2008-07-19  WD       Fix early recursion. Rewrite markup, subset and box-line interaction. Add bottleneck detection and other scoring enhancements. Allow linkage to sudoku_engine as an object module. (Thanks to Giuseppe Matarazzo for his suggestions.)

Download:

The following archives contain the complete C source code for the sudoku_solver, a small test suite, a Makefile, and some ancillary documentation. Aside from archive format, they are all identical in terms of their content. Download the one that best suits your needs.

Ver. 1.00 Compressed Tar (gzip): solver_1.00.tar.gz
Ver. 1.00 Compressed Tar (bzip2): solver_1.00.tar.bz2
Ver. 1.00 Zip File: solver_1.00.zip
 
Ver. 1.10 Compressed Tar (gzip): solver_1.10.tar.gz
Ver. 1.10 Compressed Tar (bzip2): solver_1.10.tar.bz2
Ver. 1.10 Zip File: solver_1.10.zip
 
Ver. 1.11 Compressed Tar (gzip): solver_1.11.tar.gz
Ver. 1.11 Compressed Tar (bzip2): solver_1.11.tar.bz2
Ver. 1.11 Zip File: solver_1.11.zip
 
Ver. 1.20 Compressed Tar (gzip): solver.tar.gz
Ver. 1.20 Compressed Tar (bzip2): solver.tar.bz2
Ver. 1.20 Zip File: solver.zip

The code was developed and tested using GCC 3.3.6 on Slackware Linux 10.2 running in an x86 environment. There should be few obstacles to porting this to other processor architectures or operating systems (although the ASCII character set is assumed). Simple remakes are known to run on PPC-based MacOS X, Alpha-AXP Linux (Debian), and PA-RISC Linux on an old HP 9000 model 712/80 running Debian (although you’ll need to comment out or modify PROC_OPT in the Makefile for these other CPU types). Another report indicates success on MS Windows using Cygwin tools to build the binary.

License:

This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Contact:

Email: Bill DuPree (bdupree_AT_techfinesse_DOT_com)
Post: Bill DuPree, 609 Wenonah Ave, Oak Park, IL 60304 USA

Copyright © 2008, Bill DuPree. All rights reserved.