A Su Doku Solver in C

Author: Bill DuPree
Name: sudoku_solver.c
Version: 1.11 (2006-03-25)
Language: C
Inception: Feb. 25, 2006
License: GPL
    7     6 9   4
6       3   7    
      1          
      7       5 9
          3 1    
4     8 2     7  
  4     1     2  
  2 1     8 4 9  
  8 9 4         7
Difficulty: starstarstar (Solution)
Fig. 1: Sample Sudoku Game

Introduction:

This is a console-based Linux program, written in C language, that solves Su Doku puzzles (aka Sudoku, Number Place, etc., see figure 1) using deductive logic. It will only resort to trial-and-error and backtracking approaches upon exhausting its deductive moves.

Puzzles must be of the standard 9x9 variety using the (ASCII) characters 1 through 9 for the puzzle symbols. Puzzles should be submitted as 81 character strings which, when read left-to-right will fill a 9x9 Sudoku grid from left-to-right and top-to-bottom. In the puzzle specification, the characters 1 - 9 represent the puzzle givens or clues. Any other non-blank character represents an unsolved cell.

The puzzle solving algorithm is home grown. I did not borrow any of the usual techniques from the literature, e.g. Donald Knuth's "Dancing Links." Instead I rolled my own from scratch as a personal challenge. As such, its performance can only be blamed on yours truly. Still, I feel it is quite fast. On a 333 MHz Pentium II Linux box it solves typical medium force puzzles in approximately 620 microseconds or about 1,600 puzzles per second, give or take. On an Athlon XP 3000 it solves about 10,800 puzzles per sec. (Solving time is dependent upon degree of difficulty, so YMMV.)

Description of Algorithm:

The puzzle algorithm initially assumes every unsolved cell can assume every possible value. It then uses the placement of the givens to refine the choices available to each cell. I call this the markup phase.

After markup completes, the algorithm then looks for singleton cells with values that, due to constraints imposed by the row, column, or 3x3 region, may only assume one possible value. Once these cells are assigned values, the algorithm returns to the markup phase to apply these changes to the remaining candidate solutions. The markup/singleton phases alternate until either no more changes occur, or the puzzle is solved. I call the markup/singleton elimination loop the Simple Solver because in a large percentage of cases it solves the puzzle.

If the simple solver portion of the algorithm doesn't produce a solution, then more advanced deductive rules are applied. I've implemented two additional rules as part of the deductive puzzle solver. The first is subset elimination wherein a row/column/region is scanned for X number of cells with X number of matching candidate solutions. If such subsets (or tuples) are found in the row, column, or region, then the candidates values from the subset may be eliminated from all other unsolved cells within the row, column, or region, respectively.

The next deductive rule examines each region looking for candidate values that exclusively align themselves along a single row or column, i.e. a vector. If such candidate values are found, then they may be eliminated from the cells outside of the region that are part of the aligned row or column.

Note that each of the advanced deductive rules calls all preceeding rules, in order, if that advanced rule has effected a change in puzzle markup.

Finally, if no solution is found after iteratively applying all deductive rules, then we begin trial-and-error using recursion for backtracking. A working copy is created from our puzzle, and using this copy the first cell with the smallest number of candidate solutions is chosen. One of the solutions values is assigned to that cell, and the solver algorithm is called using this working copy as its starting point. Eventually, either a solution, or an impasse is reached.

If we reach an impasse, the recursion unwinds and the next trial solution is attempted. If a solution is found (at any point) the values for the solution are added to a list. Again, so long as we are examining all possibilities, the recursion unwinds so that the next trial may be attempted. It is in this manner that we enumerate puzzles with multiple solutions.

Note that it is certainly possible to add to the list of applied deductive rules. The techniques known as "X-Wing" and "Swordfish" come to mind. On the other hand, adding these additional rules will, in all likelihood, slow the solver down by adding to the computational burden while producing very few results. I've seen the law of diminishing returns even in some of the existing rules, e.g. in subset elimination I only look at two and three valued subsets because taking it any further than that degraded performance.

Program Invocation:

This program is a console (or command line) based utility and has the following usage:

Example shows command line options

A sample invocation is shown below to illustrate command line usage. The invocation uses the -G and -p options. The -p option allows the entry of the puzzle data on the console command line as an inline argument. The -G option formats the resulting answer in the familiar 9x9 layout. Thus, to solve the puzzle:

                 
8     3   5     2
    6       9    
  4   5   6   8  
7   1       4   9
      9   1      
9 7     6     3 5
    3       1    
    4   2   7    

One would enter the command:

Command line usage example

Puzzle Scoring:

A word about puzzle scoring, i.e. rating a puzzle's difficulty, is in order. Rating Sudoku puzzles is a rather subjective thing, and thus it is difficult to really develop an objective puzzle rating system. I, however, have attempted this feat (several times with varying degrees of success ;-) and I think the heuristics I'm currently applying aren't too bad for rating the relative difficulty of solving a puzzle.

The following is a brief rundown of how it works. The initial puzzle markup is a free operation, i.e. no points are scored for the first markup pass. I feel this is appropriate because a person solving a puzzle will always have to do their own eyeballing and scanning of the puzzle. Subsequent passes are scored at one point per candidate eliminated because these passes indicate that more deductive work is required. Secondly, the reward for solving a cell is set to one point, and as long as the solution only requires simple markup and elimination of singletons, this level of reward remains unchanged.

This reward changes, however, when advanced solving rules are required. Puzzles that remain unsolved after the first pass through the simple solver phase have a higher reward, i.e. it is incremented by two. Thus, if subset or vector elimination is required, all subsequently solved cells score higher bounties. In addition, the successful application of these deductive techniques score their own penalties.

Finally, if a trial-and-error approach is called for, then the reward is incremented by another five points. Thus, the total penalty for each level of recursion is an additional seven points per solved cell, i.e.

(recursive_depth * 7) + 1 points per solved cell.

Trial solutions are also penalized by a weighting factor that is based upon the number of unsolved cells that remain upon reentry to the solver and the depth of recursion. (I've seen a pathological puzzle from the Minimum Sudoku web site require 16 levels of recursion and score a whopping 228,642 points using this scoring system!)

And that brings me to this topic: What do all these points mean?

Well, who knows? This is still subjective, and the weighting system I've chosen for point scoring is is largely arbitrary. But based upon feedback from a number of individuals, a rough scale of difficulty plays out as follows:

Degree of DifficultyScore
Trivial Half-Star80 points or less
Easy Star81 - 150 points
Medium StarStar151 - 250 points
Hard StarStarStar251 - 350 points
Very Hard StarStarStarStar351 - 900 points
Diabolical StarStarStarStarStar901 and up

Experience shows that puzzles in the "Hard" category, in a very few cases, will require a small amount of trial-and-error, while the "Very Hard" puzzles will likely require trials and backtracking, and in some cases more than one level of that technique. As for the "Diabolical" puzzles--why waste your time? These are best left to masochists, savants and automated solvers.

Change Log:

RevisionDateInitialDescription
1.002006-02-25WDInitial release
1.012006-03-13WDBug fix for return code. Added sign-on message.
1.102006-03-20WDAdded explain option and more speed optimizations
1.112006-03-25WDCleanups and speed optimizations

Download:

The following archives contain the complete C source code for the sudoku_solver, a small test suite, a Makefile, and some ancillary documentation. Aside from archive format, they are all identical in terms of their content. Download the one that best suits your needs.

Ver. 1.00 Compressed Tar (gzip): solver_1.00.tar.gz
Ver. 1.00 Compressed Tar (bzip2): solver_1.00.tar.bz2
Ver. 1.00 Zip File: solver_1.00.zip
 
Ver. 1.10 Compressed Tar (gzip): solver_1.10.tar.gz
Ver. 1.10 Compressed Tar (bzip2): solver_1.10.tar.bz2
Ver. 1.10 Zip File: solver_1.10.zip
 
Ver. 1.11 Compressed Tar (gzip): solver.tar.gz
Ver. 1.11 Compressed Tar (bzip2): solver.tar.bz2
Ver. 1.11 Zip File: solver.zip

The code was developed and tested using GCC 3.3.6 on Slackware Linux 10.2 running in an x86 environment. There should be few obstacles to porting this to other processor architectures or operating systems (although the ASCII character set is assumed.) Simple remakes are known to run on PPC-based MacOS X, Alpha-AXP Linux (Debian), and PA-RISC Linux on an old HP 9000 model 712/80 running Debian (although you'll need to comment out or modify PROC_OPT in the Makefile for these other CPU types.) Another report indicates success on MS Windows using Cygwin tools to build the binary.

License:

This program is free software; you can redistribute it and/or modify it under the terms of version 2 of the GNU General Public License as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Contact:

Email: Bill DuPree (bdupree_AT_techfinesse_DOT_com)
Post: Bill DuPree, 609 Wenonah Ave, Oak Park, IL 60304 USA

Copyright © 2006, Bill DuPree. All rights reserved.