Smarter, faster, bigger.

Initial, probably buggy, load of a literate Python program.

A nonogram is a puzzle in which you are given an incomplete run-length-encoded description of a black and white image, and you must find the picture. A good overview can be found on the Wikipedia page.
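To make the "incomplete run-length encoding" concrete, here is a small sketch that derives the clues from a finished picture. The function name clues() and the "#"/"." picture format are made up for illustration and are not part of the solver. The encoding is incomplete because only the lengths of the filled runs are kept, not the lengths of the gaps between them.

```python
from itertools import groupby


def clues(line):
    """Return the lengths of the consecutive runs of '#' in a line."""
    return [
        sum(1 for _ in group)
        for filled, group in groupby(line, key=lambda c: c == "#")
        if filled
    ]


# A 4x3 picture: '#' is a filled cell, '.' is an empty one.
picture = [
    "#...",
    "#..#",
    "####",
]

row_clues = [clues(row) for row in picture]
col_clues = [clues(col) for col in zip(*picture)]  # zip(*...) transposes

print(row_clues)
print(col_clues)
```

Going from the picture to the clues is trivial. The puzzle is the reverse direction.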

Writing a basic nonogram solver is simple. It is about as hard as a sudoku solver. If you have not read it already, take a moment to see a most elegant solution: Peter Norvig's "Solving Every Sudoku Puzzle".

I had written a sudoku solver before reading Norvig's writeup. Our solvers bore a superficial resemblance, largely because neither of us likes to use object-oriented features if we can help it. Like his, mine was a constraint-based solver, but I had missed one important feature: try the most likely guesses first. Norvig's solver works on the square with the fewest possibilities first. Both solvers spend their time barking up the wrong trees, but Norvig's starts with the smallest forest. It is an important part of the algorithm to take into consideration.

It is easy to create huge nonograms. Compare the work required to make and solve a jigsaw puzzle. A small puzzle is easier to solve than a big one, yet a big jigsaw is just as easy to make as a small one. Nonograms (and prime factorizations) are like this: it is very easy to make something very difficult to solve.

The old version used a cute and simple twist on brute force. Finding multiple solutions was fairly important, and brute force always finds them all. Unlike naive brute force, it did not check every possible leaf. Instead it checked every node as it was added to the tree, so entire branches could be pruned at the first sign of contradiction. A parent node would look at its children and prune off the contradictory ones. Children could prune parents too: if no child is possible, then it is the parent's fault and the hopeless parent is pruned.
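The idea can be sketched as follows. This is not the old script, only a minimal illustration of brute force with early pruning, and it assumes one particular search order: the picture is built one row at a time, and every partial picture is checked against the column clues before it is extended. The names placements() and solve() are invented for the sketch, and a blank line is written as an empty clue list.

```python
def placements(clue, length):
    """Yield every line of the given length that matches the clue (1 = filled)."""
    if not clue:
        yield (0,) * length
        return
    first, rest = clue[0], clue[1:]
    tail = sum(rest) + len(rest)  # room the later runs need, one gap before each
    for start in range(length - tail - first + 1):
        head = (0,) * start + (1,) * first
        if rest:
            for remainder in placements(rest, length - len(head) - 1):
                yield head + (0,) + remainder
        else:
            yield head + (0,) * (length - len(head))


def solve(row_clues, col_clues):
    """Yield every solution as a list of rows, pruning dead branches early."""
    height, width = len(row_clues), len(col_clues)
    col_options = [list(placements(clue, height)) for clue in col_clues]

    def extend(rows):
        depth = len(rows)
        # Prune: each partial column must be the start of some legal column.
        for x in range(width):
            prefix = tuple(row[x] for row in rows)
            if not any(option[:depth] == prefix for option in col_options[x]):
                return
        if depth == height:
            yield rows
            return
        for candidate in placements(row_clues[depth], width):
            yield from extend(rows + [candidate])

    yield from extend([])


solutions = list(solve([[1], [1, 1], [4]], [[3], [1], [1], [2]]))
ambiguous = list(solve([[1], [1]], [[1], [1]]))  # two diagonal solutions
```

Because the generator keeps going after the first hit, puzzles with several solutions yield all of them.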

This was slow but workable. It seemed good enough, until Jan Wolter entered it in his Paint By Numbers Survey. There this little script was thoroughly trounced, but it received a consolation prize for being very, very small.

So, here is the second take. This new one is smart enough to line-solve anything in Mario's Picross, but not smart enough to deduce its way through all line-solvable puzzles.

The solver takes as input a pair of lists, one for rows and one for columns. So

    3 1 1 2
1   ? ? ? ?
1,1 ? ? ? ?
4   ? ? ? ?


1 1


This is saved into a file, whose name is given as a command-line argument. The file is parsed into a dictionary. To make backtracking easier, the entire board is stored in a single dictionary. The dictionary holds several different sets of data, kept separate by their keys. Cells have a tuple key of (x,y). Rows are keyed by negative integers and columns by positive integers. Both are 1-indexed to avoid a collision between +0 and -0.

Rows and columns are stored in exactly the same way. Each contains two lists, one of clue numbers and one of cells. The cells are shared dummy objects, so changing the Nth element of a row changes the underlying object, and the change is automatically reflected when examining the Nth column. Each cell has three states: unknown, filled and empty.

It is a rat's nest of cross references, but also the easiest way. After parsing, the above mini-puzzle looks like

{(1,1): Point(1,1), .... , (4,3): Point(4,3),
1: ([3], [Point(1,1), Point(1,2), Point(1,3)]),
.... ,
-2: ([1, 1], [Point(1,2), Point(2,2), Point(3,2), Point(4,2)]),
.... }
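A dictionary of this shape could be built roughly like this. It is a sketch that starts from the two clue lists rather than from the file, since the file format is not reproduced here. The Point class body, the state encoding (None, True, False) and the name build_board() are assumptions for illustration.

```python
class Point:
    """A shared, mutable cell: None = unknown, True = filled, False = empty."""

    def __init__(self, x, y):
        self.x, self.y, self.state = x, y, None

    def __repr__(self):
        return f"Point({self.x},{self.y})"


def build_board(row_clues, col_clues):
    """Return one dict holding cells, columns (keys > 0) and rows (keys < 0)."""
    width, height = len(col_clues), len(row_clues)
    board = {
        (x, y): Point(x, y)
        for x in range(1, width + 1)
        for y in range(1, height + 1)
    }
    for x, clue in enumerate(col_clues, start=1):
        board[x] = (list(clue), [board[x, y] for y in range(1, height + 1)])
    for y, clue in enumerate(row_clues, start=1):
        board[-y] = (list(clue), [board[x, y] for x in range(1, width + 1)])
    return board


board = build_board([[1], [1, 1], [4]], [[3], [1], [1], [2]])

# Rows and columns hold the very same objects, so a write through a row
# is visible through the crossing column.
board[-2][1][0].state = True
print(board[1][1][1].state)
```

Since the lines only hold references, no bookkeeping is needed to keep rows and columns in agreement.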

Solving works by applying a series of axioms to each line. (Known as line-solving.) These axioms are applied with the map2d() function, which also tracks if any progress was made. An overview of the rules:

  • blank() - If the clue is [0], mark every cell empty.
  • empty() - If there are no clues, mark every unknown cell as empty.
  • clean_points() - Empty cells on the end of a line can be removed.
  • center() - If a chunk is long enough to overlap itself, some points must be filled.
  • edge() - If a line starts with a filled point, then it continues for the length of the first clue.
  • edge_dot() - A special case of edge(), where the line starts with "unknown, filled" and the clue is 1.
  • unique() - Find and isolate the longest unique part of a clue.
  • cant_fit() - Chunks of unknowns too small to fit the clue must be empty.
  • cant_reach() - If the start of the last clue is known, points past its reach must be empty.
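As an illustration of the kind of reasoning these rules encode, here is a sketch of the overlap argument behind center(). It is not the solver's code: it works on a bare clue list and line length instead of Point objects, assumes the line is still entirely unknown, and the name overlap_cells() is invented. Each run is pushed as far left and as far right as it will go, and any cell covered in both positions must be filled.

```python
def overlap_cells(clue, length):
    """Return the 0-based indices that are filled in every possible placement."""
    # Slack: how far the whole clue can slide once it is packed to the left.
    slack = length - (sum(clue) + len(clue) - 1)
    assert slack >= 0, "clue does not fit in the line"
    must_fill = []
    start = 0  # start of the current run in the leftmost placement
    for run in clue:
        # Leftmost covers [start, start + run); rightmost is shifted by slack.
        must_fill.extend(range(start + slack, start + run))
        start += run + 1  # skip the run and the mandatory gap after it
    return must_fill


print(overlap_cells([3], 4))
print(overlap_cells([3, 2], 7))
print(overlap_cells([1, 1], 4))
```

A run only contributes cells when it is longer than the slack, which is why short clues in long lines give nothing.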

The term "must" is taken quite literally. Before writing to a point, an assert confirms it is writable. Failures here indicate an erroneous puzzle or a bad guess in the constraint solver. Once a clue number is filled out, the clue number is popped from the clue list and the filled out (solved) portion of the line's point list is removed. The solver works by picking away at the edges of each line.
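The write-then-trim pattern might look something like the sketch below, shown for the edge() rule at the start of a line only. The cell encoding ("?", "#", "."), the helper write() and the plain-list line are assumptions made for the example. The real solver works on shared Point objects and also picks at the far end of the line.

```python
UNKNOWN, FILLED, EMPTY = "?", "#", "."


def write(cells, index, value):
    """Set a cell, refusing to overwrite a different known value."""
    assert cells[index] in (UNKNOWN, value), "contradiction: bad puzzle or bad guess"
    cells[index] = value


def edge(clue, cells):
    """If the line starts filled, finish the first run and trim it away.

    Returns True when progress was made. Both lists are modified in place.
    """
    if not clue or not cells or cells[0] != FILLED:
        return False
    run = clue[0]
    assert run <= len(cells), "clue does not fit in the line"
    for i in range(run):
        write(cells, i, FILLED)
    if run < len(cells):
        write(cells, run, EMPTY)  # a run is always followed by a gap
    del cells[: run + 1]  # drop the solved part of the line
    clue.pop(0)  # and the clue number it satisfied
    return True


line_clue, line = [2, 1], list("#????")
progress = edge(line_clue, line)
print(progress, line_clue, line)
```

After trimming, what is left is a smaller, independent line, so the same rules can be applied again.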

When these rules get stuck, the solver makes a guess at the edge of one of the lines. I still have not learned Norvig's lesson: it picks the first line it sees, instead of finding the most probable place to guess. An improvement to make later.
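One possible way to pick a better line, sketched under the assumption that "most probable" means "fewest remaining arrangements": a clue with k runs in a line of n cells has C(free + k, k) arrangements, where free = n - sum(clue) - (k - 1) is the slack left after the mandatory gaps. The count ignores cells that are already known, so it is only an upper bound. The names arrangements() and most_constrained() are hypothetical and not part of the solver.

```python
from math import comb


def arrangements(clue, length):
    """Count the ways to place the clue's runs in an all-unknown line."""
    k = len(clue)
    free = length - sum(clue) - max(k - 1, 0)
    return comb(free + k, k) if free >= 0 else 0


def most_constrained(lines):
    """Given {key: (clue, length)}, return the key with the fewest arrangements.

    Lines with a single arrangement are skipped because line-solving already
    settles them. Returns None when there is nothing left to guess at.
    """
    open_lines = {
        key: arrangements(clue, length)
        for key, (clue, length) in lines.items()
        if arrangements(clue, length) > 1
    }
    if not open_lines:
        return None
    return min(open_lines, key=open_lines.get)


# The mini-puzzle, using positive keys for columns and negative keys for rows.
lines = {
    1: ([3], 3), 2: ([1], 3), 3: ([1], 3), 4: ([2], 3),
    -1: ([1], 4), -2: ([1, 1], 4), -3: ([4], 4),
}
best = most_constrained(lines)
```

A guess in a line with two arrangements is wrong at most half the time, while a guess in a line with dozens is wrong far more often.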

Todo: At 285 lines, it is twice as long as the old one. Too long. And it can't look for multiple solutions. And it could be faster.