Faster and slightly less stupid.

Original Another toy genetic algorithm. Faster and slightly less stupid. Editable
version 2 of 2

Low Complexity Artwork appeared on HN last month. It struck me as being particularly clever and elegant, with a very iconic form. In the paper, the author mentions how hard it was to produce the drawings. Part of me thought "Fractal-ish forms are hard for people to manipulate, but are a good fit for genetic algorithms." But no compact representation immediately came to mind, so it got dropped. A few days ago it showed up on Proggit, and this time my subconscious had an answer prepared. Pulled an all nighter whipping it up, code is at the bottom.

I can't summarize the prior art and do it justice, so please read In a nut shell, pictures are made from arc segments and flood fills. The trick is making an GA friendly representation, then checking the results with a Haar transform.

Generally, "GA friendliness" requires one thing: error proof transposability. You should be able to shuffle and mix any number of forms to produce a new form that is guaranteed to be valid. Historically, Lisp with its tree-like code has been used for this. Tree-syntaxes have one major problem; they are a poor fit for modeling a process over time. A "process over time" is just a list of sequential instructions. As a hobby Lisper, it feels wrong saying that Lisp is not the best option for a processing a list, but there is something better: concatenative languages.

A concatenative language feels like a lispy Forth. Like Forth, they are stack based. Each instruction pops arguments and pushes results. Unlike Forth, there are no "special words", no words that look ahead in the stream, no confusion between compile and run time. Unlike Forth, there is a list primitive. (One could argue the list primitive, being circumfix brackets, is a special word pair.) Like Lisp, data is code and manipulating data is just like manipulating lists. My favorite concatenative language is Joy but Factor is considered more useful.

Low Complexity Art needs a concatenative representation, where any string is a valid piece of code. The core of LoCoArt involves walking a fractal pattern of circles. Every circle has six neighbors of identical radius sitting on its perimeter. A seventh circle with twice the radius is the "parent" and an eighth circle with half radius is the "child". Thus, a series of numbers (0 through 7) describes a walk through an arbitrary series of circles. We are only interested in a few of the circles, so we will mark the circles of interest with a marker, or the character "M". The picture is made from arcs, which are built from three circles. When drawing an arc, it will look for the last three marks. Three circles can describe two arcs, so we need two commands for the "T"op and "B"ottom arcs. Two extra commands ("+" and "-") adjust the weight/thickness of the following arcs. Two more commands make the next flood fill "L"ighter or "D"arker. The final command "F"loods the area under the current circle. For those keeping track, this is exactly 16 commands. Technically, it is not a "language" (no programability), but rather a concatenative grammar that efficiently represents the fractal image space.

This language is not quite perfect. Immediately drawing an arc before any marks are placed, creates an invalid program. Prefixing every program with three marks (which will all mark the unit circle) prevents this bug. I have not yet decided how to deal with other glitches such as three circles that do not form an arc. Either ignore it (yuck) or draw a 360° arc (yuckier). SVG would make a lot of sense as an output format. But I have no idea how to do a flood fill in SVG without manually laying out a polygon. The other option is to use bitmap drawing at a finite resolution.


The left illustrates the normal case. Legal arcs neither pass through circles, nor intersections. Both of these rules fail to produce unambiguous arcs on the second set of circles.

Second prototype indeed renders bitmaps, and takes several other shortcuts. PIL had a (very slow) floodfill(), so it won by default. PIL's arc() is a little annoying: it works in degrees instead of radians, 0° starts at 3 o'clock. And there is no way to draw thick arcs, so I rolled my own. Finally, I got really lazy with the arc calculations. From two pairs of two points, you can draw eight arcs. Four combinations, and each can be clockwise or counterclockwise. Later I will add code for picking the correct two arcs, but for now there are eight different arc commands. It also does not help that Schmidhuber missed an edge case.

Now, for a slightly different topic, I am going to nerd rage on a common misconception with evolving images. Similarity between two pictures, essential to grading progress. How would you do it? If you said "Compute the difference of the red/green/blue pixels between each image and sum the squares of the differences", you fail. Pixels (especially RGB ones) are a very inefficient representation of visual data. Any such inefficiency hurts the GA. You want the scoring process to be blunt, not caught up in minutia. You want a picture that "looks similar" to the target. Using an RGB scoring algorithm will evolve towards a similar binary file, which coincidentally might look similar. A different color space (one which matches the human eye) does much better. For example, the human eye is more sensitive to shades of green than shades of red. You can discard half the red data without visibly effecting the result.
YCbCr is fairly standard color space for this kind of work. Note an entire channel is devoted to the luminosity. A good chuck of our retina is devoted just to luminosity. YCbCr eliminates invisible redundancy, which would otherwise waste the GAs time. For this project, choice of color space is entirely academic, as the image will be grey scale. More important is how the picture is encoded.

I will reiterate: working with a grid of pixels is really really dumb. Relatively small tweaks, such as sliding the image to the side, produce huge amounts of error. By our standards, the displaced image is very close to the original. To a pixel-difference-er, they are barely the same image. This makes life hard for a GA. The correct choice is to use a representation that does not get caught up in little localized details. Some people use a 2D Fourier Transform, but I prefer the 2D Haar Transform. It works very well for matching similar images. For a practical application of this, check out the Imgseek photo manager.

So, when you start evolving images, use YCbCr for the color space and a Haar Transform to gauge similarity. Convert both images to YCbCr, convert both to Haar. Now take the squared difference of these two mangled images. You will get a score that actually means something.

And that is what I've got now. The code is tiny, just 240 lines. It loads a 512x512 image and crunches away. Porting the original algorithm from python-that-looked-like-C to Numpy produced an 8x speedup. The rest of the code still needs huge amounts of polishing.

The process of connecting multiple arcs into a closed polygon requires a large leap in complexity that my current GA system can not produce. Trying to evolve towards an image is pretty futile with this scheme. Several thousand generations managed to create a pair of colored circles that did not look much like Lena. The code is now set up to favor detailed complexity, without any specific target. Measuring the mean average deviation of the Haar of the image gives a good estimate of how visually interesting/noisy we would find it. Might use this as the basis for a vote driven GA.

If you have more CPU cycles than patience, give it a whirl: locoart.tar.gz