Program that Writes Brainf**k

Using genetic programming, I wrote a program that can program better than I can. In Brainfuck, at least (note: Brainfuck is a programming language!).

Brainfuck is an incredibly confusing language that uses just 8 characters: +, -, >, <, ., ,, [, and ]. The language was made with this esoterism in mind, making it very difficult for humans to write programs in it. The natural question is: would a computer be similarly confused?

Traditionally, genetic programming has been applied strictly to languages that are hierarchical in nature and therefore conducive to crossover (a post explaining genetic programming is coming soon). For example, Lisp). But that’s been done a million times before. I wanted to see if GP could be applied to procedural languages. Would GP offer any benefit over randomly generated programs? And if so, would this benefit be significant enough to surpass my own ability to write Brainfuck?

The Plan

Genetic programming is a fancy, super cool, bio-inspired search algorithm. Ignoring all the analogies to real world evolution, it’s nothing more than a method of traversing some space of possible answers. For this problem, that answer space is the set of all possible Brainfuck programs. If we require that each program be $32$ characters long, then there are $8^{32}=7.9\times10^{28}$ potential answers because each character can be one of $8$ commands. Keep in mind that not all strings in this answer space are valid programs. Here are a few examples of $32$ character long programs that are:

Imagine a billion billion billion of those. Now imagine going through all billion billion billion of those and finding one that solves the problem at hand. Genetic programming doesn’t iterate through every possibility, but it still arrives at pretty good answers. It’s based on the idea that okay solutions are similar to good solutions which are similar to great solutions. GP relies on the fact that success builds up. This has been shown to be absolutely true for high level languages with clear, tree-like structures, but it isn’t immediately obvious that this property applies to Brainfuck.

Another method is to search randomly. This is superior to an organized, one after another approach because you don’t have to waste any time probing unsuccessful regions of the search space. As long as you keep track of the best program you’ve seen so far, you can take a comprehensive survey of the answer space with very little work. Keep in mind, search spaces are often tremendous, so this approach is unlikely to work well.

My hypothesis that genetic programming is better than chance (for Brainfuck) can be tested by comparing how long it takes each method to solve a certain problem. For simplicity’s sake, I chose the problem of symbolic regression, which is the process of finding an equation to model some data set. I tested how long it took randomly generated programs to model a simple set of points, and then I looked at how long it took genetic programming to converge on the solution to that same set of points.


This table shows various measures of how long it took each method to solve a simple symbolic regression problem. The genetic programmer was run with a maximum program size of $32$ characters and a population size of $50$. Each method underwent $50$ trials. As these are all times, smaller numbers are better.

MethodAvg (s)Min (s)Max (s)Std. Dev.

Both methods solved the problem pretty quickly some of the time, but GP maxed out at around $20$ seconds while random chance took as long as $3$ minutes. Comparing their averages, GP performed $5.85$ times as quickly. The following is a smoothed histogram* displaying the percentage of trials that took a certain amount of time to run.

Histogram data for GP and random chance plotted as a smooth line.

Genetic programming is the clear winner.
*Note: the data was smoothed heavily, so area isn’t proportional to number of samples like it should be.

Brainfuck Language

At this point, I knew that using GP to write Brainfuck wasn’t a waste of time, so I began competing with it to answer my second question. Before explaining the test cases I chose, I’ll describe Brainfuck’s language features. There’s no formal specification of the language, so not all implementations define each of the eight commands in the same way. I’ll describe my interpreter, which is pretty standard.

Brainfuck is characterized by its ultra-simple method of keeping track of information. Each program begins with a couple hundred cells all filled with the value $0$. Programs can only focus on one cell at a time, moving between them with the > and < characters, right and left. The + character increments the current cell by one, and the - character decrements it. Cell values are capped at $255$, looping around to $0$ whenever they overflow.

Brainfuck also has input and output, with the , and . characters, respectively. My , character reads bytes from an input stream (one byte per ,) and loads them into the current cell. My . character adds the current cell’s value to an accumulator, which starts at $0$ and is returned when the program terminates.

Lastly, there are the [ and ] characters, which work together to enable loops. The Brainfuck code they sandwich is run again and again until the cell that’s in focus at the end of the loop is $0$. The program:
increments the first cell to two and loops the following subroutine twice: add two to the cell to the right, move back left, and decrement. Then, it moves to the cell it was incrementing and outputs it. If this program were written in Javascript, it’d look something like:

var cell_0 = 0, cell_1 = 0;
while (cell_0 !== 0) {

A simpler version would be “$++++.$”. There’s always the risk of infinite loops, so I opted for the simple solution of setting a time limit to the execution of a Brainfuck program.


I knew the limitations of my Brainfuck implementation, so I didn’t choose very glamorous test cases. I tested the genetic programmer on constant generation (outputting a constant number like $86$ for instance), on adding a constant to an input, on summing two inputs, and on multiplying two inputs. I was able to write Brainfuck programs that did all these things pretty easily, and so was the genetic programmer. Then I tried division by two.

I gave myself a half hour to solve the problem, but I wasn’t able to come up with a Brainfuck program that did what I wanted! The best I was able to do was “$,[>+.-<–]$”, which works for even numbers only. The genetic programmer, however, was able to do it:
I still don’t understand how its division algorithm works. I’ve implemented the algorithm in Javascript and followed its execution, but I still lack a clear understanding of why it works.

Hmm, I thought. If only I could recognize even and odd numbers, then I could write programs for each of the two separate cases and combine them. So began the next challenge, finding the remainder of a number when divided by two. I wasn’t able to solve this problem either (only solved it for even numbers). But of course the genetic programmer was:
In eleven minutes, even (pun intended)! Something that amazes me about this specific program is that it purposely puts itself in an infinite loop for even numbers, because when I first evolved this program, my interpreter output $0$ if programs took more than a few milliseconds to run. $0$ just so happens to be the correct answer for even numbers, so it works! Evolution cares about one thing and one thing only: practicality. The genetic programmer found a niche, timing out to force a $0$ output, and it exploited it perfectly to solve the problem at hand.

Brainfuck Interpreter

Here’s an interpreter for my flavor of Brainfuck. It only supports integer outputs so you can’t work with characters, but you’d literally have to modify less than five lines of code if you wanted ASCII output. All you’d need to do is change the behavior of the . command. Fork this project on github.

The , character loads the leftmost unseen input into the current cell. Inputs must be integers in the range $[0,256)$.

Output: N/A

Try copy pasting the divide-by-two program, loading the dividend in the first input section. Then play around with the program’s source to see which characters are essential and which aren’t. The programs work, but by no means are they optimal.

Next Steps

I initially wrote the genetic algorithm in Java back in 2012, and I only recently ported it to JavaScript. I have a generic UI I use to visualize genetic algorithms/programs I write for the web, but I need to modify it slightly to provide meaningful information on the Brainfuck genetic programmer. It’s also significantly slower than my Java implementation, so it isn’t as able to solve interesting problems in reasonable amounts of time. In the meantime, I’m currently working on a general blog post on genetic algorithms/programming, so stay tuned for that.

Fork me on GitHub