Input, state and their relationship to bugs

Wednesday, November 14th, 2012

Why are there so many programmers who don't know what state is, much less the impact it has on programming? Recently I was having a discussion online, and some programmers kept misinterpreting everything I and other programmers said, because they had little or no grasp of the concept of state. This is not the first time I've noticed that, for such an important concept, a surprising number of programmers don't understand state and its implications.

The meaning of state

From Wikipedia:

the state of a digital logic circuit or computer program is a technical term for all the stored information, at a given point in time, which the circuit or program has access to. The output of a digital circuit or computer program at any time is completely determined by its current inputs and its state.

Just to be clear, "all the stored information" includes the instruction pointer, CPU registers/caches and so forth. So if at any given point in time your program is in a while loop or conditional branch, that is also part of the state.
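
As a small illustration (a Python sketch of my own, not from the Wikipedia article): a generator object captures exactly this kind of state, including the position inside its loop where execution is paused.

def counter():
    i = 0
    while i < 3:
        # Execution pauses on the yield; both `i` and the fact that we
        # are inside this while loop are part of the generator's state.
        yield i
        i += 1

g = counter()
print(next(g))  # 0
print(next(g))  # 1: the generator resumed from its stored state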

"The output of a digital circuit or computer program at any time is completely determined by its current inputs and its state."… Think about that for a moment, because it is important. Everything that happens from that moment on in a computer program is determined by the current inputs and its state. That means, as long as there is no other input, the output of the program is completely predictable at any time. That's a lot (or rather, exactly) like the pseudo-random number generator in most programming languages. If you know the seed, you can predict every random number that will be produced. Each number the pseudo-RNG returns is the result of its input and its current state.

If programs are that predictable, how come there are still so many bugs in them? Well, for one thing, programs are not that predictable at all.

On the Origin of Bugs

Look at the following example:

a = 5
b = 10
c = a + b

Tell me, how many bugs does the program have? If we assume that the syntax is correct and that each variable can hold an integer number, this program contains zero bugs. Now tell me, how many bugs does this program have?

a = int(open('in', 'r').readline())
b = 10
c = a + b

How many bugs? One? Three? Do you know? Because I sure don't. Here are a few I can come up with off the top of my head:

- The file 'in' might not exist.
- We might not have permission to read the file 'in'.
- The file might be empty, so readline() returns an empty string, which int() cannot parse.
- The first line might not contain a valid integer at all.
- The first line might be so long that reading it exhausts the available memory.

I could go on, but the point should be clear. We changed only one line, and now there are who knows how many bugs in our program. Why is that? Because in our first example, the state of the program could never be influenced from outside the program at any time, since there was no input! (Yes, the memory might get corrupted by a random flipped bit, or the universe might decide to suspend the laws of physics, but again: let's not get pedantic.)

When we write a piece of code, we write that code with certain assumptions. If I write:

while (i < 10) { /* ... */ }

I'm assuming variable 'i' contains some kind of number, which at the start of the loop is smaller than 10, and which during the loop will not suddenly change into, say, an open file pointer. Now, if I'm not a total idiot and I write my code properly, that last thing will never happen. Given a piece of code, it's generally not difficult to make correct assumptions about that code. But making assumptions about input into that code (in this case the contents of variable 'i'), regardless of where it came from (interactive input, the network, the filesystem, passed as an argument into a function, whatever), is much more difficult and dangerous.
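
Translating that loop into Python makes the danger concrete (count_up is a hypothetical name of mine): in Python 3, the same loop works or crashes depending entirely on what the caller decides to pass in.

def count_up(i):
    # Assumes `i` is a number that starts out smaller than 10.
    while i < 10:
        i += 1
    return i

count_up(5)    # fine: the input matches the assumption
count_up("5")  # TypeError: the input violated the assumption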

I'm going to make a sweeping generalization. Given that "the output of a digital circuit or computer program at any time is completely determined by its current inputs and its state", and given that a program's current state is itself determined by its starting state and all the input it has received so far (which means we can disregard the "and its state" part of the first statement), I'm going to claim that:

All bugs in a program are a result of the input into that program.

Of course, like I said, that statement is sweeping, and therefore false. But for now, assume it's true and think about its implications for a moment, while I correct my previous statement.

Predictable state

Of course, not all bugs in a program are a result of input into that program. Sometimes we programmers screw up and make mistakes; that's another source of bugs. Sometimes we're tired and we make mistakes in the syntax of our programs. Sometimes we don't pay attention and a library works differently than we thought. Such bugs are easily solved. In fact, many of them can be detected automatically by compilers or asserts.
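
An assert is the cheapest way to make such a mistake detectable (a minimal sketch of my own): it turns a silent assumption into a check that fails fast.

def mean(values):
    # Make the assumption explicit instead of dividing by zero later.
    assert len(values) > 0, "mean() requires at least one value"
    return sum(values) / len(values)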

Another class of bugs is the one where we make assumptions about the current state of a part of our program, and write new code in accordance with those assumptions. Those assumptions may be correct, resulting in working code, or incorrect, resulting in bugs.

A coworker of mine once had to write some code for an ATM. Since ATMs are reasonably sensitive, he wasn't allowed to dynamically allocate memory. That is, he could not use malloc and was only allowed to use the memory statically allocated by the compiler. Not allowing dynamically allocated memory gets rid of an entire range of problems, all of which are directly related to unpredictable state. Buffer overflows, out-of-bounds array referencing, bugs in string copying, reading input... all those problems are greatly mitigated by not having dynamically allocated memory.

You always want the state of your program to be predictable. Input makes the state unpredictable. As soon as we get input into our program, we can only hope that that input adheres to the assumptions the code makes. The current state may be valid (though unpredictable); however, we cannot make any claims about future states. Only after we have sufficiently validated and sanitized the input, so that it adheres to the code's assumptions, can we speak of a predictable state again.
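
To make that concrete, here is a sketch of the earlier file-reading example with its assumptions validated up front (read_first_int is a hypothetical helper of my own, written for Python 3):

import sys

def read_first_int(path):
    # Validate every assumption the naive version made implicitly.
    try:
        with open(path, 'r') as f:
            line = f.readline()
    except OSError as err:
        sys.exit("cannot read %s: %s" % (path, err))
    try:
        return int(line)
    except ValueError:
        sys.exit("first line of %s is not a valid integer" % path)

a = read_first_int('in')
b = 10
c = a + b

Only once both checks have passed do we know the program is back in a predictable state.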

This is why unclear code is such a problem. Unclear code makes it hard to reason about code, thus hard to determine whether code might lead to unpredictable states. Basically, we want to make the assumptions of code as transparent as possible.

I hope everything I've said so far makes absolutely basic sense. I hope that at this point in the article you're thinking "Well, duh", because, from what I've seen, many programmers don't have a clue about the relation between input, state and bugs.

"So what?"

It is important to grasp this basic concept of state because it helps you reason about why and when certain programming techniques are a good or bad idea. For example:

- Global variables make state unpredictable, because any part of the program can change them at any time.
- Goto makes control flow, and therefore state, harder to predict and reason about.
- Functions with side effects are harder to reason about than pure functions, because they change state outside their own scope.
- Threads are hard to get right, because they change shared state in unpredictable orders.

I could go on, but I hope you get the point. Please note that these points have to be considered in context. Don't take them as absolutes, because that would go entirely against what I'm trying to say. Goto can most certainly be used to create state that is more predictable and easier to reason about. In general, however, it is considered detrimental to predictable state.

Back to input

Let's jump back to input for a moment. The age-old adage goes: "Validate your input!" What does that mean? A quick question: given this function:

def foo(a, b):
    c = a / b
    return c

How many bugs are in that function? We can immediately spot a potential division-by-zero bug and an overflow bug. There are others, but the point is: you don't know! The input into this function is not validated. We're making assumptions about the calling code which may or may not be true. The key here is that the concept of "input" doesn't just apply to what you read from the keyboard, a file or the network. It applies to every line of code. Programmers who do not realize this are what I like to call Pavlovian Programmers: sure, you can teach them a trick ("goto is bad!"), but they'll never understand the reasoning behind that trick, and they won't be able to apply it to new concepts. This is why programmers are jumping on the event-driven programming bandwagon, despite it being an insanely bad idea. Basically, event-driven programming (the kind that binds anonymous functions as callbacks to events; yes, JavaScript et al.) is a kind of goto, except harder to debug.

Ideally, every function and every constructor would validate its input arguments. (Contracts, anyone?) The stricter you validate input arguments, the less chance there is of winding up in an unpredictable state. Of course I say "ideally", because it is not practical to validate each and every bit of input. A decent compromise would be to validate the input of all constructors that are part of library code. Then again, 99% of your code should be library code. But that's a discussion for another time.
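
As a sketch of what that might look like for the earlier foo() function (the specific checks are mine; pick whatever policy fits your code base):

def foo(a, b):
    # Promote the implicit assumptions to explicit checks.
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("foo() expects two numbers")
    if b == 0:
        raise ValueError("b must not be zero")
    return a / b

Now the function either returns a correct result or fails loudly at its boundary, instead of dragging an invalid state deeper into the program.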


The conclusions of this article are simple:

- Validate your input!
- Write clear code.
- Keep the state of your program as predictable as possible.

I'm not trying to be funny here. The conclusions are the same as what you've already heard a thousand times. I just hope that you now view them in a bit of a different light. I believe we can accomplish the above and create much better code if everybody would just realize a few basic concepts:

- Every line of code receives input, whether from the user, a file, the network, or simply another part of the program.
- All input influences the state of the program.
- Input that is not validated makes the state, and therefore the program's behaviour, unpredictable.
- Unpredictable state is where bugs come from.

I leave you with this quote, which I think I once read somewhere else, but cannot find again, so I will attribute to myself:

Programming is the messy art of constantly bringing a program back to a predictable state after receiving a tiny bit of input.