Why you should never use the bool datatype.

...because padding.

The C99 standard added a boolean datatype, _Bool, together with the header stdbool.h which defines the friendlier name bool. When thinking about a bool, people typically imagine something which can be either true or false, zero or one; a single bit.

In a modern computer you cannot actually address single bits individually. Everything is addressed in multiples of at least 8 bits (i.e. a byte). You could write a simple program to figure out the size of a bool.

#include "stdio.h"
#include "stdbool.h"

int main(int argc, char *arg[]){

 printf("bool size: %lu byte(s)\n", sizeof(bool) );
 return 0;

}

This will print:

bool size: 1 byte(s)

In a modern computer the byte is the smallest unit which can be directly addressed, thus your bool cannot be smaller than that. The story doesn't end there, however. The computer is most efficient at accessing memory which lies on aligned boundaries; on a typical machine a 4-byte int wants to sit at an address which is a multiple of 4. Imagine using the following struct.

#include "stdio.h"
#include "stdbool.h"

struct foo{

 bool happy;
 int full_emotion_spectrum;
 bool depressed;

};

int main(){

 struct foo f;

 printf("foo takes up %lu byte(s)\n", sizeof(f));

}


This will print:

foo takes up 12 byte(s)

Struct foo is now 12 bytes: the compiler inserts three unused bytes after each bool so that the int, and the start of the next struct foo in an array, lands on a 4-byte boundary, even though the datatypes themselves only account for 6 bytes. These inserted bytes are known as padding. This waste can be mitigated by re-arranging the variables within the struct:

struct foo{

 bool happy;
 bool depressed;
 int full_emotion_spectrum;

};


The above struct will only take up 8 bytes, as the two bools are placed in consecutive bytes and the int is placed on the next available 4-byte boundary. Although the exact layout may vary with your OS/compiler/architecture, you can clearly see that a lone bool saves you nothing: you would use no less space if you had simply used an int. The real problem starts when you need many booleans.
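
If you want to see exactly where the padding ends up, you can print the position of each member with offsetof from stddef.h. A minimal sketch using the original ordering of struct foo (the offsets quoted afterwards are what a typical compiler produces; yours may differ):

#include <stdio.h>
#include <stddef.h>
#include <stdbool.h>

struct foo{

 bool happy;
 int full_emotion_spectrum;
 bool depressed;

};

int main(void){

 // offsetof gives the distance in bytes from the start of the
 // struct to a member; the gaps in between are padding
 printf("happy at offset %zu\n", offsetof(struct foo, happy));
 printf("full_emotion_spectrum at offset %zu\n", offsetof(struct foo, full_emotion_spectrum));
 printf("depressed at offset %zu\n", offsetof(struct foo, depressed));
 return 0;

}

On a typical machine this prints offsets 0, 4 and 8: three padding bytes follow happy, and three more follow depressed to keep consecutive struct foos in an array aligned. With the re-arranged ordering the offsets become 0, 1 and 4, and the size drops to 8.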

Masking

If you need multiple booleans, then it is much better to rely on a technique called masking.

#define SET_FULLSCREEN 1
#define SET_LOAD_LAST_GAME 2
#define SET_AUTOMATIC_SAVE_ON_EXIT 4
#define SET_AUTO_RESPAWN 8

...

unsigned int settings = 0;

// setting the fullscreen bit to true
settings = settings | SET_FULLSCREEN;

// setting the auto respawn bit to false
settings = settings & (~SET_AUTO_RESPAWN);


Notice that all the defines are set to a power of two, which means that, written in binary, each one has exactly a single bit set to one. I use an int for settings, which allows me up to 32 different booleans, and I choose the unsigned flavour of the int; this is not strictly necessary, but it keeps the sign bit out of the picture. I set a bit by using the bitwise OR, represented by the vertical bar. A bitwise OR performs the OR operation on the bits individually, so the first bit of settings will be OR'ed with the first bit of SET_FULLSCREEN, and likewise for all the other bits. Clearing works the other way around: ~SET_AUTO_RESPAWN has every bit set except the auto-respawn one, so AND'ing with it forces that single bit to zero and leaves the rest untouched. Both expressions above therefore affect only one bit in settings.
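
To read a flag back out, AND with the mask; the result is non-zero exactly when the bit is set. You can also flip a bit with the bitwise XOR. A short sketch, assuming the same settings variable and defines as above:

// testing the fullscreen bit
if(settings & SET_FULLSCREEN){
 printf("running fullscreen\n");
}

// toggling the automatic save bit: XOR flips the single bit
// in the mask and leaves all the others alone
settings = settings ^ SET_AUTOMATIC_SAVE_ON_EXIT;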

Linus Torvalds, the creator of Linux, does not recommend the use of the bool datatype (https://lkml.org/lkml/2013/8/31/138). He probably knows how to code a little bit, so it is not a bad idea to follow his lead. I never use bool and I wish it weren't in the C language. It misleads beginners into thinking they are using less space than they really are, and it is yet another opportunity to trip up.

Addendum

I originally posted a link to this article on reddit yesterday and got quite a few comments, mostly negative. Some points were raised which I would like to address here.

1. Readability of code

Clearly written code is a whole topic unto itself, but in my opinion the name of the variable is more important than the choice of datatype. This is somewhat subjective, but I consider the code snippet above to be very readable.

2. Lots of space on modern hardware

I don't buy into the notion that just because there is lots of space, one shouldn't worry about wasting it. Nowadays software typically relies on a lot of dependencies, and tiny inefficiencies at each step of the way build up to significant differences.

Also, the space argument only applies in cases where you use multiple booleans. What I am suggesting is that when you have only one boolean, just use an int as this is how much memory will be used anyway.

3. "Never" using bool is too extreme/insane

Writing good code is in part about cultivating good habits. The bool datatype never gives you something more than you would have with an int. If you never use it, you don't have to think about when it would be okay to use it. It reduces cognitive load.

Let's take an example where you write code for a till in a shop. People can pay either with cash or card. You feel confident there are only two options, so you use a bool (e.g. bool payment_option;). Later it becomes possible to pay with gift cards, and now you need to go back and change the datatype. If you had used an int, you would simply add another option, as in the sketch below.
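
In the style of the defines used earlier, that could look like this (the names are made up for illustration). Note that these are mutually exclusive options rather than flags, so consecutive integers are fine and no powers of two are needed:

#define PAY_CASH 0
#define PAY_CARD 1
#define PAY_GIFT_CARD 2 // added later without touching the rest

int payment_option = PAY_CASH;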

4. Premature optimization

I was quite surprised by this argument. My article is not about optimization at all (premature or otherwise). Optimization means you first write the code one way and make sure it works fine. After that, you go back over it and think about how to do the job better, either in terms of speed or memory usage.

My suggestion was simply never to use bool and always to use an int, since it typically uses the same amount of memory anyway. I agree that one shouldn't optimize prematurely and should focus on making the code work first.

5. I shouldn't assume an int is always 32 bits.

I agree with that one; it would have been better to use a uint32_t in my code snippet above.
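
With stdint.h, the settings variable from the masking snippet becomes:

#include <stdint.h>

// guaranteed to be exactly 32 bits wherever uint32_t is provided
uint32_t settings = 0;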


  /u/gregg_ink
@Gregg_Ink

© 2024 Gregg Ink