Pointers ad nauseam

Do you understand pointers? This blog post will be a bit of a rant. My apologies up front. I’m by no means an expert on pointers but it seems they are misunderstood by some. I’m talking about pointers in C++ (and I guess in C too). I’m not even talking about pointer arithmetic or smart pointers, just a normal pointer.

My beef is with an anti-idiom I have seen quite often:

Object* ptr = new Object(); // or whatever
...
if (ptr != NULL)
{
  delete ptr;
  ptr = NULL;
}

Now, I might be mistaken but as far as I can tell that code after the ‘…’ is completely and utterly useless. Well, the delete statement is ok :). Long story short: the if only checks for NULLs but NULLs are actually safe to delete. The cases where a delete leads to an error are not caught by that check. Setting the pointer to NULL afterwards is arguably not that bad, but in my experience you should not reuse that pointer later on, making that line redundant as well. If you do see code that needs this assignment then consider refactoring the pointer to a smaller scope.

The four states a pointer can have

People sometimes look at me shocked when I proclaim that pointers can have four different states. That’s probably because I made it up. Still, in my head they can have four states. The above-mentioned idiom seems to imply pointers are either good or bad. It’s not that black-and-white. For me, a pointer can:

  1. Point to a proper object :).
  2. Point to 0 (or NULL, or better nullptr).
  3. Point to an invalid address outside your program’s address space – ex. uninitialised pointer.
  4. Point to an invalid address inside your program’s address space – ex. dangling pointer.

The good

The first pointer is just fine. The object it’s pointing to is happily living in the Free-Store (the heap, whatever). The second pointer is pointing to 0. You can call it 0 or NULL, it’s basically the same. It’s not a bad pointer. You just cannot use it. Its semantics are well defined though. Surprisingly you can delete it! It’s just a no-op. These first two are basically good pointers. When used in a boolean context (an if for instance) the first will evaluate to true while the second will evaluate to false. Handy.

Here is an image showing a good pointer. I apologise for the ugly arrows, I’m an engineer, not a graphics artist. A pointer is just a number. A number which is interpreted as a memory address, i.e., what it points to. In this case it’s showing a possible situation for this pointer:
const char* ptr = "BlueSun";
good_pointer
So the pointer ptr has the value 0x00001208 and at that location in memory we find some zero-terminated string. Here is a picture of a 0/NULL pointer. In C, NULL is often defined as ((void*)0); in C++ it has to be an integral constant such as 0 or 0L, because void* does not implicitly convert to other pointer types there.
null_pointer

The bad

The last two pointers are bad. You cannot use (i.e., dereference) them. Bad things will happen. The behaviour of the bad things will differ though. An uninitialised pointer usually points to a random address. If you’re lucky it will point outside of your program’s (process’) address space. The green zone denotes the process’ address space.
point_outside

The ugly

One of the most difficult bugs can occur with the following pointer though. It happens after deleting a pointer. Deleting a pointer just frees up the memory it points to. It does not have to change the actual pointer value. In other words, the pointer still points to the same region in memory. Freeing memory just means marking it as available. So in the worst case the pointer still points to the same region of memory, and that memory might still contain exactly what it had before being deleted. In this case dereferencing this pointer might lead to perfectly understandable results even while it is an error to do so! The region of memory that is freed still belongs to the process’ address space, so no access violation will occur. Note that an uninitialised pointer might also point inside the address space.
point_inside

People usually like tables. Here is one with a summary.

| Pointer | Good? | Can dereference? | Can delete? | If misused? |
|---|---|---|---|---|
| Good pointer | Yes | Yes | Yes | Nope, it’s fine |
| Null pointer | Not bad | No | Yes | Fail-fast error, easily recognisable by 0x00000000 |
| Uninitialised pointer | Bad | No | No | Usually fail-fast error |
| Dangling pointer | Evil! | No | No, that would be a double delete | Horrible unexpected behaviour. Brew a new pot of coffee, you’re in for an all-nighter |

Let’s see some code!

// --good pointer--
int* p = new int(4); // allocate some space in the FreeStore, give it the value 4
                     // and assign the location to p
cout << p << endl;   // this will print some address which is the location of that 
                     // space we just allocated. example 0x01AB23FA
cout << *p << endl;  // prints 4. The value that is found when dereferencing the pointer

delete p;            // Ok, mark that location as free again


// --null pointer--
int* p = 0;          // Create a NULL pointer. More about this in a minute.

cout << p << endl;   // this will again print the address to which p is pointing, i.e.
                     // 0x00000000
cout << *p << endl;  // Crash! What will you see? (1)

delete p;            // Ok, deleting 0 is ok! It is a no-op

How does (1) crash? Knowing what errors mean can help you track down the bug. Unfortunately the exact error depends on the compiler, operating system, runtime, debug or release mode, etc. In this case it’s usually easy to spot. I’ve tested using Clang 4.1 on Mac OS X and Visual Studio Express 2012 for Windows Desktop on Windows 8.
null_release_out
On Mac OS X I got Segmentation fault: 11. Not that insightful. Compiling and running this code using Visual Studio yields the familiar “FailSoftware.exe has stopped working” dialog… If you run the program in the debugger (i.e., from within the IDE) then you’ll see something like the dialog below. Even though it looks scary, you can actually read that it’s trying to read the memory at location 0x00000000. Makes sense right?
null_release_in

// --uninitialised pointer--
int* p;              // Create a pointer. Points to random location

cout << p << endl;   // this will again print the address to which p is pointing =
                     // something random
cout << *p << endl;  // Crash? What will you see? (2)

delete p;            // Crash! what will you see? (3)


// --dangling pointer--
int* p = new int(4); // allocate some space in the FreeStore, give it the value 4
                     // and assign the location to p
cout << p << endl;   // this will print some address which is the location of that 
                     // space we just allocated. Let's say 0x00001208
cout << *p << endl;  // prints 4. The value that is found when dereferencing the pointer

delete p;            // Ok, mark that location as free again

// here be dragons..
cout << p << endl;   // this will again print the address to which p is still pointing
                     // this will print 0x00001208 just like before
cout << *p << endl;  // Depending on what happened in the meantime this will print
                     // either '4' or some garbage number like -2031851551. Not
                     // that exact number of course, but something like it can and
                     // will happen.
delete p;            // Crash! What will you see? Double-delete (4)

Running the code for the uninitialised pointer using Clang does not crash at (2). It prints a weird number. It crashes at (3) though with the message: malloc: *** error for object 0x7fff4fe89bc0: pointer being freed was not allocated. Obvious. Visual Studio on the other hand shows the following when compiled in debug mode.
uninit_debug_out
And it shows this when compiled in release mode.
uninit_release_in
So in debug mode Visual Studio is trying to help by explicitly initialising uninitialised pointers to some magic value so it can do runtime tests on that. This is one reason why code is slower in debug mode even when not running in the debugger. The value could be something like 0xCCCCCCCC. Look on Wikipedia for a list of these magic debug values. If you see them while debugging you now know they are not coincidental values.

The double delete has similar errors as the deletion of an uninitialised pointer. Visual Studio won’t warn you about using an already deleted pointer either. So you will be using incorrect data before even crashing on the double delete. It seems the Visual Studio debug runtime does try to give some more information than the usual access violation or segmentation fault as shown here:
double_debug_out

Wrapping up

I should point out that good compilers, with the appropriate warning levels set, will warn about using uninitialised variables. Unfortunately I’ve worked on more projects than I care to admit where people gave up checking all the warnings because there were way too many. There is no excuse for letting a codebase get into that shape though.

Also note that the following two lines are effectively the same and compile and run fine:

Object* ptr = 0;
Object* ptr = NULL;

But this does not compile:

Object* ptr = 1;

We see that 0 is handled as a special case by the compiler. Fortunately they’ve tried to clean things up in C++11 by introducing nullptr.

This has been a nice story but I must admit that I practically never use raw pointers in my code anymore. It turns out that smart pointers do not have these problems. By design! They are either valid or invalid. Only the good states one and two. What’s not to like? (Sure you can corrupt a smart pointer but you’d have to try on purpose.) I’m talking about owning pointers of course. Aliasing pointers are usually just fine, although for those I prefer to use references if possible.

So now everyone should understand why the code at the top of this post is not very useful. The if check is useless and redundant. One might argue about assigning NULL to pointers after deleting them. This would avoid the dangling pointer problem, right? In my experience you almost always can avoid it altogether by reducing the scope of the pointer, just like with scopes of normal variables. One of the few places I can imagine it is useful is in the implementation of a smart pointer (in std::shared_ptr::reset() for instance). Even in cases where NULL is used to indicate the absence of the pointer (hello Maybe Monad) I prefer to use something like boost::optional to be explicit about the use of the pointer. Just to be pedantic, check this tweet from one of the best programmers in the world :p.

I once had a colleague with (self-proclaimed) 10 years of experience who was confronted with using this anti-idiom. His reply was something like: “I’m going to keep it in, just to play it safe”. If you made it to the end of this post (wow, congratz :)) I hope you see that that reply is unacceptable. This idiom just creates more lines of code to maintain. It confuses beginners about what pointers really are. Basically, it just shows your colleagues that you clearly don’t understand how pointers work!

WTF?

Take a look at the following code and its output.

#include <iostream>

using namespace ::std;

int main()
{
  int* i = new int(3);
  cout << i << endl;
  cout << *i << endl;
  delete i;

  int* j = new int(5);
  delete i; // <-- ???
  cout << j << endl;
  cout << *j << endl;
  cout << *i << endl; // <-- ???

  return 0;
}

Output:
0x7f8108c000e0
3
0x7f8108c000e0
5
5

Can you explain the output? Can you explain why this code seems to compile and run just fine? It seems to output what the author expected. Kind of. And it also does not have a memory leak. Still, this code is buggy. It’s sad that a larger program that contains this kind of code might run without error for years…
