volatile != thread synchronization

Everybody knows that writing correct multithreaded code is hard, even when using proper synchronization primitives like mutexes, critical sections, and the like. (Ab)using the volatile keyword for synchronization purposes makes a programmer’s life even harder. Read on if you care to know why, and help spread the word.

When working with large, somewhat aged codebases, I’ve more than once seen volatile variables used for synchronization across threads in one way or another. That happened to work on single-processor, non-PowerPC-based architectures. However, running the same code on current-generation consoles (which are all PowerPC-based) leads to hard-to-find race conditions, both because of how the underlying memory model works and because of what the C++ standard actually has to say about volatile.

What volatile is for

The purpose of declaring a variable volatile is to tell the compiler/optimizer that its value might be changed from “outside”, e.g. by some hardware device. In fact, the only legitimate reasons I ever had for using the volatile keyword came up when interfacing with certain hardware on console platforms.

As an example, consider the following piece of code:

int* ptr = (int*)0xABCD;
*ptr = 10;
*ptr = 20;
*ptr = 30;
ME_LOG0("Test", "%d", *ptr);

When compiling this code with optimizations turned on, the optimizer makes the reasonable assumption that writing different values to the same memory address three times in a row is unnecessary, and hence generates the following code:

00D41A62  push        1Eh
00D41A64  push        offset string "%d" (0D98F2Ch)
ME_LOG0("Test", "%d", *ptr);

The code simply pushes 30 (1Eh) on the stack, and calls one of Molecule’s logging functions. If you were talking to a piece of hardware that, for example, internally gathers writes into multiples of 32 bytes (like the Write-Gather Pipeline on the GameCube), the above would not work, because the optimizer would simply strip away most of the writes to the hardware register (sitting at a certain address). That’s where the volatile keyword comes into play:

volatile int* ptr = (int*)0xABCD;
*ptr = 10;
*ptr = 20;
*ptr = 30;
ME_LOG0("Test", "%d", *ptr);

This code results in the following assembly being generated:

00F21A60  mov         eax,0ABCDh
00F21A66  mov         dword ptr [eax],0Ah
00F21A6D  mov         dword ptr [eax],14h
00F21A73  mov         dword ptr [eax],1Eh

As can be seen, once the compiler is told that a memory location (or variable) is volatile, it can no longer assume that the value isn’t changed by some outside effect. This guarantees that no reads from or writes to variables declared as volatile will ever be optimized away, but nothing more. It says nothing about threads, atomicity of operations, the order of memory operations as seen by other processors, etc.
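To make the atomicity point concrete, here is a minimal sketch, assuming a C++11 compiler with std::atomic and std::thread (neither appears in the code above). The helper name count_with_atomic is made up for this illustration. Two threads increment one shared counter; because the counter is a std::atomic<int>, each increment is one indivisible operation and no increment gets lost. A volatile int would give no such guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration only: two threads each increment
// a shared counter `iterations` times and the final value is returned.
int count_with_atomic(int iterations)
{
    assert(iterations >= 0);

    std::atomic<int> counter(0);

    auto work = [&counter, iterations]()
    {
        for (int i = 0; i < iterations; ++i)
        {
            // fetch_add is a single indivisible read-modify-write.
            // With `volatile int counter` and `++counter`, the load, the add
            // and the store are separate steps, and using them from two
            // threads at once is a data race.
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();

    return counter.load();
}
```

Relaxed ordering is enough here only because nothing but the counter itself is shared; it says nothing about the order of other memory operations.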

What volatile is not for

Because of the way volatile variables behave, they have often been used to synchronize certain operations among several threads, as in the following simplified example:

// somewhere in global scope
volatile bool g_updateFinished;

// thread 1: update data
g_updateFinished = false;
// ... update the shared data ...
g_updateFinished = true;

// thread 2: render data
while (!g_updateFinished)
{
  // wait until other thread finishes
}
// ... render the shared data ...

Thread 1 updates some global data (sprites, particles, etc.) and sets g_updateFinished = true to signal the other thread that updating has finished. Thread 2 busy-loops until g_updateFinished has been set, and then carries on rendering the previously updated data. Note that g_updateFinished is declared as being volatile. Otherwise the while-loop would not work in an optimized build, because the value of g_updateFinished would be loaded into a register once when entering the loop, leading to an infinite loop.

Even though this might look right in C++ code, it will fail horribly on all current-generation consoles such as the Xbox 360 and the PS3. It might fail after 5 minutes, it might fail after 20 hours: the code contains a very subtle race condition.

The reason for this is that the PowerPC architecture exhibits a so-called weakly-consistent memory model, which basically means that even if the code (and generated assembly!) makes writes to memory in the order 1, 2 and 3, the order of writes seen by other threads/processors might be 1, 3, and 2, or any other order you can think of. This means that the above code might access the data to be rendered while the other thread is still updating it.

To make matters more complicated, the busy-wait code above might work on other multi-processor architectures, because some compilers (e.g. MSVC) promise more than what the standard asks for, introducing memory barriers and/or compiler barriers when encountering the volatile keyword. This gives a false sense of security.
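The ordering that such compilers add implicitly can also be requested explicitly and portably. The following is only a sketch, assuming a C++11 compiler with std::atomic and std::thread. The function update_then_render and the variable g_sharedData are made up for this illustration; g_updateFinished plays the same role as in the example above, but is a std::atomic<bool> rather than a volatile bool.

```cpp
#include <atomic>
#include <thread>

namespace
{
    // Plain, non-atomic data: stands in for sprites, particles, etc.
    int g_sharedData = 0;

    // Takes the place of the volatile bool from the original example.
    std::atomic<bool> g_updateFinished(false);
}

// Hypothetical driver: runs one update and one render, returns what was rendered.
int update_then_render()
{
    // Reset state before any thread is started.
    g_sharedData = 0;
    g_updateFinished.store(false, std::memory_order_relaxed);

    int rendered = -1;

    std::thread updater([]()
    {
        g_sharedData = 42;  // "update data"
        // Release: all writes made before this store become visible to a
        // thread that observes `true` with an acquire load.
        g_updateFinished.store(true, std::memory_order_release);
    });

    std::thread renderer([&rendered]()
    {
        // Acquire: once `true` is read, the write to g_sharedData is
        // guaranteed to be visible as well.
        while (!g_updateFinished.load(std::memory_order_acquire))
        {
            std::this_thread::yield();
        }
        rendered = g_sharedData;  // "render data"
    });

    updater.join();
    renderer.join();
    return rendered;
}
```

The release/acquire pair is what keeps the flag write from overtaking the data write on a weakly ordered machine. Plain volatile promises neither.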

The bottom line is: don’t ever use the volatile keyword for synchronizing access to shared data across several threads! It might have worked in the past, there are published papers and algorithms out there using it, and it might appear to work, but it won’t, so please don’t do it. Use a proper synchronization primitive instead, and don’t even try to write your own lockless data structures/algorithms unless you are really, really experienced in this field.
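As one possible shape of “a proper synchronization primitive”, here is a minimal sketch of the same update/render handshake using a mutex and a condition variable. It assumes a C++11 compiler; on a console you would use whatever the platform SDK provides instead. All names (g_mutex, g_condition, g_ready, g_data, update_then_render_locked) are made up for this illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

namespace
{
    std::mutex g_mutex;
    std::condition_variable g_condition;
    bool g_ready = false;  // protected by g_mutex
    int g_data = 0;        // protected by g_mutex, stands in for the shared data
}

// Hypothetical driver: runs one update and one render, returns what was rendered.
int update_then_render_locked()
{
    {
        // Reset state before any thread is started.
        std::lock_guard<std::mutex> lock(g_mutex);
        g_ready = false;
        g_data = 0;
    }

    int rendered = -1;

    std::thread updater([]()
    {
        {
            std::lock_guard<std::mutex> lock(g_mutex);
            g_data = 42;     // "update data"
            g_ready = true;  // signal that updating has finished
        }
        g_condition.notify_one();
    });

    std::thread renderer([&rendered]()
    {
        std::unique_lock<std::mutex> lock(g_mutex);
        // Sleeps instead of busy-looping; the predicate guards against
        // spurious wakeups and against a notification sent before the wait.
        g_condition.wait(lock, []() { return g_ready; });
        rendered = g_data;   // "render data"
    });

    updater.join();
    renderer.join();
    return rendered;
}
```

Locking and unlocking the mutex provides the required memory ordering, so neither the flag nor the data needs to be volatile or atomic.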

I can wholeheartedly recommend Bruce Dawson’s whitepaper on this subject for further reading.
