RSS 2.0

Personal Info:

Joe Send mail to the author(s) leads the architecture of an experimental OS's developer platform, where he is also chief architect of its programming language. His current mission is to enable writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe founded the Parallel Extensions to .NET project. He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife, writing books, writing music, studying music theory & mathematics, and doing anything involving food & wine.

My books

My music

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2012, Joe Duffy

 
 Sunday, April 16, 2006

I wrote about torn reads previously, in which, because loads from and stores to > 32-bit data types are not actually "atomic" on a 32-bit CPU, obscure magic values are seen in the program from time to time. This isn't as scary as "out of thin air" values, but can be troublesome nonetheless. I noted that, by using a lock, you can serialize access to the location to ensure safety.

You can of course write such thread-safe code that avoids taking a lock, usually motivated by performance. Vance has a pretty detailed write-up of this on MSDN. Most of the time, you shouldn't try to be so clever, as it will get you in trouble sooner or later, and is even worse to debug than a typical race. But for really hot code-paths, it can make a measurable difference. (Note the key word: measurable. If you've measured a problem, you might consider such techniques... but otherwise, stay far, far away. (Have I made enough qualifications and disclaimers yet?))

If you access individual pointer-sized byte segments of the data structure, such as 32-bit aligned segments (e.g. volatile or __declspec(align(x)) in VC++, all values on the CLR), you can load and store in a known order. Furthermore, you need to use the appropriate types of loads and stores with fences in the appropriate places; load/acquire and store/release are usually adequate. You can then use the intrinsic properties of this order to make statements about the correctness of your algorithm.

For example, imagine you have some code that increments a 64-bit counter on a 32-bit system. Aside from overflow, the value always increases. If you always increment the low 32-bits, followed by the high, and if you always read the high, followed by the low, you'll be guaranteed that, should you read a torn value, it will be too low rather than too high (not counting for overflow, of course). Sometimes it can be really low, such as when the low 32-bits wrap back to 0, in which case the higher 32-bit increment needs to carry one. Depending on your situation, this might be precisely what you are looking for. (I wrote some code last week that needed exactly this.)

For example, your typical code might read and write under a lock, in VC++/Win32:

ULONGLONG ReadCounter_Lock(
   
volatile ULONGLONG * pTarget, CRITICAL_SECTION * pCs)
{
    ULONGLONG val;

    EnterCriticalSection(pCs);
    val = *pTarget;
    LeaveCriticalSection(pCs);

    return val;
}

ULONGLONG IncrCounter_Lock(
    volatile ULONGLONG * pTarget, CRITICAL_SECTION * pCs)
{
    ULONGLONG val;

    EnterCriticalSection(pCs);
    val = *pTarget;
    *pTarget = val + 1;
    LeaveCriticalSection(pCs);

    return val;
}

But, using the load/store order described above, it can become lock free:

#define LO_LONG(x) (reinterpret_cast<volatile LONG *>((x)))
#define HI_LONG(x) (reinterpret_cast<volatile LONG *>((x)) + 1)

ULONGLONG ReadCounter_NoLock(volatile ULONGLONG * pTarget)
{
    ULONGLONG val;

#ifdef _Win64

    val = *pTarget;

#else

    // Read high 32-bits first, then low:
    *HI_LONG(&val) = *HI_LONG(pTarget);
    *LO_LONG(&val) = *LO_LONG(pTarget);

#endif

    return val;
}

ULONGLONG IncrCounter_NoLock(
    volatile ULONGLONG * pTarget)
{
    ULONGLONG oldVal;

#ifdef _Win64

    oldVal = static_cast<LONGLONG>(
        InterlockedIncrement64(static_cast<LONGLONG *>(pTarget)));

#else

    // Increment the low 32-bits first, then high:
    if ((*LO_LONG(&oldVal) =
        InterlockedIncrement(LO_LONG(pTarget))) == 0)
    {
        *HI_LONG(&oldVal) = InterlockedIncrement(HI_LONG(pTarget));
    }
    else
    {
        *HI_LONG(&oldVal) = *HI_LONG(pTarget);
    }

#endif

    return oldVal;
}

It's obvious which is simpler to code, understand, and maintain. But the latter technique can come in handy when you're in a pinch.

For information on other similar techniques, including multi-word CAS and object-based STM, Tim Harris's recent "Concurrent programming with locks" paper is an excellent read. Most of it isn't built and ready for you to use today, but the details of the algorithms are in there if you'd like to play around a little. And there's a lot of literature out there about creating lock-free data structures. Interestingly, you can end up worse off than if you'd used a lock in the first place. Many such lock free algorithms are optimistic meaning that they do a bunch of work hoping not to run into contention; when they do, they have to throw away work, rinse, and repeat. Your mileage can vary dramatically based on workload.

4/16/2006 6:32:52 PM (Pacific Daylight Time, UTC-07:00)  #   

 

Recent Entries:

Search:

Browse by Date:
<April 2006>
SunMonTueWedThuFriSat
2627282930311
2345678
9101112131415
16171819202122
23242526272829
30123456

Browse by Category:

Notables: