Personal Info:

Joe leads the architecture of an experimental OS's developer platform, where he is also chief architect of its programming language. His current mission is to enable writing large-scale software that is reliable, secure, and scalable by construction. Before this, Joe founded the Parallel Extensions to .NET project. He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife, writing books, writing music, studying music theory & mathematics, and doing anything involving food & wine.

Disclaimer:
The contents of this site are my own personal opinions and do not represent my employer's view in any way.

© 2012, Joe Duffy

 
 Tuesday, October 17, 2006

The CLR's approach to monitor acquisition (i.e. Monitor.Enter and Monitor.Exit) during shutdown is very different from native CRITICAL_SECTIONs and mutexes (as described in my last post). In particular, the CLR does not ensure requests to acquire monitors on the shutdown path succeed, preferring instead to cope with the risk of deadlock rather than the risk of broken state invariants.

Managed code is run during orderly shutdowns in two places: the AppDomain.ProcessExit event and inside the Finalize method for all finalizable objects in the heap. (The term "orderly shutdown" is used to distinguish an Environment.Exit from a P/Invoke to kernel32!TerminateProcess, for instance.) Just as with the example described for native code, threads can be suspended while they hold arbitrary locks and have partially mutated state to the point where invariants do not hold any longer. Instead of permitting the shutdown code to observe this state--possibly causing corruption or unhandled exceptions on the finalizer thread--the CLR treats lock acquisitions as it normally does.

If a lock was orphaned in the process of stopping all running threads, the shutdown code path will fail to acquire it. If these acquisitions are done with non-timeout (or long-timeout) acquires, a hang will ensue. To cope with this (and any other sort of hang that might happen), the CLR anoints a watchdog thread to keep an eye on the finalizer thread. Although configurable, by default the CLR will let finalizers run for 2 seconds before becoming impatient; if this timeout is exceeded, the finalizer thread is stopped, and shutdown continues without draining the rest of the finalizer queue.
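To make the watchdog idea concrete, here is a minimal sketch in portable C++. It is not how the CLR implements its finalizer watchdog; the name run_with_watchdog, the helper thread, and the timeout parameter are all assumptions made for illustration. The caller waits a bounded amount of time for cleanup work and then moves on, whether or not the work finished.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Runs `cleanup` on a helper thread and waits at most `timeout` for it.
// Returns true if the cleanup finished in time, false if the watchdog gave up.
// run_with_watchdog is a hypothetical name, not a CLR or Win32 API.
bool run_with_watchdog(std::function<void()> cleanup,
                       std::chrono::milliseconds timeout) {
    struct State {
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
    };
    // Shared ownership keeps the state alive even if we abandon the helper.
    auto state = std::make_shared<State>();

    std::thread worker([state, cleanup = std::move(cleanup)] {
        cleanup();
        {
            std::lock_guard<std::mutex> guard(state->m);
            state->done = true;
        }
        state->cv.notify_one();
    });
    // Detach so that a hung cleanup cannot block the caller in join().
    worker.detach();

    std::unique_lock<std::mutex> lock(state->m);
    return state->cv.wait_for(lock, timeout, [&] { return state->done; });
}
```

A caller would pass the cleanup routine and a budget, for example run_with_watchdog(flush_buffers, std::chrono::seconds(2)) with a hypothetical flush_buffers, and continue shutting down when the result is false. Unlike the CLR, this sketch only abandons the hung helper; it does not stop it.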

This is typically not horrible since many finalizers are meant to clean up intra-process state that Windows will clean up automatically anyway. This covers things like file HANDLEs. But it does mean that any additional logic won't be run, like flushing file write-buffers. And for any cross-process state, you're screwed and had better have a fail-safe plan in place, like detecting corrupt machine-wide state and repairing upon the next program restart. (For what it's worth, DLL_PROCESS_DETACH notifications aren't run in all process exits either, so this really is not any worse than what you have with native code today.)

AppDomain unloads are very different beasts. Any reliability-critical code that will run as part of unload (CERs, critical finalizers, and generally any Cer.Success/Consistency.WillNotCorruptState methods) should strictly only ever acquire locks that are always dealt with in a reliable manner throughout the code-base. That statement is actually a little too strong. In reality, either (1) locks must never be orphaned (aside from process exit) or (2) the associated broken state invariants that will occur (e.g. in the face of asynchronous exceptions) can be tolerated.

Unfortunately, we don't give you access to Monitor.ReliableEnter (the BCL team gets to use it, though, as it's internal to mscorlib), which means almost nobody is equipped to do (1) today. It's impossible to write code that will reliably release a monitor in the face of possible asynchronous thread aborts and out-of-memory exceptions without it. Only a very tiny fraction of the BCL actually deals with locks in such a strictly reliable manner, so as a general rule of thumb very little of it can acquire and release locks while executing reliability-critical code without the risk of deadlock. Hosts will of course use policy to escalate to rude AppDomain unloads in the face of hangs, much like the CLR does by default for process exit.

(Note: Thanks to Jan Kotas--an SDE on the CLR team--for noticing that I confused AppDomain unloads with process exit in my last post, in addition to pointing out that appearances are deceiving: the multi-threaded CRT can actually suffer from the sort of shutdown problems outlined in the last post.)

10/17/2006 12:42:42 PM (Pacific Daylight Time, UTC-07:00)  #   

 Saturday, October 14, 2006

When a Windows process shuts down, one of the very first things to happen is the killing of all but one thread. This sole remaining thread is then responsible for performing shutdown duties, both in kernel and in user mode, including executing the appropriate DLL_PROCESS_DETACH notifications for the DLLs loaded in the process. A great treatise on shutdown and the associated subtleties can be found on, of course, Chris Brumme’s weblog.

It’s entirely possible that at least one of those threads was executing under the protection of one or more critical sections when the shutdown was initiated. Since threads are killed in a fairly hostile manner (unlike, say, asynchronous thread aborts, which are at least a little less rude, even in the so-called rude variant), these critical sections will have been left in an acquired state. And any associated program state is apt to be left very inconsistent indeed. Worse, you might imagine that if the shutdown thread later needed to acquire one of those orphaned critical sections, the shutdown process would deadlock.

Although that’s intuitively what you may expect to occur, the OS actually does something a little funny during shutdown to avoid this problem. It effectively ignores calls to kernel32!EnterCriticalSection and kernel32!LeaveCriticalSection. A call to enter a CRITICAL_SECTION will first check to see if it's owned by another thread and, if it is, the section is automatically re-initialized before acquiring it. The result? If one of the previously killed threads, t0, held on to critical section A, for instance, and had partially modified some state protected by it just before the shutdown began, then the shutdown thread, t1, is permitted to freely “acquire” critical section A too, even though it was found as being officially owned by t0.

This means that code running during shutdown must tolerate any corrupt state that may have been left behind as a result. For obvious reasons, this is quite difficult. It's especially difficult if you write some code that somebody believes they can call during shutdown without you having gone through that thought exercise. The multi-threaded CRT uses locks internally for malloc/free, for instance, and reportedly cannot reliably tolerate process exit code-paths, which means you can't even safely rely on memory allocation and freeing during process exit without spurious AVs, heap corruption, and other bad things. Other services are obviously apt to suffer from similar problems, particularly if they consist of arbitrary application logic. You simply can't rely on invariant safe-points holding at lock boundaries when a shutdown is in progress.

Mutexes also enjoy this same "weakening" behavior, at least on Windows XP. This policy doesn’t, however, apply to waits on other kernel synchronization objects, like events and semaphores. If you rely on these during shutdown you’re just asking for a deadlock. Actually if you are regularly using any sort of synchronization in your DllMain—including acquiring critical sections and mutexes—you’re asking for loads of trouble. Shutdown callbacks run under the protection of the OS loader lock, demanding extreme care, but that’s another topic altogether.

Here is a sample VC++ program that shows off this behavior. We declare a bunch of code in the DllMain: process attach initializes a CRITICAL_SECTION and a mutex, and then detach attempts to acquire them. We then define an exported function, GetAndBlock, that acquires the synchronization objects and sleeps for a long time:

#include <stdio.h>
#include <windows.h>

CRITICAL_SECTION g_cs;
HANDLE g_mutex;

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved)
{
    switch (fdwReason) {
        case DLL_PROCESS_ATTACH:
            InitializeCriticalSection(&g_cs);
            g_mutex = CreateMutex(NULL, FALSE, NULL);
            break;
        case DLL_PROCESS_DETACH:
            printf("%x: Acquiring g_cs during shutdown...", GetCurrentThreadId());
            EnterCriticalSection(&g_cs);
            printf("success.\r\n");

            printf("%x: Acquiring g_mutex during shutdown...", GetCurrentThreadId());
            WaitForSingleObject(g_mutex, INFINITE);
            printf("success.\r\n");

            DeleteCriticalSection(&g_cs);
            CloseHandle(g_mutex);
            break;
    }

    return TRUE;
}

__declspec(dllexport) DWORD WINAPI GetAndBlock(LPVOID lpParameter) {
    // Acquire the mutual exclusion locks.
    EnterCriticalSection(&g_cs);
    WaitForSingleObject(g_mutex, INFINITE);

    printf("%x: g_cs and g_mutex acquired.\r\n", GetCurrentThreadId());

    // And just wait for a little while...
    SleepEx(25000, TRUE);

    return 0;
}

And finally we have an EXE that just invokes GetAndBlock and initiates a process shutdown on separate threads. The result is that the shutdown thread acquires the synchronization objects which the GetAndBlock thread currently has ownership of. Post Windows 95, the shutdown thread is always the thread that initiated the shutdown, whereas before that it was (seemingly) chosen at random; so when run on a modern OS at least, this sample is guaranteed to demonstrate the desired behavior:

#include <windows.h>

DWORD WINAPI GetAndBlock(LPVOID lpParameter);

int main() {
    HANDLE hT1 = CreateThread(NULL, 0, &GetAndBlock, NULL, 0, NULL);
    SleepEx(100, TRUE);
    ExitProcess(0);
}

The results of running it are a little uneventful:

C:\...>shutdown.exe
664: g_cs and g_mutex acquired.
d18: Acquiring g_cs during shutdown...success.
d18: Acquiring g_mutex during shutdown...success.

As expected, no hangs occur. If you want to see what happens when a hang does happen, just replace CreateMutex with CreateEvent. It's not pretty.

Update 10/17/2006: Thanks to Jan Kotas for pointing out that the multi-threaded CRT is actually not safe from the sort of issues I talk about in this article. I wasn't able to get it to happen in a test program--one of the great things about repro'ing race conditions :)--but have fixed that part up.

10/14/2006 10:30:11 PM (Pacific Daylight Time, UTC-07:00)  #   

 Tuesday, October 10, 2006

It's probably old news on the street, but I just (happily) received my copy of the new edition of the dragon book yesterday. Yes, after 20 years, there is now a 2nd edition of the cult classic, Compilers: Principles, Techniques, and Tools. I have to admit I like the old cover much better--what can I say, I'm a cheesy cartoons over cheesy 3d models kind of guy--but the fact that there are 3 entirely new chapters on topics near and dear to my heart more than makes up for it: one on runtimes, and two on parallelism.

10/10/2006 8:45:08 AM (Pacific Daylight Time, UTC-07:00)  #   

 Tuesday, October 03, 2006

I am often confronted with the question of whether concurrency programming models that employ shared memory are evil. I was asked this question directly on the concurrency panel at JAOO’06 earlier this week, for instance, and STM makes a big bet that such models are tenable.

Without shared memory, it’s tempting to think that traditional concurrency problems go away, as if by magic. If no two pieces of code are simultaneously working on the same location in memory, for instance, there are (seemingly) no race conditions or deadlocks. Most people believe this, and it (on the surface) seems somewhat reasonable. Until you realize that it’s fundamentally flawed.

Shared memory systems are just an abstraction in which data can be named by its virtual memory address. In fact, one could argue that it’s an optimization—that the same sort of systems could be built by mapping virtual memory addresses (at a logical level) to some other location (at a physical level) using an algorithm that doesn't rely on page-tables, TLBs, and so on. Distributed RPC systems in the past have tried this very thing: to map object references to data residing on far-away nodes, and have mostly failed in the process. I’m not trying to convince you that alternative mapping techniques are a good thing, only that abstractly speaking at least, all of the same concurrency control problems will arise in systems that exhibit this fundamental property. Interestingly, shared memory systems have turned into tiny distributed systems with complex cache coherency logic anyway, so one has to wonder where the boundary between shared memory and message passing really lies.

There is a fundamental, undeniable law here:

Any system in which two concurrent agents can name the same piece of data must also exhibit the standard problems of concurrency: broken serializability, race conditions, deadlocks, livelocks, lost event notifications, and so on. Concurrency control is simply a requirement if correctness is desired.

So in reality, the real question at hand should be, would a system in which every concurrent agent operates on its own, completely isolated piece of data be more attractive? I personally think that’s farfetched and unrealistic. Systems with shared data need to have shared data; it's a property of the system being modeled. Even with isolated data, concurrency control would be required if, say, a central copy is rendezvoused with periodically (which, by the way, is the only way I can see such a system remaining correct). And then you have to wonder what copying buys you. It certainly costs you. Data locality is crucial to achieving adequate performance in most low-to-mid-level systems software. Yet copy-on-send message passing systems throw this out the window entirely. I refuse to believe that this will ever be the dominant model of fine-grained concurrency, at least on the current hardware architectures available from Intel and AMD. And certainly not without a whole lot more research and perhaps hardware support.

A distributed system in which many simultaneous clients might access the same piece of data on the server has all the same issues. AJAX systems, for instance, easily lull the author into a false sense of security. But, unfortunately(?), a transaction is a transaction, and if concurrency control isn’t in place, such systems are effectively executing without any isolation or serialization guarantees whatsoever—I just read an article in the latest DDJ where this was explained. I'm surprised a dedicated article actually needs to point this out: concurrent access to data under any other name is still concurrent access to data. And of course, once you start to employ concurrency control, you are susceptible to deadlocks and so on—unless you have a system that can transparently resolve them.
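A small simulation shows why isolating each request does not make the race go away. This is a sketch in C++ under stated assumptions: the Server class, its version counter, and both function names are hypothetical, and the interleaving of the two clients is written out by hand so the outcome is deterministic. Each request is handled atomically, yet the unconditional write loses an update, while a version check (one form of optimistic concurrency control) detects the conflict and retries.

```cpp
// A tiny "server" whose value is reachable only through requests.
// Each request is handled atomically, yet clients can still race.
class Server {
public:
    struct Snapshot { int value; int version; };

    Snapshot read() const { return {value_, version_}; }

    // Unconditional write: no concurrency control at all.
    void write(int value) { value_ = value; ++version_; }

    // Optimistic write: succeeds only if nobody wrote since `expected_version`.
    bool write_if_unchanged(int value, int expected_version) {
        if (version_ != expected_version) return false;
        value_ = value;
        ++version_;
        return true;
    }

private:
    int value_ = 0;
    int version_ = 0;
};

// Two clients each try to add 1; their requests interleave as
// read(A), read(B), write(A), write(B).
int increment_twice_without_control() {
    Server s;
    Server::Snapshot a = s.read();
    Server::Snapshot b = s.read();
    s.write(a.value + 1);
    s.write(b.value + 1);   // silently overwrites A's update
    return s.read().value;
}

// Same interleaving, but each write carries the version it was based on.
int increment_twice_with_version_check() {
    Server s;
    Server::Snapshot a = s.read();
    Server::Snapshot b = s.read();
    s.write_if_unchanged(a.value + 1, a.version);           // A succeeds
    while (!s.write_if_unchanged(b.value + 1, b.version)) { // B's stale write is rejected
        b = s.read();                                        // so B re-reads and retries
    }
    return s.read().value;
}
```

The first function ends with a value of 1 even though two increments ran; the second ends with 2. Once the retry loop is in place, the usual questions about fairness and progress come with it, which is the point the paragraph above makes about deadlocks and their relatives.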

Interesting research has been done recently by MSR on static verification to prove the absence of sharing (across processes), called Software Isolated Processes (SIPs), building on the type safe, verifiable subset of IL. STM of course also builds on top of the shared memory programming model; but, although threads can name the same location in memory, this is completely hidden, and concurrency control is still employed in the implementation where necessary. I believe these systems are promising. They also have the benefit of building on the same foundational memory performance equations that software developers are used to relying on today.

10/3/2006 4:13:32 PM (Pacific Daylight Time, UTC-07:00)  #   

 Monday, October 02, 2006

Here are the slides for my JAOO'06 talk Concurrency and the composition of frameworks:

JAOO-06__ConcurrencyAndFrameworks.ppt (2.06 MB)

10/2/2006 7:16:10 AM (Pacific Daylight Time, UTC-07:00)  #   

 Friday, September 22, 2006

An article I wrote (seemingly ages ago) just appeared in the September issue of Dr. Dobb's Journal:

Application Responsiveness: Using concurrency to enhance user experiences
Thanks to recent innovation in both hardware graphics processors and client-side development frameworks, GUIs for Windows applications have become more and more visually stunning over time. But throughout the evolution of such frameworks, one problem hasn't gone away—poor responsiveness. Studies show that positive user experiences are linked to application responsiveness and, conversely, that frustrating experiences are often caused by poor responsiveness. More often than not, this lack of responsiveness is due to a series of subtle (and sometimes accidental) design choices made during development. In this article, I examine the root of the responsiveness problem, and then suggest some best practices for eliminating it.

My article only touches on some important issues that are described in detail elsewhere.  Here are the references I used:

  1. D. Duis, J. Johnson. Improving User Interface Responsiveness Despite Performance Limitations. Proc. IEEE Computer Society Intl. Conference. February 1990.
  2. J. Duffy. No More Hangs: Techniques for Avoiding and Detecting Deadlocks. MSDN Magazine. April 2006.
  3. G. H. Forman. Obtaining Responsiveness in Resource-Variable Environments. PhD Dissertation, University of Washington. 1998.
  4. I. Griffiths. Windows Forms: Give Your .NET-based Applications a Fast and Responsive UI with Multiple Threads. MSDN Magazine. February 2003.
  5. N. Kramer. Threading Models (Windows Presentation Foundation). Weblog essay. June 2005.
  6. G. Maffeo, P. Silwowicz. Win32 I/O Cancellation in Windows Vista. MSDN. September 2005.
  7. V. Morrison. Concurrency: What Every Dev Must Know About Multithreaded Apps. MSDN Magazine. August 2005.
  8. M. E. Russinovich, D. A. Solomon. Microsoft Windows Internals. ISBN 0-735-61917-4, MS Press. December 2004.
  9. C. Sells. Safe, Simple Multithreading in Windows Forms, Part 1. MSDN. June 2002.
  10. C. Sells, I. Griffiths. Programming Windows Presentation Foundation. ISBN 0-596-10113-9, O'Reilly. September 2005.

Thanks go to Jeff Richter, Nick Kramer, Alessandro Catorcini [1], and Vance Morrison for reviewing early drafts.  Enjoy.

[1] Alessandro, man, you need a blog! ;)

9/22/2006 11:27:36 AM (Pacific Daylight Time, UTC-07:00)  #   

 Wednesday, September 13, 2006

LINQ coaxes developers into writing declarative queries that specify what is to be computed instead of how to compute the results. This is in contrast to the lion's share of imperative programs written today, which are huge rat nests of for-loops, switch statements, and function calls. The result of this new direction? Computationally intensive filters, projections, reductions, sorts, and joins can be evaluated in parallel... transparently... with little-to-no extra input from the developer. The more data the better.

If you buy the hypothesis--still unproven--that developers will write large swaths of code using LINQ, then by inference, they will now also be writing large swaths of implicitly data parallel code. This, my friends, is very good for taking advantage of multi-core processors.
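To give a feel for what evaluating a query in parallel means underneath, here is a rough sketch in C++ of the partition, process, and combine pattern. It is not PLINQ and makes no claim about how PLINQ is built; the function name and the fixed query (filter the even numbers, square them, sum the results) are assumptions chosen for illustration. The caller states what to compute, and the helper decides how to split the work.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Sums the squares of the even numbers in `data`, splitting the input into
// `workers` contiguous partitions that are processed on separate threads.
// parallel_sum_even_squares is a hypothetical helper, not part of PLINQ.
long long parallel_sum_even_squares(const std::vector<int>& data,
                                    std::size_t workers) {
    workers = std::max<std::size_t>(1, workers);
    std::vector<long long> partial(workers, 0);  // one slot per worker, no sharing
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&data, &partial, w, begin, end] {
            long long sum = 0;
            for (std::size_t i = begin; i < end; ++i) {
                if (data[i] % 2 == 0) {                                // filter
                    sum += static_cast<long long>(data[i]) * data[i];  // project and reduce
                }
            }
            partial[w] = sum;
        });
    }
    for (std::thread& t : threads) t.join();

    long long total = 0;
    for (long long p : partial) total += p;  // combine the partition results
    return total;
}
```

The result does not depend on the worker count, which is what makes this kind of query safe to parallelize without the caller's involvement. With small inputs the thread start-up cost dominates, in line with the remark above that more data is better.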

If you want to get a little glimpse of what I've been spending my time working on, check out these (brief) stories about Parallel LINQ (aka PLINQ), a parallel query execution engine for LINQ:

We've spent many, many months now cranking out a fully functional prototype. The numbers were impressive enough to catch the eye of some key people around the company. And the rest is history... (well, not quite yet...)

I'll no doubt be disclosing more about this in the coming weeks.

(Note: I am in no way committing to any sort of product or release timeframe. This technology is quite early in the lifecycle, and, while unlikely, might never actually make the light of day... Label this puppy as "research" for now.)

9/13/2006 4:48:33 AM (Pacific Daylight Time, UTC-07:00)  #   

 Friday, September 01, 2006

Tim Harris, a Microsoft colleague I've had the pleasure to work a lot with lately, joined Simon Peyton Jones, of Glasgow Haskell fame, to do a Channel9 interview on Software Transactional Memory (STM). I encourage you to check it out.

Update: I didn't say this before, but I would love any feedback about this technology. What if you had this in C# today? (Do you want it in C# today?) Would it make your life simpler? What are the major challenges you'd encounter if you were to start using it in your programs and libraries? What are the major benefits? Feel free to leave comments (either here or in the Channel9 post) or send me email directly at joedu AT microsoft DOT com.

9/1/2006 6:10:53 PM (Pacific Daylight Time, UTC-07:00)  #   

 Tuesday, August 22, 2006

A common technique to avoid giving up your time-slice on multi-CPU machines is to use a hand-coded spin wait. This is appropriate when the cost of a context switch (4,000+ cycles) and ensuing cache effects are more expensive than the possibly wasted cycles used for spinning, which is to say not terribly often. When used properly, however, very little time is spent spinning, and the spin wait is only ever invoked rarely when very specific cross-thread state is seen, such as lock-free code observing a partial update. There are some best practices that must be followed when writing such a spin wait to guarantee good behavior across different machine configurations, i.e. HT, single-CPU, and multi-CPU systems.

A correct wait must issue a yield/pause instruction on each loop iteration to work well on Intel HT machines:

while (!cond) {
    Thread.SpinWait(20);
}

Many implementations should also fall back to a more expensive wait on, say, a Windows event or CLR monitor after spinning a while. This handles the worst case situation in which the thread that is destined to make 'cond' true is not making forward progress as quickly as you'd hoped. A complementary and alternative technique is to simply give up the time-slice in such cases using the Thread.Sleep API:

uint loops = 0;
while (!cond) {
    if ((++loops % 100) == 0) {
        Thread.Sleep(0);
    } else {
        Thread.SpinWait(20);
    }
}

This approach ensures that, if the machine is saturated, the spin wait doesn't prevent the thread which will set the event from being scheduled and making forward progress.

All of this is pure nonsense and ludicrousness on single-CPU machines. If you're waiting for another thread to set an event... well... it clearly isn't going to do that if you're actively using the one and only CPU to waste cycles spinning! Therefore a natural extension to the above approach is to check for a single-CPU machine and respond by yielding to another thread:

uint loops = 0;
while (!cond) {
    if (Environment.ProcessorCount == 1 || (++loops % 100) == 0) {
        Thread.Sleep(0);
    } else {
        Thread.SpinWait(20);
    }
}

OK, this is looking rather nice now. But wait. There's a subtle but nasty problem lurking here.

Sleep(0) actually only gives up the current thread's time-slice if a thread at equal priority is ready to run. Don't believe me? Check out the MSDN docs. If you're writing a reusable API that will be called by a user app, they might decide to drop a few of their threads' priorities. Messing with priorities is actually a very dangerous practice, and this is only one illustration of what can go wrong. (Other illustrations are topics for another day.) In summary, plenty of people do it and so reusable libraries need to be somewhat resilient to it; otherwise, we get bugs from customers who have some valid scenario for swapping around priorities, and then we as library developers end up fixing them in service packs. It's less costly to write the right code in the first place.

Here's the problem. If somebody begins the work that will make 'cond' true on a lower priority thread (the producer), and then the timing of the program is such that the higher priority thread that issues this spinning (the consumer) gets scheduled, the consumer will starve the producer completely. This is a classic race. And even though there's an explicit Sleep in there, issuing it doesn't allow the producer to be scheduled because it's at a lower priority. The consumer will just spin forever and unless a free CPU opens up, the producer will never produce. Oops!

You can solve this problem by changing the Sleep to use a parameter of 1:

uint loops = 0;
while (!cond) {
    if (Environment.ProcessorCount == 1 || (++loops % 100) == 0) {
        Thread.Sleep(1);
    } else {
        Thread.SpinWait(20);
    }
}

This fixes the problem, albeit with the disadvantage that the thread is unconditionally removed from the scheduler temporarily. (We also call SleepEx with an alertable flag which is more expensive due to APC checks, but I digress.) It's unfortunate that a quick 5 minute audit turns up plenty of Sleep(0)'s in the .NET Framework. I hope to get an FxCop rule created to catch this.

The kernel32!SwitchToThread API doesn't exhibit the problems that Sleep(0) and Sleep(1) do. Unfortunately, you can't reliably get at it from managed code. You can P/Invoke, but it's actually dangerous to do if you end up running in a host. We've overridden thread yielding behavior on the CLR such that we can call out to a host for notification purposes. This was used primarily for fiber mode in SQL Server (which was cut), so that it could use this as an opportunity to switch fibers, but other hosts are free to do what they please. If you don't care about working in a host, then feel free to do this, but please document it clearly and use the following HPA signature so people don't use your type incorrectly unknowingly:

[DllImport("kernel32.dll"), HostProtection(SecurityAction.LinkDemand, ExternalThreading=true)]
static extern bool SwitchToThread();

We're looking at adding a Thread.Yield API in the next rev of the CLR that does this in a host-friendly way. For now, you'll have to rely on Sleep(1).

Thankfully, the starvation problem is not quite *that* bad. The Windows scheduler combats this problem. It uses a balance set manager: a system daemon thread whose responsibility it is to wake up once a second to check for threads that are being starved because of a lower priority than other runnable threads. The goal of this service is to prevent CPU starvation and to minimize the impact of priority inversion. If any threads are found by the balance set manager which have been starved for ~3-4 seconds, those starved threads enjoy a temporary priority boost to priority 15 ("time critical"), virtually ensuring the thread will be scheduled. (Although this won't strictly guarantee it: if your other threads have real-time priorities, i.e. >15, then starvation will continue indefinitely... you're playing with dynamite once you enter that realm.) And once the thread does get scheduled, it also enjoys a quantum boost: its next quantum is stretched to 2x its normal time on client SKUs, and 4x its normal time on server SKUs. The priority decays as each quantum passes, continuing until the thread reaches its original lower priority.

In our example above when Sleep(0) is used, we hope this will unstick the machine and let the producer produce and finally the consumer consume. Indeed with some testing, we see it unstick after a little more than 3 seconds. This is still long enough, however, to kill performance on a server application, cause a noticeable perf degradation on the client, and destroy responsiveness in a GUI app. Here's a simple test that exposes the problem (on a single-CPU machine):

using System;
using System.Diagnostics;
using System.Threading;

class Program {
    public static volatile int x = 0;

    public static void Main() {
        Stopwatch sw = new Stopwatch();
        sw.Start();

        SpawnWork();
        while (x == 0) {
            Thread.Sleep(0);
        }

        sw.Stop();
        Console.WriteLine("Sleep(0) = {0}", sw.Elapsed);

        x = 0;

        sw.Reset();
        sw.Start();

        SpawnWork();
        while (x == 0) {
            Thread.Sleep(1);
        }

        sw.Stop();
        Console.WriteLine("Sleep(1) = {0}", sw.Elapsed);
    }

    private static void SpawnWork() {
        ThreadPool.QueueUserWorkItem(delegate {
            Thread.CurrentThread.Priority = ThreadPriority.BelowNormal;
            x = 1;
        });
    }
}

And here's some example output which is quite consistent from run to run:

Sleep(0) = 00:00:03.8225238
Sleep(1) = 00:00:00.0000678

As we can see, in the case of Sleep(0), the balance set manager stepped in and boosted our producer thread after ~3-4 seconds as promised. We avoid the problem altogether with Sleep(1).

The moral of the story? Priorities are evil, don't mess with them. Always use Sleep(1) instead of Sleep(0). The Windows balance set manager is cool.

8/22/2006 10:18:05 PM (Pacific Daylight Time, UTC-07:00)  #   

 Monday, August 21, 2006

[Update - 8/22/06 - fixed typos and paid homage to VSTS 2005's code analysis which checks for this problem.]

From the department of Spolsky's Law of Leaky Abstractions, we turn today to accidental lock conflicts across AppDomain boundaries.

The CLR supports various cross-AppDomain marshaling mechanisms, one of which is known by the lovely name of marshal-by-bleed. This simply means that pointers from multiple AppDomains actually refer to the same location in memory. Most of the time some form of marshaling is required for objects so that we can safely isolate separate AppDomains from one another.

In managed code, you can lock on any object through the Monitor type, exposed in C# and VB via the 'lock' and 'SyncLock' keywords, respectively. The implementation of Monitor.Enter/Exit uses space in the object header and/or the object's sync-block to record exclusive ownership of the lock. The fact that objects typically don't bleed across AppDomains is a GoodThing(tm), as this is how add-ins, SQL Server, and other hosts isolate failures between components. When writing code, we typically assume state in one AppDomain can't corrupt state in another, totally independent, AppDomain.

Unfortunately, domain neutral Type objects (as well as other Reflection types, e.g. XXInfos) are actually shared across all AppDomains in the process. They are marshal-by-bleed. Strings also fall into this camp. A string argument to a remoted MarshalByRefObject method invocation may be bled, as can be any process-wide interned string literal. The System.Threading.Thread object (called the thread-base-object, aka TBO, internally) also bleeds across AppDomains. What a bloody mess! (Ha ha.)

So why does this all matter?!

Recall that lock owner information is tied to the instance. If you use any of these things as a target of Monitor.Enter, code running in one AppDomain can actually interfere with code in another AppDomain. That's because they are using the same object and thus the same lock information underneath. What a lousy abstraction--this was never meant to leak through! And it can cause trouble too. If one AppDomain orphans the lock (forgets to release it), it may cause deadlocks in other AppDomains. Even sans deadlocks, this fact can simply yield false conflicts, which can subsequently negatively impact scalability.

For example, consider this code:

lock (typeof(object)) {
    ...
}

Code in AppDomain A uses the same Type object to represent 'typeof(object)' as code in AppDomain B. Therefore they share lock information.

If we run such code from multiple AppDomains, the code yields a conflict:

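// A named event lets both AppDomains open the same kernel object.
EventWaitHandle wh = new EventWaitHandle(false, EventResetMode.ManualReset, "XXX");
lock (typeof(object)) {
    AppDomain ad2 = AppDomain.CreateDomain("2");
    ad2.DoCallBack(delegate {
        ThreadPool.QueueUserWorkItem(delegate {
            EventWaitHandle wh2 = new EventWaitHandle(false, EventResetMode.ManualReset, "XXX");
            // Blocks forever: the first AppDomain still holds this lock.
            lock (typeof(object)) {
                wh2.Set();
            }
        });
    });
    // Never signaled, because the worker cannot get past its lock.
    wh.WaitOne();
}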

If one AppDomain is waiting for a synchronization event from another--as in this example--this can actually yield a deadlock. If you replace the lock statements in this example with, say, lock ("Foo") { ... }, you'll see the same result due to string literal interning.

Clearly this is a nasty problem, especially if Framework code were to use such patterns. This is one reason you'll notice we strongly discourage locking on Type objects. Even if you're not in mscorlib (by default the only domain neutral assembly), your type can be loaded domain neutral based on hosting policy, among other things. And therefore you may not even catch said bugs during testing.
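The usual remedy is to lock on an object that only your own code can reach. The sketch below shows the difference by analogy in portable C++; it is not CLR code. A global mutex stands in for a bled object such as a domain-neutral Type, and the Component class, the global, and the probe helper are hypothetical names.

```cpp
#include <mutex>
#include <thread>

// Stand-in for a process-wide object such as a domain-neutral Type:
// every component that locks on it contends for the same lock state.
std::mutex g_shared_identity;

// Hypothetical component that guards its state with a lock it owns privately.
class Component {
public:
    std::mutex& private_lock() { return private_lock_; }
private:
    std::mutex private_lock_;
};

// Reports whether another thread could take `m` right now.
// The probe runs on its own thread because calling try_lock on a
// std::mutex the calling thread already owns is undefined behavior.
bool can_acquire_from_other_thread(std::mutex& m) {
    bool acquired = false;
    std::thread probe([&] {
        acquired = m.try_lock();
        if (acquired) m.unlock();
    });
    probe.join();
    return acquired;
}
```

While one component holds g_shared_identity, the probe fails for every other component, even ones that share no state with the holder. While one Component holds its own private_lock(), the probe on a different Component's lock succeeds. The C# equivalent would be a private readonly object field used as the lock target instead of typeof(...) or a string literal.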

Note that MarshalByRefObjects aren't subject to these problems. Although operations in one AppDomain can refer to the same instance in another, these accesses go through a proxy. Locking on the proxy is different than locking on the raw underlying object, and thus no false conflicts.

This is enforced with the DoNotLockOnObjectsWithWeakIdentity VSTS 2005 code analysis rule.

If all of this is making you feel rather queasy, fear not. We have a weekly "CLR Foundations" meeting where a large portion of the CLR Team meets to discuss the history of the CLR and .NET Framework. A couple weeks back this topic came up in passing. Most people on the team were quite surprised, and many even seemed to be enshrouded in disbelief. At least we can recognize a mistake after it's been made. ;)

8/21/2006 8:50:14 PM (Pacific Daylight Time, UTC-07:00)  #   

 
