Friday, July 22, 2005

The CLR was designed to work very well in a COM world. This design choice is not at all surprising given the history of programming on Windows, and that the CLR began life as the COM+ 2.0 Runtime (among other temporary names). When it comes to concurrency in this world, however, there's a whole host of crap that can go wrong. Thankfully most of the time it doesn't.

Before moving on, if you have a finite amount of time, I'd recommend reading Chris Brumme's weblog on Apartments and Pumping. It's exponentially more worth your time than this post. I'm going to assume you have been introduced to at least a few of the concepts there. Most mortals on this planet haven't. You'll also want to come to my Programming w/ Concurrency talk at PDC, where I'll discuss such "to the metal" details.

OK. I've hyped it up. But I don't really have that much to say.

Using monitors for critical sections

When somebody accesses a shared piece of memory from multiple units of parallel execution, some form of locking is usually necessary. For a very small class of programmers, avoiding locks and retaining correctness is possible, but it's rocket science. Most people give up quickly if they even think to try in the first place (except for double checked locking, which is often copied from some book or website on the topic). If it's a simple primitive operation, interlocked operations might work. But in other cases, you need a coarser grained critical section-ish lock. For manager programmers, this is Monitor (i.e. 'lock' keyword in C#).

A class that has a private shared static variable, for example, would lock on it before mutating its contents. Imagine we have a class Coords:

class Coords
{
    public int x;
    public int y;
}

Our program decides it needs to maintain the invariant that x == y (don't ask why), and here's the code a developer might write:

class MyComponent : ServicedComponent
{
    private static Coords c = new Coords();
    void DoWork()
    {
        lock (myCoords)
        {
            myCoords.x++;
            DoMoreWork();
            myCoords.y++;
        }
    }
    void DoMoreWork() { /* code that tolerates broken invariants */ }
}

So long as we never leak the myCoords instance (raising the risk somebody accesses it w/out locking), we're safe. Right?

Not quite.

Enter STA

You might not have noticed that MyComponent derives from ServicedComponent. This is a ContextBoundObject that lives by all of the standard COM component rules. If it's instantiated inside an STA (Single Threaded Apartment), all access is serialized, as is the case with ordinary COM components. Now, this might seem a tad esoteric, but consider if you have a class that's called by a user who wrote their own ServicedComponent. It might seem more real, and is equally as problematic.

Chris's article above talks at great length about message pumping. STAs have to pump messages, otherwise queued messages could get starved. For UI applications, this pisses users off. For other applications, it can lead to fairness issues at best and incorrect code at worst. We pump for you so you don't need to worry about it, but we might do it in places you might not expect. This ends up being nearly anywhere you can block.

Let's pretend DoMoreWork above did this:

void DoMoreWork()
{
    Thread.CurrentThread.Join(0);
}

Join waits for the target thread to complete execution or the timeout to expire, whichever comes first. Since we call it on our own thread, it should be clear which occurs first. (You are still awake, right?)

When you pump, code can reenter on top of your existing stack. Let's look at the entire snippet of code:

[ComVisible(true)]
public class MyComponent : ServicedComponent
{
    private static Coords c = new Coords();

    public void DoWork(int n)
    {
        Console.WriteLine("{0}->", n);

        lock (c)
        {
            // Check invariant x==y upon entry
            int x = c.x, y = c.y;
            Console.WriteLine("{0}:{1},{2}", n, x, y);
            Debug.Assert(x == y,
                string.Format("Broken invariant on entry (#{0}, {1}!={2})", n, x, y));

            c.x++;
            DoMoreWork();
            c.y++;

            // Ensure invariant x==y upon exit
            x = c.x; y = c.y;
            Debug.Assert(x == y,
                string.Format("Broken invariant on exit (#{0}, {1}!={2})", n, x, y));
        }

        Console.WriteLine("{0}<-", n);
    }

    private void DoMoreWork()
    {
        Thread.CurrentThread.Join(0);
    }
}

Recap: The call to DoMoreWork from the DoWork function occurs while invariants are broken. And DoMoreWork (or a function that DoMoreWork calls, e.g. some opaque inside the Framework) pumps. This is a recipe for bad things.

I also added some Console.WriteLines and Debug.Asserts in there so you can watch the world fall down.

Breaking monitors with reentrancy

The situation we need to get into in order to show off this neat parlor trick is as follows:

  • A bunch of MyComponents are created inside an STA server;
  • We try to make a load of calls to DoWork on those components from an MTA client;
  • This requires that the MTA code reenter the STA to execute;
  • Our STA thread pumps while invariants are broken, thus reentering another set of work (and enabling it to see us in an inconsistent state).

It's not quite as difficult as it sounds, thanks to the CLR's accomodating interaction with the world of COM.

class Program
{
    const int threadCount = 5;

    [STAThread]
    static void Main()
    {
        // Create our components in our STA server (note the STAThread on Main)
        MyComponent[] components = new MyComponent[threadCount];
        for (int i = 0; i < threadCount; i++)
            components[i] = new MyComponent();

        // Instantiate a bunch of MTA threads to work on the STA component
        List<Thread> threads = new List<Thread>(threadCount);
        for (int i = 0; i < threadCount; i++)
        {
            int v = i;
            Thread t = new Thread(delegate () { components[v].DoWork(v); });
            t.SetApartmentState(ApartmentState.MTA); // default--here for illustration
            threads.Add(t);
        }

        // Let 'em loose
        threads.ForEach(delegate (Thread t) { t.Start(); });

        // If you haven't Aborted by now, wait for completion
        threads.ForEach(delegate (Thread t) { t.Join(); });
    }
}

This glob of code does exactly what my bullets indicate. The whole thing can be downloaded here. Note: ensure you compile this with the DEBUG symbol defined, otherwise your calls to Debug.Assert won't be present and you won't get the desired effect of being bombarded with assert dialogs.

It's quite nice that the CLR goes out of its way to marshal across contexts, moving our code over from the MTA to the thread in the STA, executing it, and marshaling back. And furthermore, the pumping it is doing is in good faith. It's trying to make our application responsive and fair.

Unfortunately, I see the following output when I run the code:

Constructing components in a STA server...
Instantiating 5 MTA threads to operate on our components...
Starting up MTA threads...
Waiting for MTA completion...
3->
3:0,0
3<-
2->
2:1,1
2<-
1->
1:2,2
0->
0:3,2
4->
4:4,2
4<-
0<-
1<-

Notice the "3,2" line. That prints out "x,y"... and does so at a point in the program where they should always be equal. Unfortunately, we've got reentrant code inside our lock, and it now has access to broken invariants! Your mileage may vary based on the inherent race condition. To be fair, this is also a byproduct of our decision to make monitors reentrant. But this decision was made for recursion, not reentrancy. It turns out we don't recognize the difference.

Of course, the above example doesn't demonstrate anything too terrible. But if you happened to apply some sensitive thread wide state that you intended to roll back before enabling other code to run, for example, it means you absolutely want to avoid pumping inside a critical section. That means mostly avoiding opaque method calls, even if you suspect they don't pump. In the future, they could. In practice, this is tough to acheive. And in practice, it usually doesn't matter.

7/22/2005 4:42:37 PM (Pacific Daylight Time, UTC-07:00)  #   

I made a pretty simple mistake just now. I knew the source of the problem immediately when I saw the exception, but it's fairly interesting.

What's wrong with this code?

MyComponent[] myComponents = Create25Components();
for (int i = 0; i < 25; i++) {
    Thread t = new Thread(delegate () {
        // do some stuff
        myComponent[i].DoWork();
        // do some more stuff
    });
    t.Start();
}

Oops! you say? Oops! for sure.

The reference to the induction variable i gets treated like an ordinary variable C# captures inside an anonymous delegate closure. Namely that it just gets captured into a closure class which each thread shares access to. Access to i inside the thread simply dereferences the shared memory location to obtain the value during execution... not when you capture it. So assuming the parent thread is able to spin through the loop quickly, all of your threads will probably see the final result of i, which is 25. It turns out 25 is an invalid index for the array, resulting in an IndexOutOfRangeException or two. If they get a chance to run quickly, they will see some number in between, but probably not the correct one!

One (of many) solutions is to write this instead:

MyComponent[] myComponents = Create25Components();
for (int i = 0; i < 25; i++) {
    MyComponent mc = myComponents[i];
    Thread t = new Thread(delegate () {
        // do some stuff
        mc.DoWork();
        // do some more stuff
    });
    t.Start();
}

Easy mistake.

7/22/2005 3:30:12 PM (Pacific Daylight Time, UTC-07:00)  #   

 Thursday, July 21, 2005

It seems JScript was a bit ahead of its time, in much the same sense that Lisp was. More accurately ECMAScript was, the standard on which Microsoft's implementation was based.

In 3 lines of code, here is a REPL that compiles and works. It is written in JScript.NET itself (jsc.exe):

import System;
while (true)
    print(eval(Console.ReadLine()));

A more robust version can be written in roughly 50 lines of code:

import System;
import System.Text;

function read()
{
    var program = new StringBuilder();

    while (true)
    {
        Console.Write("{0}{0} ", program.Length == 0 ? ">" : ".");
        var line = Console.ReadLine();
        if (line != "")
        {
            if (program.Length == 0 && line.StartsWith("!q"))
                break;
            else
                program.AppendLine(line);
        }
        else if (program.Length != 0)
        {
            break;
        }
    }

    return program.ToString();
}

function evalprint(program)
{
    try
    {
        var o = eval(program);
        Console.WriteLine("=> {0}", o);
    }
    catch (e)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.Error.WriteLine(e.ToString());
        Console.ResetColor();
    }
}

while (true)
{
    var program = read();
    if (program == "") break;
    evalprint(program);
}

It's just a matter of time before C# and the world at large embrace the power of eval().

7/21/2005 12:26:40 AM (Pacific Daylight Time, UTC-07:00)  #   

 Sunday, July 17, 2005

Note to self: if you happen to write code that AVs the FJIT in Rotor, this prevents you from building the FX tree. I honestly don't quite know why the runtime is loaded there and what managed code executes, but according to Jan Kotas, a dev on the CLR team, it is. And furthermore, the debugging experience kind of sucked... the build log had no errors in it, but binplace failed to find the output DLL when it tried to copy. Sure enough it didn't get built. I'm pasting the error for folks searching in the future:

Binplacing - objd\rotor_x86\system.xml.dll for all platforms
binplace : warning BNP0000: CopyFile(C:\dev\play\sscli-1.0__STM\fx\src\xml\objd\rotor_x86\System.Xml.dll,C:\dev\play\sscli-1.0__STM\build\v1.x86chk.rotor\.\System.Xml.dll) failed 2
binplace : error BNP0000: Unable to place file objd\rotor_x86\System.Xml.dll - exiting.

It killed at least 1 hour of my time tracking it down. Running a quick test under the devenv debugger:

C:\dev\play\sscli-1.0__STM\tests\il_bvt\base\objd>devenv /debugexe %TARGETCOMPLUS%\clix.exe ceq.exe

Did the trick. Stupid bug, easily fixed, and now I'm back building the tree again. Hoorah.

7/17/2005 10:39:27 PM (Pacific Daylight Time, UTC-07:00)  #   

I've spent a bit of time this weekend on my concurrency talk for PDC. It's taking me longer than expected, mostly because I'm writing "a story" up front... before I even think about touching PPT or writing code. The end result will be a great story to tell captured in a paper and--so that I have a convenient way to guide me through the talk--a slide deck. Too many people use PPT as a crutch for presentations, and most of the time it shows.

The talk's focus is on the hows and whys of concurrency with a good mix of the realities of the Windows platform thrown in. This necessarily involves some mechanics (e.g. best practices with explicit threading and the ThreadPool, synchronization, locks, lock-free programming), but also a detailed look inside our platform's legacy, how and why we got here, why some of our legacy still affects how we write concurrent code (anti-concurrency), and where we're headed.

If you're interested in reading up on this stuff, I'd recommend any of the following books. It just so happens that they're all sitting in front of me and being used as references:

Another great related resource that you might want to check out is an article Vance Morisson wrote for August's MSDN magazine. Vance is one of the most senior guys on the team, and is the architect for the CLR's JIT. Bottom line: one of the smartest guys I've ever met.

7/17/2005 7:41:31 PM (Pacific Daylight Time, UTC-07:00)  #   

 Thursday, July 14, 2005

As I'm sure you've heard, the PDC talk abstracts just went live.

Jan Gray and I are doing a two part series on Concurrency. His talk (Part I) focuses on the philosophy, hardware, and primitives. I jump up a notch (in Part II) to look at how the Windows platform exposes concurrency, and some of the abstractions we ship to help you take advantage of it:

Programming with Concurrency (Part II)—Multithreaded Programming with Shared Memory
In this session, see hands-on examples illustrating how best to achieve parallelism safely using multithreaded techniques on Longhorn and the Common Language Runtime (CLR). We walk through some common scenarios, APIs, best practices and pitfalls, and take an in-depth look at both managed and native technologies such as threading on the CLR, Windows threads, and OpenMP. To protect your code from concurrency hazards, we discuss how Longhorn and CLR can help you handle deadlocks and other hangs as well as shared memory exhaustion. We touch on more advanced topics such as CLR explicit threading and the thread pool and asynchronous programming. Legacy issues that impact concurrent programming today such as COM and UI apartment threading and thread affinity are considered. You can expect to walk away from this session with the knowledge necessary to get started on writing efficient and reliable concurrent programs.
Session Level(s): 400
Track(s): Fundamentals

I'm also co-presenting with Mr. Pobar in a talk every compiler geek would love... (and anybody who wants to see an Aussie and a Bostonian duke it out on stage over my assertion that Lisp is the one and only truth in the world (don't worry, I'm just an academic ;) )...)

CLR: Writing a Managed Script Compiler in One Hour
Learn about writing a scripting language compiler that targets Intermediate Language in a single session. Coverage of key decision/choice points when compiling a language on the Common Language Runtime (CLR) are discussed, including the decision to statically or dynamically type and the impacts this has on your design. Includes coverage of writing a late-binder, showing that everything really can be typeless in your source language. Demonstration uses a strawman scripting language as the base language.
Track(s): Tools & Languages

Are you going to PDC? Either one of these talks interest you?

7/14/2005 11:37:58 PM (Pacific Daylight Time, UTC-07:00)  #   

 Sunday, July 10, 2005

I'm wrapping up a chapter in my book on Unmanaged Interop this evening. In the process, I've fallen back in love with some classics on my bookshelf:

COM is still cool in my book. (Pun not intended.)

And furthermore, I can't tell you what a blessing it is to be able to write about a topic, encounter a question or two, and walk right down the hall to the guy's office who 1) knows the absolute most about a specific technology and 2) is kind enough to answer questions in exceeding detail. I hope this translates into a better end product.

Here are just a few (externally available) amazing resources related to hosting, CERs, and SafeHandles, the new face of Unmanaged Interop for V2.0:

7/10/2005 10:04:34 PM (Pacific Daylight Time, UTC-07:00)  #   

The Designing .NET Framework Class Libraries series that aired on MSDN a while back was good fun. The format was fun for us here at Microsoft (video on Friday, chat on the following Wednesday). I hope those who participated agreed. If not, I'd love to get your feedback: what did you like, what didn't you like about the format? Should we do precisely the same way if we do that type of thing in the future... make a couple tweaks?

Here's a cross index to each of the talks with associated chat transcripts:

  1. Setting the Stage
  2. Naming Conventions
  3. Rich Type System
  4. Member Types
  5. Designing Inheritence Hierarchies
  6. API Usability
  7. Designing Progressive APIs
  8. CLR Performance Tips**
  9. Designing for a Managed Memory World**
  10. Understanding Interoperability**
  11. Packaging, Assemblies, and Namespaces
  12. FxCop in Depth
  13. Enabling Development Tools**
  14. Security**
  15. Q&A**

We had a great viewing count (I don't have the #s off hand), comparable to some of the more popular MSDN articles and downloads.

Unfortunately, a few chat transcripts are still missing from the home page. I apolgoize for this, it's in part my fault. The good news: I've been assured they'll be up this week.

In the interim, this cross index should suffice. Those marked with ** are currently missing their chat transcripts. The videos for all are available from the Talk Homepage links. Enjoy!

7/10/2005 6:14:14 PM (Pacific Daylight Time, UTC-07:00)  #   

 Friday, July 08, 2005

Brad just pointed out the recent DotNetRocks episode with the PDC planning folks. They give a good insight into the insanity that goes behind planning such a huge event. I'm helping to organize the CLR Team's presence there, but my part is minimal compared to some guys (like the folks in the interview).

I'm really psyched about PDC this year. We've got a plethora of great talks lined up, and some of the best technical speakers you'll find this side of tha' Missisippi. Just take a look at the talks we've release thus far...

And they have an RSS feed so you can keep an eye on the talks as they are released!

If you haven't convinced your boss to pay yet, better work harder (and faster).

The fun begins 2 months from this weekend!

7/8/2005 10:11:42 PM (Pacific Daylight Time, UTC-07:00)  #   

When access to a location in memory is shared among multiple tasks executing in parallel, some form of serialization is necessary in order to guarantee consistent and predictable logic. Furthermore, in many situations, a number of such reads and writes to shared memory are expected to happen “all at once,” in other words atomically. Serializability and atomicity are both often implemented using mutual exclusion locks. This is bread and butter stuff.

An important concept in concurrent programming is forward progress. This is the idea that the largest number of parallel tasks should make the most amount of progress towards their goal as possible for every given time unit of execution. If you can manage to divvy up the work such that all tasks can execute completely logically independently from each other—called linear parallelization, something that is actually difficult to achieve in practice—then sharing resources such as memory can quickly bog down your theoretical linear speedup in practice. Shared memory prevents each task from making forward progress because there are points of execution where access to resources must be serialized. That means code has to wait in line in order to execute. That’s generally bad.

What an ambitious introduction. Unfortunately, I must constrain the rest of this particular article to some very precise, more manageable topics… Else I would never complete it, and might end up with a book on my hands. And furthermore, I am going to constrain my conversation to the CLR, with a focus on the Monitor APIs. I intend to write a series of these articles over the coming months, since I’ve been writing a lot about the topic in general lately.

Eliminating deadlocks

Deadlocks are well documented out there, and are simple to understand. Thus I will start with them. Deadlocks are by far the #1 forward progress inhibitors. While contention over shared memory can prevent all but one parallel unit from making forward progress (in the extreme case, where all tasks request access to the same resource simultaneously), deadlocks prevent all units involved in the access from making forward progress. Without detection and correction logic, your program is likely to come to a grinding halt.

For example, consider two bits of code running in parallel:

#1:                        #2
lock (a)                   lock (b)
{                          {
    lock (b)                   lock (a)
    {                          {
        // atomic code             // more atomic code
    }                          }
}                          }

As written, these can easily get into a so called deadly embrace. Because they acquire and release locks in the opposite order, it’s not a difficult stretch to imagine #1 acquiring a, #2 acquiring b, and then #1 trying to acquire b (blocking forever), and #2 trying to acquire a (also blocking forever). The result is often a hung application or background worker thread. The result is a frustrated user having to open up Task Manager so they can slam the End Task button tens of times… and then waiting for dumprep.exe to get done with its jazz.

The solution to this problem is often “acquire and release locks in the same order,” but that’s seldom achievable in practice. It’s more likely that a and b are acquired in entirely separate functions, deep in some complex call-stack, which can moreover alter the flow of control at runtime. It’s not always a statically detectable situation. Another solution is to write your code so that it can back off of lock acquisitions if it suspects a deadlock has occurred. With the new Monitor.TryEnter API, this is relatively trivial to do (in the simple case).

Regardless of how ridiculously simplified this scenario is, let’s start here. It’s easier to understand and solve.

A quick note on SQL Server

Through the CLR’s hosting APIs, you can actually hook all blocking points, including Monitor.Enter calls. SQL Server (and possible other sophisticated hosts) use this to detect deadlocks and prevent them from occurring. Unfortunately, I don’t know their policy for handling, but presumably it is a fair one whereby a victim is chosen at random and killed. This is consistent with the way SQL Server handles deadlocks pertaining to data transactional deadlocks. Chris Brumme’s weblog entry on Hosting has a plethora of related information.

Lock ordering and optimistic deadlock back-off

An old fashioned solution to this problem is to mentally tag all locks in your program, and ensure that you acquire them in a consistent manner. You could use a simple algorithm, such as “sort by variable name.” This works so long as you never alias a memory location. Oh, and so long as you don’t make a mistake when you’re writing the code (and anybody else who is touching your program). But this would be error prone and laborious. We can do better.

We could, for example, write a function that accepts a list of objects and does a few things in the process of locking on them:

  • Sorts the objects in identity order to ensure consistent lock acquisition ordering;
  • Uses a simple back-off strategy in case there are other lockers not using our ordered locking scheme.

The code might look like this:

static int deadlockWait = 15;

static bool EnterLocks(params object[] locks)
{
    return EnterLocks(-1, locks);
}

static bool EnterLocks(int retryCount, params object[] locks)
{
    // Clone and sort our locks by object identity.
    object[] locksCopy = (object[])locks.Clone();
    Array.Sort<object>(locksCopy, delegate(object x, object y)
    {
        int hx = x == null ? 0 : RuntimeHelpers.GetHashCode(x);
        int hy = y == null ? 0 : RuntimeHelpers.GetHashCode(y);
        return hx.CompareTo(hy);
     });

    // Now begin the lock acquisition process.
    bool successful = false;
    for (int i = 0; !successful && (retryCount == -1 || i < retryCount); i++)
    {
        successful = true;
        for (int j = 0; j < locksCopy.Length; j++)
        {
            try
            {
                if (!Monitor.TryEnter(locksCopy[j], deadlockWait))
                {
                    // We couldn't acquire this lock, ensure we back off.
                    successful = false;
                    break;
                }
            }
            catch
            {
                // An exception occurred--we don't know whether we got the last lock
                // or not. Assume we did. We indicate that by incrementing the counter.
                j++;
                successful = false;
                throw;
            }
            finally
            {
                if (!successful)
                {
                    for (int k = 0; k < j; k++)
                    {
                        try { Monitor.Exit(locksCopy[k]); }
                        catch (SynchronizationLockException) { /* eat it */ }
                    }
                    Thread.Sleep(0); // Might increase chances that a thread will steal a lock (good).
                }
            }
        }
    }

    return successful;
}

This method is actually sufficiently complex that it warrants a bit of discussion. Most of the complexity stems from our paranoia about orphaning locks coupled with the back-off algorithm. Notice that we first sort the list of locks, using the System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode method for comparisons (this function returns a unique hash code based on an object’s identity). We then use a loop to try acquisition of the locks. If an acquisition fails, we begin the back-off logic by unraveling any locks we had acquired previously, yielding the thread to increase the chance that another possibly deadlocked thread is able to make forward progress, and starting over again.

Of course, a real function would probably offer a timeout variant. The timout for the Monitor.TryEnter isn’t configurable, the retry Count is near meaningless to the user, and the routine is still subject to denial of service attacks whereby somebody grabs a lock and holds on to it forever. In that case, we’ll loop forever (unless an explicit retryCount is provided, it defaults to -1 which means infinite). We also need a similar, although much simpler, ExitLocks mechanism. I’ve omitted these implementations for brevity. Lastly, in the face of asynchronous aborts, this code falls on its face. Nevertheless, it demonstrates the concepts (I hope).

Cross call-stack ordering and back-off

Again, this strategy works only if you know all of your locks up front. With deep call-stacks, this may not be the case. For example, consider:

void f(bool b)
{
    if (b)
    {
        lock (a)
        {
             g(!b);
        }
    }
    else
    {
        lock (b)
        {
             g(!b);
        }
    }
}

void g(bool b)
{
    if (b)
    {
        lock (a)
        {
            // some atomic function
        }
    }
    else
    {
        lock (b)
        {
            // some other atomic function
        }
    }
}

If these were called from two parallel tasks, one task run as f(true) the other as f(false), you’d have a similar, although much more complex and difficult to follow, deadlock scenario. We might be able to (almost) solve this, too, however, with some really ugly hacks that I wouldn’t suggest anybody uses in real code. With that caveat, let’s take a look at them…

We could learn a thing or two from STM. If we performed only idempotent and reversible operations inside the atomic (lock protected block), you could imagine a more complex back-off strategy that spanned multiple levels of a call-stack. This requires you to make a lot of assumptions, use exceptions for control flow, and quite truthfully some unorthodox strategies (including polluting your thread with state). Moreover, without some form of transactional memory, rollback in the case of failure has to be done manually. These are in general bad practices, but the result seems to exhibit some redeeming qualities.

Here’s a big steaming pile of code that attempts to demonstrate a possible implementation:

static LocalDataStoreSlot atomicSlot;
static Par()
{
    atomicSlot = Thread.AllocateNamedDataSlot("AtomicContext");
}

internal class AtomicFailedException : Exception
{
    public AtomicFailedException() {}
}

internal class AtomicContext
{
    internal AtomicContext parent;
    internal List<object> toLock = new List<object>();
}

static bool DoAtomically(Action<object> action, params object[] locks)
{
    return DoAtomically(action, null, locks);
}

static bool DoAtomically(Action<object> action, Action<object> cleanup, params object[] locks)
{
    return DoAtomically(action, cleanup, 10, locks);
}

static bool DoAtomically(Action<object> action, Action<object> cleanup, int retryCount, params object[] locks)
{
    bool entered = false;

    // We have to maintain our context so that we can unravel the parent correctly.
    AtomicContext ctx = new AtomicContext();
    ctx.toLock.AddRange(locks);
    ctx.parent = (AtomicContext)Thread.GetData(atomicSlot);
    Thread.SetData(atomicSlot, ctx);
    try
    {
        for (int i = 0; !entered && i < retryCount; i++)
        {
            if (entered = EnterLocks(10, ctx.toLock.ToArray()))
            {
                bool retryRequested = false;
                try
                {
                    action(null);
                }
                catch (AtomicFailedException)
                {
                    if (cleanup != null)
                        cleanup(null);
                    retryRequested = true;
                }
                finally
                {
                    if (entered)
                        ExitLocks(locks);
                    if (retryRequested)
                        entered = false;
                }
            }
        }
    }
    finally
    {
        // Reset the context to what it was before we polluted it.
        AtomicContext cctx = (AtomicContext)Thread.GetData(atomicSlot);
        Thread.SetData(atomicSlot, cctx.parent);
        if (!entered && cctx.parent != null)
        {
            cctx.parent.toLock.AddRange(cctx.toLock);
            throw new AtomicFailedException();
        }
    }

    return entered;
}

The last overload is obviously the most complex, and the meat of the implementation. DoAtomically uses a back-off strategy not unlike the first EnterLocks function. In fact, it uses EnterLocks for lock acquisition. DoAtomically maintains a context of the locks that must be acquired, and can be chained such that there is a parent/child relationship between two contexts (representing multiple DoAtomically calls in a single call stack).

The function then goes ahead and attempts to acquire each object that much be locked. If it succeeds, it calls the delegate that was supplied as an argument. This delegate can likewise make DoAtomically calls which will recursively detect deadlocks and perform escalation if they occur. Note: there is some noise here. Because of the small timeout we use, a function that holds a lock for an extended period of time can give the impression of a deadlock. This number could probably use some tuning. Further, I haven’t tested the interaction between this code and non-DoAtomically code. Presumably, it would be more succeptable to livelock, but wouldn’t actually fail or deadlock (assuming the other code doesn’t mount a denial of service).

The escalation policy we use is to perform cleanup logic (since we tried to execute the action, there could be broken invariants that must be restored), mutate the parent context so that it will attempt to acquire the locks the child tried to acquire (and failed), and essentially unravel the stack to the parent (using an exception—ugh—I think continuations would make this a much prettier situation). The parent then tries to acquire its own locks in addition to the child locks that got escalated. This can be an arbitrarily nested call-stack, so a parent could end up with more than just a single child’s locks to acquire. But this ensures an entire call stack’s worth of locks are acquired in an ordered fashion, and furthermore backed off of all at once. The obvious downside to this approach is that you end up taking a coarser grained lock than necessary, but with the benefit of avoiding deadlocks.

Assuming all of the back off and retry succeeds, it will return a true indicating success. If it doesn’t, and it’s exhausted all of its retries and escalation space, the topmost atomic block will simply return false to indicate failure. Honestly, an exception in this case might be more appropriate.

An overly simple example

A small test function that uses this (sorry, I didn't have time to write up a more complex one), is as follows:

static int i = 0;
static object x = new object();
static object y = new object();

static void Main()
{
    List<Thread> ts = new List<Thread>();
    for (int j = 0; j < 20; j++)
    {
        Thread t = new Thread(new ThreadStart(delegate {
            DoAtomically(delegate {
                i++; Console.WriteLine("{0}, {1}", Thread.CurrentThread.ManagedThreadId, i); i--;
            }, x, y);
        }));
        ts.Add(t);
    }

    ts.ForEach(delegate (Thread t) { t.Start(); });
    ts.ForEach(delegate (Thread t) { t.Join(); });
}

Of course, all threads should print out the number 1.

A brief word on livelock

A quick word on livelock with the above design. With an escalation policy as defined above—your standard back-off, yield, and retry—it is highly susceptible to live-lock. This is a situation where code is trying to make progress, but chasing its own tail, or continually hitting conflicts. Consider what happens if a very long block and a very short block are competing in a deadlock fashion for the same resources.

The policy defined above will always back-off and retry, meaning that a short transaction has less work to do in order to perform its task. If the larger block is higher priority than the smaller one, we’re unfairly favoring the small block simply due to its size. But similarly, a long running block could acquire a lot of resources, and the smaller block could quickly try (and retry) to acquire locks, fail, and give up.

Lock leveling or some more intelligent queuing system might help out here. But I’ve written enough already.

Future topics

If you’re interested in a particular concurrency-related topic, let me know!

I’d like to spend more time in the future on:

  • Events and signaling;
  • Managing large groups of complex parallel tasks;
  • Implicit parallelism, e.g. using compiler code generation and IL rewriting;
  • STA, COM and UI programming, reentrancy;
  • More on livelock—it happens in a lot of contexts—and some ideas on how to solve them;
  • Lock free programming, and why you should avoid it.

Feedback will help me write about things you want to know about.

Happy hacking!

7/8/2005 4:55:10 PM (Pacific Daylight Time, UTC-07:00)  #   

 

RSS 2.0

Me
 

Joe Send mail to the author(s) is an architect and developer on a systems incubation project at Microsoft.

Recent

Search

Browse

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2013, Joe Duffy