Thursday, July 14, 2005

As I'm sure you've heard, the PDC talk abstracts just went live.

Jan Gray and I are doing a two part series on Concurrency. His talk (Part I) focuses on the philosophy, hardware, and primitives. I jump up a notch (in Part II) to look at how the Windows platform exposes concurrency, and some of the abstractions we ship to help you take advantage of it:

Programming with Concurrency (Part II)—Multithreaded Programming with Shared Memory
In this session, see hands-on examples illustrating how best to achieve parallelism safely using multithreaded techniques on Longhorn and the Common Language Runtime (CLR). We walk through some common scenarios, APIs, best practices and pitfalls, and take an in-depth look at both managed and native technologies such as threading on the CLR, Windows threads, and OpenMP. To protect your code from concurrency hazards, we discuss how Longhorn and CLR can help you handle deadlocks and other hangs as well as shared memory exhaustion. We touch on more advanced topics such as CLR explicit threading and the thread pool and asynchronous programming. Legacy issues that impact concurrent programming today such as COM and UI apartment threading and thread affinity are considered. You can expect to walk away from this session with the knowledge necessary to get started on writing efficient and reliable concurrent programs.
Session Level(s): 400
Track(s): Fundamentals

I'm also co-presenting with Mr. Pobar in a talk every compiler geek would love... (and anybody who wants to see an Aussie and a Bostonian duke it out on stage over my assertion that Lisp is the one and only truth in the world (don't worry, I'm just an academic ;) )...)

CLR: Writing a Managed Script Compiler in One Hour
Learn about writing a scripting language compiler that targets Intermediate Language in a single session. Coverage of key decision/choice points when compiling a language on the Common Language Runtime (CLR) are discussed, including the decision to statically or dynamically type and the impacts this has on your design. Includes coverage of writing a late-binder, showing that everything really can be typeless in your source language. Demonstration uses a strawman scripting language as the base language.
Track(s): Tools & Languages

Are you going to PDC? Either one of these talks interest you?

7/14/2005 11:37:58 PM (Pacific Daylight Time, UTC-07:00)  #   

 Sunday, July 10, 2005

I'm wrapping up a chapter in my book on Unmanaged Interop this evening. In the process, I've fallen back in love with some classics on my bookshelf:

COM is still cool in my book. (Pun not intended.)

And furthermore, I can't tell you what a blessing it is to be able to write about a topic, encounter a question or two, and walk right down the hall to the guy's office who 1) knows the absolute most about a specific technology and 2) is kind enough to answer questions in exceeding detail. I hope this translates into a better end product.

Here are just a few (externally available) amazing resources related to hosting, CERs, and SafeHandles, the new face of Unmanaged Interop for V2.0:

7/10/2005 10:04:34 PM (Pacific Daylight Time, UTC-07:00)  #   

The Designing .NET Framework Class Libraries series that aired on MSDN a while back was good fun. The format was fun for us here at Microsoft (video on Friday, chat on the following Wednesday). I hope those who participated agreed. If not, I'd love to get your feedback: what did you like, what didn't you like about the format? Should we do precisely the same way if we do that type of thing in the future... make a couple tweaks?

Here's a cross index to each of the talks with associated chat transcripts:

  1. Setting the Stage
  2. Naming Conventions
  3. Rich Type System
  4. Member Types
  5. Designing Inheritence Hierarchies
  6. API Usability
  7. Designing Progressive APIs
  8. CLR Performance Tips**
  9. Designing for a Managed Memory World**
  10. Understanding Interoperability**
  11. Packaging, Assemblies, and Namespaces
  12. FxCop in Depth
  13. Enabling Development Tools**
  14. Security**
  15. Q&A**

We had a great viewing count (I don't have the #s off hand), comparable to some of the more popular MSDN articles and downloads.

Unfortunately, a few chat transcripts are still missing from the home page. I apolgoize for this, it's in part my fault. The good news: I've been assured they'll be up this week.

In the interim, this cross index should suffice. Those marked with ** are currently missing their chat transcripts. The videos for all are available from the Talk Homepage links. Enjoy!

7/10/2005 6:14:14 PM (Pacific Daylight Time, UTC-07:00)  #   

 Friday, July 08, 2005

Brad just pointed out the recent DotNetRocks episode with the PDC planning folks. They give a good insight into the insanity that goes behind planning such a huge event. I'm helping to organize the CLR Team's presence there, but my part is minimal compared to some guys (like the folks in the interview).

I'm really psyched about PDC this year. We've got a plethora of great talks lined up, and some of the best technical speakers you'll find this side of tha' Missisippi. Just take a look at the talks we've release thus far...

And they have an RSS feed so you can keep an eye on the talks as they are released!

If you haven't convinced your boss to pay yet, better work harder (and faster).

The fun begins 2 months from this weekend!

7/8/2005 10:11:42 PM (Pacific Daylight Time, UTC-07:00)  #   

When access to a location in memory is shared among multiple tasks executing in parallel, some form of serialization is necessary in order to guarantee consistent and predictable logic. Furthermore, in many situations, a number of such reads and writes to shared memory are expected to happen “all at once,” in other words atomically. Serializability and atomicity are both often implemented using mutual exclusion locks. This is bread and butter stuff.

An important concept in concurrent programming is forward progress. This is the idea that the largest number of parallel tasks should make the most amount of progress towards their goal as possible for every given time unit of execution. If you can manage to divvy up the work such that all tasks can execute completely logically independently from each other—called linear parallelization, something that is actually difficult to achieve in practice—then sharing resources such as memory can quickly bog down your theoretical linear speedup in practice. Shared memory prevents each task from making forward progress because there are points of execution where access to resources must be serialized. That means code has to wait in line in order to execute. That’s generally bad.

What an ambitious introduction. Unfortunately, I must constrain the rest of this particular article to some very precise, more manageable topics… Else I would never complete it, and might end up with a book on my hands. And furthermore, I am going to constrain my conversation to the CLR, with a focus on the Monitor APIs. I intend to write a series of these articles over the coming months, since I’ve been writing a lot about the topic in general lately.

Eliminating deadlocks

Deadlocks are well documented out there, and are simple to understand. Thus I will start with them. Deadlocks are by far the #1 forward progress inhibitors. While contention over shared memory can prevent all but one parallel unit from making forward progress (in the extreme case, where all tasks request access to the same resource simultaneously), deadlocks prevent all units involved in the access from making forward progress. Without detection and correction logic, your program is likely to come to a grinding halt.

For example, consider two bits of code running in parallel:

#1:                        #2
lock (a)                   lock (b)
{                          {
    lock (b)                   lock (a)
    {                          {
        // atomic code             // more atomic code
    }                          }
}                          }

As written, these can easily get into a so called deadly embrace. Because they acquire and release locks in the opposite order, it’s not a difficult stretch to imagine #1 acquiring a, #2 acquiring b, and then #1 trying to acquire b (blocking forever), and #2 trying to acquire a (also blocking forever). The result is often a hung application or background worker thread. The result is a frustrated user having to open up Task Manager so they can slam the End Task button tens of times… and then waiting for dumprep.exe to get done with its jazz.

The solution to this problem is often “acquire and release locks in the same order,” but that’s seldom achievable in practice. It’s more likely that a and b are acquired in entirely separate functions, deep in some complex call-stack, which can moreover alter the flow of control at runtime. It’s not always a statically detectable situation. Another solution is to write your code so that it can back off of lock acquisitions if it suspects a deadlock has occurred. With the new Monitor.TryEnter API, this is relatively trivial to do (in the simple case).

Regardless of how ridiculously simplified this scenario is, let’s start here. It’s easier to understand and solve.

A quick note on SQL Server

Through the CLR’s hosting APIs, you can actually hook all blocking points, including Monitor.Enter calls. SQL Server (and possible other sophisticated hosts) use this to detect deadlocks and prevent them from occurring. Unfortunately, I don’t know their policy for handling, but presumably it is a fair one whereby a victim is chosen at random and killed. This is consistent with the way SQL Server handles deadlocks pertaining to data transactional deadlocks. Chris Brumme’s weblog entry on Hosting has a plethora of related information.

Lock ordering and optimistic deadlock back-off

An old fashioned solution to this problem is to mentally tag all locks in your program, and ensure that you acquire them in a consistent manner. You could use a simple algorithm, such as “sort by variable name.” This works so long as you never alias a memory location. Oh, and so long as you don’t make a mistake when you’re writing the code (and anybody else who is touching your program). But this would be error prone and laborious. We can do better.

We could, for example, write a function that accepts a list of objects and does a few things in the process of locking on them:

  • Sorts the objects in identity order to ensure consistent lock acquisition ordering;
  • Uses a simple back-off strategy in case there are other lockers not using our ordered locking scheme.

The code might look like this:

static int deadlockWait = 15;

static bool EnterLocks(params object[] locks)
{
    return EnterLocks(-1, locks);
}

static bool EnterLocks(int retryCount, params object[] locks)
{
    // Clone and sort our locks by object identity.
    object[] locksCopy = (object[])locks.Clone();
    Array.Sort<object>(locksCopy, delegate(object x, object y)
    {
        int hx = x == null ? 0 : RuntimeHelpers.GetHashCode(x);
        int hy = y == null ? 0 : RuntimeHelpers.GetHashCode(y);
        return hx.CompareTo(hy);
     });

    // Now begin the lock acquisition process.
    bool successful = false;
    for (int i = 0; !successful && (retryCount == -1 || i < retryCount); i++)
    {
        successful = true;
        for (int j = 0; j < locksCopy.Length; j++)
        {
            try
            {
                if (!Monitor.TryEnter(locksCopy[j], deadlockWait))
                {
                    // We couldn't acquire this lock, ensure we back off.
                    successful = false;
                    break;
                }
            }
            catch
            {
                // An exception occurred--we don't know whether we got the last lock
                // or not. Assume we did. We indicate that by incrementing the counter.
                j++;
                successful = false;
                throw;
            }
            finally
            {
                if (!successful)
                {
                    for (int k = 0; k < j; k++)
                    {
                        try { Monitor.Exit(locksCopy[k]); }
                        catch (SynchronizationLockException) { /* eat it */ }
                    }
                    Thread.Sleep(0); // Might increase chances that a thread will steal a lock (good).
                }
            }
        }
    }

    return successful;
}

This method is actually sufficiently complex that it warrants a bit of discussion. Most of the complexity stems from our paranoia about orphaning locks coupled with the back-off algorithm. Notice that we first sort the list of locks, using the System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode method for comparisons (this function returns a unique hash code based on an object’s identity). We then use a loop to try acquisition of the locks. If an acquisition fails, we begin the back-off logic by unraveling any locks we had acquired previously, yielding the thread to increase the chance that another possibly deadlocked thread is able to make forward progress, and starting over again.

Of course, a real function would probably offer a timeout variant. The timout for the Monitor.TryEnter isn’t configurable, the retry Count is near meaningless to the user, and the routine is still subject to denial of service attacks whereby somebody grabs a lock and holds on to it forever. In that case, we’ll loop forever (unless an explicit retryCount is provided, it defaults to -1 which means infinite). We also need a similar, although much simpler, ExitLocks mechanism. I’ve omitted these implementations for brevity. Lastly, in the face of asynchronous aborts, this code falls on its face. Nevertheless, it demonstrates the concepts (I hope).

Cross call-stack ordering and back-off

Again, this strategy works only if you know all of your locks up front. With deep call-stacks, this may not be the case. For example, consider:

void f(bool b)
{
    if (b)
    {
        lock (a)
        {
             g(!b);
        }
    }
    else
    {
        lock (b)
        {
             g(!b);
        }
    }
}

void g(bool b)
{
    if (b)
    {
        lock (a)
        {
            // some atomic function
        }
    }
    else
    {
        lock (b)
        {
            // some other atomic function
        }
    }
}

If these were called from two parallel tasks, one task run as f(true) the other as f(false), you’d have a similar, although much more complex and difficult to follow, deadlock scenario. We might be able to (almost) solve this, too, however, with some really ugly hacks that I wouldn’t suggest anybody uses in real code. With that caveat, let’s take a look at them…

We could learn a thing or two from STM. If we performed only idempotent and reversible operations inside the atomic (lock protected block), you could imagine a more complex back-off strategy that spanned multiple levels of a call-stack. This requires you to make a lot of assumptions, use exceptions for control flow, and quite truthfully some unorthodox strategies (including polluting your thread with state). Moreover, without some form of transactional memory, rollback in the case of failure has to be done manually. These are in general bad practices, but the result seems to exhibit some redeeming qualities.

Here’s a big steaming pile of code that attempts to demonstrate a possible implementation:

static LocalDataStoreSlot atomicSlot;
static Par()
{
    atomicSlot = Thread.AllocateNamedDataSlot("AtomicContext");
}

internal class AtomicFailedException : Exception
{
    public AtomicFailedException() {}
}

internal class AtomicContext
{
    internal AtomicContext parent;
    internal List<object> toLock = new List<object>();
}

static bool DoAtomically(Action<object> action, params object[] locks)
{
    return DoAtomically(action, null, locks);
}

static bool DoAtomically(Action<object> action, Action<object> cleanup, params object[] locks)
{
    return DoAtomically(action, cleanup, 10, locks);
}

static bool DoAtomically(Action<object> action, Action<object> cleanup, int retryCount, params object[] locks)
{
    bool entered = false;

    // We have to maintain our context so that we can unravel the parent correctly.
    AtomicContext ctx = new AtomicContext();
    ctx.toLock.AddRange(locks);
    ctx.parent = (AtomicContext)Thread.GetData(atomicSlot);
    Thread.SetData(atomicSlot, ctx);
    try
    {
        for (int i = 0; !entered && i < retryCount; i++)
        {
            if (entered = EnterLocks(10, ctx.toLock.ToArray()))
            {
                bool retryRequested = false;
                try
                {
                    action(null);
                }
                catch (AtomicFailedException)
                {
                    if (cleanup != null)
                        cleanup(null);
                    retryRequested = true;
                }
                finally
                {
                    if (entered)
                        ExitLocks(locks);
                    if (retryRequested)
                        entered = false;
                }
            }
        }
    }
    finally
    {
        // Reset the context to what it was before we polluted it.
        AtomicContext cctx = (AtomicContext)Thread.GetData(atomicSlot);
        Thread.SetData(atomicSlot, cctx.parent);
        if (!entered && cctx.parent != null)
        {
            cctx.parent.toLock.AddRange(cctx.toLock);
            throw new AtomicFailedException();
        }
    }

    return entered;
}

The last overload is obviously the most complex, and the meat of the implementation. DoAtomically uses a back-off strategy not unlike the first EnterLocks function. In fact, it uses EnterLocks for lock acquisition. DoAtomically maintains a context of the locks that must be acquired, and can be chained such that there is a parent/child relationship between two contexts (representing multiple DoAtomically calls in a single call stack).

The function then goes ahead and attempts to acquire each object that much be locked. If it succeeds, it calls the delegate that was supplied as an argument. This delegate can likewise make DoAtomically calls which will recursively detect deadlocks and perform escalation if they occur. Note: there is some noise here. Because of the small timeout we use, a function that holds a lock for an extended period of time can give the impression of a deadlock. This number could probably use some tuning. Further, I haven’t tested the interaction between this code and non-DoAtomically code. Presumably, it would be more succeptable to livelock, but wouldn’t actually fail or deadlock (assuming the other code doesn’t mount a denial of service).

The escalation policy we use is to perform cleanup logic (since we tried to execute the action, there could be broken invariants that must be restored), mutate the parent context so that it will attempt to acquire the locks the child tried to acquire (and failed), and essentially unravel the stack to the parent (using an exception—ugh—I think continuations would make this a much prettier situation). The parent then tries to acquire its own locks in addition to the child locks that got escalated. This can be an arbitrarily nested call-stack, so a parent could end up with more than just a single child’s locks to acquire. But this ensures an entire call stack’s worth of locks are acquired in an ordered fashion, and furthermore backed off of all at once. The obvious downside to this approach is that you end up taking a coarser grained lock than necessary, but with the benefit of avoiding deadlocks.

Assuming all of the back off and retry succeeds, it will return a true indicating success. If it doesn’t, and it’s exhausted all of its retries and escalation space, the topmost atomic block will simply return false to indicate failure. Honestly, an exception in this case might be more appropriate.

An overly simple example

A small test function that uses this (sorry, I didn't have time to write up a more complex one), is as follows:

static int i = 0;
static object x = new object();
static object y = new object();

static void Main()
{
    List<Thread> ts = new List<Thread>();
    for (int j = 0; j < 20; j++)
    {
        Thread t = new Thread(new ThreadStart(delegate {
            DoAtomically(delegate {
                i++; Console.WriteLine("{0}, {1}", Thread.CurrentThread.ManagedThreadId, i); i--;
            }, x, y);
        }));
        ts.Add(t);
    }

    ts.ForEach(delegate (Thread t) { t.Start(); });
    ts.ForEach(delegate (Thread t) { t.Join(); });
}

Of course, all threads should print out the number 1.

A brief word on livelock

A quick word on livelock with the above design. With an escalation policy as defined above—your standard back-off, yield, and retry—it is highly susceptible to live-lock. This is a situation where code is trying to make progress, but chasing its own tail, or continually hitting conflicts. Consider what happens if a very long block and a very short block are competing in a deadlock fashion for the same resources.

The policy defined above will always back-off and retry, meaning that a short transaction has less work to do in order to perform its task. If the larger block is higher priority than the smaller one, we’re unfairly favoring the small block simply due to its size. But similarly, a long running block could acquire a lot of resources, and the smaller block could quickly try (and retry) to acquire locks, fail, and give up.

Lock leveling or some more intelligent queuing system might help out here. But I’ve written enough already.

Future topics

If you’re interested in a particular concurrency-related topic, let me know!

I’d like to spend more time in the future on:

  • Events and signaling;
  • Managing large groups of complex parallel tasks;
  • Implicit parallelism, e.g. using compiler code generation and IL rewriting;
  • STA, COM and UI programming, reentrancy;
  • More on livelock—it happens in a lot of contexts—and some ideas on how to solve them;
  • Lock free programming, and why you should avoid it.

Feedback will help me write about things you want to know about.

Happy hacking!

7/8/2005 4:55:10 PM (Pacific Daylight Time, UTC-07:00)  #   

 Saturday, July 02, 2005

Those guys on the VC++ team have been busy workers. In the Whidbey release of C++/CLI (was Managed C++), they've added a whole big batch of new features. The best part? Some of these are things you simply can't do in C#. Put another way, C++/CLI exposes a larger set of the underlying features that the CTS/CLI has to offer.

For example, want to create a ref type on the stack? Fine.

MyType mt("Foo");

On the GC heap? Alright.

MyType^ mt = gcnew MyType("Foo");

(Note if you're wondering "how'd they do that?" The answer is that the first case only has stack semantics. It still lives on the GC heap. In other words, it acts very similar to 'using' in C#, but it maps nicely to the C++ programmer's existing understanding of stack versus heap semantics.)

Similarly, did you want to maintain a typed reference to a boxed value type? Ok.

int^ i = gcnew int(5);

This compiles to IL which uses modopts to store typing and boxing information so that the runtime/JIT know how to treat it, and for the verifier so that it knows it's being used in a type-sound manner. Did you need a Nullable<int>? Nonsense! Just set your reference to null and you've got it:

int^ i = nullptr; // now it's null
i = gcnew int(5); // now it's not

Furthermore, with stack semantics for ref types, deterministic finalization is simple. Just write a destructor for your type (it gets compiled down to a Dispose method), and it gets invoked for you when leaving the scope. Just like the old C++ days. This means you can say:

{
    StreamReader sr(...);
    // do some stuff with stream
}

And sr gets disposed just prior to leaving the block scope. You can also create your own standard resource mgmt wrappers like that come with TR1 (e.g. tr1::shared_ptr<>). Using the terms that Rico Mariani came up with in a meeting a while back, you've got "the bang" and "the twiddle"...

!MyType() {} // the bang: a finalizer
~MyType() {} // the twiddle: a Dispose method

They've also implemented STL with full interoperability with Whidbey's generics.

They've also implemented OpenMP, a fairly ubiquitous shared memory parallelism library that I've been using a lot for research recently. Now they just need MPI and the world would be complete.

I'm using C++ for many things lately (mostly due to my Rotor work), and I have to say: as I use it more and more, I am starting to miss it. But admittedly I do sometimes prefer the cozy confines of managed code. C++/CLI enables me to nicely sit in between the two worlds, getting the best of both (and leaving behind the worst). There's a hell of a lot more to it than this post surfaces. Check it out.

Happy hacking!

7/2/2005 10:48:50 AM (Pacific Daylight Time, UTC-07:00)  #   

 Monday, June 27, 2005

I need some feedback from the community here.

You see, we've got a veeerry interesting late-game DCR that we're wrapping up this week. It was on the order of 5 weeks of development work to implement. In other words, not small. (DCR == "Design Change Request," i.e. not a bug. We're changing the design of a feature we've already implemented.) I'll be less secretive once I am able to. In fact, I can't wait to blog about this one in detail...

Anyhow.

We're fundamentally mucking with the type system. (Yes, this makes me quite queasy.) In doing so we've introducing a bit of an oddity. The unfortunate thing is that we won't have a Beta with this functionality included. So... software being the inexact science it is, I figured maybe I'd get a response or two from folks out there. Not quite the same, but better than nothing I proclaim!

Now that I've played it up, it's really quite simple. What if

(new T()).GetType() == typeof(T)

evaluated to false? How horrible would that be? Think about it. I believe this is a fairly fundamental invariant we preserve when spanning the static and dynamic type systems.

If we broke it, would you lose sleep? Hate us? Throw your copy of Whidbey into the trash compactor and pick up Java 5 instead? (Go back to COM, VB6, ... DOS programming perhaps?)

I'm being flippant. Mostly because I'm extraordinarily tired. But I really wonder.

6/27/2005 11:45:33 PM (Pacific Daylight Time, UTC-07:00)  #   

 Wednesday, June 22, 2005

Contrary to popular belief, I am still alive.

Is it me or has the whole blogsphere gone quiet over the past few weeks? Guess everybody I subscribe to is heads down shipping Whidbey.

I'm coming up on my 1yr anniversary with Microsoft. This is damned amazing, it really feels like only a few months. (Doesn't everybody say that?) It's been fun thus far, and I think it's only going to get better.

I can't come up with anything intelligent to say right now, so I'm just going go byte-byte (sic).

Byte-byte.

6/22/2005 9:08:22 PM (Pacific Daylight Time, UTC-07:00)  #   

 Tuesday, June 14, 2005

I'm on a CLR Road Trip right now, which basically means a bunch of us on the team are out visiting customers. We're trying to get a clearer understanding of how folks use (or would like to use) our technology, and also to get feedback on our future direction. Despite the 19 hour day yesterday (no joke, 6:30am-1:30am--that doesn't even include any flying or anything, we flew in the night before!), I'm having a blast. We've done a couple of these in the past.

I gave a "CLR Internals" talk to the Vancouver .NET user group last night. Thanks to those who showed up! I found othis MSDN article which drills nicely into some of my main topics: http://msdn.microsoft.com/msdnmag/issues/05/05/JITCompiler/. I'll try to post a deck in the next few weeks.

We're travelling to Calgary tonight and will be talking at the Calgary .NET user group in the next couple days.

6/14/2005 8:21:05 AM (Pacific Daylight Time, UTC-07:00)  #   

 Sunday, June 05, 2005

I used to play guitar quite a bit. I also used to create a lot of industrial music using my computer with a whole host of techniques that ranged from sampling and messing with random non-sound binary files, writing programs to generate sounds, sample munging, and plain old recording. I was also in a metal band somewhere late in high school. We played around at local clubs (Worcester/Boston, MA), and released a tiny album that went nowhere. I did the lead guitar, some of the remastering, and a lot of the sampling that made it onto the record.

We broke up, and I dropped the guitar in favor of a keyboard. (I had dropped the keyboard in favor of the guitar mid-high school, so I was technically "returning to my roots.")

A couple weeks back, I picked up the guitar again. I gave most of my recording and guitar equipment to my brother (I saved a guitar and small amp for myself). So I went to Guitar Center and picked up a Cry Baby and Metal Box. I also still have the Acid and Sound Forge software (a couple versions behind now), so I'm a one man band again.

It's refreshing to strum away in an attempt to relearn the scales and random tabs I used to know by heart. And I'm hoping to create some more industrial tunes. The recent NiN release, I think, reminded me of how fun this can be.

6/5/2005 2:37:11 PM (Pacific Daylight Time, UTC-07:00)  #   

 

RSS 2.0

Me
 

Joe Send mail to the author(s) is an architect and developer on a systems incubation project at Microsoft.

Recent

Search

Browse

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2013, Joe Duffy