Saturday, May 28, 2005

My implementation of Software Transactional Memory (STM) on Rotor is coming along nicely. I mentioned it briefly here and here. I've taken a different approach than most, actually taking advantage of the JIT and EE to do my dirty work for me. Some prefer to stay inside the cozy confines of managed code, but I'd like to understand better the impact that my design might have on non-transactional code. I admit that my approach is likely not viable for real commercial use mostly due to its intrusive nature. Unless all code were transactional, which it's not. But this is partly what I'd like to understand better.

Most people confuse the idea of memory transactions with, say, database transactions. The concepts are very similar, but memory transactions are about dealing with concurrency at a much finer grained level (just your machine). Rather than using locks to protect shared memory, STM prefers to enable possibly conflicting operations to happen (inside a transaction), and then to resolve them by comparing transaction records to memory at commit time. The theory behind this work has been heavily influenced by database software, and the techniques it uses for its own memory consistency models. You can read more about STM on Tim Harris's site (MSR "smart dude"), e.g. these research papers.

My particular STM design affects the JIT, some core parts of the EE, requires new managed classes, and also a bunch of new FCalls. I'm writing up a paper on this project as I go, but in a few bullets here are the main points:

  • Blocks of code can be marked as "atomic", along with options such as what style of locking (optimistic, pessimistic), granularity (memory location/field, object), and retry semantics (0..n).
    • Atomic blocks get hoisted into methods of their own and annotated with a System.Threading.AtomicTransactionAttribute. This requires a hack to the C# compiler; for now, I've simply required that people write the method and tag it with [AtomicTransaction] on their own.
    • The runtime knows about [AtomicTransaction] and treats these methods differently (details below).
    • Note: I would like to understand how best to take advantage of and integrate with System.Transactions, but haven't gotten around to it yet.
  • The MethodTable in the EE now contains a second set of vtable slots.
    • This second vtable contains atomic versions of the JITted code. We'll see what this means later.
    • All of the atomic vtable slots get pre-populated with a special JIT stub that invokes the JIT with a flag to indicate it desires atomic semantics. If you never call a method atomically, the stub will never get replaced with the JITted code, so this might not be as heavyweight as you first thought.
    • Any methods marked with [AtomicTransaction] get the atomic JIT stub for their ordinary vtable slot too.
  • When the JIT is called with the atomic flag, it does a couple things out of the ordinary:
    • Method calls dispatch using the atomic vtable instead of the plain ole' vanilla vtable.
    • Any calls to ldfld, ldsfld, ldelem, ... get routed through a runtime helper. This helper knows about previous reads/writes in the same transaction by consulting our transaction records in TLS. If we have written, it retrieves that. Otherwise, it reads the underlying memory, records the value read, and loads the value.
    • Any calls to stfld, stsfld, stelem, ... get run through a similar runtime helper. This simply records the written value in the active transaction.
  • Then there are Commit/Rollback APIs which resolve conflicting read/writes, and actually writes any updates to the memory. It does this through a nonblocking algorithm (assuming optimistic locking) which is guaranteed to never deadlock.
    • If conflicts are found (memory value is different than the recorded read), it consults the transaction's retry options. If we haven't spent our retries left count, we loop back and attempt the transaction over again.
    • Otherwise, we blit any writes to memory, mark the transaction as committed, and move on.
    • Note: I also support nested transactions which changes the behavior of commit slightly. Firstly, when a new [AtomicTransaction] method is reached, we allocate a child transaction. Any reads must consult the parent chain for writes that parent transactions have recorded. Then during commits for children, we simply copy the contents of child reads/writes to the parent. The parent is then responsible for detecting consistency with memory when it decides to commit.
5/28/2005 1:20:06 PM (Pacific Daylight Time, UTC-07:00)  #   

 Friday, May 27, 2005

What Video Game Character Are You? I am Pacman.I am Pacman.

I am an aggressive sort of personality, out to get what I can, when I can. I prefer to avoid confrontation, but sometimes when it's called for, I can be a powerful character. I tend to be afflicted with munchies constantly. What Video Game Character Are You?
5/27/2005 12:46:55 AM (Pacific Daylight Time, UTC-07:00)  #   

 Saturday, May 21, 2005

It's been a while since I've posted anything about my book.

I'm wrapping up my next chapter this weekend (I'm over half way done page-count-wise :P), and was interested in any opinions out there on the content. To reiterate: my goal is to cover CLR and FCL topics which haven't gotten a lot of love to date, with a focus on v2.0, while still painting a complete enough picture to be useful mostly on its own (with some previous experience with the technology of course). I think assemblies, loading internals, and how to deal with versioning problems are some that have been neglected (except for on Suzanne Cook's blog, of course).

Also, if you're interested in being a reviewer, drop me a line at joedu (at@at) bluebytesoftware (DOT.DOT) com. Please provide a brief bio of yourself so I know who's serious and who isn't.

Here's a rough layout. Comments welcome (uninteresting bullets, missed items, etc.):

Assemblies: Packaging & Deployment

  • Units of Execution & Reuse
    • A Look Inside Portable Executable (PE) Files
    • Strong Naming
    • Metadata, Metadata, Metadata
    • Manifests & Modules
    • Dynamic Assemblies
  • Asssembly Loading
    • Loading By Hand
    • Inside the Bind, Map, Load Process
    • Probing ("Fusion")
    • Custom Assembly Binding
    • Execution
      • EXE Bootstrapping
      • Other Hosts (IE, SQL Server "Yukon")
      • Code Sharing
      • AppDomain Isolation
  • Deployment & Configuration
    • XCopy
    • The GAC
    • ClickOnce
    • Configuration
  • Versioning
    • AppCompat
    • Rolling Forward
    • Versioning Libraries
    • Versioning Applications
    • Servicing Deployed Apps

 

5/21/2005 11:46:47 PM (Pacific Daylight Time, UTC-07:00)  #   

Structured exception handling and all of the challenges associated with it aren't new topics about which developers must worry. I was flipping through my copy of The C++ Programming Language (3rd Edition) this morning and ended up rediscovering Appendex E on C++ Standard-Library Exception Safety. Understanding what consistency and failure guarantees you intend to make with your library and then actually making them, knowing what to expect from a pre-written library and when and how to program defensively against it, and all such related topics were challenging with C++ and remain a challenge with C#. It's amazing how many of these things are directly transferrable between technologies.

He mentions in the book that all of the appendices are available on the web, so I did a quick search. Voila! Here it is [PDF].

Even if you're not a C++-er, you'll find a ton of great information in that article. Enjoy. I did (again).

5/21/2005 1:20:57 PM (Pacific Daylight Time, UTC-07:00)  #   

 Saturday, May 14, 2005

What is this program's output and why?

using System;

public class NullableTest
{

    static void Main()
    {
        int? x = null;
        object y = x;

        Console.WriteLine("Test #1: {0}", x == null);
        Console.WriteLine("Test #2: {0}", y == null);
        Console.WriteLine("Test #3: {0}", IsNull<int?>(x));
    }

    static bool IsNull<T>(T t)
    {
        return t == null;
    }

}

Is this intuitive, confusing, or <something else>?

5/14/2005 8:49:07 AM (Pacific Daylight Time, UTC-07:00)  #   

 Friday, May 13, 2005

If you haven't been keeping an eye on Raymond and Rico's progress on the performance tuning of a Chinese/English dictionary, in particular comparisons between the unmanaged and managed version... Well... Go check it out!

I'm normally not one for mindless link propagation, but it's a fun and very interesting goings on.

5/13/2005 10:36:25 PM (Pacific Daylight Time, UTC-07:00)  #   

 Thursday, May 12, 2005

Wow. Something bad happened to my blog over the last 24 hours. First, random Service Unavailable messages. Then, my blog started serving up Kanji. Rather than debug the problem I decided to wipe it out and upgrade to dasBlog 1.7.

I'd actually intended to do this weeks ago. I had also intended to redesign the site. The old one was pretty heavyweight, crowded, and the homepage itself is a couple 100k in size. But because of the quick fix nature, I didn't get a chance to do much right now. It's fairly plain, but I like the simplicity. I'll try to shnazz it up at some point.

Now I just need to fix the broken images problem. Methinks my hosting company is throttling my bandwidth, and they might cut down on images as the first tier of throttling. I should probably thank them, as it might prevent me from spiking even higher and paying extra per MB/mo. Traffic on this site has increased a lot over the past few months, and I suspect I'm going to have to pay the extra $$$ for the additional bandwidth.

5/12/2005 9:38:22 PM (Pacific Daylight Time, UTC-07:00)  #   

 Wednesday, May 11, 2005

I was on an email thread with a customer earlier today, and the topic of fibers came up. More generally, the need to schedule lightweight tasks for simulated concurrent execution. The thread pool is fairly heavyweight, doesn’t offer much flexibility in the way of scheduling, and admittedly just doesn't quite cut it for many scenarios. We're looking hard at how best to improve the thread pool support in Orcas (that’s Whidbey + 1)--especially in the area of better control over the scheduling policy--so I'd love to get any feedback you might have.

This discussion got me thinking about how you can do some fancy shmancy hacks using C# 2.0 iterators to cook up a mutated form of simulated concurrent task execution.

Iterators & Coroutines

Coroutines enable coordination between multiple logical tasks all running in lock step in a manner quite similar to fibers. Tasks can run on the same physical thread, but operate as though they were concurrent units of execution, and mostly rely on good citizenship to avoid one unit starving another. A coroutine will execute for a small period of time, and finally yield a value of interest. Sometimes this value is just void/nil/etc, for example in cases where the coroutine is just running some side-effecting code and then yielding back to its caller to let it decide what the next step should be.

C# 2.0 ships with a form of coroutines. The iterator feature, described here, here and here, is an impressive piece of beautiful compiler hackery. It closure-transforms a block of code into a mini state machine—pushing locals into instance variables on the closure—so that an instance will " suspend" (or "yield") itself and can be "resumed" at well defined breakpoints. This is the traditional definition of a coroutine, although the implementation technique differs from many. There might be a few odd cases it doesn't support, but it's quite close. We can use these as a starting point.

Scheduling Tasks

If you're willing to trust that a coroutine will yield often enough, we can schedule a bunch of them so that they are able to run somewhat concurrently. Of course, this is "concurrent" in the sense that processes and threads on a single processor machine are concurrent. With enough coroutines that are doing interesting work, this starts to look attractive. Especially if the code inside of them is willing to yield just before doing a blocking operation, and generally abide by some courteous rules like not eating up too much time before yielding.

We would need a scheduler that maintains a queue of active coroutines, and some form of policy for letting them execute. Using a braindead simple algorithm, it could just walk the queue sequentially and execute, a form of round robin scheduling. At each step, it would dequeue the head coroutine, runs it until it yields, and then re-queue it and move to the next—unless it is done, in which case we just skip the re-queueing.

The fact that this is possible (and pretty simple) relies on a whole array of abstractions underneath us, all of which are easy to take for granted:

  • The "data stack" and "activation frame" for the coroutine is automatically saved and restored for us by the C# compiler's iterator transforms, almost like a mini thread preemption routine. This is awesome. Obviously, it's not the same in reality (only conceptually), and the call stack responds as though it were an ordinary function call.
  • The scheduler is running on a thread like any other piece of code which gets timesliced accordingly. We could have multiple pools of coroutines that are all executing concurrently, and in lockstep with each other.
  • The OS is conceptually doing a very similar thing below us, but at an extremely finer grained level.

One of the major downsides to using coroutines is this fashion that you can't preempt, and thus your ability to fairly timeslice is limited. The scheduling algorithm could use a heuristic which took into account average time allotment per yield for any given coroutine, and throttle time accordingly. If you knew a certain coroutine took 3x longer than most others last time, for example, you might only permit it to run 1/3 as much next time. But there's nothing to prevent a single coroutine from blocking indefinitely or hogging time from other tasks. A simple round robin algorithm probably wouldn't suffice for most requirements.

A First Cut Implementation: Grains

I wrote a whole bunch of code to implement what I describe above, and it works surprisingly well. I tried to produce a nice interface, but it still feels a little rough in some areas. I'm not sure whether anybody will benefit from seeing my arbitrary code postings (comments please!... I seem to get very few comments on these types of posts), but I'll share some of it. If folks want to see more, I'll post the whole bucket of code. I was thinking this would make a nice “official“ whitepaper, too. I named these things grains—get it: threads, fibers, grains, … clever, eh?—but they're really just scheduled coroutines.

So I'll try to walk through this quickly. We start with a simple Grain class. You can think of this as a parallel to the Thread class. It takes a start argument, and has a state (Stopped, Ready, Sleeping). This is the thing that's used to encapsulate a single activation of a grain which gets manipulated over time.

public class Grain

{

 

    // Fields

 

    private GrainState _lastKnownState = GrainState.Stopped;

    private GrainNext _suspendedState;

    private IEnumerator<GrainNext> _executor;

    private ManualResetEvent _completedWaitEvent = new ManualResetEvent(false);

 

    private GrainStart _start;

    private ParameterizedGrainStart _parameterizedStart;

    private IEnumerable<GrainNext> _enumerableStart;

 

    // Constructors

 

    public Grain(GrainStart start)

    {

        _start = start;

    }

 

    public Grain(ParameterizedGrainStart parameterizedStart)

    {

        _parameterizedStart = parameterizedStart;

    }

 

    public Grain(IEnumerable<GrainNext> enumerableStart)

    {

        _enumerableStart = enumerableStart;

    }

 

    public Grain(IEnumerator<GrainNext> executor)

    {

        _executor = executor;

    }

 

    // Properties

 

    public GrainState LastKnownState

    {

        get { return _lastKnownState; }

        internal set { _lastKnownState = value; }

    }

 

    public GrainNext SuspendedState

    {

        get { return _suspendedState; }

        internal set { _suspendedState = value; }

    }

 

    internal IEnumerator<GrainNext> Executor

    {

        get { return _executor; }

    }

 

    internal EventWaitHandle CompletedWaitEvent

    {

        get { return _completedWaitEvent; }

    }

 

    // Methods

 

    public void Start()

    {

        Start(null);

    }

 

    public void Start(object state)

    {

        // If one of the delegate starts were passed in, detect and execute to

        // get our IEnumerable<GrainNext> instance.

        if (_start != null)

            _enumerableStart = _start();

        else if (_parameterizedStart != null)

            _enumerableStart = _parameterizedStart(state);

 

        // Just create a new enumerator and use that as our "executor".

        if (_enumerableStart != null)

            _executor = _enumerableStart.GetEnumerator();

 

        // Lastly, if executor is null, we're hosed. Just fail now.

        if (_executor == null)

            throw new ArgumentNullException(

                "Executor is null. Either your start method returned null or you passed in null.");

    }

 

    public void Join()

    {

        _completedWaitEvent.WaitOne();

    }

 

}

In that code, I reference a couple things. First, the GrainStart and ParameterizedGrainStart types are delegates which are quite similar to the ThreadStart and ParameterizedThreadStart types. They refer to a method used to kick off a coroutine as soon as it gets run by the scheduler.

public delegate IEnumerable<GrainNext> GrainStart();

 

public delegate IEnumerable<GrainNext> ParameterizedGrainStart(object obj);

These delegates return IEnumerable<GrainNext>, which is the target iterator function. GrainNext is a little tricky to get your head around at first. It's a delegate that, when executed, tells the scheduler what to do next. It tells the schedule whether the grain is ready to run, is sleeping waiting for an event, or is done executing. This state is represented by the GrainState enum. There's also a GrainStateFactory static class that helps to generate the simple Ready and Stopped cases. Remember I said coroutines can sometimes yield a void when it has nothing of interest to report? Well, the coroutine is simply yielding an instruction to the scheduler as to what it wants to do next. If the iterator just exits normally, the scheduler will assume this means it is normally terminating. Also one quick thing to note: if an unhandled exception escapes a grain, it tears down the whole scheduler. Not sure the best approach to handle this.

public delegate GrainState GrainNext();

 

public enum GrainState

{

    Ready,

    Sleeping,

    Stopped

}

 

public static class GrainStateFactory

{

    public static readonly GrainNext Ready =

        delegate { return GrainState.Ready; };

    public static readonly GrainNext Stopped =

        delegate { return GrainState.Stopped; };

}

You might be wondering why this is done through a delegate. It seems odd, I agree. Why doesn't the enumerator just return GrainNext? Well, for one I’m a functional junkie and I’m imposing my preference for lazy evaluation on my readers. ;) But truthfully, I thought it was a cool feature that you could use the return delegate to inspect some predicate and determine the next step based on that. This way if you are sleeping a grain, you can check for the wake up condition in the returned delegate and, if it's true, return Ready; otherwise, you just return Sleeping, and the same predicate gets checked next time the scheduler is ready to run the grain. Would love feedback here.

The Round Robin Grain Scheduler

So obviously the Grain doesn’t do much good without some form of scheduler. So I wrote a lot of code here, too. Basically, a scheduler gets hosted in its own thread (hmm, or perhaps it, too, could be a grain!). It is basically a big while(true) loop that consumes things off the queue as they arrive.

So we start with an abstract GrainPool class with some common methods, but is void of any specific scheduling policy. We’ll skip this guy for now, it’s not terribly interesting. On to the actual implementation. The idea is that we can have a whole bunch of scheduling policies out of the box, but I'm tired right now… So I only spent time to implement a round robin scheduler.

The code here is fairly lengthy. But it should be pretty easy to follow. It just loops round and round, processing grains in the queue until it gets shut down.

public class RoundRobinGrainPool : GrainPool

{

 

    private Queue<Grain> _queue = new Queue<Grain>();

 

    protected override void SchedulerLoop()

    {

        try

        {

            bool exitRequested = false;

            while (!exitRequested)

            {

                Grain g = null;

 

                // Dequeue the first item in the queue, or wait until one arrives.

                while (g == null && !_isInterruptionRequested)

                {

                    lock (_poolLock)

                    {

                        if (_queue.Count == 0)

                        {

                            if (_isShutdownRequested)

                            {

                                // The queue has been emptied, process shutdown now.

                                break;

                            }

                            else

                            {

                                // Wait for a new item to arrive. 250ms might need tuning...

                                // it could be overly "spinny".

                                Monitor.Wait(_poolLock, 250);

                            }

                        }

                        else

                        {

                            // Dequeue the item at the head, and jump out of the loop.

                            g = _queue.Dequeue();

                            break;

                        }

                    }

                }

 

                //HACK: Due to shutdown, g could be null. If there are ever other new ways

                //to exit the block w/out setting g, this code will catch it.

                if (g == null)

                {

                    exitRequested = true;

                    continue;

                }

 

                // If we last knew that the grain was asleep, give it a chance to indicate

                // whether it is ready to run or not.

                if (g.LastKnownState == GrainState.Sleeping)

                {

                    g.LastKnownState = g.SuspendedState();

                    if (g.LastKnownState == GrainState.Stopped)

                    {

                        // It reported that it has stopped--move on to the next grain.

                        g.CompletedWaitEvent.Set();

                        continue;

                    }

                }

 

                // So long as the grain isn't sleeping, we give it a chance to run.

                if (g.LastKnownState != GrainState.Sleeping)

                {

                    if (g.Executor.MoveNext())

                    {

                        // The grain reported that it's either sleeping or ready to be

                        // run again. This is fine & dandy, just update its state and prepare

                        // it for its next round of execution.

                        g.SuspendedState = g.Executor.Current;

                        g.LastKnownState = g.SuspendedState();

                    }

                    else

                    {

                        // It's done executing. Update its state and prepare to ditch it.

                        g.SuspendedState = GrainStateFactory.Stopped;

                        g.LastKnownState = GrainState.Stopped;

                    }

 

                    if (g.LastKnownState == GrainState.Stopped)

                    {

                        // It reported that it was done. Move on to the next one.

                        g.CompletedWaitEvent.Set();

                        continue;

                    }

                }

 

                // If we got here, the grain still has execution left in it. Schedule it up

                // for the next loop.

                lock (_poolLock)

                {

                    _queue.Enqueue(g);

                }

            }

        }

        finally

        {

            // Notify waiter that we saw the interruption request.

            _shutdownEvent.Set();

        }

    }

 

    public override void QueueGrain(Grain g)

    {

        // Check that our grain is in a ready state.

        if (g.LastKnownState == GrainState.Stopped)

        {

            // Start will throw an exception if the grain is corrupt. We let

            // it fly, no problems here.

            g.Start();

        }

 

        // Put 'em in the queue and notify the scheduler that it's ready.

        lock (_poolLock)

        {

            _queue.Enqueue(g);

            Monitor.PulseAll(_poolLock);

        }

    }

 

    public override void DequeueGrain(Grain g)

    {

        // Just copy our internal queue to a new version, leaving out the grain

        // which was requested to be removed.

        lock (_poolLock) //BUGBUG: This coarse lock sucks big time.

        {

            Queue<Grain> copy = new Queue<Grain>(_queue.Count * 2);

            while (_queue.Count != 0)

            {

                Grain deq = _queue.Dequeue();

                if (!deq.Equals(g))

                    copy.Enqueue(deq);

            }

            _queue = copy;

        }

    }

 

    public override IEnumerator<Grain> GetEnumerator()

    {

        // Lock to ensure consistency in the data structure while we

        // snapshot the enumerator.

        lock (_poolLock)

        {

            return _queue.GetEnumerator();

        }

    }

 

}

Should really spend more time discussing this bit, but if you have specific questions or items which aren’t clear, please ask!

Usage Example

Again, running out of steam, so I didn't cook up any realistic examples. But this little snippet does demonstrate how it works from a user’s perspective. I did try to make the APIs half-decent and at least a little straightforward to use.

I create a new thread that owns running the grain scheduler, and then send off a couple coroutines for execution. I ask for a normal shutdown (not an interruption, so it waits for the grains to complete), and then we’re done. I'm going to try and get some better examples over the next couple days.

class Program

{

 

    static void Main()

    {

        GrainPool pool = GrainPool.DefaultPool;

 

        // Create a new Thread which hosts the grain pool.

        Console.WriteLine("### Starting grain host...");

        Thread t = new Thread(pool.Start);

        t.Start();

 

        // Now schedule some work for the grain pool.

        Console.WriteLine("### Scheduling grains...");

        pool.QueueGrain(new Grain(GrainFuncs.Fibonacci));

        pool.QueueGrain(new Grain(GrainFuncs.Random));

 

        // Lastly, wait for our grains to be done.

        Console.WriteLine("### Waiting for completion & shutting down the pool...");

        pool.Shutdown(true);

 

        Console.WriteLine("### Joining the thread...");

        t.Join();

 

        Console.WriteLine("### Exited normally");

    }

 

}

 

static class GrainFuncs

{

 

    internal static IEnumerable<GrainNext> Fibonacci()

    {

        int n0 = 0;

        int n1 = 1;

 

        int n;

        do

        {

            n = n0 + n1;

            n0 = n1;

            n1 = n;

            Console.WriteLine("[Fib: {0}]", n);

            yield return GrainStateFactory.Ready;

        }

        while (n < 1000);

 

        Console.WriteLine("[Fib: {0}**FINAL**]", n);

    }

 

    internal static IEnumerable<GrainNext> Random()

    {

        Random r = new Random();

 

        for (int i = 0; i < 15; i++)

        {

            Console.WriteLine("[Rand: {0}]", r.Next());

            yield return new GrainNext(delegate

            {

                if (r.Next(2) == 1)

                    return GrainState.Ready;

                else

                    return GrainState.Sleeping;

            });

        }

    }

 

}

5/11/2005 12:34:40 AM (Pacific Daylight Time, UTC-07:00)  #   

 Monday, May 09, 2005

I wrote up a wait-free atomic stack which remains consistent in the face of multi-threaded scenarios. It uses interlocked operations for pushes and pops. Unfortunately, I didn't manage to beat out the BCL's Stack<T> when combined with monitor locks in a tight insertion loop, but I did beat out our LinkedList<T> using monitors. I admit, I'm using pretty coarse grained locks in my tests, but I'm confident I'd still beat it out with a finer grained lock (anybody want to verify? I'm at ~69% of its execution with the big lock).

Overview

Instead of pasting the entire chunk of code here, I'll walk through one method at a time. The full source is available here. The class is named InterlockedStack<T>, and it implements IEnumerable<T>. Nothing fancy like IList<T> or ICollection<T>, mostly because I didn't want to have to worry about the guarantees those interfaces require you to make.

public class InterlockedStack<T> : IEnumerable<T>

{

 

I wrote the class using a linked list backing store, mostly because I couldn't think of a clever way to guarantee atomic adds and removes otherwise. This hit perf negatively as summarized in the closing paragraphs.

The Code

So I have a self explanatory inner class ListNode<U>:

    // Crummy little data structure to hold a linked cell.

    private class ListNode<U>

    {

        internal ListNode<U> Next;

        internal U Value;

        internal ListNode(U value) { this.Value = value; }

    }

And a few fields:

    private ListNode<T> head;

    private int version;

 

    internal int pushMisses;

    internal int popMisses;

Head is the top of the stack, and version is just an incrementing counter which I use for some dirty read about the list's consistency during enumeration. PushMisses and PopMisses have been useful in my debugging to find out how many times an interlocked operation failed and had to retry due to an inconsistent read/write.

The two meaty methods are Push and Pop. They're pretty well commented, so hopefully not so much explanation is necessary. First, Push:

    public void Push(T item)

    {

        ListNode<T> newHead = new ListNode<T>(item);

        ListNode<T> newNext;

 

        // Note: we could have avoided the extra break jump with a do { } while (x) loop. But

        // we need the extra logic at the end for debugging push misses.

        while (true)

        {

            // Point our new node to the current head.

            newNext = head;

            newHead.Next = newNext;

 

            // The current head might get updated between this snapshot and the next C&S. If

            // it does, we just loop around and try again.

            if (Interlocked.CompareExchange(ref head, newHead, newNext) == newNext)

            {

                // Btw, we don't guarantee that the version will be +1 from the entrance to this

                // routine, nor do we guarantee atomicity with respect to the actual head swap.

                // Our guarantee is simply that it gets incremented roughly when the changes commit.

                Interlocked.Increment(ref version);

                break;

            }

 

            Interlocked.Increment(ref pushMisses);

        }

    } 

It accepts an item, and goes into a while loop trying to add an item consistently. If it fails, it just circles back 'round and tries again. Notice that the only guarantee that we make with regards to version is that it gets incremented at roughly the same time the commit happens. This opens up a tiny race condition with the enumerator and count methods that I'm willing to live with.

Pop takes a similar approach:

    public T Pop()

    {

        bool garbage; // Garbage in, garbage out...

        return Pop(out garbage);

    }

 

    public T Pop(out bool isEmpty)

    {

        ListNode<T> popped;

 

        while (true)

        {

            // Snap a copy of the current head.

            popped = head;

 

            if (popped == null)

            {

                // Simple case: the list is emptied (subject to races, we could in theory

                // loop forever checking whether something got added during the window--probably

                // not wise :) ). Just signal emptiness and return the item.

                isEmpty = true;

                return default(T);

            }

 

            // Try to C&S the head with our popped item's next pointer. This might fail as a result

            // of concurrent updates. That's OK, we just spin around the loop and try again.

            ListNode<T> next = popped.Next;

            if (Interlocked.CompareExchange(ref head, next, popped) == popped)

            {

                // See version increment notes for Push. The same conditions apply here.

                Interlocked.Increment(ref version);

 

                // Setting next to null helps people who have a dirty copy of this node. It might

                // help them do the right thing, but it's probably frivolous.

                popped.Next = null;

 

                break;

            }

 

            Interlocked.Increment(ref popMisses);

        }

 

        isEmpty = false;

        return popped.Value;

    } 

You'll notice I provide two overloads for Pop--one which accepts an out bool parameter that gets set to true if the stack is empty, the other which doesn't. In either case, default(T) will be returned if the pop was empty. This is useful when you're using ref types, as you can just check for null without dealing with an out param. The BCL Stack<T> actually throws if you try to pop an empty stack. I don't know which I prefer.

Lastly, the enumerator and count methods. They're not fully thread-safe, although the only harm is the possibility of stale data. They're still useful for approximations of a point in time. You won't see any corruption, it's just that the way that they operate doesn't take into account all of the subtle races inherent in the data structure. This is just fine. Push and Pop are consistent, which was the #1 priority.

Performance Summary

I wrote a few perf harnesses. Nothing too fancy. If you want to see more, just check out the code that I linked to earlier. The results of doing 10 million inserts into a list across 10 threads on a single-proc box (ugh, would love to run this on a dual/quad) are:

  • Stack<T> with Push wrapped in Monitor.Enter/Exit: ~1125ms (1.0x)
  • LinkedList<T> with AddHead wrapped in Monitor.Enter/Exit: ~5175ms (4.6x)
  • InterlockedStack<T> with Push: ~3584ms (3.2x)

These are the average #s for 3 runs.

I think it's obvious why Stack<T> beats out my implementation, but I'll summarize as follows:

  • Stack<T> uses a fixed size array internally, so aside from the ocassional growth, adding an item is a simple case of a stelem and increment of a field. Stack<T>'s default growth policy seems to handle this type of testing, probably at the expense of unneeded space.
  • I have to heap allocate a new ListNode<T> for each insertion. This adds a whole lot of overhead to the process of adding a node, returning an existing one, or even just inspecting the head.
  • Accessing items is quicker with the Stack<T>, too, since it's just an index into the array to get the native data type or reference to it (hoorah generics), whereas my solution needs to do an extra dereference just to get to the native data/reference.

Summary

I'm not sure why you wouldn't just go with Stack<T> plus a monitor based on the numbers. Granted, this is a pretty contrived micro-benchmark, but still offers some evidence that this is the right way to go.

I think that on computers more capable of parallelism, you might see a win with the InterlockedStack<T>. Why? Because the interlocked operations are very fine grained, and might increase concurrency due to the ratio of the cost of contention to memory allocation increasing. The low push and pop misses I'm seeing in my testing is an indication that this is true. I'd love if somebody out there could verify this or share their thoughts.

5/9/2005 11:05:39 PM (Pacific Daylight Time, UTC-07:00)  #   

 Thursday, April 28, 2005

I'm not doing a good job keeping up with the ~150 blogs to which I subscribe. 1,000s of posts behind (17,440/36,207). Scoble, man, I don't know how you do it.

So as a quick fix, I decided to refactor the list to aggregate my current Top 20. The way I decided this is, given my reading habits and interests, what 20 people's posts would I be most likely to read upon noticing a new entry? Obviously, the reasoning differs from person to person.

Figured I'd share it out in case you were missing one of them (btw, my blogroll's always available on the RHS of the site).

In ascending alpha-numeric order; my top 5 are bolded:

4/28/2005 9:34:08 PM (Pacific Daylight Time, UTC-07:00)  #   

 

RSS 2.0

Me
 

Joe Send mail to the author(s) is an architect and developer on a systems incubation project at Microsoft.

Recent

Search

Browse

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2013, Joe Duffy