RSS 2.0

Personal Info:

Joe Send mail to the author(s) works on parallel libraries, infrastructure, and programming models in Microsoft's Developer Division.

Blogroll:
Other
News
 C|Net
 Kuro5hin
 The Register
Technology
 <?xmlhack?>
 Daily WTF
 DevX
 Hacknot
 Java Today
 Microsoft Top 10 Downloads
 MSDN
 MSDN: "Longhorn"
 MSDN: XML Developer Center
 Slashdot
 Techdirt
 theserverside.com
 W3C
 Web Pages That Suck
 XML Cover Pages
 XML Journal
 xml.com
Technology Blogs
 Aaron Skonnard [PluralSight]
 Adam Bosworth [Google]
 Andy Rich [MS/C++]
 Arpan Desai [MS/XML]
 BCL Team [MS]
 Bill Clementson [Lisp]
 Bill de hÓra
 Bruce Eckel [J]
 Bruce Tate [J]
 Casey Chestnut
 Cedric Beust [Google]
 Chris Anderson [MS/Avalon]
 Chris Lyon [MS]
 Christian Weyer
 Clemens Vasters [newtelligence]
 Craig Andera [PluralSight]
 Dan Sugalski [Parrot]
 Daniel Cazzulino
 Dave Chappel
 Dave Roberts [Lisp]
 Dave Thomas [PragProg]
 Dave Winer
 Dion Almaer [J]
 Don Demsak
 Doug Purdy [MS/Indigo]
 Drew Marsh
 Eric Gunnerson [MS]
 Eric Rudder [MS]
 Eric Sink
 Fritz Onion [PluaralSight]
 Gavin King [J/Hibernate]
 Grady Booch [IBM]
 Hervey Wilson [MS/Indigo]
 Hillel Cooperman [MS/Shell]
 Howard Lewis Ship [J/Apache]
 Ingo Rammer [PluralSight]
 James Gosling [J/Sun]
 James Strachan [J/Groovy]
 Jason Matusow [MS/OSS]
 Jeffrey Schlimmer [MS/Indigo]
 Joe Beda [Google]
 Joel Spoelsky
 Jon Udell
 Josh Ledgard [MS/Evang]
 Joshua Allen [MS]
 Lambda
 Larry Osterman [MS]
 Maoni Stephens [MS/CLR]
 Mark Fussell [MS/XML]
 Martin Fowler
 Martin Gudgin [MS/Indigo]
 Me
 Michael Howard [MS]
 Miguel de Icaza [Mono]
 Mike Clark
 Omri Gazitt [MS/Indigo]
 Pat Helland [MS/PAG]
 Pinku Surana
 Raymond Chen [MS]
 Rich Lander [MS/CLR]
 Rob Howard
 Rob Relyea [MS/Avalon]
 Robert Cringely
 S. Somasegar [MS/DevDiv]
 Sam Gentile
 Scoble [MS/Evang]
 Scott Guthrie [MS/WebNet]
 Scott Hanselman
 Sean McGrath [J]
 Simon Fell
 Stanley Lippman [MS/C++]
 Steve Maine
 Steve Swartz [MS/Indigo]
 Steve Vinoski
 Steven Clarke [MS/Usability]
 Stuart Halloway
 Ted Leung
 Ted Neward [DM]
 Tim Bray [Sun]
 Tim Ewald [Mindreef]
 Tim O'Reilly
 Werner Vogels [Amazon]
 Wintellect
 Yasser Shohoud [MS/Indigo]
Top 20
 Brad Abrams [MS/CLR]
 Chris Brumme [MS/CLR]
 Chris Sells [MS/Ultra]
 Cyrus Najmabadi [MS/C#]
 Dominic Cooney [MS/XAF]
 Don Box [MS/Ultra]
 Don Syme [MS/R]
 Guido van Rossum [Python]
 Herb Sutter [MS/C++]
 Ian Griffiths
 Jason Zander [MS/CLR]
 Jim Hugunin [MS/CLR]
 Joel Pobar [MS/CLR]
 Krzysztof Cwalina [MS/CLR]
 Patrick Logan
 Paul Graham
 Rico Mariani [MS/CLR]
 Rory Blyth [MS/DN]
 Sam Ruby
 Wesner Moise
VC/Business Blogs
 Ed Sim
 Fred Wilson
 Jonathan Schwartz [J/Sun]
 Lawrence Lessig [Stanford]
 Mark Cuban
 Michael Hyatt
 Pierre Omidyar
 Ross Mayfield
 VentureBlog
 Weekly Read
Wine, Food & Tea
 The Silk Road of Wine
 Vinography: a wine blog
 Wine Whys

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2008, Joe Duffy

 
 Monday, May 09, 2005

I wrote up a wait-free atomic stack which remains consistent in the face of multi-threaded scenarios. It uses interlocked operations for pushes and pops. Unfortunately, I didn't manage to beat out the BCL's Stack<T> when combined with monitor locks in a tight insertion loop, but I did beat out our LinkedList<T> using monitors. I admit, I'm using pretty coarse grained locks in my tests, but I'm confident I'd still beat it out with a finer grained lock (anybody want to verify? I'm at ~69% of its execution with the big lock).

Overview

Instead of pasting the entire chunk of code here, I'll walk through one method at a time. The full source is available here. The class is named InterlockedStack<T>, and it implements IEnumerable<T>. Nothing fancy like IList<T> or ICollection<T>, mostly because I didn't want to have to worry about the guarantees those interfaces require you to make.

public class InterlockedStack<T> : IEnumerable<T>

{

 

I wrote the class using a linked list backing store, mostly because I couldn't think of a clever way to guarantee atomic adds and removes otherwise. This hit perf negatively as summarized in the closing paragraphs.

The Code

So I have a self explanatory inner class ListNode<U>:

    // Crummy little data structure to hold a linked cell.

    private class ListNode<U>

    {

        internal ListNode<U> Next;

        internal U Value;

        internal ListNode(U value) { this.Value = value; }

    }

And a few fields:

    private ListNode<T> head;

    private int version;

 

    internal int pushMisses;

    internal int popMisses;

Head is the top of the stack, and version is just an incrementing counter which I use for some dirty read about the list's consistency during enumeration. PushMisses and PopMisses have been useful in my debugging to find out how many times an interlocked operation failed and had to retry due to an inconsistent read/write.

The two meaty methods are Push and Pop. They're pretty well commented, so hopefully not so much explanation is necessary. First, Push:

    public void Push(T item)

    {

        ListNode<T> newHead = new ListNode<T>(item);

        ListNode<T> newNext;

 

        // Note: we could have avoided the extra break jump with a do { } while (x) loop. But

        // we need the extra logic at the end for debugging push misses.

        while (true)

        {

            // Point our new node to the current head.

            newNext = head;

            newHead.Next = newNext;

 

            // The current head might get updated between this snapshot and the next C&S. If

            // it does, we just loop around and try again.

            if (Interlocked.CompareExchange(ref head, newHead, newNext) == newNext)

            {

                // Btw, we don't guarantee that the version will be +1 from the entrance to this

                // routine, nor do we guarantee atomicity with respect to the actual head swap.

                // Our guarantee is simply that it gets incremented roughly when the changes commit.

                Interlocked.Increment(ref version);

                break;

            }

 

            Interlocked.Increment(ref pushMisses);

        }

    } 

It accepts an item, and goes into a while loop trying to add an item consistently. If it fails, it just circles back 'round and tries again. Notice that the only guarantee that we make with regards to version is that it gets incremented at roughly the same time the commit happens. This opens up a tiny race condition with the enumerator and count methods that I'm willing to live with.

Pop takes a similar approach:

    public T Pop()

    {

        bool garbage; // Garbage in, garbage out...

        return Pop(out garbage);

    }

 

    public T Pop(out bool isEmpty)

    {

        ListNode<T> popped;

 

        while (true)

        {

            // Snap a copy of the current head.

            popped = head;

 

            if (popped == null)

            {

                // Simple case: the list is emptied (subject to races, we could in theory

                // loop forever checking whether something got added during the window--probably

                // not wise :) ). Just signal emptiness and return the item.

                isEmpty = true;

                return default(T);

            }

 

            // Try to C&S the head with our popped item's next pointer. This might fail as a result

            // of concurrent updates. That's OK, we just spin around the loop and try again.

            ListNode<T> next = popped.Next;

            if (Interlocked.CompareExchange(ref head, next, popped) == popped)

            {

                // See version increment notes for Push. The same conditions apply here.

                Interlocked.Increment(ref version);

 

                // Setting next to null helps people who have a dirty copy of this node. It might

                // help them do the right thing, but it's probably frivolous.

                popped.Next = null;

 

                break;

            }

 

            Interlocked.Increment(ref popMisses);

        }

 

        isEmpty = false;

        return popped.Value;

    } 

You'll notice I provide two overloads for Pop--one which accepts an out bool parameter that gets set to true if the stack is empty, the other which doesn't. In either case, default(T) will be returned if the pop was empty. This is useful when you're using ref types, as you can just check for null without dealing with an out param. The BCL Stack<T> actually throws if you try to pop an empty stack. I don't know which I prefer.

Lastly, the enumerator and count methods. They're not fully thread-safe, although the only harm is the possibility of stale data. They're still useful for approximations of a point in time. You won't see any corruption, it's just that the way that they operate doesn't take into account all of the subtle races inherent in the data structure. This is just fine. Push and Pop are consistent, which was the #1 priority.

Performance Summary

I wrote a few perf harnesses. Nothing too fancy. If you want to see more, just check out the code that I linked to earlier. The results of doing 10 million inserts into a list across 10 threads on a single-proc box (ugh, would love to run this on a dual/quad) are:

  • Stack<T> with Push wrapped in Monitor.Enter/Exit: ~1125ms (1.0x)
  • LinkedList<T> with AddHead wrapped in Monitor.Enter/Exit: ~5175ms (4.6x)
  • InterlockedStack<T> with Push: ~3584ms (3.2x)

These are the average #s for 3 runs.

I think it's obvious why Stack<T> beats out my implementation, but I'll summarize as follows:

  • Stack<T> uses a fixed size array internally, so aside from the ocassional growth, adding an item is a simple case of a stelem and increment of a field. Stack<T>'s default growth policy seems to handle this type of testing, probably at the expense of unneeded space.
  • I have to heap allocate a new ListNode<T> for each insertion. This adds a whole lot of overhead to the process of adding a node, returning an existing one, or even just inspecting the head.
  • Accessing items is quicker with the Stack<T>, too, since it's just an index into the array to get the native data type or reference to it (hoorah generics), whereas my solution needs to do an extra dereference just to get to the native data/reference.

Summary

I'm not sure why you wouldn't just go with Stack<T> plus a monitor based on the numbers. Granted, this is a pretty contrived micro-benchmark, but still offers some evidence that this is the right way to go.

I think that on computers more capable of parallelism, you might see a win with the InterlockedStack<T>. Why? Because the interlocked operations are very fine grained, and might increase concurrency due to the ratio of the cost of contention to memory allocation increasing. The low push and pop misses I'm seeing in my testing is an indication that this is true. I'd love if somebody out there could verify this or share their thoughts.

5/9/2005 11:05:39 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [4]

 

Recent Entries:

Search:

Browse by Date:
<May 2005>
SunMonTueWedThuFriSat
24252627282930
1234567
891011121314
15161718192021
22232425262728
2930311234

Browse by Category:

Notables:

Currently Up To:

Reading...

Listening...

Watching...