RSS 2.0

Personal Info:

Joe Send mail to the author(s) is a lead architect on an OS incubation project at Microsoft, and was the architect for Parallel Extensions to .NET. He is an author and frequent speaker.

Blogroll:
Other
News
 C|Net
 Kuro5hin
 The Register
Technology
 <?xmlhack?>
 Daily WTF
 DevX
 Hacknot
 Java Today
 Microsoft Top 10 Downloads
 MSDN
 MSDN: "Longhorn"
 MSDN: XML Developer Center
 Slashdot
 Techdirt
 theserverside.com
 W3C
 Web Pages That Suck
 XML Cover Pages
 XML Journal
 xml.com
Technology Blogs
 Aaron Skonnard [PluralSight]
 Adam Bosworth [Google]
 Andy Rich [MS/C++]
 Arpan Desai [MS/XML]
 BCL Team [MS]
 Bill Clementson [Lisp]
 Bill de hÓra
 Bruce Eckel [J]
 Bruce Tate [J]
 Casey Chestnut
 Cedric Beust [Google]
 Chris Anderson [MS/Avalon]
 Chris Lyon [MS]
 Christian Weyer
 Clemens Vasters [newtelligence]
 Craig Andera [PluralSight]
 Dan Sugalski [Parrot]
 Daniel Cazzulino
 Dave Chappel
 Dave Roberts [Lisp]
 Dave Thomas [PragProg]
 Dave Winer
 Dion Almaer [J]
 Don Demsak
 Doug Purdy [MS/Indigo]
 Drew Marsh
 Eric Gunnerson [MS]
 Eric Rudder [MS]
 Eric Sink
 Fritz Onion [PluaralSight]
 Gavin King [J/Hibernate]
 Grady Booch [IBM]
 Hervey Wilson [MS/Indigo]
 Hillel Cooperman [MS/Shell]
 Howard Lewis Ship [J/Apache]
 Ingo Rammer [PluralSight]
 James Gosling [J/Sun]
 James Strachan [J/Groovy]
 Jason Matusow [MS/OSS]
 Jeffrey Schlimmer [MS/Indigo]
 Joe Beda [Google]
 Joel Spoelsky
 Jon Udell
 Josh Ledgard [MS/Evang]
 Joshua Allen [MS]
 Lambda
 Larry Osterman [MS]
 Maoni Stephens [MS/CLR]
 Mark Fussell [MS/XML]
 Martin Fowler
 Martin Gudgin [MS/Indigo]
 Me
 Michael Howard [MS]
 Miguel de Icaza [Mono]
 Mike Clark
 Omri Gazitt [MS/Indigo]
 Pat Helland [MS/PAG]
 Pinku Surana
 Raymond Chen [MS]
 Rich Lander [MS/CLR]
 Rob Howard
 Rob Relyea [MS/Avalon]
 Robert Cringely
 S. Somasegar [MS/DevDiv]
 Sam Gentile
 Scoble [MS/Evang]
 Scott Guthrie [MS/WebNet]
 Scott Hanselman
 Sean McGrath [J]
 Simon Fell
 Stanley Lippman [MS/C++]
 Steve Maine
 Steve Swartz [MS/Indigo]
 Steve Vinoski
 Steven Clarke [MS/Usability]
 Stuart Halloway
 Ted Leung
 Ted Neward [DM]
 Tim Bray [Sun]
 Tim Ewald [Mindreef]
 Tim O'Reilly
 Werner Vogels [Amazon]
 Wintellect
 Yasser Shohoud [MS/Indigo]
Top 20
 Brad Abrams [MS/CLR]
 Chris Brumme [MS/CLR]
 Chris Sells [MS/Ultra]
 Cyrus Najmabadi [MS/C#]
 Dominic Cooney [MS/XAF]
 Don Box [MS/Ultra]
 Don Syme [MS/R]
 Guido van Rossum [Python]
 Herb Sutter [MS/C++]
 Ian Griffiths
 Jason Zander [MS/CLR]
 Jim Hugunin [MS/CLR]
 Joel Pobar [MS/CLR]
 Krzysztof Cwalina [MS/CLR]
 Patrick Logan
 Paul Graham
 Rico Mariani [MS/CLR]
 Rory Blyth [MS/DN]
 Sam Ruby
 Wesner Moise
VC/Business Blogs
 Ed Sim
 Fred Wilson
 Jonathan Schwartz [J/Sun]
 Lawrence Lessig [Stanford]
 Mark Cuban
 Michael Hyatt
 Pierre Omidyar
 Ross Mayfield
 VentureBlog
 Weekly Read
Wine, Food & Tea
 The Silk Road of Wine
 Vinography: a wine blog
 Wine Whys

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2010, Joe Duffy

 
 Thursday, January 26, 2006

Vance Morrison's excellent MSDN article from a few months back talks about why double checked locking is guaranteed to work on the CLR v2.0, and why it is one of the few safe lock-free mechanisms on the runtime. He also sent an email to the develop.com mailing list a while back explaining why this pattern wasn’t guaranteed to work on the ECMA memory model. We did quite a bit of implementation work and testing to tame the crazy memory model of IA-64 on 2.0. (Note that none of this is in the ECMA specification, so if you’re worried about CLI compatibility, beware.)

These modifications not only enable the double checked locking pattern, but also prevent constructors from publishing the newly allocated object before their state has been initialized, as I mentioned in my PDC presentation on concurrency last year. We accomplish this by ensuring writes have 'release' semantics on IA-64, via the st.rel instruction. A single st.rel x guarantees that any other loads and stores leading up to its execution (in the physical instruction stream) must have appeared to have occurred to each logical processor at least by the time x's new value becomes visible to another logical processor. Loads can be given 'acquire' semantics (via the ld.acq instruction), meaning that any other loads and stores that occur after a ld.acq x cannot appear to have occurred prior to the load. The 2.0 memory model does not use ld.acq’s unless you are accessing volatile data (marked w/ the volatile modifier keyword or accessed via the Thread.VolatileRead API). This can lead to some subtle problems.

For example, a slight variant of the double checked lock will not work under our model:

class Singleton {
    private static object slock = new object();
    private static Singleton instance;
    private static bool initialized;
    private Singleton() {}
    public Instance {
        get {
            if (!initialized) {
                lock (slock) {
                    if (!initialized) {
                        instance = new Singleton();
                        initialized = true;
                    }
                }
            }
            return instance;
        }
    }
}

You might have decided to use this pattern to determine whether to initialize a value-type, since checking the variable for null isn’t possible. If you had some more complex set of state, perhaps you want to use a single Boolean rather than checking, say, 10 separate variables to see if they have each been initialized. Whatever your reasoning, as written the above code is prone to a subtle race condition.

The problem here is that both reads of initialized and instance do not have 'acquire' semantics. Thus, instance could appear to have been read before initialized, e.g. as follows:

Time Thread A Thread B
0   Reads instance as null
1 Reads initialized as false
2 Sets instance to ref to new obj
3 Sets initialized to true
4 Uses instance (initialized)
5   Reads initialized as true
6   Uses instance (null!)

Thread B ends up returning a null reference. If a caller tried to use it, they might encounter a spurious NullReferenceException, the cause of which is incredibly hard to debug. For example:

void f() {
    Singleton s = Singleton.Instance;
    s.DoSomething(); // Boom!
}

For this to have happened, Thread B would have had to read instance entirely out of order. It might have done so for any number of reasons. If it recently executed some code that pulled it into cache—either directly or due to locality—it isn’t required to invalidate the cache with non-acquire reads, even though it observed a new write with release semantics, because it's as if the load was moved before the load of initialized. Or superscalar execution might perform branch prediction and retrieve the value of instance, assuming that initialized will be false, pulling it into cache ahead of the read of initialized. Again, because it is a non-acquire read, this is a valid thing to do. If it reads initialized as true, its prediction was actually correct, and it just returns the null value that was pre-fetched. It might even be the case that a compiler along the way moved the read, which is also entirely legal with our memory model.

One possible solution for this is to employ a volatile-read on the first read of the initialized variable, prohibiting the read of instance from moving prior to the read of initialized. Control dependency prevents us from having to use a volatile-read for the reads of both variables.

class Singleton {
    private static object slock = new object();
    private static Singleton instance;
    private static int initialized;
    private Singleton() {}
    public Instance {
        get {
            if (Thread.VolatileRead(ref initialized) == 0) {
                lock (slock) {
                    if (initialized == 0) {
                        instance = new Singleton();
                        initialized = 1;
                    }
                }
            }
            return instance;
        }
    }
}

You could have instead inserted a call to Thread.MemoryBarrier instead, which is a two way memory-fence, in between if-block and the read of instance, but the cost of a barrier is generally higher than both a st.rel and ld.acq because it affects surrounding instructions and movement in both directions.

The take-away here is not that you must understand the specifics of how cache coherency, speculative execution, and our memory model interact. Rather, it should be that once you venture even slightly outside of the bounds of the few "blessed" lock-free practices mentioned in the article mentioned above, you are opening yourself up to the worst kind of race conditions. Using locks is a simple way to avoid this pain. And hopefully someday in the future, transactional memory will enable performant execution of code with lock elision techniques that lead to the performance of lock-free code, but without any of the mental illness that such techniques have been proven to cause.

 

Recent Entries:

Search:

Browse by Date:
<February 2010>
SunMonTueWedThuFriSat
31123456
78910111213
14151617181920
21222324252627
28123456
78910111213

Browse by Category:

Notables:

Currently Up To:

Reading...

Listening...

Watching...