RSS 2.0

Personal Info:

Joe Send mail to the author(s) is a lead architect on an OS incubation project at Microsoft, and was the architect for Parallel Extensions to .NET. He is an author and frequent speaker.

Blogroll:
Other
News
 C|Net
 Kuro5hin
 The Register
Technology
 <?xmlhack?>
 Daily WTF
 DevX
 Hacknot
 Java Today
 Microsoft Top 10 Downloads
 MSDN
 MSDN: "Longhorn"
 MSDN: XML Developer Center
 Slashdot
 Techdirt
 theserverside.com
 W3C
 Web Pages That Suck
 XML Cover Pages
 XML Journal
 xml.com
Technology Blogs
 Aaron Skonnard [PluralSight]
 Adam Bosworth [Google]
 Andy Rich [MS/C++]
 Arpan Desai [MS/XML]
 BCL Team [MS]
 Bill Clementson [Lisp]
 Bill de hÓra
 Bruce Eckel [J]
 Bruce Tate [J]
 Casey Chestnut
 Cedric Beust [Google]
 Chris Anderson [MS/Avalon]
 Chris Lyon [MS]
 Christian Weyer
 Clemens Vasters [newtelligence]
 Craig Andera [PluralSight]
 Dan Sugalski [Parrot]
 Daniel Cazzulino
 Dave Chappel
 Dave Roberts [Lisp]
 Dave Thomas [PragProg]
 Dave Winer
 Dion Almaer [J]
 Don Demsak
 Doug Purdy [MS/Indigo]
 Drew Marsh
 Eric Gunnerson [MS]
 Eric Rudder [MS]
 Eric Sink
 Fritz Onion [PluaralSight]
 Gavin King [J/Hibernate]
 Grady Booch [IBM]
 Hervey Wilson [MS/Indigo]
 Hillel Cooperman [MS/Shell]
 Howard Lewis Ship [J/Apache]
 Ingo Rammer [PluralSight]
 James Gosling [J/Sun]
 James Strachan [J/Groovy]
 Jason Matusow [MS/OSS]
 Jeffrey Schlimmer [MS/Indigo]
 Joe Beda [Google]
 Joel Spoelsky
 Jon Udell
 Josh Ledgard [MS/Evang]
 Joshua Allen [MS]
 Lambda
 Larry Osterman [MS]
 Maoni Stephens [MS/CLR]
 Mark Fussell [MS/XML]
 Martin Fowler
 Martin Gudgin [MS/Indigo]
 Me
 Michael Howard [MS]
 Miguel de Icaza [Mono]
 Mike Clark
 Omri Gazitt [MS/Indigo]
 Pat Helland [MS/PAG]
 Pinku Surana
 Raymond Chen [MS]
 Rich Lander [MS/CLR]
 Rob Howard
 Rob Relyea [MS/Avalon]
 Robert Cringely
 S. Somasegar [MS/DevDiv]
 Sam Gentile
 Scoble [MS/Evang]
 Scott Guthrie [MS/WebNet]
 Scott Hanselman
 Sean McGrath [J]
 Simon Fell
 Stanley Lippman [MS/C++]
 Steve Maine
 Steve Swartz [MS/Indigo]
 Steve Vinoski
 Steven Clarke [MS/Usability]
 Stuart Halloway
 Ted Leung
 Ted Neward [DM]
 Tim Bray [Sun]
 Tim Ewald [Mindreef]
 Tim O'Reilly
 Werner Vogels [Amazon]
 Wintellect
 Yasser Shohoud [MS/Indigo]
Top 20
 Brad Abrams [MS/CLR]
 Chris Brumme [MS/CLR]
 Chris Sells [MS/Ultra]
 Cyrus Najmabadi [MS/C#]
 Dominic Cooney [MS/XAF]
 Don Box [MS/Ultra]
 Don Syme [MS/R]
 Guido van Rossum [Python]
 Herb Sutter [MS/C++]
 Ian Griffiths
 Jason Zander [MS/CLR]
 Jim Hugunin [MS/CLR]
 Joel Pobar [MS/CLR]
 Krzysztof Cwalina [MS/CLR]
 Patrick Logan
 Paul Graham
 Rico Mariani [MS/CLR]
 Rory Blyth [MS/DN]
 Sam Ruby
 Wesner Moise
VC/Business Blogs
 Ed Sim
 Fred Wilson
 Jonathan Schwartz [J/Sun]
 Lawrence Lessig [Stanford]
 Mark Cuban
 Michael Hyatt
 Pierre Omidyar
 Ross Mayfield
 VentureBlog
 Weekly Read
Wine, Food & Tea
 The Silk Road of Wine
 Vinography: a wine blog
 Wine Whys

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2010, Joe Duffy

 
 Saturday, May 16, 2009

A while back, I made a big stink about what appeared to be the presence of illegal load-load reorderings in Intel's IA32 memory model.  They specifically claim this is impossible in their documentation.  Well, last week I was chatting with a colleague, Sebastian Burckhardt, about this disturbing fact.  And it turned out he had recently written a paper that formalizes the CLR 2.0 memory model, and in fact treats this phenomenon with a great deal of rigor:

Verifying Compiler Transformations for Concurrent Programs
http://research.microsoft.com/pubs/76524/tr-2008-171-latest-03-11-09.pdf

To jog your memory, the problematic example is

X = 1;
r0 = X;
r1 = Y;

where both X and Y are shared memory locations, and r0 and r1 are processor registers.  According to Intel's IA32 memory model, two loads to different locations cannot reorder.  But it is completely possible for the load of X to be satisfied out of the store buffer, and for r1=Y to pass the store (thereby also passing the load r0=X).  This is a standard Dekker reordering, but the usual example consists of just { X = 1; r1 = Y }.

The key to modeling this is to turn an adjacent store-load affecting the same location into a single instruction.  Therefore, the above becomes something like:

r0 = 1;
X = r0;
r1 = Y;

Now it becomes entirely clear what has gone wrong.  I have yet to see a clear description of this phenomenon, but Sebastian's paper does a great job.

During the discussion, Sebastian showed me another disturbing four processor example:

P0        P1         P2        P3
==        ==         ==        ==
X = 1;    r0 = X;    Y = 1;    s0 = X;
          r1 = Y;              s1 = Y;

Is it possible, after all four processors complete, that { r0 == 1, r1 == 0 } and { s0 == 0, s1 == 1 }?  This seems ridiculous, given a memory model where loads cannot reorder.  It seems that no serializable execution should lead to this.  But let's look at one problematic interleaving.  First, we merge the instruction stream on P0 with P1, and also P2 with P3.  This effect could occur if these writes are in functions that end up running on the same processor, or running on a machine that shares functional units (like hyperthreading), hierarchies that share a cache, and so on.  We end up with:

P0/P1     P2/P3
=====     =====
X = 1     Y = 1;
r0 = X;   s0 = X;
r1 = Y;   s1 = Y;

Now let's permute these with the new rule introduced above in mind:

P0/P1     P2/P3
=====     =====
r0 = 1;   s0 = X;
r1 = Y;   s1 = 1;
X = r0;   Y = s1;

At this point, it should be obvious what the problematic reordering would be.  Let's continue merging these into a single execution order:

P0/P1/P2/P3
===========
r0 = 1; // #1
r1 = Y; // #0
s0 = X; // #0
s1 = 1; // #1
X = r0; // #1
Y = r1; // #1

The outcome?  { r0 == 1, r1 == 0 } and { s0 == 0, s1 == 1 }.  Whoops.

I have yet to observe this happening in practice, but models that permit store buffer forwarding are fundamentally vulnerable to this reordering.  The solution here is the same as with Dekker.  Marking the volatiles is insufficient: you need to insert full memory fences between the store-load adjacent pairs.

5/16/2009 10:31:07 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [3]

 

Recent Entries:

Search:

Browse by Date:
<May 2009>
SunMonTueWedThuFriSat
262728293012
3456789
10111213141516
17181920212223
24252627282930
31123456

Browse by Category:

Notables:

Currently Up To:

Reading...

Listening...

Watching...