RSS 2.0

Personal Info:

Joe Send mail to the author(s) works on parallel libraries, infrastructure, and programming models in Microsoft's Developer Division.

Blogroll:
Other
News
 C|Net
 Kuro5hin
 The Register
Technology
 <?xmlhack?>
 Daily WTF
 DevX
 Hacknot
 Java Today
 Microsoft Top 10 Downloads
 MSDN
 MSDN: "Longhorn"
 MSDN: XML Developer Center
 Slashdot
 Techdirt
 theserverside.com
 W3C
 Web Pages That Suck
 XML Cover Pages
 XML Journal
 xml.com
Technology Blogs
 Aaron Skonnard [PluralSight]
 Adam Bosworth [Google]
 Andy Rich [MS/C++]
 Arpan Desai [MS/XML]
 BCL Team [MS]
 Bill Clementson [Lisp]
 Bill de hÓra
 Bruce Eckel [J]
 Bruce Tate [J]
 Casey Chestnut
 Cedric Beust [Google]
 Chris Anderson [MS/Avalon]
 Chris Lyon [MS]
 Christian Weyer
 Clemens Vasters [newtelligence]
 Craig Andera [PluralSight]
 Dan Sugalski [Parrot]
 Daniel Cazzulino
 Dave Chappel
 Dave Roberts [Lisp]
 Dave Thomas [PragProg]
 Dave Winer
 Dion Almaer [J]
 Don Demsak
 Doug Purdy [MS/Indigo]
 Drew Marsh
 Eric Gunnerson [MS]
 Eric Rudder [MS]
 Eric Sink
 Fritz Onion [PluaralSight]
 Gavin King [J/Hibernate]
 Grady Booch [IBM]
 Hervey Wilson [MS/Indigo]
 Hillel Cooperman [MS/Shell]
 Howard Lewis Ship [J/Apache]
 Ingo Rammer [PluralSight]
 James Gosling [J/Sun]
 James Strachan [J/Groovy]
 Jason Matusow [MS/OSS]
 Jeffrey Schlimmer [MS/Indigo]
 Joe Beda [Google]
 Joel Spoelsky
 Jon Udell
 Josh Ledgard [MS/Evang]
 Joshua Allen [MS]
 Lambda
 Larry Osterman [MS]
 Maoni Stephens [MS/CLR]
 Mark Fussell [MS/XML]
 Martin Fowler
 Martin Gudgin [MS/Indigo]
 Me
 Michael Howard [MS]
 Miguel de Icaza [Mono]
 Mike Clark
 Omri Gazitt [MS/Indigo]
 Pat Helland [MS/PAG]
 Pinku Surana
 Raymond Chen [MS]
 Rich Lander [MS/CLR]
 Rob Howard
 Rob Relyea [MS/Avalon]
 Robert Cringely
 S. Somasegar [MS/DevDiv]
 Sam Gentile
 Scoble [MS/Evang]
 Scott Guthrie [MS/WebNet]
 Scott Hanselman
 Sean McGrath [J]
 Simon Fell
 Stanley Lippman [MS/C++]
 Steve Maine
 Steve Swartz [MS/Indigo]
 Steve Vinoski
 Steven Clarke [MS/Usability]
 Stuart Halloway
 Ted Leung
 Ted Neward [DM]
 Tim Bray [Sun]
 Tim Ewald [Mindreef]
 Tim O'Reilly
 Werner Vogels [Amazon]
 Wintellect
 Yasser Shohoud [MS/Indigo]
Top 20
 Brad Abrams [MS/CLR]
 Chris Brumme [MS/CLR]
 Chris Sells [MS/Ultra]
 Cyrus Najmabadi [MS/C#]
 Dominic Cooney [MS/XAF]
 Don Box [MS/Ultra]
 Don Syme [MS/R]
 Guido van Rossum [Python]
 Herb Sutter [MS/C++]
 Ian Griffiths
 Jason Zander [MS/CLR]
 Jim Hugunin [MS/CLR]
 Joel Pobar [MS/CLR]
 Krzysztof Cwalina [MS/CLR]
 Patrick Logan
 Paul Graham
 Rico Mariani [MS/CLR]
 Rory Blyth [MS/DN]
 Sam Ruby
 Wesner Moise
VC/Business Blogs
 Ed Sim
 Fred Wilson
 Jonathan Schwartz [J/Sun]
 Lawrence Lessig [Stanford]
 Mark Cuban
 Michael Hyatt
 Pierre Omidyar
 Ross Mayfield
 VentureBlog
 Weekly Read
Wine, Food & Tea
 The Silk Road of Wine
 Vinography: a wine blog
 Wine Whys

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2008, Joe Duffy

 
 Thursday, January 17, 2008

Most schedulers try to keep the number of runnable threads (within a certain context) equal to the number of processors available.  Aside from fairness desires, there’s usually little reason to have more: and in fact, having more can lead to more memory pressure due to the cost of stacks and working set held on the stacks, non-pageable kernel memory, per-thread data structures, etc., and also has execution time costs due to increased pressure on the kernel scheduler, more frequent context switches, and poor locality due to threads being swapped in and out of processors.  In extreme cases, blocked threads can build up only for all of them to be awoken and released to wreak havoc on the system at once, hurting scalability.

A naïve approach of one-thread-per-processor works great until a thread on one of these processors blocks, either “silently” as a result of a pagefault or “explicitly” due to IO or a synchronization wait.  (I should mention that due to the plethora of hidden blocking calls in the kernel, Win32 user-mode code, and the .NET Framework, a lot of IO and synchronization waiting is “silent” too.)  In this case, a processor becomes idle (0% utilization) for some period of time.  If there is other work that could be happening instead, this is clearly bad.

Many programs spend most of their time blocked.

Four particular solutions to this problem are commonplace on Windows:

  1. Create more threads than processors and hope for the best.  This trades some amount of runtime efficiency for the insurance that processor time won’t go to waste.
  2. Periodically poll for blocked threads using some kind of daemon and respond to the presence of one by creating a new thread to execute work.  Eventually this thread would go away, for instance when the blocked thread awakens.  This is the approach used by the CLR ThreadPool, although it caps the total.  (The TPL uses this appraoch today also, but we're changing/augmenting it.)  For obvious reasons, this approach is quite flawed: you easily end up with more running threads than processors, have to trade-off more frequent polling--which implies more runtime overhead--with less frequent polling--which adds time to the latency in the scheduler’s response to a blocked thread.
  3. Block on an IO Completion Port at periodic intervals--e.g. when dispatching a new work item in a ThreadPool-like thing--which has the effect of throttling running threads.  This still requires creating more threads than processors, but helps to ensure few of them run at the same time.  Unfortunately, it still does lead to more of them actually running than you’d like since the port can only prevent a thread from running when it goes back and blocks on the port in the kernel.  But this is only done periodically.
  4. Specialized systems like SQL have used Fibers in the past to avoid needing full-fledged threads to replace the blocking ones.  To do this, they ensure all blocking goes through a cooperative layer, which notifies a user-mode scheduler (UMS).  The user-mode scheduler maintains a list of blocked Fibers, but can multiplex runnable Fibers onto threads, keeping the number of threads equal to the number of processors.  A thread effectively never blocks, Fibers do, but this requires all blocking to notify the UMS.  Aside from extraordinarily closed world systems, this approach doesn’t usually work.  That’s because Fibers are not threads and multiplexing entirely different contexts of work onto a shared pool of threads (at blocking points) can easily lead to thread affinity nightmares.

The CLR facilities #4 by funneling all synchronization waits in managed code through one point in the VM codebase.  This was done initially to ensure consistent message pumping on STA threads, via CoWaitForMultipleHandles.  But it was then exploited in 2.0 to expand the CLR Hosting APIs to enable custom hosts to hook all synchronization calls.  This is convenient for building interesting debugging tools, like deadlock detecting hosts.

A fifth approach is often viable and even preferable, and that is to avoid blocking altogether.  Often referred to as continuation passing style (CPS), the idea is that, where you’d normally have blocked, the callstack is transformed into a resumable continuation.  For an example of this, look at Jeff Richter’s ReaderWriterLockGate class: it’s a reader/writer lock with no blocking.  Asynchronous IO is supported by files and sockets on Windows, and enables a similar style of programming.  The continuation is ordinarily just a closure object that has enough state to restart itself when the sought-after condition arises.  When it does arise, the continuation is scheduled on something like the CLR ThreadPool.  This avoids burning any threads while the wait occurs.

For obvious reasons, CPS is usually hard to achieve in .NET: there is no language support for first class continuations in .NET, all synchronization primitives are wait-based, and keeping a whole stack around in memory would be a terrible idea anyway.  You’d also need to worry about resources held on the stack, including locks.  Instead, you should save only that state which is needed during the continuation.  In a message passing system this is much simpler, since most of the program is full of continuations in the form of message handlers.  For an example of such a system, check out the Concurrent and Coordination Runtime (CCR) and/or Erlang.

Even in message passing systems, it’s impossible to escape the fundamental blocking issue, since it is platform-wide.  And in ordinary imperative programs, the CPS transformation is near-impossible at the leaves of callstacks: unless you have whole program knowledge, who knows what your caller expects?  Most APIs are synchronous.  Futures and Promises potentially make this style of programming easier, though in the extreme all APIs would need to return a Promise rather than a true value.

Nothing conclusive, just some random thoughts ...

1/17/2008 3:29:24 PM (Pacific Standard Time, UTC-08:00)  #    Comments [6]

 

Recent Entries:

Search:

Browse by Date:
<August 2008>
SunMonTueWedThuFriSat
272829303112
3456789
10111213141516
17181920212223
24252627282930
31123456

Browse by Category:

Notables:

Currently Up To:

Reading...

Listening...

Watching...