RSS 2.0

Personal Info:

Joe Send mail to the author(s) is the lead developer and architect for Parallel Extensions to .NET, tinkers with type systems, and is an author and speaker.

Blogroll:
Other
News
 C|Net
 Kuro5hin
 The Register
Technology
 <?xmlhack?>
 Daily WTF
 DevX
 Hacknot
 Java Today
 Microsoft Top 10 Downloads
 MSDN
 MSDN: "Longhorn"
 MSDN: XML Developer Center
 Slashdot
 Techdirt
 theserverside.com
 W3C
 Web Pages That Suck
 XML Cover Pages
 XML Journal
 xml.com
Technology Blogs
 Aaron Skonnard [PluralSight]
 Adam Bosworth [Google]
 Andy Rich [MS/C++]
 Arpan Desai [MS/XML]
 BCL Team [MS]
 Bill Clementson [Lisp]
 Bill de hÓra
 Bruce Eckel [J]
 Bruce Tate [J]
 Casey Chestnut
 Cedric Beust [Google]
 Chris Anderson [MS/Avalon]
 Chris Lyon [MS]
 Christian Weyer
 Clemens Vasters [newtelligence]
 Craig Andera [PluralSight]
 Dan Sugalski [Parrot]
 Daniel Cazzulino
 Dave Chappel
 Dave Roberts [Lisp]
 Dave Thomas [PragProg]
 Dave Winer
 Dion Almaer [J]
 Don Demsak
 Doug Purdy [MS/Indigo]
 Drew Marsh
 Eric Gunnerson [MS]
 Eric Rudder [MS]
 Eric Sink
 Fritz Onion [PluaralSight]
 Gavin King [J/Hibernate]
 Grady Booch [IBM]
 Hervey Wilson [MS/Indigo]
 Hillel Cooperman [MS/Shell]
 Howard Lewis Ship [J/Apache]
 Ingo Rammer [PluralSight]
 James Gosling [J/Sun]
 James Strachan [J/Groovy]
 Jason Matusow [MS/OSS]
 Jeffrey Schlimmer [MS/Indigo]
 Joe Beda [Google]
 Joel Spoelsky
 Jon Udell
 Josh Ledgard [MS/Evang]
 Joshua Allen [MS]
 Lambda
 Larry Osterman [MS]
 Maoni Stephens [MS/CLR]
 Mark Fussell [MS/XML]
 Martin Fowler
 Martin Gudgin [MS/Indigo]
 Me
 Michael Howard [MS]
 Miguel de Icaza [Mono]
 Mike Clark
 Omri Gazitt [MS/Indigo]
 Pat Helland [MS/PAG]
 Pinku Surana
 Raymond Chen [MS]
 Rich Lander [MS/CLR]
 Rob Howard
 Rob Relyea [MS/Avalon]
 Robert Cringely
 S. Somasegar [MS/DevDiv]
 Sam Gentile
 Scoble [MS/Evang]
 Scott Guthrie [MS/WebNet]
 Scott Hanselman
 Sean McGrath [J]
 Simon Fell
 Stanley Lippman [MS/C++]
 Steve Maine
 Steve Swartz [MS/Indigo]
 Steve Vinoski
 Steven Clarke [MS/Usability]
 Stuart Halloway
 Ted Leung
 Ted Neward [DM]
 Tim Bray [Sun]
 Tim Ewald [Mindreef]
 Tim O'Reilly
 Werner Vogels [Amazon]
 Wintellect
 Yasser Shohoud [MS/Indigo]
Top 20
 Brad Abrams [MS/CLR]
 Chris Brumme [MS/CLR]
 Chris Sells [MS/Ultra]
 Cyrus Najmabadi [MS/C#]
 Dominic Cooney [MS/XAF]
 Don Box [MS/Ultra]
 Don Syme [MS/R]
 Guido van Rossum [Python]
 Herb Sutter [MS/C++]
 Ian Griffiths
 Jason Zander [MS/CLR]
 Jim Hugunin [MS/CLR]
 Joel Pobar [MS/CLR]
 Krzysztof Cwalina [MS/CLR]
 Patrick Logan
 Paul Graham
 Rico Mariani [MS/CLR]
 Rory Blyth [MS/DN]
 Sam Ruby
 Wesner Moise
VC/Business Blogs
 Ed Sim
 Fred Wilson
 Jonathan Schwartz [J/Sun]
 Lawrence Lessig [Stanford]
 Mark Cuban
 Michael Hyatt
 Pierre Omidyar
 Ross Mayfield
 VentureBlog
 Weekly Read
Wine, Food & Tea
 The Silk Road of Wine
 Vinography: a wine blog
 Wine Whys

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2009, Joe Duffy

 
 Friday, March 09, 2007

The CLR commits the entire reserved stack for managed threads.   This by default is 1MB per thread, though you can change the values with compiler settings, a PE file editor, or by changing the way you create threads.  We've been having a fascinating internal discussion on the topic recently, and I've been surprised how many people were unaware that the CLR engages in this practice.  I figure there's bound to be plenty of customers in the real world that are also unaware.

Let's see some pages

This behavior can be seen quite easily by breaking into a debugger (like WinDbg) and inspecting the status of the virtual memory pages comprising a thread's stack.  For example, from WinDbg the !teb command will show you the highest stack address (StackBase) and !vadump will show you the status of all pages in the process's address space.  From this you can see that the relevant stack pages are in the state MEM_COMMIT rather than MEM_RESERVE.

Here's a quick example taken from a sample managed program.  I've broken right inside the Main function and will dump the TEB:

0:000> !teb
TEB at 000007fffffde000
    ExceptionList:        0000000000000000
    StackBase:            0000000000190000
    StackLimit:           0000000000189000
    ...

Based on this information, combined with the fact that we know managed thread stacks are 1MB by default, we can determine what memory addresses to look for: we subtract 100000 (1MB) from 190000 (StackBase) to arrive at the base address of the stack pages: 90000.  Now we dump the virtual memory pages:

0:000> !vadump

...
(1)
BaseAddress:       0000000000090000
RegionSize:        0000000000001000
State:             00002000  MEM_RESERVE
Type:              00020000  MEM_PRIVATE

(2)
BaseAddress:       0000000000091000
RegionSize:        00000000000f0000
State:             00001000  MEM_COMMIT
Protect:           00000004  PAGE_READWRITE
Type:              00020000  MEM_PRIVATE

(3)
BaseAddress:       0000000000181000
RegionSize:        0000000000001000
State:             00002000  MEM_RESERVE
Type:              00020000  MEM_PRIVATE

(4)
BaseAddress:       0000000000182000
RegionSize:        0000000000007000
State:             00001000  MEM_COMMIT
Protect:           00000104  PAGE_READWRITE + PAGE_GUARD
Type:              00020000  MEM_PRIVATE

(5)
BaseAddress:       0000000000189000
RegionSize:        0000000000007000
State:             00001000  MEM_COMMIT
Protect:           00000004  PAGE_READWRITE
Type:              00020000  MEM_PRIVATE
...

That summarizes the stack memory.  But what does it all mean?  I've labeled the individual regions above with numbers so I can reference them.  And, remember folks, the stack grows downward in the address space, so we'll discuss them in reverse order:

5. The actively used portion of the stack.  Notice that the BaseAddress equals the thread's current StackLimit, and that its BaseAddress+RegionSize equals StackBase.  The thread is actively using stack memory only within this region.

4. The guard portion of the stack.  When an attempt is made to write to an address within this range (i.e. as the thread's stack grows by virtue of the program calling functions, stackalloc'ing, etc.), the memory's guard status is cleared and a fault is triggered.  The OS traps this fault, and responds by committing additional guard region and then resuming at the faulting instruction.  What used to be the guard page has now become part of 5, and the program can continue on its merry old way.  (Assuming there is room to commit another guard region; if not, stack overflow ensues.)  A couple things are worth noting.  Because the CLR commits the entire stack, the OS doesn't really have to "commit" the memory: it just annotates the next region as the new guard.  Also notice that the guard in this program is 28KB in size.  Normally the guard is just a single page, but the CLR uses SetThreadStackGuarantee to increase the amount of committed stack we are guaranteed to have at any point in time, at least on OSes that support it.  This makes responding to stack overflow easier.

3. This is often referred to as the "hard guard page".  If you try to write to this, the OS rips down your process.  In the wink of an eye, it’s gone, no callbacks, no nice Dr. Watson dumps, it just disappears.  As guard pages are committed, this page is moved so that it's just beyond the guard region.  I don't know precisely how this happens w/out having to commit more memory (since it's marked MEM_RESERVE), but I suspect the OS just magically rearranges some page table information.

2. This is the rest of the stack.  It hasn't been used yet, and it's not part of the guard region.  This is where you'll see a difference between a managed thread's stack and a native thread's stack: the pages are marked MEM_COMMIT for managed code, whereas they'd be MEM_RESERVE for native.

1. This is the final destination of the hard guard page, after the whole stack has been committed and the guard rests just before it at the end of the stack, this page will always remain.  It is treated as a separate MEM_RESERVED page and will never be committed.

One additional thing is worth noting.  When the CLR pre-commits the whole stack, it uses VirtualAlloc to do so.  This leaves the guard page close to the bottom of the stack, the hard guard page just behind it, and the StackLimit in the TEB, set to the address where the guard page ends.  This surprises some people, i.e. they expect to see a StackLimit set to, say, 91000 in the above example.  But remember, the OS doesn’t get involved at all in our pre-committing of the stack.

Method to the madness

Why in the heck would we do all of this?

Believe it or not, there is a method to the madness.  When the OS tries to commit a new guard region, it can fail.  It won’t fail due to insufficient virtual memory space (not that such things would matter much on X64 anyhow), because the memory is already reserved.  We can handle those cases just fine.  Rather, it might fail if there is insufficient pagefile backing to commit the memory.  This would manifest as a stack overflow.  Sadly, at the point of this stack overflow, the CLR’s vectored exception handler (which responds to ERROR_STACK_OVERFLOW) would then have only the guard region’s worth of stack space in which to do anything reasonably intelligent.  (Which, recall, was traditionally one page.)  The unhosted CLR just has to issue a failfast in this case, but it also wants to do things like create a Windows Event Log entry, play nicely with Dr. Watson and debuggers, and so on.

This is also required for hosts like SQL Server who try to continue running in the face of stack overflow.  In these cases, the CLR has to call out to the host to see what it would like to do.  Maybe the host can run in just a page’s worth of stack, and maybe it can’t.  The CLR doesn’t try to recover in unhosted situations because it is extremely difficult, and there are even problems with some of Win32 itself not being able to tolerate the presence of stack overflow (most notably CRITICAL_SECTIONs).  But the SQL Server engine is a very carefully engineered piece of software and they have a lot of experience (and success, apparently) remaining running in these cases.

If we commit the entire stack, there is no fear of running out of physical space during stack growth, because the whole thing has already been backed.

But of course this is the major downside to this design as well.  The pressure this puts on the pagefile is not negligible.  If you have 1,000 threads in a process, you need 1GB of pagefile space to back all of their stacks.  Sure, that’s a lot of threads, but heck, that’s a lot of disk space too!  A stack page won’t require physical memory until it’s actually faulted in (i.e. read from or written to), but the pagefile expense is a high price to pay for what amounts to be an obscure (and dubious) condition.

I say “dubious” because you have to wonder: is it even worth it anyway?  Probably not.  On modern Windows OSes that support SetThreadStackGuarantee, there’s little reason to commit anything but the guaranteed guard region.  The CLR uses this API, which means we can size the guaranteed region large enough to so we can always run our stack overflow logic within it.  Committing any more than this really is just a waste.  Even on OSes without this API, however, we’re going to failfast the process in this situation anyhow.  Sure, if we didn’t commit the whole thing up front, then these “out of pagefile space” situations might result in an inability to log an Event Log entry, notify a debugger, and so on, but will we really be able to do that anyway given the extreme resource pressure the machine has to be under to create this situation?  Probably not.

In the end, it matters little what I think about the design.  This is how it is, and I figured you all should know about it.

 

Recent Entries:

Search:

Browse by Date:
<July 2009>
SunMonTueWedThuFriSat
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678

Browse by Category:

Notables:

Currently Up To:

Reading...

Listening...

Watching...