Wednesday, December 28, 2005

I wanted to increase the GoogleRank for Aaron's article:
http://blogs.msdn.com/astebner/archive/2005/12/16/504906.aspx

My machine's Visual Studio 2005 installation has been partially hosed since somewhere around Beta2. Various tasks would regularly fail with "Package Load Failure" error messages. I could compile and perform basic functions with it, but was forced to use the command line for a lot of stuff, like editing project files. This was caused by the multiple betas, CTPs, and hand-built CLRs that I've had installed over the past year; it seems some cruft had built up inside of the native-cache.

I followed his steps, and now it works like a charm. Sweet.

12/28/2005 11:56:19 AM (Pacific Standard Time, UTC-08:00)  #   

 Tuesday, December 27, 2005

Some fundamental changes were made in the .NET Framework 2.0 that just about obviate the need to ever write a traditional finalizer. A lot of the guidance written here is now obsolete, not because it is incorrect, but rather because there is one important new consideration to make (hosting) and a set of new tools to aid you in the task. Jeff Richter pointed this out to all of us a few months back.

As Stephen Toub discusses in depth in his recent MSDN Magazine article on CLR reliability, resources not under the protection of critical finalizers are doomed for leakage when run inside of sophisticated hosts. SQL Server uses AppDomains as the unit of code isolation, much like Windows’ use of processes. When it tears one down, it expects there to be no resulting residual build-up over time. But if the best you’ve got are ordinary finalizers to clean up resources, a rude AppDomain unload can bypass execution of them entirely, leading to leaks over time. This might happen if a finalizer in the queue with you takes too long to complete, perhaps by deadlocking on entry to a non-pumping STA, causing the host to escalate to a rude unload.

Critical finalizers

During a rude unload, normal finalizers are skipped, finally blocks aren’t run, and only critical finalizers get a chance to make the world sane again. Thus we can immediately form a guiding principle:

Any resource whose natural lifetime outlasts an AppDomain must be protected by a critical finalizer to avoid leaks.

Notice that I say "outlasts an AppDomain." This is important. Finalizers are often used for process-wide resources, such as file HANDLEs and Semaphores. But a resource whose lifetime is limited to the enclosing process surely outlasts any single AppDomain; an ordinary finalizer is not good enough. Another piece of code in the same process might be denied access to a file because the (now-dead) AppDomain orphaned an exclusively-opened handle to it. Windows ensures this HANDLE will get released when the process shuts down, but our goal with critical finalization is to do this at AppDomain unload time (avoiding cross-AppDomain interference). In the worst case, not doing so can actually lead to state corruption; a process crash is then likely to ensue, taking down a host like SQL Server with it. Imagine two AppDomains (perhaps even multiple processes) communicating via memory mapped I/O inside of a shared address space. If an AppDomain gets interrupted by an unload mid-way through a paired operation and intends to clean up state in its finalizer, failure to execute the finalizer might lead to chaos. A critical finalizer should have been used. (And use of BeginDelayAbort, e.g. via a CER. But that’s digging a little too deep for now.)

Critical finalizers are somewhat easier to write when compared to ordinary finalizers, due to the out-of-the-box plumbing that you get. But they impose additional constraints on what you can actually do at finalization time. To implement a critical finalizer, simply subclass the System.Runtime.ConstrainedExecution.CriticalFinalizerObject (CFO) type, provide a way for users to acquire a resource (e.g. in the constructor), and override its Finalize method to perform cleanup. When instantiated, your object will be placed onto the critical-finalization object queue. CFOs can be suppressed as usual with the GC.SuppressFinalize method, and can be re-registered onto the critical-finalization queue with the GC.ReRegisterForFinalize method. The CLR then ensures your object is finalized should a rude unload occur; obviously, it also runs critical finalizers in the same cases where ordinary finalizers run, i.e. standard GC finalization, managed shut-down, ordinary AppDomain unload, etc. There is a weak guarantee that CFOs are finalized after other finalizable objects, specifically to accommodate relationships like that of FileStream, which must flush its buffer before its underlying SafeHandle is released.

As noted, writing a CFO Finalize method is trickier than a standard finalizer due to additional constraints. This is because it can be called from inside of a CER if the host escalates to a rude unload. It must guarantee that state will not be corrupted as a result of its execution and that it will never fail (i.e. by leaking an exception). And of course you can only call non-virtual methods that make similar guarantees. This means your code has to be written to succeed in the most hostile of situations, for instance in situations where any attempt to allocate memory dynamically will be rejected via an OutOfMemoryException. If you let that exception leak, you’ve violated the contract and can expect the host to respond in any number of ways, including crashing the process immediately. CERs perform eager preparation to statically ensure your code can execute, jitting the transitive closure of methods you invoke, but it’s easy to make a misstep here due to the massive number of hidden allocations in the runtime. A box instruction allocates memory; unbox does, too, but only if you’re unboxing a Nullable<T>; throw has to manufacture a RuntimeWrappedException if you're throwing a non-Exception object; and so forth. And unfortunately there aren’t any tools to prove that you’ve written your CER correctly. Thankfully most developers write bug free code on their first attempt. ;)

Critical- and safe-handles

Using the base CFO type directly has a couple drawbacks. First, it doesn’t fully implement the IDisposable pattern. There are two convenient Framework CFO abstract classes that do, both in the System.Runtime.InteropServices namespace: CriticalHandle and SafeHandle.

The CriticalHandle type is sufficient to get critical finalization semantics: you simply override its protected constructor and ReleaseHandle methods, performing open and close operations inside of them respectively. Your ReleaseHandle implementation can be called from inside of a critical finalization CER, so as with writing CFOs by hand you must make the same guarantees outlined above. This type provides a cleanly factored and encapsulated interface to your users.

But more concerning is the fact that both CFO and CriticalHandle are still prone to security problems that you might need to worry about if you’re building any sort of reusable Framework. BrianGru outlines this situation here. To tackle those issues, you need SafeHandle. Implementing SafeHandle is much like CriticalHandle, in that you override the protected constructor and ReleaseHandle methods, and abide by CER rules inside of ReleaseHandle. One additional piece is necessary, however: you must implement the abstract IsInvalid property getter and return true or false to indicate whether the SafeHandle refers to an invalid handle. (The SafeHandleMinusOneIsInvalid and SafeHandleZeroOrMinusOneIsInvalid types in the Microsoft.Win32.SafeHandles namespace are there to help out here, returning true if the handle is the value -1 in the first case and true if the handle is the value -1 or 0 in the latter case. A PVOID with a value of 0 (i.e. NULL), for example, would be invalid for a handle to a memory address; SafeHandleZeroOrMinusOneIsInvalid would be perfect for this.) ShawnFa discusses implementing SafeHandle in more detail on his blog.

Your CriticalHandle and SafeHandle types should never take on additional business-logic responsibility; make them as light-weight as possible, doing just enough to allocate and free resources. You’ll probably have a number of other functional classes that make use of these handles. The Framework’s Stream types are a classic example. Such types should implement the IDisposable interface and invoke Dispose on the underlying handle, providing an eager way to dispose of the resource. They should furthermore take care to never publicly expose the underlying handle, as doing so could be used to erroneously suppress finalization on a handle, leading to resource leaks.

Did you really mean never?

Almost. There are still several situations in which people must write complex finalizers. The tax they must pay for stepping outside of the simple allocate/deallocate pattern is understanding intimately the big mess outlined here. Most people should consider factoring their real cleanup code to use a SafeHandle, and only then layering specialized code on top of that inside of a normal finalizer.

After a brief email thread with Chris Brumme, a number of legitimate cases of alternative finalizer patterns were identified, including:

  1. Sophisticated APIs can use finalizers to return expensive objects—like large buffers or database connections—back to a pool, amortizing the cost of creating and destroying them over the life of the application. System.EnterpriseServices does this. This is one of the only cases where resurrection is an acceptable practice. Critical finalization should only be used here if resources are pooled across an entire process. Most resources are AppDomain-local, and thus do not qualify for CFO status.
  2. Calling GC.RemoveMemoryPressure to compensate for a previous GC.AddMemoryPressure, used to communicate to the GC that the pressure associated with an object's resources is no longer a factor (because it's been cleaned up). This should be protected by a CFO if the resource whose pressure it tracks is also allocated/deallocated under a CFO. It’s unfortunate that the RemoveMemoryPressure API doesn’t make reliability guarantees (e.g. with ReliabilityContractAttribute). If it attempts to allocate memory—I can’t imagine that it would—you could end up crashing the process due to an unhandled OutOfMemoryException. You could consider swallowing such exceptions, at the risk of violating the corruption contract. This is a crappy situation, but if a large quantity of pressure were leaked after an AppDomain unload, a skew could build up over time, affecting all parties in the process, precisely what we’re trying to avoid by using CFOs. You need to make an intelligent tradeoff. We should fix this in a later release.
  3. Incrementing or decrementing a performance counter or lighter-weight counter like a static field. This is often used to monitor the rate of creation/destruction of objects, and is often turned off in retail builds. Assuming imprecise counting is OK (e.g. if it’s used only for testing purposes), this should not use a CFO. If you do use a CFO, you have to follow the guidelines above. For light-weight counters this is easy (i++ and i-- traditionally don’t allocate memory); but for performance counters it is not.
  4. Asserting to find cases where an object should have been, but was not, eagerly cleaned up using the IDisposable pattern. Properly written eager disposal is supposed to call GC.SuppressFinalize to eliminate the assert. It would be inappropriate to use CFOs for this purpose. Finally blocks (and therefore the Dispose calls made from them) will not run under a rude unload, and thus under any rude unload situation your CFO assert will fire.
  5. Some external resources have elaborate rules for sequencing cleanup. The COM ADO APIs (not ADO.NET) require that fields are cleaned up before rows, which must precede tables, which must precede connections. If objects are cleaned up in a free-threaded manner or in the wrong order, memory corruption will occur. In other words, they violate the standard COM pUnk AddRef/Release rules. Outlook exposes COM APIs with similar sequencing rules. This is traditionally addressed by writing elaborate finalization code that walks the graph on the managed side and initiates the sequenced cleanup. This is the trickiest of all. If you can guarantee you follow the CFO rules outlined above, this probably belongs in a critical finalizer. But it’s quite easy to make a misstep...you're basically playing with dynamite at this point.
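
To make the sequencing in item 5 a bit more concrete, here is a minimal Python sketch of the graph walk: a post-order traversal that releases every child before its parent. The Resource class, its fields, and the release callback are hypothetical stand-ins I made up for illustration. Real ADO cleanup goes through COM interop, and inside a critical finalizer it would also have to obey the CER rules above, which this sketch makes no attempt to do.

```python
class Resource:
    """Hypothetical node in an ownership graph (connection -> table -> row -> field)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def release_in_order(root, release):
    """Post-order walk: release each child subtree before the parent itself."""
    for child in root.children:
        release_in_order(child, release)
    release(root.name)


# Build a tiny graph and record the order in which things are released.
connection = Resource("connection", [
    Resource("table", [
        Resource("row", [Resource("field")]),
    ]),
])

order = []
release_in_order(connection, order.append)
print(order)  # ['field', 'row', 'table', 'connection']
```

The only point here is that the walk is driven from the root on one thread, so the ordering is deterministic rather than left to whatever order finalizers happen to run in.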

If you decide you must write a finalizer, it’s still important to follow the pattern described here, or the condensed version in the .NET Framework Design Guidelines book. This facilitates seamless integration with VC++ 2005’s new destructor, Dispose, and finalization unification features.

Summary

At first glance, it appears that the world is simpler with CFOs. But when you consider that you have to abide by the same rules for normal finalizers plus the new ornery CER rules, life still isn’t very simple at all. CriticalFinalizerObject makes sure your resources don’t leak during hostile takeovers, and SafeHandle makes life more secure and a little easier in that the plumbing required to get IDisposable hooked up is all built for you, but one thing remains the same: Interoperating with unmanaged code is tricky stuff. But thankfully the world will be written in nothing but managed code sometime in the near future. Then we can get rid of all of this hairy finalization code once and for all.

12/27/2005 10:47:45 AM (Pacific Standard Time, UTC-08:00)  #   

 Tuesday, December 20, 2005

Now that the book's quieted down, I have more time to do things like code, read, drink wine, eat, and sleep. In that order.

And I've changed roles at Microsoft to focus entirely on concurrency.

I'm wrapping up development on some C++ code I wrote for an upcoming MSDN article. I also intend to spend quite a bit of time over the holiday finishing up another project that has really tested my thinking and coding skills. I love stuff like that. When carefully and intentionally crafted code must play nicely with the topology of the underlying machine. I have a presentation to the C# Design Team in late January to show 'em what I got, so I need to get these ideas down into code and optimized ASAP.

I've also become quite hooked on the sweet sounds of System of a Down. My 5 Top Played bands in iTunes right now are (in order): System of a Down, Ill Nino, Mudvayne, Machine Head, and Misfits. I've also been playing a bit more guitar lately, recording a little, but not being overly happy with the end result. Someday.

By the way, if you want to buy me any books, here's a condensed Wish List:

DEC is Dead, Long Live DEC: The Lasting Legacy of Digital Equipment Corporation -- Edgar H. Schein
The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture -- John Battelle
Parallel Computer Architecture: A Hardware/Software Approach -- David Culler, J.P. Singh, Anoop Gupta
Transaction Processing: Concepts and Techniques -- Jim Gray, Andreas Reuter
Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control -- Gerhard Weikum, Gottfried Vossen
Principles of Transaction Processing -- Philip A. Bernstein, Eric Newcomer

Interestingly, a number of those authors currently work at Microsoft.

Happy holidays, everybody.

12/20/2005 7:37:26 PM (Pacific Standard Time, UTC-08:00)  #   

 Wednesday, December 14, 2005

Charles, dude, I still read Programming Windows from beginning to comatose. Frequently.

Maybe I'm a dying breed.

Will anybody actually read my book from beginning to comatose? I hope so.

12/14/2005 11:00:44 AM (Pacific Standard Time, UTC-08:00)  #   

 Tuesday, December 13, 2005

My recent {End Bracket} column, Transactions for Memory, shipped in the January MSDN Magazine. It's now been posted online: http://msdn.microsoft.com/msdnmag/issues/06/01/EndBracket/.

It's admittedly just a teaser, but hopefully strikes a good balance between hand-waviness and a useful explanation of the core ideas.

 

12/13/2005 9:46:18 AM (Pacific Standard Time, UTC-08:00)  #   

 Saturday, December 10, 2005

A wise man once said a picture is worth a thousand words.

Alas, for those authors hoping to fill pages of text with fluffy pictures I bear sad news: A sufficiently fluffy 4" tall illustration takes up only a measly 239 words worth of space.

Perhaps "a picture is worth 239 words" is more accurate. (Or alternatively, 4.18 pictures are worth a thousand words.)

12/10/2005 5:10:23 PM (Pacific Standard Time, UTC-08:00)  #   

 Thursday, December 01, 2005

Lots of people try to roll their own thread-pool. Many people have different (good) reasons for doing so.

If you're one of these people, please tell me why. Either leave me a comment or send me an email at joedu@microsoft.com.

But if you're interested in performance, getting a good heuristic isn't as easy as you might think. The goal of such a heuristic is to have one runnable thread per hardware thread at any given moment. (A HT thread isn't equal to a full thread, but for the sake of conversation let's pretend it is.) Achieving this goal is much more complicated than it sounds.

  • If you have a task sitting in front of you, it's hard to intelligently determine whether scheduling it on another thread is the right thing to do. It might be quicker just to execute it synchronously on the current thread. When is that the case? When the current number of running threads is equal to or greater than the number of hardware threads. And any decisions must be made statistically, because presumably concurrent tasks could be contemplating new work simultaneously.
  • Remember I said running threads. If you have blocked threads, they are not making use of the CPU and thus need to be considered differently in the heuristic. Just a count of threads isn't enough. If you have 16 tasks, 8 hardware threads, and statistically 50% of those tasks will be blocked at any given quantum, you want 16 real threads. If they block 75% of the time, it takes 32 threads to keep 8 runnable. And so forth.
  • You aren't the only code on the machine. Another process could be happily hogging as many threads as there are hardware threads, in which case your algorithm just got twice as bad (or half as good) as it was originally. This type of global data is hard to come by. (I should note that most machines have more than 2 processes running simultaneously. I currently have 67 processes running with 605 total threads. That's an average of ~9 threads per process. Clearly this is a real concern.)
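
Here is a rough Python sketch of the arithmetic behind the first two bullets. It assumes every thread blocks independently for the same fraction of the time and that nothing else is running on the machine (the third bullet explains why that is optimistic). The function names are mine, not from any real thread-pool.

```python
import math
from fractions import Fraction


def threads_needed(hardware_threads, blocked_fraction):
    """Threads required so that, on average, one is runnable per hardware thread."""
    # Snap the input to a nearby simple fraction so 0.9 behaves like 9/10.
    blocked = Fraction(blocked_fraction).limit_denominator(1000)
    if not 0 <= blocked < 1:
        raise ValueError("blocked_fraction must be in [0, 1)")
    return math.ceil(hardware_threads / (1 - blocked))


def should_run_inline(running_threads, hardware_threads):
    """Run the task on the current thread when every hardware thread is already busy."""
    return running_threads >= hardware_threads


print(threads_needed(8, 0.5))   # 16
print(threads_needed(8, 0))     # 8
print(should_run_inline(8, 8))  # True
```

A real pool cannot know the blocked fraction up front; it would have to estimate it from samples, which is where the "statistically" part of the problem comes in.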

Scheduling a task on another thread is costly. Why? For a number of reasons.

  • Because unless you have ample hardware resources to run it, this implies at least one context switch to swap the work in. If it runs longer than a quantum, it means many more. If you have more than one long-running task competing for the same hardware thread, it means they will continually thrash the thread context in an attempt to make forward progress. As Larry puts it so eloquently, "...Context switches are BAD. They represent CPU time that the application could be spending on working for the customer."
  • And not only that (and perhaps worse), you're going to mess with the cache hierarchy. Your program might be happily working on conflict-free cache-lines, CASing right in the local cache without locking the bus, and then boom: You pass a pointer to an object to another thread (e.g. on the thread-pool), it pulls in the same lines of cache, and then you're both contending for the same lines back and forth. Your good locality goes right out the window and becomes a tax instead of a blessing. This sort of cache thrashing can kill good performance and scaling.
  • Lastly, threads aren't free you know. Just having one around consumes 1MB of reserved stack space (0.5MB in SQL Server). Same goes for fibers.

Some people are interested in using thread-pools for other purposes. (That is, not performance.) They might want to manage a pool of work items, for example, which get scheduled fairly with respect to each other (in the fine-grained sense). No one task will complete very quickly during saturation, but at least each is guaranteed to move forward. A newly enqueued item won't sit festering in the queue while an older item continues bumbling along towards its goal. And sometimes, priorities must be used to evict lesser priority tasks when a higher priority task gets enqueued. These are all perfect cases where user-mode scheduling makes sense. Co-routines or (*cough, cough*) perhaps fibers could be used. Using threads for this simply adds way too much overhead.
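
As a minimal illustration of fine-grained fairness with user-mode scheduling, here is a Python sketch that uses generators as stand-in co-routines and rotates through them one step at a time on a single thread. The scheduler and task names are hypothetical, and priorities and eviction are deliberately left out.

```python
from collections import deque


def run_round_robin(tasks):
    """Run (name, generator) tasks one step at a time in FIFO rotation.

    Returns the trace of (name, step) pairs in the order they executed.
    """
    ready = deque(tasks)
    trace = []
    while ready:
        name, task = ready.popleft()
        try:
            step = next(task)       # run the task up to its next yield
        except StopIteration:
            continue                # task finished; drop it from the rotation
        trace.append((name, step))
        ready.append((name, task))  # back of the line, so nobody starves
    return trace


def count_to(n):
    """Toy work item that yields after each unit of work."""
    for i in range(1, n + 1):
        yield i


trace = run_round_robin([("old", count_to(3)), ("new", count_to(2))])
print(trace)
# [('old', 1), ('new', 1), ('old', 2), ('new', 2), ('old', 3)]
```

The newer item makes progress right alongside the older one, and no extra threads (or their stacks and context switches) are involved.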

Clearly getting this right is difficult. But the consequence of getting it horribly wrong today isn't too bad. (Although really crappy algorithms are noticeable.) When you only have 1-4 hardware threads on the average high-end machine, the difference between a great heuristic and a poor one isn't significant. That will change.

12/1/2005 11:30:04 PM (Pacific Standard Time, UTC-08:00)  #   

 Wednesday, November 30, 2005

Classic.

If you're bored, read any of these papers. I wish I were smart too.

11/30/2005 9:05:31 PM (Pacific Standard Time, UTC-08:00)  #   

 Sunday, November 27, 2005

Each Windows thread has a Thread Environment Block (i.e. TEB) which is a block of user-mode memory pointed at and reserved for use by the Windows kernel Thread data structure (KTHREAD). In addition to basic OS information like the active SEH filter chain, stack base and limit, and owned critical sections, applications can easily stash data into and retrieve data out of the Thread Local Storage (TLS) area of the TEB. This is done using the Win32 TlsAlloc, TlsGetValue, TlsSetValue, and TlsFree functions. You can view the TEB via the kernel debugger's !thread command.

(The CLR of course offers TLS functionality too, i.e. using ThreadStatics and the System.Threading.Thread's AllocateDataSlot, SetData, and GetData functions. This information does go into the TEB, but it is managed by the CLR. A call to SetData does not translate directly to a call to TlsSetValue.)

Win32--and Windows in general--makes liberal use of thread-local memory. I noted a few uses above (e.g. exception handlers) which are pervasive. Such usage creates an implicit affinity between the workload running on the thread and the physical OS thread itself. What do I mean by affinity? Simply that the work executing on a thread must continue executing on that exact physical thread for it to remain correct. This affinity isn't documented consistently nor is it easy to detect. You might be able to weasel around it by chance. But it makes it extraordinarily difficult to transfer logical work from one physical thread to another.

Imagine what would happen if we made a call to some Win32 function and then decided to swap out the logical work so that we could install new work. SetLastError might have been used to communicate a failure in a function called on either the thread the work is being swapped out of, or the destination once it gets rescheduled. But SetLastError installs the error information into the TEB. GetLastError will then either fail to retrieve information or, more likely, will retrieve somebody else's information, either of which would lead to all sorts of serious problems. Similar issues can happen if we (foolishly) tried to swap out a thread that owned a critical section, or some other thread-specific resource (like a mutex).
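
The last-error problem can be mimicked in a few lines of Python. This is only an analogy: threading.local stands in for the TEB slot, and set_last_error/get_last_error are hypothetical helpers, not the Win32 functions. The "logical work" records an error on one thread and then tries to read it back after being moved to another.

```python
import threading

_tls = threading.local()  # stand-in for the per-thread last-error slot


def set_last_error(code):
    _tls.last_error = code


def get_last_error():
    # A thread that never set a value sees 0, not another thread's error.
    return getattr(_tls, "last_error", 0)


def run_on_new_thread(fn):
    """Run fn on a fresh thread and hand back its return value."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn()))
    worker.start()
    worker.join()
    return result[0]


def first_half():
    set_last_error(5)        # a call "fails" and records its error
    return get_last_error()  # still on the same thread, so this works


seen_on_a = run_on_new_thread(first_half)      # work starts on thread A
seen_on_b = run_on_new_thread(get_last_error)  # work "resumes" on thread B

print(seen_on_a)  # 5
print(seen_on_b)  # 0
```

Thread B never sees the error that was recorded on thread A, which is the implicit affinity in miniature.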

This is one major reason why fibers are still problematic as a general task scheduling solution for Windows. And it's a challenge if you even want to consider user-mode scheduling a la continuations. You just can't get around the platform's hidden thread affinity. We've done much better in managed code. Over time we are trying to use ExecutionContext as the currency for logical context information, which can be easily captured and restored by the runtime. But there are examples where we violate this (e.g. monitors), where we use the physical OS thread as the context (to be fair, we do notify hosts of such situations via Thread.Begin/EndThreadAffinity).

But you can't escape the fact that the runtime itself is built right on top of Win32.

11/27/2005 9:50:32 PM (Pacific Standard Time, UTC-08:00)  #   

 Friday, November 25, 2005

I joined Microsoft mid-way through Whidbey's lifecycle. Mid-way means post-feature development, for the most part. There were plenty of unplanned features that I got to design and work on, but those were handled quite differently than the initial development process. The impression I formed during this period was that of a very regimented, structured, and process-heavy software engineering practice. Clearly this is good to ensure people don't screw up too badly, but it also places unnecessary constraints on your best talent.

Or, as Paul Graham said in his essay Hackers & Painters:

Big companies want to decrease the standard deviation of design outcomes because they want to avoid disasters. But when you damp oscillations, you lose the high points as well as the low.

At first, I thought this was necessary for the type of project I was working on, when compared to the projects I'd worked on in the past. But now that 2.0 is out the door, I'm enjoying myself quite a bit more.

Scrounging for change

Planning the direction for future releases is clearly a complex game, consisting of a mixture of top-down (where must the business go?) and bottom-up (what features do we want to do?) analysis. Customer needs come from all directions. At some point, somebody presumably must pull the trigger and unleash the team in a concrete and coordinated direction. Some don't have the stomach for this, but without it analysis paralysis sets in.

Of course, direction is a funny thing. It typically emerges over time rather than being planned explicitly, whether those doing the planning realize it or not. This particular case is no different. Before we even shipped Whidbey Beta2, we knew the primary focuses for Orcas, even down to the feature level in many cases. I suspect most people are already half-way down the path we'll eventually go, e.g. because they've been dreaming of and prototyping the new features they would like to implement for well over a year already.

But (in theory) somebody has to capture that, refine it, and communicate it to form a shared understanding. Presumably for purposes of making sure management is OK with what everybody wants to implement. But of course, what everybody wants to implement must first be turned into market segmentation and value propositions. OK, that statement is a tad cynical. Although I'm sure the process in this case will ferret out some thought bugs before they get put into code--which is clearly a good thing--I can't help but wonder whether the cost is worth the benefit. This deserves separate analysis, of course.

Flying under the radar

Planning aside, the projects I am most interested in as we move into the next few releases are the tiny incubation efforts. These are ordinarily small groups of individuals from across the company (including product teams and research). Such groups are a diverse mix of people with different backgrounds and goals, yet are drawn together because of a shared interest. If people are responsible for allocating their own time "on the side" to work on something, you can safely bet they are passionate about it and (more than) qualified to work on it. Being united by a shared interest can lead to fun collaboration and great end results. Generics is an example of a (big) project that evolved this way.

In many cases, there is no clear support from management in terms of funding for these projects. A head nod about its importance is about all you'll usually get at first. And in fact, often such wild-west efforts can go against the spirit of the planning. In terms of funding, research is funded to research it and obviously product teams can communicate with research. But for product teams to get something worthy of the productization stamp of approval (that's a vigorous management head-nod), clearly there's some level of prototyping that is needed. But simultaneously, most capable developers are focusing on staying inside the bounds of the aforementioned processes and fixing bugs. This is a slight catch-22.

Regardless of tedious funding problems, I enjoy these efforts the most. The path is not pre-defined--we must find it--and the ability for one individual to contribute substantially is very high. They feel like start-ups. But you don't have to worry about paying the bills and recruiting the best talent (the perks of working at a place like Microsoft). Many such projects I've been involved in have been primarily thought exercises. But a few have recently been given some level of funding. The groups of people that are working on them are, as noted above, often genuinely interested in what's being built. This is great. Very little process is needed...only enough to ensure we hit the deadlines for integration with the main product, and to report back to management to make sure they feel comfortable.

Only time will tell whether this approach will lead to better results. But something seems intuitively correct about the attitude that enabling your best people to do what they were hired for will lead to huge successes. Obvious corollaries can be drawn from this statement.

11/25/2005 12:32:54 PM (Pacific Standard Time, UTC-08:00)  #   

 

Me
 

Joe is an architect and developer on a systems incubation project at Microsoft.

Disclaimer:
The contents of this site are my own personal opinions and do not represent my employer's view in any way.

© 2013, Joe Duffy