
Personal Info:

Joe leads the architecture of an experimental OS's developer platform, where he is also chief architect of its programming language. His current mission is to enable writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe founded the Parallel Extensions to .NET project. He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife, writing books, writing music, studying music theory & mathematics, and doing anything involving food & wine.


Disclaimer:
The content of this site is my own personal opinion and does not represent my employer's view in any way.

© 2012, Joe Duffy

 
 Thursday, December 29, 2005

This white-paper is the best overview I've ever seen on COM threading models, ranging from apartments to CoWaitForMultipleHandles to synchronization-contexts:

http://msdn.microsoft.com/library/en-us/dncomser/html/comthread.asp

Read it and weep. (For those that still program in COM, that is. Haha.)

12/29/2005 8:41:54 PM (Pacific Standard Time, UTC-08:00)

 Wednesday, December 28, 2005

I wanted to increase the GoogleRank for Aaron's article:
http://blogs.msdn.com/astebner/archive/2005/12/16/504906.aspx

My machine's Visual Studio 2005 installation has been partially hosed since somewhere around Beta2. Various tasks would regularly fail with "Package Load Failure" error messages. I could compile and perform basic functions with it, but was forced to use cmd-line for a lot of stuff. Like editing project files. This was caused by the multiple betas, CTPs, and hand-built CLRs that I've had installed over the past year; it seems some cruft had built up inside of the native-cache.

I followed his steps, and now it works like a charm. Sweet.

12/28/2005 11:56:19 AM (Pacific Standard Time, UTC-08:00)

 Tuesday, December 27, 2005

Some fundamental changes were made in the .NET Framework 2.0 that just about obviate the need to ever write a traditional finalizer. A lot of the guidance written here is now obsolete, not because it is incorrect, but rather because there is one important new consideration to make (hosting) and a set of new tools to aid you in the task. Jeff Richter pointed this out to all of us a few months back.

As Stephen Toub discusses in depth in his recent MSDN Magazine article on CLR reliability, resources not under the protection of critical finalizers are doomed to leak when run inside of sophisticated hosts. SQL Server uses AppDomains as the unit of code isolation, much like Windows’ use of processes. When it tears one down, it expects there to be no resulting residual build-up over time. But if the best you’ve got are ordinary finalizers to clean up resources, a rude AppDomain unload can bypass execution of them entirely, leading to leaks over time. This might happen if a finalizer in the queue with you takes too long to complete, perhaps by deadlocking on entry to a non-pumping STA, causing the host to escalate to a rude unload.

Critical finalizers

During a rude unload, normal finalizers are skipped, finally blocks aren’t run, and only critical finalizers get a chance to make the world sane again. Thus we can immediately form a guiding principle:

Any resource whose natural lifetime outlasts an AppDomain must be protected by a critical finalizer to avoid leaks.

Notice that I say "natural lifetime outlasts an AppDomain." This is important. Finalizers are often used for process-wide resources, such as file HANDLEs and Semaphores. But a resource whose lifetime is limited only by the enclosing process surely outlasts any single AppDomain; a finalizer is not good enough. Another piece of code in the same process might be denied access to the file because the (now-dead) AppDomain orphaned an exclusively-opened handle to it. Windows ensures this HANDLE will get released when the process shuts down, but our goal with critical finalization is to do this at AppDomain unload time (avoiding cross-AppDomain interference). In the worst case, not doing so can actually lead to state corruption; a process crash is then likely to ensue, taking down a host like SQL Server with it. Imagine if two AppDomains (perhaps even multiple processes) communicate via memory-mapped I/O inside of a shared address space. If an AppDomain gets interrupted by an unload midway through a paired operation and intends to clean up state in its finalizer, failure to execute the finalizer might lead to chaos. A critical finalizer should have been used. (And use of BeginDelayAbort, e.g. via a CER. But that’s digging a little too deep for now.)

Critical finalizers are somewhat easier to write when compared to ordinary finalizers, due to the out-of-the-box plumbing that you get. But they impose additional constraints on what you can actually do at finalization time. To implement a critical finalizer, simply subclass the System.Runtime.ConstrainedExecution.CriticalFinalizerObject (CFO) type, provide a way for users to acquire a resource (e.g. in the constructor), and override its Finalize method to perform cleanup. When instantiated, your object will be placed onto the critical-finalization object queue. CFOs can be suppressed as usual with the GC.SuppressFinalize method, and can be re-registered onto the critical-finalization queue with the GC.ReRegisterForFinalize method. The CLR then ensures your object is finalized should a rude unload occur; obviously, it also runs them in the same cases ordinary finalizers are run: i.e. standard GC finalization, managed shut-down, ordinary AppDomain unload, etc. There is a weak guarantee that CFOs are finalized after other finalizable objects, specifically to accommodate relationships like how the FileStream must flush its buffer before its underlying SafeHandle has been released.

As noted, writing a CFO Finalize method is trickier than a standard finalizer due to additional constraints. This is because it can be called from inside of a CER if the host escalates to a rude unload. It must guarantee that state will not be corrupted as a result of its execution and that it will never fail (i.e. by leaking an exception). And of course you can only call non-virtual methods that make similar guarantees. This means your code has to be written to succeed in the most hostile of situations, for instance in situations where any attempt to allocate memory dynamically will be rejected via an OutOfMemoryException. If you let that exception leak, you’ve violated the contract and can expect the host to respond in any number of ways, including crashing the process immediately. CERs perform eager preparation to statically ensure your code can execute, jitting the transitive closure of methods you invoke, but it’s easy to make a misstep here due to the massive number of hidden allocations in the runtime. A box instruction allocates memory; unbox does, too, but only if you’re unboxing a Nullable<T>; throw has to manufacture a RuntimeWrappedException if you're throwing a non-Exception object; and so forth. And unfortunately there aren’t any tools to prove that you’ve written your CER correctly. Thankfully most developers write bug free code on their first attempt. ;)
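To make the ordering rules above concrete, here is a toy Python model of one AppDomain's finalization queues. It is not CLR code: `FinalizationModel`, `Finalizable`, and their methods are hypothetical names, and the model captures only two rules from the text, namely that a rude unload skips ordinary finalizers while critical ones still run, and that critical finalizers run after ordinary ones (the FileStream-before-SafeHandle case).

```python
class Finalizable:
    """Hypothetical object whose 'finalizer' just reports what it would clean up."""

    def __init__(self, name):
        self.name = name

    def finalize(self):
        return self.name


class FinalizationModel:
    """Toy model of one AppDomain's finalization queues (not a real CLR API)."""

    def __init__(self):
        self._ordinary = []
        self._critical = []

    def register(self, obj, critical=False):
        # Constructing a CriticalFinalizerObject puts it on the critical queue.
        (self._critical if critical else self._ordinary).append(obj)

    def suppress(self, obj):
        # Analogous to GC.SuppressFinalize: the object will not be finalized.
        for queue in (self._ordinary, self._critical):
            if obj in queue:
                queue.remove(obj)

    def unload(self, rude=False):
        """Return the names of the finalizers that actually ran, in order."""
        ran = []
        if not rude:
            # Ordinary finalizers run first (e.g. FileStream flushing its buffer)...
            ran += [obj.finalize() for obj in self._ordinary]
        # ...critical finalizers always run, and run last (e.g. SafeHandle closing its HANDLE).
        ran += [obj.finalize() for obj in self._critical]
        self._ordinary.clear()
        self._critical.clear()
        return ran


def build():
    domain = FinalizationModel()
    domain.register(Finalizable("SafeHandle: close HANDLE"), critical=True)
    domain.register(Finalizable("FileStream: flush buffer"))
    return domain


print(build().unload())           # ['FileStream: flush buffer', 'SafeHandle: close HANDLE']
print(build().unload(rude=True))  # ['SafeHandle: close HANDLE']
```

Suppression is modeled by simply removing the object from its queue; none of the CER constraints (no allocation, no leaked exceptions) are modeled here.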

Critical- and safe-handles

Using the base CFO type directly has a couple of drawbacks. First, it doesn’t fully implement the IDisposable pattern. There are two convenient Framework CFO abstract classes that do, both in the System.Runtime.InteropServices namespace: CriticalHandle and SafeHandle.

The CriticalHandle type is sufficient to get critical finalization semantics: you simply call its protected constructor from your own and override its ReleaseHandle method, performing the open operation in your constructor and the close operation in ReleaseHandle. Your ReleaseHandle implementation can be called from inside of a critical finalization CER, so, as with writing CFOs by hand, you must make the same guarantees outlined above. This type provides a cleanly factored and encapsulated interface to your users.

But more concerning is the fact that both CFO and CriticalHandle are still prone to security problems that you might need to worry about if you’re building any sort of reusable Framework. BrianGru outlines this situation here. To tackle those issues, you need SafeHandle. Implementing SafeHandle is much like implementing CriticalHandle: you call the protected base constructor from your own constructor, override ReleaseHandle, and abide by CER rules inside of ReleaseHandle. One additional piece is necessary, however: you must implement the abstract IsInvalid property getter, returning true or false to indicate whether the SafeHandle refers to an invalid handle. (The SafeHandleMinusOneIsInvalid and SafeHandleZeroOrMinusOneIsInvalid types in the Microsoft.Win32.SafeHandles namespace help out here: the first treats a handle value of -1 as invalid, and the second treats both -1 and 0 as invalid. A PVOID with a value of 0 (i.e. NULL), for example, would be invalid for a handle to a memory address; SafeHandleZeroOrMinusOneIsInvalid would be perfect for this.) ShawnFa discusses implementing SafeHandle in more detail on his blog.
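The shape of the SafeHandle contract (an IsInvalid check plus a ReleaseHandle that runs at most once) can be sketched in Python. This is an analogy only, with hypothetical names (`HandleWrapper`, `MinusOneIsInvalid`, `ZeroOrMinusOneIsInvalid`, `FakeFileHandle`); it provides none of the CLR's critical-finalization or CER guarantees, and a real handle would call CloseHandle where the fake one appends to a list.

```python
from abc import ABC, abstractmethod


class HandleWrapper(ABC):
    """Hypothetical analog of SafeHandle: owns one raw handle value and
    releases it at most once. No critical-finalization guarantees."""

    def __init__(self, raw_handle):
        self._handle = raw_handle
        self._released = False

    @property
    @abstractmethod
    def is_invalid(self):
        """Mirrors SafeHandle.IsInvalid."""

    @abstractmethod
    def release_handle(self):
        """Mirrors SafeHandle.ReleaseHandle; must not fail."""

    def dispose(self):
        # Idempotent: a second dispose (or a later finalizer) must not double-close.
        if self._released:
            return
        self._released = True
        if not self.is_invalid:
            self.release_handle()


class MinusOneIsInvalid(HandleWrapper):
    """Like SafeHandleMinusOneIsInvalid: -1 (INVALID_HANDLE_VALUE) is invalid."""

    @property
    def is_invalid(self):
        return self._handle == -1


class ZeroOrMinusOneIsInvalid(HandleWrapper):
    """Like SafeHandleZeroOrMinusOneIsInvalid: 0 (NULL) and -1 are invalid."""

    @property
    def is_invalid(self):
        return self._handle in (0, -1)


closed = []  # stands in for the OS: records which raw handles were closed


class FakeFileHandle(MinusOneIsInvalid):
    """Hypothetical concrete handle; a real one would call CloseHandle here."""

    def release_handle(self):
        closed.append(self._handle)
        return True


h = FakeFileHandle(42)
h.dispose()
h.dispose()                    # no double close
FakeFileHandle(-1).dispose()   # invalid handle: nothing to release
print(closed)                  # [42]
```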

Your CriticalHandle and SafeHandle types should never take on additional business-logic responsibility; make them as light-weight as possible, doing just enough to allocate and free resources. You’ll probably have a number of other functional classes that make use of these handles. The Framework’s Stream types are a classic example. Such types should implement the IDisposable interface and invoke Dispose on the underlying handle, providing an eager way to dispose of the resource. They should furthermore take care to never publicly expose the underlying handle, as doing so could be used to erroneously suppress finalization on a handle, leading to resource leaks.
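The layering described above, a thin handle underneath and a functional type on top that disposes it eagerly without exposing it, might look like this in Python. `RawHandle` and `LogWriter` are hypothetical names; the context manager plays the role of C#'s `using` statement, and Python's double-underscore name mangling only discourages outside access rather than truly hiding the handle.

```python
class RawHandle:
    """Hypothetical minimal handle: releases the underlying resource exactly once."""

    def __init__(self, value, close_fn):
        self._value = value
        self._close_fn = close_fn
        self.is_closed = False

    def dispose(self):
        if not self.is_closed:
            self.is_closed = True
            self._close_fn(self._value)


class LogWriter:
    """Hypothetical functional type layered over a handle, in the spirit of Stream.
    The business logic lives here; the handle only knows how to close itself."""

    def __init__(self, handle):
        self.__handle = handle  # name-mangled and never returned to callers
        self._buffer = []

    def write(self, text):
        if self.__handle.is_closed:
            raise ValueError("LogWriter has been disposed")
        self._buffer.append(text)

    def dispose(self):
        # Eager cleanup: forward to the handle (IDisposable.Dispose in .NET terms).
        self.__handle.dispose()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.dispose()
        return False  # never swallow exceptions from the with-block


closed_values = []
handle = RawHandle(7, closed_values.append)
with LogWriter(handle) as log:
    log.write("hello")
print(closed_values)  # [7]
```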

Did you really mean never?

Almost. There are still several situations in which people must write complex finalizers. The tax they must pay for stepping outside of the simple allocate/deallocate pattern is understanding intimately the big mess outlined here. Most people should consider factoring their real cleanup code to use a SafeHandle, and only then layering specialized code on top of that inside of a normal finalizer.

After a brief email thread with Chris Brumme, a number of legitimate cases of alternative finalizer patterns were identified, including:

  1. Sophisticated APIs can use finalizers to return expensive objects—like large buffers or database connections—back to a pool, amortizing the cost of creating and destroying them over the life of the application. System.EnterpriseServices does this. This is one of the only cases where resurrection is an acceptable practice. Critical finalization should only be used here if resources are pooled across an entire process. Most resources are AppDomain-local, and thus do not qualify for CFO status.
  2. Calling GC.RemoveMemoryPressure to compensate for a previous GC.AddMemoryPressure, used to communicate to the GC that the pressure associated with an object's resources is no longer a factor (because it's been cleaned up). This should be protected by a CFO if the resource whose pressure it tracks is also allocated/deallocated under a CFO. It’s unfortunate that the RemoveMemoryPressure API doesn’t make reliability guarantees (e.g. with ReliabilityContractAttribute). If it attempts to allocate memory—I can’t imagine that it would—you could end up crashing the process due to an unhandled OutOfMemoryException. You could consider swallowing such exceptions, at the risk of violating the corruption contract. This is a crappy situation, but if a large quantity of pressure were leaked after an AppDomain unload, a skew could build up over time, affecting all parties in the process, precisely what we’re trying to avoid by using CFOs. You need to make an intelligent tradeoff. We should fix this in a later release.
  3. Incrementing or decrementing a performance counter or lighter-weight counter like a static field. This is often used to monitor the rate of creation/destruction of objects, and is often turned off in retail builds. Assuming imprecise counting is OK (e.g. if it’s used only for testing purposes), this should not use a CFO. If you do use a CFO, you have to follow the guidelines above. For light-weight counters this is easy (i++ and i-- traditionally don’t allocate memory); but for performance counters it is not.
  4. Asserting to find cases where an object should have been, but was not, eagerly cleaned up using the IDisposable pattern. Properly written eager disposal is supposed to call GC.SuppressFinalize to eliminate the assert. It would be inappropriate to use CFOs for this purpose. Finally blocks (and thus the Dispose calls made from them) will not run under a rude unload, so your CFO would fire in every rude unload situation, even for perfectly well-behaved code.
  5. Some external resources have elaborate rules for sequencing cleanup. The COM ADO APIs (not ADO.NET) require that fields are cleaned up before rows, which must precede tables, which must precede connections. If objects are cleaned up in a free-threaded manner or in the wrong order, memory corruption will occur. In other words, they violate the standard COM pUnk AddRef/Release rules. Outlook exposes COM APIs with similar sequencing rules. This is traditionally addressed by writing elaborate finalization code that walks the graph on the managed side and initiates the sequenced cleanup. This is the trickiest of all. If you can guarantee you follow the CFO rules outlined above, this probably belongs in a critical finalizer. But it’s quite easy to make a misstep...you're basically playing with dynamite at this point.
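Pattern 4 (a finalizer that flags objects nobody disposed) has a reasonable Python analog in `weakref.finalize`, whose `detach()` plays roughly the role of GC.SuppressFinalize. This is only an analogy: Python has no critical finalization, hosts, or rude unloads, and `TrackedResource` and `_report_leak` are hypothetical names. A real implementation would assert or log where this sketch appends to a list.

```python
import gc
import weakref

leaked = []  # names of objects that were collected without being disposed


def _report_leak(name):
    # Runs only if the object was collected undisposed. It must not reference
    # the object itself, which is why only its name is passed in.
    leaked.append(name)


class TrackedResource:
    """Hypothetical resource that callers are expected to dispose eagerly."""

    def __init__(self, name):
        self.name = name
        self._finalizer = weakref.finalize(self, _report_leak, name)

    def dispose(self):
        # ... real cleanup would go here ...
        self._finalizer.detach()  # plays the role of GC.SuppressFinalize(this)


good = TrackedResource("disposed-properly")
good.dispose()
del good

forgotten = TrackedResource("forgotten")
del forgotten
gc.collect()  # CPython usually finalizes on the del above; this makes it explicit

print(leaked)  # ['forgotten']
```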

If you decide you must write a finalizer, it’s still important to follow the pattern described here, or the condensed version in the .NET Framework Design Guidelines book. This facilitates seamless integration with VC++ 2005’s new destructor, Dispose, and finalization unification features.

Summary

At first glance, it appears that the world is simpler with CFOs. But when you consider that you have to abide by the same rules for normal finalizers plus the new ornery CER rules, life still isn’t very simple at all. CriticalFinalizerObject makes sure your resources don’t leak during hostile takeovers, and SafeHandle makes life more secure and a little easier in that the plumbing required to get IDisposable hooked up is all built for you, but one thing remains the same: Interoperating with unmanaged code is tricky stuff. But thankfully the world will be written in nothing but managed code sometime in the near future. Then we can get rid of all of this hairy finalization code once and for all.

12/27/2005 10:47:45 AM (Pacific Standard Time, UTC-08:00)

 Tuesday, December 20, 2005

Now that the book's quieted down, I have more time to do things like code, read, drink wine, eat, and sleep. In that order.

And I've changed roles at Microsoft to focus entirely on concurrency.

I'm wrapping up development on some C++ code I wrote for an upcoming MSDN article. I also intend to spend quite a bit of time over the holiday finishing up another project that has really tested my thinking and coding skills. I love stuff like that. When carefully and intentionally crafted code must play nicely with the topology of the underlying machine. I have a presentation to the C# Design Team in late January to show 'em what I got, so I need to get these ideas down into code and optimized ASAP.

I've also become quite hooked on the sweet sounds of System of a Down. My 5 Top Played bands in iTunes right now are (in order): System of a Down, Ill Nino, Mudvayne, Machine Head, and Misfits. I've also been playing a bit more guitar lately, recording a little, but not being overly happy with the end result. Someday.

By the way, if you want to buy me any books, here's a condensed Wish List:

DEC is Dead, Long Live DEC: The Lasting Legacy of Digital Equipment Corporation -- Edgar H. Schein
The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture -- John Battelle
Parallel Computer Architecture: A Hardware/Software Approach -- David Culler, J.P. Singh, Anoop Gupta
Transaction Processing: Concepts and Techniques -- Jim Gray, Andreas Reuter
Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control -- Gerhard Weikum, Gottfried Vossen
Principles of Transaction Processing -- Philip A. Bernstein, Eric Newcomer

Interestingly, a number of those authors currently work at Microsoft.

Happy holidays, everybody.

12/20/2005 7:37:26 PM (Pacific Standard Time, UTC-08:00)

 Wednesday, December 14, 2005

Charles, dude, I still read Programming Windows from beginning to comatose. Frequently.

Maybe I'm a dying breed.

Will anybody actually read my book from beginning to comatose? I hope so.

12/14/2005 11:00:44 AM (Pacific Standard Time, UTC-08:00)

 Tuesday, December 13, 2005

My recent {End Bracket} column, Transactions for Memory, shipped in the January MSDN Magazine. It's now been posted online: http://msdn.microsoft.com/msdnmag/issues/06/01/EndBracket/.

It's admittedly just a teaser, but hopefully strikes a good balance between hand-waviness and a useful explanation of the core ideas.

 

12/13/2005 9:46:18 AM (Pacific Standard Time, UTC-08:00)

 Saturday, December 10, 2005

A wise man once said a picture is worth a thousand words.

Alas, for those authors hoping to fill pages of text with fluffy pictures I bear sad news: A sufficiently fluffy 4" tall illustration takes up only a measly 239 words worth of space.

Perhaps "a picture is worth 239 words" is more accurate. (Or alternatively, 4.18 pictures are worth a thousand words.)

12/10/2005 5:10:23 PM (Pacific Standard Time, UTC-08:00)

 Thursday, December 01, 2005

Lots of people try to roll their own thread-pool. Many people have different (good) reasons for doing so.

If you're one of these people, please tell me why. Either leave me a comment or send me an email at joedu@microsoft.com.

But if you're interested in performance, getting a good heuristic isn't as easy as you might think. The goal of such a heuristic is to have one runnable thread per hardware thread at any given moment. (An HT thread isn't equal to a full hardware thread, but for the sake of conversation let's pretend it is.) Achieving this goal is much more complicated than it sounds.

  • If you have a task sitting in front of you, it's hard to intelligently determine whether scheduling it on another thread is the right thing to do. It might be quicker just to execute it synchronously on the current thread. When is that the case? When the current number of running threads is equal to or greater than the number of hardware threads. And any decisions must be made statistically, because presumably concurrent tasks could be contemplating new work simultaneously.
  • Remember I said running threads. If you have blocked threads, they are not making use of the CPU and thus need to be considered differently in the heuristic. Just a count of threads isn't enough. If you have 8 hardware threads and, statistically, your tasks are blocked 50% of the time at any given quantum, you want 16 real threads to keep every hardware thread busy. If they block 75% of the time, you want 32. And so forth.
  • You aren't the only code on the machine. Another process could be happily hogging as many threads as there are hardware threads, in which case your algorithm just got twice as bad (or half as good) as it was originally. This type of global data is hard to come by. (I should note that most machines have more than 2 processes running simultaneously. I currently have 67 processes running with 605 total threads. That's an average of ~9 threads per process. Clearly this is a real concern.)
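The first two bullets can be turned into a small worked sketch. It assumes each thread is independently blocked a fixed fraction of the time, so keeping `hardware_threads` of them runnable on average needs `hardware_threads / (1 - blocked_fraction)` threads; `target_thread_count` and `should_run_inline` are hypothetical helpers, and the sketch deliberately ignores the third bullet (other processes competing for the same hardware).

```python
import math
import os


def target_thread_count(hardware_threads, blocked_fraction):
    """Threads needed so that, on average, `hardware_threads` of them are runnable.

    Assumes each thread is independently blocked `blocked_fraction` of the time.
    """
    if not 0 <= blocked_fraction < 1:
        raise ValueError("blocked_fraction must be in [0, 1)")
    return math.ceil(hardware_threads / (1 - blocked_fraction))


def should_run_inline(running_threads, hardware_threads):
    """Execute a new task synchronously when every hardware thread is already busy."""
    return running_threads >= hardware_threads


hw = os.cpu_count() or 1  # os.cpu_count() may return None
print(target_thread_count(8, 0.5))  # 16
print(should_run_inline(running_threads=hw, hardware_threads=hw))  # True
```

In practice the blocked fraction has to be measured rather than assumed, and the count is capped by how many tasks actually exist.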

Scheduling a task on another thread is costly. Why? For a number of reasons.

  • Because unless you have ample hardware resources to run it, this implies at least one context switch to swap the work in. If it runs longer than a single quantum, it means many more. If you have more than one long-running task competing for the same hardware thread, it means they will continually thrash the thread context in an attempt to make forward progress. As Larry puts it so eloquently, "...Context switches are BAD. They represent CPU time that the application could be spending on working for the customer."
  • And not only that (and perhaps worse), you're going to mess with the cache hierarchy. Your program might be happily working on conflict-free cache-lines, CASing right in the local cache without locking the bus, and then boom: You pass a pointer to an object to another thread (e.g. on the thread-pool), it pulls in the same lines of cache, and then you're both contending for the same lines back and forth. Your good locality goes right out the window and becomes a tax instead of a blessing. This sort of cache thrashing can kill good performance and scaling.
  • Lastly, threads aren't free, you know. Just having one around consumes 1MB of reserved stack space (0.5MB in SQL Server). Same goes for fibers.

Some people are interested in using thread-pools for other purposes. (That is, not performance.) They might want to manage a pool of work items, for example, which get scheduled fairly with respect to each other (in the fine-grained sense). No one task will complete very quickly during saturation, but at least each is guaranteed to move forward. A newly enqueued item won't sit festering in the queue while an older item continues bumbling along towards its goal. And sometimes, priorities must be used to evict lesser priority tasks when a higher priority task gets enqueued. These are all perfect cases where user-mode scheduling makes sense. Co-routines or (*cough, cough*) perhaps fibers could be used. Using threads for this simply adds way too much overhead.
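A fair, priority-aware user-mode scheduler of the kind described above can be sketched with co-routines. This is a minimal Python sketch using generators, assuming tasks yield voluntarily (there is no preemption); `CoopScheduler` and `worker` are hypothetical names. Equal-priority tasks are served round-robin, so a newly enqueued task never waits behind a long-running one, and a higher-priority task runs ahead of lower-priority ones at their next yield point rather than killing them outright.

```python
import heapq
import itertools


class CoopScheduler:
    """Hypothetical user-mode scheduler: round-robin within a priority, highest priority first."""

    def __init__(self):
        self._ready = []                   # heap of (-priority, sequence, generator)
        self._sequence = itertools.count()

    def spawn(self, task, priority=0):
        # The sequence number breaks ties FIFO and keeps generators from being compared.
        heapq.heappush(self._ready, (-priority, next(self._sequence), task))

    def run(self):
        while self._ready:
            neg_priority, _, task = heapq.heappop(self._ready)
            try:
                next(task)                 # run the task until its next yield
            except StopIteration:
                continue                   # finished: drop it
            # Re-queue behind every other ready task of the same priority (fairness).
            heapq.heappush(self._ready, (neg_priority, next(self._sequence), task))


def worker(trace, name, steps):
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield  # voluntarily give up the "CPU" to the next ready task


fair = []
s = CoopScheduler()
s.spawn(worker(fair, "a", 2))
s.spawn(worker(fair, "b", 2))
s.run()
print(fair)    # ['a0', 'b0', 'a1', 'b1']

ranked = []
s = CoopScheduler()
s.spawn(worker(ranked, "low", 2))
s.spawn(worker(ranked, "high", 1), priority=5)
s.run()
print(ranked)  # ['high0', 'low0', 'low1']
```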

Clearly getting this right is difficult. But the consequence of getting it horribly wrong today isn't too bad. (Although really crappy algorithms are noticeable.) When you only have 1-4 hardware threads on the average high-end machine, the difference between a great heuristic and a poor one isn't significant. That will change.

12/1/2005 11:30:04 PM (Pacific Standard Time, UTC-08:00)

 Wednesday, November 30, 2005

Classic.

If you're bored, read any of these papers. I wish I were smart too.

11/30/2005 9:05:31 PM (Pacific Standard Time, UTC-08:00)

 Sunday, November 27, 2005

Each Windows thread has a Thread Environment Block (i.e. TEB) which is a block of user-mode memory pointed at and reserved for use by the Windows kernel Thread data structure (KTHREAD). In addition to basic OS information like the active SEH filter chain, stack base and limit, and owned critical sections, applications can easily stash data into and retrieve data out of the Thread Local Storage (TLS) area of the TEB. This is done using the Win32 TlsAlloc, TlsGetValue, TlsSetValue, and TlsFree functions. You can view the TEB via the kernel debugger's !thread command.

(The CLR of course offers TLS functionality too, i.e. using ThreadStatics and the System.Threading.Thread's AllocateDataSlot, SetData, and GetData functions. This information does go into the TEB, but it is managed by the CLR. A call to SetData does not translate directly to a call to TlsSetValue.)
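The TlsAlloc/TlsSetValue/TlsGetValue model (one process-wide slot index, one value per thread) can be mimicked in Python with `threading.local`. The functions below are hypothetical stand-ins, not Win32 bindings: unset slots read as `None` rather than 0, and there is no TlsFree counterpart.

```python
import itertools
import threading

_per_thread = threading.local()   # every thread gets its own, initially empty, namespace
_slot_ids = itertools.count()     # process-wide index allocator
_alloc_lock = threading.Lock()


def tls_alloc():
    """Reserve a slot index that is valid on every thread (cf. TlsAlloc)."""
    with _alloc_lock:
        return next(_slot_ids)


def tls_set_value(index, value):
    """Store a value in this thread's copy of the slot (cf. TlsSetValue)."""
    if not hasattr(_per_thread, "slots"):
        _per_thread.slots = {}
    _per_thread.slots[index] = value


def tls_get_value(index):
    """Read this thread's copy of the slot; None if never set here (cf. TlsGetValue)."""
    return getattr(_per_thread, "slots", {}).get(index)


slot = tls_alloc()
tls_set_value(slot, "main thread's data")

seen_elsewhere = []
other = threading.Thread(target=lambda: seen_elsewhere.append(tls_get_value(slot)))
other.start()
other.join()

print(tls_get_value(slot))  # main thread's data
print(seen_elsewhere)       # [None]
```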

Win32--and Windows in general--makes liberal use of thread-local memory. I noted a few uses above (e.g. exception handlers) which are pervasive. Such usage creates an implicit affinity between the workload running on the thread and the physical OS thread itself. What do I mean by affinity? Simply that the work executing on a thread must continue executing on that exact physical thread for it to remain correct. This affinity isn't documented consistently nor is it easy to detect. You might be able to weasel around it by chance. But it makes it extraordinarily difficult to transfer logical work from one physical thread to another.

Imagine what would happen if we made a call to some Win32 function and then decided to swap out the logical work so that we could install new work. SetLastError might have been used to communicate a failure in a function called on either the thread the work is being swapped out of, or the destination once it gets rescheduled. But SetLastError installs the error information into the TEB. GetLastError will then either fail to retrieve information or, more likely, will retrieve somebody else's information, either of which would lead to all sorts of serious problems. Similar issues can happen if we (foolishly) tried to swap out a thread that owned a critical section, or some other thread-specific resource (like a mutex).

This is one major reason why fibers are still problematic as a general task scheduling solution for Windows. And it's a challenge if you even want to consider user-mode scheduling a la continuations. You just can't get around the platform's hidden thread affinity. We've done much better in managed code. Over time we are trying to use ExecutionContext as the currency for logical context information, which can be easily captured and restored by the runtime. But there are examples where we violate this (e.g. monitors), where we use the physical OS thread as the context (to be fair, we do notify hosts of such situations via Thread.Begin/EndThreadAffinity).

But you can't escape the fact that the runtime itself is built right on top of Win32.
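The contrast between thread-affine state and logical context that travels with the work can be sketched in Python, under loose assumptions: `threading.local` stands in for the TEB slot that SetLastError writes, and a `contextvars.ContextVar` captured with `copy_context()` stands in for ExecutionContext. `set_last_error`, `read_errors`, and `run_on_fresh_thread` are hypothetical helpers, not Win32 or CLR APIs; the value 5 is Win32's ERROR_ACCESS_DENIED.

```python
import contextvars
import threading

# Thread-affine state: each OS thread sees its own copy (like the TEB's last-error slot).
_tls = threading.local()

# Logical context: flows with the work when explicitly captured (like ExecutionContext).
last_error_ctx = contextvars.ContextVar("last_error", default=0)


def set_last_error(code):
    """Hypothetical: record an error both thread-locally and in the logical context."""
    _tls.last_error = code
    last_error_ctx.set(code)


def read_errors():
    """Return (thread-local value, logical-context value)."""
    return getattr(_tls, "last_error", 0), last_error_ctx.get()


def run_on_fresh_thread(fn, context):
    """Resume 'logical work' on a brand-new OS thread inside a captured context."""
    out = []
    t = threading.Thread(target=lambda: out.append(context.run(fn)))
    t.start()
    t.join()
    return out[0]


set_last_error(5)                        # a failing call records ERROR_ACCESS_DENIED here
captured = contextvars.copy_context()    # roughly ExecutionContext.Capture()

print(read_errors())                                # (5, 5) on the original thread
print(run_on_fresh_thread(read_errors, captured))   # (0, 5): the TLS copy was lost
```

The thread-local error simply does not exist on the new thread, which is exactly the hazard of moving work that relies on GetLastError; only state deliberately placed in the captured context survives the move.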

11/27/2005 9:50:32 PM (Pacific Standard Time, UTC-08:00)

 
