Personal Info:

Joe leads the architecture of an experimental OS's developer platform, where he is also chief architect of its programming language. His current mission is to enable writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe founded the Parallel Extensions to .NET project. He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife, writing books, writing music, studying music theory & mathematics, and doing anything involving food & wine.

Disclaimer:
The contents of this site are my own personal opinions and do not represent my employer's views in any way.

© 2012, Joe Duffy

 
 Tuesday, October 17, 2006

The CLR's approach to monitor acquisition (i.e. Monitor.Enter and Monitor.Exit) during shutdown is very different from native CRITICAL_SECTIONs and mutexes (as described in my last post). In particular, the CLR does not ensure requests to acquire monitors on the shutdown path succeed, preferring instead to cope with the risk of deadlock rather than the risk of broken state invariants.

Managed code is run during orderly shutdowns in two places: the AppDomain.ProcessExit event and inside the Finalize method for all finalizable objects in the heap. (The term "orderly shutdown" is used to distinguish an Environment.Exit from a P/Invoke to kernel32!TerminateProcess, for instance.) Just as with the example described for native code, threads can be suspended while they hold arbitrary locks and have partially mutated state to the point where invariants do not hold any longer. Instead of permitting the shutdown code to observe this state--possibly causing corruption or unhandled exceptions on the finalizer thread--the CLR treats lock acquisitions as it normally does.
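To make the two entry points concrete, here is a minimal sketch of what they look like from user code. The type names (`ScratchFile`, `ShutdownHooks`) and the counter are hypothetical and exist only for illustration; `AppDomain.CurrentDomain.ProcessExit` and the C# finalizer syntax are the real hooks. Whether finalizers actually run at exit depends on the runtime, so treat this as describing the behavior discussed in this post rather than a guarantee.

```csharp
using System;

// Hypothetical type for illustration; the finalizer body is a placeholder.
public sealed class ScratchFile
{
    ~ScratchFile()
    {
        // Runs on the finalizer thread. During an orderly shutdown this may
        // execute while other threads were stopped holding arbitrary locks,
        // so any lock taken here could already be orphaned.
    }
}

// Hypothetical helper that wires up a ProcessExit handler.
public static class ShutdownHooks
{
    private static int handlerRuns;

    // Number of times the handler has executed (for demonstration only).
    public static int HandlerRuns
    {
        get { return handlerRuns; }
    }

    // The handler itself: the second place managed code runs at shutdown.
    public static void OnProcessExit(object sender, EventArgs e)
    {
        handlerRuns++;
        Console.Error.WriteLine("ProcessExit handler running");
    }

    // Subscribes the handler to the current AppDomain's ProcessExit event.
    public static void Register()
    {
        AppDomain.CurrentDomain.ProcessExit += OnProcessExit;
    }
}
```

Calling `ShutdownHooks.Register()` early in a program and then returning from `Main` (or calling `Environment.Exit`) should invoke the handler; killing the process with `TerminateProcess` should not.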

If a lock was orphaned in the process of stopping all running threads, then the shutdown code path will fail to acquire the lock. If these acquisitions are done with non-timeout (or long timeout) acquires, a hang will ensue. To cope with this (and any other sort of hang that might happen), the CLR appoints a watchdog thread to keep an eye on the finalizer thread. Although configurable, by default the CLR will let finalizers run for 2 seconds before becoming impatient; if this timeout is exceeded, the finalizer thread is stopped, and shutdown continues without draining the rest of the finalizer queue.
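The orphaned-lock hang and the watchdog idea can be simulated in user code. This is only a sketch under stated assumptions: `ShutdownWatchdogSketch` and its methods are hypothetical names, and the "watchdog" here merely abandons a stuck worker after a budget, whereas the CLR's real watchdog stops the finalizer thread. `Monitor.TryEnter(object, TimeSpan)` and `Thread.Join(TimeSpan)` are the real APIs doing the work.

```csharp
using System;
using System.Threading;

// Hypothetical demo type: these names are illustrative, not CLR internals.
public static class ShutdownWatchdogSketch
{
    // Simulates an orphaned lock: a background thread takes 'gate' and never
    // releases it, standing in for a thread frozen mid-update at shutdown.
    public static void OrphanLock(object gate)
    {
        ManualResetEvent acquired = new ManualResetEvent(false);
        Thread holder = new Thread(() =>
        {
            lock (gate)
            {
                acquired.Set();
                Thread.Sleep(Timeout.Infinite);
            }
        });
        holder.IsBackground = true;   // must not keep the process alive
        holder.Start();
        acquired.WaitOne();           // return only once the lock is really held
    }

    // Bounded acquire: gives up after 'timeout' instead of hanging forever.
    // Returns true if 'work' ran under the lock, false if the lock was unavailable.
    public static bool TryRunUnderLock(object gate, TimeSpan timeout, Action work)
    {
        if (!Monitor.TryEnter(gate, timeout))
            return false;
        try { work(); }
        finally { Monitor.Exit(gate); }
        return true;
    }

    // Watchdog: runs 'cleanup' on its own thread and waits at most 'budget'.
    // Returns true if cleanup finished in time, false if it was abandoned.
    public static bool RunWithWatchdog(ThreadStart cleanup, TimeSpan budget)
    {
        Thread worker = new Thread(cleanup);
        worker.IsBackground = true;   // an abandoned worker must not block exit
        worker.Start();
        return worker.Join(budget);
    }
}
```

After `OrphanLock(gate)`, a cleanup routine that does a plain `lock (gate)` never completes, so `RunWithWatchdog` reports failure once its budget expires; `TryRunUnderLock` fails fast instead of hanging. In both cases the cleanup work itself is skipped, which is exactly the trade-off described above.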

This is typically not horrible since many finalizers are meant to clean up intra-process state that Windows will clean up automatically anyway. This covers things like file HANDLEs. But it does mean that any additional logic won't be run, like flushing file write-buffers. And for any cross-process state, you're screwed and had better have a fail-safe plan in place, like detecting corrupt machine-wide state and repairing it upon the next program restart. (For what it's worth, DLL_PROCESS_DETACH notifications aren't run in all process exits either, so this really is not any worse than what you have with native code today.)

AppDomain unloads are very different beasts. Any reliability-critical code that will run as part of unload (CERs, critical finalizers, and generally any Cer.Success/Consistency.WillNotCorruptState methods) should strictly only ever acquire locks that are always dealt with in a reliable manner throughout the code-base. That statement is actually a little too strong. In reality, either (1) locks must never be orphaned (aside from process exit) or (2) the associated broken state invariants that will occur (e.g. in the face of asynchronous exceptions) can be tolerated.

Unfortunately, we don't give you access to Monitor.ReliableEnter (the BCL team gets to use it, though, as it's internal to mscorlib), which means almost nobody is equipped to do (1) today. It's impossible to write code that will reliably release a monitor in the face of possible asynchronous thread aborts and out-of-memory exceptions without it. Only a very tiny fraction of the BCL actually deals with locks in such a strictly reliable manner, so as a general rule of thumb very little of it can acquire and release locks while executing reliability-critical code without the risk of deadlock. Hosts will of course use policy to escalate to rude AppDomain unloads in the face of hangs, much like the CLR does by default for process exit.
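The window that makes (1) so hard can be sketched in a few lines. `LockPatterns` and its method names are hypothetical. `ClassicEnter` is the pattern available through the public API discussed here. `FlaggedEnter` uses the `Monitor.Enter(object, ref bool)` overload, which, as far as I know, only became public in .NET Framework 4, well after this post; it is shown as an assumption about what a flag-based reliable acquire looks like, not as the internal ReliableEnter implementation.

```csharp
using System;
using System.Threading;

// Hypothetical helper type contrasting two acquire/release patterns.
public static class LockPatterns
{
    // Classic pattern: acquire outside the try block.
    public static void ClassicEnter(object gate, Action body)
    {
        Monitor.Enter(gate);
        // Window: an asynchronous exception delivered here, after the
        // acquire but before the try block is entered, skips the finally
        // and orphans the lock.
        try { body(); }
        finally { Monitor.Exit(gate); }
    }

    // Flag-based pattern: the acquire happens inside the try block and the
    // flag records whether the lock was really taken.
    public static void FlaggedEnter(object gate, Action body)
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(gate, ref lockTaken);
            body();
        }
        finally
        {
            // Release only if the acquire actually succeeded.
            if (lockTaken) Monitor.Exit(gate);
        }
    }
}
```

Even the flag-based version only guarantees that the monitor is released. It does nothing about state that `body` left half-mutated when the exception hit, which is why option (2), tolerating broken invariants, still has to be considered separately.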

(Note: Thanks to Jan Kotas--an SDE on the CLR team--for noticing that I confused AppDomain unloads with process exit in my last post, in addition to pointing out that appearances are deceiving: the multi-threaded CRT can actually suffer from the sort of shutdown problems outlined in the last post.)

 
