Personal Info:
Joe leads the architecture of an experimental OS's developer platform, where
he is also chief architect of its programming language. His current mission is to enable
writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe
founded the Parallel Extensions to .NET project.
He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife,
writing books, writing music,
studying music theory & mathematics, and doing anything involving food & wine.
Disclaimer:
The contents of this site are my own personal opinions and do
not represent my employer's view in any way.
© 2012, Joe Duffy
 Saturday, October 13, 2007
Charles from Channel9 recorded a conversation with Anders and me a couple weeks back. The topic? Concurrency. More specifically, Parallel FX (PFX):
Programming in the Age of Concurrency: Concurrent Programming with PFX
Microsoft is developing a number of technologies to simplify the expression of parallelism in code. An example of this work is Parallel Extensions for the .NET Framework (PFX), a managed programming model for data parallelism, task parallelism, scheduling, and coordination on parallel hardware.
PFX makes it easier for developers to write programs that take advantage of parallel hardware (you've all heard of multi-core and what the future holds with many-core...), without having to deal with the complexities of threads and locks in today’s concurrent programming story.
We don't go too deep, but you can bet we'll be doing more of these things as the technology matures and gets closer to general availability. Enjoy!
(Note: you may also be interested in Stephen Toub's PFX interview with Scott Hanselman, available here.)
 Tuesday, October 09, 2007
When COM came onto the scene in the early 90’s, Symmetric Multiprocessor (SMP) architectures had just been introduced into the higher end of the computer market. “Higher end” in this context basically meant server-side computing, a market in which the increase in compute power promised increased throughput for heavily loaded backend systems. Parallelism per se (that is, breaking apart large problems into smaller ones so that multiple processors can hack away to solve the larger problem more quickly) was still limited to specialty computing, such as the high-end supercomputing and high-performance computing communities. The economic incentive for Windows programmers to use multithreading, therefore, was limited mostly to servers. Heck, parallelism is still pretty much limited to those domains, but the economic incentives are clearly in the midst of a fundamental change.
As is already well established, server-side computing is highly parallel for several reasons. The most obvious is the steady stream of work a server farm usually enjoys, meaning there is seldom a shortage of compute work to do. Even if work is IO bound, there’s typically at least some work that could use a CPU waiting in the arrival queue to overlap execution with. Moreover, server workloads are usually isolated except for some select and small amount of application-wide state. Each user has his own account, order history, bank transaction information, etc., and therefore the interaction between sessions can be carefully controlled and nearly non-existent, leading (once again) to a good cost/benefit tradeoff, due to the large scalability wins.
Human productivity has always been markedly more important than other software features, like performance, reliability, and security, unless the domains in which programs are being developed require an intense focus on certain attributes. I’m sure the DOA prioritizes security far above productivity, but the same isn’t true of most of the industry. This was true back in the COM days, and is still true to this day (perhaps more so). So it’s safe to conclude that the designers of COM had “ease of development” at the forefront of their minds when creating it. That, coupled with the kind of multithreading in use back then on Windows machines (servers), which put an emphasis on lack of sharing, led to the development of the Single Threaded Apartment (STA) model. Related to this was COM+’s addition of explicit synchronization contexts, which took the STA auto-synchronization idea and generalized it to make synchronization policies more customizable.
These features made synchronization much simpler. Synchronization is an often-impossible task, and one that is less important to be precise about when isolation is pervasive. Instead of having to test a million different machine configurations, various difficult-to-predict-ahead-of-time component interactions, and so on, a component got the STA stamp and was guaranteed safe in a multithreaded environment. The alternative then is the alternative today: go free-threaded (MTA or NTA) and deal with all of the nasty synchronization problems that arise “the old fashioned way.” In other words, use locks and events, and run the risk of race conditions, deadlocks, and various other latent bugs that would ruin the composability and reliability of any less-than-bulletproof component. Sadly, “the old fashioned way” is still “the state of the art” until we build a better mousetrap.
Now, the STA’s gotten a really bad rap over the years. (I’ll ignore synchronization contexts for the time being but just about everything I say applies to them too.) It’s true that STAs cause us a lot of problems when thinking about legacy compatibility, and will make it just that much more difficult to migrate legacy Windows apps over to a massively parallel world, but I’m going to stick my neck out and make a claim that won’t win me friends (and in fact might lose me some): STAs aren’t entirely evil, and are an interesting idea that we as a community can learn a lot from. What’s more, we have years of experience using them. I see a lot of people basically reinventing the STA model, often without realizing it due either to a lack of understanding of (or interest in) COM or simply a lack of pattern matching abilities. “History will repeat itself, because nobody was listening the first time.”
Automatic synchronization is now the holy grail of the new multicore era. STM is another attempt at that. Active objects, which have shown up in numerous places, are another attempt, and one more closely related to the STA. Yet another closely related technology is message passing in general, where isolated domains of control do not share state and instead communicate via disconnected message passing. All strive to attain similar goals, namely improved developer productivity and safety, usually with some performance or scaling overhead. The biggest difference, from my perspective, is that design priorities are now different due to the environment at the time these things are being created. It’s clear today that any automatic synchronization technology we invent should scale to hundreds (perhaps thousands) of processors, not one or two (or, at the extreme, eight), that fine-grained parallelism will become more and more important, and that the degree of sharing will be high, whether that means logically (by message exchange) or physically (in the most literal shared memory sense).
Clearly the worst aspect of COM STAs is that they are obviously not up for the task of scaling like this at such a fine-grain, because a single thread is responsible for executing all code for some particular set of objects in the process. It's just plain impossible to parallelize finer than the granularity of a single component, and it's common to glump many components together into one apartment which is worse. As the number of available processors grows, and/or the number of objects instantiated inside a particular STA which need to interact, scalability suffers. Sadly we’ve inherited huge hunks of code that have been written in this fashion, with all of the assumptions about the multithreading environment in which the components will be deployed as immutable laws.
But there are good things about COM STAs! They are brain-dead simple in the most common cases. Synchronization doesn’t take nearly as much brainpower and development time away from the component creation process, improving developer productivity and the robustness of the software written. So long as your STA component never blocks or performs a cross-apartment invocation, life remains very simple. This is an example of a leaky abstraction, however, because it’s not always evident to the programmer when this chasm has been crossed. Proxies do attempt to hide the gunk of crossing the chasm, though at the risk of introducing reentrancy, which itself comes with a lot of baggage. I’d like to stop and point out something at this point, perhaps helping to support the “reinventing the wheel” claim earlier. Active objects and message passing systems generally suffer from similar problems. If one object uses another (by enqueueing a message) and then, at some point, waits for a response message to arrive, there is the risk that the thread which is now blocked will need to itself respond to a message coming from another object. Ahh, the classic reentrancy versus deadlock tradeoff. Event-driven, stackless systems like the Concurrency and Coordination Runtime (CCR), etc., mitigate this problem but require a fundamentally different way of programming. UI programmers are generally more comfortable with this approach. And linear types a la Singularity's exchange heap also offer a promising way to enable concurrency while safely guaranteeing that certain state will not be shared.
In the end, COM STAs are still an invention I wish we could do away with. I think of the technology a bit like a cheap, half-way imitation of Hoare's CSPs. But at the same time, I fear we as an industry will continue to reinvent them, just under a different guise or with subtly different nuances. We need to resist the urge to pretend they don't exist just because they contain the letters C, O, and M and because the sound of STA is known to trigger feelings of intense nausea. What’s scary to me is that, STM aside, there doesn’t seem to be any super-promising alternative to the automatic synchronization problem for shared memory, aside from provable declarative and functional safety. As I’ve noted above, true fine-grained message passing has a lot of similar issues, but I do wonder at the end of the day if Joe Armstrong has been right all along. (Well, Tony Hoare really deserves the credit, and perhaps David May too, but Erlang is en vogue currently.) Time will tell.
 Saturday, September 15, 2007
Two articles about ParallelFX (PFX) are in the October issue of MSDN magazine and have been posted online:
- Parallel LINQ: Running Queries on Multi-Core Processors. An overview of an implementation of LINQ-to-Objects and -XML which automagically uses data parallelism internally to execute declarative language queries. It supports the full set of LINQ operators, and several ways of consuming output in parallel.
- Parallel Performance: Optimize Managed Code for Multi-Core Machines. Describes the Task Parallel Library (TPL), a new "thread pool on steroids" with cancelation, waiting, and pool isolation support, among many other things. Uses dynamic work stealing techniques (see here and here) for superior scalability.
As noted in the article, there's a PFX CTP planned for 2007*. Watch my blog for more details when it's available.
*Note: some might wonder why we released the articles before the CTP was actually online. When we originally put the articles in the magazine's pipeline, our intent was that they would in fact line up. And both were meant to align with PDC'07. But when PDC was canceled, we also delayed our CTP so that we had more time to make progress on things that would have otherwise been cut. It's less than ideal, but I'm still confident this was the right choice.
 Wednesday, September 12, 2007
Lock recursion is usually a bad idea. It can seem convenient (at first), but once the slippery slope of making calls from critical regions into complex ecosystems of code is embarked upon (which is usually a necessary pre-requisite to lock recursion, except for some relatively simple cases), it’s easy to accidentally fall right off the edge. This topic was part of the doc I wrote previously about using concurrency inside of reusable libraries. My opinions haven't changed much since then.
Lock recursion coupled with condition variables is even worse. In fact, its behavior might surprise you.
To motivate this, would you ever think of writing code that does something like this?
    void BreakAtomicity()
    {
        // … I assume somebody called me with a recursive lock on ‘obj’ …
        Monitor.Exit(obj);
        Monitor.Exit(obj);
        // … Do something …
        Monitor.Enter(obj);
        Monitor.Enter(obj);
    }
I should certainly hope not! Unless you're crazy or reeeeally know what you're doing. Who knows what state invariants are busted at the time the call to BreakAtomicity was made? Releasing the lock in this manner foists these ticking timebombs onto the other threads in the process that might want to inspect the shared state. If you, the author of BreakAtomicity, have omniscient knowledge of the entire program, perhaps you know precisely. But, particularly in the case of recursion, where it's all-too-common to engage in practices of sloppy composition, this is actually quite unlikely. Lock recursion is typically used for convenience, not because of a really solid design that is based on clean algorithmic recursion.
What does this example have to do with condition variables anyway? Glad you asked! It matters because of what happens when you wait on a monitor that has been recursively acquired. In such cases, Monitor.Wait will release all recursive acquires as part of its waiting. I.e. if it has been acquired 10 times, it is released 10 times before waiting. It does this, of course, because otherwise it would deadlock waiting for some other thread to make a call to Monitor.Pulse/PulseAll (since a separate thread needs to first acquire the lock in order to do either). This is symmetric, so once the thread has been awoken, it will reacquire the lock as many times as needed before returning to attain the same level of recursion that existed prior to the call.
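The release-all behavior can be observed directly. The following is a minimal sketch, not code from the original post; the class and field names (RecursiveWaitDemo, s_waiting, s_ready) are made up for illustration. One thread acquires the same monitor twice and then waits. The second thread can only see s_waiting set to true under the lock if Wait gave up both acquires; if only one had been released, the loop below would block forever.

```csharp
using System;
using System.Threading;

public static class RecursiveWaitDemo
{
    private static readonly object s_lock = new object();
    private static bool s_waiting; // set by the waiter just before it waits
    private static bool s_ready;   // the condition the waiter is waiting for

    // Returns true once the calling thread has entered the lock while the
    // waiter thread was blocked in Monitor.Wait with a recursion count of 2.
    public static bool Run()
    {
        s_waiting = false;
        s_ready = false;
        bool observed = false;

        var waiter = new Thread(() =>
        {
            lock (s_lock)
            {
                lock (s_lock) // recursive acquire: the count is now 2
                {
                    s_waiting = true;
                    while (!s_ready)
                        Monitor.Wait(s_lock); // releases both acquires, reacquires both on wake-up
                }
            }
        });
        waiter.Start();

        while (!observed)
        {
            lock (s_lock) // only succeeds if the waiter holds zero acquires
            {
                if (s_waiting)
                {
                    observed = true;
                    s_ready = true;
                    Monitor.PulseAll(s_lock);
                }
            }
            if (!observed)
                Thread.Sleep(1); // the waiter has not reached Wait yet; retry
        }

        waiter.Join();
        return observed;
    }
}
```

Note that the outer critical region in the waiter had its atomicity broken too, even though the Wait call sits lexically inside the inner one.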
Now, Monitor.Wait breaks atomicity anyway. This is obvious. It releases and reacquires the lock internally, and so any conditions regarding shared state that exist prior to the call cannot be assumed to exist after the call returns. (Most) people understand this and tend to use Wait in fairly common and safe patterns, such as guarded regions where some predicate is checked for validity at the very front of a critical region before doing anything interesting with state. But the really nasty thing about recursive locks and the Wait behavior described above is that this breaks atomicity for some unknown number of nested critical regions that have existed for some unknown amount of time leading up to the Wait. This is a recipe for pain. My recommendation is probably predictable: just follow the broadly accepted advice that, because lock recursion is evil to begin with, it is best avoided, and you will safely avoid the more complicated case outlined above.
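For reference, the guarded region pattern mentioned above looks roughly like the sketch below. SimpleBlockingQueue<T> is a hypothetical type written for this illustration, not something from the .NET Framework. The predicate is re-checked in a loop at the front of the critical region, the lock is never acquired recursively, and no state is touched until the predicate holds.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class SimpleBlockingQueue<T>
{
    private readonly object m_lock = new object();
    private readonly Queue<T> m_items = new Queue<T>();

    public void Enqueue(T item)
    {
        lock (m_lock)
        {
            m_items.Enqueue(item);
            Monitor.Pulse(m_lock); // wake one waiting consumer, if any
        }
    }

    public T Dequeue()
    {
        lock (m_lock)
        {
            // Guard: Wait releases and reacquires the lock, so the predicate
            // must be re-evaluated every time the thread wakes up.
            while (m_items.Count == 0)
                Monitor.Wait(m_lock);

            return m_items.Dequeue();
        }
    }
}
```

Because the Wait is the first thing that happens inside the region, there are no half-updated invariants for other threads to stumble over while the lock is released.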
It’s worth pointing out that the new CONDITION_VARIABLE in Vista, i.e. SleepConditionVariableCS and SleepConditionVariableSRW, only releases the lock once, regardless of the recursive acquire count. (SRWL doesn’t officially support recursion, although it works for shared locks since there is no affinity used and it is undetectable.) Deadlocks result instead. From an editorial perspective, I prefer this behavior quite a bit, since it’s easier to debug. (Admittedly, if Monitor’s behavior is what you want, it’s less than straightforward to achieve, unless you know the recursion counter somehow. Although, I will also note that I am convinced very few real people would want Monitor’s current behavior...) My preferred solution to this would have been to throw an exception, since I do think issuing a Wait when locks have been recursively acquired is in most cases a bug. As a workaround, we could have exposed a RecursionCount on Monitor so that a developer could manually exit the lock RecursionCount-1 times before the call to Wait and then reacquire it RecursionCount-1 times after the call returns. (Actually no -- I would have made Monitor non-recursive by default, like the new ReaderWriterLockSlim.) Sadly, I guess I'm only about 10 years too late...
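Since Monitor exposes no recursion count, the throw-on-recursive-Wait behavior can only be approximated with a wrapper that tracks the count itself. The sketch below is one possible shape, under stated assumptions: CountingMonitor and all of its members are hypothetical names, and it relies on Monitor.IsEntered, which only arrived in .NET Framework 4.5, well after this post was written.

```csharp
using System;
using System.Threading;

public sealed class CountingMonitor
{
    private readonly object m_lock = new object();
    private int m_recursion; // only read or written while holding m_lock

    public void Enter()
    {
        Monitor.Enter(m_lock);
        m_recursion++;
    }

    public void Exit()
    {
        if (!Monitor.IsEntered(m_lock))
            throw new SynchronizationLockException();
        m_recursion--;
        Monitor.Exit(m_lock);
    }

    // Treats a Wait under a recursively acquired lock as a bug and throws,
    // instead of silently releasing every acquire.
    public void Wait()
    {
        if (!Monitor.IsEntered(m_lock))
            throw new SynchronizationLockException();
        if (m_recursion != 1)
            throw new InvalidOperationException(
                "Wait called with recursion count " + m_recursion);

        m_recursion = 0; // other threads own the count while we wait
        try
        {
            Monitor.Wait(m_lock);
        }
        finally
        {
            m_recursion = 1; // Monitor.Wait reacquires the lock before returning
        }
    }

    public void PulseAll()
    {
        Monitor.PulseAll(m_lock);
    }
}
```

The obvious cost is that every caller has to go through the wrapper; code that takes the underlying object's lock directly would bypass the count.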
 Wednesday, August 22, 2007
Most managed code in the .NET Framework has not been hardened against asynchronous exceptions. This includes out of memory (OOM) conditions and asynchronous thread aborts, and is entirely by design. Hardening against OOM, for example, is historically an extraordinarily difficult feat, and few systems undertake the development and QA costs needed to do so. (FWIW, the CLR VM is one such system.) Simply failing gracefully is usually hard enough. Failing gracefully is admittedly leaps and bounds easier in managed code because allocation failures are communicated via exceptions rather than return values, and are thus transitively propagated “by default.” Thread aborts are even more difficult to harden against, however, because they can originate at any instruction (with a handful of exceptions). Ensuring data invariants are protected for every single instruction is clearly just a little difficult.
These things are certainly not impossible. With enough effort, you can make inroads toward solutions for both issues. Portions of the .NET Framework have gone to such lengths. For example, code that manipulates process-wide state spanning AppDomains needs to ensure that this state is not corrupted by an unfortunately placed thread abort when run inside systems like SQL Server that use aborts to tear down boundaries of isolation. While possible, the important thing to understand here is that most of the .NET Framework is in fact not resilient to these things. See this doc as an example of guidance the CLR team provided to other developers inside of Microsoft to this effect. OOMs are in a similar category, though many subsystems take different, inconsistent approaches to memory allocation failures (e.g. WPF takes a different stance than WCF).
All of this is a long-winded build up to the following problem: thread interrupts are just about as evil as these sorts of asynchronous exceptions. The failure injection points are more constrained (an OOM can occur wherever an allocation occurs, a thread abort can happen in between nearly any two instructions, and thread interruptions can only occur at blocking calls that transition the managed thread into the state WaitSleepJoin), but this doesn’t change the fact that most code is unprepared to deal properly with such interruptions. Once again, it’s not that managed code cannot be constructed to be resilient to interruptions (in fact, it’s much easier than for OOMs and thread aborts); it’s simply that the .NET Framework hasn’t been constructed to tolerate arbitrary interruptions. If threads are calling into these APIs and thread interruptions are provoked, state corruption, memory leaks, and possible deadlocks can be left in the wake.
To take a brief example of where such a problem might crop up, imagine a thread has blocked on FileStream.EndRead because it is finishing some asynchronous IO operation. After a brief inspection of the code, I’m convinced interrupting the call it makes to WaitHandle.WaitOne internally will lead to a memory leak:
    if (1 == Interlocked.CompareExchange(ref result._EndXxxCalled, 1, 0))
    {
        __Error.EndReadCalledTwice();
    }
    WaitHandle handle = result._waitHandle;
    if (handle != null)
    {
        try
        {
            handle.WaitOne();
        }
        finally
        {
            handle.Close();
        }
    }
    NativeOverlapped* nativeOverlappedPtr = result._overlapped;
    if (nativeOverlappedPtr != null)
    {
        Overlapped.Free(nativeOverlappedPtr);
    }
The method ensures only one call to EndRead can occur, and will throw on subsequent attempts. So the above code will only ever run once. Sadly, EndRead needs to free the NativeOverlapped structure used internally for asynchronous IO completion. But because the call to Overlapped.Free follows the call to WaitOne, and doesn’t occur inside of a finally block, it won’t execute. In summary: interrupt that call to WaitOne, and boom, we leak a NativeOverlapped object. Whether or not this is disastrous of course depends on the precise scenario. A few bytes here and a few bytes there can quickly add up, particularly for long running programs. At least this particular example protects invariants sufficiently well to avoid state corruption that would lead to further unpredictability. But recall that this is just one example. In my experience, the BCL represents some of the most carefully written code in the .NET Framework, so this problem is undoubtedly scattered about all over the place.
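The same shape of bug is easy to reproduce in miniature. The sketch below is illustrative only: InterruptLeakDemo and CleanupRan are made-up names, and the "cleanup" is just a flag standing in for something like the Overlapped.Free call. Cleanup that merely follows a blocking call is skipped when the thread is interrupted, while cleanup inside a finally block still runs.

```csharp
using System;
using System.Threading;

public static class InterruptLeakDemo
{
    // Blocks a thread on an event that is never set, interrupts it, and
    // reports whether the cleanup step ran.
    public static bool CleanupRan(bool useFinally)
    {
        bool cleanedUp = false;
        var neverSet = new ManualResetEvent(false);

        var t = new Thread(() =>
        {
            try
            {
                if (useFinally)
                {
                    try { neverSet.WaitOne(); }
                    finally { cleanedUp = true; } // runs even when WaitOne throws
                }
                else
                {
                    neverSet.WaitOne();
                    cleanedUp = true; // never reached: WaitOne throws first
                }
            }
            catch (ThreadInterruptedException)
            {
                // Swallowed so the demo thread exits quietly.
            }
        });

        t.Start();
        t.Interrupt(); // takes effect at the blocking call, even if issued early
        t.Join();
        neverSet.Close();
        return cleanedUp;
    }
}
```

Calling CleanupRan(false) returns false and CleanupRan(true) returns true. Wrapping every blocking call in a library this way is of course exactly the hardening work that most code has not done.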
Unfortunately, it’s become somewhat common advice that using thread interruption as a synchronization and control mechanism is a GoodThing™. Andrew Birrell, a researcher from Microsoft Research, for example, suggested this in his paper “An Introduction to Programming with C# Threads”:
“Interrupts are most useful when you don’t know exactly what is going on. For example, the target thread might be blocked in any of several packages, or within a single package it might be waiting on any of several objects. In these cases an interrupt is certainly the best solution. Even when other alternatives are available, it might be best to use interrupts just because they are a single unified scheme for provoking thread termination.” (p33)
While I am sure this advice is well intentioned, it is extremely dangerous for the subtle reasons outlined above and can lead to reliability problems in any programs that follow it. My recommendation is to build this kind of higher level synchronization into the code that you actually own, and handle shutdown and interruption logic yourself. This is a bit cumbersome and is more work, but it also ensures that arbitrary blocking points in the libraries you use will not be affected by interruptions.
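One way to handle shutdown in code you own is to make the stop request an ordinary event that the thread waits on alongside its real work, so nothing is ever injected into library code. The following is a rough sketch under that assumption; StoppableWorker and its members are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class StoppableWorker
{
    private readonly Queue<Action> m_work = new Queue<Action>();
    private readonly AutoResetEvent m_workAvailable = new AutoResetEvent(false);
    private readonly ManualResetEvent m_stopRequested = new ManualResetEvent(false);
    private readonly Thread m_thread;

    public StoppableWorker()
    {
        m_thread = new Thread(Run);
        m_thread.Start();
    }

    public bool IsRunning
    {
        get { return m_thread.IsAlive; }
    }

    public void Enqueue(Action item)
    {
        lock (m_work) m_work.Enqueue(item);
        m_workAvailable.Set();
    }

    // Requests shutdown and waits for the worker thread to exit.
    // Items that were queued but not yet started may be dropped.
    public void Stop()
    {
        m_stopRequested.Set();
        m_thread.Join();
    }

    private void Run()
    {
        WaitHandle[] handles = { m_stopRequested, m_workAvailable };

        // Index 0 is the stop event; it is the only place this thread
        // blocks, so shutdown is observed at a point of our choosing.
        while (WaitHandle.WaitAny(handles) != 0)
        {
            while (true)
            {
                Action item;
                lock (m_work)
                {
                    if (m_work.Count == 0) break;
                    item = m_work.Dequeue();
                }
                item(); // run outside the lock
            }
        }
    }
}
```

The work items themselves can still block inside library calls, but those calls run to completion; the worker only notices the stop request between items.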
With the increase in hardware parallelism over the coming years, I worry that the use of interruptions will become more widespread as a popular technique developers use to control threads. And as more and more of the .NET Framework uses higher degrees of concurrency, necessarily requiring more internal synchronization, the number of blocking points that are vulnerable to this kind of abuse will grow accordingly. So, please, do your part… avoid Thread.Interrupt like the plague. In fact, perhaps we should deprecate it.
 Monday, July 30, 2007
My book, Concurrent Programming on Windows, is shaping up quite nicely. (Given that I've been working on it for over a year now, I suppose it had better be!) I’ve been surprised at the amazing level of anticipation and excitement from blog readers, coworkers, and Microsoft customers, and I really can’t wait for it to be finished. Thanks for the patience so far.
I feel like I’m almost on the home stretch. End of September is my current target for completion. It’s looking like it’ll contain 18 chapters, with 3 appendices, and will have a total running length of somewhere around 700 pages. The reasons it has taken so long are numerous, but the primary reason is that the content is quite deep and detail-oriented—more than I expected at the start—and I’ve wanted to take the time to get it just right rather than cut corners. My editors recently gave me feedback that there will be very little developmental editing required, since I’m (ahem) very, uhh, meticulous when it comes to writing. And feedback from technical reviewers has been very positive as well. I think both are good news.
I’m confident the end product will be worth the wait.
In the meantime, it seems that some of the abstractions I’ve built while writing the book will likely become part of a future release of the .NET Framework. Keep an eye out on Channel9 for some additional details in a few months’ time. Right now is a super exciting time to be in the field of computer science, that’s for sure... Laissez les bons temps rouler!
 Saturday, July 21, 2007
We're hiring developers and testers for the Parallel Computing team in Microsoft's Developer Division.
Update: Here are links to the jobs on microsoft.com:
Our team works regularly with MSR and the CTO, as well as other Developer Division teams like CLR, VS, C#, VB, etc.
If you want to help define and build the next generation of concurrency support in C# and the .NET Framework, this is your chance.
If you want to work closely with supersmart folks like Anders, Burton, and David, wait no longer.
Send me an email at joedu at microsoft dot com if you are interested.
 Friday, July 20, 2007
Whether or not it’s possible for an object to be published before it has been fully constructed is perhaps the most common .NET memory model-related question that arises time and time again. In fact, there was a discussion this week on an internal .NET alias, and another a couple weeks ago in the Joel on Software forums.
The basic question is: Can one thread read a pointer to an object whose constructor has not finished running on a separate thread?
This pattern pops up quite a bit in lazy initialization scenarios, for instance. For example, given some class C:

    class C
    {
        public int f;
        public static C s_c;
        public C() { f = 55; }
    }
And some code that lazily initializes and then uses the object:
    if (s_c == null)
        s_c = new C();
    Console.WriteLine(s_c.f);
Specifically, is it possible in this case to write 0 (or garbage) to the console, instead of 55?
(Note that related examples, like the Joel on Software thread, use separate initialization routines or steps before publishing the pointer. It boils down to the same issue.)
How could observing any ‘f’ value other than 55 possibly happen anyway, you might wonder? Well, since some processors are free to execute certain instructions out-of-order, the write of the return value of ‘new C()’ could theoretically retire before the write to that instance’s ‘f’ field. This isn’t an issue on X86, since the processor memory model doesn’t permit it, but architectures like IA64 do permit such reordering. Moreover, some compilers might decide to reorder writes; in this example, if the constructor were to be inlined, the compiler could subsequently use code motion to delay the write to the field.
(Note: obviously the constructor could publish a reference to 'this' before it has finished. In this case, clearly other threads could then access the instance before it was fully constructed.)
On .NET, the answer is no, this kind of code motion and processor reordering is not legal. This specific example was a primary motivation for the strengthening changes we made to the .NET Framework 2.0’s implemented memory model in the CLR. Writes always retire in-order. To forbid out-of-order writes, the CLR’s JIT compiler emits the proper instructions on appropriate architectures (i.e. in this case, ensuring all writes on IA64 are store/release). Although reads can retire out-of-order, the data dependence on the pointer value being published prevents subsequent reads of fields from happening before the read of the pointer itself. So thankfully this simply cannot happen.
A lot of .NET code out there, including code in the Framework itself, would have suddenly been open to reordering bugs when the CLR 2.0 shipped with IA64 support had we not made this decision. We decided to sacrifice some performance on one particular architecture (and possibly subsequent ones) to ensure these tricky races didn’t bite people unexpectedly, and to avoid a costly audit of the entire Framework.
Lastly, I will note a couple things. First, this strength is not specified in ECMA, so other implementations of the CLI do not provide such guarantees. (I hope one day we decide to standardize the stronger model.) I don’t know what Mono implements, but it may be weaker. Second, the Java Memory Model does not prohibit such publication reorderings, unless the assignments are to a ‘final’ field. So I’m sure people who are familiar with the JMM will assume this pattern is broken on .NET and use locks and/or explicit memory barriers instead. This approach is more conservative and still leads to correct code, however, so it really matters very little for most code.
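For completeness, here is roughly what the conservative version looks like. This is a sketch rather than code from the post: it reuses the class C from above, but makes s_c private and volatile and adds a hypothetical s_initLock field and Instance property. It also guarantees that only one instance is ever constructed, which the simpler pattern does not.

```csharp
using System;

public sealed class C
{
    public int f;

    // volatile gives the publishing write release semantics and the read
    // acquire semantics, independent of the CLR 2.0 guarantee.
    private static volatile C s_c;
    private static readonly object s_initLock = new object();

    public C() { f = 55; }

    public static C Instance
    {
        get
        {
            C c = s_c;
            if (c == null)
            {
                lock (s_initLock)
                {
                    c = s_c; // re-check under the lock
                    if (c == null)
                    {
                        c = new C();
                        s_c = c; // publish only after construction completes
                    }
                }
            }
            return c;
        }
    }
}
```

Usage is simply Console.WriteLine(C.Instance.f), at the cost of a lock acquisition on the first access and a volatile read on every access.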
 Sunday, June 24, 2007
In response to a previous post, a reader said
“I was under the impression that monitors were implemented in .NET in a multiplexed way, so that events are only allocated to an object while there is contention - and that they aren't "sticky", becoming permanently attached to the object.”
This is absolutely correct. My nulling out of the object reference in the example only has the slight advantage of promoting the object’s collection sooner, but it does not have the effect of speeding up the reclamation of the internally managed monitor state. My original posting erroneously said that it would.
Let’s take a quick step back, and see exactly what this means.
Monitors are comprised of two capabilities: critical regions (i.e. Monitor.Enter and Exit), to achieve mutual exclusion, and condition variables (i.e. Monitor.Wait, Pulse, and PulseAll), to coordinate between threads. Any CLR object can be used as a monitor.
For the critical region case, the CLR uses an efficient thin lock which simply embeds locking information as a bit pattern inside the object’s header word. Other parts of the system also try to use the header, e.g. when caching an object’s default hash-code, COM interop, etc. There are limits to what can be stored in the header, so use of any two of these things simultaneously causes inflation, meaning the object header’s contents become an index into a table of sync blocks. Sync blocks are just little data structures capable of holding all of that state simultaneously. The CLR manages a system-wide table of them and recycles and reuses them as objects need them. Another event that causes inflation is the first occasion on which a thread tries to enter the critical region while another thread holds it (i.e. contention).
When contention arises, the CLR will spin briefly before truly waiting, but it may eventually need to allocate a Windows kernel event object. This is an auto-reset (synchronization) event, and a handle to it gets stored on the sync block. Waiting threads just wait on it, and threads exiting the critical region will set it (if the wait count is non-zero). Note that this leads to unfair behavior, because threads can steal the critical region between the signal and the wake-up, but helps to prevent convoys.
Condition variables are implemented slightly differently. Each CLR thread object has a single event object dedicated to it. The first time a thread calls Wait on a condition variable, the event is lazily allocated. And then the thread simply places its own thread-local event into a list of events associated with the monitor. Registering the event also requires inflation to a sync block, if it hasn’t happened already, because obviously the event list can’t be stored in the object header. When a Pulse happens, the pulsing thread just signals the first event in the list. Waiting and pulsing is thus actually somewhat fair, but there are other races that can eliminate this that I won't get into. When a PulseAll occurs, the pulsing thread walks the whole list and signals each.
So now back to the question: when are sync blocks reclaimed?
When a GC is triggered, objects in the reachability traversal may have their sync blocks reclaimed, even if the object in question is still alive, and made available again in the system-wide pool of reusable sync blocks. This reclamation can happen so long as the sync block isn’t needed permanently (as would be the case if COM interop information was stored inside of it) and the sync block isn’t marked precious. A sync block is precious anytime there is a thread inside of the object’s critical region, when a thread is waiting to enter the critical region, or when at least one thread has registered its event into the associated condition variable list. Notice that orphaning monitors can thus lead to leaking events, because they will remain precious, unless the monitor object itself becomes unreachable. When a sync block is reclaimed in this fashion, certain reusable state is kept, like the critical region event object, so that the next monitor to use the sync block can reuse it.
 Saturday, June 09, 2007
Windows Vista has a new one-time initialization feature, which I’m pretty envious of, being someone who writes most of his code in C# and answers countless questions about double-checked locking in the CLR. Rather than sprinkling double-checked locking all over your code base, along with the everlasting worry in the back of your mind that you’ve gotten the synchronization wrong, it's a better idea to consolidate it into one place.
That’s the purpose of the LazyInit<T> and LazyInitOnlyOnce<T> structs below. Both let you specify an “initialization” routine (as a delegate) which gets invoked at the appropriate time to lazily initialize the state. The only difference between the two is that LazyInit<T> might invoke your delegate more than once, due to races, but it will ensure only one value “wins”. LazyInitOnlyOnce<T> does the extra work to ensure the initialization routine only gets called once, though at a slightly higher cost: we might need to block a thread, which means allocating a Win32 event.
Why the two? I had originally written this as a single type, with a Boolean specified at construction time to pick one behavior over the other, but this required an extra object field, which LazyInit<T> never used, along with a Boolean field. I defined both as structs to make them super lightweight to use, and getting rid of the extra two fields seemed worth the baggage of an extra type, given that such a type could end up used very pervasively throughout a large code-base. As it stands, LazyInit<T> is just the size of a pointer plus the size of T. LazyInitOnlyOnce<T> adds one additional pointer to that.
To start with, both use the same Initializer<T> delegate:
public delegate T Initializer<T>();
And here’s LazyInit<T>, the simpler of the two:
// Requires: using System; using System.Threading;
public struct LazyInit<T> where T : class
{
    private Initializer<T> m_init;
    private T m_value;

    public LazyInit(Initializer<T> init)
    {
        m_init = init;
        m_value = null;
    }

    public T Value
    {
        get
        {
            if (m_value == null)
            {
                // May run on several threads at once; only one result is published.
                T newValue = m_init();
                if (Interlocked.CompareExchange(ref m_value, newValue, null) != null &&
                    newValue is IDisposable)
                {
                    // We lost the race, so get rid of our unused value.
                    ((IDisposable)newValue).Dispose();
                }
            }

            return m_value;
        }
    }
}
Note that T is constrained to a reference type, so that we can use a null check to determine when initialization is needed. We could have used a separate Boolean, but this would have required adding another field as well as considering some trickier memory model issues.
If the Interlocked.CompareExchange fails, it means we lost the lazy initialization race with another thread, and thus just return the value the other thread produced. We also Dispose of the garbage object if it implements IDisposable. This pattern is very common in lazy initialization scenarios, like allocating an expensive kernel object lazily on demand. We’d prefer to get rid of it right away since we know it will never be used.
I wish there was a way to make boxing a compile-time error for some value types. Clearly you don't ever want to box one of these, because making a copy will entirely break the synchronization guarantees.
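Here is a small demonstration of why copies are a problem. It repeats a trimmed version of LazyInit<T> (without the Dispose logic) so that it compiles on its own; CopyHazardDemo and its Run method are names invented just for this example.

```csharp
using System.Threading;

public delegate T Initializer<T>();

// Trimmed copy of LazyInit<T>, repeated only to keep the example self-contained.
public struct LazyInit<T> where T : class
{
    private Initializer<T> m_init;
    private T m_value;

    public LazyInit(Initializer<T> init)
    {
        m_init = init;
        m_value = null;
    }

    public T Value
    {
        get
        {
            if (m_value == null)
            {
                T newValue = m_init();
                Interlocked.CompareExchange(ref m_value, newValue, null);
            }
            return m_value;
        }
    }
}

public static class CopyHazardDemo
{
    // Returns the number of times the initializer ran.
    public static int Run()
    {
        int calls = 0;
        LazyInit<object> original = new LazyInit<object>(
            delegate { calls++; return new object(); });

        LazyInit<object> copy = original; // plain struct copy, made before initialization
        object boxed = original;          // boxing makes another copy

        object a = original.Value;                  // initializes 'original' only
        object b = copy.Value;                      // 'copy' still has a null m_value
        object c = ((LazyInit<object>)boxed).Value; // unboxing copies yet again

        // Each copy carries its own m_value, so the three values are unrelated objects.
        bool allDistinct = !ReferenceEquals(a, b) &&
                           !ReferenceEquals(b, c) &&
                           !ReferenceEquals(a, c);
        return allDistinct ? calls : -1;
    }
}
```

Each copy has its own m_value field, so the compare-exchange in one copy does nothing to stop the others from running the initializer again. The same thing happens whenever the struct is passed by value, which is why these are best kept as private, non-readonly fields and touched only in place.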
I’ve omitted some error checking, like ensuring m_init actually got initialized to a non-null value.
Say you need a lazily initialized event on your object. You would just do this:
public class C
{
    private LazyInit<EventWaitHandle> m_event;
    private object m_otherState;

    public C()
    {
        m_event = new LazyInit<EventWaitHandle>(
            delegate { return new ManualResetEvent(false); });
        m_otherState = ...;
    }

    ...

    private void DoSomething()
    {
        ...
        if (... need to set the event ...)
            m_event.Value.Set();
    }
}
And lastly, here’s LazyInitOnlyOnce<T>:
// Requires: using System.Threading;
public struct LazyInitOnlyOnce<T> where T : class
{
    private Initializer<T> m_init;
    private T m_value;
    private object m_syncLock;

    public LazyInitOnlyOnce(Initializer<T> init)
    {
        m_init = init;
        m_value = null;
        m_syncLock = null;
    }

    public T Value
    {
        get
        {
            if (m_value == null)
            {
                // Lazily publish a lock object; if another thread got there
                // first, use the one it published.
                object newSyncLock = new object();
                object syncLockToUse = Interlocked.CompareExchange(
                    ref m_syncLock, newSyncLock, null);
                if (syncLockToUse == null)
                    syncLockToUse = newSyncLock;

                lock (syncLockToUse)
                {
                    // Second check, now under the lock.
                    if (m_value == null)
                        m_value = m_init();

                    // Neither is needed anymore; let the GC have them.
                    m_syncLock = null;
                    m_init = null;
                }
            }

            return m_value;
        }
    }
}
We use a monitor to ensure mutual exclusion. I lazily allocate the object used for synchronization, but this is clearly a tradeoff. We pay for the added complexity to the code and the extra interlocked instruction (on the slow path), but avoid having to allocate an extra object when we create the struct itself and keep it alive, when we might not ever need it. There’s already an allocation for the delegate, but this just means there’s one instead of two.
It may also not be obvious why I null out the m_syncLock field before exiting. If we don't, the object will remain live as long as the lazily initialized variable remains live. We want the object to be GC'd as soon as possible, because it is no longer needed.
You can use a class constructor in .NET to achieve a similar effect. Static field initializers, however, execute in the class constructor, meaning that if you have multiple lazily initialized objects on the same class, touching any one of them (or, depending on how the class is declared, calling one of its static methods) initializes them all at once. This is much more like LazyInitOnlyOnce<T> than LazyInit<T>, since the CLR uses locks to prevent the class constructor from running on multiple threads at once.
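A quick way to see the "all at once" behavior is a sketch like the following. The InitLog, Expensive, and Holder types are invented for the example. The empty static constructor on Holder is there on purpose: with an explicit class constructor, the C# compiler does not mark the type beforefieldinit, so initialization runs on first access to a static member and not earlier.

```csharp
using System.Collections.Generic;

// Records the order in which the "expensive" objects get constructed.
public static class InitLog
{
    public static readonly List<string> Entries = new List<string>();
}

public sealed class Expensive
{
    public readonly string Name;

    public Expensive(string name)
    {
        Name = name;
        InitLog.Entries.Add(name); // note that construction happened
    }
}

public static class Holder
{
    // Both initializers run inside Holder's class constructor.
    public static readonly Expensive First = new Expensive("First");
    public static readonly Expensive Second = new Expensive("Second");

    // Explicit (empty) class constructor: see the note above.
    static Holder() { }
}
```

Reading Holder.First for the first time constructs both First and Second, even if Second is never used. Giving each value its own small holder class would decouple them, at the cost of one type per value.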
Anyway, there’s very little that is novel here. But I do believe having these primitives in the .NET Framework would be immensely useful. It would at least help steer people towards the recommended and most efficient lazy initialization pattern, which is to use double-checked locking, rather than having them possibly pursue more complicated designs. It would also remove the need to worry about volatile and Thread.MemoryBarrier, for those who aren't familiar with the work we did in the CLR 2.0 to ensure double-checked locking works properly. Lastly, it has the added benefit of getting rid of tricky calls to Interlocked.CompareExchange and lock statements scattered throughout your code, in favor of something more declarative. What do you think?
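For comparison, this is roughly what the hand-written double-checked locking pattern looks like at each place it gets used. The Settings class and its members are hypothetical. Marking the field volatile is the conservative choice made for this sketch; as noted above, the CLR 2.0 memory model work was meant to make the pattern safe without it.

```csharp
// Hypothetical example of hand-rolled double-checked locking.
public sealed class Settings
{
    private static readonly object s_lock = new object();
    private static volatile Settings s_instance; // volatile: conservative, see note above
    private static int s_constructed;            // only touched while holding s_lock

    private Settings()
    {
        s_constructed++;
    }

    // How many times the constructor has run (for illustration only).
    public static int ConstructedCount
    {
        get { lock (s_lock) { return s_constructed; } }
    }

    public static Settings Instance
    {
        get
        {
            if (s_instance == null)          // first check, no lock taken
            {
                lock (s_lock)
                {
                    if (s_instance == null)  // second check, under the lock
                        s_instance = new Settings();
                }
            }
            return s_instance;
        }
    }
}
```

Every lazily initialized field written this way needs its own copy of the two checks and the lock, which is exactly the repetition that a LazyInitOnlyOnce<T> field folds into a single Value call.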