|
Personal Info:
Joe  leads the architecture of an experimental OS's developer platform, where
he is also chief architect of its programming language. His current mission is to enable
writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe
founded the Parallel Extensions to .NET project.
He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife,
writing books, writing music,
studying music theory & mathematics, and doing anything involving food & wine.
My books
My music
Disclaimer:
The content of this site are my own personal opinions and do
not represent my employer's view in anyway.
© 2012, Joe Duffy
|
|
 Sunday, November 11, 2007
I’ve been asked a number of times about immutable types support for C#. Although C# doesn't offer first class language support in the way that F# does, you can get pretty far with what you do have in your hands already. Nothing prevents you from creating immutable data structures today, of course, but the problem is that there’s no compiler or runtime support to ensure you’ve done it right.
I just hacked together some new attributes and a handful of FxCop rules as an experiment. I’ve been very happy with the result. Sure it’s not baked into the language, but it’s a start. If there’s any interest, I can make the code available so you can play with it too.
Attributes and analysis
Imagine we had an ImmutableAttribute. Annotating a type with it indicates that objects of that particular type are immutable, i.e. that their state never changes after being constructed. This is great from a concurrency standpoint because it means access to such objects do not require synchronization. This can lead to more efficient code that not only has a higher chance of being correct but is also vastly easier to maintain. Well, what kind of restrictions would such a type be subject to?
1. Immutable types must have only initonly fields.
The first rule takes advantage of existing CLR type system and language support for initonly fields (a.k.a. readonly in C#). Marking a field as initonly ensures it is never written to after the constructor has finished executing. So long as all fields are initonly, the class is effectively already “shallow” immutable.
2. Immutable types must only contain fields of immutable types.
The second rule ensures transitivity, or “deep” immutability. The mutability of a complex object is typically, but not always, comprised of not only its own fields but also the state in the objects it refers to. With this rule and the prior rule, we’re about 90% there.
3. Immutable types should only subclass other immutable types.
To give the appearance that a particular object is immutable, that object’s type must not depend on other types that are mutable, as articulated by the previous rule. The ‘base’ reference is effectively just another field, and so this rule is derived from the previous one. If an immutable type could inherit mutable members and fields, then it wouldn’t really be immutable after all.
4. Mutable types should not subclass immutable types.
Similar to the previous rule, it is also a bad thing if a mutable subtype can override behavior from the subtype, giving the appearance of mutability. Say we have an immutable class IC with a virtual method f, and some mutable subclass MC overrides f to introduce logic that logically mutates the state of an object. Although the rules above are sufficient to ensure that the object is physically immutable, this can circumvent immutability safety through polymorphism. A related piece of advice: public immutable types should be sealed, to prevent outside classes that do not abide by immutability analysis from breaking code which assumes a given type is immutable. Alternatively, virtual members can be eliminated.
5. Immutable types must not leak the ‘this’ reference during construction.
This rule is subtle. Although initonly ensures fields are never written to after construction, these fields may be written to any number of times while an object is actively being constructed. If some code called during construction publishes a reference to the object (e.g. by storing it in a static variable), then other threads might access the object while it still appears to be mutable. They may witness a partially initialized object, an object whose fields are still changing, and so on.
And that’s it! 5 simple rules. It may sound complicated, but the code to perform the static analysis for all but the last one is straightforward and a dozen-or-so lines apiece. A few things are worth mentioning. First, the CLR’s memory model ensures that, after an object is constructed and published, reads of its fields cannot be reordered to break immutability. Additionally, there are many immutable types in the CLR today that are not verified as such. So in my analysis rules, I have hard-coded a set of well known immutable types so that you can use them w/out problem: Object, String, DateTime, TimeSpan, Boolean, Byte, SByte, Int16, Int32, Int64, IntPtr, UInt16, UInt32, UInt64, UIntPtr, Decimal, Double, Single, and ValueType. Ideally if we supported this in the .NET Framework, we'd annotate them.
Impact to imperative programming
Programming in an immutable world is rather tricky. As someone who has done most of his programming for the past 10+ years in C-style languages, I just take for granted that data structures change over time. With immutability, there tends to be a whole lot more copying and functional-style function calling, where data structures are passed as an input argument and the “mutated” copy is returned as an output argument. I’m trying to kick the mutability habit, since I fully believe immutability is one key to being successful with massive degrees of parallelism. And it usually leads to cleaner code too.
But it’s hard. Using something as simple as an array field on an immutable type will fail the above rules, since the CLR’s array types are mutable. I’ll explore building one below, but this probably points to a need for better immutability support in the .NET Framework. It’s not too difficult to imagine providing base classes for common needs when building immutable data structures.
Circumventing the analysis
As you begin to explore immutable types in a bit more depth, you’ll realize there are often cases where immutability-by-cleverness is possible. That is to say, although one or more of the rules above have been violated, the end result still appears to be immutable. I can build an immutable list out of a linked list to avoid depending on CLR arrays, and mark the nodes as immutable, but they must refer to elements stored within the list. Should we require the elements to also be immutable? Perhaps, but perhaps not, depending on whether you consider the state of the list to also include the state of the elements inside it. Usually that wouldn't be the case. And, besides, if we know what we’re doing, we can create an immutable list based on an array anyway, which enables O(1) IList<T>-style random access. We just need to be careful to encapsulate the array object and to never store an element into it post-construction.
To facilitate working around some of the rules in ways that are often necessary, I have provided options on ImmutableAttribute to suppress certain checks. Additionally, there is a MutableAttribute which can mark certain fields to indicate they are not subject to the same restrictions as other fields on an immutable type.
An ImmutableList<T>
As an illustration, here is an ImmutableList<T>. It implements IList<T>, but sadly it must throw exceptions in several circumstances because both IList<T> and ICollection<T> offer methods that are intrinsically mutable. Undoubtedtly there are bugs because I whipped it up quickly and have omitted a lot of needed error checking. I just wanted to give the general idea of how it might be done:
/// <summary> /// A list that has been written to be observationally immutable. A mutable array /// is used as the backing store for the list, but no mutable operations are offered. /// </summary> /// <typeparam name="T">The type of elements contained in the list.</typeparam> [Immutable] public sealed class ImmutableList<T> : IList<T> {
[Mutable] private readonly T[] m_array;
/// <summary> /// Create a new list. /// </summary> public ImmutableList() { m_array = new T[0]; }
/// <summary> /// Create a new list, copying elements from the specified array. /// </summary> /// <param name="arrayToCopy">An array whose contents will be copied.</param> public ImmutableList(T[] arrayToCopy) { m_array = new T[arrayToCopy.Length]; Array.Copy(arrayToCopy, m_array, arrayToCopy.Length); }
/// <summary> /// Create a new list, copying elements from the specified enumerable. /// </summary> /// <param name="enumerableToCopy">An enumerable whose contents will /// be copied.</param> public ImmutableList(IEnumerable<T> enumerableToCopy) { m_array = new List<T>(enumerableToCopy).ToArray(); }
/// <summary> /// Retrieves the immutable count of the list. /// </summary> public int Count { get { return m_array.Length; } }
/// <summary> /// A helper method used below when a mutable method is accessed. Several /// operations on the collections interfaces IList<T> and /// ICollection<T> are mutable, so we cannot support them. We offer /// immutable versions of each. /// </summary> private static void ThrowMutableException(string copyMethod) { throw new InvalidOperationException( String.Format("Cannot mutate an immutable list; " + "see copying method '{0}'", copyMethod)); }
/// <summary> /// Whether the list is read only: because the list is immutable, this /// is always true. /// </summary> public bool IsReadOnly { get { return true; } }
/// <summary> /// Accesses the element at the specified index. Because the list is /// immutable, the setter will always throw an exception. /// </summary> /// <param name="index">The index to access.</param> /// <returns>The element at the specified index.</returns> public T this[int index] { get { return m_array[index]; } set { ThrowMutableException("CopyAndSet"); } }
/// <summary> /// Copies the list and adds a new value at the end. /// </summary> /// <param name="value">The value to add.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndAdd(T value) { T[] newArray = new T[m_array.Length + 1]; m_array.CopyTo(newArray, 0); newArray[m_array.Length] = value; return new ImmutableList<T>(newArray); }
/// <summary> /// Returns a new, cleared (empty) immutable list. /// </summary> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndClear() { return new ImmutableList<T>(new T[0]); }
/// <summary> /// Copies the list and modifies the specific value at the index provided. /// </summary> /// <param name="index">The index whose value is to be changed.</param> /// <param name="item">The value to store at the specified index.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndSet(int index, T item) { T[] newArray = new T[m_array.Length]; m_array.CopyTo(newArray, 0); newArray[index] = item; return new ImmutableList<T>(newArray); }
/// <summary> /// Copies the list and removes a particular element. /// </summary> /// <param name="item">The element to remove.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndRemove(T item) { int index = IndexOf(item); if (index == -1) { throw new ArgumentException("Item not found in list."); }
return CopyAndRemoveAt(index); }
/// <summary> /// Copies the list and removes a particular element. /// </summary> /// <param name="index">The index of the element to remove.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndRemoveAt(int index) { T[] newArray = new T[m_array.Length - 1]; Array.Copy(m_array, newArray, index); Array.Copy(m_array, index + 1, newArray, index, m_array.Length - index - 1); return new ImmutableList<T>(newArray); }
/// <summary> /// Copies the list adn inserts a particular element. /// </summary> /// <param name="index">The index at which to insert an element.</param> /// <param name="item">The element to insert.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndInsert(int index, T item) { T[] newArray = new T[m_array.Length + 1]; Array.Copy(m_array, newArray, index); newArray[index] = item; Array.Copy(m_array, index, newArray, index + 1, m_array.Length - index); return new ImmutableList<T>(newArray); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void ICollection<T>.Add(T item) { ThrowMutableException("CopyAndAdd"); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void ICollection<T>.Clear() { ThrowMutableException("CopyAndClear"); }
/// <summary> /// Checks whether the specified item is contained in the list. /// </summary> /// <param name="item">The item to search for.</param> /// <returns>True if the item is found, false otherwise.</returns> public bool Contains(T item) { return Array.IndexOf<T>(m_array, item) != -1; }
/// <summary> /// Copies the contents of this list to a destination array. /// </summary> /// <param name="array">The array to copy elements to.</param> /// <param name="index">The index at which copying begins.</param> public void CopyTo(T[] array, int index) { m_array.CopyTo(array, index); }
/// <summary> /// Retrieves an enumerator for the list's collections. /// </summary> /// <returns>An enumerator.</returns> public IEnumerator<T> GetEnumerator() { for (int i = 0; i < m_array.Length; i++) { yield return m_array[i]; } }
/// <summary> /// Retrieves an enumerator for the list's collections. /// </summary> /// <returns>An enumerator.</returns> IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable<T>)this).GetEnumerator(); }
/// <summary> /// Finds the index of the specified element. /// </summary> /// <param name="item">An item to search for.</param> /// <returns>The index of the item, or -1 if it was not found.</returns> public int IndexOf(T item) { return Array.IndexOf<T>(m_array, item); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void IList<T>.Insert(int index, T item) { ThrowMutableException("CopyAndInsert"); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> bool ICollection<T>.Remove(T item) { ThrowMutableException("CopyAndRemove"); return false; }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void IList<T>.RemoveAt(int index) { ThrowMutableException("CopyAndRemoveAt"); }
}
I won’t spend much time going over this code. Just notice that the type is marked with the ImmutableAttribute, the array field is marked with the MutableAttribute (since it’s not itself an immutable type and would fail the analysis otherwise), and that any operations that modify the list must make a copy of the entire thing.
Summary
This has been an interesting exercise. Through it, I have come to realize that first class immutability in the type system is not such a farfetched dream. The most onerous aspect to it is probably the restrictions it imposes on subclassing in the programming model, effectively bifurcating the type system into those types that are mutable and those types that are immutable. But, in the end, I’m not so sure it’s too bad a problem: interchanging the two seems like a very bad idea anyway.
Feedback on all of this would be appreciated. Do you see it as useful? If you had it, would you use it in your programs today? Do you believe that it is one step needed (of many!) to bring us towards a world in which building concurrent programs is simpler?
 Saturday, November 10, 2007
There are several docs out there that describe the CLR memory, most notably this article.
When describing the model, one can either use acquire/release, barrier/fence, or happens-before terminology. They all acheive the same goal, so I will simply choose one, acquire/release: an acquire operation means no loads or stores may move before it, and a release operation means no loads or stores may move after it. I can explain it with such simple terms because the CLR is homogeneous in the kinds of operations it permits or disallows to cross such a barrier, e.g. there's never a case where loads may cross such a chasm but stores may not.
Despite the great article referenced above, I find that it’s still not entirely straightforward. It is important to code to a well-understood abstract model when writing lock-free code. For reference, here are the rules as I have come to understand them stated as simply as I can:
- Rule 1: Data dependence among loads and stores is never violated.
- Rule 2: All stores have release semantics, i.e. no load or store may move after one.
- Rule 3: All volatile loads are acquire, i.e. no load or store may move before one.
- Rule 4: No loads and stores may ever cross a full-barrier (e.g. Thread.MemoryBarrier, lock acquire, Interlocked.Exchange, Interlocked.CompareExchange, etc.).
- Rule 5: Loads and stores to the heap may never be introduced.
- Rule 6: Loads and stores may only be deleted when coalescing adjacent loads and stores from/to the same location.
Note that by this definition, non-volatile loads are not required to have any sort of barrier associated with them. So loads may be freely reordered, and writes may move after them (though not before, due to Rule 2). With this model, the only true case where you’d truly need the strength of a full-barrier provided by Rule 4 is to prevent reordering in the case where a store is followed by a volatile load. Without the barrier, the instructions may reorder.
It is unfortunate that we’ve never gone to the level of detail and thoroughness the Java memory model folks have gone to. We have constructed our model over years of informal work and design-by-example, but something about the JMM approach is far more attractive. Lastly, what I’ve described applies to the implemented memory model, and not to what was specified in ECMA. So this is apt to change from one implementation to the next. I have no idea what Mono implements, for example.
 Friday, November 09, 2007
Lock-free algorithms are often “better” than lock-based algorithms:
- They are, by definition, wait-free, ensuring threads never block.
- State transitions are atomic such that failure at any point will not corrupt the data structure.
- Because threads never block, they typically lead to greater throughput, as the granularity of synchronization is a single atomic write or compare-exchange.
- In some cases, lock-free algorithms incur fewer total synchronizing writes (e.g. interlocked ops) and thus can be cheaper from a pure performance standpoint.
But lock-freedom is not a panacea. I’ve gained a lot more experience using lock-free algorithms in the past 3 years: first, when working on memory model improvements for Whidbey, and recently during the implementation of the ParallelFX library and as I write new content for my book.
There are some obvious drawbacks:
- The use of optimistic concurrency can lead to livelock for hot data structures.
- The code is significantly harder to test. Usually its correctness hinges on a correct interpretation of the target machine’s memory model –in my case, the .NET memory model (the topic of an upcoming post)—which can be misinterpreted and is hard to validate. Moreover, because the most popular hardware is stronger than the lesser popular hardware (e.g. X86 vs. IA64), testing needs to explicitly focus on the esoteric hardware to expose races.
- The code is significantly harder to write and maintain, for many of the same reasons.
I have learned about a less obvious drawback the hard way. When initially implementing a certain data structure, you may make a bunch of assumptions about the use cases your class needs to accommodate. And you may actually succeed in writing an implementation that is correct given those assumptions. But over time, as new use cases are discovered, it is much harder to retrofit the code and revalidate that the lock-free algorithms are still correct given the new assumptions. There is no magic oracle that says: hey, adding feature X is going to invalidate assumption Y over there, creating a memory reordering bug.
In several recent cases, I’ve discovered such problems, and dealt with them one-by-one, usually by adding additional memory barriers. As the numbers of memory barriers increase (roughly ½ the cycle-cost of acquiring and releasing a lock), however, the benefits of lock-free algorithms begin to dwindle. It’s easy to begin with an algorithm that scales and performs nicely, and over time add a memory barrier here and there, and eventually end up with something that performs worse than the lock-based equivalent. Unfortunately, this threshold isn’t always obvious, so you can end up with a real mess on your hands: a buggy, impossible to test, and difficult to understand hunk of code. All the drawbacks of lock-free code with none of the benefits. Whoops.
The moral of the story? Be careful and conservative in your use of lock-free code. There are many well-known published lock-free algorithms, and it’s usually a good idea to stick to them, if you use lock-free code at all. When in doubt, just use locks. Truthfully, they are hard enough.
 Wednesday, October 17, 2007
For those who haven't heard yet, F# is joining the "official" .NET language family. As Soma says, "[t]his is one of the best things that has happened at Microsoft since we created Microsoft Research over 15 years ago." I am just as excited as he is. A huge congrats to Don. He's been working super hard to build and evangelize F# for several years now, and getting the attention of a huge division like DevDiv isn't an easy thing, but in the end sanity does prevail. As a blanket statement, the more functional .NET programmers' brains become, the more latent parallelism the programs they write will contain.
 Saturday, October 13, 2007
Charles from Channel9 recorded a conversation with Anders and me a couple weeks back. The topic? Concurrency. More specifically, Parallel FX (PFX):
Programming in the Age of Concurrency: Concurrent Programming with PFX
Microsoft is developing a number of technologies to simplify the expression of parallelism in code. An example of this work is Parallel Extensions for the .NET Framework (PFX), a managed programming model for data parallelism, task parallelism, scheduling, and coordination on parallel hardware.
PFX makes it easier for developers to write programs that take advantage of parallel hardware (you've all heard of multi-core and what the future holds with many-core...), without having to deal with the complexities of threads and locks in today’s concurrent programming story
We don't go too deep, but you can bet we'll be doing more of these things as the technology matures and gets closer to general availability. Enjoy!
(Note: you may also be interested in Stephen Toub's PFX interview with Scott Hanselman, available here.)
 Tuesday, October 09, 2007
When COM came onto the scene in the early 90’s, Symmetric Muiltiprocessor (SMP) architectures had just been introduced into the higher end of the computer market. “Higher end” in this context basically meant server-side computing, a market in which the increase in compute power promised increased throughput for heavily loaded backend systems. Parallelism per se—that is, breaking apart large problems into smaller ones so that multiple processors can hack away to solve the larger problem more quickly—was still limited to the domain of specialty computing, such as high-end supercomputing and high-performance computing communities. The only economic incentive for Windows programmers to use multithreading, therefore, was limited mostly to servers. Heck, parallelism is still pretty much limited to those domains, but the economic incentives are clearly in the midst of a fundamental change.
As is already well established, server-side computing is highly parallel for several reasons. The most obvious is the steady stream of work a server farm usually enjoys, meaning there is seldom a shortage of compute work to do. Even if work is IO bound, there’s typically at least some work that could use a CPU waiting in the arrival queue to overlap execution with. Moreover, sever workloads are usually isolated except for some select and small amount of application-wide state. Each user has his own account, order history, bank transaction information, etc., and therefore the interaction between sessions can be carefully controlled and nearly non-existent, leading (once again) to a good cost/benefit tradeoff, due to the large scalability wins.
Human productivity has always been markedly more important than other software features, like performance, reliability, and security, unless the domains in which programs are being developed require an intense focus on certain attributes. I’m sure the DOA prioritizes security far above productivity, but the same isn’t true of most of the industry. This was true back in the COM days, and is still true to this day (perhaps more so). So it’s safe to conclude that the designers of COM had “ease of development” at the forefront of their minds when creating it. That coupled with the kind of multithreading in use back then on Windows machines (servers), putting an emphasis on lack of sharing, lead to the development of the Single Threaded Apartment (STA) model. And, related, were COM+’s addition of explicit synchronization contexts which took the STA auto-synchronization idea and generalized it to make synchronization policies more customizable.
These features made synchronization, an often-impossible task, and less important to be precise about when isolation is pervasive, much simpler. Instead of having to test a million different machine configurations, various difficult-to-predict-ahead-of-time component interactions, and so on, a component got the STA stamp and was guaranteed safe in a multithreaded environment. The alternative then is the alternative today: go free-threaded (MTA or NTA) and deal with all of the nasty synchronization problems that arise “the old fashioned way.” In other words, use locks and events, and run the risk of race conditions, deadlocks, and various other latent bugs that would ruin the composability and reliability of any less-than-bulletproof component. Sadly, “the old fashioned way” is still “the state of the art” until we build a better mousetrap.
Now, the STA’s gotten a really bad rap over the years. (I’ll ignore synchronization contexts for the time being but just about everything I say applies to them too.) It’s true that STAs cause us a lot of problems when thinking about legacy compatibility, and will make it just that much more difficult to migrate legacy Windows apps over to a massively parallel world, but I’m going to stick my neck out and make a claim that won’t win me friends (and in fact might lose me some): STAs aren’t entirely evil, and are an interesting idea that we as a community can learn a lot from. What’s more, we have years of experience using them. I see a lot of people basically reinventing the STA model, often without realizing it due either to a lack of understanding of (or interest in) COM or simply a lack of pattern matching abilities. “History will repeat itself, because nobody was listening the first time.”
Automatic synchronization is now the holy grail of the new multicore era. STM is another attempt at that. Active objects, however, which have shown up in numerous places are another more closely related attempt to the STA. Yet another closely related technology is message passing in general, where isolated domains of control do not share state and instead communicate via disconnected message passing. All strive to attain similar goals, improved developer productivity and safety, usually with some performance or scaling overhead. The biggest difference, from my perspective, is that design priorities are now different due to the environment at the time these things are being created. It’s clear today that any automatic synchronization technology we invent should scale to hundreds (perhaps thousands) of processors, not one or two (or, at the extreme, eight), that fine-grained parallelism will become more and more important, and that the degree of sharing will be high, whether that means logically (by message exchange) or physically (in the most literal shared memory sense).
Clearly the worst aspect of COM STAs is that they are obviously not up for the task of scaling like this at such a fine-grain, because a single thread is responsible for executing all code for some particular set of objects in the process. It's just plain impossible to parallelize finer than the granularity of a single component, and it's common to glump many components together into one apartment which is worse. As the number of available processors grows, and/or the number of objects instantiated inside a particular STA which need to interact, scalability suffers. Sadly we’ve inherited huge hunks of code that have been written in this fashion, with all of the assumptions about the multithreading environment in which the components will be deployed as immutable laws.
But there are good things about COM STAs! They are brain-dead simple in the most common cases. Synchronization doesn’t take nearly as much brainpower and development time away from the component creation process, improving developer productivity and the robustness of the software written. So long as your STA component never blocks or performs a cross-apartment invocation, life remains very simple. This is an example of a leaky abstraction, however, because it’s not always evident to the programmer when this chasm has been crossed. Proxies do attempt to hide the gunk of crossing the chasm, though at the risk of introducing reentrancy, which itself comes with a lot of baggage. I’d like to stop and point out something at this point, perhaps helping to support the “reinventing the wheel” claim earlier. Active objects and message passing systems generally suffer from similar problems. If one object uses another (by enqueueing a message) and then, at some point, waits for a response message to arrive, there is the risk that the thread which is now blocked will need to itself respond to a message coming from another object. Ahh, the classic reentrancy versus deadlock tradeoff. Event-driven, stackless systems like the Concurrency and Coordination Runtime (CCR), etc., mitigate this problem but require a fundamentally different way of programming. UI programmers are generally more comfortable with this approach. And linear types a la Singularity's exchange heap also offers a promising way to enable concurrency, but to safely guarantee certain state will not be shared.
In the end, COM STAs are still an invention I wish we could do away with. I think of the technology a bit like a cheap, half-way immitation of Hoare's CSPs. But at the same time, I fear we as an industry will continue to reinvent them, just under a different guise or with subtly different nuances. We need to resist the urge to pretend they don't exist just because they contain the letters C, O, and M and because the sound of STA is known to trigger feelings of intense nausea. What’s scary to me is that, STM aside, there doesn’t seem to be any super-promising alternative to the automatic synchronization problem for shared memory, aside from provable declarative and functional safety. As I’ve noted above, true fine-grained message passing has a lot of similar issues, but I do wonder at the end of the day if Joe Armstrong has been right all along. (Well, Tony Hoare really deserves the credit, and perhaps David May too, but Erlang is en vogue currently.) Time will tell.
 Saturday, September 15, 2007
Two articles about ParallelFX (PFX) are in the October issue of MSDN magazine and have been posted online:
- Parallel LINQ: Running Queries on Multi-Core Processors. An overview of an implementation of LINQ-to-Objects and -XML which automagically uses data parallelism internally to execute declarative language queries. It supports the full set of LINQ operators, and several ways of consuming output in parallel.
- Parallel Performance: Optimize Managed Code for Multi-Core Machines. Describes the Task Parallel Library (TPL), a new "thread pool on steroids" with cancelation, waiting, and pool isolation support, among many other things. Uses dynamic work stealing techniques (see here and here) for superior scalability.
As noted in the article, there's a PFX CTP planned for 2007*. Watch my blog for more details when it's available.
*Note: some might wonder why we released the articles before the CTP was actually online. When we originally put the articles in the magazine's pipeline, our intent was that they would in fact line up. And both were meant to align with PDC'07. But when PDC was canceled, we also delayed our CTP so that we had more time to make progress on things that would have otherwise been cut. It's less than ideal, but I'm still confident this was the right choice.
 Wednesday, September 12, 2007
Lock recursion is usually a bad idea. It can seem convenient (at first), but once the slippery slope of making calls from critical regions into complex ecosystems of code is embarked upon (which is usually a necessary pre-requisite to lock recursion, except for some relatively simple cases), it’s easy to accidentally fall right off the edge. This topic was part of the doc I wrote previously about using concurrency inside of reusable libraries. My opinions haven't changed much since then.
Lock recursion coupled with condition variables is even worse. In fact, its behavior might surprise you.
To motivate this, would you ever think of writing code that does something like this?
void BreakAtomicity() { … I assume somebody called me with a recursive lock on ‘obj’ … Monitor.Exit(obj); Monitor.Exit(obj); … Do something … Monitor.Enter(obj); Monitor.Enter(obj); }
I should certainly hope not! Unless you're crazy or reeeeally know what you're doing. Who knows what state invariants are busted at the time the call to BreakAtomicity was made? Releasing the lock in this manner hoists these ticking timebombs onto the other threads into the process that might want to inspect the shared state. If you, the author of BreakAtomicity, have all-knowing omnipresent knowledge of the entire program, perhaps you know precisely. But, particularly in the case of recursion, where it's all-too-common to engage in practices of sloppy composition, this is actually quite unlikely. Lock recursion is typically used for convenience, not because of a really solid design that is based on clean algorithmic recursion.
What does this example have to do with condition variables anyway? Glad you asked! It matters because of what happens when you wait on a monitor that has been recursively acquired. In such cases, Monitor.Wait will release all recursive acquires as part of its waiting. I.e. if it has been acquired 10 times, it is released 10 times before waiting. It does this, of course, because otherwise it would deadlock waiting for some other thread to make a call to Monitor.Pulse/PulseAll (since a separate thread needs to first acquire the lock in order to do either). This is symmetric, so once the thread has been awoken, it will reacquire the lock as many times as needed before returning to attain the same level of recursion that existed prior to the call.
Now, Monitor.Wait breaks atomicity anyway. This is obvious. It releases and reacquires the lock internally, and so any conditions regarding shared state that exist prior to the call cannot be assumed to exist after the call returns. (Most) people understand this and tend to use Wait in fairly common and safe patterns, such as guarded regions where some predicate is checked for validity at the very front of a critical region before doing anything interesting with state. But the really nasty thing about recursive locks and the Wait behavior described above is that this breaks atomicity for some unknowing number of nested critical regions that have existed for some unknown amount of time leading up to the Wait. This is a recipe for pain. My recommendation is probably predictable: just following the broadly accepted advice that, because lock recursion is evil to begin with, it is best avoided, and you will safely avoid the more complicated case outlined above.
It’s worth pointing out that the new CONDITION_VARIABLE in Vista, i.e. SleepConditionVariableCS and SleepConditionVariableSRW, only release the lock once, despite recursive acquire counts. (SRWL doesn’t officiailly support recursion, although it works for shared locks since there is no affinity used and it is undetectable.) Deadlocks result instead. From an editorial perspective, I prefer this behavior quite a bit, since it’s easier to debug. (Admittedly, if Monitor’s behavior is what you want, it’s less than straightforward to achieve, unless you know the recursion counter somehow. Although, I will also note that I am convinced very few real people would want Monitor’s current behavior...) My preferred solution to this would have been to throw an exception, since I do think issuing a Wait when locks have been recursively acquired is in most cases a bug. As a workaround, we could have exposed a RecursionCount on Monitor so that a developer could manually exit the lock RecursionCount-1 times before the call to Wait and then reacquire it RecursionCount-1 times after the call returns. (Actually no -- I would have made Monitor non-recursive by default, like the new ReaderWriterLockSlim.) Sadly, I guess I'm only about 10 years too late...
 Wednesday, August 22, 2007
Most managed code in the .NET Framework has not been hardened against asynchronous exceptions. This includes out of memory (OOM) conditions and asynchronous thread aborts, and is entirely by design. Hardening against OOM, for example, is historically an extraordinarily difficult feat, and few systems undertake the development and QA costs needed to do so. (FWIW, the CLR VM is one such system.) Simply failing gracefully is usually hard enough. Failing gracefully is admittedly leaps and bounds easier in managed code because allocation failures are communicated via exceptions rather than return values, and are thus transitively propagated “by default.” Thread aborts are even more difficult to harden against, however, because they can originate at any instruction (with a handful of exceptions). Ensuring data invariants are protected for every single instruction is clearly just a little difficult.
These things are certainly not impossible. With enough effort, you can make inroads toward solutions for both issues. Portions of the .NET Framework have gone to such lengths. For example, code that manipulates process-wide state spanning AppDomains needs to ensure that this state is not corrupted by an unfortunately placed thread abort when run inside systems like SQL Server that use aborts to tear down boundaries of isolation. While possible, the important thing to understand here is that most of the .NET Framework is in fact not resilient to these things. See this doc as an example of guidance the CLR team provided to other developers inside of Microsoft to this effect. OOMs are in a similar category, though many subsystems take different, inconsistent approaches to memory allocation failures (e.g. WPF takes a different stance than WCF).
All of this is a long winded build up to the following problem: thread interrupts are just about as evil as these sorts of asynchronous exceptions. The failure injection points are more constrained—e.g. an OOM can occur wherever an allocation occurs, a thread abort can happen in between nearly any two instructions, and thread interruptions can only occur at blocking calls that transition the managed thread into the state WaitSleepJoin—but this doesn’t change the fact that most code is unprepared to deal properly with such interruptions. Once again, it’s not that managed code cannot be constructed to be resilient to interruptions—in fact, it’s much easier than OOMs and thread aborts—it’s simply that the .NET Framework hasn’t been constructed to tolerate arbitrary interruptions. If threads are calling into these APIs and thread interruptions are provoked, state corruption, memory leaks, and possible deadlocks can be left in the wake.
To take a brief example of where such a problem might crop up, imagine a thread has blocked on FileStream.EndRead because it is finishing some asynchronous IO operation. After a brief inspection of the code, I’m convinced interrupting the call it makes to WaitHandle.WaitOne internally will lead to a memory leak:
if (1 == Interlocked.CompareExchange(ref result._EndXxxCalled, 1, 0)) { __Error.EndReadCalledTwice(); } WaitHandle handle = result._waitHandle; if (handle != null) { try { handle.WaitOne(); } finally { handle.Close(); } } NativeOverlapped* nativeOverlappedPtr = result._overlapped; if (nativeOverlappedPtr != null) { Overlapped.Free(nativeOverlappedPtr); }
The method ensures only one call to EndRead can occur, and will throw on subsequent attempts. So the above code will only ever run once. Sadly, EndRead needs to free the NativeOverlapped structure used internally for asynchronous IO completion. But because the call to Overlapped.Free follows the call to WaitOne, and doesn’t occur inside of a finally block, it won’t execute. In summary: interrupt that call to WaitOne, and boom, we leak a NativeOverlapped object. Whether or not this is disastrous of course depends on the precise scenario. A few bytes here and a few bytes there can quickly add up, particularly for long running programs. At least this particular example protects invariants sufficiently well to avoid state corruption that would lead to further unpredictability. But recall that this is just one example. In my experience, the BCL represents some of the most carefully written code in the .NET Framework, so this problem is undoubtedly scattered about all over the place.
Unfortunately, it’s become somewhat common advice that using thread interruption as a synchronization and control mechanism is a GoodThing™. Andrew Birrell, a researcher from Microsoft Research, for example, suggested this in his paper “An Introduction to Programming with C# Threads”:
“Interrupts are most useful when you don’t know exactly what is going on. For example, the target thread might be blocked in any of several packages, or within a single package it might be waiting on any of several objects. In these cases an interrupt is certainly the best solution. Even when other alternatives are available, it might be best to use interrupts just because they are a single unified scheme for provoking thread termination.” (p33)
While I am sure this advice is well intentioned, it is extremely dangerous for the subtle reasons outlined above and can lead to reliability problems in any programs that follow it. My recommendation is to build this kind of higher level synchronization into the code that you actually own, and handle shutdown and interruption logic yourself. This is a bit cumbersome and is more work, but it also ensures that arbitrary blocking points in the libraries you use will not be affected by interruptions.
With the increase in hardware parallelism over the coming years, I worry that the use of interruptions will become more widespread as a popular technique developers use to control threads. And as more and more of the .NET Framework uses higher degrees of concurrency, necessarily requiring more internal synchronization, the number of blocking points that are vulnerable to this kind of abuse will grow accordingly. So, please, do your part… avoid Thread.Interrupt like the plague. In fact, perhaps we should deprecate it.
 Monday, July 30, 2007
My book, Concurrent Programming on Windows, is shaping up quite nicely. (Given that I've been working on it for over a year now, I suppose it had better be!) I’ve been surprised at the amazing level of anticipation and excitement from blog readers, coworkers, and Microsoft customers, and I really can’t wait for it to be finished. Thanks for the patience so far.
I feel like I’m almost on the home stretch. End of September is my current target for completion. It’s looking like it’ll contain 18 chapters, with 3 appendices, and will have a total running length of somewhere around 700 pages. The reasons it has taken so long are numerous, but the primary reason is that the content is quite deep and detail-oriented—more than I expected at the start—and I’ve wanted to take the time to get it just right rather than cut corners. My editors recently gave me feedback that there will be very little developmental editing required, since I’m (ahem) very, uhh, meticulous when it comes to writing. And feedback from technical reviewers has been very positive as well. I think both are good news.
I’m confident the end product will be worth the wait.
In the meantime, it seems that some of the abstractions I’ve built while writing the book will likely become part of a future release of the .NET Framework. Keep an eye out on Channel9 for some additional details in a few months time. Right now is a super exiting time to be in the field of computer science, that’s for sure... Laissez les bons temps rouler!
|
|
Recent Entries:
Search:
Browse by Date:
| | Sun | Mon | Tue | Wed | Thu | Fri | Sat | | 28 | 29 | 30 | 31 | 1 | 2 | 3 | | 4 | 5 | 6 | 7 | 8 | 9 | 10 | | 11 | 12 | 13 | 14 | 15 | 16 | 17 | | 18 | 19 | 20 | 21 | 22 | 23 | 24 | | 25 | 26 | 27 | 28 | 29 | 30 | 1 | | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Browse by Category:
Notables:
|