|
Personal Info:
Joe  leads the architecture of an experimental OS's developer platform, where
he is also chief architect of its programming language. His current mission is to enable
writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe
founded the Parallel Extensions to .NET project.
He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife,
writing books, writing music,
studying music theory & mathematics, and doing anything involving food & wine.
My books
My music
Disclaimer:
The content of this site are my own personal opinions and do
not represent my employer's view in anyway.
© 2012, Joe Duffy
|
|
 Sunday, February 17, 2008
A long time ago, I wrote that you’d never need to write another finalizer again. I’m sorry to say, my friends, that I may have (unintentionally) lied. In my defense, the blog title where I proclaimed this fact did end with “well, almost never.”
Finalizers have historically been used to ensure reclamation of resources that are finite or outside of the purview of the CLR’s GC. Native memory and Windows kernel HANDLEs immediately come to mind. Without a finalizer, resources would leak; server apps would die, client apps would page like crazy, and life would be a mess. For such resources, properly authored frameworks also provide IDisposable implementations to eagerly and deterministically reclaim the resources when they are definitely done. Three years ago, I wrote a lengthy treatise on the subject.
The finalizer is there as a backstop. It is often meant to clean up after bugs , such as when a developer forgets to call Dispose in the first place, tried to but failed due to some runtime execution path skipping it (often exceptions-related), or a framework or library author hasn’t respect the transitive IDisposable rule, meaning that eager reclamation isn’t even possible. It also avoids tricky ref-counting situations as are prevalent in native code: since the GC handles tracking references, you, the programmer, can avoid needing to worry about such low-level mechanics. In all honesty, the finalizer’s main purpose is probably that we wanted to facilitate a RAD and VB-like development experience on .NET, where programmers don’t need to think about resource management at all, unlike C++ where it’s in your face. While one could reasonable argue that IDisposable is all you need (the C++ argument), that would have gone against this goal.
Concurrency changes things a little bit. A thread is just another resource outside of the purview of the CLR’s GC, and is actually backed by a kernel object and associated resources like non-pageable memory for the kernel stack, some data for the TEB and TLS, and 1MB of user-mode stack, to name a few. They also add pressure to the thread scheduler. Threads are fairly expensive to keep around, and “user” code is responsible for creating and destroying them.
Now, it’s true that we are moving towards a world where threads and logical tasks are not one and the same. This is a ThreadPool model. But it’s also true that a task that is running on a thread is effectively keeping that thread alive, and perhaps more concerning, preventing other tasks from running on it. Use of a resource is a kind of actual resource itself, although more difficult to quantify.
So, what does all of this have to do with finalization?
If some object kicks off a bunch of asynchronous work and then becomes garbage—i.e. the consumer of that object no longer needs to access it’s information—then it’s possible (or even likely) that any outstanding asynchronous operations ought to be killed as soon as possible. Otherwise they will continue to use up system resources (like threads, the CPU, IO, system locks, virtual memory, and so on), all in the name of producing results that will never be needed. The only reason this task stays alive is because the scheduler itself has a reference to it.
Just as with everything discussed above pertaining to non-GC resources, we’d like it to be the case that such a component would offer two methods of cleanup:
- Dispose: to get rid of associated asynchronous computations immediately when the caller knows they no longer need the object.
- Finalization: to get rid of associated resources that are still outstanding when the GC collects the root object that is responsible for managing those asynchronous computations.
You’ll notice that we support cancelation in a first class and deeply-ingrained way in the Task Parallel Library. While not exposed in PLINQ (yet), there is actually cancelation support built-in (though not as fundamental as we’d like (yet)). This is a useful hook to allow us to build support for both resource reclamation models. In this sense, cancelation as a pattern of stopping expensive things from happening is quite similar to resource cleanup. Clearly they aren't identical, but we will need to figure out the specific deltas.
I should also point out that we will prefer and push structured parallelism for many reasons. Parallel.For is an example, where the API looks synchronous but is internally highly parallel. One reason we like this model is that the point at which concurrency begins and ends is very specific. The call won’t return until all work is accounted for and completed. It’s only when you bleed computations into the background after a call returns that everything stated above becomes an issue. This is obviously nice for failures (e.g. you are forced to deal with them right away), but also because it alleviates this problem nicely.
I don’t think we’re at a point where we can recommend definite tried-and-true best practices for cancelation of asynchronous work and how it pertains to resource management. I do think we need to get there by the time we ship Parallel Extensions V1.0. And I think we will. Here’s a snapshot summary of my current thinking, however, and I would love to get feedback on it:
- We should tell people to implement IDisposable and to Cancel tasks inside Dispose, when their classes own unstructured asynchronous computations.
- We may or may not want people to implement a finalizer to do the same. I currently believe we will.
- I am undecided about whether these cancelations should be synchronous. In some sense, they should be since you’d like to know that all resources have definitely been reclaimed. But this would mean blocking (possibly indefinitely) on the finalizer thread. That’s a definite no-no. Blocking in Dispose would mean blocking (possibly indefinitely) inside a finally block. That’s also a no-no, although it’s less severe of one than the finalizer. It just means hosts can’t take over threads as easily when they need to abort them. Thankfully we offer the Task.Cancel method which is non-blocking. Possibly we should suggest synchronous cancelation inside of Dispose, and asynchronous inside of the finalizer.
- If we did do synchronous anywhere, presumably with Task.CancelAndWait, we’d need to recommend a practice for communicating failures. Throwing from Dispose is discouraged, but so is swallowing failures. The kind of code usually run inside of Dispose is much less likely to generate exceptions than running arbitrary tasks full of user code. Catch-22.
- There are some cases we can do the cancelation thing ourselves. Whether we do or not is subject to debate, but I believe we should. If we ensured the scheduler’s references are weak, then once all other code in the process drops the reference, we would not schedule it. This implies that tasks are seldom executed “for effect”, which is certainly a judgment call. It might be worth exposing an option that allows “for effect” tasks to be created not subject to this rule.
- The trickiest case is when a task is already running. For short-running tasks, this may not be a huge concern, but a lot of such tasks do recursively queue up additional ones. It would be nice if the fact that its results are no longer needed somehow flowed automatically to the task, perhaps through cancelation. This also means waking tasks from blocking calls.
It’s interesting to point out that 5 and 6 were part of the original motivation for the inventors of the future abstraction. They noted that representing computations as futures, and allowing the GC to collect them before they run once they’ve become unreachable, effectively makes computations garbage-collectable. This, I think, is a neat idea, particularly if your program uses futures pervasively.
In any case, I wanted to point these subtleties out, and hear any feedback folks out there might have. What I find particularly interesting about concurrency, as we move forward on things like Parallel Extensions, is that there are a lot of subtle implications to the way programs are written. This includes fundamental things like exceptions and resource management. There are other subtle impacts, like whether the ordering of results coming out of a computation matters. PLINQ surfaced this early on, and I didn’t expect the pervasive nature of the issue. Debugging and profiling are also extraordinarily different. I suspect we’ll continue running into many such things throughout the evolution to highly parallel software.
 Wednesday, February 13, 2008
A couple weeks back, we started filming a bunch of Channel9 videos about various aspects of the Parallel Computing team. This is the larger team responsible for Parallel Extensions to .NET, among other things. We'll of course spend some time on Parallel Extensions in upcoming videos.
But who better to kick it off than Burton Smith, a legend in the parallel computing arena?
On General Purpose Supercomputing and the History and Future of Parallelism http://channel9.msdn.com/Showpost.aspx?postid=382639
Burton's the kind of guy that you run into when meandering the hallways, have a 5 minute conversation, and walk away with at least 20 relevant and fascinating papers on parallel computing to go off and read. But now instead of reading 20 papers, you can just go watch the video.
 Saturday, February 09, 2008
Torn reads are possible whenever you read a shared value without synchronization that is either misaligned and/or which spans an addressible pointer-sized region of memory. This can lead to crashes and data corruption due to bogus values being seen. If not careful, torn reads can also violate type safety. If you have a static variable that points to an object of type T, and your program only ever writes references to objects of type T into it, you may still end up accessing a memory location that isn't actually a T. How could this be?
You guessed it. Torn reads apply to pointer values just as much as they do to ordinary values. So a thread reading a pointer in-flux could see bits of its value in separate pieces, blending the state before and after the update. Dereferencing this mutant pointer would lead you off into an unknown place in the address space, and most certainly not to an instance of T, breaking type safety. Since VC++ aligns pointer fields automatically, you'd have to go out of your way with __declspec(align(N)) or an unaligned allocator to create this situation. Similarly with .NET's StructLayoutAttribute. Moreover, it turns out that .NET guards against this problem in its type loader, by rejecting any types containing improperly aligned object references. This is good news, because otherwise a plethora of security vulnerabilities would be possible. But VC++ doesn't offer any such guarantees.
This is another example where trying to program in a lock-free manner can lead to difficulties that aren't present when you stick to ordinary locking.
 Saturday, February 02, 2008
I'm still plugging away at my new concurrency book.
The fact that we have cover art and that it's available for pre-order on Amazon are both great signs that it's getting close to being done:

The full TOC (as of right now) is available here. A few chapters are still in the works, and there's a bit of editing ahead. I've also decided to write Appendices on PFX and CCR. In the meantime, definitely feel free to pick up a PDF of some early content via RoughCuts. Believe it or not, this is a way to provide real-time feedback that can impact the book before it hits the presses.
I have to say this is the most complicated piece of writing I've ever undertaken. Not only does the material require going very deep in a lot of hard areas, but I've noticed that as a community we're learning to speak speak better and to use consistent terminology about various aspects of concurrency. To stay relevant, I'm finding a fair bit of content has had to be reworked several times.
At the same time, and from a selfish standpoint, this book has been a wonderful forcing function to learn everything there is to know about concurrency. Not that this is a realistic goal, mind you, but gosh darnit I'm trying. I'm convinced at this point that the best way to become an expert on something is to try to teach it to other people. And what better way than writing a book? If you can't explain it clearly to a broad, on-the-average-less-expert-than-you audience, you probably have a faulty mental model to begin with. The process of merely trying usually reveals this. I encourage everybody out there to try it. At least once.
 Thursday, January 17, 2008
Most schedulers try to keep the number of runnable threads (within a certain context) equal to the number of processors available. Aside from fairness desires, there’s usually little reason to have more: and in fact, having more can lead to more memory pressure due to the cost of stacks and working set held on the stacks, non-pageable kernel memory, per-thread data structures, etc., and also has execution time costs due to increased pressure on the kernel scheduler, more frequent context switches, and poor locality due to threads being swapped in and out of processors. In extreme cases, blocked threads can build up only for all of them to be awoken and released to wreak havoc on the system at once, hurting scalability.
A naïve approach of one-thread-per-processor works great until a thread on one of these processors blocks, either “silently” as a result of a pagefault or “explicitly” due to IO or a synchronization wait. (I should mention that due to the plethora of hidden blocking calls in the kernel, Win32 user-mode code, and the .NET Framework, a lot of IO and synchronization waiting is “silent” too.) In this case, a processor becomes idle (0% utilization) for some period of time. If there is other work that could be happening instead, this is clearly bad.
Many programs spend most of their time blocked.
Four particular solutions to this problem are commonplace on Windows:
- Create more threads than processors and hope for the best. This trades some amount of runtime efficiency for the insurance that processor time won’t go to waste.
- Periodically poll for blocked threads using some kind of daemon and respond to the presence of one by creating a new thread to execute work. Eventually this thread would go away, for instance when the blocked thread awakens. This is the approach used by the CLR ThreadPool, although it caps the total. (The TPL uses this appraoch today also, but we're changing/augmenting it.) For obvious reasons, this approach is quite flawed: you easily end up with more running threads than processors, have to trade-off more frequent polling--which implies more runtime overhead--with less frequent polling--which adds time to the latency in the scheduler’s response to a blocked thread.
- Block on an IO Completion Port at periodic intervals--e.g. when dispatching a new work item in a ThreadPool-like thing--which has the effect of throttling running threads. This still requires creating more threads than processors, but helps to ensure few of them run at the same time. Unfortunately, it still does lead to more of them actually running than you’d like since the port can only prevent a thread from running when it goes back and blocks on the port in the kernel. But this is only done periodically.
- Specialized systems like SQL have used Fibers in the past to avoid needing full-fledged threads to replace the blocking ones. To do this, they ensure all blocking goes through a cooperative layer, which notifies a user-mode scheduler (UMS). The user-mode scheduler maintains a list of blocked Fibers, but can multiplex runnable Fibers onto threads, keeping the number of threads equal to the number of processors. A thread effectively never blocks, Fibers do, but this requires all blocking to notify the UMS. Aside from extraordinarily closed world systems, this approach doesn’t usually work. That’s because Fibers are not threads and multiplexing entirely different contexts of work onto a shared pool of threads (at blocking points) can easily lead to thread affinity nightmares.
The CLR facilities #4 by funneling all synchronization waits in managed code through one point in the VM codebase. This was done initially to ensure consistent message pumping on STA threads, via CoWaitForMultipleHandles. But it was then exploited in 2.0 to expand the CLR Hosting APIs to enable custom hosts to hook all synchronization calls. This is convenient for building interesting debugging tools, like deadlock detecting hosts.
A fifth approach is often viable and even preferable, and that is to avoid blocking altogether. Often referred to as continuation passing style (CPS), the idea is that, where you’d normally have blocked, the callstack is transformed into a resumable continuation. For an example of this, look at Jeff Richter’s ReaderWriterLockGate class: it’s a reader/writer lock with no blocking. Asynchronous IO is supported by files and sockets on Windows, and enables a similar style of programming. The continuation is ordinarily just a closure object that has enough state to restart itself when the sought-after condition arises. When it does arise, the continuation is scheduled on something like the CLR ThreadPool. This avoids burning any threads while the wait occurs.
For obvious reasons, CPS is usually hard to achieve in .NET: there is no language support for first class continuations in .NET, all synchronization primitives are wait-based, and keeping a whole stack around in memory would be a terrible idea anyway. You’d also need to worry about resources held on the stack, including locks. Instead, you should save only that state which is needed during the continuation. In a message passing system this is much simpler, since most of the program is full of continuations in the form of message handlers. For an example of such a system, check out the Concurrent and Coordination Runtime (CCR) and/or Erlang.
Even in message passing systems, it’s impossible to escape the fundamental blocking issue, since it is platform-wide. And in ordinary imperative programs, the CPS transformation is near-impossible at the leaves of callstacks: unless you have whole program knowledge, who knows what your caller expects? Most APIs are synchronous. Futures and Promises potentially make this style of programming easier, though in the extreme all APIs would need to return a Promise rather than a true value.
Nothing conclusive, just some random thoughts ...
 Wednesday, January 02, 2008
I've been very remiss with blogging lately. This was mostly due to travel (5 weeks out of the past 2 months), but also because I'm focused intensely on the book, and have been super busy working on Parallel Extensions: the December CTP, hiring for our team, planning what we do in the next year, designing stuff, implementing stuff, ... it's been a lot of fun. I've also been trying to learn classical guitar after 15 years of playing electric and some acoustic (metal, rock, blues). It's more challenging than I anticipated... I guess I should have started when I was six.
Somewhere in there, I recorded an episode of .NET Rocks with Carl and Richard about Parallel Extensions: check it out. I haven't listened to it yet, but I distinctly remember having a lot of fun. When the hour was up, I couldn't believe how quickly the time had passed.
Happy New Year, everybody. I promise to blog more in the coming months. Famous last words, eh? ;)
 Thursday, November 29, 2007
Today is an extraordinarily exciting day for me. After about two years of work by several great people across the company, the first Parallel Extensions (a.k.a. Parallel FX) CTP has been posted to MSDN. Check out Soma’s blog post for an overview, and the new MSDN parallel computing dev center for more details. Keep an eye on the team’s new blog too, as we’ll be posting a lot of content there as we make progress on the library; in fact, thanks to Steve (who writes blog posts in his sleep), there’s already a bunch of reading to catch up on!
I began kicking the tires on PLINQ back in October of 2005. In September of 2006, I described PLINQ as “a fully functional prototype” and “research.” Well, it’s come a long way since then, and we’re finally ready for real human beings to start hammering on it. Not only that, but we’ve expanded the scope of the original project significantly, from PLINQ to Parallel FX, to include new imperative data parallel APIs (for situations where expressing a problem in LINQ is unnatural), powerful task APIs that offer waiting and cancelation, all supported by a common work scheduler based on CILK-style work-stealing techniques developed in collaboration with Microsoft Research. And there’s even more to come. Daniel Moth spilled some beans in his screencast on Channel9 when he described the additional data structures, like synchronization primitives and scalable collections, which will come online later. Some of them are even in the new CTP, but have remained internal for now.
The shift to parallel computing will have an industry-wide impact, and will undoubtedly take several phases and many years to tame completely. We have focused on the lowest hanging fruit and the most important foundational shifts in direction we can incite—like encouraging the over-representation of latent parallelism to aid in future scalability—but there are certainly things that the current CTP doesn’t fully address. GPGPUs, verifiable thread safety, automatic parallelism, great tools support, etc., are all topics that are of great interest to us. We have a lot of work to do for the final release of Parallel FX, and expect a whole lot of feedback from the community on specific features and general direction. So let us have it! You can use our Connect site, or even just email me directly at joedu AT you-know-where DOT com.
Consider this an early Christmas present. Now you have something fun to do, in the privacy of your own office, when trying to avoid family members during the holidays. Whoops—did I say that out loud? Enjoy!
 Saturday, November 17, 2007
I recently described an approach to adding immutability to existing, mutability-oriented programming languages such as C#. When motivating such a feature, I alluded to the fact that immutability can make concurrent programming simpler. This seems obvious: a major difficulty in building concurrent systems today is dealing with the ever-changing state of “the world,” requiring synchronization mechanisms to control concurrent reads and writes. This synchronization leads to inefficiencies in the end product, complexity in the design process, and, if not done correctly, bugs: race conditions, deadlocks due to the lack of composability of traditional locking mechanisms, and so forth.
Lock-free algorithms simplify matters (in some ways) by compressing state transitions into single, atomic writes. For instance, a lock-free stack has a single head node; pushing requires swapping the current head with the new node to be enqueued, and popping entails swapping the current head with its current next pointer. I mentioned some of the benefits of lock-freedom here. Sadly these lock-free techniques do not really compose. For instance, if I want to pop from a lock-free stack and push onto another, in an atomic fashion, single-word compare-and-swap is insufficient. We’ll return to this later in the context of immutability.
I will draw an analogy between lock-free algorithms and synchronization involving immutable objects. Imagine an immutable type represents “the world” in a concurrent system. Threads are constantly interacting with the world, by reading its state, and sometimes changing it. Because the world is immutable, we must copy it and publish an entire new one any time we wish our changes to become visible. This is key! It enables optimistic concurrency and synchronization protocols more similar to lock-free algorithms than lock-based algorithms.
Because individual components inside the world can’t change, the entire world can be read from without synchronization. Moreover, the creation of the new world needn’t use synchronization either, so long as we are willing to tolerate the possibility of wasted work, as is always the case with optimistic concurrency: we are being optimistic that the work will indeed not have been wasted, because writes are infrequent. That’s the basic premise of most concurrency algorithms, e.g. reader/writer locks. What requires synchronization is merely the publication of the new world.
internal static World s_theWorld = new World(…); // The initial world. internal void ReadTheWorld() { World w = s_theWorld; // This copy never changes! // Read state from the world … } internal void ChangeTheWorld() { World oldWorld; World newWorld; do { oldWorld = s_theWorld; // Read the world. // Read state to compute the new world … newWorld = new World(…); } while (Interlocked.CompareExchange(ref s_theWorld, newWorld, oldWorld) != oldWorld); }
In this scheme, there will always be inherent races between threads trying to call ReadTheWorld and threads trying to call ChangeTheWorld. This is a basic characteristic of the system. But it is more functional. If ReadTheWorld produces correct answers for any world sequentially, then it will also produce correct answers in a parallel program too, since worlds cannot change. And because writes are guaranteed to retire in order on the CLR 2.0 memory model, we can be assured that ReadTheWorld will not be subject to memory reordering bugs.
This is a very powerful technique, making code that reads from and changes the world much simpler to write, understand, and debug. How many times have you longed for a programming technique where code can be tested in a sequential context and still be guaranteed to produce the right answers in a parallel context?
With all that said, it’s sadly not applicable to every synchronization problem. Some of the problems that arise can be worked around, and others are more difficult. I will outline the most difficult of them.
Problem 1. As with most lock-free algorithms, livelock is a distinct possibility.
Livelock occurs because many threads may attempt to compute a new world simultaneously. If they both read the same old world, only one will succeed at publishing their copy of the new world. The other thread will have to go back ‘round the loop, re-read the current world, compute a new one, and try again. Computing a new world may take some time which, coupled with high arrival times, may lead to unfair forward progress and wasted work due to spinning. We hope, however, that in most cases the frequency of writes is small enough to make this a less pressing issue. Also note that the algorithm shown above is truly wait-free: the failure of one thread indicates that another thread has succeeded in publishing a new world. So forward progress of our system is not compromised by this problem.
Problem 2. ChangeTheWorld may need to perform impure, irreversible side effects.
A nasty issue in today’s programming languages is the reliance on side-effects. (OK, that’s unfair. This is a nasty issue in today’s conventional programming techniques and styles, and the languages we use simply accommodate these techniques. This is mostly because, if we didn’t, developers would find a way to circumvent the system, prefer to use alternative languages, and so on.) But if, corresponding with the update of the world, another side effect must be made, this technique doesn’t accommodate that. We could work around that with other synchronization techniques but it is likely to be cumbersome. In some cases, there is no harm in performing a side effect, so long as it does not “undo” the side effect associated with a newer world.
For instance, say we needed to update a GUI with the results of the new world computation. We would want to ensure the GUI was refreshed with the most recent update. Thankfully we don’t need to worry about the world changing itself, so we may do something like this:
internal void ChangeTheWorld() { … as before … myGuiControl.BeginInvoke(delegate { if (s_theWorld == newWorld) // Update the GUI with newWorld. }); }
This ensures we only update the GUI with the latest world. Imagine the sequence: world A is published, world B is published, we refresh the GUI with world B, we receive the BeginInvoke associated with world A now (out of order, which is perfectly reasonable). We must make sure that the graphical depiction of world A doesn’t overwrite world A. Thankfully the automatic mutual exclusion inherent in GUI programming ensures that our dirty check s_theWorld == newWorld is sufficient to prevent this from occurring. Clearly this technique can’t always apply.
Problem 3. This technique doesn’t expand to support multiple immutable fields.
This is a fundamental flaw with techniques that rely on a single, atomic compare-and-swap. Processor vendors have recently published papers for technologies like transactional memory and multi-word compare-and-swap (and in fact, there is plenty of research into pure software implementations of both), which would allow us to fix this problem.
The title of this article, by the way, is a play on Wadler’s paper ‘Linear types can change the world!’ I highly recommend that paper to anybody interested in immutability, isolation, and purity. This paper demonstrates type system support for linearity which, in essence, enables safe reading and mutual exclusion via the type system instead of locking. There is only one world, after all, the paper argues, so the traditional functional programming approach of disallowing mutability is actually a bit unrealistic.
In summary, immutable types add a lot of simultaneous power and simplicity to parallel programming. They are certainly not a panacea, since they require shifting to a more functional style of programming, where side-effects are discouraged and state is updated by making altered copies. This is often non-trviial. But this is one of many steps in what I consider to be the right direction. A direction that will enable us to tolerate large amounts of parallelism... without causing all of us to die of race-condition-induced anxiety attacks before the age of 30.
 Sunday, November 11, 2007
I’ve been asked a number of times about immutable types support for C#. Although C# doesn't offer first class language support in the way that F# does, you can get pretty far with what you do have in your hands already. Nothing prevents you from creating immutable data structures today, of course, but the problem is that there’s no compiler or runtime support to ensure you’ve done it right.
I just hacked together some new attributes and a handful of FxCop rules as an experiment. I’ve been very happy with the result. Sure it’s not baked into the language, but it’s a start. If there’s any interest, I can make the code available so you can play with it too.
Attributes and analysis
Imagine we had an ImmutableAttribute. Annotating a type with it indicates that objects of that particular type are immutable, i.e. that their state never changes after being constructed. This is great from a concurrency standpoint because it means access to such objects do not require synchronization. This can lead to more efficient code that not only has a higher chance of being correct but is also vastly easier to maintain. Well, what kind of restrictions would such a type be subject to?
1. Immutable types must have only initonly fields.
The first rule takes advantage of existing CLR type system and language support for initonly fields (a.k.a. readonly in C#). Marking a field as initonly ensures it is never written to after the constructor has finished executing. So long as all fields are initonly, the class is effectively already “shallow” immutable.
2. Immutable types must only contain fields of immutable types.
The second rule ensures transitivity, or “deep” immutability. The mutability of a complex object is typically, but not always, comprised of not only its own fields but also the state in the objects it refers to. With this rule and the prior rule, we’re about 90% there.
3. Immutable types should only subclass other immutable types.
To give the appearance that a particular object is immutable, that object’s type must not depend on other types that are mutable, as articulated by the previous rule. The ‘base’ reference is effectively just another field, and so this rule is derived from the previous one. If an immutable type could inherit mutable members and fields, then it wouldn’t really be immutable after all.
4. Mutable types should not subclass immutable types.
Similar to the previous rule, it is also a bad thing if a mutable subtype can override behavior from the subtype, giving the appearance of mutability. Say we have an immutable class IC with a virtual method f, and some mutable subclass MC overrides f to introduce logic that logically mutates the state of an object. Although the rules above are sufficient to ensure that the object is physically immutable, this can circumvent immutability safety through polymorphism. A related piece of advice: public immutable types should be sealed, to prevent outside classes that do not abide by immutability analysis from breaking code which assumes a given type is immutable. Alternatively, virtual members can be eliminated.
5. Immutable types must not leak the ‘this’ reference during construction.
This rule is subtle. Although initonly ensures fields are never written to after construction, these fields may be written to any number of times while an object is actively being constructed. If some code called during construction publishes a reference to the object (e.g. by storing it in a static variable), then other threads might access the object while it still appears to be mutable. They may witness a partially initialized object, an object whose fields are still changing, and so on.
And that’s it! 5 simple rules. It may sound complicated, but the code to perform the static analysis for all but the last one is straightforward and a dozen-or-so lines apiece. A few things are worth mentioning. First, the CLR’s memory model ensures that, after an object is constructed and published, reads of its fields cannot be reordered to break immutability. Additionally, there are many immutable types in the CLR today that are not verified as such. So in my analysis rules, I have hard-coded a set of well known immutable types so that you can use them w/out problem: Object, String, DateTime, TimeSpan, Boolean, Byte, SByte, Int16, Int32, Int64, IntPtr, UInt16, UInt32, UInt64, UIntPtr, Decimal, Double, Single, and ValueType. Ideally if we supported this in the .NET Framework, we'd annotate them.
Impact to imperative programming
Programming in an immutable world is rather tricky. As someone who has done most of his programming for the past 10+ years in C-style languages, I just take for granted that data structures change over time. With immutability, there tends to be a whole lot more copying and functional-style function calling, where data structures are passed as an input argument and the “mutated” copy is returned as an output argument. I’m trying to kick the mutability habit, since I fully believe immutability is one key to being successful with massive degrees of parallelism. And it usually leads to cleaner code too.
But it’s hard. Using something as simple as an array field on an immutable type will fail the above rules, since the CLR’s array types are mutable. I’ll explore building one below, but this probably points to a need for better immutability support in the .NET Framework. It’s not too difficult to imagine providing base classes for common needs when building immutable data structures.
Circumventing the analysis
As you begin to explore immutable types in a bit more depth, you’ll realize there are often cases where immutability-by-cleverness is possible. That is to say, although one or more of the rules above have been violated, the end result still appears to be immutable. I can build an immutable list out of a linked list to avoid depending on CLR arrays, and mark the nodes as immutable, but they must refer to elements stored within the list. Should we require the elements to also be immutable? Perhaps, but perhaps not, depending on whether you consider the state of the list to also include the state of the elements inside it. Usually that wouldn't be the case. And, besides, if we know what we’re doing, we can create an immutable list based on an array anyway, which enables O(1) IList<T>-style random access. We just need to be careful to encapsulate the array object and to never store an element into it post-construction.
To facilitate working around some of the rules in ways that are often necessary, I have provided options on ImmutableAttribute to suppress certain checks. Additionally, there is a MutableAttribute which can mark certain fields to indicate they are not subject to the same restrictions as other fields on an immutable type.
An ImmutableList<T>
As an illustration, here is an ImmutableList<T>. It implements IList<T>, but sadly it must throw exceptions in several circumstances because both IList<T> and ICollection<T> offer methods that are intrinsically mutable. Undoubtedtly there are bugs because I whipped it up quickly and have omitted a lot of needed error checking. I just wanted to give the general idea of how it might be done:
/// <summary> /// A list that has been written to be observationally immutable. A mutable array /// is used as the backing store for the list, but no mutable operations are offered. /// </summary> /// <typeparam name="T">The type of elements contained in the list.</typeparam> [Immutable] public sealed class ImmutableList<T> : IList<T> {
[Mutable] private readonly T[] m_array;
/// <summary> /// Create a new list. /// </summary> public ImmutableList() { m_array = new T[0]; }
/// <summary> /// Create a new list, copying elements from the specified array. /// </summary> /// <param name="arrayToCopy">An array whose contents will be copied.</param> public ImmutableList(T[] arrayToCopy) { m_array = new T[arrayToCopy.Length]; Array.Copy(arrayToCopy, m_array, arrayToCopy.Length); }
/// <summary> /// Create a new list, copying elements from the specified enumerable. /// </summary> /// <param name="enumerableToCopy">An enumerable whose contents will /// be copied.</param> public ImmutableList(IEnumerable<T> enumerableToCopy) { m_array = new List<T>(enumerableToCopy).ToArray(); }
/// <summary> /// Retrieves the immutable count of the list. /// </summary> public int Count { get { return m_array.Length; } }
/// <summary> /// A helper method used below when a mutable method is accessed. Several /// operations on the collections interfaces IList<T> and /// ICollection<T> are mutable, so we cannot support them. We offer /// immutable versions of each. /// </summary> private static void ThrowMutableException(string copyMethod) { throw new InvalidOperationException( String.Format("Cannot mutate an immutable list; " + "see copying method '{0}'", copyMethod)); }
/// <summary> /// Whether the list is read only: because the list is immutable, this /// is always true. /// </summary> public bool IsReadOnly { get { return true; } }
/// <summary> /// Accesses the element at the specified index. Because the list is /// immutable, the setter will always throw an exception. /// </summary> /// <param name="index">The index to access.</param> /// <returns>The element at the specified index.</returns> public T this[int index] { get { return m_array[index]; } set { ThrowMutableException("CopyAndSet"); } }
/// <summary> /// Copies the list and adds a new value at the end. /// </summary> /// <param name="value">The value to add.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndAdd(T value) { T[] newArray = new T[m_array.Length + 1]; m_array.CopyTo(newArray, 0); newArray[m_array.Length] = value; return new ImmutableList<T>(newArray); }
/// <summary> /// Returns a new, cleared (empty) immutable list. /// </summary> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndClear() { return new ImmutableList<T>(new T[0]); }
/// <summary> /// Copies the list and modifies the specific value at the index provided. /// </summary> /// <param name="index">The index whose value is to be changed.</param> /// <param name="item">The value to store at the specified index.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndSet(int index, T item) { T[] newArray = new T[m_array.Length]; m_array.CopyTo(newArray, 0); newArray[index] = item; return new ImmutableList<T>(newArray); }
/// <summary> /// Copies the list and removes a particular element. /// </summary> /// <param name="item">The element to remove.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndRemove(T item) { int index = IndexOf(item); if (index == -1) { throw new ArgumentException("Item not found in list."); }
return CopyAndRemoveAt(index); }
/// <summary> /// Copies the list and removes a particular element. /// </summary> /// <param name="index">The index of the element to remove.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndRemoveAt(int index) { T[] newArray = new T[m_array.Length - 1]; Array.Copy(m_array, newArray, index); Array.Copy(m_array, index + 1, newArray, index, m_array.Length - index - 1); return new ImmutableList<T>(newArray); }
/// <summary> /// Copies the list adn inserts a particular element. /// </summary> /// <param name="index">The index at which to insert an element.</param> /// <param name="item">The element to insert.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndInsert(int index, T item) { T[] newArray = new T[m_array.Length + 1]; Array.Copy(m_array, newArray, index); newArray[index] = item; Array.Copy(m_array, index, newArray, index + 1, m_array.Length - index); return new ImmutableList<T>(newArray); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void ICollection<T>.Add(T item) { ThrowMutableException("CopyAndAdd"); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void ICollection<T>.Clear() { ThrowMutableException("CopyAndClear"); }
/// <summary> /// Checks whether the specified item is contained in the list. /// </summary> /// <param name="item">The item to search for.</param> /// <returns>True if the item is found, false otherwise.</returns> public bool Contains(T item) { return Array.IndexOf<T>(m_array, item) != -1; }
/// <summary> /// Copies the contents of this list to a destination array. /// </summary> /// <param name="array">The array to copy elements to.</param> /// <param name="index">The index at which copying begins.</param> public void CopyTo(T[] array, int index) { m_array.CopyTo(array, index); }
/// <summary> /// Retrieves an enumerator for the list's collections. /// </summary> /// <returns>An enumerator.</returns> public IEnumerator<T> GetEnumerator() { for (int i = 0; i < m_array.Length; i++) { yield return m_array[i]; } }
/// <summary> /// Retrieves an enumerator for the list's collections. /// </summary> /// <returns>An enumerator.</returns> IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable<T>)this).GetEnumerator(); }
/// <summary> /// Finds the index of the specified element. /// </summary> /// <param name="item">An item to search for.</param> /// <returns>The index of the item, or -1 if it was not found.</returns> public int IndexOf(T item) { return Array.IndexOf<T>(m_array, item); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void IList<T>.Insert(int index, T item) { ThrowMutableException("CopyAndInsert"); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> bool ICollection<T>.Remove(T item) { ThrowMutableException("CopyAndRemove"); return false; }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void IList<T>.RemoveAt(int index) { ThrowMutableException("CopyAndRemoveAt"); }
}
I won’t spend much time going over this code. Just notice that the type is marked with the ImmutableAttribute, the array field is marked with the MutableAttribute (since it’s not itself an immutable type and would fail the analysis otherwise), and that any operations that modify the list must make a copy of the entire thing.
Summary
This has been an interesting exercise. Through it, I have come to realize that first class immutability in the type system is not such a farfetched dream. The most onerous aspect to it is probably the restrictions it imposes on subclassing in the programming model, effectively bifurcating the type system into those types that are mutable and those types that are immutable. But, in the end, I’m not so sure it’s too bad a problem: interchanging the two seems like a very bad idea anyway.
Feedback on all of this would be appreciated. Do you see it as useful? If you had it, would you use it in your programs today? Do you believe that it is one step needed (of many!) to bring us towards a world in which building concurrent programs is simpler?
 Saturday, November 10, 2007
There are several docs out there that describe the CLR memory, most notably this article.
When describing the model, one can either use acquire/release, barrier/fence, or happens-before terminology. They all acheive the same goal, so I will simply choose one, acquire/release: an acquire operation means no loads or stores may move before it, and a release operation means no loads or stores may move after it. I can explain it with such simple terms because the CLR is homogeneous in the kinds of operations it permits or disallows to cross such a barrier, e.g. there's never a case where loads may cross such a chasm but stores may not.
Despite the great article referenced above, I find that it’s still not entirely straightforward. It is important to code to a well-understood abstract model when writing lock-free code. For reference, here are the rules as I have come to understand them stated as simply as I can:
- Rule 1: Data dependence among loads and stores is never violated.
- Rule 2: All stores have release semantics, i.e. no load or store may move after one.
- Rule 3: All volatile loads are acquire, i.e. no load or store may move before one.
- Rule 4: No loads and stores may ever cross a full-barrier (e.g. Thread.MemoryBarrier, lock acquire, Interlocked.Exchange, Interlocked.CompareExchange, etc.).
- Rule 5: Loads and stores to the heap may never be introduced.
- Rule 6: Loads and stores may only be deleted when coalescing adjacent loads and stores from/to the same location.
Note that by this definition, non-volatile loads are not required to have any sort of barrier associated with them. So loads may be freely reordered, and writes may move after them (though not before, due to Rule 2). With this model, the only true case where you’d truly need the strength of a full-barrier provided by Rule 4 is to prevent reordering in the case where a store is followed by a volatile load. Without the barrier, the instructions may reorder.
It is unfortunate that we’ve never gone to the level of detail and thoroughness the Java memory model folks have gone to. We have constructed our model over years of informal work and design-by-example, but something about the JMM approach is far more attractive. Lastly, what I’ve described applies to the implemented memory model, and not to what was specified in ECMA. So this is apt to change from one implementation to the next. I have no idea what Mono implements, for example.
|
|
Recent Entries:
Search:
Browse by Date:
| | Sun | Mon | Tue | Wed | Thu | Fri | Sat | | 27 | 28 | 29 | 30 | 31 | 1 | 2 | | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | 10 | 11 | 12 | 13 | 14 | 15 | 16 | | 17 | 18 | 19 | 20 | 21 | 22 | 23 | | 24 | 25 | 26 | 27 | 28 | 29 | 1 | | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Browse by Category:
Notables:
|