|
Personal Info:
Joe  leads the architecture of an experimental OS's developer platform, where
he is also chief architect of its programming language. His current mission is to enable
writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe
founded the Parallel Extensions to .NET project.
He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife,
writing books, writing music,
studying music theory & mathematics, and doing anything involving food & wine.
My books
My music
Disclaimer:
The content of this site are my own personal opinions and do
not represent my employer's view in anyway.
© 2012, Joe Duffy
|
|
 Wednesday, February 27, 2008
I’ve mentioned before that the CLR has a central wait routine that is used by any synchronization waits in managed code. This covers WaitHandles (AutoResetEvent, ManualResetEvent, etc.), CLR Monitors (Enter, Wait), Thread.Join, any APIs that use such things, and the like. This routine even gets involved for waits that are internal to the CLR VM itself. This is primarily done so that the runtime can pump appropriately on STAs, and was later used to experiment with fiber-mode scheduler in SQL Server. Two years ago I showed how to use these capabilities to build a deadlock detection tool via the CLR’s hosting APIs. Sadly IO-based waits (like FileStream.Read) do not route through this.
The System.Threading.SynchronizationContext class has a very cool (but not widely known) feature that enables you to extend this central wait routine. To do so requires four steps: subclass SynchronizationContext; call base.SetWaitNotificationRequired; override the virtual Wait method to contain some custom wait logic; and then register your SynchronizationContext via the static SynchronizationContext.SetSynchronizationContext method. After you do this, most waits that occur on that thread will be redirected through your custom Wait method.
Here's a very simple example of this:
using System; using System.Threading;
class BlockingNotifySynchronizationContext : SynchronizationContext { public BlockingNotifySynchronizationContext() { SetWaitNotificationRequired(); }
public override int Wait( IntPtr[] waitHandles, bool waitAll, int millisecondsTimeout) { Console.WriteLine("Begin wait: {0} handles for {1} ms", waitHandles.Length, millisecondsTimeout); int ret = base.Wait(waitHandles, waitAll, millisecondsTimeout); Console.WriteLine("Finished wait"); return ret; } }
class Program { public static void Main() { SynchronizationContext.SetSynchronizationContext( new BlockingNotifySynchronizationContext()); ManualResetEvent mre = new ManualResetEvent(false); mre.WaitOne(1000, false); } }
If you run this, you'll see some messages printed to the console to do with beginning and finishing waits.
A few things are worth noting:
- The Wait signature looks a lot like WaitForMultipleObjects. In fact, it's fairly trivial to turn around and call it via a P/Invoke. Recovering from APCs is a tad tricky however, and you'd have to do all of your own timeout management, message pumping, and the like.
- You receive an IntPtr[], making it incredibly difficult to correlate the objects being waited on with the actual synchronization objects from which they came (e.g. Monitors, EventWaitHandles, etc.).
- The code that runs inside Wait is the wait itself. In other words, when you return, whatever code initiated the wait is going to assume that the API is being honest and truthful.
Another subtlety is that this code, as written, is subject to stack overflow. Why is that? In this particular instance, Console.WriteLine may need to block internally because it automatically serializes access to the output stream. Well, when that blocks, it just goes through the same central wait routine, which calls back out, and so on and so forth. Obviously this extends to any code that uses locks, including CLR services like cctors. So the code you write here needs to be very carefully written so as not to ever block recursively.
Notice that some waits do not call out. The reason is that the callout stems from a routine deep inside the CLR VM itself. Some waits may occur while a GC is in progress, at which point it’s illegal to invoke managed code. The CLR just reverts to using its own default wait logic in such cases.
Lastly this is not a foolproof mechanism. Other components can register their own SynchronizationContexts, replacing the context you’ve installed completely. This may mean you miss some blocking calls. If you are building a ThreadPool, you can always reset it each time the thread is returned, or even use your own ExecutionContexts when running them. It is also possible that such a context will exist by the time you get around to installing your own. For example, ASP.NET, WinForms, and WPF use custom SynchronizationContexts.
If such a context exists already when you install this custom one, you can always defer to it for things like CreateCopy, Send, Post, and Wait. For example, here’s a SynchronizationContext implementation that allows custom before/after wait actions, but otherwise relies on the existing SynchronizationContext (if any) for things like Send, Post, and Wait:
using System; using System.Threading;
delegate object PreWaitNotification( IntPtr[] waitHandles, bool WaitAll, int millisecondsTimeout); delegate void PostWaitNotification( IntPtr[] waitHandles, bool WaitAll, int millisecondsTimeout, int ret, Exception ex, object state);
class BlockingNotifySynchronizationContext : SynchronizationContext { private SynchronizationContext m_captured; private PreWaitNotification m_pre; private PostWaitNotification m_post;
public BlockingNotifySynchronizationContext( PreWaitNotification pre, PostWaitNotification post) : this(SynchronizationContext.Current, pre, post) { }
public BlockingNotifySynchronizationContext( SynchronizationContext captured, PreWaitNotification pre, PostWaitNotification post) { SetWaitNotificationRequired();
m_captured = captured; m_pre = pre; m_post = post; }
public override SynchronizationContext CreateCopy() { return new BlockingNotifySynchronizationContext( m_captured == null ? null : m_captured.CreateCopy(), m_pre, m_post); }
public override void Post(SendOrPostCallback cb, object s) { if (m_captured != null) m_captured.Post(cb, s); else base.Post(cb, s); }
public override void Send(SendOrPostCallback cb, object s) { if (m_captured != null) m_captured.Send(cb, s); else base.Send(cb, s); }
public override int Wait(IntPtr[] waitHandles, bool waitAll, int millisecondsTimeout) { object s = m_pre(waitHandles, waitAll, millisecondsTimeout); int ret = 0; Exception ex = null;
try { if (m_captured != null) ret = m_captured.Wait(waitHandles, waitAll, millisecondsTimeout); else ret = base.Wait(waitHandles, waitAll, millisecondsTimeout); } catch (Exception e) { ex = e; throw; } finally { m_post(waitHandles, waitAll, millisecondsTimeout, ret, ex, s); } return ret; } }
class Program { public static void Main() { SynchronizationContext.SetSynchronizationContext( new BlockingNotifySynchronizationContext( delegate { Console.WriteLine("PRE"); return null; }, delegate { Console.WriteLine("POST"); } ) ); ManualResetEvent mre = new ManualResetEvent(false); mre.WaitOne(1000, false); } }
That’s a fair bit of code, but it's mostly boilerplate. It allows you to easily specify a pre/post action to be invoked upon each blocking call, and will work on ASP.NET, GUI threads, and the like. The pre action can return an object for the post action to inspect. And the post action is given the return value and exception (if any). If no SynchronizationContext was present when installed, it just defers to the base SynchronizationContext implementation of Send, Post, and Wait.
Now what you actually do inside those callbacks, I suppose, is entirely your business …
 Tuesday, February 19, 2008
 Sunday, February 17, 2008
A long time ago, I wrote that you’d never need to write another finalizer again. I’m sorry to say, my friends, that I may have (unintentionally) lied. In my defense, the blog title where I proclaimed this fact did end with “well, almost never.”
Finalizers have historically been used to ensure reclamation of resources that are finite or outside of the purview of the CLR’s GC. Native memory and Windows kernel HANDLEs immediately come to mind. Without a finalizer, resources would leak; server apps would die, client apps would page like crazy, and life would be a mess. For such resources, properly authored frameworks also provide IDisposable implementations to eagerly and deterministically reclaim the resources when they are definitely done. Three years ago, I wrote a lengthy treatise on the subject.
The finalizer is there as a backstop. It is often meant to clean up after bugs , such as when a developer forgets to call Dispose in the first place, tried to but failed due to some runtime execution path skipping it (often exceptions-related), or a framework or library author hasn’t respect the transitive IDisposable rule, meaning that eager reclamation isn’t even possible. It also avoids tricky ref-counting situations as are prevalent in native code: since the GC handles tracking references, you, the programmer, can avoid needing to worry about such low-level mechanics. In all honesty, the finalizer’s main purpose is probably that we wanted to facilitate a RAD and VB-like development experience on .NET, where programmers don’t need to think about resource management at all, unlike C++ where it’s in your face. While one could reasonable argue that IDisposable is all you need (the C++ argument), that would have gone against this goal.
Concurrency changes things a little bit. A thread is just another resource outside of the purview of the CLR’s GC, and is actually backed by a kernel object and associated resources like non-pageable memory for the kernel stack, some data for the TEB and TLS, and 1MB of user-mode stack, to name a few. They also add pressure to the thread scheduler. Threads are fairly expensive to keep around, and “user” code is responsible for creating and destroying them.
Now, it’s true that we are moving towards a world where threads and logical tasks are not one and the same. This is a ThreadPool model. But it’s also true that a task that is running on a thread is effectively keeping that thread alive, and perhaps more concerning, preventing other tasks from running on it. Use of a resource is a kind of actual resource itself, although more difficult to quantify.
So, what does all of this have to do with finalization?
If some object kicks off a bunch of asynchronous work and then becomes garbage—i.e. the consumer of that object no longer needs to access it’s information—then it’s possible (or even likely) that any outstanding asynchronous operations ought to be killed as soon as possible. Otherwise they will continue to use up system resources (like threads, the CPU, IO, system locks, virtual memory, and so on), all in the name of producing results that will never be needed. The only reason this task stays alive is because the scheduler itself has a reference to it.
Just as with everything discussed above pertaining to non-GC resources, we’d like it to be the case that such a component would offer two methods of cleanup:
- Dispose: to get rid of associated asynchronous computations immediately when the caller knows they no longer need the object.
- Finalization: to get rid of associated resources that are still outstanding when the GC collects the root object that is responsible for managing those asynchronous computations.
You’ll notice that we support cancelation in a first class and deeply-ingrained way in the Task Parallel Library. While not exposed in PLINQ (yet), there is actually cancelation support built-in (though not as fundamental as we’d like (yet)). This is a useful hook to allow us to build support for both resource reclamation models. In this sense, cancelation as a pattern of stopping expensive things from happening is quite similar to resource cleanup. Clearly they aren't identical, but we will need to figure out the specific deltas.
I should also point out that we will prefer and push structured parallelism for many reasons. Parallel.For is an example, where the API looks synchronous but is internally highly parallel. One reason we like this model is that the point at which concurrency begins and ends is very specific. The call won’t return until all work is accounted for and completed. It’s only when you bleed computations into the background after a call returns that everything stated above becomes an issue. This is obviously nice for failures (e.g. you are forced to deal with them right away), but also because it alleviates this problem nicely.
I don’t think we’re at a point where we can recommend definite tried-and-true best practices for cancelation of asynchronous work and how it pertains to resource management. I do think we need to get there by the time we ship Parallel Extensions V1.0. And I think we will. Here’s a snapshot summary of my current thinking, however, and I would love to get feedback on it:
- We should tell people to implement IDisposable and to Cancel tasks inside Dispose, when their classes own unstructured asynchronous computations.
- We may or may not want people to implement a finalizer to do the same. I currently believe we will.
- I am undecided about whether these cancelations should be synchronous. In some sense, they should be since you’d like to know that all resources have definitely been reclaimed. But this would mean blocking (possibly indefinitely) on the finalizer thread. That’s a definite no-no. Blocking in Dispose would mean blocking (possibly indefinitely) inside a finally block. That’s also a no-no, although it’s less severe of one than the finalizer. It just means hosts can’t take over threads as easily when they need to abort them. Thankfully we offer the Task.Cancel method which is non-blocking. Possibly we should suggest synchronous cancelation inside of Dispose, and asynchronous inside of the finalizer.
- If we did do synchronous anywhere, presumably with Task.CancelAndWait, we’d need to recommend a practice for communicating failures. Throwing from Dispose is discouraged, but so is swallowing failures. The kind of code usually run inside of Dispose is much less likely to generate exceptions than running arbitrary tasks full of user code. Catch-22.
- There are some cases we can do the cancelation thing ourselves. Whether we do or not is subject to debate, but I believe we should. If we ensured the scheduler’s references are weak, then once all other code in the process drops the reference, we would not schedule it. This implies that tasks are seldom executed “for effect”, which is certainly a judgment call. It might be worth exposing an option that allows “for effect” tasks to be created not subject to this rule.
- The trickiest case is when a task is already running. For short-running tasks, this may not be a huge concern, but a lot of such tasks do recursively queue up additional ones. It would be nice if the fact that its results are no longer needed somehow flowed automatically to the task, perhaps through cancelation. This also means waking tasks from blocking calls.
It’s interesting to point out that 5 and 6 were part of the original motivation for the inventors of the future abstraction. They noted that representing computations as futures, and allowing the GC to collect them before they run once they’ve become unreachable, effectively makes computations garbage-collectable. This, I think, is a neat idea, particularly if your program uses futures pervasively.
In any case, I wanted to point these subtleties out, and hear any feedback folks out there might have. What I find particularly interesting about concurrency, as we move forward on things like Parallel Extensions, is that there are a lot of subtle implications to the way programs are written. This includes fundamental things like exceptions and resource management. There are other subtle impacts, like whether the ordering of results coming out of a computation matters. PLINQ surfaced this early on, and I didn’t expect the pervasive nature of the issue. Debugging and profiling are also extraordinarily different. I suspect we’ll continue running into many such things throughout the evolution to highly parallel software.
 Wednesday, February 13, 2008
A couple weeks back, we started filming a bunch of Channel9 videos about various aspects of the Parallel Computing team. This is the larger team responsible for Parallel Extensions to .NET, among other things. We'll of course spend some time on Parallel Extensions in upcoming videos.
But who better to kick it off than Burton Smith, a legend in the parallel computing arena?
On General Purpose Supercomputing and the History and Future of Parallelism http://channel9.msdn.com/Showpost.aspx?postid=382639
Burton's the kind of guy that you run into when meandering the hallways, have a 5 minute conversation, and walk away with at least 20 relevant and fascinating papers on parallel computing to go off and read. But now instead of reading 20 papers, you can just go watch the video.
 Saturday, February 09, 2008
Torn reads are possible whenever you read a shared value without synchronization that is either misaligned and/or which spans an addressible pointer-sized region of memory. This can lead to crashes and data corruption due to bogus values being seen. If not careful, torn reads can also violate type safety. If you have a static variable that points to an object of type T, and your program only ever writes references to objects of type T into it, you may still end up accessing a memory location that isn't actually a T. How could this be?
You guessed it. Torn reads apply to pointer values just as much as they do to ordinary values. So a thread reading a pointer in-flux could see bits of its value in separate pieces, blending the state before and after the update. Dereferencing this mutant pointer would lead you off into an unknown place in the address space, and most certainly not to an instance of T, breaking type safety. Since VC++ aligns pointer fields automatically, you'd have to go out of your way with __declspec(align(N)) or an unaligned allocator to create this situation. Similarly with .NET's StructLayoutAttribute. Moreover, it turns out that .NET guards against this problem in its type loader, by rejecting any types containing improperly aligned object references. This is good news, because otherwise a plethora of security vulnerabilities would be possible. But VC++ doesn't offer any such guarantees.
This is another example where trying to program in a lock-free manner can lead to difficulties that aren't present when you stick to ordinary locking.
 Saturday, February 02, 2008
I'm still plugging away at my new concurrency book.
The fact that we have cover art and that it's available for pre-order on Amazon are both great signs that it's getting close to being done:

The full TOC (as of right now) is available here. A few chapters are still in the works, and there's a bit of editing ahead. I've also decided to write Appendices on PFX and CCR. In the meantime, definitely feel free to pick up a PDF of some early content via RoughCuts. Believe it or not, this is a way to provide real-time feedback that can impact the book before it hits the presses.
I have to say this is the most complicated piece of writing I've ever undertaken. Not only does the material require going very deep in a lot of hard areas, but I've noticed that as a community we're learning to speak speak better and to use consistent terminology about various aspects of concurrency. To stay relevant, I'm finding a fair bit of content has had to be reworked several times.
At the same time, and from a selfish standpoint, this book has been a wonderful forcing function to learn everything there is to know about concurrency. Not that this is a realistic goal, mind you, but gosh darnit I'm trying. I'm convinced at this point that the best way to become an expert on something is to try to teach it to other people. And what better way than writing a book? If you can't explain it clearly to a broad, on-the-average-less-expert-than-you audience, you probably have a faulty mental model to begin with. The process of merely trying usually reveals this. I encourage everybody out there to try it. At least once.
 Thursday, January 17, 2008
Most schedulers try to keep the number of runnable threads (within a certain context) equal to the number of processors available. Aside from fairness desires, there’s usually little reason to have more: and in fact, having more can lead to more memory pressure due to the cost of stacks and working set held on the stacks, non-pageable kernel memory, per-thread data structures, etc., and also has execution time costs due to increased pressure on the kernel scheduler, more frequent context switches, and poor locality due to threads being swapped in and out of processors. In extreme cases, blocked threads can build up only for all of them to be awoken and released to wreak havoc on the system at once, hurting scalability.
A naïve approach of one-thread-per-processor works great until a thread on one of these processors blocks, either “silently” as a result of a pagefault or “explicitly” due to IO or a synchronization wait. (I should mention that due to the plethora of hidden blocking calls in the kernel, Win32 user-mode code, and the .NET Framework, a lot of IO and synchronization waiting is “silent” too.) In this case, a processor becomes idle (0% utilization) for some period of time. If there is other work that could be happening instead, this is clearly bad.
Many programs spend most of their time blocked.
Four particular solutions to this problem are commonplace on Windows:
- Create more threads than processors and hope for the best. This trades some amount of runtime efficiency for the insurance that processor time won’t go to waste.
- Periodically poll for blocked threads using some kind of daemon and respond to the presence of one by creating a new thread to execute work. Eventually this thread would go away, for instance when the blocked thread awakens. This is the approach used by the CLR ThreadPool, although it caps the total. (The TPL uses this appraoch today also, but we're changing/augmenting it.) For obvious reasons, this approach is quite flawed: you easily end up with more running threads than processors, have to trade-off more frequent polling--which implies more runtime overhead--with less frequent polling--which adds time to the latency in the scheduler’s response to a blocked thread.
- Block on an IO Completion Port at periodic intervals--e.g. when dispatching a new work item in a ThreadPool-like thing--which has the effect of throttling running threads. This still requires creating more threads than processors, but helps to ensure few of them run at the same time. Unfortunately, it still does lead to more of them actually running than you’d like since the port can only prevent a thread from running when it goes back and blocks on the port in the kernel. But this is only done periodically.
- Specialized systems like SQL have used Fibers in the past to avoid needing full-fledged threads to replace the blocking ones. To do this, they ensure all blocking goes through a cooperative layer, which notifies a user-mode scheduler (UMS). The user-mode scheduler maintains a list of blocked Fibers, but can multiplex runnable Fibers onto threads, keeping the number of threads equal to the number of processors. A thread effectively never blocks, Fibers do, but this requires all blocking to notify the UMS. Aside from extraordinarily closed world systems, this approach doesn’t usually work. That’s because Fibers are not threads and multiplexing entirely different contexts of work onto a shared pool of threads (at blocking points) can easily lead to thread affinity nightmares.
The CLR facilities #4 by funneling all synchronization waits in managed code through one point in the VM codebase. This was done initially to ensure consistent message pumping on STA threads, via CoWaitForMultipleHandles. But it was then exploited in 2.0 to expand the CLR Hosting APIs to enable custom hosts to hook all synchronization calls. This is convenient for building interesting debugging tools, like deadlock detecting hosts.
A fifth approach is often viable and even preferable, and that is to avoid blocking altogether. Often referred to as continuation passing style (CPS), the idea is that, where you’d normally have blocked, the callstack is transformed into a resumable continuation. For an example of this, look at Jeff Richter’s ReaderWriterLockGate class: it’s a reader/writer lock with no blocking. Asynchronous IO is supported by files and sockets on Windows, and enables a similar style of programming. The continuation is ordinarily just a closure object that has enough state to restart itself when the sought-after condition arises. When it does arise, the continuation is scheduled on something like the CLR ThreadPool. This avoids burning any threads while the wait occurs.
For obvious reasons, CPS is usually hard to achieve in .NET: there is no language support for first class continuations in .NET, all synchronization primitives are wait-based, and keeping a whole stack around in memory would be a terrible idea anyway. You’d also need to worry about resources held on the stack, including locks. Instead, you should save only that state which is needed during the continuation. In a message passing system this is much simpler, since most of the program is full of continuations in the form of message handlers. For an example of such a system, check out the Concurrent and Coordination Runtime (CCR) and/or Erlang.
Even in message passing systems, it’s impossible to escape the fundamental blocking issue, since it is platform-wide. And in ordinary imperative programs, the CPS transformation is near-impossible at the leaves of callstacks: unless you have whole program knowledge, who knows what your caller expects? Most APIs are synchronous. Futures and Promises potentially make this style of programming easier, though in the extreme all APIs would need to return a Promise rather than a true value.
Nothing conclusive, just some random thoughts ...
 Wednesday, January 02, 2008
I've been very remiss with blogging lately. This was mostly due to travel (5 weeks out of the past 2 months), but also because I'm focused intensely on the book, and have been super busy working on Parallel Extensions: the December CTP, hiring for our team, planning what we do in the next year, designing stuff, implementing stuff, ... it's been a lot of fun. I've also been trying to learn classical guitar after 15 years of playing electric and some acoustic (metal, rock, blues). It's more challenging than I anticipated... I guess I should have started when I was six.
Somewhere in there, I recorded an episode of .NET Rocks with Carl and Richard about Parallel Extensions: check it out. I haven't listened to it yet, but I distinctly remember having a lot of fun. When the hour was up, I couldn't believe how quickly the time had passed.
Happy New Year, everybody. I promise to blog more in the coming months. Famous last words, eh? ;)
 Thursday, November 29, 2007
Today is an extraordinarily exciting day for me. After about two years of work by several great people across the company, the first Parallel Extensions (a.k.a. Parallel FX) CTP has been posted to MSDN. Check out Soma’s blog post for an overview, and the new MSDN parallel computing dev center for more details. Keep an eye on the team’s new blog too, as we’ll be posting a lot of content there as we make progress on the library; in fact, thanks to Steve (who writes blog posts in his sleep), there’s already a bunch of reading to catch up on!
I began kicking the tires on PLINQ back in October of 2005. In September of 2006, I described PLINQ as “a fully functional prototype” and “research.” Well, it’s come a long way since then, and we’re finally ready for real human beings to start hammering on it. Not only that, but we’ve expanded the scope of the original project significantly, from PLINQ to Parallel FX, to include new imperative data parallel APIs (for situations where expressing a problem in LINQ is unnatural), powerful task APIs that offer waiting and cancelation, all supported by a common work scheduler based on CILK-style work-stealing techniques developed in collaboration with Microsoft Research. And there’s even more to come. Daniel Moth spilled some beans in his screencast on Channel9 when he described the additional data structures, like synchronization primitives and scalable collections, which will come online later. Some of them are even in the new CTP, but have remained internal for now.
The shift to parallel computing will have an industry-wide impact, and will undoubtedly take several phases and many years to tame completely. We have focused on the lowest hanging fruit and the most important foundational shifts in direction we can incite—like encouraging the over-representation of latent parallelism to aid in future scalability—but there are certainly things that the current CTP doesn’t fully address. GPGPUs, verifiable thread safety, automatic parallelism, great tools support, etc., are all topics that are of great interest to us. We have a lot of work to do for the final release of Parallel FX, and expect a whole lot of feedback from the community on specific features and general direction. So let us have it! You can use our Connect site, or even just email me directly at joedu AT you-know-where DOT com.
Consider this an early Christmas present. Now you have something fun to do, in the privacy of your own office, when trying to avoid family members during the holidays. Whoops—did I say that out loud? Enjoy!
 Saturday, November 17, 2007
I recently described an approach to adding immutability to existing, mutability-oriented programming languages such as C#. When motivating such a feature, I alluded to the fact that immutability can make concurrent programming simpler. This seems obvious: a major difficulty in building concurrent systems today is dealing with the ever-changing state of “the world,” requiring synchronization mechanisms to control concurrent reads and writes. This synchronization leads to inefficiencies in the end product, complexity in the design process, and, if not done correctly, bugs: race conditions, deadlocks due to the lack of composability of traditional locking mechanisms, and so forth.
Lock-free algorithms simplify matters (in some ways) by compressing state transitions into single, atomic writes. For instance, a lock-free stack has a single head node; pushing requires swapping the current head with the new node to be enqueued, and popping entails swapping the current head with its current next pointer. I mentioned some of the benefits of lock-freedom here. Sadly these lock-free techniques do not really compose. For instance, if I want to pop from a lock-free stack and push onto another, in an atomic fashion, single-word compare-and-swap is insufficient. We’ll return to this later in the context of immutability.
I will draw an analogy between lock-free algorithms and synchronization involving immutable objects. Imagine an immutable type represents “the world” in a concurrent system. Threads are constantly interacting with the world, by reading its state, and sometimes changing it. Because the world is immutable, we must copy it and publish an entire new one any time we wish our changes to become visible. This is key! It enables optimistic concurrency and synchronization protocols more similar to lock-free algorithms than lock-based algorithms.
Because individual components inside the world can’t change, the entire world can be read from without synchronization. Moreover, the creation of the new world needn’t use synchronization either, so long as we are willing to tolerate the possibility of wasted work, as is always the case with optimistic concurrency: we are being optimistic that the work will indeed not have been wasted, because writes are infrequent. That’s the basic premise of most concurrency algorithms, e.g. reader/writer locks. What requires synchronization is merely the publication of the new world.
internal static World s_theWorld = new World(…); // The initial world. internal void ReadTheWorld() { World w = s_theWorld; // This copy never changes! // Read state from the world … } internal void ChangeTheWorld() { World oldWorld; World newWorld; do { oldWorld = s_theWorld; // Read the world. // Read state to compute the new world … newWorld = new World(…); } while (Interlocked.CompareExchange(ref s_theWorld, newWorld, oldWorld) != oldWorld); }
In this scheme, there will always be inherent races between threads trying to call ReadTheWorld and threads trying to call ChangeTheWorld. This is a basic characteristic of the system. But it is more functional. If ReadTheWorld produces correct answers for any world sequentially, then it will also produce correct answers in a parallel program too, since worlds cannot change. And because writes are guaranteed to retire in order on the CLR 2.0 memory model, we can be assured that ReadTheWorld will not be subject to memory reordering bugs.
This is a very powerful technique, making code that reads from and changes the world much simpler to write, understand, and debug. How many times have you longed for a programming technique where code can be tested in a sequential context and still be guaranteed to produce the right answers in a parallel context?
With all that said, it’s sadly not applicable to every synchronization problem. Some of the problems that arise can be worked around, and others are more difficult. I will outline the most difficult of them.
Problem 1. As with most lock-free algorithms, livelock is a distinct possibility.
Livelock occurs because many threads may attempt to compute a new world simultaneously. If they both read the same old world, only one will succeed at publishing their copy of the new world. The other thread will have to go back ‘round the loop, re-read the current world, compute a new one, and try again. Computing a new world may take some time which, coupled with high arrival times, may lead to unfair forward progress and wasted work due to spinning. We hope, however, that in most cases the frequency of writes is small enough to make this a less pressing issue. Also note that the algorithm shown above is truly wait-free: the failure of one thread indicates that another thread has succeeded in publishing a new world. So forward progress of our system is not compromised by this problem.
Problem 2. ChangeTheWorld may need to perform impure, irreversible side effects.
A nasty issue in today’s programming languages is the reliance on side-effects. (OK, that’s unfair. This is a nasty issue in today’s conventional programming techniques and styles, and the languages we use simply accommodate these techniques. This is mostly because, if we didn’t, developers would find a way to circumvent the system, prefer to use alternative languages, and so on.) But if, corresponding with the update of the world, another side effect must be made, this technique doesn’t accommodate that. We could work around that with other synchronization techniques but it is likely to be cumbersome. In some cases, there is no harm in performing a side effect, so long as it does not “undo” the side effect associated with a newer world.
For instance, say we needed to update a GUI with the results of the new world computation. We would want to ensure the GUI was refreshed with the most recent update. Thankfully we don’t need to worry about the world changing itself, so we may do something like this:
internal void ChangeTheWorld() { … as before … myGuiControl.BeginInvoke(delegate { if (s_theWorld == newWorld) // Update the GUI with newWorld. }); }
This ensures we only update the GUI with the latest world. Imagine the sequence: world A is published, world B is published, we refresh the GUI with world B, we receive the BeginInvoke associated with world A now (out of order, which is perfectly reasonable). We must make sure that the graphical depiction of world A doesn’t overwrite world A. Thankfully the automatic mutual exclusion inherent in GUI programming ensures that our dirty check s_theWorld == newWorld is sufficient to prevent this from occurring. Clearly this technique can’t always apply.
Problem 3. This technique doesn’t expand to support multiple immutable fields.
This is a fundamental flaw with techniques that rely on a single, atomic compare-and-swap. Processor vendors have recently published papers for technologies like transactional memory and multi-word compare-and-swap (and in fact, there is plenty of research into pure software implementations of both), which would allow us to fix this problem.
The title of this article, by the way, is a play on Wadler’s paper ‘Linear types can change the world!’ I highly recommend that paper to anybody interested in immutability, isolation, and purity. This paper demonstrates type system support for linearity which, in essence, enables safe reading and mutual exclusion via the type system instead of locking. There is only one world, after all, the paper argues, so the traditional functional programming approach of disallowing mutability is actually a bit unrealistic.
In summary, immutable types add a lot of simultaneous power and simplicity to parallel programming. They are certainly not a panacea, since they require shifting to a more functional style of programming, where side-effects are discouraged and state is updated by making altered copies. This is often non-trviial. But this is one of many steps in what I consider to be the right direction. A direction that will enable us to tolerate large amounts of parallelism... without causing all of us to die of race-condition-induced anxiety attacks before the age of 30.
|
|
Recent Entries:
Search:
Browse by Date:
| | Sun | Mon | Tue | Wed | Thu | Fri | Sat | | 27 | 28 | 29 | 30 | 31 | 1 | 2 | | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | 10 | 11 | 12 | 13 | 14 | 15 | 16 | | 17 | 18 | 19 | 20 | 21 | 22 | 23 | | 24 | 25 | 26 | 27 | 28 | 29 | 1 | | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Browse by Category:
Notables:
|