|
Personal Info:
Joe  leads the architecture of an experimental OS's developer platform, where
he is also chief architect of its programming language. His current mission is to enable
writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe
founded the Parallel Extensions to .NET project.
He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife,
writing books, writing music,
studying music theory & mathematics, and doing anything involving food & wine.
My books
My music
Disclaimer:
The content of this site are my own personal opinions and do
not represent my employer's view in anyway.
© 2012, Joe Duffy
|
|
 Saturday, February 02, 2008
I'm still plugging away at my new concurrency book.
The fact that we have cover art and that it's available for pre-order on Amazon are both great signs that it's getting close to being done:

The full TOC (as of right now) is available here. A few chapters are still in the works, and there's a bit of editing ahead. I've also decided to write Appendices on PFX and CCR. In the meantime, definitely feel free to pick up a PDF of some early content via RoughCuts. Believe it or not, this is a way to provide real-time feedback that can impact the book before it hits the presses.
I have to say this is the most complicated piece of writing I've ever undertaken. Not only does the material require going very deep in a lot of hard areas, but I've noticed that as a community we're learning to speak speak better and to use consistent terminology about various aspects of concurrency. To stay relevant, I'm finding a fair bit of content has had to be reworked several times.
At the same time, and from a selfish standpoint, this book has been a wonderful forcing function to learn everything there is to know about concurrency. Not that this is a realistic goal, mind you, but gosh darnit I'm trying. I'm convinced at this point that the best way to become an expert on something is to try to teach it to other people. And what better way than writing a book? If you can't explain it clearly to a broad, on-the-average-less-expert-than-you audience, you probably have a faulty mental model to begin with. The process of merely trying usually reveals this. I encourage everybody out there to try it. At least once.
 Thursday, January 17, 2008
Most schedulers try to keep the number of runnable threads (within a certain context) equal to the number of processors available. Aside from fairness desires, there’s usually little reason to have more: and in fact, having more can lead to more memory pressure due to the cost of stacks and working set held on the stacks, non-pageable kernel memory, per-thread data structures, etc., and also has execution time costs due to increased pressure on the kernel scheduler, more frequent context switches, and poor locality due to threads being swapped in and out of processors. In extreme cases, blocked threads can build up only for all of them to be awoken and released to wreak havoc on the system at once, hurting scalability.
A naïve approach of one-thread-per-processor works great until a thread on one of these processors blocks, either “silently” as a result of a pagefault or “explicitly” due to IO or a synchronization wait. (I should mention that due to the plethora of hidden blocking calls in the kernel, Win32 user-mode code, and the .NET Framework, a lot of IO and synchronization waiting is “silent” too.) In this case, a processor becomes idle (0% utilization) for some period of time. If there is other work that could be happening instead, this is clearly bad.
Many programs spend most of their time blocked.
Four particular solutions to this problem are commonplace on Windows:
- Create more threads than processors and hope for the best. This trades some amount of runtime efficiency for the insurance that processor time won’t go to waste.
- Periodically poll for blocked threads using some kind of daemon and respond to the presence of one by creating a new thread to execute work. Eventually this thread would go away, for instance when the blocked thread awakens. This is the approach used by the CLR ThreadPool, although it caps the total. (The TPL uses this appraoch today also, but we're changing/augmenting it.) For obvious reasons, this approach is quite flawed: you easily end up with more running threads than processors, have to trade-off more frequent polling--which implies more runtime overhead--with less frequent polling--which adds time to the latency in the scheduler’s response to a blocked thread.
- Block on an IO Completion Port at periodic intervals--e.g. when dispatching a new work item in a ThreadPool-like thing--which has the effect of throttling running threads. This still requires creating more threads than processors, but helps to ensure few of them run at the same time. Unfortunately, it still does lead to more of them actually running than you’d like since the port can only prevent a thread from running when it goes back and blocks on the port in the kernel. But this is only done periodically.
- Specialized systems like SQL have used Fibers in the past to avoid needing full-fledged threads to replace the blocking ones. To do this, they ensure all blocking goes through a cooperative layer, which notifies a user-mode scheduler (UMS). The user-mode scheduler maintains a list of blocked Fibers, but can multiplex runnable Fibers onto threads, keeping the number of threads equal to the number of processors. A thread effectively never blocks, Fibers do, but this requires all blocking to notify the UMS. Aside from extraordinarily closed world systems, this approach doesn’t usually work. That’s because Fibers are not threads and multiplexing entirely different contexts of work onto a shared pool of threads (at blocking points) can easily lead to thread affinity nightmares.
The CLR facilities #4 by funneling all synchronization waits in managed code through one point in the VM codebase. This was done initially to ensure consistent message pumping on STA threads, via CoWaitForMultipleHandles. But it was then exploited in 2.0 to expand the CLR Hosting APIs to enable custom hosts to hook all synchronization calls. This is convenient for building interesting debugging tools, like deadlock detecting hosts.
A fifth approach is often viable and even preferable, and that is to avoid blocking altogether. Often referred to as continuation passing style (CPS), the idea is that, where you’d normally have blocked, the callstack is transformed into a resumable continuation. For an example of this, look at Jeff Richter’s ReaderWriterLockGate class: it’s a reader/writer lock with no blocking. Asynchronous IO is supported by files and sockets on Windows, and enables a similar style of programming. The continuation is ordinarily just a closure object that has enough state to restart itself when the sought-after condition arises. When it does arise, the continuation is scheduled on something like the CLR ThreadPool. This avoids burning any threads while the wait occurs.
For obvious reasons, CPS is usually hard to achieve in .NET: there is no language support for first class continuations in .NET, all synchronization primitives are wait-based, and keeping a whole stack around in memory would be a terrible idea anyway. You’d also need to worry about resources held on the stack, including locks. Instead, you should save only that state which is needed during the continuation. In a message passing system this is much simpler, since most of the program is full of continuations in the form of message handlers. For an example of such a system, check out the Concurrent and Coordination Runtime (CCR) and/or Erlang.
Even in message passing systems, it’s impossible to escape the fundamental blocking issue, since it is platform-wide. And in ordinary imperative programs, the CPS transformation is near-impossible at the leaves of callstacks: unless you have whole program knowledge, who knows what your caller expects? Most APIs are synchronous. Futures and Promises potentially make this style of programming easier, though in the extreme all APIs would need to return a Promise rather than a true value.
Nothing conclusive, just some random thoughts ...
 Wednesday, January 02, 2008
I've been very remiss with blogging lately. This was mostly due to travel (5 weeks out of the past 2 months), but also because I'm focused intensely on the book, and have been super busy working on Parallel Extensions: the December CTP, hiring for our team, planning what we do in the next year, designing stuff, implementing stuff, ... it's been a lot of fun. I've also been trying to learn classical guitar after 15 years of playing electric and some acoustic (metal, rock, blues). It's more challenging than I anticipated... I guess I should have started when I was six.
Somewhere in there, I recorded an episode of .NET Rocks with Carl and Richard about Parallel Extensions: check it out. I haven't listened to it yet, but I distinctly remember having a lot of fun. When the hour was up, I couldn't believe how quickly the time had passed.
Happy New Year, everybody. I promise to blog more in the coming months. Famous last words, eh? ;)
 Thursday, November 29, 2007
Today is an extraordinarily exciting day for me. After about two years of work by several great people across the company, the first Parallel Extensions (a.k.a. Parallel FX) CTP has been posted to MSDN. Check out Soma’s blog post for an overview, and the new MSDN parallel computing dev center for more details. Keep an eye on the team’s new blog too, as we’ll be posting a lot of content there as we make progress on the library; in fact, thanks to Steve (who writes blog posts in his sleep), there’s already a bunch of reading to catch up on!
I began kicking the tires on PLINQ back in October of 2005. In September of 2006, I described PLINQ as “a fully functional prototype” and “research.” Well, it’s come a long way since then, and we’re finally ready for real human beings to start hammering on it. Not only that, but we’ve expanded the scope of the original project significantly, from PLINQ to Parallel FX, to include new imperative data parallel APIs (for situations where expressing a problem in LINQ is unnatural), powerful task APIs that offer waiting and cancelation, all supported by a common work scheduler based on CILK-style work-stealing techniques developed in collaboration with Microsoft Research. And there’s even more to come. Daniel Moth spilled some beans in his screencast on Channel9 when he described the additional data structures, like synchronization primitives and scalable collections, which will come online later. Some of them are even in the new CTP, but have remained internal for now.
The shift to parallel computing will have an industry-wide impact, and will undoubtedly take several phases and many years to tame completely. We have focused on the lowest hanging fruit and the most important foundational shifts in direction we can incite—like encouraging the over-representation of latent parallelism to aid in future scalability—but there are certainly things that the current CTP doesn’t fully address. GPGPUs, verifiable thread safety, automatic parallelism, great tools support, etc., are all topics that are of great interest to us. We have a lot of work to do for the final release of Parallel FX, and expect a whole lot of feedback from the community on specific features and general direction. So let us have it! You can use our Connect site, or even just email me directly at joedu AT you-know-where DOT com.
Consider this an early Christmas present. Now you have something fun to do, in the privacy of your own office, when trying to avoid family members during the holidays. Whoops—did I say that out loud? Enjoy!
 Saturday, November 17, 2007
I recently described an approach to adding immutability to existing, mutability-oriented programming languages such as C#. When motivating such a feature, I alluded to the fact that immutability can make concurrent programming simpler. This seems obvious: a major difficulty in building concurrent systems today is dealing with the ever-changing state of “the world,” requiring synchronization mechanisms to control concurrent reads and writes. This synchronization leads to inefficiencies in the end product, complexity in the design process, and, if not done correctly, bugs: race conditions, deadlocks due to the lack of composability of traditional locking mechanisms, and so forth.
Lock-free algorithms simplify matters (in some ways) by compressing state transitions into single, atomic writes. For instance, a lock-free stack has a single head node; pushing requires swapping the current head with the new node to be enqueued, and popping entails swapping the current head with its current next pointer. I mentioned some of the benefits of lock-freedom here. Sadly these lock-free techniques do not really compose. For instance, if I want to pop from a lock-free stack and push onto another, in an atomic fashion, single-word compare-and-swap is insufficient. We’ll return to this later in the context of immutability.
I will draw an analogy between lock-free algorithms and synchronization involving immutable objects. Imagine an immutable type represents “the world” in a concurrent system. Threads are constantly interacting with the world, by reading its state, and sometimes changing it. Because the world is immutable, we must copy it and publish an entire new one any time we wish our changes to become visible. This is key! It enables optimistic concurrency and synchronization protocols more similar to lock-free algorithms than lock-based algorithms.
Because individual components inside the world can’t change, the entire world can be read from without synchronization. Moreover, the creation of the new world needn’t use synchronization either, so long as we are willing to tolerate the possibility of wasted work, as is always the case with optimistic concurrency: we are being optimistic that the work will indeed not have been wasted, because writes are infrequent. That’s the basic premise of most concurrency algorithms, e.g. reader/writer locks. What requires synchronization is merely the publication of the new world.
internal static World s_theWorld = new World(…); // The initial world. internal void ReadTheWorld() { World w = s_theWorld; // This copy never changes! // Read state from the world … } internal void ChangeTheWorld() { World oldWorld; World newWorld; do { oldWorld = s_theWorld; // Read the world. // Read state to compute the new world … newWorld = new World(…); } while (Interlocked.CompareExchange(ref s_theWorld, newWorld, oldWorld) != oldWorld); }
In this scheme, there will always be inherent races between threads trying to call ReadTheWorld and threads trying to call ChangeTheWorld. This is a basic characteristic of the system. But it is more functional. If ReadTheWorld produces correct answers for any world sequentially, then it will also produce correct answers in a parallel program too, since worlds cannot change. And because writes are guaranteed to retire in order on the CLR 2.0 memory model, we can be assured that ReadTheWorld will not be subject to memory reordering bugs.
This is a very powerful technique, making code that reads from and changes the world much simpler to write, understand, and debug. How many times have you longed for a programming technique where code can be tested in a sequential context and still be guaranteed to produce the right answers in a parallel context?
With all that said, it’s sadly not applicable to every synchronization problem. Some of the problems that arise can be worked around, and others are more difficult. I will outline the most difficult of them.
Problem 1. As with most lock-free algorithms, livelock is a distinct possibility.
Livelock occurs because many threads may attempt to compute a new world simultaneously. If they both read the same old world, only one will succeed at publishing their copy of the new world. The other thread will have to go back ‘round the loop, re-read the current world, compute a new one, and try again. Computing a new world may take some time which, coupled with high arrival times, may lead to unfair forward progress and wasted work due to spinning. We hope, however, that in most cases the frequency of writes is small enough to make this a less pressing issue. Also note that the algorithm shown above is truly wait-free: the failure of one thread indicates that another thread has succeeded in publishing a new world. So forward progress of our system is not compromised by this problem.
Problem 2. ChangeTheWorld may need to perform impure, irreversible side effects.
A nasty issue in today’s programming languages is the reliance on side-effects. (OK, that’s unfair. This is a nasty issue in today’s conventional programming techniques and styles, and the languages we use simply accommodate these techniques. This is mostly because, if we didn’t, developers would find a way to circumvent the system, prefer to use alternative languages, and so on.) But if, corresponding with the update of the world, another side effect must be made, this technique doesn’t accommodate that. We could work around that with other synchronization techniques but it is likely to be cumbersome. In some cases, there is no harm in performing a side effect, so long as it does not “undo” the side effect associated with a newer world.
For instance, say we needed to update a GUI with the results of the new world computation. We would want to ensure the GUI was refreshed with the most recent update. Thankfully we don’t need to worry about the world changing itself, so we may do something like this:
internal void ChangeTheWorld() { … as before … myGuiControl.BeginInvoke(delegate { if (s_theWorld == newWorld) // Update the GUI with newWorld. }); }
This ensures we only update the GUI with the latest world. Imagine the sequence: world A is published, world B is published, we refresh the GUI with world B, we receive the BeginInvoke associated with world A now (out of order, which is perfectly reasonable). We must make sure that the graphical depiction of world A doesn’t overwrite world A. Thankfully the automatic mutual exclusion inherent in GUI programming ensures that our dirty check s_theWorld == newWorld is sufficient to prevent this from occurring. Clearly this technique can’t always apply.
Problem 3. This technique doesn’t expand to support multiple immutable fields.
This is a fundamental flaw with techniques that rely on a single, atomic compare-and-swap. Processor vendors have recently published papers for technologies like transactional memory and multi-word compare-and-swap (and in fact, there is plenty of research into pure software implementations of both), which would allow us to fix this problem.
The title of this article, by the way, is a play on Wadler’s paper ‘Linear types can change the world!’ I highly recommend that paper to anybody interested in immutability, isolation, and purity. This paper demonstrates type system support for linearity which, in essence, enables safe reading and mutual exclusion via the type system instead of locking. There is only one world, after all, the paper argues, so the traditional functional programming approach of disallowing mutability is actually a bit unrealistic.
In summary, immutable types add a lot of simultaneous power and simplicity to parallel programming. They are certainly not a panacea, since they require shifting to a more functional style of programming, where side-effects are discouraged and state is updated by making altered copies. This is often non-trviial. But this is one of many steps in what I consider to be the right direction. A direction that will enable us to tolerate large amounts of parallelism... without causing all of us to die of race-condition-induced anxiety attacks before the age of 30.
 Sunday, November 11, 2007
I’ve been asked a number of times about immutable types support for C#. Although C# doesn't offer first class language support in the way that F# does, you can get pretty far with what you do have in your hands already. Nothing prevents you from creating immutable data structures today, of course, but the problem is that there’s no compiler or runtime support to ensure you’ve done it right.
I just hacked together some new attributes and a handful of FxCop rules as an experiment. I’ve been very happy with the result. Sure it’s not baked into the language, but it’s a start. If there’s any interest, I can make the code available so you can play with it too.
Attributes and analysis
Imagine we had an ImmutableAttribute. Annotating a type with it indicates that objects of that particular type are immutable, i.e. that their state never changes after being constructed. This is great from a concurrency standpoint because it means access to such objects do not require synchronization. This can lead to more efficient code that not only has a higher chance of being correct but is also vastly easier to maintain. Well, what kind of restrictions would such a type be subject to?
1. Immutable types must have only initonly fields.
The first rule takes advantage of existing CLR type system and language support for initonly fields (a.k.a. readonly in C#). Marking a field as initonly ensures it is never written to after the constructor has finished executing. So long as all fields are initonly, the class is effectively already “shallow” immutable.
2. Immutable types must only contain fields of immutable types.
The second rule ensures transitivity, or “deep” immutability. The mutability of a complex object is typically, but not always, comprised of not only its own fields but also the state in the objects it refers to. With this rule and the prior rule, we’re about 90% there.
3. Immutable types should only subclass other immutable types.
To give the appearance that a particular object is immutable, that object’s type must not depend on other types that are mutable, as articulated by the previous rule. The ‘base’ reference is effectively just another field, and so this rule is derived from the previous one. If an immutable type could inherit mutable members and fields, then it wouldn’t really be immutable after all.
4. Mutable types should not subclass immutable types.
Similar to the previous rule, it is also a bad thing if a mutable subtype can override behavior from the subtype, giving the appearance of mutability. Say we have an immutable class IC with a virtual method f, and some mutable subclass MC overrides f to introduce logic that logically mutates the state of an object. Although the rules above are sufficient to ensure that the object is physically immutable, this can circumvent immutability safety through polymorphism. A related piece of advice: public immutable types should be sealed, to prevent outside classes that do not abide by immutability analysis from breaking code which assumes a given type is immutable. Alternatively, virtual members can be eliminated.
5. Immutable types must not leak the ‘this’ reference during construction.
This rule is subtle. Although initonly ensures fields are never written to after construction, these fields may be written to any number of times while an object is actively being constructed. If some code called during construction publishes a reference to the object (e.g. by storing it in a static variable), then other threads might access the object while it still appears to be mutable. They may witness a partially initialized object, an object whose fields are still changing, and so on.
And that’s it! 5 simple rules. It may sound complicated, but the code to perform the static analysis for all but the last one is straightforward and a dozen-or-so lines apiece. A few things are worth mentioning. First, the CLR’s memory model ensures that, after an object is constructed and published, reads of its fields cannot be reordered to break immutability. Additionally, there are many immutable types in the CLR today that are not verified as such. So in my analysis rules, I have hard-coded a set of well known immutable types so that you can use them w/out problem: Object, String, DateTime, TimeSpan, Boolean, Byte, SByte, Int16, Int32, Int64, IntPtr, UInt16, UInt32, UInt64, UIntPtr, Decimal, Double, Single, and ValueType. Ideally if we supported this in the .NET Framework, we'd annotate them.
Impact to imperative programming
Programming in an immutable world is rather tricky. As someone who has done most of his programming for the past 10+ years in C-style languages, I just take for granted that data structures change over time. With immutability, there tends to be a whole lot more copying and functional-style function calling, where data structures are passed as an input argument and the “mutated” copy is returned as an output argument. I’m trying to kick the mutability habit, since I fully believe immutability is one key to being successful with massive degrees of parallelism. And it usually leads to cleaner code too.
But it’s hard. Using something as simple as an array field on an immutable type will fail the above rules, since the CLR’s array types are mutable. I’ll explore building one below, but this probably points to a need for better immutability support in the .NET Framework. It’s not too difficult to imagine providing base classes for common needs when building immutable data structures.
Circumventing the analysis
As you begin to explore immutable types in a bit more depth, you’ll realize there are often cases where immutability-by-cleverness is possible. That is to say, although one or more of the rules above have been violated, the end result still appears to be immutable. I can build an immutable list out of a linked list to avoid depending on CLR arrays, and mark the nodes as immutable, but they must refer to elements stored within the list. Should we require the elements to also be immutable? Perhaps, but perhaps not, depending on whether you consider the state of the list to also include the state of the elements inside it. Usually that wouldn't be the case. And, besides, if we know what we’re doing, we can create an immutable list based on an array anyway, which enables O(1) IList<T>-style random access. We just need to be careful to encapsulate the array object and to never store an element into it post-construction.
To facilitate working around some of the rules in ways that are often necessary, I have provided options on ImmutableAttribute to suppress certain checks. Additionally, there is a MutableAttribute which can mark certain fields to indicate they are not subject to the same restrictions as other fields on an immutable type.
An ImmutableList<T>
As an illustration, here is an ImmutableList<T>. It implements IList<T>, but sadly it must throw exceptions in several circumstances because both IList<T> and ICollection<T> offer methods that are intrinsically mutable. Undoubtedtly there are bugs because I whipped it up quickly and have omitted a lot of needed error checking. I just wanted to give the general idea of how it might be done:
/// <summary> /// A list that has been written to be observationally immutable. A mutable array /// is used as the backing store for the list, but no mutable operations are offered. /// </summary> /// <typeparam name="T">The type of elements contained in the list.</typeparam> [Immutable] public sealed class ImmutableList<T> : IList<T> {
[Mutable] private readonly T[] m_array;
/// <summary> /// Create a new list. /// </summary> public ImmutableList() { m_array = new T[0]; }
/// <summary> /// Create a new list, copying elements from the specified array. /// </summary> /// <param name="arrayToCopy">An array whose contents will be copied.</param> public ImmutableList(T[] arrayToCopy) { m_array = new T[arrayToCopy.Length]; Array.Copy(arrayToCopy, m_array, arrayToCopy.Length); }
/// <summary> /// Create a new list, copying elements from the specified enumerable. /// </summary> /// <param name="enumerableToCopy">An enumerable whose contents will /// be copied.</param> public ImmutableList(IEnumerable<T> enumerableToCopy) { m_array = new List<T>(enumerableToCopy).ToArray(); }
/// <summary> /// Retrieves the immutable count of the list. /// </summary> public int Count { get { return m_array.Length; } }
/// <summary> /// A helper method used below when a mutable method is accessed. Several /// operations on the collections interfaces IList<T> and /// ICollection<T> are mutable, so we cannot support them. We offer /// immutable versions of each. /// </summary> private static void ThrowMutableException(string copyMethod) { throw new InvalidOperationException( String.Format("Cannot mutate an immutable list; " + "see copying method '{0}'", copyMethod)); }
/// <summary> /// Whether the list is read only: because the list is immutable, this /// is always true. /// </summary> public bool IsReadOnly { get { return true; } }
/// <summary> /// Accesses the element at the specified index. Because the list is /// immutable, the setter will always throw an exception. /// </summary> /// <param name="index">The index to access.</param> /// <returns>The element at the specified index.</returns> public T this[int index] { get { return m_array[index]; } set { ThrowMutableException("CopyAndSet"); } }
/// <summary> /// Copies the list and adds a new value at the end. /// </summary> /// <param name="value">The value to add.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndAdd(T value) { T[] newArray = new T[m_array.Length + 1]; m_array.CopyTo(newArray, 0); newArray[m_array.Length] = value; return new ImmutableList<T>(newArray); }
/// <summary> /// Returns a new, cleared (empty) immutable list. /// </summary> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndClear() { return new ImmutableList<T>(new T[0]); }
/// <summary> /// Copies the list and modifies the specific value at the index provided. /// </summary> /// <param name="index">The index whose value is to be changed.</param> /// <param name="item">The value to store at the specified index.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndSet(int index, T item) { T[] newArray = new T[m_array.Length]; m_array.CopyTo(newArray, 0); newArray[index] = item; return new ImmutableList<T>(newArray); }
/// <summary> /// Copies the list and removes a particular element. /// </summary> /// <param name="item">The element to remove.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndRemove(T item) { int index = IndexOf(item); if (index == -1) { throw new ArgumentException("Item not found in list."); }
return CopyAndRemoveAt(index); }
/// <summary> /// Copies the list and removes a particular element. /// </summary> /// <param name="index">The index of the element to remove.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndRemoveAt(int index) { T[] newArray = new T[m_array.Length - 1]; Array.Copy(m_array, newArray, index); Array.Copy(m_array, index + 1, newArray, index, m_array.Length - index - 1); return new ImmutableList<T>(newArray); }
/// <summary> /// Copies the list adn inserts a particular element. /// </summary> /// <param name="index">The index at which to insert an element.</param> /// <param name="item">The element to insert.</param> /// <returns>A modified copy of this list.</returns> public ImmutableList<T> CopyAndInsert(int index, T item) { T[] newArray = new T[m_array.Length + 1]; Array.Copy(m_array, newArray, index); newArray[index] = item; Array.Copy(m_array, index, newArray, index + 1, m_array.Length - index); return new ImmutableList<T>(newArray); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void ICollection<T>.Add(T item) { ThrowMutableException("CopyAndAdd"); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void ICollection<T>.Clear() { ThrowMutableException("CopyAndClear"); }
/// <summary> /// Checks whether the specified item is contained in the list. /// </summary> /// <param name="item">The item to search for.</param> /// <returns>True if the item is found, false otherwise.</returns> public bool Contains(T item) { return Array.IndexOf<T>(m_array, item) != -1; }
/// <summary> /// Copies the contents of this list to a destination array. /// </summary> /// <param name="array">The array to copy elements to.</param> /// <param name="index">The index at which copying begins.</param> public void CopyTo(T[] array, int index) { m_array.CopyTo(array, index); }
/// <summary> /// Retrieves an enumerator for the list's collections. /// </summary> /// <returns>An enumerator.</returns> public IEnumerator<T> GetEnumerator() { for (int i = 0; i < m_array.Length; i++) { yield return m_array[i]; } }
/// <summary> /// Retrieves an enumerator for the list's collections. /// </summary> /// <returns>An enumerator.</returns> IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable<T>)this).GetEnumerator(); }
/// <summary> /// Finds the index of the specified element. /// </summary> /// <param name="item">An item to search for.</param> /// <returns>The index of the item, or -1 if it was not found.</returns> public int IndexOf(T item) { return Array.IndexOf<T>(m_array, item); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void IList<T>.Insert(int index, T item) { ThrowMutableException("CopyAndInsert"); }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> bool ICollection<T>.Remove(T item) { ThrowMutableException("CopyAndRemove"); return false; }
/// <summary> /// This method is unsupported on this type, because it is immutable. /// </summary> void IList<T>.RemoveAt(int index) { ThrowMutableException("CopyAndRemoveAt"); }
}
I won’t spend much time going over this code. Just notice that the type is marked with the ImmutableAttribute, the array field is marked with the MutableAttribute (since it’s not itself an immutable type and would fail the analysis otherwise), and that any operations that modify the list must make a copy of the entire thing.
Summary
This has been an interesting exercise. Through it, I have come to realize that first class immutability in the type system is not such a farfetched dream. The most onerous aspect to it is probably the restrictions it imposes on subclassing in the programming model, effectively bifurcating the type system into those types that are mutable and those types that are immutable. But, in the end, I’m not so sure it’s too bad a problem: interchanging the two seems like a very bad idea anyway.
Feedback on all of this would be appreciated. Do you see it as useful? If you had it, would you use it in your programs today? Do you believe that it is one step needed (of many!) to bring us towards a world in which building concurrent programs is simpler?
 Saturday, November 10, 2007
There are several docs out there that describe the CLR memory, most notably this article.
When describing the model, one can either use acquire/release, barrier/fence, or happens-before terminology. They all acheive the same goal, so I will simply choose one, acquire/release: an acquire operation means no loads or stores may move before it, and a release operation means no loads or stores may move after it. I can explain it with such simple terms because the CLR is homogeneous in the kinds of operations it permits or disallows to cross such a barrier, e.g. there's never a case where loads may cross such a chasm but stores may not.
Despite the great article referenced above, I find that it’s still not entirely straightforward. It is important to code to a well-understood abstract model when writing lock-free code. For reference, here are the rules as I have come to understand them stated as simply as I can:
- Rule 1: Data dependence among loads and stores is never violated.
- Rule 2: All stores have release semantics, i.e. no load or store may move after one.
- Rule 3: All volatile loads are acquire, i.e. no load or store may move before one.
- Rule 4: No loads and stores may ever cross a full-barrier (e.g. Thread.MemoryBarrier, lock acquire, Interlocked.Exchange, Interlocked.CompareExchange, etc.).
- Rule 5: Loads and stores to the heap may never be introduced.
- Rule 6: Loads and stores may only be deleted when coalescing adjacent loads and stores from/to the same location.
Note that by this definition, non-volatile loads are not required to have any sort of barrier associated with them. So loads may be freely reordered, and writes may move after them (though not before, due to Rule 2). With this model, the only true case where you’d truly need the strength of a full-barrier provided by Rule 4 is to prevent reordering in the case where a store is followed by a volatile load. Without the barrier, the instructions may reorder.
It is unfortunate that we’ve never gone to the level of detail and thoroughness the Java memory model folks have gone to. We have constructed our model over years of informal work and design-by-example, but something about the JMM approach is far more attractive. Lastly, what I’ve described applies to the implemented memory model, and not to what was specified in ECMA. So this is apt to change from one implementation to the next. I have no idea what Mono implements, for example.
 Friday, November 09, 2007
Lock-free algorithms are often “better” than lock-based algorithms:
- They are, by definition, wait-free, ensuring threads never block.
- State transitions are atomic such that failure at any point will not corrupt the data structure.
- Because threads never block, they typically lead to greater throughput, as the granularity of synchronization is a single atomic write or compare-exchange.
- In some cases, lock-free algorithms incur fewer total synchronizing writes (e.g. interlocked ops) and thus can be cheaper from a pure performance standpoint.
But lock-freedom is not a panacea. I’ve gained a lot more experience using lock-free algorithms in the past 3 years: first, when working on memory model improvements for Whidbey, and recently during the implementation of the ParallelFX library and as I write new content for my book.
There are some obvious drawbacks:
- The use of optimistic concurrency can lead to livelock for hot data structures.
- The code is significantly harder to test. Usually its correctness hinges on a correct interpretation of the target machine’s memory model –in my case, the .NET memory model (the topic of an upcoming post)—which can be misinterpreted and is hard to validate. Moreover, because the most popular hardware is stronger than the lesser popular hardware (e.g. X86 vs. IA64), testing needs to explicitly focus on the esoteric hardware to expose races.
- The code is significantly harder to write and maintain, for many of the same reasons.
I have learned about a less obvious drawback the hard way. When initially implementing a certain data structure, you may make a bunch of assumptions about the use cases your class needs to accommodate. And you may actually succeed in writing an implementation that is correct given those assumptions. But over time, as new use cases are discovered, it is much harder to retrofit the code and revalidate that the lock-free algorithms are still correct given the new assumptions. There is no magic oracle that says: hey, adding feature X is going to invalidate assumption Y over there, creating a memory reordering bug.
In several recent cases, I’ve discovered such problems, and dealt with them one-by-one, usually by adding additional memory barriers. As the numbers of memory barriers increase (roughly ½ the cycle-cost of acquiring and releasing a lock), however, the benefits of lock-free algorithms begin to dwindle. It’s easy to begin with an algorithm that scales and performs nicely, and over time add a memory barrier here and there, and eventually end up with something that performs worse than the lock-based equivalent. Unfortunately, this threshold isn’t always obvious, so you can end up with a real mess on your hands: a buggy, impossible to test, and difficult to understand hunk of code. All the drawbacks of lock-free code with none of the benefits. Whoops.
The moral of the story? Be careful and conservative in your use of lock-free code. There are many well-known published lock-free algorithms, and it’s usually a good idea to stick to them, if you use lock-free code at all. When in doubt, just use locks. Truthfully, they are hard enough.
 Wednesday, October 17, 2007
For those who haven't heard yet, F# is joining the "official" .NET language family. As Soma says, "[t]his is one of the best things that has happened at Microsoft since we created Microsoft Research over 15 years ago." I am just as excited as he is. A huge congrats to Don. He's been working super hard to build and evangelize F# for several years now, and getting the attention of a huge division like DevDiv isn't an easy thing, but in the end sanity does prevail. As a blanket statement, the more functional .NET programmers' brains become, the more latent parallelism the programs they write will contain.
 Saturday, October 13, 2007
Charles from Channel9 recorded a conversation with Anders and me a couple weeks back. The topic? Concurrency. More specifically, Parallel FX (PFX):
Programming in the Age of Concurrency: Concurrent Programming with PFX
Microsoft is developing a number of technologies to simplify the expression of parallelism in code. An example of this work is Parallel Extensions for the .NET Framework (PFX), a managed programming model for data parallelism, task parallelism, scheduling, and coordination on parallel hardware.
PFX makes it easier for developers to write programs that take advantage of parallel hardware (you've all heard of multi-core and what the future holds with many-core...), without having to deal with the complexities of threads and locks in today’s concurrent programming story
We don't go too deep, but you can bet we'll be doing more of these things as the technology matures and gets closer to general availability. Enjoy!
(Note: you may also be interested in Stephen Toub's PFX interview with Scott Hanselman, available here.)
|
|
Recent Entries:
Search:
Browse by Date:
| | Sun | Mon | Tue | Wed | Thu | Fri | Sat | | 27 | 28 | 29 | 30 | 31 | 1 | 2 | | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | 10 | 11 | 12 | 13 | 14 | 15 | 16 | | 17 | 18 | 19 | 20 | 21 | 22 | 23 | | 24 | 25 | 26 | 27 | 28 | 29 | 1 | | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Browse by Category:
Notables:
|