|
Personal Info:
Joe  leads the architecture of an experimental OS's developer platform, where
he is also chief architect of its programming language. His current mission is to enable
writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe
founded the Parallel Extensions to .NET project.
He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife,
writing books, writing music,
studying music theory & mathematics, and doing anything involving food & wine.
My books
My music
Disclaimer:
The content of this site are my own personal opinions and do
not represent my employer's view in anyway.
© 2012, Joe Duffy
|
|
 Thursday, July 01, 2010
In .NET today, readonly/initonly-ness is in the eye of the provider. Not the beholder.
Although both C# and the CLR verifier go to great pains to ensure you don't change a readonly/initonly field outside of its constructor (or class constructor, in the case of a static field), this guarantee doesn't imply what you might imagine. It means what it says: you can't change such fields except for in certain contexts.
If you try, C# won't let you, including forming byrefs to them:
v.cs(0,0): error CS0191: A readonly field cannot be assigned to (except in a constructor or a variable initializer)
v.cs(0,0): error CS0192: A readonly field cannot be passed ref or out (except in a constructor)
v.cs(0,0): error CS0198: A static readonly field cannot be assigned to (except in a static constructor or a variable initializer)
v.cs(0,0): error CS0199: A static readonly field cannot be passed ref or out (except in a static constructor)
And neither will the CLR verifier:
[IL]: Error: [c:\v.exe : C::Main][offset 0x00000001] Cannot change initonly
field outside its .ctor.
Of course, attempting to invoke an operation on a readonly struct will make a defensive copy locally, and invoke the method against that. This ensures the readonly contents cannot change.
One unfortunate hole in this safety is with unions. You do not need unsafe code to break readonly, and yet the effect is the same as with an unverifiable program that writes to a readonly field:
struct S1 {
public readonly int X;
}
struct S2 {
public int X;
}
[StructLayout(LayoutKind.Explicit)]
struct S3 {
[FieldOffset(0)]
public S1 A;
[FieldOffset(0)]
public S2 B;
}
Now we can change A.X via B.X, even though A.X is supposedly readonly:
S3 s3 = ...;
int x = s3.A.X;
s3.B.X++;
ASSERT(x == s3.A.X); // false; it is +1
The same would have been true even if the field S3.A was marked readonly.
This is quite an evil trick. I have to be honest that I believe this is a CIL verification hole, and should produce unverifiable MSIL much like when you try to overlay structs containing overlapping GC references. Nevertheless, it is what it is.
Let's step back. Why does all of this matter, anyway, and what guarantees were we hoping that readonly would provide?
It would be ideal, I assert, if the guarantee was not just "the target field can only be written to in the constructor", but also "the target field, once read, cannot be observed with a different value later on". This would not be true during construction, but we'd like to say it holds at all other times.
The above example throws a wrench in this idea. As does the following example. But this new example will be more disturbing, because the solution is not a simple verifier change.
What would you expect this program to print to the console?
struct S {
public readonly int X;
public S(int x) { X = x; }
public void MultiplyInto(int c, out S target) {
System.Console.WriteLine(this.X);
target = new S(X * c);
System.Console.WriteLine(this.X); // same? it is, after all, readonly.
}
}
S s = new S(42);
s.MultiplyInto(10, out s);
As you may or may not have guessed, the output is "42" followed by "420". Yes, the value of 'this.X' changes after we have assigned to 'target' inside MultiplyTo, because the caller aliases the out-param with the 'this' param. Recall that parameter passing for structs in C# is done byref, so that these two references actually physically point to the same location when that call is made. The assignment to 'target', therefore, actually replaces the entire contents of 'this' all at once. And hence this gives the illusion that readonly fields are shifting.
You might be tempted to say that this can be prevented with alias analysis. But this is deceptively difficult to do. Consider this more complicated example:
class C {
public struct S S;
}
void M1(C c) {
M2(c, out c.S);
}
void M2(C c, out S s) {
c.S.MultiplyInto(10, out s);
}
It is in no way clear inside M2 that the two aliases refer to the same location. The aliasing occurred higher up in the stack. Although byrefs are restricted to stack-only passing, making the necessary alias analysis tantalizingly close to attainable, it is nontrivial to say the least. Presumably we would have had to have blocked the forming of the byref within M1, rather than its use within M2. We could fall back to runtime checks, but that is also unfortunate for numerous reasons.
The moral of the story? Structs as containers of readonly values are not to be trusted, at least not for situations that call for bulletproof safety, such as caching values in the compiler rather than rereading them, because the fields are readonly. Although C# and the CLR do a good job at verifying readonly/initonly are done right at the initialization site, there are still places where these guarantees break down. Thankfully the byref aliasing problem does not threaten thread-safety, but the union problem does. And in conclusion, I do have to imagine all of this will get fixed somewhere down the road, it's just a matter of when and where.
 Sunday, June 27, 2010
Partially-constructed objects are a constant source of difficulty in object-oriented systems. These are objects whose construction has not completed in its entirety prior to another piece of code trying to use them. Such uses are often error-prone, because the object in question is likely not in a consistent state. Because this situation is comparatively rare in the wild, however, most people (safely) ignore or remain ignorant of the problem. Until one day they get bitten.
Not only are partially-constructed objects a source of consternation for everyday programmers, they are also a challenge for language designers wanting to provide guarantees around invariants, immutability and concurrency-safety, and non-nullability. We shall see examples below why this is true. The world would be better off if partially-constructed objects did not exist. Thankfully there is some interesting prior art that moves us in this direction from which to learn.
Seeing such a beast in the wild
In what situations might you see a partially-constructed object? There are two common ones in C++ and C#:
- ‘this’ is leaked out of a constructor to some code that assumes the object has been initialized.
- A failure partway through an object’s construction leads to its destructor or finalizer running against a partially-constructed object.
In the first case, the rule of thumb is “don’t do that.” This is easier said than done. The second case, on the other hand, is a fact of life, and the rule of thumb is “tread with care, and be intentional.” Let’s examine both more closely.
The evils of leaking ‘this’
Leaking ‘this’ during construction to code that expects to see a fully-initialized object is a terrible practice. Before moving on, it’s important to remember initialization order in C++ and C#: base constructors run first, and then more derived constructors. If I have E subclasses D subclasses C, then constructing an instance of E will run C’s constructor, and then D’s, and then lastly E’s. Destructors in C++, of course, run in the reverse order.
Member initializers, on the other hand, run in different orders in C++ versus C#. In C#, they run from most derived first, to least derived. So in the above situation, E’s initializers run, and then D’s, and then C’s. This happens fully before running the ad-hoc constructor code. In C++, however, member initializers run alongside the ordinary construction process. C’s member initializers run just before C’s ad-hoc construction code, and then D’s, and then E’s. Another difference is that C#’s initializers cannot access ‘this’, whereas C++’s initializers can.
For example, this C# program will print E_init, D_init, C_init, C_ctor, D_ctor, and then E_ctor:
using System;
class C {
int x = M();
public C() {
Console.WriteLine("C_ctor");
}
private static int M() {
Console.WriteLine("C_init");
return 42;
}
}
class D : C {
int x = M();
public D() : base() {
Console.WriteLine("D_ctor");
}
private static int M() {
Console.WriteLine("D_init");
return 42;
}
}
class E : D {
int x = M();
public E() : base() {
Console.WriteLine("E_ctor");
}
private static int M() {
Console.WriteLine("E_init");
return 42;
}
}
class Program {
public static void Main() {
new E();
}
}
And this C++ program will print C_init, C_ctor, D_init, D_ctor, E_init, E_ctor, ~E, ~D, and finally ~C:
#include
using namespace std;
struct C {
int x;
C() : x(M()) { cout << "C_ctor" << endl; }
~C() { cout << "~C" << endl; }
static int M() { cout << "C_init" << endl; return 42; }
};
struct D : C {
int x;
D(): x(M()) { cout << "D_ctor" << endl; }
~D() { cout << "~D" << endl; }
static int M() { cout << "D_init" << endl; return 42; }
};
struct E : D {
int x;
E() : x(M()) { cout << "E_ctor" << endl; }
~E() { cout << "~E" << endl; }
static int M() { cout << "E_init" << endl; return 42; }
};
static void main() {
E e;
}
It’s interesting to note that the CLR allows constructor chaining to happen in any order. The C# compiler emits the calls to base as the first thing a constructor does, but other languages can choose to do differently. The verifier ensures that a call has occurred somewhere in the constructor body before returning.
This IL program, for example, will print E, D, and then C – the reverse of what C# gives us:
.assembly extern mscorlib { }
.assembly ctor { }
.class C {
.method public specialname rtspecialname instance void .ctor() cil managed {
ldstr "C"
call void [mscorlib]System.Console::WriteLine(string)
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ret
}
}
.class D extends C {
.method public specialname rtspecialname instance void .ctor() cil managed {
ldstr "D"
call void [mscorlib]System.Console::WriteLine(string)
ldarg.0
call instance void C::.ctor()
ret
}
}
.class E extends D {
.method public specialname rtspecialname instance void .ctor() cil managed {
ldstr "E"
call void [mscorlib]System.Console::WriteLine(string)
ldarg.0
call instance void D::.ctor()
ret
}
}
.class Program {
.method public static void Main() cil managed {
.entrypoint
newobj instance void E::.ctor()
pop
ret
}
}
So why is leaking ‘this’ bad, anyway?
Say you’ve decided in the implementation of D’s constructor that you would like to stick ‘this’ into a global hash-map. Doing this sadly means other code could grab the pointer and begin accessing it before E’s constructor has even run. That is a race at-best and a ticking time-bomb in all likelihood. For example:
class C {
public static Dictionary s_globalLookup;
private readonly int m_key;
public C(int key) {
m_key = key;
s_globalLookup.Add(key, this);
}
}
Even though we have taken great care to initialize our readonly field m_key before sticking ‘this’ into a dictionary, any subclasses will not have been initialized at this point. Only if C is sealed can we be assured of this. Another piece of code that grabs the element out of the hashtable and begins calling virtual methods on it, for example, is in a race with the completion of the initialization code for subclasses.
Leaking ‘this’ isn’t always such an explicit act. Merely invoking a virtual method within the constructor may dispatch a more derived class’s override before the more derived class’s constructor has run. And therefore its state is most likely not in a place conducive to correct execution of that override. It is fairly common knowledge that invoking virtual methods during construction is an extraordinarily poor practice, and best avoided.
Failure to construct
Let’s move on to the second issue. Imagine we suffer an exception during construction of an object. Perhaps this is due to a failure to allocate resources, or maybe even due to argument validation. It is clear that a leaked ‘this’ object in such cases will be a problem, because the object will have escaped into the wild even though its initialization failed. Subsequent attempts to use the object will obviously pose problems. What is more subtle is that if the class in question declares a destructor (in C++) or finalizer (in C#), a problem may now be lurking.
Let’s say we have the original situation shown above: C derives from D derives from E. Now say an exception happens within D’s constructor. At this point in time, C’s constructor has run to completion, D’s constructor has run partially up to the point of failure, and E’s has not run at all. (And, of course, in the case of C#, all member-initializers for all classes have actually run.) What happens to the cleanup code?
In C++, only constructors that have run will have their corresponding destructors executed. In the above situation, where C, D, and E each declares a destructor, only C’s will be run during stack unwind. It is imperative, therefore, that D handles failure within its constructor rather than relying on the destructor. For example:
class D : C {
int* m_pBuf1;
int* m_pBuf2;
public:
D() {
m_pBuf1 = … alloc …;
m_pBuf2 = … alloc …;
}
~D() {
if (m_pBuf2) … free …;
if (m_pBuf1) … free …;
}
}
If the allocation destined for m_pBuf2 fails by throwing an exception, the destructor for D will not run, and therefore m_pBuf1 will leak. The C++ solution to this particular example is to use smart pointers and member initializers for the allocations, because successfully initialized members do get destructed, even when the constructor (or indeed one of the member initializers) subsequently fails. This means that destructors for a particular class do not have to tolerate that class’s state not having been fully constructed, because those destructors will never run. Destructors must not, however, invoke virtuals, for two obvious reasons: (a) subclasses may not have ever been initialized, and (b) destructors run in reverse order, and so the destruction code for the subclass will have already run by the time a baseclass’s destructor runs.
In C#, finalizers will run, regardless of whether an object’s constructor ran fully, partway through, or not at all. If the object is allocated – and so long as GC.SuppressFinalize isn’t called on it – the finalizer runs. This distinctly means that C# finalizers must always be resilient to partially-constructed objects (unlike C++ destructors). Thankfully finalizers are a rare bird, and therefore this issue is seldom even noticed by .NET programmers.
This issue does not arise in the case of .NET’s IDisposable pattern. If a constructor throws, the assignment to the target local variable does not occur. And therefore the variable enclosed in, say, a using statement remains null. This means that there is no way to possibly invoke Dispose on the object. Moreover, the allocation in using occurs prior to entering the try/finally block. Hence, you really had better be writing constructors that don’t throw and/or protecting such resources with smart-pointer-like things with finalizers, a la SafeHandle.
Impediments to language support
As if these weren’t sufficient cause for concern, I also mentioned earlier – and somewhat vaguely – that partially-constructed objects interfere with language designers’ efforts to add invariants, immutability and concurrency-safety, and non-nullability to the language. And all of these are important properties to consider in our present age of complexity and concurrency, so this point is worth understanding more deeply. Let’s look at each in-order.
Invariants and safe-points
A partially-constructed object obviously may have broken invariants. By definition, invariants are meant to hold at the end of construction, and so if construction never completes, the rules of engagement are being broken.
Imagine, for example:
class C {
int m_x;
int m_y;
invariant m_x < m_y;
public C(int a) {
m_x = a;
m_y = a + 1;
}
}
It is ordinarily very difficult to ensure that each instruction atomically transitions the state of an object from one invariant safe-point to another. A common technique is to define well-defined points at which invariants must hold. We might model each failure point as one such technique. But even in C#, the above program does not satisfy this constraint, because an overflow exception may be generated at the ‘m_y = a + 1’ line. Or a thread-abort exception may be generated right in the middle of those two functions. Or, if addition were implemented as a JIT helper, an out-of-memory exception could get generated due to failure to JIT the helper function.
In such cases, we’d like to say that the object does not exist. But the sad fact is that the object *does* exist, and if the object has acquired physical resources at the time of failure to construct, we must compensate and reclaim them. The ideal world looks a lot like object construction as transactions, where the end of construction is the commit-point. The state-of-the-art is very different from this, however, and so any static verification and theorem proving that depends on invariants on an object holding, well, invariantly, is subject to being broken by partially-constructed objects.
Immutability… or not
Immutability is also threatened by partially-constructed objects. Immutability is a one of many first class techniques for solving concurrency-safety in the language, so this one is quite unfortunate.
In C#, for example, we might be tempted to say that a shallowly immutable type is one whose fields are all marked readonly. (And a deeply immutable type is one whose fields are all readonly, and also refer to types that are immutable.) A readonly field cannot be written to after construction has completed. Unfortunately, if the ‘this’ leaks out during construction, we may see those readonly fields in an uninitialized or even changing state:
class C {
public static C s_c;
readonly int m_x;
public C() {
m_x = 42;
s_c = this;
while (true) {
++m_x;
}
}
}
This is quite evidently a terrible and malicious program. C appears to be immutable, because it only contains readonly fields, but is quite clearly not, because the value of m_x is assigned to multiple times. If we had a guarantee that all readonly fields were definitely assigned once-and-only-once before ‘this’ can escape, then we’d be close to a solution. But of course we have no such guarantee. In C#, at least.
A related issue is co-initialization of objects. This is interesting and relevant, because in such cases we actually want to leak out partially-constructed objects. Imagine we’d like to build a cyclic graph comprised of two nodes, A and B, each referring to the other. With a naïve approach to immutability, we simply cannot make this work. Either A must first refer to B, in which case A refers to a partially-constructed B; or B must first refer to A, in which case B refers to a partially-constructed A. Both are equally as bad. The two assignments are not atomic.
Cyclic data structures are a commonly cited weakness of immutability, and an argument in favor of supporting partially-instantiated objects in a first class way, although there are approaches that can work. One example is to separate edges from nodes, and represent them with different data structures. We can then build the nodes A and B, and then build the edges A->B and B->A without needing to use cycles.
It’s not-null, or at least it wasn’t supposed to be
Tony Hoare called it his billion-dollar mistake: the introduction of nulls into a programming language. I think he sells himself short, however, because the absence of nulls in an imperative programming language – however worthy a pursuit – is actually a notoriously difficult to attain.
Spec# is one example of a C-style language with non-nullability, in which T! represents a “non-null T”, for any T. Although this is done in a pleasant way conducive to C#-compatibility – a significant goal of Spec# -- I’d personally prefer to see the polarity switched: T would mean “non-null T” and T? would mean “nullable T”, for any T, reference- and value-types included. This is much more like Haskell’s Maybe monad, and is the syntax I’ll use below for illustration purposes. But I digress.
Non-nullability is a wonderful invention, because it is common for 75% or more of the contracts and assertions in modern programs to be about a pointer being non-null prior to dereferencing it, both in C# and in C++. Relying on the type-system to prove the absence of nulls is one big step towards creating programs that are robust and correct out-of-the-gate, particularly for systems software where such degrees of reliability are important. And it cuts down on all those boilerplate contracts sprinkled throughout a program. Instead of:
void M(C c, D d, E e)
requires c != null, d != null, e != null
{
… use(c, d, e) …
}
You simply say:
void M(C c, D d, E e)
{
… use (c, d, e) …
}
No opportunity to miss one, and no need for runtime checks. It’s beautiful.
A problem quickly arises, however, having to do with partially-constructed objects. All of an object’s fields cannot possibly be non-null while the constructor is executing, because the object has been zero’d out and the assignments to its fields have not yet been made. Clearly constructor code needs to be treated “differently” somehow. We cannot simply live with the fact that ‘this’ escaping leads to a partially-constructed object leaking out into the program, because that could lead to serious errors. These serious errors include potential security holes, if unsafe code is manipulating the supposedly non-null pointer. One advantage to adding non-nullability is that runtime null checks can be omitted, because the type system guarantees the absence of nulls in certain places. In this situation, partially-constructed objects lead to holes in our nice type system support. Either the dynamic non-null checks are required as back-stop, or we’ll need some other coping technique.
There are related issues with non-nullability, like with partially-populated arrays. Imagine we’d like to allocate an array of 1M elements of type T, and we will proceed to populate those elements following the array’s allocation. There’s clearly a window of time during which the array contains 1M nulls, and then 1M-1 nulls, …, and if we finish 1M-1M nulls, i.e. 0 nulls. It is only at that last transition that the array can be considered to contain non-null T’s. The standard technique is to use an explicit dynamic conversion check, or to force the creation of the list to supply all of the elements of the array at construction time.
Coping techniques
There are, thankfully, some interesting ways to cope with partially-constructed objects. There is, in fact, a spectrum of techniques, ranging from escape analysis in various forms, to limitations around how objects are constructed such that a partially-constructed one can never leak, to automatic insertion of dynamic checks to prevent the use of partially-constructed objects, to static annotations that treat partially-constructed instances as first class things in the type system. And of course there’s always the technique of “deal with it”, which is the one that most C++-style languages have chosen, including our beloved C#.
The F# approach: restrictions and dynamic checks
F#, it turns out, does a very good job in preventing partially-initialized objects. A first important step is that fields in F# are readonly by-default, unless you opt-in to mutability using the mutable keyword. Therefore data structures are mostly immutable. And the construction rules are meant to make it very unlikely that you’ll expose a partially-constructed object during construction. How so? It’s simple: such fields must be initialized prior to running ad-hoc construction code, and if you attempt to initialize them multiple times, the compiler supplies an error. In other words, you really have to work hard to screw yourself, unlike C++ and C# where it’s very easy.
It’s of course possible to do some dirty tricks to publish or access a partially-initialized object, despite needing to work very hard at it. There is, however, a nice surprise awaiting us when we try. For example:
type C() =
abstract member virt : unit -> unit
default this.virt() = ()
type D() as this =
inherit C()
do this.virt()
type E =
inherit D
val x : int
new() = { x = 42; }
override this.virt() = printf "x: %d" this.x
let e = new E()
This example attempts to perform a virtual invocation from C before the more derived class has been fully initialized. This overridden virtual simply (attempts to) prints out the value of x. If we compile and run this program, however, we will notice that we get an exception: “InvalidOperationException: the initialization of an object or value resulted in an object or value being accessed recursively before it was fully initialized.”
Pretty neat. The compiler will stick in the checks necessary when ‘this’ is being accessed, to dynamically verify that an object is not being leaked before having been fully initialized. The F# approach can be summed up as trying to make things airtight as possible statically at compile-time, but admitting that there are holes – primarily due to inheritance – and dealing with them by inserting dynamic runtime checks. This tradeoff clearly makes sense for F#, because it is attempting to attain a robust level of reliability around immutability.
F# also adds non-nullability for the most part. Like Haskell’s Maybe monad, F# adds an option type that can store a single None value which lies outside of the underlying type’s domain to effectively represent null. Because F# is a .NET language it of course also needs to worry about nulls at interop boundaries with other languages like C# and VB. This is a great step forward; first class CLI support would be a nice next step.
A slight variant of the F# idea is to initialize data up the whole class hierarchy in one pass, and then run ad-hoc construction code in a second pass in the usual way. So long as readonly data can be initialized without running the ad-hoc construction code, this helps to statistically cut down on the chances for accidental leaking of ‘this’.
Type system: T versus notconstructed-T
We can model two kinds of T in the type-system: T and notconstructed-T. The constructor for any type T would then see the ‘this’ pointer as an notconstructed-T, and everything else – by default – sees ordinary T’s.
What good does this distinction do? It enables us to add verification and restrictions around the use of notconstructed-T’s and limitations to the conversion between the two types. See this paper by Manuel Fahndrich and Rustan Leino for an example of how this approach was taken in Spec#’s non-nullability work.
For example, we can prohibit conversion between T and notconstructed-T altogether, thereby disallowing escaping ‘this’ references altogether. If the type of ‘this’ within a constructor is different than all other references to type T, and they are not convertible, we’ve successfully walled the problem off in the type system. This protects against erroneous method calls, so that a constructor could not call any of its own methods, because these methods expect an ordinary T whereas the constructor only has a notconstructed-T. And because you cannot state notconstructed-T in the language, you cannot let one leak by storing it into a field.
We could add more sophisticated support, by allowing a programmer to explicitly tag certain non-field references as T-notconstructed. This makes the concept first-class in the language, and allows one to explicitly declare the fact that code is interacting with a partially-constructed T:
class C
{
int m_x;
public C() {
m_x = V();
}
protected virtual int V() notconstructed {
… I know to be careful …
}
}
In this example, the programmer has annotated V with ‘notconstructed’. This enables the call from the constructor because the method’s ‘this’ is an uninitialized-T, and serves as a warning to the programmer that he or she should take care, much like the code written inside a constructor.
We must also decide whether fields are offlimits via notconstructed-T’s. If yes, we can add F#-style dynamic checks for initialization, but only for attempted accesses against notconstructed object’s fields. This is nice because it means the scope of dynamic checks are limited, and used in a pay-for-play manner. And we could even enable a programmer to sidestep the error by stating at the use-site that they understand a particular field access may be to uninitialized memory, like Field.ReadMaybeUninitialized(&m_field).
To be honest, the reason this approach has likely not yet seen widespread use is that the cost is not commensurate with the benefit. At least, in my opinion. To make something like partially-constructed objects a first-class concept in a programming language, programmers would need to want to know where they are dealing with them. For systems programmers, this makes sense. For many other programmers, it would be useless ceremony with no perceived value. And yet the initial approach where nothing new needed to be stated, but yet escaping ‘this’ was prevented, blocks certain patterns of legal use. This is where theory and practice run up against one another. There is, however, presumably a nice middle ground awaiting discovery.
Winding down
I meant for this to be a short post. But the topic really is fascinating, and has been coming up time and time again as we do language work here at Microsoft. But it is truly fascinating mainly because, like nulls, the problem is widespread yet tolerable, and most C++ and C# programmers learn the rules and make do. Partially-constructed objects are a major blemish, but not a crisis that threatens the complete abandonment of imperative programming.
I do believe language trends indicate that more will move away from C++-style object initialization and related issues, and more towards immutability and treating data and its initialization separately from imperative ad-hoc initialization code. Haskell, F#, and Clojure, for example, show us some promising paths forward. There are a plethora of other attempts at solving related problems, and I unfortunately could only scratch the surface.
Although these techniques are not new, the primary question – to me – is how close to “the metal” in systems programming these concepts can be made to work. Typically for those situations, you need to rely on a mixture of static verification and complete freedom, because the dynamic checking is too costly and the code to work around overly-limiting static verification also adds too much cost. But as soon as you add complete freedom into the picture, you run into partially-constructed objects as a consequence, and all the issues I’ve mentioned above.
Anyway, I hope it was interesting.
 Sunday, May 16, 2010
My article about Transactional Memory (TM) was picked up by a few news feeds recently.
If I had known this would occur, I would have written it with greater precision. Because my article presents a mixture of technical challenges interspersed among more subjective and cultural issues, I am sure it is difficult to tease out my intended conclusion. To summarize, I merely believe adding TM to a shared memory architecture alone is insufficient.
Indeed, I remain a big fan of transactions. Atomicity, consistency, and isolation, and coming up with strategies for achieving all three in tandem, are part and parcel to architecting software.
After watching Barbara Liskov's OOPSLA Turing Award reprise, I decided to reacquaint myself with some old Argus papers this weekend. It has been some time since I last read them. Argus was Liskov's language for distributed programming and her follow-on to CLU. As with most research done by brilliant people, the work was way ahead of its time, has appeared in ad-hoc incarnations and permutations over time, and remains relevant today. This research is particularly interesting to work that my team is doing right now, especially its notion of guardians. And it is relevant to the TM discussion too.
The Argus approach of using isolation to coarsely partition state and operations into independent bubbles, and then communicating asynchronously between the so-called guardians that are responsible for this state, is an architecture that is common among most highly concurrent programs. This aids state management and fault tolerance. Argus makes an interesting observation that, although guardians may be sent messages concurrently -- and indeed activities themselves may even introduce local concurrency -- manipulation of state can be done safely and even in parallel thanks to transactions.
The requirement is that messages are atomic and commute. Transactions, it turns out, are a convenient way of implementing this requirement.
You will observe a similar architecture in other places, including in some languages that have adopted TM. Haskell has moved in this direction. Everything is purely functional and so, of course, no state is mutated in an unsafe way by default. However, with the introduction of concucrrency comes mutable cells for message passing and with parallelism comes indeterminism. You can push the state management problem up indefinitely, but at the top there are almost always mutable operations on real-world state (even if it is "just I/O"). Haskell programs have a safe architecture to begin with, and it is the intentional and careful addition of specific facilities that forces one to focus on the problematic seams. One could say that Haskell starts clean and stays clean, versus most shared memory-based languages which start dirty and try to attain cleanliness (at least when it comes to concurrency).
Why aren't transactions sufficient, then, given that the I in ACID stands for Isolation? You wouldn't model a database as one flat table in which each row is a single byte, however, would you? As you begin to decompose your program into isolated state, your bubbles (or guardians) are the tables, and your objects are the rows. This is just an analogy but I find it useful to think in these terms. Taking a bunch of intermingled state and pouring transactions on top is not going to give you this nicely partitioned separation of state which has proven to be the lifeblood of concurrency.
Even once you've attained a more isolated architecture, however, transactions are not a panacea. They are just one of many viable state management techniques in a programmer's arsenal, hierarchical state machines being another notable example. And in fact, many of the problems I mentioned in the TM article are still worrisome even when you start from the right place. From within a guardian, you may wish to enlist the aid of another unrelated guardian to perform a coordinated atomic activity, because a higher level invariant relationship between them must be preserved. Or an application which composes multiple guardians may wish to do so atomically. Even Argus required manual compensation for such things. This can be solved in part by DTC. But experience suggests that continuing to push the enlistment scope one level higher eventually leads to substantial problems. A topic for another day, I suppose.
My primary conclusion is that TM is a great complement to highly concurrent programs, but only so long as you start from the right place. The Argus and Haskell approaches are conducive to large-scale concurrency, but it is primarily because of the natural isolation those models provide; the addition of transactions address problems that remain after taking that step. But without that first step, they would have gotten nowhere.
 Sunday, April 25, 2010
We use static analysis very heavily in my project here at Microsoft, as a way of finding bugs and/or enforcing policies that would have otherwise gone unenforced. Many of the analyses we rely on are in fact minor extensions to the CLR type system, and verge on “effect typing”, an intriguing branch of type systems research that has matured significantly over the years.
Many of these annotations are done on methods, rather than types. A few examples include:
- [MayBlock] indicates that a method is free to call methods that might block.
- [NoAllocations] indicates that a method is neither allowed to allocate, nor call another method that might allocate
- [Throws(...)] indicates that a method is allowed to throw an exception of a type in the set { … }, or call other methods that may throw exceptions in the set { … }.
- And so on.
It turns out there’s a general way for handling these annotations. And indeed, you will quickly find that pursuing ad-hoc solutions to each independently leads to troubles. We shall briefly look at the generalization.
We must first observe that each falls into one of two categories: additive or subtractive. MayBlock and Throws are additive. They say what is permitted. NoAllocations, on the other hand, is subtractive, because the annotation declares what is not permitted. This distinction, we shall see, is crucial.
First we can imagine that each distinct effect shown above has a distinct effect type.
The types EMayBlock, ENoAllocations, and EThrows correspond to the annotations above. This will permit us to model effects using subtyping polymorphism. We will use the usual notation, i.e. “S <: T” means “S is a subtype of T”, or “a S is substitutable in place of a T”. For example, String <: Object. Throws is special, because it has a type hierarchy of its own beneath the root type. As you might guess, this hierarchy is infinite in size and is comprised of each possible permutation of exception types.
There are two special kinds of effects: the null effect (ENil), and a set of other effects (EMany). The latter permits us to create a new, unique effect type merely by concatenating a list of other effect types.
Each method is then given an EMany effect type containing its full set of effect types. For example:
[MayBlock, Throws(typeof(FileNotFoundException)), NoAllocations] void M() { … }
Is given the distinct effect type EMany { EMayBlock, EThrows(typeof(FileNotFoundException)), ENoAllocations }.
We should make one generalization before moving on. ENil ~ EMany { }. In other words, having no effects is equivalent to a list of no effects. Furthermore, EMany { } ~ EMany { ENil }. In other words, having a list of no effects is equivalent to having no effects.
Now we are ready to weave everything together. The main question confronting us is as follows: What is the subtyping relationship between the various effect types, including the null and list types?
The easiest to do away with is the EMany type. Given two EMany types E and F, then E <: F if, for all effects T in E’s type set, there exists an effect type U in F’s type set such that T <: U. In simpler terms, a list is a subtype of another list so long as all of its components are also subtypes of a component of the other. This is very abstract, but we shall see soon why it is useful.
Now we get to see why the additive and subtractive distinctions are so important:
Given an additive effect type EAdditive, we say ENil <: EAdditive.
Given a subtractive effect type ESubtractive, we say ESubtractive <: ENil.
The first statement says that a method with no effects is substitutable for a method with additive effects, and the second says that a method with subtractive effects is substitutable for a method with no effects. The corollaries are perhaps just as important. A method with additive effects cannot take the place of a method with no effects, whereas a method with subtractive effects can.
For the simple single-effect case, effects depicted in this way represent points on a line, where ENil is zero, subtractive effects are negative integers, and additive effects are positive integers. The lattice obviously becomes rather complicated as many effects accumulate.
Where does substitutability come up with respect to methods, anyway, you may ask? The first is in determining which other methods can be called. If a method M with effects E is trying to call another method N with effects F, this is permitted so long as F <: E. The next is in virtuals and overriding. A virtual with effects E may be overridden by a method with effects F so long as F <: E. The following example illustrates this idea, in addition to the composition of the subtyping rules we have shown so far:
class C { public virtual void M() {} [MayBlock, NoAllocations] public virtual void N() {} }
class D : C { [MayBlock, NoAllocations] public override void M() {} public override void N() {} }
In this example, the four methods are given the following effect types:
- C::M gets EMany { ENil }, or just ENil.
- C::N gets EMany { MayBlock, NoAllocations }.
- D::M gets EMany { MayBlock, NoAllocations }.
- D::N gets EMany { ENil }, or just ENil.
What does all this gibberish mean? Well it’s straightforward and intuitive, actually.
We are attempting to add the MayBlock and NoAllocations effects to the overridden M method which has none. Because MayBlock is additive, this is illegal (someone might call C::M thinking the code will not block), whereas it is OK for NoAllocations (calls through D::M are assured no allocations will happen, even though calls through C::M are guaranteed no such thing). Similarly, we are attempting to remove both effects from the overridden N method. Because MayBlock is additive, this is OK (M isn’t required to block, even though calls through C::M may suspect it of doing so), whereas it is decidedly not OK for NoAllocations (calls through C::M will reasonable assume allocations do not happen, whereas D::M would be free to perform them). It may take some thought to convince yourself that this is correct, but I hope that you find that it is. All of this works because of the subtyping of effect types.
All of this works similarly with delegates. The source delegate signature is akin to the base class in the above example, whereas the target method being bound to is like the override.
Things get a little more complicated when considering the EThrows effect. It is additive, so it is of course true that ENil <: EThrows(*). However, what if we have two different EThrows, and wish to inquire about substitutability of one in place of the other? We can come up with a simple rule that is general purpose for all set-of-type kinds of effects. Namely, consider two instances A and B of the same effect type:
Given an additive effect type EAdditive, then A <: B if, for all types T in A’s set-of-types, there exists a type U in B’s set-of-types such that T <: U.
Given a subtractive effect type ESubtractive, then A <: B if, for all types T in A’s set-of-types, there exists a type U in B’s set-of-types such that U <: T.
These sound quite similar, except that they end differently (i.e. T <: U vs. U <: T). We may illustrate the additive case with EThrows; to illustrate the subtractive case, let us imagine we can declare a ENoAllocations effect type that specifies which precise types may not be allocated:
class A{} class B : A{}
class C { [Throws(typeof(Exception)), NoAllocations(typeof(A))] public virtual void M() {} [Throws(typeof(FileNotFoundException)), NoAllocations(typeof(B))] public virtual void N() {} }
class D : C { [Throws(typeof(FileNotFoundException)), NoAllocations(typeof(B))] public override void M() {} [Throws(typeof(Exception)), NoAllocations(typeof(A))] public override void N() {} }
The results should not be surprising. D::M overrides C::M’s exception list, by being more specific and declaring that FileNotFoundException is thrown instead of just Exception. This is OK. Whereas D::N overrides C::N’s list by being more general purpose, specifying Exception instead of FileNotFoundException. This is clearly not OK. The NoAllocations type works in exactly the reverse. D::M attempts to prohibit allocations of B, but this is merely one possible subtype of the base method C::M’s declaration of A, and therefore this is illegal. Whereas D::N ensures no instances of A are allocated, which of course subsumes the base method C::N’s declaration that no B’s are allocated.
Everything gets a little more interesting when you consider generics. For example, how would we type a general purpose Map method? (This pattern arises quite frequently.) We would presumably want it to somehow “acquire” the effects of the delegate it invokes on all elements in a list. For example:
U[] Map<T, U>(T[] input, Func<T, U> func);
This declaration is stronger than necessary. The Func<T, U> class – prepackaged with the .NET Framework – does not have any effects on it. So it may not, for example, bind to a method that has any additive effects like Throws on it. This is rather unfortunate.
To solve this we could imagine treating effects with parametric polymorphism:
[Effects(E)] U Func<T, U, [EffectParameter] E>(T x);
This fictitious syntax merely says that Func can be instantiated with an effect type E, and that the Func “method” itself acquires the effect E. (Admittedly I should stop using faux-attribute syntax for illustrations since we’ve reached this level of language integration.) Now Map can be declared as such:
[Effects(E)] U[] Map<T, U, [EffectParameter] E>(T[] input, Func<T, U, E> func);
This says that Map has the same effects as the Func that is supplied as an argument. It turns out that we may want to extend this further, by enabling symbolic manipulation of effects. We may wish, for example, to specify that the Func is not allowed to block, by stating it does not have [MayBlock] in it. You could imagine using something very similar to generic constraints to achieve this. It is also interesting to allow concatenation of multiple effect types, both through partial and full specialization. For example, Map above may clearly have effects of its own. You also tend to want generic constraints like, 'where E : F', which of course just depends on the aforementioned subtyping rules. And of course C# 4.0's co- and contravariance can be applied to effects too.
Anyway, I have probably gone beyond most readers’ interest level in this subject. Things sure do get very interesting when you allow symbolic manipulation of effects. They get even more interesting when you begin to think of types as having “permissible effects” attached to them. However, the main thing I wanted to point out with this brief article is that this pattern arises quite frequently. And despite everyone struggling through what seem to be odd corner cases as they develop ad-hoc solutions, there really is a sound generalization behind it all. Many languages have first class effect typing, and I have found it liberating to think of many of these type system annotations through that lens. Perhaps you shall too.
 Sunday, February 28, 2010
Simon Peyton Jones was in town a couple weeks back to deliver a repeat of his ECOOP’09 keynote, “Classes, Jim, but not as we know them. Type classes in Haskell: what, why, and whither”, to a group of internal Microsoft language folks. It was a fantastic talk, and pulled together multiple strains of thought that I’ve been pondering lately, most notably the common thread amongst them.
In the talk, he compared polymorphism in Java-like languages (including C# which I will switch to referring to over Java hereforth) with ML and Haskell. In other words, how does a programmer commonly write code in each language that is maximally reusable? Of course, C# programmers primarily achieve this through subclassing, whereas functional programmers rely on type parameterization. Over the years, however, the former group has begun to borrow a great deal from the latter; as evidence, witness the growingly-pervasive use of generics in both Java and C# over the past decade. The talk was given mainly through the lens of this evolution, which appears to approach an interesting limit if projected far enough into the future.
Type classes came on the scene towards the end of the 1980’s, and immediately became a fertile seed for research and exploration in the relationship between subclass and parametric polymorphism. Type classes are much closer to subclass polymorphism than Haskell’s borrowed ML-style, which is to say parametric polymorphism. This is most intriguing because Haskell does not rely on subclassing, and so the mixture of two breeds new patterns.
I thought that it might be interesting to compare the mixture of subclass and parametric polymorphism in Haskell vis-à-vis type classes with the same in C# vis-à-vis a mixture of interfaces, generics, and generic constraints. Hence this post. We shall proceed by examining some basic type classes in Haskell with their equals in C#. Though similar, the dissimilarities are as stark as the similarities. And the lack of higher kinds -- particularly when combined with type classes -- means that some Haskell patterns simply are not expressible in C#.
The Simple Case: Equality (or Lack Thereof)
The most basic type class of all is Eq, which allows the comparison of two like-typed pieces of data. This may seem like a commodity if you ordinarily write code in languages like Java and C# which have a strong notion of object identity. In Haskell, however, equality is value equality over algebraic data types rather than objects, so polymorphism over equality operators is quite a bit more important. Indeed, as we shall see, Haskell’s approach is more powerful than == in Java-like languages. (Witness the neverending dichotomy between reference and value equality vis-à-vis Object.Equals in C#.) But alas, let us proceed by crawling in a series of logical steps, rather than leaping to the conclusions.
Haskell’s Eq type class is defined as such: class Eq a where
(==), (/=) :: a -> a -> Bool
x /= y = not (x == y)
x == y = not (x /= y)
As you see, Eq provides two operators: == and /=. Default implementations of each define == as the inverse of /= and /= as the inverse of ==. Not only is this a convenience, but it also specifies the desired contract implementations ought to abide by. Other types may become members of the Eq class by mapping the one or both operators to type-specific functionality. You will immediately recognize the similarity to virtual methods in OOP languages, where the operators can be overridden by subclasses.
Of course all of the primitive data types already implement Eq, so you get value equality over numbers, strings, etc. Imagine we declared a new Coords type – comprised of two integers – and want to make it a member of Eq also – wherein equality is determined by a pairwise comparison of each’s members:
data Coords = Coords { fst :: Integer, snd :: Integer }
We make Coords a member of the Eq type class, and thereby define equality over instance, through the ‘instance Eq Coords where’ construct. This maps type class functions to real implementation functions. The example here defines them inline, though you may of course refer to existing functions instead: instance Eq Coords where
(Coords fst1 snd1) == (Coords fst2 snd2) = (fst1 == fs2) && (snd1 == snd2)
Now we can take a ‘[Coords]’ and ask whether a particular ‘Coords’ exists within it.
A function may constrain a type variable to a certain class, and thereby access members of that class. For example, the following ‘isin’ function tests whether an instance of some type ‘a’ is contained within a list of type ‘[a]’. To do this, it demands that ‘a’ is a member of Eq using the syntax “Eq a =>”:
isin :: Eq a => a -> [a] -> Bool
x `isin` [] = False
x `isin` (y:ys) = x == y || (x `isin` ys)
The moral equivalent to the Eq type class in C# is not so easy to decide. The most obvious first guess is the built-in == and != operators. However, we will quickly find that this is not quite right, because these operators are not polymorphic in C#. To illustrate this point, let’s try to write the ‘isin’ method in C#, using generics and the == operator, for example:
bool IsIn<T>(T x, T[] ys)
{
foreach (T y in ys) {
if (x == y)
return true;
}
}
This function will not compile. The reason is that == and != in C# are not defined over all types (specifically not for value types). You can get IsIn to compile by restricting the T to a reference type:
bool IsIn<T>(T x, T[] ys) where T : class
{
… same as above …
}
Although this code is deceptively similar to the Haskell example, it is actually quite different. The == used to compare two instances compiles into the MSIL CEQ operator, effectively hard-coding an object identity comparison. Even if an overloaded == operator for a particular instantiated T is available, the compiler will not bind to it. Why? Because it is overloading and specifically *not* overriding. For example, say we had a MyData type and an overloaded == operator for comparing two instances:
class MyClass
{
public static bool operator ==(MyClass a, MyClass b) { return true; }
public static bool operator !=(MyClass a, MyClass b) { return false; }
}
According to this, all MyClass objects are equal. However, the following call yields the answer ‘false’:
IsIn<MyClass>(new MyClass(), new MyClass[] { new MyClass() });
The same problem arises should instances of MyClass get referred to by Object references. == and != do not perform any kind of virtual dispatch; the selection of implementation is chosen statically.
Perhaps it is the Equals method inherited from System.Object, then? This, at least, is virtual. And indeed, this gets much closer to Eq. Any type may override Equals, and a generic definition defined in terms of it dispatches virtually and allows subclasses to change behavior on a type-by-type basis: bool IsIn<T>(T x, T[] ys)
{
foreach (T y in ys) {
if (x == y || (x != null && x.Equals(y)))
return true;
}
return false;
}
(Even this is slightly different, because it assumes a certain type-agnostic behavior about nulls.)
This is cheating, however. We’ve taken advantage of the fact that someone thought to put an Equals method on System.Object, thereby giving all Ts such a method. There are clearly limits to how many crosscutting things can be added to System.Object before it becomes overwhelmed with concepts, not to mention the size (e.g. v-tables). Moreover, Equals on Object is weakly typed; a better solution is to use interfaces, like the IEquatable<T> interface that introduces a strongly typed Equals method:
public interface IEquatable<T>
{
bool Equals(T other); }
And to use a generic type constraint on IsIn’s T, much more akin to what ‘isin’ in Haskell above did:
bool IsIn<T>(T x, T[] ys)
where T : IEquatable<T>
{
foreach (T y in ys) {
if (x == y || (x != null && x.Equals(y)))
return true;
} return false;
}
This is cheating a little less, because we can implement an interface after-the-fact without impacting a class’s type hierarchy. This, in fact, looks remarkably similar to the Haskell ‘isin’ shown earlier, using type classes and parametric polymorphism, where here we have used interfaces in place of type classes.
We might be tempted to define a default NotEquals method over all IEquatable<T> instances, just like Haskell does by implementing the defaults for == and /= as the inverse of each other:
public static class Equatable
{
public static bool NotEquals<T>(this IEquatable<T> @this, IEquatable<T> other)
{
return !this.Equals(other);
}
}
This is not perfect. It is not polymorphic; see my previous post for an extensive discussion of this and related points. And what about nulls? If '@this' is null, the default implementation is going to AV. We’d need to bake in type-agnostic knowledge of null again. Sigh!
Sadly, it turns out this whole approach in general isn’t quite right anyway. For two reasons:
- First, we still infect the type in question with the interface being implemented; it cannot be done completely outside of the type’s definition, as with type classes.
- Second, type classes in Haskell do not actually require a value of the type in question to dispatch against the class’s functions, whereas we clearly do in the above example: we need to virtually dispatch against the object, and rely on this virtual dispatch to execute different code for each type. This will come up as we look at the numeric classes, but it is a critical difference.
A closer analogy is to use IEqualityComparer<T>:
public interface IEqualityComparer<T>
{
bool Equals(T x, T y);
}
(IEqualityComparer<T> in .NET also has a GetHashCode method on it. Let’s ignore that for now.)
Unfortunately, if our IsIn method were to use IEqualityComparer<T> to do its job, callers would be required to pass an instance explicitly; we cannot infer a “default” comparer based solely on the T:
bool IsIn<T>(T x, T[] ys, IEqualityComparer<T> eq)
{
foreach (T y in ys) {
if (eq.Equals(x, y))
return true;
}
return false;
}
Type classes actually function rather similarly, with two major differences:
- The interface object – called a dictionary – is passed and used implicitly.
- The mapping from types to dictionaries is done implicitly, whereas in .NET you’ll need to find an instance of the interface in question through other means.
This second difference is solved by a little hack in .NET. If you take a look at the EqualityComparer<T>.Default property, you shall see a lot of slightly gross reflection code to return an instance of IEqualityComparer<T> for any arbitrary T. The code checks some well-known types and conditions, and ultimately falls back to the aforementioned interfaces and default Equals method for the most general case. It’s not pretty, but it’s a beautiful hack given the tools at our disposal in C#.
A Harder Case: Polymorphic Numbers, on Output Parameters
The Eq type class is easy. The functions it defines are polymorphic on their inputs, but not on their outputs; both == and /= return Bool values. Once we transition to polymorphic output parameters or return values, we encounter a pattern quite different from that which is found in most .NET interfaces.
Let’s illustrate these differences by looking at Haskell’s Num type class:
class (Eq a, Show a) => Num a where
(+), (-), (*) :: a -> a -> a
negate :: a -> a
abs, signum :: a -> a
fromInteger :: Integer -> a
Here we see another feature of Haskell type classes: inheritance. Num derives from both Eq and Show – indicated by “(Eq a, Show a) => Num a” – the latter class of which we have not yet shown but is the moral equivalent to .NET’s Object.ToString method. It enables pretty printing of values, clearly something that would be expected to be common among all numeric data types. Haskell’s numeric class hierarchy is quite elegant, enabling highly polymorphic computations. A nice little tutorial of can be found here: http://www.haskell.org/tutorial/numbers.html.
But the question at hand is what the C# equivalent would be.
Our first approach would be to mimic the IEquatable<T> solution above:
interface INumeric<T>
{
T Add(T d);
T Subtract(T d);
T Multiply(T d);
T Absolute();
T FromInteger(int x);
}
This works fine, and primitive types in .NET could presumably implement it:
struct int : INumeric { .. }
struct float : INumeric { .. }
struct double : INumeric { .. }
…
This enables polymorphic code, like a Sum method, through the use of generic type constraints:
public static T Sum<T>(params T[] values)
where T : INumeric<T>
{
T accum = default(T);
foreach (T v in values)
accum = v.Add(accum);
return accum;
}
This example works great. Why then, you might wonder, doesn’t LINQ use this instead of providing special-case overloads of Average, Min, Max, Sum, etc. for all well-known primitive data types?
The primary reason is the performance hit taken to perform addition through O(N) interface calls versus O(N) MSIL ADD instructions. It is just a basic fact of life that today’s leading edge separate compilation techniques will not achieve parity with the hand specialized variants. While it is true that the JIT compiler *could* specialize the code for specific Ts and specific interfaces to emit more efficient instructions, like int, float, etc. over INumeric<T> calls, it will not do so today. This reduces the ability to share code – which admittedly is what we want here – and is tangled up in a judgment call based on heuristics. But I digress.
There is a larger problem that arises with other examples, at least from a language expressiveness point-of-view: the need to have an instance in hand to invoke interface methods. FromInteger, for example, is rather awkward to write. In fact, we cannot write a method with INumeric<T> like we could in Haskell:
public static T MakeT<T>(int value)
where T : INumeric<T>
{
… ? …
}
How do we invoke FromInteger, given that no T is available at the time of MakeT’s invocation? You can’t; you need to arrange for an instance to be available. There are ways out of this corner. One solution is to mandate that T has a default constructor:
public static T MakeT<T>(int value)
where T : INumeric<T>, new()
{
return new T(value).FromInteger(value);
}
That is always acceptable for structs, since they always have such a constructor; but this practice requires that classes be designed to possibly not hold invariants at all times, and so is not always acceptable or at the very least requires design accommodation.
The alternative is probably obvious. Use a similar approach to IEqualityComparer<T>:
interface INumericProvider<T>
{
T Add(T x, T y);
T Subtract(T x, T y);
T Multiply(T x, T y);
T Absolute(T x);
T FromInteger(int x);
}
And now, of course, each method that does polymorphic number crunching must accept an instance of INumericProvider<T>. That’s particularly cumbersome, so it’s more likely that .NET developers would prefer the aforementioned approach, where the type must provide a default constructor.
Admittedly, I seldom run into this particular problem in practice; but when I do, I really wish I had something like Haskell type classes to help me out.
Before moving on, it is worth pointing out one Haskell type class problem that explicit interface object passing in .NET helps to avoid. Should you need multiple implementations of a given class for the same type, as is relatively common with equality comparisons, you must disambiguate in Haskell by separation by module and being careful about what you import. This is similar to C#’s extension methods. With explicitly passed interface objects, however, it is trivial to manage and pass separate objects if you’d like.
Close, but No Cigar: Higher Kinds
There is one last feature that Haskell provides – a pretty big one, I might add – that C# simply cannot do: higher kinded types, or polymorphism over constructed types. This feature is orthogonal to type classes, but gets used pervasively in conjunction with them. An example will make this stunningly clear:
class Monad m where
(>>=) :: m a -> (a -> m b) -> m b
(>>) :: m a -> m b -> m b
return :: a -> m a
fail :: String -> m a
m >> k = m >>= \_ -> k
fail s = error s
Let’s try to transcribe the core of this class in C#, renaming >>= to Bind, and omitting the >> and ‘fail s’ operators because they have default implementations: public interface IMonad<M, A>
{
M<B> Bind<B>(M<A> m, Func k);
M<A> Return(A a);
M<A> Fail(string s);
}
This approach is tantalizingly close. It suffers from the already-admitted problem that, for any M<A> instance, you will need to pass the appropriate IMonad<A> provider object – just as with the IEqualityComparer<T> and INumericProvider<T> examples above.
But the code of course won’t *actually* compile, because the type variable M cannot be constructed as shown here. We find references to M<A> and M<B>, which are complete nonsense to C#. M is just a plain type variable. M is required to be what Haskell calls a type constructor (* -> *), which is a generic type that must be instantiated before it is a terminal type. I’ve written about this before. Although it seems like a trivial omission in C#’s language definition, it strikes at the heart of the type system.
A fictitious syntax for expressing this in C# might be:
public interface IMonad
where M : <>
{ ...
}
And if, say, M were expected to be a two- or three-parametered type, we would find, respectively:
... where M : <,>
... where M : <,,>
And so on.
This could in theory work. But C# -- and more worrisome .NET and the CLR – do not support this presently, and, to be quite honest, likely never will. It is immensely powerful, however. Life without monads is a life destined to continuous repetition. The “LINQ Pattern”, for example, is one example case in .NET where, for each ‘source’ type, we must create a “copy” of the original System.Linq.Enumerable variant. And shame on those who wish to write polymorphic code that will work for any LINQ provider.
Winding Down
Let’s wind down. I need to go grab dinner at Mama's Fish House on Maui right now.
I hope to have shown some of the similarities and dissimilarities between type classes and interfaces, and some patterns that arise when these things are mixed with parametric polymorphism. The mix of inheritance for type classes, but not for implementation types, in Haskell is unique. C#, of course, allows inheritance both amongst interfaces and implementations which is both a blessing and a curse.
I do think both camps have something to teach one another. For example, having a default interface lookup mechanism for arbitrary types in C# would be wonderful, and indeed might provide a replacement for extension methods that has more longevity. I’m sure much of this will happen with time; either “in place” as the respective languages evolve, or as new languages are created with time.
But most importantly, I hope that the blog post was educational and fun. Enjoy.
 Tuesday, February 09, 2010
One of my comments in the 2nd edition of the .NET Framework design guidelines (on page 164) was that you can use extension methods as a way of getting default implementations for interface methods. We've actually begun using these techniques here on my team. To illustrate this trick, let's rewind the clock and imagine we were designing new collections APIs from day one.
Let's say we gave the core interfaces the most general methods possible. These may neither be the most user friendly overloads nor the ones that most people use all the time. They would, however, be those from which all the other convenience methods could be implemented. An INewList<T> interface that was designed with these principles in mind may look like this:
public interface INewList<T> : IEnumerable<T>
{
int Count { get; }
T this[int index] { get; set; }
void InsertAt(int index, T item);
void RemoveAt(int index);
}
This interface is missing all the nice convenience methods you will find on .NET's IList<T>, like Add, Clear, Contains, CopyTo, IndexOf, and Remove. So it's not really as nice to use. You can't write an API that takes in an INewList<T> and performs an Add against it, for example, like you can with IList<T>.
One approach to solving this might be to write a concrete class -- much like .NET's System.Collections.ObjectModel.Collection<T> -- that provides concrete implementations of all of these methods, and then other lists can simply subclass that. But we can do better.
Instead, let's give INewList<T> default implementations of all of these methods. How do we do this? That's right: with extension methods. Voila!
public static class NewListExtensions
{
public static void Add<T>(this INewList<T> lst, T item)
{
lst.InsertAt(lst.Count, item);
}
public static void Clear<T>(this INewList<T> lst)
{
int count;
while ((count = lst.Count) > 0) {
lst.RemoveAt(count - 1);
}
}
public static bool Contains<T>(this INewList<T> lst, T item)
{
return lst.IndexOf(item) != -1;
}
public static void CopyTo<T>(this INewList<T> lst, T[] array, int arrayIndex)
{
for (int i = 0; i < lst.Count; i++) {
array[arrayIndex + i] = lst[i];
}
}
public static int IndexOf<T>(this INewList<T> lst, T item)
{
var eq = EqualityComparer<T>.Default;
for (int i = 0; i < lst.Count; i++) {
if (eq.Equals(item, lst[i])) {
return i;
}
}
return -1;
}
public static bool Remove<T>(this INewList<T> lst, T item)
{
int index = lst.IndexOf(item);
if (index == -1) {
return false;
}
lst.RemoveAt(index);
return true;
}
}
Well isn't that neat. We've now given any INewList<T> implementations all these common methods without dirtying their class hierarchies, built atop a tiny core of extensibility. This is much like .NET's Collection<T> which exposes the core as abstract methods. Indeed, we can go even further. Any convenience overloads, like the multitude of CopyTos on List<T> in .NET, can be given to all INewList<T>'s also. And yet implementing INewList<T> remains as braindead simple as it was before: two properties and two methods. In fact, it's simpler than doing a more feature-rich IList<T>, because the convenience methods come for free.
It would be even niftier if you could add these methods straight onto INewList<T>, and have the C# compiler emit the extension methods silently for you. In other words:
public interface INewList<T> : IEnumerable<T>
{
... interface methods (as above) ...
void Add(T item)
{
InsertAt(Count, item);
}
void Clear()
{
int count;
while ((count = Count) > 0) {
RemoveAt(count - 1);
}
}
... and so on ... }
Although this would just be sugar for the NewListExtensions class shown earlier, it sure saves some typing and makes it the pattern more apparent and first class.
Though cool, this whole idea is certainly not perfect.
For one, there are no extension properties. So you can't use this trick for properties.
But the more obvious and severe downside to this approach that these methods are not specialized for the given concrete type. For example, the Clear method is potentially far less efficient than a hand-rolled List<T>, because it does O(N) RemoveAts rather than a single O(1) fixup of the count.
Recall now that the compiler binds more tightly to instance methods than extension methods. So we could implement our own little list class with a faster Clear method if we'd like:
class MyList : INewList<T>
{
... the two properties and two methods from INewList<T> ...
public void Clear()
{
.. efficient! ... }
}
Now when someone calls Clear on a MyList<T> directly, the compiler will bind to the efficient Clear.
This is still not perfect. If you pass the MyList<T> to an API that takes in an INewList<T>, any calls to Clear will fall back to the extension method. Extension methods are not virtual in any way. You can try to simulate virtual dispatch, but it gets messy quick. For example, say we defined an IFasterList<T> that includes all those convenience methods that lists frequently want to make faster; we can then do a typecheck plus virtual dispatch in the extension method.
For now, let's pretend that's just the Clear method:
public interface IFasterList<T> : INewList<T>
{
void Clear();
}
Of course, MyList<T> above would now implement IFasterList<T>. Invocations through IFasterList<T> will automatically bind to the faster variant; but if objects that implement IFasterList<T> get passed around as IList<T>s, you lose this ability. So the Clear extension method can now do a typecheck:
public static void Clear<T>(this INewList<T> lst)
{
IFasterList<T> fstLst = lst as IFasterList<T>;
if (fstLst != null) {
fstLst.Clear();
return;
}
int count;
while ((count = lst.Count) > 0) {
lst.RemoveAt(count - 1);
}
}
This works but is obviously a tedious and hard-to-maintain solution. It would be neat if someday C# figured out a way to "magically" reconcile virtual dispatch and extension methods. I don't know if there is a clever solution out there. I am skeptical. Nevertheless, despite this flaw, the above techniques are certainly thought provoking and interesting enough to play around with and consider for your own projects. And at the very least, it's fun. Enjoy.
 Friday, January 08, 2010
Sometimes you need to wait for something before proceeding with a computation.
Perhaps you need to know the value of some integer that is being computed concurrently. Maybe you need to wait for the bytes to flush to disk before telling another process the file is consistent and ready to read. Or you need to get that next row back from the database before painting it on the UI. It could be that you need to wait for the missile to leave the bay before closing the bay door. And so on.
And sometimes there’s simply nothing better to do while waiting for these things to happen other than to let the CPU halt (or let other processes on the machine run). You need to twiddle your thumbs a bit, and exhibit a little patience. Or at least your program does. This is simply an unfortunate fact of life.
This manifests numerous ways in our programming models:
1) Waiting on an event. 2) Waiting to acquire an already-held lock. 3) Finding that the GUI message queue is empty and doing a MsgWaitForMultipleObjectsEx. 4) Finding that the COM RPC queue is empty and doing a CoWaitForMultipleHandles. 5) Issuing an Ada rendezvous ‘accept’ and finding that no messages await you, thus blocking. 6) Issuing an Erlang ‘receive’ and finding that no messages await you, thus blocking. 7) Waiting on a .NET 4.0 task. 8) Issuing a ContinueWith on a .NET 4.0 task. 9) And so on.
There are three big distinctions to make about the characteristic nature of this waiting: namely, (1) what condition's establishment is being sought -- i.e. the reason for the wait, (2) whether multiple such conditions of interest may be waited on simultaneously, and, related, (3) whether waiting for said condition(s) necessarily means that the processing of some other conditions that may arise elsewhere, but require the blocked context to run, cannot occur.
I will be the first to admit that this statement is rather abstract. But it really does matter.
For example, MsgWaitForMultipleObjectsEx is a pumping wait. Not only do you wait for the occurrence of one of several events to get set, but the arrival of a new top-level message at the message queue (either GUI or COM RPC-related) causes immediate processing of that message, presuming the thread is blocked at that call at the time. Although you can be deeply nested in some complicated code, you “jump” to the event loop to run the message handling code. Vanilla WaitForMultipleObjectsEx works in a similar way vis-à-vis APCs, provided the wait is alertable. This is quite different from a fully blocking non-pumping wait, which only waits for one or more very specific events, but does not dispatch messages simultaneously.
Win32 esoterica aside, the concepts appear elsewhere. The moral equivalent in Ada or Erlang is to do a selective-accept or -receive, intentionally not dispatching certain messages that might arrive in the meantime. (To be fair, you can also do this in COM with message filters.) This often happens when you nest accepts and receives. You may be capable of processing messages A-Z at the top-level tail recursive loop; but if that nested accept only knows about message kinds M and N, then there are 24 other kinds that will not be picked up in the meantime.
Not pumping for messages is dangerous. And it can lead to deadlock if you pump for the wrong ones. Like if you’re accepting M or N, yet the triggering of M or N depends on first processing some message K waiting in the queue. COM RPCs with cycles run face first into this. And/or not pumping can lead to responsiveness and scalability problems. Perhaps M or N eventually does arrive, yet little old K needs to wait an indeterminate amount of time before it is seen. Whereas we could have overlapped its processing. This is why most STAs pump while waiting, and, similarly, why many Erlang processes consist of a main loop that is prepared to handle any message the process accepts at that top level loop. They may seem very different but they are strikingly not.
Yet paradoxically pumping for messages is also dangerous. You must predict all the kinds of messages that may reentrantly get executed, and your state at the point of the blocking call must be consistent enough to tolerate them. (At least those that will actually happen.) In COM STAs, this can be wholly unpredictable and indeed because the CLR auto-pumps on STAs the blocking points can be hidden. Overly aggressively admitting messages may seem like the right thing to do, until you’ve wedged yourself into some unforeseen inconsistent state. You can avoid this by making each message handler atomic; see Argus. But if you can't or don't have the discipline to do that, or aren't quite sure, you must not pump. You either avoid pumping altogether or you selectively pump messages that do not touch the state encapsulated by the pump. Or you lock access to state with a non-recursive lock and run the risk of deadlock.
I have found it clarifying to think about blocking in event loop concurrency and state machine terms, advancing from one state to the next in between waits. It’s a slippery model, but particularly when working in message passing systems that employ event loops, it can help to identify all the familiar problems with shared memory, blocking, and consistency.
Indeed it is interesting how blocking and non-blocking systems can rapidly approach each other. Starting from either extreme tempts you to tiptoe closer and closer to the middle. The familiarity of the other extreme tempts you. Until, alas, you just might meet in the middle.
 Sunday, January 03, 2010
Rewind the clock to mid-2004. Around this time awareness about the looming “concurrency sea change” was rapidly growing industry-wide, and indeed within Microsoft an increasing number of people – myself included – began to focus on how the .NET Framework, CLR, and Visual C++ environments could better accommodate first class concurrent programming. Of course, our colleagues in MSR and researchers in the industry more generally had many years’ head start on us, in some cases dating back 3+decades. It is safe to say that there was no shortage of prior art to understand and learn from.
One piece of prior art was particularly influential on our thoughts: software transactional memory. (STM, or, in short just TM.) In fact, right around that time, Tim Harris’s TM work grew in notoriety (my first exposure arriving by way of OOPSLA’03’s proceedings, which contained the “Language Support for Lightweight Transactions” paper). TM was immediately fascinating, and simultaneously promising. For a number of reasons:
- TM hid sophisticated synchronization mechanisms under a simple veil.
- It could be implemented using sophisticated (and scalable) techniques, again under a simple veil.
- It built on decades of experience in building scalable and parallel transactional databases.
- Among others. But most of all, it was a bright shiny light in a sea of complexity.
- And how fortunate: Tim was a colleague in our neighboring MSR Cambridge offices (and still is).
In a nutshell, TM offered declarative concurrency-safety. You declare what you’d like in as few simple words as possible, and you get what you want. In this case, those simple words are ‘atomic { S; }’.
Many people latched onto TM rapidly and simultaneously, both inside and outside of Microsoft. I hacked together a little prototype built atop SSCLI (“Rotor”), and another architect on our team built an even more feature-rich prototype using MSIL rewriting. We compared notes, began jointly exploring the design space, and talking more regularly with other colleagues like Tim in MSR. Soon thereafter we kicked off a small working group with about a dozen architects and researchers from around the company, aiming to articulate what a real productized TM might look like. Fun times.
We were eventually given the OK for an official “incubation” project, and multiple years’ of exploration and hard work ensued. In fact, the fruits of a team of many’s labor recently got released in the form of a Community Technology Preview -- a good conduit for experimentation, but with no commitment to add it to any of Microsoft’s products. To be clear, I had only a small part to play in this ambitious project, and mostly towards the start. Partway through, I stepped away to do PLINQ and Parallel Extensions to .NET, both of which are now part of the .NET Framework 4.0. Dozens of amazing people played a significant role in the project over the years. But I am getting way ahead of myself…
I’ve been away from the nitty-gritty day-to-day details of TM for about 3 years now, which feels sufficiently long to develop a healthy perspective on the project. So here it is. What follows is of course in no way Microsoft’s “official position” on the technology, but rather my own personal one. I’ve interspersed generalizations with specific details because that’s just how my brain thinks about TM.
Towards the North Star
A wondrous property of concurrent programming is the sheer number and diversity of programming models developed over the years. Actors, message-passing, data parallel, auto-vectorization, ...; the titles roll off the tongue, and yet none dominates and pervades. In fact, concurrent programming is a multi-dimensional space with a vast number of worthy points along its many axes.
This rich history is simultaneously a blessing and a daunting curse. But in any case can make for some very interesting multi-year-long immersion. My UW talk from 1 1/2 years ago just barely touches on the sheer breadth.
TM’s greatest virtue is the first word in its name: transactional. It turns out that, no matter your concurrent programming model du jour, three fundamental concepts crop up again and again: isolation (of state), atomicity (of state transitions), and consistency (of those atomic transitions). We use locks in shared-memory programming, coarse grained messages in message-passing, and functional programming to achieve all of these things in different ways. Transactions are another such mechanism, sure, but more than that, transactions are an all-encompassing way of thinking about how programs behave at their most fundamental core. Transaction is a religion.
Not everybody believes this, and of course why would they: it is an immensely subjective and qualitative statement. Some will claim that models like message passing entirely avoid the likes of “race conditions,” and such, but this is clearly false: state transitions are made, complicated state invariants are erected amongst a sea of smaller isolated states, and care must be taken, just as in shared memory. Even Argus, a beautiful early incarnation of message-passing (via promises) demands that messages are atomic in nature. This property is not checked and, if done improperly, leads to “races in the large.” Even Argus introduced the notion of transactions and persistence in the form of guardians.
Of course, message passing helps push you in the right direction. It is not, however, a panacea.
I was reading my ICFP proceedings recently and was reminded of research done in the context of Erlang that supports this assertion. In it, they apply CHESS-like techniques (with clever search space culling) to find race conditions. Indeed we use similar techniques very successfully for our message-passing programming models on my team here at Microsoft.
Transactions are terrific because they are “automatic”. You declare the boundaries, and the transactional machinery takes care of the rest. This is true of databases and also TM. Countless developers in the wild write massively concurrent programs by issuing operations against databases: they can do this so easily because they grok the simple façade that transactions provide. Numerous server-side state-based applications use transactions to shield programmers from the pitfalls of concurrency. Behold MSDTC. The bet we were making is that similar models would scale down just as well “in the small”.
The canonical syntactic footprint of TM is also beautiful and simple. You say:
atomic {
… concurrency-safe code goes here …
}
And everything in that block is magically concurrency-safe. (Well, you still need to ensure the consistency part, but isolation and atomicity are built-in. Mix this with Eiffel- or Spec#-style contracts and assertions like those in .NET 4.0, run at the end of each transaction, and you’re well on your way to verified consistency also. The ‘check E’ work in Haskell was right along these lines.) You can read and write memory locations, call other methods, all without worrying about whether concurrency-safety will be at risk.
For example, consider three transactions running concurrently:
int x = 0, y = 0, z = 0;
atomic { atomic { atomic {
x++; y++; z++;
} x++; y++;
} x++;
}
No matter the order in which these run, the end result will be x == 3, y == 2, z == 1.
Contrast this elegant simplicity with the many pitfalls of locks:
- Data races. Like forgetting to hold a lock when accessing a certain piece of data. And other flavors of data races, such as holding the wrong lock when accessing a certain piece of data. Not only do these issues not exist, but the solution is not to add countless annotations associating locks with the data they protect; instead, you declare the scope of atomicity, and the rest is automatic.
- Reentrancy. Locks don’t compose. Reentrancy and true recursive acquires are blurred together. If a locked region expects reentrancy, usually due to planned recursion, life is good; if it doesn’t, life is bad. This often manifests as virtual calls that reenter the calling subsystem while invariants remain broken due to a partial state transition. At that point, you’re hosed.
- Performance. The tension between fine-grained locking (better scalability) versus coarse-grained locking (simplicity and superior performance due to fewer lock acquire/release calls) is ever-present. This tension tugs on the cords of correctness, because if a lock is not held for long enough, other threads may be able to access data while invariants are still broken. Scalability pulls you to engage in a delicate tip-toe right up to the edge of the cliff.
- Deadlocks. This one needs no explanation.
In a nutshell, locks are not declarative. Not even close. They are not associated with the data protected by those locks, but rather the code that accesses said data. (For example: in the above code snippet, do we need three locks? Or one? Or …? Imagine we choose three: one for each variable, x, y, and z. What if we increment z, release its associated lock, and some other thread can now see the newly incremented z before the y and x get incremented. Whether this is acceptable depends on the program.) Sure, you can achieve atomicity and isolation, but only by intimately reasoning about your code by understanding the way they are implemented. And if you care about performance, you are also going to need to think about hardware esoterica such as CMPXCHG, spin waiting, cache contention, optimistic techniques with version counters and memory models, ABA, and so on.
The contrast is stark. Atomic-block-style transactions provide automatic serializability of whole regions of code, no matter what that code does, and the TM infrastructure does the rest, choosing between: optimistic, pessimistic, coarse, fine, etc. The linearization point of a transaction is clear: the end of the atomic block. TM can even adjust strategies based on the surrounding environment: hardware, dynamic program behavior, etc. (“Policy”.) In comparison to locks, TM is an order of magnitude simpler. There have even been studies whose conclusions support this assertion.
(Transactions unfortunately do not address one other issue, which turns out to be the most fundamental of all: sharing. Indeed, TM is insufficient – indeed, even dangerous – on its own is because it makes it very easy to share data and access it from multiple threads; first class isolation is far more important to achieve than faux isolation. This is perhaps one major difference between client and server transactions. Most server sessions are naturally isolated, and so transactions only interfere around the edges. I’m showing my cards too early, and will return to this point much, much later in this essay.)
TM also has the attractive quality of automatic rollback of partial state updates. (How did I get this far without discussing rollback?) Concurrency aside, this avoids needing to write backout code to run in the face unhandled exceptions. In retrospect this capability alone is almost enough to justify TM in limited quantities. Reams of code “out there” contain brittle, untested, and, therefore, incorrect error handling code. We have seen such code lead to problems ranging in severity: reliability issues leading to data loss, security exploits, etc. Were we to replace all those try/catch/rethrow blocks of code with transactions, we could do away with this error prone spaghetti. We’d also eliminate try/filter exploits thanks to Windows/Win32 2-pass SEH. Sometimes I wish we focused on this simple step forward, forgot about concurrency-safety, and baby stepped our way forward. Likely it wouldn’t have been enough, but I still wonder to this day.
We also toyed with the ability to replace reliability-oriented CER blocks with transactions. As you go through a transaction, there is a log of forward progress and how to undo it. So no matter the kind of failure, including OOM, you can rollback the partial state updates with zero allocation required.
At some point we began describing an ‘atomic’ block as though the program used a single global lock for all its concurrency operations. This would be grossly inefficient, of course, and fails to capture the precise isolation and rollback properties, but nevertheless conveys the basic idea. It also, as an aside, foreshadows a few of the difficult problems that lie ahead, namely strong vs. weak atomicity. Even though there is only one, if you forget to hold this one global lock while accessing shared data, you’ve still got a data race on your hands. This model won’t save you. We will return to this later on.
Tough Decisions: Life as a Starving Artist
We faced some programming model decisions requiring artistic license early on.
One that we quickly decided was whether to automatically roll back a transaction in response to an unhandled exception thrown from within. Such as with this code:
atomic {
x++;
if (p)
throw new Exception(“Whoops”);
}
If p evaluates to true, and hence an unhandled exception thrown, should that x++ be rolled back?
Most on the team said “Yes” as a gut reaction, whereas some argued we should require the programmer to catch-and-rollback by hand. We settled on the automatic approach because it seemed to do what you would expect in all the cases we looked at. Your transaction failed to complete normally and consistently. We also debated whether to support a unilateral “Transaction.Abort()” capability; while we agreed a “Transaction.Commit()” would be silly – the only way to commit a transaction being to reach its end non-exceptionally – the jury remained split on unilateral abort. We eventually found that, particularly when nesting is involved, the ability to detect a dire problem with the universe and bail unilaterally can be useful.
And we also hit some tough snags early on. Some were trivial, like what happens when an exception is thrown out of an atomic block. Of course that exception was likely constructed within the atomic block (‘throw new SomeException()’ being the most common form of ‘throw’), so we decided we probably need to smuggle at least some of that exception’s state out of the transaction. Like its stack trace. And perhaps its message string. I wrote the initial incarnation of the CLR exception subsystem support, and stopped at shallow serialization across the boundary. But this was a slippery slope, and eventually the general problem was seen, leading to more generalized nesting models (which I shall describe briefly below). Another snag, which was quite non-trivial, was the impact to the debugging experience. Depending on various implementation choices – like in-place versus buffered writes – you may need to teach the debugger about TM intrinsically. And some of the challenges were fundamental to building a first class TM implementation. Clearly the GC needed to know about TM and its logging, because it needs to keep both the “before” and “after” state of the transaction alive, in case it needed to roll back. The JIT compiler was very quickly splayed open and on the surgery table. And so on.
Throughout, it became abundantly clear that TM, much like generics, was a systemic and platform-wide technology shift. It didn’t require type theory, but the road ahead sure wasn’t going to be easy.
So we knocked down many early snags, and kept plowing forward, eagerly and excitedly. None of these challenges were insurmountable. We remained hopeful and happy (perhaps even blissful) to continue exploring the space of possible solutions. More irksome snags lurked right around the corner, however. And little did we know that some decisions we were about to make would subject us to some of the biggest such snags. TM’s greatest feature – slap an atomic around a block of code and it just gets better – would turn out to be its greatest challenge… but alas, I am again jumping ahead; more on that later.
Turtles, but How Far Down? Or, Bounded vs. Unbounded Transactions
Not all transactions are equal. There is a broad spectrum of TMs, ranging from those that are bounded to updating, say, 4 memory locations or cache lines, to those that are entirely unbounded. Indeed TM blurs together with other hardware-accelerated synchronization techniques, like speculative lock elision (SLE). The more constrained TM models are often hardware-hybrids, and the limitations imposed are typically due to physical hardware constraints. Models can be pulled along other axes, however, such as whether memory locations must be tagged in order to be used in a transaction or not, etc. Haskell requires this tagging (via TVars) so that side-effects are evident in the type system as with any other kind of monad.
We quickly settled on unbounded transactions. Everything else looked like multi-word CAS and, although we knew multi-word CAS would be immensely useful for developing new lock-free algorithms, our aim was to build something radically new and with broader appeal. If we ended up with a hardware-hybrid, we would expect the software to pick up the slack; you’d get nice acceleration within the hardware constraints, and then “fall off the silent cliff” to software emulation thereafter. Thus the unbounded approach was chosen.
In hindsight, this was a critical decision that had far-reaching implications. And to be honest, I now frequently doubt that it was the right call. We had our hearts in the right places, and the entire industry was trekking down the same path at the same time (with the notable exception of Haskell). But with the wisdom of age and hindsight, I do believe limited forms of TM could be wildly successful at particular tasks and yet would have avoided many of the biggest challenges with unbounded TM.
And believe me: many such challenges arose in the ensuing months.
An example of one challenge that didn’t threaten the model of TM per se, but sure did make our lives more difficult, is the compilation strategy we were forced to adopt. Transactions cost something. To transact a read or write entails a non-trivial amount of extra work; we spent a lot of time optimizing away redundant work, and developing new optimizations that reduced the overhead of TM. But at the end of the day, the cost is not zero – and in fact, the common case is far from it. Imagine you have an unbounded transaction model and are faced with compiling a particular method from MSIL to native code. A simple separate-module -based compiler (i.e. not whole-program) will not necessarily know whether this method will get called from a transaction, or from non-transactional code, such that in the worst case the method must be prepared for transactional access. There are a variety of techniques to use to produce code that supports both: the two extremes are (1) cloning, or (2) sharing w/ conditional dynamic checking. Neither extreme is particularly attractive, and this choice represents a classic space-time tradeoff that entails finding a reasonable middle ground. A JIT compiler can dynamically produce the version that is needed at the moment, but offline compilers – like the CLR’s NGEN – do not have this luxury. And within Microsoft at least, and among shrink-wrap ISVs, offline compilation is of greater importance than JIT compilation. For better or for worse.
The model of unbounded transactions is the hard part. You surround any arbitrary code with ‘atomic { … }’ and everything just works. It sounds beautiful. But just think about what can happen within a transaction: memory operations, calls to other methods, P/Invokes to Win32, COM Interop, allocation of finalizable objects, I/O calls (disk, network, databases, console, …), GUI operations, lock acquisitions, CAS operations, …, the list goes on and on. Versus bounded transactions, where we could say something like: if you do more than N things, the transaction will fail to run – deterministically.
Unbounded really was the golden nugget. But we should not be shy about what this decision implies.
Implementing the Idea
This leads me to a brief tangent on implementation. Given that we didn’t implement TM with a single global lock, as the naïve mental model above suggests, you may wonder how we actually did do it. Three main approaches were seriously considered:
- IL rewriting. Use a tool that passes over the IL post-compilation to inject transaction calls.
- Hook the (JIT) compiler. The runtime itself would intrinsically know how to inject such calls.
- Library-based. All transactional operations would be explicit calls into the TM infrastructure.
Approaches #1 and #2 would look similar, but the latter would be quite different. Instead of:
atomic {
x++;
}
Or:
Atomic.Run(() => {
x++;
});
You might say something like:
Atomic.Run(() => {
Atomic.Write(Atomic.Read(ref x) + 1);
});
With enough language work, we could have tried to desugar the latter into the former, but when you start crossing method boundaries, everything gets more complicated. (Do you create transactional clones of every method, and rewrite calls from ordinary methods to the transactional clone? This is easy to do with a rewriter or compiler, but quite difficult with a pure library approach.) We also knew we’d need to do some very sophisticated compiler optimizations to get TM’s performance to the point of acceptable. So we chose approach #2 for our “real” prototype, and never looked back.
After this architectural approach was decided, a vast array of interesting implementation choices remained.
We moved on to building the primitive library with all the TM APIs that the JIT would introduce calls into. We quickly settled an approach much like Harris’s (and, at the time, pretty much the industry/research standard): optimistic reads, in-place pessimistic writes, and automatic retry. That means reads do not acquire locks of any sort, and instead, once the end of the transaction has been reached, all reads are validated; if any locations read have been modified concurrently (or an uncommitted value was read), the whole transaction is thrown away and reattempted from the start. Writes work like locks. This approach makes reads cheap: a single read consists of reading the value, and a version number whose address is at a statically known offset. No interlockeds. This is great since reads typically far outnumber writes. Down the line, we explored adding more sophisticated policy than this, which I will detail in brief below.
So the compiler would inject hooks for the above code like so:
while (true) {
TX tx = new TX();
try {
// x++;
tx.OpenReadOptimistic(ref x);
int tmp = x;
tx.OpenWritePessimistic(ref x);
x = (tmp + 1);
if (!tx.Validate())
continue;
tx.Commit();
}
catch {
tx.Rollback();
throw;
}
}
Notice there are some obvious overheads in here:
- The atomic block becomes a loop (to support automated retry).
- A new transaction must be allocated and likely placed in TLS (if methods are called).
- A try/catch block is used to initiate rollback on unhandled exceptions.
- Each unique location read in a block requires at least one call to OpenReadOptimistic.
- Each unique location written requires at least one call to OpenWritePessimistic.
- Each location read must be validated (at Validate), and finally the transaction is committed (at Commit).
Much of the work in the compiler was meant to reduce these overheads. For example, if the same location is read multiple times, there’s no need to call OpenReadOptimistic more than once. If the compiler can statically detect this, it may elide some of the calls. If the same location is read and then written – as in the above example – only the write lock must be acquired. If no methods are called, the transaction object can be enregistered, and we needn’t add it to TLS so long as the exception trap code knows how to move it from register to TLS on demand. Et cetera.
There are other overheads that are not so obvious. Optimistic reads mandate that there is a version number for each location somewhere, and pessimistic writes mandate that there is a lock for each location somewhere.
A straightforward technique is to use a hashing scheme to associate locations with this auxiliary data: each address is hashed to index into a table of version numbers and locks. This leads to false sharing, of course, but reduces space overhead and makes lookup fast. Unfortunately, in a garbage collected environment, addresses are not stable and therefore hashing becomes complicated. You can use object hash codes for this purpose, but .NET hash codes are overridable; and generating them is not nearly as cheap as using the memory location’s address, which by definition is already in-hand. Other alternatives of course exist. You can associate version numbers and locks with the objects themselves, just like monitors and object headers/sync-blocks in the CLR: this provides object-granularity locking. Ahh, the age old tension of fine vs. coarse grained locking comes up again.
We eventually realized we’d want both optimistic and pessimistic reads, the latter of which worked a lot like reader/writer locks. We crammed all these into a clever little word-sized data structure which worked a lot like Vista’s SRWL data structure. Except that it also contained a version number.
It was always surprising to me what strange things in the runtime we bumped up against. We realized a nice GC optimization: instead of keeping strong references to all intermediary states in a transaction log, we could keep weak references to all but the “before” and “after” state. This is important when transacting synthetic situations like this:
static BigHonkinFoo s_f;
…
atomic {
for (int i = 0; i < 1000000; i++)
s_f = new BigHonkinFoo();
}
Of course you wouldn’t write that code exactly. But there’s no need to keep alive all but the s_f that existed prior to entering the atomic block and the current one at any given time. But this leads to particularly hairy finalization issues. If a finalizable object is allocated within a transaction (say BigHonkinFoo), and is then reclaimed, its Finalize() method will be scheduled to run on a separate thread. Yet the transaction log may contain references to it. Thus there is a race between the transaction’s final outcome and the invocation of the finalizer. We came up with a clever solution for this, but there were countless other clever solutions for various things not worth diving too deep into.
Hacking is fun. However, it was not going to be what made or broke TM as a model.
Disillusionment Part I: the I/O Problem
It wasn’t long before we realized another sizeable, and more fundamental, challenge with unbounded transactions. Finalizers touched on this. What do we do with atomic blocks that do not simply consist of pure memory reads and writes? (In other words, the majority of blocks of code written today.) This was not just a pesky question of how to compile a piece of code, but rather struck right at the heart of the TM model.
You already saw the OpenReadOptimistic, OpenWritePessimistic, Validate, Commit, and Rollback pseudo-TM infrastructure calls, each of which operated on memory locations. But what about a read or write from a single block or entire file on the filesystem? Or output to the console? Or an entry in the Event Log? What about a web service call across the LAN? Allocation of some native memory? And so on. Ordinarily these kinds of operations will be composed with other memory operations, with some interesting invariant relationship holding between the disparate states. A transaction comprised of a mixture still ought to remain atomic and isolated.
The answer seemed clear. At least in theory. The transaction literature, including Reuter and Gray’s classic, had a simple answer for such things: on-commit and on-rollback actions, to perform or compensate the logical action being performed, respectively. (Around this same time, the Haskell folks were creating just this capability in their STM, where ‘onCommit’ and ‘onRollback’ would take arbitrary lambdas to execute the said operation at the proper time.) Because we were working primarily in .NET – with a side project targeting C++ -- we decided to use the new System.Transactions technology in 2.0 to hook into inherently transactional resources, like transacted NTFS, registry, and, of course, databases.
(Digging through my blog, I found this article written back in June 2006 about building a volatile resource manager for memory allocation/free operations, just as an example.)
This worked, though we were quite obviously swimming upstream. Numerous challenges confronted us.
A significant problem was that not all operations are inherently transactional, so in many cases we were faced with the need to add faux transactions on top of existing non-transactional services. (Already-transactional services were easy, like databases. Except that mixing fine-grain TM transactions with distributed DTC transactions makes my skin crawl.) For example, how would you undo a write to the console? Well, you can’t, really. So we decided maybe the right default for Console.WriteLine was to use an on-commit action to perform the actual write only once the transaction had committed.
But in even thinking this thought, we realized we were standing on shaky ground. What if the WriteLine was followed by something like a ReadLine, for example, where the program was meant to wait for the user to enter something into the console (likely in response to the prompt output by WriteLine)? (This example is a toy, of course, but represents a more fundamental pattern common in networked programs.) The basic problem was immediately clear. Adding isolation to an existing non-isolated operation is not always behavior-preserving, particularly when I/O is involved. Sometimes it is necessary to step outside of the isolation that would otherwise get poured on top by a simple transactional model.
This particular problem isn’t specific to traditional I/O per se.
Foreign function interface calls through.NET’s P/Invoke suffer from like problems. A call to CreateEvent may be compensatable (via an on-rollback action) with a call to CloseHandle. But this is flawed. Once that event’s HANDLE is requested, and/or it is passed to other Win32 APIs like MsgWaitForMultipleObjects, then the isolation of the faux transaction is broken, and real state must be provided to the Win32 APIs. And if another thread were to look up that HANDLE – perhaps through a name given to it in the call to CreateEvent – it may be able to see and interact with that event before the enclosing transaction has been committed. The abstraction leaks. And even if the abstraction is perfect, it is obvious there’s quite a bit of work to be had in order to transact all the touch points between .NET and Win32, of which there are many. And I mean many.
Other issues wait just around the corner. For example, how would you treat a lock block that was called from within a transaction? (You might say “that’s just not supported”, but when adding TM to an existing, large ecosystem of software, you’ve got to draw the compatibility line somewhere. If you draw it too carelessly, large swaths of existing software will not work; and in this case, that often meant that we claimed to provide unbounded transactions, and yet we would be placing bounds on them such that a lot of existing software could not be composed with transactions. Not good.) A seemingly straightforward answer is to treat a lock block like an atomic block. So if you encounter:
atomic {
lock (obj) { … }
}
it is logically transformed into:
atomic {
atomic { … }
}
On the face of it, this looks okay. (Forget problems like freeform use of Monitor.Enter/Exit for now.) We’re strengthening the atomicity and isolation, so what could go wrong? Well, it turns out that examples like this can also suffer from the “too much isolation” problem. Adding transactions to a lock-block extends the lifetime of the isolation of that particular block’s effects, possibly leading to lack of forward progress. In fact, you don’t need locks to illustrate the problem. Imagine a simple lock-free algorithm that communicates between threads using shared variables:
volatile int flag = 0;
…
flag = 1; while (flag != 1) ;
while (flag == 1) ; flag = 2;
If you invoke this code from within a transaction (on each thread), you’re apt to lead to deadlock. Both transactions’ effects will be isolated from the others’, whereas we are quite obviously intending to publish the updates to the flag variable immediately.
Anyway, the whole lock thing is a bit of a digression. The simple fact is that very little .NET code would actually run inside an atomic block but for things like collections and pure computations due to the I/O problem. You can develop one-off solutions for each problem that arises – and indeed we did so for many of them – and even hang those solutions underneath one general framework – like System.Transactions – but you cannot help but eventually become overwhelmed by the totality of the situation. The team experimented with static checking to turn these dynamic failures into static ones, but this only marginally improved matters.
I could go on and on about the I/O problem, its various incarnations, and what we did about it. Instead I will sum it up: this problem was, and still is, the “elephant in the room” threatening unbounded TM’s broader adoption.
The question ultimately boils down to this: is the world going to be transactional, or is it not?
Whether unbounded transactions foist unto the world will succeed, I think, depends intrinsically on the answer to this question. It sure looked like the answer was going to be “Yes” back when transactional NTFS and registry was added to Vista. But the momentum appears to have slowed dramatically.
Nesting
Let’s get back to some fun, less depressing material. There are more surprises lurking ahead.
I already mentioned a great virtue of transactions is their ability to nest. But I neglected to say how this works. And in fact when we began, we only recognized one form of nesting. You’re in one atomic block and then enter into another one. What happens if that inner transaction commits or rolls back, before the fate of the outer transaction is known? Intuition guided us to the following answer:
- If the inner transaction rolls back, the outer transaction does not necessarily do so. However, no matter what the outer transaction does, the effects of the inner will not be seen.
- If the inner transaction commits, the effects remain isolated in the outer transaction. It “commits into” the outer transaction, we took to saying. Only if the outer transaction subsequently commits will the inner’s effects be visible; if it rolls back, they are not.
For example, consider this code:
void f() { void g() {
atomic { // Tx0 atomic { // Tx1
x++; y++;
try { if (p1)
g(); throw new BarException();
} catch { }
if (p0) }
throw; }
}
if (p2)
throw new FooException();
}
}
Imagine x = y = 0 at the start, and we invoke f. Many outcomes are possible.
- If p1 is true, g will throw an exception, aborting Tx1’s write to y. There are then two possibilities. (1)If p0 is true, the exception is repropagated and Tx0 will also abort, rolling back its write to x; this leaves x == y == 0. (2) If p0 is false, the exception is swallowed, and Tx0 proceeds to committing its write to x; this leaves x == 1, whereas y == 1.
- If p1 is false, on the other hand, g will not throw anything. Tx1 will commit its write to y “into” the outer transaction Tx0. One of two outcomes will now occur depending on the value of p2. (1) If p2 is true, an exception is thrown out of f, and Tx0 rolls back both the inner transaction Tx1’s effects and its own, leaving x == y == 0. (2) Else, f completes ordinarily, and Tx0 commits both Tx1’s and its own effects, leading to x == y == 1.
We expected most peoples’ intuition to match this behavior.
The canonical working example was a BankAccount class:
class BankAccount {
decimal m_balance;
public void Deposit(decimal delta) {
atomic { m_balance += delta; }
}
public static void Transfer(
BankAccount a, BankAccount b, decimal delta) {
atomic {
a.Deposit(-delta);
b.Deposit(delta);
}
}
}
This was an illustrative and beautiful example. It made beautiful slide-ware. We are composing the Deposit operations of two separate bank accounts into a single Transfer method. Of course doing the a.Deposit(-delta) and b.Deposit(delta) must be made atomic, else a failure could either lead to missing money, and/or someone could witness the world with the money in transit (and nowhere except for one a thread’s stack) rather than having been transferred atomically. And building the same thing with locks is frustratingly difficult: using fine-grained per-account locks can lead to deadlock very quickly.
Intuitively we walked down many variants of this mode of nesting. We reacquainted ourselves with Moss’s great dissertation on the topic, and remembered this intuitive nesting mode as closed nested transactions. And we shortly recognized the need for another mode: open nested transactions.
To motivate the need for open nesting, imagine we’ve got a hashtable whose physical storage is independent from its logical storage. Resizing the table of buckets, for example, has little to do with whether a particular {key, value} pair exists within those buckets. The resizing operation, in fact, is logically idempotent and isolated: the same set of keys will exist within the table both before and after such an operation. So we can actually commit the physical effects of such an operation eagerly. With a naïve TM implementation, two independent keys hashing to the same bucket will conflict, and the reads and writes for such operations will live as long as the enclosing user-level transactions. Instead, we can serialize logical operations with respect to one another at a “higher level” than physically independent operations do, leading to greater concurrency. Two transactions will only conflict in long-running transactions if they truly operate on the same keys, rather than just happening to hash to the same bucket.
Open nesting forced us to contemplate the sharing of state between outer and inner transactions more deliberately, and gave us some troubles syntactically. We had wanted to say:
atomic { // ordinary closed nesting.
Foo f = new Foo();
atomic(open) { /// open nesting.
… f? …
}
}
But is it really legal for the inner transaction here to access the ‘f’, which has been constructed and is presumably uncommitted in the outer transaction? With closed nested transactions there is lock compatibility between the outer and inner transactions. An inner closed nested transaction can of course read a memory location write-locked by the outer transaction, for example. However, the same must not true of open nesting, because an open nested transaction commits “to the world” rather than into its outer transaction. Allowing it to read and then potentially publish uncommitted state would violate serializability. It’s possible that the inner open nested transaction will commit, whereas the outer will roll back. (The reverse situation is equally problematic.) And yet it’s darn useful to pass state from an outer to an inner transaction – and indeed, often impossible to do anything otherwise – yet what if the key itself were a complicated object graph rather than value, and the key bleeds across transaction boundaries?
Many issues like this arose. Our straightforward answer was that only pass-by-value worked across such a boundary. I don’t think we ever found nirvana here.
We developed other transaction modes also.
As we added data parallel operations within a nested transaction, we realized that we’d need something a lot like closed nesting but with special accommodation for intra-transaction parallelism. This led us to parallel nested transactions, enabling lock sharing from a parent to its many data parallel children. These children could not communicate with one another other than to “commit into” the parent, and subsequently reforking, thereby ensuring non-interference between them. Of course children could share read-locks amongst one another, just not write locks.
And we continued to reject the temptation of adding weakened serializability modes a la relational databases (unrepeatable reads, etc). Although we expected this to arise out of necessity with time, it never did; the various nesting modes we provided seemed to satisfy the typical needs.
A Better Condition Variable
Here’s a brief aside on one of TM’s bonus features.
Some TM variants also provide for “condition variable”-like facilities for coordination among threads. I think Haskell was the first such TM to provide a ‘retry’ and ‘orElse’ capability. When a ‘retry’ is encountered, the current transaction is rolled back, and restarted once the condition being sought becomes true. How does the TM subsystem know when that might be? This is an implementation detail, but one obvious choice is to monitor the reads that occurred leading up to the ‘retry’ – those involved in the evaluation of the predicate – and once any of them changes, to reschedule that transaction to run. Of course, it will reevaluate the predicate and, if it has become false, the transaction will ‘retry’ again.
A simple blocking queue could be written this way. For example:
object TakeOne()
{
atomic {
if (Count == 0)
retry;
return Pop();
}
}
If, upon entering the atomic block, Count is witnessed as being zero, we issue a retry. The transaction subsystem notices we read Count with a particular version number, and then blocks the current transaction until Count’s associated version number changes. The transaction is then rescheduled, and races to read Count once again. After Count is seen as non-zero, the Pop is attempted. The Pop, of course, may fail because of a race – i.e. we read Count optimistically without blocking out writers – but the usual transaction automatic-reattempt logic will kick in to mask the race in that case.
The ‘orElse’ feature is a bit less obvious, though still rather useful. It enables choice among multiple transactions, each of which may end up issuing a ‘retry’. I don’t think I’ve seen it in any TMs except for ours and Haskell’s.
To illustrate, imagine we’ve got 3 blocking queues like the one above. Now imagine we’d like to take from the first of those three that becomes non-empty. ‘orElse’ makes this simple:
BlockingQueue bq1 = …, bq2 = …, bq3 = …;
atomic {
object obj =
orElse {
bq1.TakeOne(),
bq2.TakeOne(),
bq3.TakeOne()
};
}
While ‘orElse’ is perhaps an optional feature, you simply can’t write certain kinds of multithreaded algorithms without ‘retry’. Anything that requires cross-thread communication would need to use spin variables.
Deliberate Plans of Action: Policy
I waved my hands a bit above perhaps without you even knowing it. When I talk about optimistic, pessimistic, and automatic retry, I am baking in a whole lot of policy. It turns out there is a wide array of techniques. The simplest question we faced early on was, when an optimistic read fails to validate at the end of a transaction, when should we reattempt execution of that transaction?
The naïve answer is “immediately”. But obviously that would lead to livelock under some conditions. A more reasonable answer is “spin for N cycles and then retry”. But this too can lead to livelock. A better answer is to either choose some random strategy, or to make an intelligent adaptive choice. We experimented with many such variants, including random backoff, sophisticated waiting and signaling based on the memory locations in question, among others. We even played games like giving transactions karma points for cooperatively acquiescing to other competing transactions, and allowing those transactions with the most karma points to make more forward progress before interrupting them.
A few good papers supplied useful (and entertaining) reading material on the topic, but to be honest, nobody had a good answer at the time. Thankfully these are all implementation details. So we were free to experiment.
Deadlock breaking also requires policy. Thankfully we can actually roll back the effects of transactions engaged in a deadly embrace with TM, so we merely need to know how often to run the deadlock detection algorithm. There was a similar problem when deciding to back off outer layers of nesting, and in fact this becomes more complicated when deadlocks are involved. Imagine:
atomic { atomic {
x++; y++;
atomic { atomic {
y++; x++;
} }
} }
This deadlock-prone example is tricky because rolling back the inner-most transactions won’t be sufficient to break the deadlock that may occur. Instead the TM policy manager needs to detect that multiple levels of nesting are involved and must be blown away in order to unstick forward progress.
Another variant that went beyond deciding when to favor one transaction over another was to upgrade to pessimistic locking if optimistic let us down. The whole justification behind optimistic is that, …well, we’re optimistic that conflicts won’t happen. So it seems reasonable that, if they do occur, we fall back to something more, …well, pessimistic. There is a dial here too. Perhaps you only want to fall back to pessimistic after failing optimistically N times in a row, where N > 1. As I mentioned above, our single-word lock associated with each object supported both locking and versioning cheaply.
Disillusionment Part II: Weak or Strong Atomicity?
All along, we had this problem nipping at our heels. What happens if code accesses the same memory locations from inside and outside a transaction? We certainly expected this to happen over the life of a program: state surely transitions from public and shared among threads to private to a single thread regularly. But if some location were to be accessed transactionally and non-transactionally concurrently, at once, we’d (presumably) have a real mess on our hands. A supposedly atomic, isolated, etc. transaction would no longer be protected from the evils of racey code.
For example:
atomic { // Tx0 x++; // No-Tx
x++;
}
Can we make any statements about the value of x after Tx0 commits (or rolls back)? Not really. It depends on the way the particular TM being used has been implemented. An in-place model that rolls back could not only roll back Tx0’s but also the unprotected x++’s write. And so on.
On one hand, this code is racey. So you could explain away the undefined behavior as being a race condition. On the other hand, it was also troublesome. All those problems with locks begin cropping up all over the place. It would have been ideal if we could notify developers that they made a mistake. Then we could have made the assertion that data races are simply not possible with TM.
(Except for consistency-related ones, of course.)
At the same time, many hardware models were being explored. And of course in hardware you’ve got the physical addresses that variables resolve to and needn’t worry about aliasing. So it was actually possible to issue a fault if a location was used transactionally and non-transactionally at once. But given that our solution was software-based, we were uncomfortable betting the farm on hardware support.
Another approach was static analysis. We could require transactional locations to be tagged, for example. This had the unfortunate consequence of making reusable data structures less, well, reusable. Collections for example presumably need to be usable from within and outside transactions alike. After-the-fact analysis could be applied without tagging, but false positives were common. We never really took a hard stance on this problem, but always assumed the combination of static analysis, tooling, and, perhaps someday, hardware detection would make this problem more diagnosable. But I think we generally resolved ourselves to the fact that our TM would suffer from weak atomicity problems.
We thought this was explainable. Sadly it led to something that surely was not.
Disillusionment Part III: the Privatization Problem
I still remember the day like it was yesterday. A regular weekly team meeting, to discuss our project’s status, future, hard problems, and the like. A summer intern on board from a university doing pioneering work in TM, sipping his coffee. Me, sipping my tea. Then that same intern’s casual statement pointing out an Earth-shattering flaw that would threaten the kind of TM we (and most of the industry at the time) were building. We had been staring at the problem for over a year without having seen it. It is these kinds of moments that frighten me and make me a believer in formal computer science.
Here it is in a nutshell:
bool itIsOwned = false;
MyObj x = new MyObj();
…
atomic { // Tx0 atomic { // Tx1
// Claim the state for my use: if (!itIsOwned)
itIsOwned = true; x.field += 42;
} }
int z = x.field;
...
The Tx0 transaction changes itIsOwned to true, and then commits. After it has committed, it proceeds to using whatever state was claimed (in this case an object referred to by variable x) outside of the purview of TM. Meanwhile, another transaction Tx1 has optimistically read itIsOwned as false, and has gone ahead to use x. An update in-place system will allow that transaction to freely change the state of x. Of course, it will roll back here, because isItOwned changed to true. But by then it is too late: the other thread using x outside of a transaction will see constantly changing state – torn reads even – and who knows what will happen from there. A known flaw in any weakly atomic, update in-place TM.
If this example appears contrived, it’s not. It shows up in many circumstances. The first one in which we noticed it was when one transaction removes a node from a linked list, while another transaction is traversing that same list. If the former thread believes it “owns” the removed element simply because it took it out of the list, someone’s going to be disappointed when its state continues to change.
This, we realized, is just part and parcel of an optimistic TM system that does in-place writes. I don’t know that we ever fully recovered from this blow. It was a tough pill to swallow. After that meeting, everything changed: a somber mood was present and I think we all needed a drink. Nevertheless we plowed forward.
We explored a number of alternatives. And so did the industry at large, because that intern in question published a paper on the problem. One obvious solution is to have a transaction that commits a change to a particular location wait until all transactions that have possibly read that location have completed – a technique we called quiescence. We experimented with this approach, but it was extraordinarily complicated, for obvious reasons.
We experimented with blends of pessimistic operations instead of optimistic, alternative commit protocols, like using a “commit ticket” approach that serializes transaction commits, each of which tended to sacrifice performance greatly. Eventually the team decided to do buffered writes instead of in-place writes, because any concurrent modifications in a transaction will simply not modify the actual memory being used outside of the transaction unless that transaction successfully commits.
This, however, led to still other problems, like the granular loss of atomicity problem. Depending on the granularity of your buffered writes – we chose object-level – you can end up with false sharing of memory locations between transactional and non-transactional code. Imagine you update two separate fields of an object from within and outside a transaction, respectively, concurrently. Is this legal? Perhaps not. The transaction may bundle state updates to the whole object, rather than just one field.
All these snags led to the realization that we direly needed a memory model for TM.
Disillusionment Part IV: Where is the Killer App?
Throughout all of this, we searched and searched for the killer TM app. It’s unfair to pin this on TM, because the industry as a whole still searches for a killer concurrency app. But as we uncovered more successes in the latter, I became less and less convinced that the killer concurrency apps we will see broadly deployed in the next 5 years needed TM. Most enjoyed natural isolation, like embarrassingly parallel image processing apps. If you had sharing, you were doing something wrong.
In Conclusion
I eventually shifted focus to enforcing coarse-grained isolation through message-passing, and fine-grained isolation through type system support a la Haskell’s state monad. This would help programmers to realize where they accidentally had sharing, I thought, rather than merely masking this sharing and making it all work (albeit inefficiently).
I took this path not because I thought TM had no place in the concurrency ecosystem. But rather because I believed it did have a place, but that several steps would be needed before getting there.
I suspected that, just like with Argus, you’d want transactions around the boundaries. And that you’d probably want something like open nesting for fine-grained scalable data structures, like shared caches. These are often choke points in a coarse-grained locking system, and often cannot be fully isolated, at least in the small. Ironically I am just now arriving there. In the system I work on I see these issues actually staring us in the face.
This is just my own personal view on TM. You may also be interested in reading the current STM.NET team’s views also, available on their MSDN blog.
For me the TM project was particularly enjoyable. And it was a great learning experience. I worked with some amazing people, and it was a privilege. You really had the sense that something big was right around the corner, and every day was a rush of enjoyment. Despite running as fast as we could, it seemed like we could just barely keep pace with the research community. Over time more and more researchers turned to TM, and I distinctly recall reading at least one new TM paper per week.
This was also the first time I realized that Microsoft, at its core, really does operate like a collection of many startups. Our TM work was a grassroots movement, and there was no official sponsorship for our effort at the start. It was just a group of people independently getting together to discuss how TM might fit into the direction the industry was headed. Eventually TM started showing up on slide decks in presentations to management, followed by dedicated TM reviews, and even a BillG review. I will never forget, a couple years after that review – during an overall concurrency review – Bill standing up at the whiteboard, drawing the code “atomic { … }” and asking something to the effect: “Why can’t you just use transactional memory for that?” I guess the idea stuck with him too.
Who knows. Maybe in 10 years, the world will be transactional after all.
 Sunday, November 01, 2009
Say you've got a Task<T>. Well, now what?
You know that eventually a T will become available, but until then you're out of luck. You could go ahead and be a naughty little devil by calling Wait on it -- blocking the current thread (eek!) -- or you could call ContinueWith on the task to get back a new Task<U>, representing the work you would do to create some new U object if only you presently had a T in hand. And then perhaps you will find yourself in the same situation for that U.
These are those dataflow graphs I mentioned in the previous blog post. Things of beauty.
To be more concrete about the situation I describe, imagine you've got the following IFoo interface:
interface IFoo
{
int Bar();
string Baz(int x);
}
Now, given a Task<IFoo>, you can't do anything related to an IFoo. And yet presumably that's why you've got the task in the first place: because you care about the IFoo. What if you ultimately want to invoke the Bar method, for example?
Task<IFoo> task = ...;
You can of course block the thread:
// Option A: block the thread.
int resultA = task.Result.Bar();
...
Or you can choose to program in a very clunky way:
// Option B: use dataflow.
Task<int> resultB = task.ContinueWith(t => t.Result.Bar());
But what if, instead, you could do something like this?
// Option C: magic.
Task<int> resultC = task.Bar();
Whoa, wait a minute. We're calling Bar() on a Task<IFoo>? Neat, but how can that be?
This is obviously a trick. All of the members of T are somehow being made available on the Task<T> object, so that they can be called before the task has actually been resolved to a concrete value. Of course, were we to allow this, what you get back to represent the result of such calls would need to be task objects too: hence we get back a Task<int> from the call on Bar(), instead of an int. This is similar to call streams in Barbara Liskov's Argus language (her primary focus immediately after CLU).
This kind of lifting from the inner type outward is much like what you get in languages that allow generic mixins. C# already has one semi-such type, though you may not realize it: Nullable<T> actually allows you to directly access interfaces implemented by T without needing to call Value on it. It's almost like Nullable<T> was defined as deriving from T itself which is clearly not actually possible (for numerous reasons, not the least significant of which is that it's a struct). Try it. This works because the type system treats Nullable<T> and T somewhat uniformly (though you'd be surprised by some dangers lurking within -- effectively Nullable<T> mustn't implement any interfaces *ever* otherwise a type hole would result). But I digress...
Unfortunately without deep language changes we can't get this to work the way we'd like. I have found numerous occasions where a general lifting capability in C# would be useful: Lazy<T> is but one example. That said, each time we run across an instance, it demands slightly different type system treatment, and it seems unlikely such a general facility would be as usable as the one off features.
Type systems aside, I am actually using a very dirty trick to make this work: I'm using the new System.Dynamic features in .NET 4.0 to do it all dynamically. You may love or hate this, depending on your stance on type systems. Being an ML guy, I'll let you figure out what I think. (Hint: gross hack!)
We can go further. (Although sadly I won't demonstrate how to do so in this blog post. I had wanted to go all the way, but need to get some actual language work done today, in addition to a little Riemann study, instead of having endless fun tinkering with Visual Studio 2010. Shucks.) Notice that Baz accepts an int as input. Well, what if all we've got is a Task<int>? We can of course also allow that to get passed in too:
Task<string> resultD = task.Baz(42); // Real input. Fine.
Task<int> arg = ...;
Task<string> resultE = task.Baz(arg); // A task as input! Cool!
But wait, there is more! It slices and dices too. The next trick is difficult -- if not impossible -- to do without far reaching language changes. But we could also even bridge the world of ordinary methods too, not just those that have been accessed by tunneling through a Task<T>. For example:
string f(int x) {...}
...
Task<int> task = ...;
Task<string> result = f(task);
Not to even mention:
Task<int> x = ...;
Task<int> y = ...;
Task<int> z = x + y;
This is deep. What we are saying is that anywhere a T is expected, we can supply a Task<T>. Of course once we've entered the world of tasks, we cannot escape until values actually begin resolving. So when we invoke the method f in this example, we of course get back a Task<string> for its result. Once we've stepped onto a turtle's back, well, it's turtles all the way down.
(Which reminds me of the well known tale:
A well-known scientist (some say it was Bertrand Russell) once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the center of a vast collection of stars called our galaxy. At the end of the lecture, a little old lady at the back of the room got up and said: "What you have told us is rubbish. The world is really a flat plate supported on the back of a giant tortoise." The scientist gave a superior smile before replying, "What is the tortoise standing on?" "You're very clever, young man, very clever", said the old lady. "But it's turtles all the way down!"
Tasks are not greasy hamburgers after all, as I had claimed in the last post, but rather they are turtles.
I've wasted all of my energy speaking of turtle hamburgers drenched in asynchronous aioli, and have left only a little to go over the hacked up implementation of this idea. Sigh. Well, we had better get to it.)
In summary: we'll just rely on dynamic dispatch to do the lifting, thanks to the new .NET 4.0 DynamicObject class. This is wildly less efficient than a proper type system design would yield, not to mention the utter lack of static type checking. Of course a proper implementation that designed for this from Day One would also avoid the tremendous amount of object allocation that relying on the current Task<T> objects and ContinueWith overloads imply. But nevertheless, this approach will allow us to at least have a good ole' time and stimulate the creative side of the noggin.
First, I shall provide an extension method for getting a DynamicTask<T> -- the thing that actually derives from DynamicObject and implements the custom dynamic binding:
public static class DynamicTask
{
public static dynamic AsDynamic<T>(this Task<T> task)
{
return new DynamicTask<T>(task);
}
}
Notice that this changes our calling conventions ever so slightly. Namely:
// Option C: magic.
Task<int> resultC = task.AsDynamic().Bar();
The AsDynamic places the caller into the lifted context. As invocations are made, the results become real tasks, and not dynamic ones, such that to continue the calling will require many AsDynamic()s. This is a minor inconvenience and we could certainly automatically wrap the return values in DynamicTask<T> objects if we wanted to eliminate this problem, i.e. to make chaining less verbose.
Second, we must implement the DynamicTask<T> class. We will do a very simple translation. Given a member access expression 'x.m', where m is either a field or property of type U, we will morph this into the new expression 'x.Task.ContinueWith(v => v.Result.m)', which is of type Task<U>. Similarly, given a method invocation 'x.M(a1,...,aN)', whose return value is of type U, we will morph it into the new expression 'x.Task.ContinueWith(v => v.Result.M(a1,...,aN))', which is of type Task<U> (or just Task if U is the void type). To support the ability to pass a task argument where an actual one is expected would require packing the argument with the target into an array, and doing a ContinueWhenAll on it.
(Perhaps I will illustrate how to do these other tricks in a later post, but I'm tight for time right now. I'm only sketching the general idea. Even in what I show below, things will be incomplete, because topics such as getting exception propagation right when tasks begin failing are tricky. Ideally the whole dataflow chain will be "broken" by such an exception. Additionally, I've only implemented what was necessary to get a few interesting examples working. The binder, for example, certainly has a few loose ends. Blog reader beware.)
Here is the implementation of DynamicTask<T>:
public class DynamicTask<T> : DynamicObject
{
private Task<T> m_task;
public DynamicTask(Task<T> task)
{
if (task == null) {
throw new ArgumentNullException("task");
}
m_task = task;
}
public Task<T> Task {
get { return m_task; }
}
public override DynamicMetaObject GetMetaObject(Expression parameter) {
if (parameter == null) {
throw new Exception("parameter");
}
return new TaskLiftedObject(this, parameter);
}
class TaskLiftedObject : DynamicMetaObject
{
...
}
}
Simple. All of the dynamic magic resides in the implementation of TaskLiftedObject, which derives from the DynamicMetaObject class. It is constructed with an instance of the DynamicTask<T> along with the expression tree that can be used to dynamically load up an instance of that task. All of the dynamic features work with expression trees. For example, in response to an attempt to invoke a method M on a DynamicTask<T>, our binder will need to find the right method M on the underlying T, and then return an expression tree that does the ContinueWith and so forth.
Let's start cracking open TaskLiftedObject:
class TaskLiftedObject : DynamicMetaObject
{
private DynamicTask<T> m_task;
public TaskLiftedObject(DynamicTask<T> task, Expression expression) :
base(expression, BindingRestrictions.Empty, task)
{
m_task = task;
}
We will override two of DynamicMetaObject's functions. BindGetMember is called when a member is accessed (like a property or field), whereas BindInvokeMember is called when a method call is made. There are several other methods that a proper binder would need to override in order to make delegate dispatch and such work properly. But this suffices to get started:
public override DynamicMetaObject BindGetMember(GetMemberBinder binder)
{
// We have a member access:
// x.m
//
// which must become:
// x.Task.ContinueWith(v => { v.Result.m; })
//
return new DynamicMetaObject(
MakeContinuationTask(Bind(binder.Name, -1), null),
BindingRestrictions.GetInstanceRestriction(Expression, Value),
Value
);
}
public override DynamicMetaObject BindInvokeMember(InvokeMemberBinder binder, DynamicMetaObject[] args)
{
// We have a call:
// x.Foo(a1,...,aN)
//
// which must become:
// x.Task.ContinueWith(v => { v.Result.Foo(a1,...,aN); })
//
Expression[] argsEx = new Expression[args.Length];
for (int i = 0; i < args.Length; i++) {
argsEx[i] = args[i].Expression;
}
return new DynamicMetaObject(
MakeContinuationTask(Bind(binder.Name, binder.CallInfo.ArgumentCount), argsEx),
BindingRestrictions.GetInstanceRestriction(Expression, Value),
Value
);
}
Clearly the workhorses here are Bind and MakeContinuationTask. Bind is responsible for performing dynamic lookup for a matching member on T that has the requested Name and, if a method call is being made, the proper number of parameters. For brevity, I've omitted anything to do with argument type checking, an obvious hole that we'd want to fix some day:
private static MemberInfo Bind(string name, int argCount)
{
// Lookup the target member on the T, rather than the (Dynamic)Task<T>.
return
(from m in typeof(T).GetMembers(BindingFlags.Instance | BindingFlags.Public)
where m.Name.Equals(name) &&
(argCount == -1 ?
!(m is MethodInfo) :
((MethodInfo)m).GetParameters().Length == argCount)
select m).
Single();
}
Nothing too interesting here either -- just a bit of hacky reflection code done with a fancy LINQ query. If anything other than exactly one method was found, the call to Single() will throw an exception. If you want to see what a "real" dynamic binder looks like, you won't find it here: check out VB's or IronPython's.
Now for the meat. The MakeContinuationTask method takes the target member that we've found dynamically via Bind, as well as an optional array of expression trees, each representing an argument being passed to the target method (and which will be null for property and field access), and manufactures the expression tree that represents the execution of the dynamic call itself:
private Expression MakeContinuationTask(MemberInfo target, Expression[] targetArgs)
{
var lambdaParam = Expression.Parameter(typeof(Task<T>), "v");
var lambdaParamResult = Expression.Property(lambdaParam, "Result");
Expression lambdaBody;
Type lambdaReturnType;
if (target is MethodInfo) {
lambdaBody = Expression.Call(lambdaParamResult, (MethodInfo)target, targetArgs);
lambdaReturnType = ((MethodInfo)target).ReturnParameter.ParameterType;
}
else if (target is PropertyInfo) {
lambdaBody = Expression.Property(lambdaParamResult, (PropertyInfo)target);
lambdaReturnType = ((PropertyInfo)target).PropertyType;
}
else if (target is FieldInfo) {
lambdaBody = Expression.Field(lambdaParamResult, (FieldInfo)target);
lambdaReturnType = ((FieldInfo)target).FieldType;
}
else {
throw new Exception("Unsupported dynamic invoke: " + target.GetType().Name);
}
return Expression.Call(
Expression.Property(
Expression.Convert(this.Expression, typeof(DynamicTask<T>)),
typeof(DynamicTask<T>).GetProperty("Task")
),
GetContinueWith(lambdaReturnType), // ContinueWith
new Expression[] {
// v => { v.Result.M(a0,...,aN) }
Expression.Lambda(lambdaBody, lambdaParam)
}
);
}
You should be able to convince yourself that this code generates the desired transformation described earlier. It uses a method to find the overload of Task<T>.ContinueWith that we want to bind against, and invokes that on the Task<T> contained within the DynamicTask<T> against which the dynamic call was made. It is rather unfortunate that the CLR does not allow the void type as a generic type argument, so we have to be a little bit inconsistent with our treatment of void returns, by choosing a different ContinueWith overload.
If the above reflection code was called hacky, the ContinueWith lookup is worse. It's very inefficient, not to mention fragile (because it depends on the current layout of Task<T>'s overloads, what with instantiating generic methods and the like). C'est la vie:
private static MethodInfo GetContinueWith(Type returnType)
{
// @TODO: caching to avoid expensive lookups each time.
if (returnType == typeof(void)) {
return typeof(Task<T>).GetMethod(
"ContinueWith",
new Type[] { typeof(Action<Task<T>>) }
);
}
else {
foreach (MethodInfo mif in typeof(Task<T>).GetMethods()) {
if (mif.Name == "ContinueWith" && mif.IsGenericMethodDefinition) {
MethodInfo mifOfT = mif.MakeGenericMethod(returnType);
ParameterInfo[] mifParams = mifOfT.GetParameters();
if (mifParams.Length == 1 &&
mifParams[0].ParameterType == typeof(Func<,>).MakeGenericType(typeof(Task<T>), returnType)) {
return mifOfT;
}
}
}
}
throw new Exception("Fatal error: ContinueWith overload not found");
}
}
And that's it. With that, we can get dynamic invocations on unresolved T's via Task<T> objects. Nifty.
I'm not saying any of this is a really good idea. Honestly, I'm not. Of course, there's a kernel of a good idea there and the systems we are working on take this kernel to its extreme. By providing a programming model that encourages deep chains of datafow to be expressed speculatively in a natural and familiar manner, greater degrees of latent parallelism can lie resident in an application waiting to be unlocked as more processors become available. Doing it for real requires impactful changes to the language, supporting infrastructure, and particularly tooling. Just imagine what it means to break into a debugger to inspect deep dataflow graphs that have been constructed by compiler magic underneath you. And the use of ContinueWith is a little lame, because of course the target of our call may be something that can be run speculatively too with first class pipleining, rather than completely delaying the invocation of it.
So we won't be seeing lifted tasks in .NET anytime soon. Writing up this blog post was merely an excuse to toy around with the new C# dynamic features and to have a little recreational time. And to generate excitement about what .NET 4.0 holds in store. I hope you have enjoyed it. Now back to reality.
 Saturday, October 31, 2009
Well, Visual Studio 2010 Beta 2 is out on the street. It contains plenty of neat new things to keep one busy for at least a rainy Saturday. I proved this today.
Of course, Parallel Extensions is in the box. .NET 4.0's Task and Task<T> abstractions are used to implement such things as PLINQ and Parallel.For loops, but of course they are great for representing asynchronous work too. The FromAsync adapters move you from the dark ages of IAsyncResult to the glitzy new space age of tasks.
Not only are tasks tastier than hamburgers, but they enable complex dataflow graphs of asynchronous work to unfold dynamically at runtime, thanks to the ContinueWith method. From a Task<T> you can get a Task<U> that was computed based on the T; ad infinitum. We like dataflow. It is the key to unlocking parallelism, or more accurately, boiling away all else except for dataflow is the key. But what about control flow, you might ask? We like it less. But you can do it, so long as you put in some work. F#'s async workflows make this sort of thing a tad easier, but the raw libraries in .NET 4.0 don't come with any sort of loops or conditional capabilities. Perhaps in the future they will. Nevertheless, in this post I shall demonstrate how to build a couple simple ones.
Not because the lack of them is going to cause unprecidented and unheard of horrors, but rather because in doing so we'll see some neat features of tasks.
The two methods I will illustrate in this post are:
public static class TaskControlFlow
{
public static Task For(int from, int to, Func<int, Task> body, int width)
public static Task While(Func<int, bool> condition, Func<int, Task> body, int width)
}
Notice that each body is given the iteration index and is expected to launch asynchronous work and return a Task. The parameters that these methods take are probably obvious. Well, except for the last one. The "width" indicates how many outstanding asynchronous bodies should be in flight at once. The Task returned by For and While won't be considered done until all iterations are done, and any exceptions will be propagated as you might hope. It would be pretty useless otherwise.
For example, we could write a while loop that does something very silly:
TaskControlFlow.While(
i => i < 100,
i => { return CreateTimerTask(250).ContinueWith(_ => Console.WriteLine(i)); },
4
).Wait();
This just prints returns a "timer task" that completes after 250ms and prints out the iteration to the console. We pass a width of 4, so only four tasks will be outstanding at any given time. Notice we call Wait at the end, since both For and While return tasks representing the in flight work. This could have instead been written using a For loop as follows:
TaskControlFlow.For(0, 100,
i => { return CreateTimerTask(250).ContinueWith(_ => Console.WriteLine(i)); },
4
).Wait();
The CreateTimerTask method, by the way, looks like this:
private static Task CreateTimerTask(int ms)
{
var tcs = new TaskCompletionSource<bool>();
new Timer(x => ((TaskCompletionSource<bool>)x).SetResult(true), tcs, ms, -1);
return tcs.Task;
}
As something more realistic, imagine we wanted to do something with a large number of files, and don't want to block a whole bunch of threads in the process. The following "simple" expression will count up all of the bytes for all of the files in a particular directory, without once blocking the thread -- well, except for the initial call to Directory.GetFiles:
string win = "c:\\...\\";
string[] files = Directory.GetFiles(win);
int total = 0;
TaskControlFlow.For(0, files.Length,
i => {
bool eof = false;
int offset = 0;
byte[] buff = new byte[4096];
FileStream fs = File.OpenRead(files[i]);
return TaskControlFlow.While(
j => !eof,
j => Task<int>.Factory.
FromAsync<byte[],int,int>(
fs.BeginRead, fs.EndRead, buff, offset, buff.Length,
null, TaskCreationOptions.None
).
ContinueWith(v => {
if (eof = v.Result < buff.Length) {
fs.Close();
}
offset += v.Result;
Interlocked.Add(ref total, v.Result);
}),
1
);
},
8
).Wait();
Console.WriteLine(total);
Pretty neat. We've somewhat arbitrarily chosen a width of 8 for this loop. And notice something very subtle but important here: we've chosen a width of 1 for the inner loop that plows through the bytes of a file. This is because we're sharing state, and it would not be safe to launch numerous iterations at once. The same byte[], eof variable, and so forth, would become corrupt. I will mention in passing that it's unfortunate that we've got that interlocked stuck in there to add to the total. Refactoring this so that we could just do a LINQ reduce over the whole thing would be nice. Indeed, it can be done.
We can do away with the For implementation very quickly. It is just implemented in terms of While:
public static Task For(int from, int to, Func<int, Task> body, int width)
{
return While(i => from + i < to, body, width);
}
And it turns out that the While implementation is not terribly complicated either. Here it is:
public static Task While(Func<int, bool> condition, Func<int, Task> body, int width)
{
var tcs = new TaskCompletionSource<bool>();
int currIx = -1; // Current shared index.
int currCount = width; // The number of outstanding tasks.
int canceled = 0; // 1 if at least one body was cancelled.
ConcurrentBag<Exception> exceptions = null; // A collection of exceptions, if any.
// Generate a continuation action: this fires for each body that completes.
Action<Task> fcont = null;
fcont = tsk => {
if (tsk.IsFaulted) {
// Accumulate exceptions.
LazyInitializer.EnsureInitialized(ref exceptions);
foreach (Exception inner in tsk.Exception.InnerExceptions) {
exceptions.Add(inner);
}
}
else if (tsk.IsCanceled) {
// Mark that cancellation has occurred.
canceled = 1;
}
else if (canceled == 0 && exceptions == null) {
// If no cancellations / exceptions are found, attempt to kick off more work.
int ix = Interlocked.Increment(ref currIx);
if (condition(ix)) {
// Generate a new body task, handling exceptions. Then make sure we
// tack on the continuation on that new task, so we can keep on going...
// If the condition yielded 'false', we'll simply fall through and try to finish.
Task btsk;
try {
btsk = body(ix);
}
catch (Exception ex) {
btsk = AlreadyFaulted(ex);
}
btsk.ContinueWith(fcont);
return;
}
}
// If this is the last task, signal completion.
if (Interlocked.Decrement(ref currCount) == 0) {
if (exceptions != null) {
tcs.SetException(exceptions);
}
else if (canceled == 1) {
tcs.SetCanceled();
}
else {
tcs.SetResult(true);
}
}
};
// Fire off the right number of starting tasks.
for (int i = 0; i < width; i++) {
AlreadyDone.ContinueWith(fcont);
}
return tcs.Task;
}
I've commented the code inline to illustrate what is going on. The only other part that isn't shown are the AlreadyDone and AlreadyFaulted members, which simply give Tasks that are already in a final state. This isn't strictly necessary, but come in handy in a number of situations:
internal static Task AlreadyDone;
static TaskControlFlow()
{
var tcs = new TaskCompletionSource<bool>();
tcs.SetResult(true);
AlreadyDone = tcs.Task;
}
private static Task AlreadyFaulted(Exception ex)
{
var tcs = new TaskCompletionSource<bool>();
tcs.SetException(ex);
return tcs.Task;
}
And that's it. I'm done for now. Hope you enjoyed it. I've got a few other posts in the works -- primarily the result of a day full of hacking (I got in the office at 7am this morning, and have been here ever since, 14 hours later) -- demonstrating how to do speculative asynchronous work for if/else branches. Finally, I also have a neat example that illustrates how to do deep dataflow-based speculation without having to wait for work to complete. This combines the new .NET 4.0 dynamic capabilities with parallelism, so I'm pretty excited to get it working and write about it.
|
|
Recent Entries:
Search:
Browse by Date:
| | Sun | Mon | Tue | Wed | Thu | Fri | Sat | | 27 | 28 | 29 | 30 | 1 | 2 | 3 | | 4 | 5 | 6 | 7 | 8 | 9 | 10 | | 11 | 12 | 13 | 14 | 15 | 16 | 17 | | 18 | 19 | 20 | 21 | 22 | 23 | 24 | | 25 | 26 | 27 | 28 | 29 | 30 | 31 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Browse by Category:
Notables:
|