I've been toying with some ideas for async operations lately, basically seeing if I can come up with an easier Mort-accessible API model for parallel execution. This is notoriously difficult for strict languages, where ordering matters. I can't say that I'm entirely happy with the result, and really think that languages have to create special syntax to make this simple. So I've been doing considerable research there, too. Interestingly, one of the architects on the CLR team, Jim Miller, did his PhD on a parallelization of the Scheme language. Never a short supply of smart people here at Microsoft!
Parallelization
It seems that tree-like structures (i.e. processing tree-like data or executing tree-like operations) have the most opportunity for parallelization. That is, of course, if each node in the tree depends only on descendants and not its ancestors, unlike (say) XML. If you're implicitly parallelizing this, each thread can take responsibility for traversing down the left-hand side of the tree, kicking off new threads to handle the right branches at each node just before visiting the left branch, and lastly joining to perform last minute computation before flowing data back up the tree. This is recursive, and could end up with quite a few worker threads.
Depending on the weighting of the tree, the number of threads grows exponentially, e.g. O((n-1)^2), where n = the tree depth. It's still interesting, though, because each thread is responsible for calculating the left branches and creating workers for its right branch, and you don't really waste much execution time (on a machine with a processor per thread and where creating, managing, and joining on threads were free, of course ;) ). Even with such costs, you could imagine weighting the distribution of parallelization such that you amortize the cost of threading in such a way so as to minimize the total execution time of the entire operation versus a single-threaded traversal.
It also occurs to me that, while more computationally expensive with O(n^2) execution time, factoring some algorithms this way facilitates parallelization. Could it be that in a few years time, we'll be intentionally writing O(n^2) algorithms over, say, O(n) to make them more conducive to concurrent execution? Yes, they are more expensive in terms of computation and space required, but could be less costly in terms of time. Who knows. One thing I am certain of: computational power and storage capacity are expanding and growing. Time is not. It seems like we can afford to waste the former to gain some savings in the latter.
Mort Friendly?
It's difficult to come up with real scenarios where Mort would need parallel execution. Well, that's not quite true. It's hard to see where Mort would *think* to use parallel execution. I'm fairly sure this is because people are trained to think in single-threaded terms, a mode which is admittedly difficult to get out of the habit of. I mean, to Mort: if he's got to calculate a value or perform some action, well--gosh darnit--it needs to be done before he can move on. We need to change this. Better mainstream language support is surely a way to start.
Lightweight Concurrent Tasks
I have become particularly fond of one toy sample I wrote. It's a lightweight way to execute concurrent tasks, without having to worry about threading, thread pools, wait operations, and the like. As you will see, there's nothing magic about it but for the abstraction. You just call Async.Start up front (which schedules work on the ThreadPool to calculate the value), store the returned AsyncValue, do some work in parallel, and finally Join on the AsyncValue when you need to use the result of its computation.
Since it's usually helpful to see a scenario before diving into the API, here's a demonstration of how this might work for a concurrent Fibonacci function. We all know that the recursive Fibonacci function is horribly wasteful. Not only is it O(n^2), but it results in a whole lot of duplicate calculation. There might be room for memoization of some sort here. Regardless:
static int fib(int x)
{
if (x <= 1)
{
return 1;
}
else
{
AsyncValue<int> x2 = Async.Start<int, int>(fib, x - 2);
return fib(x - 1) + x2.Join();
}
}
This is an interesting example, actually. If threads were free, this would theoretically be faster than the single-threaded version of the same recursive procedure. It's even conceivable that, for sufficiently large x, and with a sufficiently large number of processors, and even with the wasted calculations, that this could be faster than the unrolled loop-based constant space version. Of course, this isn't true in today's world and with today's threading architecture, especially given the frequency with which we incur this overhead (one per non-leaf node in the tree).
The Plumbing
Anyhow, let's look at the async plumbing logic that enables this. It's very much like a thunk, except that the value is calculated in the background rather than "on-demand."
First, we define some static factory class (this could very well be on ThreadPool) with a Start operation:
public static class Async
{
public static AsyncValue<A> Start<A>(AsyncFunc<A> f)
{
AsyncValue<A> value = new AsyncValue<A>();
ThreadPool.QueueUserWorkItem(delegate
{
value.asyncFunc = f;
value.returnValue = f();
value.isCompleted = true;
value.asyncWaitHandle.Set();
}
);
return value;
}
public static AsyncValue<A> Start<A, B>(AsyncFunc<A, B> f, B b)
{
AsyncValue<A> value = new AsyncValue<A>();
ThreadPool.QueueUserWorkItem(delegate
{
value.asyncFunc = f;
value.returnValue = f(b);
value.isCompleted = true;
value.asyncWaitHandle.Set();
}
);
return value;
}
public static AsyncValue<A> Start<A, B, C>(AsyncFunc<A, B, C> f, B b, C c)
{
AsyncValue<A> value = new AsyncValue<A>();
ThreadPool.QueueUserWorkItem(delegate
{
value.asyncFunc = f;
value.returnValue = f(b, c);
value.isCompleted = true;
value.asyncWaitHandle.Set();
}
);
return value;
}
}
Note that there are similar overloads for varying number of arguments. I've omitted the rest for brevity. Unfortunately, I can't think of a way to reuse this logic generically inside a core args-neutral method without resorting to reflection. This is painful, and IMHO is just a shortcoming in the expressiveness of C#. I'll return to this for more comments shortly. For now, if anybody has ideas, let me know.
Then, the AsyncValue holder class, which encapsulates the Join and wait logic:
public class AsyncValue<A> : IAsyncResult
{
// fields
internal Delegate asyncFunc;
internal bool isCompleted;
internal A returnValue;
internal ManualResetEvent asyncWaitHandle;
// ctors
public AsyncValue()
{
asyncWaitHandle = new ManualResetEvent(false);
}
// properties
public object AsyncState
{
get { return asyncFunc; }
}
public WaitHandle AsyncWaitHandle
{
get { return asyncWaitHandle; }
}
public bool CompletedSynchronously
{
get { return false; }
}
public bool IsCompleted
{
get { return isCompleted; }
}
public A Join()
{
while (!isCompleted)
asyncWaitHandle.WaitOne();
return returnValue;
}
}
Of course, we're also required to declare the delegate signatures we intend to use:
public delegate A AsyncFunc<A>();
public delegate A AsyncFunc<A, B>(B b);
public delegate A AsyncFunc<A, B, C>(B b, C c);
public delegate A AsyncFunc<A, B, C, D>(B b, C c, D d);
public delegate A AsyncFunc<A, B, C, D, E>(B b, C c, D d, E e);
public delegate A AsyncFunc<A, B, C, D, E, F>(B b, C c, D d, E e, F f);
public delegate A AsyncFunc<A, B, C, D, E, F, G>(B b, C c, D d, E e, F f, G g);
public delegate A AsyncFunc<A, B, C, D, E, F, G, H>(B b, C c, D d, E e, F f, G g, H h);
public delegate A AsyncFunc<A, B, C, D, E, F, G, H, I>(B b, C c, D d, E e, F f, G g, H h, I i);
public delegate A AsyncFunc<A, B, C, D, E, F, G, H, I, J>(B b, C c, D d, E e, F f, G g, H h, I i, J j);
...And so on. ;)
Language Syntax?
Oh how I would love there to be C# language syntax for this. Here are two quick (not thought through) possibilities:
static int fib(int x)
{
if (x <= 1)
{
return 1;
}
else
{
int! x2 = fib(x - 2);
return fib(x - 1) + x2;
}
}
static int fib(int x)
{
if (x <= 1)
{
return 1;
}
else
{
int x2 = start fib(x - 2);
return fib(x - 1) + x2;
}
}
The first is a little less explicit, and I would imagine the second would cause typing problems. Although with some simple rules we could avoid the problem. Basically, before loading the the local for whatever reason it were needed, we'd have to force it. Just like a thunk. If we had a better idea of its use, we could avoid unnecessarily and prematurely forcing (which could block execution!). Regardless, it should be obvious that emitting IL which uses the Async stuff defined above for this syntax would be relatively straightforward.
Off-Topic: Generics Complaint
I'm constantly finding myself sick and tired of the massive paranoia of modern C++-derivitives. I guess this is entirely orthogonal, but it seems to plague the newer OO languages, e.g. C# and Java. The problem identified above is one great example. Why is it that I can't simplify life greatly with some code like this:
public static AsyncValue<A> Start(AsyncFunc<A, _> f, _ __)
{
AsyncValue<A> value = new AsyncValue<A>();
ThreadPool.QueueUserWorkItem(delegate
{
value.asyncFunc = f;
value.returnValue = f(__);
value.isCompleted = true;
value.asyncWaitHandle.Set();
}
);
return value;
}
Functional languages have mastered the art of pattern matching, where some details can be safely ignored and passed through transparently. Haskell does this without sacrificing type safety at all. We need something similar for C#. But I digress.