| |
 Saturday, May 21, 2005
It's been a while since I've posted anything about my book.
I'm wrapping up my next chapter this weekend (I'm over half way done page-count-wise :P), and was interested in any opinions out there on the content. To reiterate: my goal is to cover CLR and FCL topics which haven't gotten a lot of love to date, with a focus on v2.0, while still painting a complete enough picture to be useful mostly on its own (with some previous experience with the technology of course). I think assemblies, loading internals, and how to deal with versioning problems are some that have been neglected (except for on Suzanne Cook's blog, of course).
Also, if you're interested in being a reviewer, drop me a line at joedu (at@at) bluebytesoftware (DOT.DOT) com. Please provide a brief bio of yourself so I know who's serious and who isn't.
Here's a rough layout. Comments welcome (uninteresting bullets, missed items, etc.):
Assemblies: Packaging & Deployment
- Units of Execution & Reuse
- A Look Inside Portable Executable (PE) Files
- Strong Naming
- Metadata, Metadata, Metadata
- Manifests & Modules
- Dynamic Assemblies
- Asssembly Loading
- Loading By Hand
- Inside the Bind, Map, Load Process
- Probing ("Fusion")
- Custom Assembly Binding
- Execution
- EXE Bootstrapping
- Other Hosts (IE, SQL Server "Yukon")
- Code Sharing
- AppDomain Isolation
- Deployment & Configuration
- XCopy
- The GAC
- ClickOnce
- Configuration
- Versioning
- AppCompat
- Rolling Forward
- Versioning Libraries
- Versioning Applications
- Servicing Deployed Apps
Structured exception handling and all of the challenges associated with it aren't new topics about which developers must worry. I was flipping through my copy of The C++ Programming Language (3rd Edition) this morning and ended up rediscovering Appendex E on C++ Standard-Library Exception Safety. Understanding what consistency and failure guarantees you intend to make with your library and then actually making them, knowing what to expect from a pre-written library and when and how to program defensively against it, and all such related topics were challenging with C++ and remain a challenge with C#. It's amazing how many of these things are directly transferrable between technologies.
He mentions in the book that all of the appendices are available on the web, so I did a quick search. Voila! Here it is [PDF].
Even if you're not a C++-er, you'll find a ton of great information in that article. Enjoy. I did (again).
 Saturday, May 14, 2005
What is this program's output and why?
using System;
public class NullableTest {
static void Main() { int? x = null; object y = x;
Console.WriteLine("Test #1: {0}", x == null); Console.WriteLine("Test #2: {0}", y == null); Console.WriteLine("Test #3: {0}", IsNull<int?>(x)); }
static bool IsNull<T>(T t) { return t == null; }
}
Is this intuitive, confusing, or <something else>?
 Friday, May 13, 2005
If you haven't been keeping an eye on Raymond and Rico's progress on the performance tuning of a Chinese/English dictionary, in particular comparisons between the unmanaged and managed version... Well... Go check it out!
I'm normally not one for mindless link propagation, but it's a fun and very interesting goings on.
 Thursday, May 12, 2005
Wow. Something bad happened to my blog over the last 24 hours. First, random Service Unavailable messages. Then, my blog started serving up Kanji. Rather than debug the problem I decided to wipe it out and upgrade to dasBlog 1.7.
I'd actually intended to do this weeks ago. I had also intended to redesign the site. The old one was pretty heavyweight, crowded, and the homepage itself is a couple 100k in size. But because of the quick fix nature, I didn't get a chance to do much right now. It's fairly plain, but I like the simplicity. I'll try to shnazz it up at some point.
Now I just need to fix the broken images problem. Methinks my hosting company is throttling my bandwidth, and they might cut down on images as the first tier of throttling. I should probably thank them, as it might prevent me from spiking even higher and paying extra per MB/mo. Traffic on this site has increased a lot over the past few months, and I suspect I'm going to have to pay the extra $$$ for the additional bandwidth.
 Wednesday, May 11, 2005
I was on an email thread with a customer earlier today, and the topic of fibers came up. More generally, the need to schedule lightweight tasks for simulated concurrent execution. The thread pool is fairly heavyweight, doesn’t offer much flexibility in the way of scheduling, and admittedly just doesn't quite cut it for many scenarios. We're looking hard at how best to improve the thread pool support in Orcas (that’s Whidbey + 1)--especially in the area of better control over the scheduling policy--so I'd love to get any feedback you might have.
This discussion got me thinking about how you can do some fancy shmancy hacks using C# 2.0 iterators to cook up a mutated form of simulated concurrent task execution.
Iterators & Coroutines
Coroutines enable coordination between multiple logical tasks all running in lock step in a manner quite similar to fibers. Tasks can run on the same physical thread, but operate as though they were concurrent units of execution, and mostly rely on good citizenship to avoid one unit starving another. A coroutine will execute for a small period of time, and finally yield a value of interest. Sometimes this value is just void/nil/etc, for example in cases where the coroutine is just running some side-effecting code and then yielding back to its caller to let it decide what the next step should be.
C# 2.0 ships with a form of coroutines. The iterator feature, described here, here and here, is an impressive piece of beautiful compiler hackery. It closure-transforms a block of code into a mini state machine—pushing locals into instance variables on the closure—so that an instance will " suspend" (or "yield") itself and can be "resumed" at well defined breakpoints. This is the traditional definition of a coroutine, although the implementation technique differs from many. There might be a few odd cases it doesn't support, but it's quite close. We can use these as a starting point.
Scheduling Tasks
If you're willing to trust that a coroutine will yield often enough, we can schedule a bunch of them so that they are able to run somewhat concurrently. Of course, this is "concurrent" in the sense that processes and threads on a single processor machine are concurrent. With enough coroutines that are doing interesting work, this starts to look attractive. Especially if the code inside of them is willing to yield just before doing a blocking operation, and generally abide by some courteous rules like not eating up too much time before yielding.
We would need a scheduler that maintains a queue of active coroutines, and some form of policy for letting them execute. Using a braindead simple algorithm, it could just walk the queue sequentially and execute, a form of round robin scheduling. At each step, it would dequeue the head coroutine, runs it until it yields, and then re-queue it and move to the next—unless it is done, in which case we just skip the re-queueing.
The fact that this is possible (and pretty simple) relies on a whole array of abstractions underneath us, all of which are easy to take for granted:
- The "data stack" and "activation frame" for the coroutine is automatically saved and restored for us by the C# compiler's iterator transforms, almost like a mini thread preemption routine. This is awesome. Obviously, it's not the same in reality (only conceptually), and the call stack responds as though it were an ordinary function call.
- The scheduler is running on a thread like any other piece of code which gets timesliced accordingly. We could have multiple pools of coroutines that are all executing concurrently, and in lockstep with each other.
- The OS is conceptually doing a very similar thing below us, but at an extremely finer grained level.
One of the major downsides to using coroutines is this fashion that you can't preempt, and thus your ability to fairly timeslice is limited. The scheduling algorithm could use a heuristic which took into account average time allotment per yield for any given coroutine, and throttle time accordingly. If you knew a certain coroutine took 3x longer than most others last time, for example, you might only permit it to run 1/3 as much next time. But there's nothing to prevent a single coroutine from blocking indefinitely or hogging time from other tasks. A simple round robin algorithm probably wouldn't suffice for most requirements.
A First Cut Implementation: Grains
I wrote a whole bunch of code to implement what I describe above, and it works surprisingly well. I tried to produce a nice interface, but it still feels a little rough in some areas. I'm not sure whether anybody will benefit from seeing my arbitrary code postings (comments please!... I seem to get very few comments on these types of posts), but I'll share some of it. If folks want to see more, I'll post the whole bucket of code. I was thinking this would make a nice “official“ whitepaper, too. I named these things grains—get it: threads, fibers, grains, … clever, eh?—but they're really just scheduled coroutines.
So I'll try to walk through this quickly. We start with a simple Grain class. You can think of this as a parallel to the Thread class. It takes a start argument, and has a state (Stopped, Ready, Sleeping). This is the thing that's used to encapsulate a single activation of a grain which gets manipulated over time.
public class Grain
{
// Fields
private GrainState _lastKnownState = GrainState.Stopped;
private GrainNext _suspendedState;
private IEnumerator<GrainNext> _executor;
private ManualResetEvent _completedWaitEvent = new ManualResetEvent(false);
private GrainStart _start;
private ParameterizedGrainStart _parameterizedStart;
private IEnumerable<GrainNext> _enumerableStart;
// Constructors
public Grain(GrainStart start)
{
_start = start;
}
public Grain(ParameterizedGrainStart parameterizedStart)
{
_parameterizedStart = parameterizedStart;
}
public Grain(IEnumerable<GrainNext> enumerableStart)
{
_enumerableStart = enumerableStart;
}
public Grain(IEnumerator<GrainNext> executor)
{
_executor = executor;
}
// Properties
public GrainState LastKnownState
{
get { return _lastKnownState; }
internal set { _lastKnownState = value; }
}
public GrainNext SuspendedState
{
get { return _suspendedState; }
internal set { _suspendedState = value; }
}
internal IEnumerator<GrainNext> Executor
{
get { return _executor; }
}
internal EventWaitHandle CompletedWaitEvent
{
get { return _completedWaitEvent; }
}
// Methods
public void Start()
{
Start(null);
}
public void Start(object state)
{
// If one of the delegate starts were passed in, detect and execute to
// get our IEnumerable<GrainNext> instance.
if (_start != null)
_enumerableStart = _start();
else if (_parameterizedStart != null)
_enumerableStart = _parameterizedStart(state);
// Just create a new enumerator and use that as our "executor".
if (_enumerableStart != null)
_executor = _enumerableStart.GetEnumerator();
// Lastly, if executor is null, we're hosed. Just fail now.
if (_executor == null)
throw new ArgumentNullException(
"Executor is null. Either your start method returned null or you passed in null.");
}
public void Join()
{
_completedWaitEvent.WaitOne();
}
}
In that code, I reference a couple things. First, the GrainStart and ParameterizedGrainStart types are delegates which are quite similar to the ThreadStart and ParameterizedThreadStart types. They refer to a method used to kick off a coroutine as soon as it gets run by the scheduler.
public delegate IEnumerable<GrainNext> GrainStart();
public delegate IEnumerable<GrainNext> ParameterizedGrainStart(object obj);
These delegates return IEnumerable<GrainNext>, which is the target iterator function. GrainNext is a little tricky to get your head around at first. It's a delegate that, when executed, tells the scheduler what to do next. It tells the schedule whether the grain is ready to run, is sleeping waiting for an event, or is done executing. This state is represented by the GrainState enum. There's also a GrainStateFactory static class that helps to generate the simple Ready and Stopped cases. Remember I said coroutines can sometimes yield a void when it has nothing of interest to report? Well, the coroutine is simply yielding an instruction to the scheduler as to what it wants to do next. If the iterator just exits normally, the scheduler will assume this means it is normally terminating. Also one quick thing to note: if an unhandled exception escapes a grain, it tears down the whole scheduler. Not sure the best approach to handle this.
public delegate GrainState GrainNext();
public enum GrainState
{
Ready,
Sleeping,
Stopped
}
public static class GrainStateFactory
{
public static readonly GrainNext Ready =
delegate { return GrainState.Ready; };
public static readonly GrainNext Stopped =
delegate { return GrainState.Stopped; };
}
You might be wondering why this is done through a delegate. It seems odd, I agree. Why doesn't the enumerator just return GrainNext? Well, for one I’m a functional junkie and I’m imposing my preference for lazy evaluation on my readers. ;) But truthfully, I thought it was a cool feature that you could use the return delegate to inspect some predicate and determine the next step based on that. This way if you are sleeping a grain, you can check for the wake up condition in the returned delegate and, if it's true, return Ready; otherwise, you just return Sleeping, and the same predicate gets checked next time the scheduler is ready to run the grain. Would love feedback here.
The Round Robin Grain Scheduler
So obviously the Grain doesn’t do much good without some form of scheduler. So I wrote a lot of code here, too. Basically, a scheduler gets hosted in its own thread (hmm, or perhaps it, too, could be a grain!). It is basically a big while(true) loop that consumes things off the queue as they arrive.
So we start with an abstract GrainPool class with some common methods, but is void of any specific scheduling policy. We’ll skip this guy for now, it’s not terribly interesting. On to the actual implementation. The idea is that we can have a whole bunch of scheduling policies out of the box, but I'm tired right now… So I only spent time to implement a round robin scheduler.
The code here is fairly lengthy. But it should be pretty easy to follow. It just loops round and round, processing grains in the queue until it gets shut down.
public class RoundRobinGrainPool : GrainPool
{
private Queue<Grain> _queue = new Queue<Grain>();
protected override void SchedulerLoop()
{
try
{
bool exitRequested = false;
while (!exitRequested)
{
Grain g = null;
// Dequeue the first item in the queue, or wait until one arrives.
while (g == null && !_isInterruptionRequested)
{
lock (_poolLock)
{
if (_queue.Count == 0)
{
if (_isShutdownRequested)
{
// The queue has been emptied, process shutdown now.
break;
}
else
{
// Wait for a new item to arrive. 250ms might need tuning...
// it could be overly "spinny".
Monitor.Wait(_poolLock, 250);
}
}
else
{
// Dequeue the item at the head, and jump out of the loop.
g = _queue.Dequeue();
break;
}
}
}
//HACK: Due to shutdown, g could be null. If there are ever other new ways
//to exit the block w/out setting g, this code will catch it.
if (g == null)
{
exitRequested = true;
continue;
}
// If we last knew that the grain was asleep, give it a chance to indicate
// whether it is ready to run or not.
if (g.LastKnownState == GrainState.Sleeping)
{
g.LastKnownState = g.SuspendedState();
if (g.LastKnownState == GrainState.Stopped)
{
// It reported that it has stopped--move on to the next grain.
g.CompletedWaitEvent.Set();
continue;
}
}
// So long as the grain isn't sleeping, we give it a chance to run.
if (g.LastKnownState != GrainState.Sleeping)
{
if (g.Executor.MoveNext())
{
// The grain reported that it's either sleeping or ready to be
// run again. This is fine & dandy, just update its state and prepare
// it for its next round of execution.
g.SuspendedState = g.Executor.Current;
g.LastKnownState = g.SuspendedState();
}
else
{
// It's done executing. Update its state and prepare to ditch it.
g.SuspendedState = GrainStateFactory.Stopped;
g.LastKnownState = GrainState.Stopped;
}
if (g.LastKnownState == GrainState.Stopped)
{
// It reported that it was done. Move on to the next one.
g.CompletedWaitEvent.Set();
continue;
}
}
// If we got here, the grain still has execution left in it. Schedule it up
// for the next loop.
lock (_poolLock)
{
_queue.Enqueue(g);
}
}
}
finally
{
// Notify waiter that we saw the interruption request.
_shutdownEvent.Set();
}
}
public override void QueueGrain(Grain g)
{
// Check that our grain is in a ready state.
if (g.LastKnownState == GrainState.Stopped)
{
// Start will throw an exception if the grain is corrupt. We let
// it fly, no problems here.
g.Start();
}
// Put 'em in the queue and notify the scheduler that it's ready.
lock (_poolLock)
{
_queue.Enqueue(g);
Monitor.PulseAll(_poolLock);
}
}
public override void DequeueGrain(Grain g)
{
// Just copy our internal queue to a new version, leaving out the grain
// which was requested to be removed.
lock (_poolLock) //BUGBUG: This coarse lock sucks big time.
{
Queue<Grain> copy = new Queue<Grain>(_queue.Count * 2);
while (_queue.Count != 0)
{
Grain deq = _queue.Dequeue();
if (!deq.Equals(g))
copy.Enqueue(deq);
}
_queue = copy;
}
}
public override IEnumerator<Grain> GetEnumerator()
{
// Lock to ensure consistency in the data structure while we
// snapshot the enumerator.
lock (_poolLock)
{
return _queue.GetEnumerator();
}
}
}
Should really spend more time discussing this bit, but if you have specific questions or items which aren’t clear, please ask!
Usage Example
Again, running out of steam, so I didn't cook up any realistic examples. But this little snippet does demonstrate how it works from a user’s perspective. I did try to make the APIs half-decent and at least a little straightforward to use.
I create a new thread that owns running the grain scheduler, and then send off a couple coroutines for execution. I ask for a normal shutdown (not an interruption, so it waits for the grains to complete), and then we’re done. I'm going to try and get some better examples over the next couple days.
class Program
{
static void Main()
{
GrainPool pool = GrainPool.DefaultPool;
// Create a new Thread which hosts the grain pool.
Console.WriteLine("### Starting grain host...");
Thread t = new Thread(pool.Start);
t.Start();
// Now schedule some work for the grain pool.
Console.WriteLine("### Scheduling grains...");
pool.QueueGrain(new Grain(GrainFuncs.Fibonacci));
pool.QueueGrain(new Grain(GrainFuncs.Random));
// Lastly, wait for our grains to be done.
Console.WriteLine("### Waiting for completion & shutting down the pool...");
pool.Shutdown(true);
Console.WriteLine("### Joining the thread...");
t.Join();
Console.WriteLine("### Exited normally");
}
}
static class GrainFuncs
{
internal static IEnumerable<GrainNext> Fibonacci()
{
int n0 = 0;
int n1 = 1;
int n;
do
{
n = n0 + n1;
n0 = n1;
n1 = n;
Console.WriteLine("[Fib: {0}]", n);
yield return GrainStateFactory.Ready;
}
while (n < 1000);
Console.WriteLine("[Fib: {0}**FINAL**]", n);
}
internal static IEnumerable<GrainNext> Random()
{
Random r = new Random();
for (int i = 0; i < 15; i++)
{
Console.WriteLine("[Rand: {0}]", r.Next());
yield return new GrainNext(delegate
{
if (r.Next(2) == 1)
return GrainState.Ready;
else
return GrainState.Sleeping;
});
}
}
}
 Monday, May 09, 2005
I wrote up a wait-free atomic stack which remains consistent in the face of multi-threaded scenarios. It uses interlocked operations for pushes and pops. Unfortunately, I didn't manage to beat out the BCL's Stack<T> when combined with monitor locks in a tight insertion loop, but I did beat out our LinkedList<T> using monitors. I admit, I'm using pretty coarse grained locks in my tests, but I'm confident I'd still beat it out with a finer grained lock (anybody want to verify? I'm at ~69% of its execution with the big lock).
Overview
Instead of pasting the entire chunk of code here, I'll walk through one method at a time. The full source is available here. The class is named InterlockedStack<T>, and it implements IEnumerable<T>. Nothing fancy like IList<T> or ICollection<T>, mostly because I didn't want to have to worry about the guarantees those interfaces require you to make.
public class InterlockedStack<T> : IEnumerable<T>
{
I wrote the class using a linked list backing store, mostly because I couldn't think of a clever way to guarantee atomic adds and removes otherwise. This hit perf negatively as summarized in the closing paragraphs.
The Code
So I have a self explanatory inner class ListNode<U>:
// Crummy little data structure to hold a linked cell.
private class ListNode<U>
{
internal ListNode<U> Next;
internal U Value;
internal ListNode(U value) { this.Value = value; }
}
And a few fields:
private ListNode<T> head;
private int version;
internal int pushMisses;
internal int popMisses;
Head is the top of the stack, and version is just an incrementing counter which I use for some dirty read about the list's consistency during enumeration. PushMisses and PopMisses have been useful in my debugging to find out how many times an interlocked operation failed and had to retry due to an inconsistent read/write.
The two meaty methods are Push and Pop. They're pretty well commented, so hopefully not so much explanation is necessary. First, Push:
public void Push(T item)
{
ListNode<T> newHead = new ListNode<T>(item);
ListNode<T> newNext;
// Note: we could have avoided the extra break jump with a do { } while (x) loop. But
// we need the extra logic at the end for debugging push misses.
while (true)
{
// Point our new node to the current head.
newNext = head;
newHead.Next = newNext;
// The current head might get updated between this snapshot and the next C&S. If
// it does, we just loop around and try again.
if (Interlocked.CompareExchange(ref head, newHead, newNext) == newNext)
{
// Btw, we don't guarantee that the version will be +1 from the entrance to this
// routine, nor do we guarantee atomicity with respect to the actual head swap.
// Our guarantee is simply that it gets incremented roughly when the changes commit.
Interlocked.Increment(ref version);
break;
}
Interlocked.Increment(ref pushMisses);
}
}
It accepts an item, and goes into a while loop trying to add an item consistently. If it fails, it just circles back 'round and tries again. Notice that the only guarantee that we make with regards to version is that it gets incremented at roughly the same time the commit happens. This opens up a tiny race condition with the enumerator and count methods that I'm willing to live with.
Pop takes a similar approach:
public T Pop()
{
bool garbage; // Garbage in, garbage out...
return Pop(out garbage);
}
public T Pop(out bool isEmpty)
{
ListNode<T> popped;
while (true)
{
// Snap a copy of the current head.
popped = head;
if (popped == null)
{
// Simple case: the list is emptied (subject to races, we could in theory
// loop forever checking whether something got added during the window--probably
// not wise :) ). Just signal emptiness and return the item.
isEmpty = true;
return default(T);
}
// Try to C&S the head with our popped item's next pointer. This might fail as a result
// of concurrent updates. That's OK, we just spin around the loop and try again.
ListNode<T> next = popped.Next;
if (Interlocked.CompareExchange(ref head, next, popped) == popped)
{
// See version increment notes for Push. The same conditions apply here.
Interlocked.Increment(ref version);
// Setting next to null helps people who have a dirty copy of this node. It might
// help them do the right thing, but it's probably frivolous.
popped.Next = null;
break;
}
Interlocked.Increment(ref popMisses);
}
isEmpty = false;
return popped.Value;
}
You'll notice I provide two overloads for Pop--one which accepts an out bool parameter that gets set to true if the stack is empty, the other which doesn't. In either case, default(T) will be returned if the pop was empty. This is useful when you're using ref types, as you can just check for null without dealing with an out param. The BCL Stack<T> actually throws if you try to pop an empty stack. I don't know which I prefer.
Lastly, the enumerator and count methods. They're not fully thread-safe, although the only harm is the possibility of stale data. They're still useful for approximations of a point in time. You won't see any corruption, it's just that the way that they operate doesn't take into account all of the subtle races inherent in the data structure. This is just fine. Push and Pop are consistent, which was the #1 priority.
Performance Summary
I wrote a few perf harnesses. Nothing too fancy. If you want to see more, just check out the code that I linked to earlier. The results of doing 10 million inserts into a list across 10 threads on a single-proc box (ugh, would love to run this on a dual/quad) are:
- Stack<T> with Push wrapped in Monitor.Enter/Exit: ~1125ms (1.0x)
- LinkedList<T> with AddHead wrapped in Monitor.Enter/Exit: ~5175ms (4.6x)
- InterlockedStack<T> with Push: ~3584ms (3.2x)
These are the average #s for 3 runs.
I think it's obvious why Stack<T> beats out my implementation, but I'll summarize as follows:
- Stack<T> uses a fixed size array internally, so aside from the ocassional growth, adding an item is a simple case of a stelem and increment of a field. Stack<T>'s default growth policy seems to handle this type of testing, probably at the expense of unneeded space.
- I have to heap allocate a new ListNode<T> for each insertion. This adds a whole lot of overhead to the process of adding a node, returning an existing one, or even just inspecting the head.
- Accessing items is quicker with the Stack<T>, too, since it's just an index into the array to get the native data type or reference to it (hoorah generics), whereas my solution needs to do an extra dereference just to get to the native data/reference.
Summary
I'm not sure why you wouldn't just go with Stack<T> plus a monitor based on the numbers. Granted, this is a pretty contrived micro-benchmark, but still offers some evidence that this is the right way to go.
I think that on computers more capable of parallelism, you might see a win with the InterlockedStack<T>. Why? Because the interlocked operations are very fine grained, and might increase concurrency due to the ratio of the cost of contention to memory allocation increasing. The low push and pop misses I'm seeing in my testing is an indication that this is true. I'd love if somebody out there could verify this or share their thoughts.
 Thursday, April 28, 2005
I'm not doing a good job keeping up with the ~150 blogs to which I subscribe. 1,000s of posts behind (17,440/36,207). Scoble, man, I don't know how you do it.
So as a quick fix, I decided to refactor the list to aggregate my current Top 20. The way I decided this is, given my reading habits and interests, what 20 people's posts would I be most likely to read upon noticing a new entry? Obviously, the reasoning differs from person to person.
Figured I'd share it out in case you were missing one of them (btw, my blogroll's always available on the RHS of the site).
In ascending alpha-numeric order; my top 5 are bolded:
 Saturday, April 23, 2005
Don's kicking butt over there with lots of experimentation with closures and iterators in C#. Sam has commented and seems to be impressed with C#'s new evolutionary direction. I am too.
Inspired, I went back to a couple posts I did mid last year. I did some playing around with thunks and streams. I didn't do a great job explaining back then, and I think it might be appreciated a little more now. Further, a commenter on Brad's recent post talked about using streams in Scheme to implement the Sieve of Erastosthenes. So I did one in C#.
Streams and laziness
A stream uses lazy data structures in order to represent an infinite list. It's a recursive data structure, generally consisting of a pair of two elements: the first element is a value, and the second is a promise that, when forced, returns a likewise pair. A promise is just a suspended computation, also known as a thunk. When you need the value it represents, you force it. Many promises are “memoized,” meaning they remember the value they calculate. This helps to avoid performing the same calculation more than once.
Note: Functional programmers are quite familiar with the notion of using recursive data structures to represent lists. For example, consider a Pair<T,U> type which has two fields, one of type T and another of type U. A List can be defined as List<T> : Maybe<Pair<T, List<T>>>. Maybe is much like Nullable<T>, although it handles ref types, too. So a list is simply defined as either null or an element followed by a list. This, I think, is a topic for an entire post.
Streams are a bit more general purpose than the above explanation. But I've simplified in order to explain the concept without too many special cases. It's entirely reasonable to have a fixed size stream, or a stream which isn't calculated in sequential order, for example (consider a calculation which generates elements in a tree-like structure; you might want to sporadically populate the list both before and ahead of your current position). Further, languages like Haskell and Standard ML support laziness as a 1st class language construct. All recursive computations and data structures are lazy by default. This is quite nice, although difficult to accomplish in imperitive languages due to their strict ordering and side-effecting nature.
In Scheme, laziness is exposed through the functions 'delay' and 'force'. So if we wanted to represent a lazy Fibonacci calculation, we might do something like this:
(define (mk-fibs) (cons 1 (delay (fibs-next 0 1))))
(define (fibs-next n0 n1) (let ((n (+ n0 n1))) (cons n (delay (fibs-next n1 n)))))
We just treat the result of (mk-fibs) as though it were a list. A minor detail is that we need to force the cdr before using it. Promises in Scheme are memoized internally, so although you have to call force, it won't actually perform the calculation if it has already done so. It will just return the remembered value. I wrote a little function that “takes” the nth element from a lazy stream (derived from Haskell's Standard Prelude function), and actually patches up the stream as it goes.
Although promises are memoized automatically, you won't be able to pretty print them at the toplevel. By replacing the cdr with the result of a force in my function, it looks a little nicer when printed. Purely asthetics.
(define (take nth list) (if (<= nth 0) (car list) (take (- nth 1) (begin (set-cdr! list (force (cdr list))) (cdr list)))))
Umm... What does this have to do with C#?
Well, as C# moves more and more towards the world where 1st class functions are used pervasively, a lot of what Scheme (and Haskell and Standard ML and ...) can do is becoming available to C#-ites. A lot of people have been coming to this realization lately. The next couple years are very exciting for C# programmers. I hope people's heads don't explode. (Well, only a couple, OK?)
Here's my somewhat, almost general purpose Stream<T> class. It's a bit lengthy, and relies on an array of other data structures. I'll introduce those in a minute. And then I'll show how to use this to create a lazy version of Fibonacci and the Sieve of Erastosthenes:
class Stream<T> : IEnumerable<T>
{
private StreamNext<T> start;
// Ctors
public Stream(StreamNext<T> start) { this.start = start; }
public Stream(StreamPair<T> pair) : this(new StreamNext<T>(pair)) { }
public Stream(StreamFunction<T> function) : this(new StreamNext<T>(function)) { }
// Properties
public T this[int n]
{
get
{
// Note: the performance of this method isn't great. It allocates a
// new enumerator upon each invocation, and does a linear walk from
// 0..n, forcing promises along the way, in order to locate the desired
// element.
// Just walk ahead 'n' steps (inclusive), forcing if we must.
IEnumerator<T> enumerator = GetEnumerator(true);
for (int i = 0; i <= n; i++)
enumerator.MoveNext();
return enumerator.Current;
}
}
// Methods
IEnumerator IEnumerable.GetEnumerator()
{
return ((IEnumerable<T>)this).GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return GetEnumerator(true);
}
public IEnumerator<T> GetEnumerator(bool forcePromises)
{
StreamNext<T> last = null;
StreamNext<T> current = start;
while (true)
{
if (current.IsSecond())
{
// This is a suspended computation--i.e. a promise.
if (forcePromises)
{
// Apply the function to get the next pair.
current = new StreamNext<T>(current.Second());
// Memoize the changes we've made.
if (last == null)
start = current;
else
last.First.Tail = current;
}
else
{
// If we aren't forcing promises, we're done.
break;
}
}
// Yield the next value, and move on to the next.
yield return current.First.Head;
last = current;
current = current.First.Tail;
}
}
public override string ToString()
{
StringBuilder sb = new StringBuilder("(");
IEnumerator<T> enumerator = GetEnumerator(false);
while (enumerator.MoveNext())
{
sb.Append(enumerator.Current);
sb.Append(" . ");
}
sb.Append("#promise)");
return sb.ToString();
}
}
Some highlights of this class:
- The default enumerator forces values as it goes. This means if you do a foreach over it, it won't ever terminate (in theory; if you're not careful, you'll end up overflowing and probably failing in some unpredictable way)! This is one of the great features of the Stream... it generates a neverending list of values as fast as you can consume 'em. If you pass false to GetEnumerator, it won't force, and will instead just give you what's already been calculated.
- ToString is overridden to show you only the forced values thus far. It prints data in a dotted-list-like form, just because I think it's shnazzy.
- The indexer gets you the nth element of the list. This is cool because you don't have to worry about whether it's calculated or not--the indexer handles it for you. It also memoizes, so the next time 'round the value will be cached. This is very much like my 'take' function above in Scheme.
Fibonacci stream
Writing a Fibonacci stream is quite nice and easy with this new class. I already showed the Scheme version above. Here it is in C#:
internal class FibonacciStream : Stream<long>
{
internal FibonacciStream() : base(
new StreamPair<long>(1, delegate { return Next(0, 1); }))
{
}
private static StreamPair<long> Next(long n0, long n1)
{
long n = n0 + n1;
return new StreamPair<long>(n, delegate { return Next(n1, n); });
}
}
I derived from Stream<T>, although this isn't strictly necessary. So we start off by saying that the 1st fib is 1, and the second is a promise for “Next(0, 1)“. Next is a function which generates the next pair in the sequence. In this case, it adds 0+1 and uses that as the first part of the pair, and recursively gives a promise to “Next(1, 1)“. And so on.
You can foreach over it, e.g. to print the numbers in the series up to 1,000:
FibonacciStream fib = new FibonacciStream();
foreach (long l in fib)
{
if (l > 1000)
break;
Console.WriteLine("{0}", l);
}
Or even ask for a specific number in the series, e.g. the 50th Fibonacci number:
Console.WriteLine(fib[49]);
And as already noted, this is cumulative. So if you then decide to ask for the 51st number, it just does a linear walk through its memory, and forces the next value.
Sieve of Erastosthenes stream
Now, there are a couple ways to implement the Sieve. I'm not sure if there are better ways to do this with infinite streams (the standard approach is to mark from k..sqrt(n), where k is the prime and n is the length of the list; but with an infinite stream, we don't know n!), but this seems to be alright. Its performance is degraded proportional to the amount of primes it has found thus far, so when you get a ways into the list it could slow down. I ran it to compute all 0..int.MaxValue primes, and it didn't exhibit any prohibitive slowdown:
internal class SieveOfErastosthenesStream : Stream<bool>
{
// This might look strange, but we special case the first two iterations; 0 and 1 are not
// prime. So, our actual calculations start at 2.
internal SieveOfErastosthenesStream() : base(
new StreamPair<bool>(false,
new StreamPair<bool>(false,
delegate { return Next(2, new List<int>()); })))
{
}
private static StreamPair<bool> Next(int current, List<int> primes)
{
bool isComposite = false;
int sqrt = Math.Sqrt(current) + 1;
foreach (int p in primes)
{
if (p > sqrt) break;
isComposite = current % p == 0;
if (isComposite)
break;
}
if (!isComposite)
{
// We make a copy here because we're avoiding side effects. If we just added
// it to the 'primes' instance passed in, we can munge suspended calculations
// that get forced more than once (i.e. non-memoized).
primes = new List<int>(primes);
primes.Add(current);
}
return new StreamPair<bool>(!isComposite,
delegate { return Next(current + 1, primes); });
}
}
Fairly straight forward. It's a little complicated to get started since we need to special case 0 and 1, but after that the algorithm “just works.“ Notice that we copy the list of primes before we memoize the second computation. This gets deeper than I want to go right now, but suffice it to say that side effects are eeeevil. So now if we want to, say, print which numbers from 0..10,000 are prime, we can do it like this:
SieveOfErastosthenesStream sieve = new SieveOfErastosthenesStream();
IEnumerator<bool> sieveEnum = sieve.GetEnumerator();
for (int i = 0; i < 10000; i++)
{
sieveEnum.MoveNext();
if (sieveEnum.Current)
Console.WriteLine("{0}", i);
}
A naive implementation might have just used sieve's indexer for each call. This would be too horrible in reality. But since I know that it's performance is linear with respect to the size of the calculations performed thus far, I've gone with the slightly more verbose approach above. This isn't too much unlike working with a linked list.
But the indexer does come in handy. Say we wanted to check to see if a certain number is prime. Well, according to this, the 1,000th prime is 7919. So this should print 'True' (and indeed it does, very quickly I might add... actually it's entirely based on memory if you ran the above code first):
Console.WriteLine(sieve[7919]);
Hey: I don't like smoke and mirrors!
An astute reader will have noticed the abundance of types in Stream<T> above that I made no mention of. Yep, there's a bit of plumbing underneath to make this work. Some of this is to be expected, while some isn't. (For example, I've griped many a time that we don't have a Pair<T,U> type in the BCL.) Most of this is uninteresting, so I won't do a lot of explainin'.
delegate StreamPair<T> StreamFunction<T>();
class StreamPair<T> : Pair<T, StreamNext<T>>
{
public StreamPair(T current, StreamPair<T> memo) : base(current, new StreamNext<T>(memo)) { }
public StreamPair(T current, StreamFunction<T> promise) : base(current, new StreamNext<T>(promise)) { }
public T Current { get { return Head; } }
public StreamNext<T> Next { get { return Tail; } }
public bool IsMemoized() { return Next.IsMemoized(); }
public bool IsPromise() { return Next.IsPromise(); }
}
class StreamNext<T> : Choice<StreamPair<T>, StreamFunction<T>>
{
public StreamNext(StreamPair<T> first) : base(first) { }
public StreamNext(StreamFunction<T> second) : base(second) { }
public bool IsMemoized() { return IsFirst(); }
public bool IsPromise() { return IsSecond(); }
}
StreamFunction<T> is a delegate which represents lazy promises. Evaluating one returns a StreamPair<T>. This is a pair of the real T value (which was computed by evaluating the promise), and the next promise in the series. The promise is represented as a StreamNext<T>. I lied a little: StreamNext<T> can be either a StreamPair<T> (notice the recursion), or a StreamFunction<T>. The former is used when you might want to precalculate more than one value in a list, while the second is used for promising future values.
And of course, these build on other types not listed. Namely, Pair<T,U> and Choice<TF,TS>. I admit, this stuff is a hack. If has a pretty ugly interface.
class Pair<T, U>
{
public T Head;
public U Tail;
public Pair(T head, U tail)
{
this.Head = head;
this.Tail = tail;
}
}
class Choice<TF, TS>
{
private bool isFirstType;
private TF first;
private TS second;
public Choice(TF first)
{
this.isFirstType = true;
this.first = first;
}
public Choice(TS second)
{
this.isFirstType = false;
this.second = second;
}
public TF First
{
get { return first; }
}
public TS Second
{
get { return second; }
}
public bool Is<T>()
{
return isFirstType ?
typeof(T) == typeof(TF) :
typeof(T) == typeof(TS);
}
public T As<T>()
{
return isFirstType ?
(T)(object)first :
(T)(object)second;
}
public bool IsFirst()
{
return isFirstType;
}
public bool IsSecond()
{
return !isFirstType;
}
}
 Saturday, April 16, 2005
Top of the “for fun” stack looks like this:
- Write, write, write. My book is not moving along as fast as I had hoped. In fact, I'm waaaay behind schedule. Thankfully my publisher's understanding, but I'm certainly testing their patience!
- Finish reading the two books I picked up recently:
- Fix bugs in Sencha, especially in the area of quotations. Get more of the SXML, SXSLT, etc. stuff implemented. Work on platform interop. Finish up the standard library functions. VSIP. Interpreted mode? Found some interesting dynamic bugs recently due to the way I statically bind at compile time. Only crops up when working in the toplevel, but (interestingly) this is a pretty big use case for Scheme programming. ;)
- Get HaskIL off the ground. I've done some work, using GHC Core as the one real truth, but not nearly enough. Need more time. I'm really excited to work on some of the lazy and parallel aspects of this project... gotta get over the hump first.
- Put my two forks of Rotor back together. I'm learning to navigate the JIT and EE interactions a bit better, although I've come to a particularly unsurprising conclusion: the CLR is a complex beast. I haven't been able to get either of these two things to actually run a program that takes advantage of the changes I've made. Ugh.
- First class continuations (and coroutines) in the runtime. My approach here is to capture stack state and wrap it up in an object. This can be used to restore at a later point in time. You just call the static method System.Threading.Continuation.Capture(), returning a Continuation instance c; then later you can call c.Restore(). Makes it pretty easy to implement call/cc. I'm now at least a little proficient in writing fcalls; all the whacky macros are making my head hurt. Interactions between stack unwinding and restore is a tricky thing, and I'm finding that CAS is a pickle. My current design does a stack crawl and throws if it notices any type of assert or demand... not for technical reasons, but because it seems pretty bad to capture this and pass it around. Similarly, I fail if you're inside a try { } ... finally { } block since shared state can munged up in a finally (which gets torn down as part of the stack unwind). One of the huge difficulties here is state which doesn't get represented in data structures that the EE knows about--for example, JIT only stack frames. In fact, I'm slowly realizing that this might not be possible without an explicit level of indirection with stack management.
- Software transactional memory. System.Transactions is one step in the right direction. I'm interested in retry-style STM support, however, directly baked into the runtime. Some of the MSR folks have been doing amazing work here. This one's a tough nugget, too. You can't know statically if you're inside a transacted block--unless you limit transactions to simple blocks w/out function calls--so when you JIT a memory access, it's pretty damn tough to figure out the right thing to do. Ultimately, you don't want to add a level of indirection for each memory read/write. But this is what I've done thus far, mainly because it's a step in the right direction. Further, you can't safely retry stuff that does I/O, so I'm building a constant list of “known“ I/O routines in the BCL. This is crap and doesn't scale. I wish we had monads in the runtime... or at least some form of annotation so that I knew what library call ended up doing I/O.
- Did I mention I have a book to write?
I was thinking tonight: it'd be freaking awesome to have a pure functional language based on Haskell's laziness, pattern matching, and approach to I/O (monads). But with a paranthetic syntax, along with lists like Lisp/Scheme. Wow. Does it exist?
Perhaps I'll go and *do* something now. Instead of just talking about doing something.
|
|
Me
Joe  is an architect and developer on a systems incubation project at Microsoft.
Recent
Search
Browse
Disclaimer:
The content of this site are my own personal opinions and do
not represent my employer's view in anyway.
© 2013, Joe Duffy
|
|