| |
 Saturday, March 05, 2005
 Friday, March 04, 2005
Advanced Topics in Types and Programming Languages edited by Benjamin C. Pierce |
 |
9 of 10. This book builds on one of my favorite reads from last year, Types and Programming Languages. It contains a collection of essays on topics ranging from precise type analysis, lower level type systems (e.g. a typed assembly language), reasoning about programs, and ML type inferencing. I've not yet completed it, but it is generally very well written. I highly recommend both of these books. |
Free Culture: The Nature and Future of Creativity by Lawrence Lessig |
 |
8 of 10. Fascinating read by one of the most knowledgeable and provocative experts on the topic of both historical and modern intellectual property. This book takes a careful look (without too much lawyer speak) at the impacts IP law have had on culture as a whole, and examines the impacts on the future of ideas and the ability to act out on such creativity. Lawrence has a blog over at http://www.lessig.org/blog/, and has a great collection of presentations over at the IT Conversations website. |
Mind Hacks by Tom Stafford, Matt Webb |
 |
7 of 10. This is a fun book. While some of the topics covered aren't necessarily hacks, the author's do a great job of discussing some interesting facets of how the brain functions. While coverage doesn't go very deep in any one area, the book provides plenty of (mostly web) references to follow up on if you end up wanting more details. The prose is very computer/geekish which just adds to the reading pleasure. |
 Monday, February 28, 2005
Via the Daily ACK, the University of Washington has their Computer Science & Engineering colloquium series available online.
One gem in this series is "Google: A Behind-the-Scenes Look." The presenter discusses a little bit about culture, a tad about their algorithms, data centers, trends, ongoing research, and so on. Each of the topics could have a dedicated video in its own right. Still a great breadth-first overview.
I've always loved data. It seems to me that all of the creative and elegant software out there has arisen directly as a result of humans needing to deal with and categorize information. No data? Not much to do, now, is there? Sure, sure, ... people often cook up some hypothesis or artificial goal in order to get motivated, but processing information is so ingrained in what we are and do that it truly seems to be the reason for human existence.
Alright, maybe that was too preachy/zen-like/whatever. Just watch the damned video. :)
Based on a recent question from Brad, I decided to write up a little post on the topic of event handlers. Specifically, that objects registered to receive events are rooted, meaning that they won't be eligible for GC even if you drop them on the floor. Through some trickery with weak references, you can work around this (if you need to).
So, what's the issue?
Once you register an object as a sink for events in C#, the source object generating the events will keep a reference to it. When the source becomes unreachable, your sink will also become unreachable (assuming that's the only live reference to the sink). An alternative way to make your sink unreachable is to manually unregister it, asking that the source give up its reference. But this means that until either one of these things happen, your object won't be eligible for garbage collection. This probably won’t be surprising for most folks, but for finalizable objects which own resources it does mean you need to be very careful to unregister your sink when it needs to die. Even for objects which don't hold on to such resources, you might be building up a list of unused objects after a while, akin to stashing away references in a collection and forgetting to prune it when you're done.
For example, given source and sink set of types:
class Source
{
public event EventHandler<EventArgs> Tick;
public void OnTick()
{
EventHandler<EventArgs> t = Tick;
if (t != null)
t(this, new EventArgs());
}
}
interface ISink
{
void HandleTick(object sender, EventArgs e);
}
And client code which defines a sink implementation:
class MySink : ISink, IDisposable
{
private IntPtr someHandle;
static int counter;
int id = counter++;
public MySink()
{
// create someHandle
}
~MySink()
{
// clean up someHandle
Console.WriteLine("!{0} finalized", id);
}
public void Dispose()
{
// clean up someHandle
GC.SuppressFinalize(this);
Console.WriteLine("~{0} disposed", id);
}
public void HandleTick(object sender, EventArgs e)
{
// use someHandle
Console.WriteLine("{0}: Received", id);
}
}
The owner of an instance of MySink might not realize what consequences registering the sink for messages will have, namely that it will cause it to become rooted. For example, this simple code does the trick:
Source src = /*...*/;
MySink sk = new MySink();
src.Tick += sk.HandleTick;
Given that this type owns unmanaged resources, the user really needs to remember to unregister it when they’re done with it.
And this is a problem because…?
This actually seems like it's the desired/intended behavior for somebody registering for events in most cases. Frequently, a sink object will be created, registered for an event, and then its reference dropped and left on its own to process the incoming messages. I.e. its only purpose in life is to respond to events. It might seem surprising that somebody would register an object and forget to ever unregister it, but I suspect this happens all the time. This is one of those subtle bugs which could manifest over time in a long running application (e.g. a web app), where unmanaged memory just continues to rise and rise until ASP.NET recycles itself. Especially if there’s some AppDomain-wide event registration and routing mechanism.
So, for those situations where we don’t want the event source to keep us alive, what are we to do?
Enter “Weak Event Handlers”
We have this nifty feature called weak references (exposed by the WeakReference class), enabling you to hold on to a reference without keeping it pinned from GC collection. If there are no strong references to an object and the GC runs, it'll sweep these up. This means that the underlying object referred to by the weak reference won't be valid any longer. (For those from the Java camp, be careful—one of the first mistakes I made when moving to the CLR was thinking that weak references == soft references. Soft references are only collected as a last resort, making them perfect for caching. Weak references are always collected if no strong references exist at the time of a GC, so you might have to think carefully about using them. This is one great reason to avoid GC.Collect() in production applications (among many others).)
We can go ahead and use this feature to wrap up our event handler delegates. Then we just hand the wrapper instead of the raw delegate off to event registration, thus preventing the event source from keeping our object alive. So, let’s get started.
First we introduce a general purpose WeakEventHandler<T> class:
class WeakEventHandler<T> where T : EventArgs
{
private WeakReference weakRef;
private EventHandler<T> handler;
public WeakEventHandler(EventHandler<T> d)
{
weakRef = new WeakReference(d);
handler = new EventHandler<T>(Invoke);
}
public EventHandler<T> Handler
{
get { return handler; }
}
private void Invoke(object sender, T e)
{
EventHandler<T> d = (EventHandler<T>)weakRef.Target;
if (d != null)
d(sender, e);
else
// hmm! what to do here?
}
}
So now somebody can construct a weak event handler, and register that instead. Once the target of the delegate loses its roots, it’s eligible for GC. For example, the three line snippet from above becomes:
Source src = /*...*/;
MySink sk = new MySink();
src.Tick += new WeakEventHandler<EventArgs>(sk.HandleTick).Handler;
(Note that I initially wanted to subclass Delegate to make a WeakDelegate type, and simply override the internal storage in order to make a general purpose “weak delegate.” Unfortunately, Delegate and MulticastDelegate are treated specially by the CLI (see Partition II), meaning that I couldn’t do this straightforwardly. I’m going to look into it further, but I’m not sure exactly what this would entail.)
But now won't my WeakEventHandler stay alive?
Yep, it does. This is admittedly leaps and bounds better than keeping a finalizable object rooted, but we can probably clean things up a little. Notice the “// hmm! what to do here?” line above?—we can actually put some code in here that intelligently unregisters the weak handler from the source. It requires a bit of generalization, but here’s a shot at it. (Note that this approach requires an event to be triggered to initiate the cleanup, so for events with particularly long delays in between instances, this might not be a great solution.)
First, let’s update our WeakEventHandler<T> class to accept an unregister delegate. This will be invoked in cases where our underlying object has been collected, and hence the weak wrapper should be removed from the event source’s registration list; from this point forward, our previous code just ignores all future requests to fire the event anyhow:
delegate void UnregisterHandler<T>(WeakEventHandler<T> handler) where T : EventArgs;
class WeakEventHandler<T> where T : EventArgs
{
private WeakReference weakRef;
private EventHandler<T> handler;
private UnregisterHandler<T> unregister;
public WeakEventHandler(EventHandler<T> d) : this(d, null)
{
}
public WeakEventHandler(EventHandler<T> d, UnregisterHandler<T> u)
{
weakRef = new WeakReference(d);
handler = new EventHandler<T>(Invoke);
unregister = u;
}
public EventHandler<T> Handler
{
get { return handler; }
}
private void Invoke(object sender, T e)
{
EventHandler<T> d = (EventHandler<T>)weakRef.Target;
if (d != null)
d(sender, e);
else if (unregister != null)
unregister(this);
}
}
Then, we just update our source to have an Unregister method. As a matter of fact, now that we have this weak handler goop, let’s encapsulate it inside a new Register method. This means that (unfortunately) we won’t be using the C# syntax for event registration, but compared with having users grok the weak handler mechanism, I personally find it much more palatable:
class Source
{
public event EventHandler<EventArgs> Tick;
public void OnTick()
{
EventHandler<EventArgs> t = Tick;
if (t != null)
t(this, new EventArgs());
}
public void Register(ISink sk)
{
Tick += new WeakEventHandler<EventArgs>(sk.HandleTick, Unregister).Handler;
}
private void Unregister(WeakEventHandler<EventArgs> eh)
{
Tick -= eh.Handler;
}
}
Now the three lines of code change to:
Source src = /*...*/;
MySink sk = new MySink();
src.Register(sk);
Should I actually do this?
Just because you can use this doesn't mean you should. :) It seems like a reasonable strategy to make sure you don't shoot yourself in the foot too badly, but if you're relying on weak wrappers to do the unregistration and cleanup for you, it seems like all you're really doing is working around bugs. This is a bit like forgetting to call Dispose on an object and having a finalizer kick in just to make sure you don't leak. Further, you probably shouldn't use this if all you do is register for an event and drop the reference. In this case, the GC could kick in as early as the nanosecond after you drop the reference, meaning your registration was useless.
With that said, I think it's perfectly reasonable to use this to debug and detect bugs in your code. For example, if you fired an assert instead of unregistering the weak handler, you might be able to detect where in your program you forget to unregister sink objects. Just some thoughts. I'd love to hear any feedback on where you might find this useful.
Testing, Testing, 1, 2, 3...
This little snippet of test code demonstrates the whole thing. Here’s the full source in case anybody wants to play around with it without having to scour this post to pull together all the code.
class Program
{
static void Main()
{
Program p = new Program();
p.Execute();
}
private Source src = new Source();
public void Execute()
{
for (int j = 0; j < 1000; j++)
{
for (int i = 0; i < 1000; i++)
{
src.Register(new MySink());
}
src.OnTick();
GC.Collect();
}
}
}
 Tuesday, February 22, 2005
I was hacking ever so briefly on my compiler today, and found a pretty subtle bug. Basically, free variables weren't being detected correctly which whacked my closure conversion process--resulting in IL which didn't quite do what it should have. I'm not sure that anybody would find the debugging exercise interesting, but humor me... ok? ;)
I have this simple test case:
(define (mkadder n) (lambda (x) (+ x n))) (define z0 (mkadder 5)) (define z1 (mkadder 10)) (write "Should be 55: ") (write (z0 50)) (newline) (write "Should be 60: ") (write (z1 50)) (newline)
Nothing terribly groundbreaking. I have a small # of these simple tests (~50) which aren't really tests at all. I just compile them, stash away the IL, and review and accept (or debug) any changes to the IL when I make compiler changes. Ideally, I'd have some asserts in there, but this seems to work fine for now.
As you probably noticed above, the (lambda (x) (+ x n)) contains a free variable, n, which is bound by its enclosing mkadder lambda. I represent bound closures through a pretty popular closure conversion technique.
It's pretty simple: Any lambdas with no free variables are emitted as static methods; other lambdas, however, get emitted as classes containing fields to hold the bindings of their free variables. These classes also have a constructor which requires all such bindings to be supplied, and an Apply method that takes whatever # of args the lambda needs to execute (this is the meat of the lambda).
E.g. mkadder is a static method since it doesn't contain free variables:
public static Func1<double,double> mkadder(double n);
However, the anonymous lambda inside gets its own class:
public class _lambda0 : Closure<float> { private float n; public _lambda0(float n); public float Apply(float x); }
(A smart cookie might notice that we have a subtle issue here--that n actually gets blitted around. So when mkadder passes n to _lambda0's constructor, if _lambda0 ends up mutating it, we're not sharing the same reference, and hence mkadder wouldn't see _lambda0's updates. This is a problem since this is the correct semantics. To solve this problem, I try to detect any mutation and lift the variable out into a shared static reference; by detect, I look for either a) mutation for sure (e.g. set!), or b) can't tell (e.g. passing it off to yet another method). But I digress...)
Before calling Apply, an instance of the closure class must be constructed, at which time the values for free variables are provided, fields set, and free variables bound. (Another popular technique is to pass free variables as additional arguments to the method. I considered this, but this way seems a bit nicer for interop scenarios from OO languages (i.e. C#). It does mean that I have to construct simple objects and throw them away almost immediately, though, so once I get to the perf tuning stage I might change my mind. I could always go with both, and make the instance method just a shortcut to the static which passes the field values as the free variable arguments. This sacrifices code size, though. Never perfect, is it?)
Anyhow, I detect free variables and pick the strategy by doing a depth first traversal of my AST. During this traversal, I accumulate variable references, and for special nodes, such as a lambda which binds parameters, or a define or let which creates bindings for a precise scope, I delete variables from the ongoing free variable list. This works nicely if your traversal code works properly. :) When it doesn't, bad things happen.
I represent things like the operands to a procedure call as an IList<AstNode>. Unfortunately, my traversal code uses reflection to get at an AST node's children, and looks for both nodes directly and IEnumerable types (which contain lists of nodes). But alas, IList<T> doesn't implement the old-style IEnumerable (although it does implement IEnumerable<T>). It makes sense--doing so would require anybody implementing a new generics-style list to implement the old-style one, too. I wanted to say typeof(IEnumerable<*>).IsAssignableFrom(foo), but of course you can't easily express such things in C#... (although through some hard-to-follow reflection code you most certainly can). I could have done IEnumerable<AstNode>, but unfortunately I have nodes which use further derived types as the type argument, so this wouldn't have worked... (although using co-/contra-variant support in IL might have worked... I prefer to remain in C#-world for now).
This meant the free variable n wasn't even being picked up! Which in turn meant that my lambda was fully closed and could be emitted as a static method. Ouch. Obviously, this caused my program to fail, but the scary thing is how subtly: it compiled and even passed verification. How I emitted valid IL when I should have been missing a variable reference is beyond me at the moment. This surely will uncover a couple bugs which happen in strange edge cases like this. Lesson learned #1: When in doubt, fail loudly. Lesson learned #2: Get better test coverage. This is such a simple case, I'm surprised I just found it now.
Anyhow... What did I do to fix this? Pretty simple: I just changed my AstNodes to use List<T>. The BCL team's done the work to implement both IEnumerable and IEnumerable<T> on List<T>. I know I'll always have objects in there, so I'm not worried about the boxing overhead this would have normally incurred (although I do end up doing a lot of runtime type checking--something I need to do anyhow).
And like magic, it works! Yaay.
 Monday, February 21, 2005
I wonder if anybody out there in the community has given any substantial thought to how Avalon fits in on the web. Right after PDC'03, I started work on something which ultimately got dropped and never completed. It was intended to be a web application framework that enabled Avalon and Indigo to hook into some web eventing mechanism (i.e. ASP.NET or something similar), to allow for rich web-based UIs.
Basically, you could wire up certain UI events to result in Indigo messages to a web endpoint. This endpoint could have some defined "page" lifecycle, or you might be able to hijack ASP.NET in some fashion to reuse the existing familiar model. Certain types of messages could result in a full client page reload, i.e. the message returned could contain a segment of XAML which would then be rendered in response. You could even envision page reactions which didn't require a full page refresh, sort of a JavaScript-ish thing, for example updating a single field, moving a UI widget, and so on.
It seems you could do this without Indigo (just use raw HTTP and fully take advantage of ASP.NET... you'd of course need to get it to generate proper XAML instead of HTML), but there are so many benefits that Indigo buys you that I think it's worth the extra investment. Getting things to work just right on the client, including dynamic page compilation and tuning the eventing model to have the right level of chattiness so as not to make your UI super sluggish, seems like these would be two of the most difficult challenges. Still lots of potential.
So: any thoughts on this whole thing?
 Thursday, February 10, 2005
I've been toying with some ideas for async operations lately, basically seeing if I can come up with an easier Mort-accessible API model for parallel execution. This is notoriously difficult for strict languages, where ordering matters. I can't say that I'm entirely happy with the result, and really think that languages have to create special syntax to make this simple. So I've been doing considerable research there, too. Interestingly, one of the architects on the CLR team, Jim Miller, did his PhD on a parallelization of the Scheme language. Never a short supply of smart people here at Microsoft!
Parallelization
It seems that tree-like structures (i.e. processing tree-like data or executing tree-like operations) have the most opportunity for parallelization. That is, of course, if each node in the tree depends only on descendants and not its ancestors, unlike (say) XML. If you're implicitly parallelizing this, each thread can take responsibility for traversing down the left-hand side of the tree, kicking off new threads to handle the right branches at each node just before visiting the left branch, and lastly joining to perform last minute computation before flowing data back up the tree. This is recursive, and could end up with quite a few worker threads.
Depending on the weighting of the tree, the number of threads grows exponentially, e.g. O((n-1)^2), where n = the tree depth. It's still interesting, though, because each thread is responsible for calculating the left branches and creating workers for its right branch, and you don't really waste much execution time (on a machine with a processor per thread and where creating, managing, and joining on threads were free, of course ;) ). Even with such costs, you could imagine weighting the distribution of parallelization such that you amortize the cost of threading in such a way so as to minimize the total execution time of the entire operation versus a single-threaded traversal.
It also occurs to me that, while more computationally expensive with O(n^2) execution time, factoring some algorithms this way facilitates parallelization. Could it be that in a few years time, we'll be intentionally writing O(n^2) algorithms over, say, O(n) to make them more conducive to concurrent execution? Yes, they are more expensive in terms of computation and space required, but could be less costly in terms of time. Who knows. One thing I am certain of: computational power and storage capacity are expanding and growing. Time is not. It seems like we can afford to waste the former to gain some savings in the latter.
Mort Friendly?
It's difficult to come up with real scenarios where Mort would need parallel execution. Well, that's not quite true. It's hard to see where Mort would *think* to use parallel execution. I'm fairly sure this is because people are trained to think in single-threaded terms, a mode which is admittedly difficult to get out of the habit of. I mean, to Mort: if he's got to calculate a value or perform some action, well--gosh darnit--it needs to be done before he can move on. We need to change this. Better mainstream language support is surely a way to start.
Lightweight Concurrent Tasks
I have become particularly fond of one toy sample I wrote. It's a lightweight way to execute concurrent tasks, without having to worry about threading, thread pools, wait operations, and the like. As you will see, there's nothing magic about it but for the abstraction. You just call Async.Start up front (which schedules work on the ThreadPool to calculate the value), store the returned AsyncValue, do some work in parallel, and finally Join on the AsyncValue when you need to use the result of its computation.
Since it's usually helpful to see a scenario before diving into the API, here's a demonstration of how this might work for a concurrent Fibonacci function. We all know that the recursive Fibonacci function is horribly wasteful. Not only is it O(n^2), but it results in a whole lot of duplicate calculation. There might be room for memoization of some sort here. Regardless:
static int fib(int x)
{
if (x <= 1)
{
return 1;
}
else
{
AsyncValue<int> x2 = Async.Start<int, int>(fib, x - 2);
return fib(x - 1) + x2.Join();
}
}
This is an interesting example, actually. If threads were free, this would theoretically be faster than the single-threaded version of the same recursive procedure. It's even conceivable that, for sufficiently large x, and with a sufficiently large number of processors, and even with the wasted calculations, that this could be faster than the unrolled loop-based constant space version. Of course, this isn't true in today's world and with today's threading architecture, especially given the frequency with which we incur this overhead (one per non-leaf node in the tree).
The Plumbing
Anyhow, let's look at the async plumbing logic that enables this. It's very much like a thunk, except that the value is calculated in the background rather than "on-demand."
First, we define some static factory class (this could very well be on ThreadPool) with a Start operation:
public static class Async
{
public static AsyncValue<A> Start<A>(AsyncFunc<A> f)
{
AsyncValue<A> value = new AsyncValue<A>();
ThreadPool.QueueUserWorkItem(delegate
{
value.asyncFunc = f;
value.returnValue = f();
value.isCompleted = true;
value.asyncWaitHandle.Set();
}
);
return value;
}
public static AsyncValue<A> Start<A, B>(AsyncFunc<A, B> f, B b)
{
AsyncValue<A> value = new AsyncValue<A>();
ThreadPool.QueueUserWorkItem(delegate
{
value.asyncFunc = f;
value.returnValue = f(b);
value.isCompleted = true;
value.asyncWaitHandle.Set();
}
);
return value;
}
public static AsyncValue<A> Start<A, B, C>(AsyncFunc<A, B, C> f, B b, C c)
{
AsyncValue<A> value = new AsyncValue<A>();
ThreadPool.QueueUserWorkItem(delegate
{
value.asyncFunc = f;
value.returnValue = f(b, c);
value.isCompleted = true;
value.asyncWaitHandle.Set();
}
);
return value;
}
}
Note that there are similar overloads for varying number of arguments. I've omitted the rest for brevity. Unfortunately, I can't think of a way to reuse this logic generically inside a core args-neutral method without resorting to reflection. This is painful, and IMHO is just a shortcoming in the expressiveness of C#. I'll return to this for more comments shortly. For now, if anybody has ideas, let me know.
Then, the AsyncValue holder class, which encapsulates the Join and wait logic:
public class AsyncValue<A> : IAsyncResult
{
// fields
internal Delegate asyncFunc;
internal bool isCompleted;
internal A returnValue;
internal ManualResetEvent asyncWaitHandle;
// ctors
public AsyncValue()
{
asyncWaitHandle = new ManualResetEvent(false);
}
// properties
public object AsyncState
{
get { return asyncFunc; }
}
public WaitHandle AsyncWaitHandle
{
get { return asyncWaitHandle; }
}
public bool CompletedSynchronously
{
get { return false; }
}
public bool IsCompleted
{
get { return isCompleted; }
}
public A Join()
{
while (!isCompleted)
asyncWaitHandle.WaitOne();
return returnValue;
}
}
Of course, we're also required to declare the delegate signatures we intend to use:
public delegate A AsyncFunc<A>();
public delegate A AsyncFunc<A, B>(B b);
public delegate A AsyncFunc<A, B, C>(B b, C c);
public delegate A AsyncFunc<A, B, C, D>(B b, C c, D d);
public delegate A AsyncFunc<A, B, C, D, E>(B b, C c, D d, E e);
public delegate A AsyncFunc<A, B, C, D, E, F>(B b, C c, D d, E e, F f);
public delegate A AsyncFunc<A, B, C, D, E, F, G>(B b, C c, D d, E e, F f, G g);
public delegate A AsyncFunc<A, B, C, D, E, F, G, H>(B b, C c, D d, E e, F f, G g, H h);
public delegate A AsyncFunc<A, B, C, D, E, F, G, H, I>(B b, C c, D d, E e, F f, G g, H h, I i);
public delegate A AsyncFunc<A, B, C, D, E, F, G, H, I, J>(B b, C c, D d, E e, F f, G g, H h, I i, J j);
...And so on. ;)
Language Syntax?
Oh how I would love there to be C# language syntax for this. Here are two quick (not thought through) possibilities:
static int fib(int x)
{
if (x <= 1)
{
return 1;
}
else
{
int! x2 = fib(x - 2);
return fib(x - 1) + x2;
}
}
static int fib(int x)
{
if (x <= 1)
{
return 1;
}
else
{
int x2 = start fib(x - 2);
return fib(x - 1) + x2;
}
}
The first is a little less explicit, and I would imagine the second would cause typing problems. Although with some simple rules we could avoid the problem. Basically, before loading the the local for whatever reason it were needed, we'd have to force it. Just like a thunk. If we had a better idea of its use, we could avoid unnecessarily and prematurely forcing (which could block execution!). Regardless, it should be obvious that emitting IL which uses the Async stuff defined above for this syntax would be relatively straightforward.
Off-Topic: Generics Complaint
I'm constantly finding myself sick and tired of the massive paranoia of modern C++-derivitives. I guess this is entirely orthogonal, but it seems to plague the newer OO languages, e.g. C# and Java. The problem identified above is one great example. Why is it that I can't simplify life greatly with some code like this:
public static AsyncValue<A> Start(AsyncFunc<A, _> f, _ __)
{
AsyncValue<A> value = new AsyncValue<A>();
ThreadPool.QueueUserWorkItem(delegate
{
value.asyncFunc = f;
value.returnValue = f(__);
value.isCompleted = true;
value.asyncWaitHandle.Set();
}
);
return value;
}
Functional languages have mastered the art of pattern matching, where some details can be safely ignored and passed through transparently. Haskell does this without sacrificing type safety at all. We need something similar for C#. But I digress.
 Sunday, February 06, 2005
I spent much of yesterday driving to Yakima Valley, hanging out for a bit, and then driving home. Round trip was about 7 hours and 380 miles, includng non-driving time. I suspect it'd be a bit more fun during the Spring- or Summer-time, as the place was (as expected) pretty quiet. In fact, it sounds like most wineries do barrel tastings around the end of April, so I think we're going to have to head back up... maybe a quick weekend trip.
I'm still a newbie to Washington, and so was pretty amazed at the scenery. You go from rainy, tropical-ish climate, through Snoqualmie Pass--which reminded me so much like driving through New Hampshire on the east coast (with real snow falling even!)--, and then land in something which feels a bit like Arizona, with very few trees and very dry weather. All in a matter of 2 1/2 hours. This was one of those times I wish I had a bike.
Threw some photos up on MSN Spaces (free disk space :) ). Here are a few:




 Thursday, February 03, 2005
During our chat yesterday, a question came up about explicit layout structs.
Overlapping Fields
In particular, somebody was wondering why overlapping reference pointers are disallowed, yet overlapping value types are entirely legal. Consider what would happen if we did allow this: accessing a reference to an instance of type a through an overlapping reference field of incorrect type b would result in very bad behavior (or worse: a value type field b whose bytes would be interepreted as a reference to who the heck knows where).
I'm not precisely sure what the runtime would do in such a circumstance (die gracefully one would hope), but I suspect its not so clearly defined -- hence the disallowance of this construct. :)
Overlapping structs are allowed since structs are just well defined sequences of bytes. One could argue this is a blemish on the CTS's type soundness (and I would agree wholeheartedly), but I'll leave that to other folks to debate.
Union<T,U>?
So anyhow, it got me wondering: could one throw together a general purpose union type using generics? I decided to try...
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Explicit)]
struct Union<T,U>
where T : struct
where U : struct
{
// fields
[FieldOffset(0)]
private bool isT;
[FieldOffset(1)]
private T tValue;
[FieldOffset(1)]
private U uValue;
// ctors
public Union(T t)
{
uValue = default(U); //shutup compiler
tValue = t;
isT = true;
}
public Union(U u)
{
tValue = default(T); //shutup compiler
uValue = u;
isT = false;
}
// properties
public bool IsT
{
get { return isT; }
}
public T TValue
{
get { return tValue; }
set { tValue = value; isT = true; }
}
public U UValue
{
get { return uValue; }
set { uValue = value; isT = false; }
}
}
This enables you to do fancy unions with certain kinds of structs, using just one wasted byte at the beginning for the bool isT. It'd be nice if you could do reference types, too, but unless you can guarantee nobody will ever try to access, say, uValue when isT is true, it ain't gonna happen. Further, even with structs this wouldn't work if T or U had at least one reference type instance field for the same reasons outlined above -- you could imagine bad things happening if you were allowed to access a “corrupt” pointer, basically just some random bytes which make up a value type instance interpreted as a pointer to a memory location (ouch).
Looks great, right? Well, minus all of the aforementioned caveats. I marvelled at the beauty of this code. What a clever chap I am, I told myself. It even compiled!
Not So Good News
Unfortunately we don't allow execution of even this watered down version.
Why?
Because of the use of generics.
As I said, compilers permit it (at least C# does), but it won't pass our verifier and thus causes a TypeLoadException if you try to use the generated IL. In this case, it should be obvious that we could correctly lay out the struct in memory when the type is being JITted. However, the behavior here -- even if it passed verification -- simply isn't defined at all. Further, my guess is that there are some JIT optimizations (e.g. eager struct size computation that might not be generics-aware yet) that would be thrown off if we just permitted this to get through. Not that it isn't possible, just that we haven't spent the time to enable it... probably because of the relative obscurity (leave it to me to find the obscure things). :)
The behavior is entirely deterministic statically and thus at runtime because we only place the parameterized types at the end of the struct. Once the type is closed and we know the type arguments, we can easily compute the size (e.g. 1 + max(sizeof(T), sizeof(U))). If we had fields after the parameterized types, however, it'd be impossible to specify the right offsets statically and thus we'd run into problems. Although, we could still easily determine the right amount of storage at JIT time. One could imagine a [FieldOffset(max(sizeof(T),sizeof(U))+x)] construct that made this possible.
It's a shame. A general purpose Union<T,U> type would be pretty damn cool.
 Saturday, January 22, 2005
Having a massive library of techie books is a blessing in disguise... Especially when you're writing a book and need constant reminder of how things really work. Or just a little inspiration.
I have these on my desk right now (listed in no particular order):
Notice that I have quite a few Java books. Interestingly, I own very few CLR/.NET Framework books... I've found that most of the Java material is transferrable. I'd recommend any one of the above books very highly.
|
|
Me
Joe  is an architect and developer on a systems incubation project at Microsoft.
Recent
Search
Browse
Disclaimer:
The content of this site are my own personal opinions and do
not represent my employer's view in anyway.
© 2013, Joe Duffy
|
|