 Tuesday, November 02, 2004
My Scheme compiler frontend is almost at the point where I can move my focus from syntax to semantics. It successfully recognizes 100% of the Scheme grammar's tokens and constructs an AST representation for about 75% of the constructs available in R5RS. Along with this lexer/parser comes a simple toplevel read-parse-print console that parses stdin or file input and, rather than evaluating it, just prints the AST out for inspection. This has been really useful for debugging. I also have some unit tests that validate that certain Scheme input produces the expected tree, which is also proving rather useful. A test is written first for the construct I'm attempting to parse, and I know I'm pretty much there when it succeeds. Very nice.
Obviously most of the difficult tasks are still to come. I've done some work on the backend, but more experimentation than anything else. Most of my time here has been spent thinking about how to make interoperability with other “mainstream” managed languages palatable. I'm actually writing this so that it's pretty modular, thinking this would facilitate plugging in different steps along the way (e.g. so that extending it with an optimizer is easy; ripping an optimizer out and tossing a more efficient one in is even simpler).
Wish I had more time to throw at this. Some day. :)
A thunk is a relatively common construct in functional programming where the passing of arguments won't result in any side effects. In these cases, the compiler can silently emit code that bypasses computing the value of an argument altogether at the call site in case it isn't used in a method body at all. If it does end up being used, however, it will be lazily evaluated at the last possible moment. This is sometimes referred to as call-by-lazy-evaluation (more commonly, call-by-need). It is said that the argument passed is frozen at the time of invocation, and thawed when it is needed. Further references typically avoid re-thawing once the initial thaw has occurred.
It might sound odd that an argument would be passed in to a method but not used at all. But depending on the code paths a method takes, it could end up not needing to reference it at all. Consider the simplest case: where a caller spends time retrieving results from a database to construct some fancy input, but then the call fails before the target method even has a chance to inspect this data, possibly because one of the other arguments was in an invalid format or out of the valid range. It's terribly inefficient that the client had to compute this in the first place!
Just playing around, I've thrown together a Thunk<T> class. It's nothing special, but it seems to work rather nicely with C#'s new anonymous delegate syntax.
public class Thunk<T>
{
    private T value;            // cached result, valid once thawed
    private ThawFunction thaw;  // the deferred computation
    private bool thawed;

    public delegate T ThawFunction();

    public Thunk(ThawFunction thaw)
    {
        this.thaw = thaw;
    }

    // Thaws on first access; later reads return the cached value.
    public T Value
    {
        get
        {
            if (IsFrozen)
            {
                this.value = thaw();
                thawed = true;
            }
            return this.value;
        }
    }

    public bool IsFrozen
    {
        get { return !thawed; }
    }
}
As an example of its usage, consider this test class:
using System;

public class ThunkExample
{
    public static void Main(string[] args)
    {
        ThunkExample ex = new ThunkExample();
        string longText = "Pretend this is some long text that is expensive to split.";
        ex.WithThunk(longText);
        ex.WithoutThunk(longText);
    }

    public void WithThunk(string longText)
    {
        // The split is wrapped in a delegate and does not run here.
        WithThunkDoWork(longText.Length, new Thunk<string[]>(delegate { return longText.Split(' '); }));
    }

    public void WithThunkDoWork(int strlen, Thunk<string[]> words)
    {
        if (strlen > 2048)
            throw new ArgumentOutOfRangeException("strlen", "Must be <= 2048");
        Console.WriteLine("-- WithThunk --");
        // First access to Value performs the split.
        foreach (string w in words.Value)
            Console.WriteLine(w);
    }

    public void WithoutThunk(string longText)
    {
        // The split runs at the call site, before validation.
        WithoutThunkDoWork(longText.Length, longText.Split(' '));
    }

    public void WithoutThunkDoWork(int strlen, string[] words)
    {
        if (strlen > 2048)
            throw new ArgumentOutOfRangeException("strlen", "Must be <= 2048");
        Console.WriteLine("-- WithoutThunk --");
        foreach (string w in words)
            Console.WriteLine(w);
    }
}
If you take a look at the IL for the WithThunk vs. WithoutThunk method, you'll see a fundamental difference. Specifically, WithoutThunk computes a bunch of local values, and leaves them on the stack for the following call to the WithoutThunkDoWork(...) method.
IL_0009: newarr [mscorlib]System.Char
IL_000e: stloc.0
IL_000f: ldloc.0
IL_0010: ldc.i4.0
IL_0011: ldc.i4.s 32
IL_0013: stelem.i2
IL_0014: ldloc.0
IL_0015: callvirt instance string[] [mscorlib]System.String::Split(char[])
So the difference is that WithoutThunk evaluates the string[] argument at the call site, while WithThunk delays calculation to its first use in the *DoWork methods. If the data is not used at all, it doesn't get calculated. Obviously this is a contrived example, but if the delayed operation was expensive -- e.g. as in the database example cited above -- this could have tangible benefits at runtime.
A couple things would make this construct nicer sans first class CLR support. Consider if C# had mixins, for example. Thunk<T> could then derive from T, and some compiler generated code could implement simple wrapper methods that forwarded any calls to its value field. Of course this would only work if all methods were virtual, but it's a start. We could then pass thunks around pretending they were instances of a given type, and (in theory, at least) existing code would work with them just fine. Alternatively, overloading assignment and dereferencing operators might be nice, too. This would allow one to assign to and dereference a thunk as though it were just an instance of the type it wrapped. Similar to the Nullable<T> type, rarely does one actually want to access and use it as a typical object.
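C# does not let you overload assignment, but a user-defined implicit conversion gets part of the way toward treating a thunk as the value it wraps. The following is only a sketch under some assumptions: `ConvertibleThunk<T>` and `ConvertibleThunkExample` are hypothetical names, and the code uses `Func<T>` and lambda syntax, which postdate the compiler used for the class above.

```csharp
using System;

// Hypothetical variant of Thunk<T> that converts implicitly to T.
public class ConvertibleThunk<T>
{
    private Func<T> thaw;
    private T value;
    private bool thawed;

    public ConvertibleThunk(Func<T> thaw)
    {
        this.thaw = thaw;
    }

    public T Value
    {
        get
        {
            if (!thawed)
            {
                value = thaw();
                thawed = true;
                thaw = null; // drop the delegate so captured state can be collected
            }
            return value;
        }
    }

    // Thaws on demand wherever a thunk is used in place of a T.
    public static implicit operator T(ConvertibleThunk<T> thunk)
    {
        return thunk.Value;
    }
}

public static class ConvertibleThunkExample
{
    // An existing method that knows nothing about thunks.
    public static int WordCount(string[] words)
    {
        return words.Length;
    }

    public static void Main()
    {
        string text = "implicit conversion thaws the thunk";
        ConvertibleThunk<string[]> words =
            new ConvertibleThunk<string[]>(() => text.Split(' '));

        // The conversion to string[] (and therefore the split) happens at this call.
        Console.WriteLine(WordCount(words));
    }
}
```

The catch is that the conversion thaws the thunk as soon as it is passed to code expecting a `T`, so the laziness only survives as far as the first thunk-unaware call.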
Lastly, a few caveats. All of this assumes that the performance hit resulting from late bound delegate calls is acceptable in your scenario. If you're wrapping an operation that has side effects, this is not a good idea for obvious reasons.
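For readers on .NET Framework 4 or later, the base class library's `System.Lazy<T>` covers much the same ground as the Thunk<T> class above, and its default constructor mode is thread-safe. A minimal sketch, assuming a compiler with lambda support; `LazyThunkSketch`, `ExpensiveSplit`, and `CountWords` are hypothetical names used only to make the evaluation count visible:

```csharp
using System;

public static class LazyThunkSketch
{
    // Counts how many times the expensive operation actually ran.
    public static int SplitCalls;

    // Hypothetical stand-in for an expensive computation.
    public static string[] ExpensiveSplit(string text)
    {
        SplitCalls++;
        return text.Split(' ');
    }

    // Mirrors WithThunkDoWork: validate the cheap argument first,
    // and touch the lazy one only if validation passes.
    public static int CountWords(int strlen, Lazy<string[]> words)
    {
        if (strlen > 2048)
            throw new ArgumentOutOfRangeException("strlen", "Must be <= 2048");
        return words.Value.Length;
    }

    public static void Main()
    {
        string text = "a few short words";
        Lazy<string[]> words = new Lazy<string[]>(() => ExpensiveSplit(text));

        Console.WriteLine(words.IsValueCreated);            // still frozen here
        Console.WriteLine(CountWords(text.Length, words));  // first use thaws
        Console.WriteLine(CountWords(text.Length, words));  // second use reuses the cached array
        Console.WriteLine(SplitCalls);                      // number of real evaluations
    }
}
```

`IsValueCreated` plays the role of `!IsFrozen`, and `Value` caches exactly as the hand-written property does.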
 Saturday, October 30, 2004
The FxCop team recently released a new version, 1.312. Check it out.
I've been working quite a bit lately with the guys responsible for this technology. They've poured a lot of energy into this product... and it shows. With this release, you can expect improved quality of existing rules (reducing false positives, increasing coverage), and a number of entirely new rules. This stuff is based entirely on real experience shipping managed code and is a great way to avoid common implementation mistakes and pitfalls. And it helps you to be consistent with the Framework, too, a real plus when you're shipping publicly consumable APIs.
If you're not already shipping FxCop'd code, I'd highly recommend it... :)
 Friday, October 29, 2004
I spent a few minutes tonight updating my exception program to output the EN-US value of any resource strings passed to exception constructors. Find the full updated report for mscorlib here.
Here's an example of what the String.Contains(...) method looks like:
compilercontrolled public hidebysig System.String.Contains(System.String)

1. System.ArgumentNullException("source")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)

2. System.ArgumentNullException("source")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)

3. System.ArgumentNullException("value")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)

4. System.ArgumentOutOfRangeException("startIndex",R"Index was out of range. Must be non-negative and less than the size of the collection.")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)

5. System.ArgumentOutOfRangeException("count",R"Count must be positive and count must refer to a location within the string/array/collection.")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)

6. System.ArgumentException(R"Value of flags is invalid.","options")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)

As you can see, 4, 5, and 6 show the resource string used as message arguments.
 Wednesday, October 27, 2004
As a follow up to my transitive exception post a little over a week ago, I hacked together a tool that almost does what I want.
See this page for some sample output, the results of running against the Whidbey Beta 1 release of mscorlib.dll (available as a free download off of MSDN). Notice that you get a full stack trace in cases where the method in question doesn't directly throw something, but rather a piece of called code does. It seems to be pretty accurate, and is actually giving me a warm and fuzzy feeling about the state of managed code and exceptions. It doesn't look like we let a lot of crap leak out of methods at all. Having a Java-heavy background, not having checked exceptions is seemingly giving me bouts of paranoia. :)
Here's a quick list of noted exclusions the tool doesn't currently handle at all:
- Late bound method calls. Virtuals, delegates, reflection, and the like.
- Well known system exceptions coming from infrastructure or unmanaged interop.
I have plans for both, but haven't had a chance to get something workable just yet.
You'll also notice that exception constructor arguments are actually captured and output, but only when literals are used at the call site. This is very seldom the case as a result of a little thing we like to call resources that are - thankfully - used pretty ubiquitously throughout the Framework. I'm planning on doing some reflection at the time of analysis to try and tease out the right resource string. I think it's pretty neat to see this data in the report.
I would publish the code for the tool, but unfortunately I've taken a dependency on some "secret-squirrel" (Hi Mark) static analysis technology that can't leave MS's comfy confines.
Disclaimer: This information is for research only, and should not be considered official Microsoft documentation of any sort. Please refer to the .NET Framework SDK for details on exceptions that a given method might throw.
 Tuesday, October 26, 2004
I subjected a friend tonight to the oh-so-exciting task of providing feedback on my book. I'm finding outside input very helpful actually, mostly because the separation of good writing from technical content is an important one to make. Others are good at providing a relatively objective opinion on the words and sentence structure itself. I simply despise the common excuse that just because someone's a nerd, they aren't able to communicate well (or even in a grammatically correct fashion). And they get away with it, too! Not to say that I have mastered these skills yet, but I digress...
Anyhow, here it be:
“This book is so boring, it has to be good.”
 Sunday, October 17, 2004
KitG posted the Nullable specification just the other day. To me as a Microsoft newbie (yes, I still recall what it's like on the outside), the fact that folks can now readily obtain previews of technologies and read the design specifications is pretty freaking cool... one step closer to complete transparency.
 Friday, October 15, 2004
I'm trying to quickly write up a tool to compute the transitive closure of all possible exceptions resulting from an invocation of a given method. Checked exceptions are an interesting means by which to attempt compiler enforcement of this. In Java, for example, exceptions that could be thrown are well known primarily because of strict inheritance rules, removing uncertainty by disallowing overriding methods from declaring that they throw new exception types (aside from strengthening through polymorphism). The fact that you must either catch or declare that you throw an exception is simply a way to instruct the compiler what you intend to throw or let seep through the cracks. Arguably the compiler could do a decent job (clearly not 100%, though) of figuring this out without requiring you to explicitly state it, however. And then perhaps it just becomes a compiler warning when you don't catch something.
Anyhow, I'm trying to hammer out the logic for my program and I came up with this. It's a bit cryptic, and I've probably used some symbols in uncommon ways... oh well... it helped me to think through the situation, and will be a great help when I actually implement it.
To compute the transitive closure of possible exceptions thrown as a result of an invocation to a method:
Let $d$ be the method in question, a tuple $\langle t_d, m_d \rangle$, where $t_d$ is the static type being analyzed and $m_d$ is the method handle. Let $C(d)$ be the set of tuples $\langle i_{C(d)_i}, t_{C(d)_i}, m_{C(d)_i} \rangle$, for $i = 1, \ldots, |C(d)|$, representing method calls reachable from within $d$'s IL body (regardless of certain code paths and/or code path probabilities; probabilistic analysis could be quite interesting, but is not the focus of this effort), where $i_{C(d)_i}$ is the instance on which a method is being called (or $e$ for static method calls), $t_{C(d)_i}$ is the static type to which the method call refers, and $m_{C(d)_i}$ is the method handle being called.
Let $\mathrm{DirThrows}(d)$ be the set of exception types for which an explicit throw instruction is present within $d$'s IL body.
[informative] $\mathrm{RefThrows}(d)$ is defined below to equal the set of exceptions that could be thrown as a result of invoking a method call inside of $d$'s body, transitively closed on $C(d)$.
Let $\mathrm{Exc}(d)$ be the set of exception types that could be thrown as a result of invoking method $d$. Specifically, $\mathrm{Exc}(d) = \mathrm{DirThrows}(d) \cup \mathrm{RefThrows}(d)$, minus any $e$ for which $d$'s body contains an enclosing exception handler that catches $e$ or a base type of $e$ and does not issue a rethrow instruction.
Let $\mathrm{RefThrows}(d)$ be the union of all $E_i$ computed as follows:
For each $\langle i_{C(d)_i}, t_{C(d)_i}, m_{C(d)_i} \rangle \in C(d)$:
- If $\mathrm{virtual}(\langle t_{C(d)_i}, m_{C(d)_i} \rangle)$ is false, let $E_i = \mathrm{Exc}(\langle t_{C(d)_i}, m_{C(d)_i} \rangle)$.
- Otherwise, if there exists an assignment to $i_{C(d)_i}$ of type $u$ within visibility that provides an implementation for which $\mathrm{virtual}(\langle u, m_{C(d)_i} \rangle)$ is false or whose instance construction is also within visibility and that can be deterministically proven to be the last known assignment (i.e. non-pointer local variable), let $E_i = \mathrm{Exc}(\langle u, m_{C(d)_i} \rangle)$.
- Otherwise, let $T$ be the set of known derivative types of $t_{C(d)_i}$. If $|T| = 0$, let $E_i = \mathrm{Exc}(\langle t_{C(d)_i}, m_{C(d)_i} \rangle)$. Otherwise, let $E_i = \bigcup_{u \in T} \mathrm{Exc}(\langle u, m_{C(d)_i} \rangle)$.
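The definitions above can be approximated in a few lines of code. What follows is a rough sketch and not the tool itself: the `MethodSketch` model, its field names, and the string-based method and exception names are all hypothetical. It also simplifies heavily. Exceptions are matched by exact name (no base-type handling), a handler is assumed to enclose the whole body, the "last known assignment" rule for virtual calls is skipped, and recursion is cut off when a method is already being analyzed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for one method's IL body.
public class MethodSketch
{
    public List<string> DirThrows = new List<string>(); // types thrown directly
    public List<string> Calls = new List<string>();     // methods called from the body
    public List<string> Caught = new List<string>();    // types caught and not rethrown
    public List<string> Overrides = new List<string>(); // known overriding implementations
}

public static class ExceptionClosure
{
    public static HashSet<string> Exc(string method, Dictionary<string, MethodSketch> program)
    {
        return Exc(method, program, new HashSet<string>());
    }

    private static HashSet<string> Exc(string method,
        Dictionary<string, MethodSketch> program, HashSet<string> inProgress)
    {
        HashSet<string> result = new HashSet<string>();
        MethodSketch body;
        // Unknown methods, and methods already on the current path, contribute nothing.
        if (!program.TryGetValue(method, out body) || !inProgress.Add(method))
            return result;

        result.UnionWith(body.DirThrows);
        foreach (string callee in body.Calls)
        {
            MethodSketch target;
            bool hasOverrides = program.TryGetValue(callee, out target)
                && target.Overrides.Count > 0;
            if (hasOverrides)
            {
                // Virtual call with known derivatives: union over each implementation.
                foreach (string impl in target.Overrides)
                    result.UnionWith(Exc(impl, program, inProgress));
            }
            else
            {
                result.UnionWith(Exc(callee, program, inProgress));
            }
        }
        result.ExceptWith(body.Caught); // drop what the body handles itself
        inProgress.Remove(method);
        return result;
    }

    public static void Main()
    {
        MethodSketch parse = new MethodSketch();
        parse.DirThrows.Add("ArgumentNullException");
        parse.Calls.Add("ReadAll");
        parse.Caught.Add("FormatException");

        MethodSketch readAll = new MethodSketch();
        readAll.DirThrows.Add("FormatException");
        readAll.DirThrows.Add("IOException");
        readAll.Calls.Add("Parse"); // mutual recursion, to exercise the cycle guard

        Dictionary<string, MethodSketch> program = new Dictionary<string, MethodSketch>();
        program["Parse"] = parse;
        program["ReadAll"] = readAll;

        List<string> names = new List<string>(Exc("Parse", program));
        names.Sort();
        Console.WriteLine(string.Join(", ", names)); // prints "ArgumentNullException, IOException"
    }
}
```

Here `Parse` reports its own `ArgumentNullException` plus the `IOException` that leaks out of `ReadAll`, while the `FormatException` it catches is removed from the result.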
 Thursday, October 14, 2004
I'm feeling better about this book project each time I sit down to work on it. My word count is increasing steadily, and as expected my ability to get in “the zone” and crank out pages is indeed improving. My first draft is actually about 3% complete! :P

As I'm working on my book, I'm trying to remain conscious about making it accessible to as wide an audience as possible. This obviously includes the hobbyist and student crowd. As such, I've been working with the VS Express SKUs... and I must say, they kick mucho butt!
Check 'em out here: http://lab.msdn.microsoft.com/express/
Yeah, yeah, I probably sound like some marketing swine, but so be it. I've always been a fan of lightweight programming environments (emacs, csc, and nant was my standard “ide” until recently), and these certainly feel more lightweight than Enterprise. Shhh... I won't go so far as to say I prefer Express over Enterprise - I haven't used Express enough to make an informed judgement, and I'm sure there are a slew of nifty features that didn't make it into Express - but when you just want to whack out a bunch of code and don't need a fancy shmancy IDE that supports multiple languages, client- and web- programming together, and the like: this dogfood tastes great.
Me
Joe  is an architect and developer on a systems incubation project at Microsoft.
Disclaimer:
The contents of this site are my own personal opinions and do not represent my employer's views in any way.
© 2013, Joe Duffy