Thursday, November 11, 2004

3,894 lines of code.

First time successfully parsing an entire R5RS test script into a complete AST.

Able to emit a module that recognizes and creates IL for lambdas w/ only bound variables.

It feels good to actually see a real, live, breathing assembly. Thought I'd never get back above water.

11/11/2004 12:09:03 AM (Pacific Standard Time, UTC-08:00)  #   

 Sunday, November 07, 2004

When I should be writing and instead procrastinate, I come up with whacky ideas like identifying which words I make too liberal use of.

Breakdown of all words occurring 25 or more times in my first chapter:

the: 1049 to: 691 a: 599 of: 513 is: 353
and: 332 this: 319 in: 305 you: 295 that: 292
string: 252 for: 225 =: 178 an: 178 are: 170
it: 157 as: 157 with: 155 will: 150 on: 130
if: 122 be: 120 can: 111 or: 104 console: 97
type: 96 example: 88 object: 85 method: 84 when: 81
also: 80 by: 79 these: 77 which: 74 int: 70
number: 70 code: 70 use: 70 using: 69 instance: 69
your: 67 gc: 67 types: 63 its: 62 there: 61
s: 59 value: 59 {: 59 }: 56 class: 56
do: 56 not: 56 other: 56 new: 55 //: 55
from: 55 time: 55 but: 54 simply: 54 two: 54
methods: 53 more: 52 datetime: 52 have: 50 character: 49
any: 48 at: 47 should: 45 characters: 45 strings: 44
one: 44 all: 43 return: 41 true: 40 actually: 40
need: 40 call: 40 so: 39 operations: 39 0: 39
some: 39 numbers: 38 than: 37 reference: 37 system: 37
chapter: 36 up: 36 formatting: 36 result: 36 i: 36
set: 36 just: 35 because: 35 data: 35 following: 35
often: 34 returns: 34 into: 34 information: 34 such: 34
most: 34 they: 34 array: 33 single: 32 e: 32
only: 32 same: 32 we: 32 available: 32 has: 32
each: 32 first: 31 them: 31 objects: 30 public: 30
boolean: 30 consider: 29 out: 29 exception: 29 decimal: 29
would: 29 point: 29 static: 28 10: 28 values: 28
how: 28 stringbuilder: 28 end: 27 even: 27 provides: 26
equality: 26 format: 26 instances: 26 buffer: 25

Interestingly, I have 59 opening curly braces {, but only 56 closing! Yikes... Wish I could do some lexical analysis on my Word document.
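For anyone wanting to run the same sort of count, here is a minimal sketch. It assumes the chapter has already been saved as plain text (it does not read Word documents), and `WordCount`, `Count`, and the sample sentence are made-up names for illustration, not the script I actually used. Because it splits on whitespace and trims only ordinary punctuation, tokens such as `{`, `}`, and `//` are counted as words, which is how a brace imbalance would show up.

```csharp
using System;
using System.Collections.Generic;

public static class WordCount
{
    // Counts case-insensitive occurrences of each whitespace-delimited token.
    public static Dictionary<string, int> Count(string text)
    {
        Dictionary<string, int> counts = new Dictionary<string, int>();
        char[] separators = new char[] { ' ', '\t', '\r', '\n' };
        foreach (string raw in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // Trim sentence punctuation; tokens such as "{" or "//" are left intact.
            string word = raw.ToLowerInvariant().Trim('.', ',', ';', ':', '!', '?', '(', ')', '"');
            if (word.Length == 0)
                continue;
            int n;
            counts.TryGetValue(word, out n); // n stays 0 when the word is new
            counts[word] = n + 1;
        }
        return counts;
    }

    public static void Main()
    {
        // Hypothetical sample text standing in for the chapter.
        string sample = "The string is a string. The GC is the GC { }";
        int threshold = 2; // the list above used 25

        List<KeyValuePair<string, int>> sorted =
            new List<KeyValuePair<string, int>>(Count(sample));
        // Most frequent first.
        sorted.Sort(delegate(KeyValuePair<string, int> a, KeyValuePair<string, int> b)
        {
            return b.Value.CompareTo(a.Value);
        });

        foreach (KeyValuePair<string, int> pair in sorted)
            if (pair.Value >= threshold)
                Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```

Comparing `Count(text)["{"]` with `Count(text)["}"]` is a crude stand-in for the lexical analysis wished for above: it reports that braces are unbalanced, but not where.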

11/7/2004 2:52:39 PM (Pacific Standard Time, UTC-08:00)  #   

 Thursday, November 04, 2004

Interesting discussion going on over on Krzysztof's blog. We recently changed our guidance for generic type parameter naming from recommending single letter names to more descriptive ones. I personally think we as programmers simply need to discover the rest of the Unicode character set...

E.g.

class IList<α> {}

class IDictionary<α, β> {}

class Converter<α, β> { }

And so on. Of course I am being flippant, but I'm not a huge fan of this change. I prefer clear and concise, and actually like that there is a mathematical feel to generics. With that said, I suppose it might be less approachable, and Ada programs did, in fact, tend to use more descriptive generic type labels. I'd call that a historical precedent against my preference. :)

I wonder what Don thinks.

11/4/2004 11:00:57 PM (Pacific Daylight Time, UTC-07:00)  #   

Essentials of Programming Languages
by Daniel P. Friedman, Mitchell Wand, Christopher T. Haynes

10 of 10. Details fundamental programming language concepts with a focus on the implementation of them, including closures, type checking, continuations, object orientation, and the like. The book gives a great overview, building a functional interpreter using Scheme along the way to illustrate and highlight points. There is just the right amount of formal notation. Highly recommended.

Structure and Interpretation of Computer Programs
by Harold Abelson, Gerald Jay Sussman

10 of 10. The Wizard book. Although it's available for free online, I ended up buying a copy. This is a classic book on fundamental concepts of programming, with a heavy focus on Scheme and functional programming constructs. Anyone who doesn't already own it... well... should. :)

Alan Turing: Life and Legacy of a Great Thinker
by Christof Teuscher

9 of 10. An amazing collection of essays covering the spectrum of Turing's life. Not super-geeky, but nonetheless a fascinating set of pieces. I've been interested in Turing ever since I read The Code Book: The Evolution Of Secrecy From Mary, Queen Of Scots To Quantum Cryptography years and years ago, a historical account of cryptography with decent coverage of his attempts, and eventual success, at breaking the ENIGMA.

Programming Ruby
by Dave Thomas

8 of 10. This is the classic text on Ruby, the programming language, referred to as the "PickAxe" among the Ruby crowd. The first edition has been out of print for some time, so the recent re-release of a second edition is very welcome! Not only has it been updated to cover Ruby 1.8, but it has much more content than the first edition. Great book, especially for reference. (Ruby isn't known for its great documentation!)

The Haskell School of Expression
by Paul Hudak

7 of 10. This book is a brief tour of the Haskell programming language, using multimedia examples to illustrate a variety of potential uses. I found it a bit too geared towards the functional language beginner. The focus on multimedia was interesting, albeit distracting at times.

11/4/2004 10:15:34 PM (Pacific Daylight Time, UTC-07:00)  #   

 Tuesday, November 02, 2004

My Scheme compiler frontend is almost at the point where I can move my focus from syntax to semantics. It successfully recognizes 100% of the Scheme grammar's tokens and constructs an AST representation for about 75% of the constructs available in R5RS. Along with this lexer/parser comes a simple toplevel read-parse-print console that parses stdin or file input and, rather than evaluating it, just prints the AST out for inspection. This has been really useful for debugging. I also have some unit tests that validate that certain Scheme input produces the expected tree, which is also proving to be rather useful. A test is written first for the construct I'm attempting to parse, and I know I'm pretty much there when it succeeds. Very nice.
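To give a feel for what a read-parse-print loop does, here is a toy sketch. It is not the frontend described above and falls far short of the R5RS grammar: it knows only parentheses and whitespace-delimited atoms (no strings, quotes, comments, or vectors), and it reads one datum per input line. `Node`, `MiniReader`, `Tokenize`, `Parse`, and `Dump` are all names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical AST: a node is either an atom (Text set) or a list (Children set).
public class Node
{
    public string Text;
    public List<Node> Children;
}

public static class MiniReader
{
    // Lexer: pad parentheses with spaces, then split on whitespace.
    public static Queue<string> Tokenize(string source)
    {
        string padded = source.Replace("(", " ( ").Replace(")", " ) ");
        char[] ws = new char[] { ' ', '\t', '\r', '\n' };
        return new Queue<string>(padded.Split(ws, StringSplitOptions.RemoveEmptyEntries));
    }

    // Parser: recursive descent over the token queue, one datum per call.
    public static Node Parse(Queue<string> tokens)
    {
        if (tokens.Count == 0)
            throw new FormatException("Unexpected end of input");
        string token = tokens.Dequeue();
        if (token == ")")
            throw new FormatException("Unexpected ')'");

        Node node = new Node();
        if (token != "(")
        {
            node.Text = token; // atom
            return node;
        }

        node.Children = new List<Node>();
        while (true)
        {
            if (tokens.Count == 0)
                throw new FormatException("Missing ')'");
            if (tokens.Peek() == ")")
            {
                tokens.Dequeue();
                return node;
            }
            node.Children.Add(Parse(tokens));
        }
    }

    // Printer: one node per line, indented two spaces per nesting level.
    public static string Dump(Node node, int depth)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append(' ', depth * 2);
        if (node.Children == null)
        {
            sb.Append("Atom ").Append(node.Text).Append('\n');
            return sb.ToString();
        }
        sb.Append("List\n");
        foreach (Node child in node.Children)
            sb.Append(Dump(child, depth + 1));
        return sb.ToString();
    }

    // Read a line, parse it, print the tree; nothing is evaluated.
    public static void Main()
    {
        string line;
        while ((line = Console.ReadLine()) != null)
        {
            if (line.Trim().Length == 0)
                continue;
            try { Console.Write(Dump(Parse(Tokenize(line)), 0)); }
            catch (FormatException e) { Console.WriteLine("parse error: " + e.Message); }
        }
    }
}
```

The test-first routine falls out naturally from this shape: a test feeds a string to `Parse` and compares the `Dump` output against the tree written down beforehand.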

Obviously most of the difficult tasks are still to come. I've done some work on the backend, but more experimentation than anything else. Most of my time here has been spent thinking about how to make interoperability with other “mainstream” managed languages palatable. I'm actually writing this so that it's pretty modular, thinking this would facilitate plugging in different steps along the way (e.g. so that extending it with an optimizer is easy; ripping an optimizer out and tossing a more efficient one in is even simpler).

Wish I had more time to throw at this. Some day. :)

11/2/2004 9:17:14 PM (Pacific Daylight Time, UTC-07:00)  #   

A thunk is a relatively common construct in functional programming, where passing an argument won't result in any side effects. In these cases, the compiler can silently emit code that skips computing an argument's value at the call site altogether, in case the method body never uses it. If the argument does end up being used, it is evaluated lazily at the last possible moment. This is sometimes referred to as call-by-lazy-evaluation (more commonly, call-by-need). The argument is said to be frozen at the time of invocation and thawed when it is needed. Further references typically avoid re-thawing once the initial thaw has occurred.

It might sound odd that an argument would be passed in to a method but not used at all. But depending on the code paths a method takes, it could end up not needing to reference it at all. Consider the simplest case: where a caller spends time retrieving results from a database to construct some fancy input, but then the call fails before the target method even has a chance to inspect this data, possibly because one of the other arguments was in an invalid format or out of the valid range. It's terribly inefficient that the client had to compute this in the first place!

Just playing around, I've thrown together a Thunk<T> class. It's nothing special, but it seems to work rather nicely with C#'s new anonymous delegate syntax.

public class Thunk<T>
{
    private T value;            // cached result, valid once thawed
    private ThawFunction thaw;  // the deferred computation
    private bool thawed;

    public delegate T ThawFunction();

    public Thunk(ThawFunction thaw)
    {
        this.thaw = thaw;
    }

    public T Value
    {
        get
        {
            // Thaw on first access only; later reads reuse the cached value.
            if (IsFrozen)
            {
                this.value = thaw();
                thawed = true;
            }
            return this.value;
        }
    }

    public bool IsFrozen
    {
        get { return !thawed; }
    }
}

As an example of its usage, consider this test class:

using System;

public class ThunkExample
{
    public static void Main(string[] args)
    {
        ThunkExample ex = new ThunkExample();

        string longText = "Pretend this is some long text that is expensive to split.";
        ex.WithThunk(longText);
        ex.WithoutThunk(longText);
    }

    public void WithThunk(string longText)
    {
        // The split is wrapped in a delegate; nothing is computed here.
        WithThunkDoWork(longText.Length, new Thunk<string[]>(delegate { return longText.Split(' '); }));
    }

    public void WithThunkDoWork(int strlen, Thunk<string[]> words)
    {
        if (strlen > 2048)
            throw new ArgumentOutOfRangeException("strlen", "Must be <= 2048");

        Console.WriteLine("-- WithThunk --");

        // First use of words.Value: the split happens here.
        foreach (string w in words.Value)
            Console.WriteLine(w);
    }

    public void WithoutThunk(string longText)
    {
        // The split runs eagerly, before the callee validates anything.
        WithoutThunkDoWork(longText.Length, longText.Split(' '));
    }

    public void WithoutThunkDoWork(int strlen, string[] words)
    {
        if (strlen > 2048)
            throw new ArgumentOutOfRangeException("strlen", "Must be <= 2048");

        Console.WriteLine("-- WithoutThunk --");

        foreach (string w in words)
            Console.WriteLine(w);
    }
}

If you take a look at the IL for the WithThunk vs. WithoutThunk method, you'll see a fundamental difference. Specifically, WithoutThunk computes a bunch of local values, and leaves them on the stack for the following call to the WithoutThunkDoWork(...) method.

  IL_0009:  newarr     [mscorlib]System.Char
  IL_000e:  stloc.0
  IL_000f:  ldloc.0
  IL_0010:  ldc.i4.0
  IL_0011:  ldc.i4.s   32
  IL_0013:  stelem.i2
  IL_0014:  ldloc.0
  IL_0015:  callvirt   instance string[] [mscorlib]System.String::Split(char[])

So the difference is that WithoutThunk evaluates the string[] argument at the call site, while WithThunk delays the calculation until its first use inside WithThunkDoWork. If the data is not used at all, it never gets calculated. Obviously this is a contrived example, but if the delayed operation were expensive -- e.g. as in the database example cited above -- this could have tangible benefits at runtime.
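The same behavior can be observed without reading IL by counting how often the wrapped operation actually runs. The sketch below repeats a compact copy of Thunk<T> so that it compiles on its own; `ThunkCount`, `Splits`, `CountedSplit`, `CountWords`, and `Run` are hypothetical names added purely for the demonstration.

```csharp
using System;

// Compact copy of the Thunk<T> class from this post.
public class Thunk<T>
{
    public delegate T ThawFunction();

    private T value;
    private ThawFunction thaw;
    private bool thawed;

    public Thunk(ThawFunction thaw) { this.thaw = thaw; }

    public T Value
    {
        get
        {
            if (!thawed) { value = thaw(); thawed = true; }
            return value;
        }
    }
}

public static class ThunkCount
{
    // Hypothetical counter: how many times the split really executed.
    public static int Splits;

    static string[] CountedSplit(string text)
    {
        Splits++;
        return text.Split(' ');
    }

    // Mirrors WithThunkDoWork: validate first, touch the thunk only afterwards.
    public static int CountWords(int limit, string text, Thunk<string[]> words)
    {
        if (text.Length > limit)
            throw new ArgumentOutOfRangeException("text", "Too long");
        int first = words.Value.Length;   // thaws the thunk
        int second = words.Value.Length;  // reuses the cached array
        return first == second ? first : -1;
    }

    // Returns the number of splits performed for one call with the given limit.
    public static int Run(int limit)
    {
        Splits = 0;
        string text = "lazy evaluation in action";
        Thunk<string[]> words = new Thunk<string[]>(delegate { return CountedSplit(text); });
        try { CountWords(limit, text, words); }
        catch (ArgumentOutOfRangeException) { /* rejected before the thunk was touched */ }
        return Splits;
    }

    public static void Main()
    {
        Console.WriteLine(Run(2048)); // prints 1: accepted call, two reads, one split
        Console.WriteLine(Run(4));    // prints 0: rejected call, split never runs
    }
}
```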

A couple things would make this construct nicer sans first class CLR support. Consider if C# had mixins, for example. Thunk<T> could then derive from T, and some compiler generated code could implement simple wrapper methods that forwarded any calls to its value field. Of course this would only work if all methods were virtual, but it's a start. We could then pass thunks around pretending they were instances of a given type, and (in theory, at least) existing code would work with them just fine. Alternatively, overloading assignment and dereferencing operators might be nice, too. This would allow one to assign to and dereference a thunk as though it were just an instance of the type it wrapped. Similar to the Nullable<T> type, rarely does one actually want to access and use it as a typical object.
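Short of mixins or new operators, C#'s user-defined implicit conversions get part of the way toward passing a thunk where a T is expected. This is only a sketch of that idea, not the language support wished for above; `ConvertibleThunk<T>`, `Thaw<T>`, and `ConversionDemo` are hypothetical names.

```csharp
using System;

public delegate T Thaw<T>();

// Hypothetical variant of Thunk<T> that converts to and from T implicitly.
public class ConvertibleThunk<T>
{
    private T value;
    private Thaw<T> thaw;
    private bool thawed;

    public ConvertibleThunk(Thaw<T> thaw) { this.thaw = thaw; }

    public T Value
    {
        get
        {
            if (!thawed) { value = thaw(); thawed = true; }
            return value;
        }
    }

    // Using a thunk where a T is expected thaws it.
    public static implicit operator T(ConvertibleThunk<T> thunk)
    {
        return thunk.Value;
    }

    // Supplying a ready-made T where a thunk is expected wraps it.
    public static implicit operator ConvertibleThunk<T>(T ready)
    {
        return new ConvertibleThunk<T>(delegate { return ready; });
    }
}

public static class ConversionDemo
{
    // Stand-in for existing code that knows nothing about thunks.
    public static int Length(string s) { return s.Length; }

    public static void Main()
    {
        ConvertibleThunk<string> greeting =
            new ConvertibleThunk<string>(delegate { return "hello, " + "world"; });

        // The compiler inserts the conversion, which thaws the thunk here.
        Console.WriteLine(Length(greeting)); // prints 12
    }
}
```

The conversion happens at the call site, so the value is thawed just before the callee runs rather than at its first use inside the callee, which is weaker than the mixin idea. As far as I understand the conversion rules, the operator is also ignored when T is object or an interface type, so this falls well short of transparent.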

Lastly, a few caveats. All of this assumes that the performance hit resulting from late bound delegate calls is acceptable in your scenario. If you're wrapping an operation that has side effects, this is not a good idea for obvious reasons.

11/2/2004 2:54:02 AM (Pacific Daylight Time, UTC-07:00)  #   

 Saturday, October 30, 2004

The FxCop team recently released a new version, 1.312. Check it out.

I've been working quite a bit lately with the guys responsible for this technology. They've poured a lot of energy into this product... and it shows. With this release, you can expect improved quality of existing rules (reducing false positives, increasing coverage), and a number of entirely new rules. This stuff is based entirely on real experience shipping managed code and is a great way to avoid common implementation mistakes and pitfalls. And it helps you to be consistent with the Framework, too, a real plus when you're shipping publicly consumable APIs.

If you're not already shipping FxCop'd code, I'd highly recommend it... :)

10/30/2004 12:18:38 AM (Pacific Daylight Time, UTC-07:00)  #   

 Friday, October 29, 2004

I spent a few minutes tonight updating my exception program to output the EN-US value of any resource strings passed to exception constructors. Find the full updated report for mscorlib here.

Here's an example of what the String.Contains(...) method looks like:

compilercontrolled public hidebysig
System.String.Contains(System.String)
1. System.ArgumentNullException("source")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
   
2. System.ArgumentNullException("source")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)
   
3. System.ArgumentNullException("value")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)
   
4. System.ArgumentOutOfRangeException("startIndex",R"Index was out of range. Must be non-negative and less than the size of the collection.")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)
   
5. System.ArgumentOutOfRangeException("count",R"Count must be positive and count must refer to a location within the string/array/collection.")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)
   
6. System.ArgumentException(R"Value of flags is invalid.","options")
    System.String.Contains(System.String)
    System.String.IndexOf(System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String)
    System.Globalization.CompareInfo.IndexOf(System.String,System.String,System.Int32,System.Int32,System.Globalization.CompareOptions)
   

As you can see, entries 4, 5, and 6 show the resource strings used as message arguments.

10/29/2004 12:24:26 AM (Pacific Daylight Time, UTC-07:00)  #   

 Wednesday, October 27, 2004

As a follow up to my transitive exception post a little over a week ago, I hacked together a tool that almost does what I want.

See this page for some sample output, the results of running against the Whidbey Beta 1 release of mscorlib.dll (available as a free download off of MSDN). Notice that you get a full stack trace in cases where the method in question doesn't directly throw something, but rather a piece of called code does. It seems to be pretty accurate, and is actually giving me a warm and fuzzy feeling about the state of managed code and exceptions. It doesn't look like we let a lot of crap leak out of methods at all. Having a Java-heavy background, not having checked exceptions is seemingly giving me bouts of paranoia. :)

Here's a quick list of noted exclusions the tool doesn't currently handle at all:

  • Late bound method calls. Virtuals, delegates, reflection, and the like.
  • Well known system exceptions coming from infrastructure or unmanaged interop.

I have plans for both, but haven't had a chance to get something workable just yet.
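Since the tool itself can't be published, here is a toy sketch of the general idea only: walk a call graph and report each exception together with the chain of calls that reaches it. It is not the tool's implementation. The call graph and throw sets are hypothetical hand-written dictionaries rather than anything read from IL, catch blocks are ignored, and, like the exclusions listed above, late-bound calls are not modeled. `TransitiveThrows`, `Calls`, `Throws`, `Analyze`, and `Walk` are invented names.

```csharp
using System;
using System.Collections.Generic;

public static class TransitiveThrows
{
    // Hypothetical inputs: the methods each method calls, and what each throws directly.
    public static Dictionary<string, string[]> Calls = new Dictionary<string, string[]>();
    public static Dictionary<string, string[]> Throws = new Dictionary<string, string[]>();

    // Returns one "Exception via A -> B -> C" line per exception reachable from method.
    public static List<string> Analyze(string method)
    {
        List<string> report = new List<string>();
        Walk(method, new List<string>(), report);
        return report;
    }

    static void Walk(string method, List<string> chain, List<string> report)
    {
        if (chain.Contains(method))
            return; // recursive call: already on the current chain
        chain.Add(method);

        // Exceptions thrown directly by this method, tagged with the chain so far.
        string[] thrown;
        if (Throws.TryGetValue(method, out thrown))
            foreach (string exception in thrown)
                report.Add(exception + " via " + string.Join(" -> ", chain.ToArray()));

        // Exceptions that leak out of callees.
        string[] callees;
        if (Calls.TryGetValue(method, out callees))
            foreach (string callee in callees)
                Walk(callee, chain, report);

        chain.RemoveAt(chain.Count - 1);
    }

    public static void Main()
    {
        // Illustrative data only, with shortened method names.
        Calls["String.Contains"] = new string[] { "String.IndexOf" };
        Calls["String.IndexOf"] = new string[] { "CompareInfo.IndexOf" };
        Throws["CompareInfo.IndexOf"] = new string[] { "ArgumentNullException" };

        foreach (string line in Analyze("String.Contains"))
            Console.WriteLine(line);
    }
}
```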

You'll also notice that exception constructor arguments are actually captured and output, but only when literals are used at the call site. This is very seldom the case as a result of a little thing we like to call resources that are - thankfully - used pretty ubiquitously throughout the Framework. I'm planning on doing some reflection at the time of analysis to try and tease out the right resource string. I think it's pretty neat to see this data in the report.

I would publish the code for the tool, but unfortunately I've taken a dependency on some "secret-squirrel" (Hi Mark) static analysis technology that can't leave MS's comfy confines.

Disclaimer: This information is for research only, and should not be considered official Microsoft documentation of any sort. Please refer to the .NET Framework SDK for details on exceptions that a given method might throw.

10/27/2004 12:29:47 AM (Pacific Daylight Time, UTC-07:00)  #   

 Tuesday, October 26, 2004

I subjected a friend tonight to the oh-so-exciting task of providing feedback on my book. I'm finding outside input very helpful actually, mostly because the separation of good writing from technical content is an important one to make. Others are good at providing a relatively objective opinion on the words and sentence structure itself. I simply despise the common excuse that just because someone's a nerd, they aren't able to communicate well (or even in a grammatically correct fashion). And they get away with it, too! Not to say that I have mastered these skills yet, but I digress...

Anyhow, here it be:

“This book is so boring, it has to be good.”

10/26/2004 11:35:57 PM (Pacific Daylight Time, UTC-07:00)  #   

 

Me
 

Joe is an architect and developer on a systems incubation project at Microsoft.

Disclaimer:
The contents of this site are my own personal opinions and do not represent my employer's view in any way.

© 2013, Joe Duffy