RSS 2.0

Personal Info:

Joe Send mail to the author(s) leads the architecture of an experimental OS's developer platform, where he is also chief architect of its programming language. His current mission is to enable writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe founded the Parallel Extensions to .NET project. He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife, writing books, writing music, studying music theory & mathematics, and doing anything involving food & wine.

My books

My music

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2012, Joe Duffy

 
 Friday, December 31, 2004

My flight for Boston leaves in 8 hours. I'm going home for a week to visit.

Should be a great time, lots of family and friends I haven't seen since moving out last June.

And I get to eat at Blue Ginger on New Year's eve with friends. Awesome prix fixe menu with paired wines. This'll be the third year in a row.

Posting and progress on much of anything will be light. I'll be back in the new year!

12/31/2004 1:32:15 AM (Pacific Standard Time, UTC-08:00)  #   

So I've decided to go back and try to do some static type checking for Sencha. This is for a few reasons:

  • It seems too easy not to do.
  • Many of the one-off optimizations I was contemplating would have all been solved by a more general typing strategy.
  • I'm interested in the area of type soundness in compiler implementations more so than many others, so it's a bit selfish, too.

I injected a new backend phase which traverses and "rewrites" the AST to contain typing information before doing code generation. That is, each node in the tree is given an evaluation type. The two base cases are: 1) function evaluation, where the evaluation type is the return type; and, 2) literals, where the evaluation type is the literal type. Everything else just borrows the type from one of its subordinates. There are lots of cases where I can still get stuck, however, and have to back out and resort to type erasure in such situations. This means using object everywhere, with lots of boxing and casting. This tends to have a viral nature up the AST, as once I start using erasure on a less elder node, everyone further up in the tree starts to see object as the inherited type. As an example, consider the Scheme if statement:

<if-stmt> ::= (if <test> <consequent> <alternate>)

This has the type:

<consequent> : T, <alternate> : T
-----------------------------------
<if-stmt> : T

<consequent> : T1, <alternate> : T2
-----------------------------------
<if-stmt> : typeof(object)

In other words, if <consequent> and <alternate> are both of the same type T, the type of <if-stmt> is T. Otherwise, we erase to object, and ensure to box up whatever is left on the stack; that is, if <test> evaluates to true and T1 is a value type, we must box; if <test> is false and T2 is a value type, we must box.

The chances to get stuck are numerous right now, and not just limited to funky type mismatches like that mentioned above. This is primarily my own fault, as any lambda gets generated as having an object return value. I theoretically can support arbitrary return types since each lambda is represented by a generic delegate (T1 Func1<T1,T2>(T2 a), T1 Func2<T1,T2,T3>(T2 a, T3 b), T1 Func3<T1,T2,T3,T4>(T2 a, T3 b, T4 c), and so on). But for now, I just make sure to box everything up before passing it to such a function as an argument, and ensure that from within the delegate I box any value types before returning. The bottom line here is that any time I encounter an application of a first class function, I have to resort to complete erasure of its arguments and return value which ripples up the AST. In Scheme, this is obvioulsy a pretty pervasive idiom, so I will certainly put the effort in to make this better.

The primary benefits right now are small optimizations, where I can now ask the simple question: "What type does this expression evaluate to?" and make optimizations (such as avoiding boxing) based on that. Moreover, I can bind to more appropriate overloads of primitive and Framework methods since I know the type in most cases. My example from yesterday's post with the (+ 1 2 3 4 5) Scheme expression looks a bit better now. Since I can now bind to the type-safe version, the IL looks like this:

.method public hidebysig static void  Main(string[] A_0) cil managed
{
  .entrypoint
  // Code size       91 (0x5b)
  .maxstack  4
  .locals init ([0] float64[] V_0)
  IL_0000:  ldc.i4.4
  IL_0001:  newarr     [mscorlib]System.Double
  IL_0006:  stloc      V_0
  IL_000a:  nop
  IL_000b:  nop
  IL_000c:  ldc.r8     1.
  IL_0015:  ldloc.0
  IL_0016:  ldc.i4.0
  IL_0017:  ldc.r8     2.
  IL_0020:  stelem.r8
  IL_0021:  ldloc.0
  IL_0022:  ldc.i4.1
  IL_0023:  ldc.r8     3.
  IL_002c:  stelem.r8
  IL_002d:  ldloc.0
  IL_002e:  ldc.i4.2
  IL_002f:  ldc.r8     4.
  IL_0038:  stelem.r8
  IL_0039:  ldloc.0
  IL_003a:  ldc.i4.3
  IL_003b:  ldc.r8     5.
  IL_0044:  stelem.r8
  IL_0045:  ldloc.0
  IL_0046:  call       float64 [SenchaRuntimeLibrary]Sencha.Runtime.StandardSchemeFunctions::op_Add(float64,
                                                                                                    float64[])
  IL_004b:  box        [mscorlib]System.Double
  IL_0050:  call       string [SenchaRuntimeLibrary]Sencha.Runtime.RuntimeHelper::ToString(object)
  IL_0055:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_005a:  ret
} // end of method Program::Main

Note that if I were to do something like (+ 1 2 ((lambda () 3)) 4 5), it breaks down immediately and erases everything. It should be obvious that this can bind to the op_Add(double, double[]) overload, but it actually can't. This is because, as stated above, all lambdas return object. Thus, when I evaluate ((lambda () 3)), the evaluation type of this expression turns out to be object. This will be a focus for optimizations in the future.

12/31/2004 1:14:31 AM (Pacific Standard Time, UTC-08:00)  #   

 Thursday, December 30, 2004

I have about 75% of the standard R5RS functions implemented now. I chose to write the runtime library in C#, primarily because I wanted to make heavy use of the existing Framework and haven't shored up my Scheme<-{OtherIL} interop just yet. It's amazing how simple the mapping was. Most Scheme functions ended up being one- or two-liners that just marshalled through to existing APIs. Man, I love the Framework.

When I say {OtherIL}, I mean simply accessing C#-compiled code from Scheme, for example. It's trickier than it sounds. First, I need to figure out how to do lookups and bind to exiting .NET functions. I.e. do you simply access by the class's FQN and then search all referenced assemblies at compile-time for a match? Require assembly annotations? Introduce non-Scheme-compliant syntax? And so on. For the Scheme library functions, I have a controlled manner of adding them to the environment and accessing them.

Moreover, I can (and do!) make a lot of assumptions about consuming code I've compiled since... well... I compiled it. For example, all of the standard Scheme functions leave something on the stack after executing. (I.e. non-void return types.) Scheme differentiates between commands and expressions, the latter of which always return a value. Once I introduce methods with void return types, a lot of things will stop working, including a small feature called verifiability.

Anyhow, I've chosen to use the new Whidbey LinkedList<T> class for lists. This means that when you do a (cons x y), you're really emitting a new LinkedList in the background. (car x) will simply return the head of the list, while (cdr x) will return the portion of the list following the head. It's unfortunate, but I had to write a "sublist" function myself since the Framework doesn't have one. I chatted with Krzysztof about this, but it seems this is just the way the cookie crumbles! It was relatively simple anyhow, although not entirely trivial since a node maintains an affinity to a particular list (so new LinkedList<T>(otherList.Head.Next) won't work).

private static LinkedList<T> CloneSubList<T>(LinkedListNode<T> node)
{
    LinkedList<T> newList = new LinkedList<T>();
    while (node != null)
    {
        newList.AddTail(node.Value);
        node = node.Next;
    }
    return newList;
}

So lists are surprisingly working almost perfectly with very little work! Even the 30-odd permutations of cadr, cddar, cdadadr, cdaaaddddaaddadddaaddr, and so on. :) I really wish there were better sublist support, though, as this is a horribly inefficient way to do business. Not to mention that I think the semantics will break down at some point, for example when using calls such as (set-cdr! (cdr x) y), as it mutates the copy of the list, not the original one. Still a little work to do here. Damn all of the x! methods in Scheme! They make things just a little bit trickier... should've stuck to a Haskell implementation.

Finally, after ratholing a bit on trying to static type check (solely through inferencing), I chatted with Jim a bit on Tuesday. Python is, along with Scheme, almost always purely dynamically typed, so I was curious what he did with IronPython. Seems like he does some interesting optimizations such as pooling boxed value types, but has a similar approach to mine. As I noted earlier, this means a hell of a lot of boxing and unboxing, along with casts galore to make the verifier happy. One snippet of ridiculous IL that gets created for an if statement is as follows.

This is the Scheme code:

(if #t
    'success
    'failure)

And this is the IL:

.method public hidebysig static void  Main(string[] A_0) cil managed
{
  .entrypoint
  // Code size       43 (0x2b)
  .maxstack  2
  IL_0000:  ldc.i4.1
  IL_0001:  box        [mscorlib]System.Boolean
  IL_0006:  unbox      [mscorlib]System.Boolean
  IL_000b:  ldind.i1
  IL_000c:  brfalse    IL_001b
  IL_0011:  ldstr      "success"
  IL_0016:  br         IL_0020
  IL_001b:  ldstr      "failure"
  IL_0020:  call       string [SenchaRuntimeLibrary]Sencha.Runtime.RuntimeHelper::ToString(object)
  IL_0025:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_002a:  ret
} // end of method Program::Main

See any opportunity for optimization? Anyhow, this is a bit tangential, and is just a simple optimization I need to work on. The more worrysome fact is that I've created strongly typed overloads for all of the Scheme library functions, but can't use them right now! For example, I have:

[SchemeFunction("+")]
public static object op_Add(object z1, object[] zn);
[SchemeFunction("+")]
public static double op_Add(double z1, double[] zn);

Currently, I only ever bind to the first one. Which means stupid looking calls such as the following compilation.

Scheme:

(+ 1 2 3 4 5)

IL:

.method public hidebysig static void  Main(string[] A_0) cil managed
{
  .entrypoint
  // Code size       111 (0x6f)
  .maxstack  4
  .locals init ([0] object[] V_0)
  IL_0000:  ldc.i4.4
  IL_0001:  newarr     [mscorlib]System.Object
  IL_0006:  stloc      V_0
  IL_000a:  nop
  IL_000b:  nop
  IL_000c:  ldc.r8     1.
  IL_0015:  box        [mscorlib]System.Double
  IL_001a:  ldloc.0
  IL_001b:  ldc.i4.0
  IL_001c:  ldc.r8     2.
  IL_0025:  box        [mscorlib]System.Double
  IL_002a:  stelem.ref
  IL_002b:  ldloc.0
  IL_002c:  ldc.i4.1
  IL_002d:  ldc.r8     3.
  IL_0036:  box        [mscorlib]System.Double
  IL_003b:  stelem.ref
  IL_003c:  ldloc.0
  IL_003d:  ldc.i4.2
  IL_003e:  ldc.r8     4.
  IL_0047:  box        [mscorlib]System.Double
  IL_004c:  stelem.ref
  IL_004d:  ldloc.0
  IL_004e:  ldc.i4.3
  IL_004f:  ldc.r8     5.
  IL_0058:  box        [mscorlib]System.Double
  IL_005d:  stelem.ref
  IL_005e:  ldloc.0
  IL_005f:  call       object [SenchaRuntimeLibrary]Sencha.Runtime.StandardSchemeFunctions::op_Add(object,
                                                                                                   object[])
  IL_0064:  call       string [SenchaRuntimeLibrary]Sencha.Runtime.RuntimeHelper::ToString(object)
  IL_0069:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_006e:  ret
} // end of method Program::Main

Gawd, that hurts the eyes! It's quite obvious I can inference in this case (as everything's a ldc.x), but it begins to break down quick enough.

His advice was to forego any attempts at type inferencing, or even the thought of adding static type annotations, until the purely dynamic version is working. I had started to come to the same conclusion after spending a lot of time on this. This is a tough problem to solve, even for languages that have static type annotations!

To finish the post off, here is a set of simple tests that now work correctly. I had on the order of 30 tests functioning now. Only about 150 to go. :)

(number? 35.05)
=> #t

((lambda (x y)
   (x y))
   (lambda (x)
     ((lambda (x) (* x y))
     x))
   3)
=> 9

(let ((gle (lambda (x y)
             (cond ((> x y) 'greater)
                   ((< x y) 'less)
                   (else 'equal)))))
     (gle 3 10))
=> less

(car (cons 3 5))
=> 3

(cadr (cons 3 5))
=> 5

12/30/2004 12:32:29 AM (Pacific Standard Time, UTC-08:00)  #   

 Tuesday, December 28, 2004

I just posted an internal thread we had recently to Channel9. The topic was a new set of DG updates resulting from security reviews. The thread actually ended up being a bit long (and had broken off into many semi-related tangents), so it had to be cut at about 1/4 of the entire thread.

It's worth a read.

12/28/2004 11:38:21 AM (Pacific Standard Time, UTC-08:00)  #   

I finished up the latest round of books just in time for a few more, two of which were holiday gifts.

Types and Programming Languages
by Benjamin C. Pierce

10 of 10. Contains a pragmatic view of language typing by using formal operational semantics. It begins with an untyped lambda calculus and moves to simply-typed on up through more complex language syntax. The text is very well organized, introducing new topics in a logical progression from untyped to fully typed language syntax. The book uses ML to demonstrate example implementations along the way.

Formal Semantics
by Glynn Winskel

9 of 10. This is a very mathematically-oriented text on formal mechanisms for representing programming language semantics. Includes coverage of domain theory, and operational, denotational, and natural semantics, plus special coverage of parallel and nondeterministic formalisms. Very dense, but very well written.

Computer Architecture: A Quantitative Approach
by John L. Hennessy, David A. Patterson, David Goldberg

6 of 10. Decent, albeit introductory and very sparse, coverage of various computer (hardware) architecture topics. This includes coverage of instruction set design, parallelism, pipelining, multi-core, and a variety of other interesting things. The biggest disappointment is the lack of depth in the topics covered. Still recommended as a quick reference to occupy your bookshelf.

12/28/2004 11:34:31 AM (Pacific Standard Time, UTC-08:00)  #   

 Monday, December 27, 2004

The beauty of podcasting is that it gives everyone a voice, even those who might not typically have a channel through which to deliver a message. Race, socio-ecnomic class, and geographic don't matter. Solely based on the merit of content (or the individual), other people can subscribe and listen in on a regular basis. This is obviously one of the great facets of blogging, too.

Unfortunately, the podsphere is overpopulated with the incessant ramblings of boring people who lack anything interesting to say. Yes, they certainly deserve a voice. However, with hundreds of thousands of U.S. troops overseas and similar numbers of people (more?) affected by the recent earthquake and tsunamis, all of which would probably love to have their story heard (and quite frankly, would likely have a deeper message to send), why is this so? I realize podcasting isn't the first thing somebody thinks of in a tragic situation, but for the troops especially--where they must deal with lengthy stays overseas and have family members who likely have the necessary technology to receive such data--it seems this would be an amazing use of the technology. I suspect it's primarily due to lack of equipment and infrastructure.

We need to change that.

This is one reason why I intend to continue investing my life in making technology better and improving its reach: So that we don't have lame excuses such as this standing in the way of the creation and dissemination of information, regardless of who wants to publish or subscribe.

12/27/2004 1:21:08 PM (Pacific Standard Time, UTC-08:00)  #   

 Friday, December 24, 2004

I can now compile simple Scheme function calls into IL, such as

(+ 5 10)

As well as lambda applications

((lambda (x)
   (* x x))
 9)

((lambda (x y)
   (* x y))
 9 3)

And I was just able to compile (a workable copy!) of the following function call. It was a little tricky because the first lambda deals with higher-order functions... i.e. it takes a function as its first argument:

((lambda (x y)
   (* (x y) y))
 (lambda (x) (* x 2)) 3)

While it may not be immediately obvious, the correct output for this is 18. Indeed, that is what gets printed. :)

Below is the IL that gets generated by my compiler for this last program. Notice the hideous overuse of boxing. This will change. I already have a bunch of so-called “fast” entry points for all of the standard Scheme functions, and I intend to do some trickery with the lambda closure classes. Since Scheme doesn't have static typing, anything I introduce is just an optimization, and will have to rely solely on the use of literals to infer the type. Everything else has to be run-time type checked, which means objects and boxed value types everywhere. I might be able to get a little clever here, but truthfully haven't spent enough time thinking about it.

Lambdas are represented as Closure objects which have corresponding Function delegates. This is how procedure calls to lambdas occur. You might be able to discern from the IL that evaluation of a lambda generates the new class, instantiates it (passing any free variables to its constructor), and leaves a delegate on the stack. This will not be changing... (Unless I find that performance is poor, of course. From some benchmarking, however, it seems that our Whidbey delegate invocation improvements reduce the overhead versus a straight virtually dispatched call to just under 10%.) One optimization I will certainly be making however, is avoiding the creation of a Closure-derived class altogether for simple lambdas without free variables. In this case, it'll just end up as a static method.

Anyhow, here's the code. Sorry I don't have time to go into great detail right now.

.class public auto ansi Program
       extends [mscorlib]System.Object
{
  .method public hidebysig static void  Main(string[] A_0) cil managed
  {
    .entrypoint
    // Code size       59 (0x3b)
    .maxstack  4
    IL_0000:  newobj     instance void __lambda0::.ctor()
    IL_0005:  dup
    IL_0006:  ldvirtftn  instance object __lambda0::Apply2(object,
                                                           object)
    IL_000c:  newobj     instance void class [SenchaRuntimeLibrary]Sencha.Runtime.'Func2`3'<object,object,object>::.ctor(object,
                                                                                                                         native int)
    IL_0011:  newobj     instance void __lambda1::.ctor()
    IL_0016:  dup
    IL_0017:  ldvirtftn  instance object __lambda1::Apply1(object)
    IL_001d:  newobj     instance void class [SenchaRuntimeLibrary]Sencha.Runtime.'Func1`2'<object,object>::.ctor(object,
                                                                                                                  native int)
    IL_0022:  ldc.r8     3.
    IL_002b:  box        [mscorlib]System.Double
    IL_0030:  call       instance !0 class [SenchaRuntimeLibrary]Sencha.Runtime.'Func2`3'<object,object,object>::Invoke(!1,
                                                                                                                        !2)
    IL_0035:  call       void [mscorlib]System.Console::WriteLine(object)
    IL_003a:  ret
  } // end of method Program::Main

  .method public specialname rtspecialname
          instance void  .ctor() cil managed
  {
    // Code size       7 (0x7)
    .maxstack  2
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  ret
  } // end of method Program::.ctor

} // end of class Program

.class public auto ansi sealed __lambda0
       extends class [SenchaRuntimeLibrary]Sencha.Runtime.'Closure2`3'<object,object,object>
{
  .method public hidebysig virtual instance object
          Apply2([in] object x,
                 [in] object y) cil managed
  {
    .override  method instance !0 class [SenchaRuntimeLibrary]Sencha.Runtime.'Closure2`3'<object,object,object>::Apply2(!1,
                                                                                                                        !2)
    // Code size       14 (0xe)
    .maxstack  2
    IL_0000:  ldarg.1
    IL_0001:  ldarg.2
    IL_0002:  call       instance !0 class [SenchaRuntimeLibrary]Sencha.Runtime.'Func1`2'<object,object>::Invoke(!1)
    IL_0007:  ldarg.2
    IL_0008:  call       object [SenchaRuntimeLibrary]Sencha.Runtime.StandardSchemeFunctions::Multiply(object,
                                                                                                       object)
    IL_000d:  ret
  } // end of method __lambda0::Apply2

  .method public specialname rtspecialname
          instance void  .ctor() cil managed
  {
    // Code size       7 (0x7)
    .maxstack  2
    IL_0000:  ldarg.0
    IL_0001:  call       instance void class [SenchaRuntimeLibrary]Sencha.Runtime.'Closure2`3'<object,object,object>::.ctor()
    IL_0006:  ret
  } // end of method __lambda0::.ctor

} // end of class __lambda0

.class public auto ansi sealed __lambda1
       extends class [SenchaRuntimeLibrary]Sencha.Runtime.'Closure1`2'<object,object>
{
  .method public hidebysig virtual instance object
          Apply1([in] object x) cil managed
  {
    .override  method instance !0 class [SenchaRuntimeLibrary]Sencha.Runtime.'Closure1`2'<object,object>::Apply1(!1)
    // Code size       21 (0x15)
    .maxstack  2
    IL_0000:  ldarg.1
    IL_0001:  ldc.r8     2.
    IL_000a:  box        [mscorlib]System.Double
    IL_000f:  call       object [SenchaRuntimeLibrary]Sencha.Runtime.StandardSchemeFunctions::Multiply(object,
                                                                                                       object)
    IL_0014:  ret
  } // end of method __lambda1::Apply1

  .method public specialname rtspecialname
          instance void  .ctor() cil managed
  {
    // Code size       7 (0x7)
    .maxstack  2
    IL_0000:  ldarg.0
    IL_0001:  call       instance void class [SenchaRuntimeLibrary]Sencha.Runtime.'Closure1`2'<object,object>::.ctor()
    IL_0006:  ret
  } // end of method __lambda1::.ctor

} // end of class __lambda1

12/24/2004 3:17:01 AM (Pacific Standard Time, UTC-08:00)  #   

Paul Graham has some very thought proviking opinions, expressed both in his online essays and new book.

I just stumbled across this audio recording of his OSCON talk from earlier this year. The topic is great hackers. One of the best talks I've heard in a long time.

The roughly equivalent transcript can be found here.

12/24/2004 1:47:30 AM (Pacific Standard Time, UTC-08:00)  #   

 Wednesday, December 22, 2004

I'm going to get into the regular habit of soliciting feedback on book content. The responses here were very helpful.

Right now, I'm wrapping up the chapter on mutli-threading which covers app-domains, threading, and process management. I've tried to cover of all the standard concurrency and threading topics, asynchronous API patterns, synchronization and the Win32 WaitHandle-based resources, and both the System.AppDomain and System.Diagnostics.Process classes.

Is there anything specific to this area you'd like to see covered?

Any common pain points that .NET newcomers (or even experienced developers) seem to get hung up on when it comes to this topic?

Any anti-patterns you see emerging that the general public should know about?

Feedback would be much appreciated.

12/22/2004 1:16:06 AM (Pacific Standard Time, UTC-08:00)  #   

 Tuesday, December 21, 2004

Early last year, Brad, Krzystzof, Steven, Rico, and several other people put together a great internal training class on best practices for developing managed APIs. The course targeted primarily any PMs, developers, or testers that are actively building products that expose public APIs and run on the .NET Framework. It's also relevant to people who are designing for re-use within their own projects, even if they don't expose any APIs per se. It was a huge success here at Microsoft.

And, I'm pleased to let you know that we recorded the entire thing, did some editing and tweaks over the last months, and now it's ready to put into your hands!

Starting January 7th (Friday), we'll be posting this content up on MSDN, staggered over a period of time. We'll also be coordinating chats with the experts behind each session shortly following the posting of each video... you can ask questions or engage in conversation with the architects, developers, and PMs responsible for the class content. Frank's still putting together the concrete details, so I'm being intentionally somewhat vague. :)

The courses cover a lot of the material from the Design Guidelines document (plus some extra goodies), with ~1hr dedicated sessions on:

  • Naming conventions
  • API design basics
  • Usability
  • Designing for subclassing
  • Security
  • Performance
  • Designing in a managed memory world
  • Interop
  • Packaging and deployment
  • And a whole lot more!

I'll post more detailed information as the plans solidify.

Update: Brad posted a link to an older post where he details class topics a bit further.

12/21/2004 10:56:40 AM (Pacific Standard Time, UTC-08:00)  #   

 

Recent Entries:

Search:

Browse by Date:
<December 2004>
SunMonTueWedThuFriSat
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678

Browse by Category:

Notables: