RSS 2.0

Personal Info:

Joe Send mail to the author(s) leads the architecture of an experimental OS's developer platform, where he is also chief architect of its programming language. His current mission is to enable writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe founded the Parallel Extensions to .NET project. He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife, writing books, writing music, studying music theory & mathematics, and doing anything involving food & wine.

My books

My music

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2012, Joe Duffy

 
 Sunday, November 21, 2004

Generating and dealing with Scheme lambda expressions in IL is a relatively interesting problem. Scheme is statically-scoped, making the implementation a bit more straightforward than, say, Common LISP. I have not yet determined the best approach in all cases yet, but certainly have a workable approach. There are many optimizations to be had, but I’ll worry more about this at some point in the future. Interestingly, the approach I have arrived at is very similar to C# 2.0’s anonymous delegate syntax, albeit for the auto-generated delegates.

Consider a lambda expression as follows:

(let (y 0)
  (define (mkincadder x)
    (lambda (z)
      (begin
        (set! y (+ y 1))
        (+ x y z))))

This is a tad tricky, but can be roughly translated into the following C# 2.0 program:

delegate object _lambd(object z);
static int y = 0;
static _lambd mkincadder(object x)
{
  return delegate (object z) {
    y = y + 1;
    return x + y + z;
  };
}

Both a consumer in Scheme and C# will produce the same output. Here is a sample program for each, both of which print out the numbers “7,” “8,” and “9” to the console:

Scheme:
(let (a (mkincadder 5))
  (begin
    (print (a 2))
    (print (a 2))
    (print (a 2))
))

C#:
_lambd a = mkincadder(5);
Console.WriteLine(a(2));
Console.WriteLine(a(2));
Console.WriteLine(a(2));

In fact, the implementation I've devised is nearly identical to the anonymous delegate example. The one optimization I make is that the programmer is not required to define a delegate signature, as this is a gory detail hidden by the Scheme compiler itself. The compiler reuses auto-generated delegates with the same signature, essentially building up a dictionary of unique delegate signatures.

What happens under the hood is that each lambda is captured in a class with a single apply method. The value of a lambda expression is simply a delegate to this method. Any free variables from within the lambda body that are bound to something other than an argument are teased out into a class variable and captured at the time an instance is constructed. So for example, the generated class for the lambda above looks like this (pseudo-code):

public class __lambd1
{
  private int x;
  private int y;

  public __lambd1(int x, int y)
  {
    this.x = x;
    this.y = y;
  }

  public delegate object __func(object z);

  public object apply(object z)
  {
    y = y + 1;
    return x + y + z;
  }
}

public class __global
{
  public __lambd1.__func mkincadder(object x)
  {
    __lambd1 ret = new __lambd1(x, 0);
    return new __lambd1.__func(ret.apply);
  }
}

Generating an entire class is overkill for simple lambdas which don’t contain non-arg free variables, an optimization I will likely make by accumulating such functions on a single static class similar to the generated __global class containing named lambdas. Additionally, I will likely place functions that share environments on the same class...

An interesting complication arises, however, when many lambdas begin to access and mutate state within a shared environment. In fact, the concept of an environment, while straightforward to do in an environment-passing interpreter, feels like a lost concept in the clumsy translation to IL. At this point, I see several options, a few of which seem feasible.

First, consider what I mean by this:

(let (y 0)
  (define (mkincadder x)
    (lambda (z)
      (begin
        (set! y (+ y 1))
        (+ x y z))))
  (define (dec x)
    (set! y (- y x))))

Here we now have two things that can access the y variable: mkincadder and dec. Why is this problematic? Well, now we cannot simply capture free variables and store them as instance fields on individual lambda classes. We need to use some sharable location in memory which can be mutated and where the function updates will be visible to each other. For this particular example, the answer is relatively straightforward: simply put an internal y variable on the __global class, and ensure the functions that must share an environment are declared on it. In this case, that means mkincadder and dec get defined on __global. There will end up being only a single instance of __global being used at runtime, accomplishing the goal of having a shared environment.

This is the example pseudo-code for __global; not much changes for the actual lambda implementations other than referencing __global for the y variable:

public class __global
{
  internal object y = 0;
  public __lambd1.__func mkincadder(object x)
  {
    __lambd1 ret = new __lambd1(this, x);
    return new __lambd1.__func(ret.apply);
  }
  public __lambd2.__func dec(object x)
  {
    __lambd2 ret = new __lambd2(this, x);
    return new __lambd2.__func(ret.apply);
  }
}

As mentioned above, the compiler now knows that any references to y from within the lambdas must get turned into a reference to __global.y. Notice that we pass this to the constructor so it can hook a reference to its “parent environment”.

This seems to work for most cases. Here are a couple other solutions I had previously considered which I might need to rely on slight variants for in cases where the above doesn't work.

An alternative approach is to use nested classes. Items with shared environments would get generated on the same class as above, while parent environments would be implemented simply by searching up the outer class hierarchy. This approach unfortunately gets pretty hairy quickly, though, as the number of chained environments increases. As long as the Scheme compiler hides this as an implementation detail that never surfaces to consumers, that’s fine. But this is unavoidable should somebody want to use us from C#.

A slight permutation of this solution would be to use inheritance rather than nesting. That is, shared environment variables would be “inherited” using static fields, and could be “hidden” by simply injecting identically-named fields on derived classes. At code generation time, we could detect the right place to reference in the class hierarchy based on what is defined where. This is also tricky, however, because sometimes we would want to use static variables and sometimes instance, depending on whether multiple environments with different bindings for the same variable could be created (yes in most cases).

Lastly, I could preserve the notion of an environment using a dictionary-like construct passed around at runtime. This could be like a standard environment-passing interpreter, and I would have fine-grained control over what is referenced and how environments are explicitly chained together. Unfortunately, I fear that the performance would suffer greatly with this approach. (Instead of ldfld/stfld instructions, I would now have to call methods to access and store variables. Ugh.)

These are certainly interesting problems, and I am just beginning to do a bit of research on what other IL-based compilers do.

11/21/2004 1:39:19 AM (Pacific Standard Time, UTC-08:00)  #   
Tracked by:
"buy womens shoes" (buy womens shoes) [Trackback]
"buy shoes" (buy shoes cheap) [Trackback]

 

Recent Entries:

Search:

Browse by Date:
<November 2004>
SunMonTueWedThuFriSat
31123456
78910111213
14151617181920
21222324252627
2829301234
567891011

Browse by Category:

Notables: