Some people picked up on my reference to an obscure number grammar last week, so I figured I'd post briefly about my intentions.
I am cooking up a managed Scheme interpreter and compiler. There are still several unknowns at this point... for example, I'm not even sure what approach I'll take with regard to licensing and releasing the source (it's primarily an academic exercise for my own benefit). My actual code generation story isn't fully baked yet, either. Either way, I'm keeping a sort of dev log, which I intend to turn into a paper or a series of blog posts when it makes sense to do so, so these tough decisions should end up well documented.
I'm hand-authoring the front- and back-ends, knowing full well that there are tools out there that could auto-generate much of it for me. Call me a control freak. The process has been very interesting thus far, actually (albeit a bit tedious at times... e.g., do I put one or two characters back on the buffer?). I'm continually reminded of the age-old search for the perfect mix of generalized and specialized code.
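The "one or two characters" question comes from lookahead. In Scheme, a token that begins with `+`, `-`, or `.` might be a number (`+5`, `.5`, `+.5`) or an identifier (`+`, `...`), and telling them apart can mean reading several characters before committing. Below is a minimal sketch, assuming a hypothetical `PushbackReader` with an unbounded unread stack (not the author's design); the equally hypothetical `starts_number` check reads up to three characters and then pushes every one of them back, so the caller sees the input untouched.

```python
_DIGITS = frozenset("0123456789")


class PushbackReader:
    """Character reader with an unbounded pushback (unread) stack.

    Hypothetical helper for illustration only.
    """

    def __init__(self, text: str) -> None:
        self._text = text
        self._pos = 0
        self._pushed: list[str] = []  # most recently unread char is last

    def read(self) -> str:
        """Return the next character, or '' at end of input."""
        if self._pushed:
            return self._pushed.pop()
        if self._pos < len(self._text):
            ch = self._text[self._pos]
            self._pos += 1
            return ch
        return ""

    def unread(self, ch: str) -> None:
        """Push a character back so the next read() returns it."""
        if ch:  # ignore the end-of-input sentinel ''
            self._pushed.append(ch)


def starts_number(reader: PushbackReader) -> bool:
    """Peek, without consuming, whether the next token looks like a decimal number.

    Simplified: recognizes prefixes such as 5, +5, .5, +.5; ignores #x, #e, etc.
    """
    consumed: list[str] = []

    def take() -> str:
        ch = reader.read()
        consumed.append(ch)
        return ch

    ch = take()
    if ch in ("+", "-"):  # optional sign
        ch = take()
    if ch == ".":  # optional leading decimal point
        ch = take()
    result = ch in _DIGITS  # '' (end of input) is never a digit
    # Unread in reverse so characters come back in their original order.
    for c in reversed(consumed):
        reader.unread(c)
    return result


r = PushbackReader("+.5 foo")
print(starts_number(r))  # True
print(r.read())          # + (the peek consumed nothing)
```

A stack-based unread buffer sidesteps the "one or two" decision entirely at the cost of a little generality; a fixed one- or two-slot buffer would be cheaper but forces the lexer's lookahead depth to be settled up front.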
The milestone I spoke of last week was successfully implementing the lexer and its ancillary components. I found some compliance scripts on the 'net that are proving handy for testing. When I say it's complete, I mean complete: it lexes any valid Scheme source and has an error-detection/recovery strategy that I'm fairly happy with. It took about 1.5 KLOC.
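The post doesn't spell out its error-recovery strategy. One common approach, shown here purely as a hedged sketch, is panic-mode recovery: when an atom matches no token class, record an error with its position, discard everything up to the next delimiter, and keep lexing so a single pass reports every bad token. The `lex`, `Token`, and `LexError` names and the regexes are hypothetical, and they cover only a small subset of Scheme (no characters, vectors, `|`-quoted identifiers, or the full numeric tower).

```python
import re
from dataclasses import dataclass

# Simplified token classes (a small subset of R7RS lexical syntax).
_NUMBER = re.compile(r"[+-]?[0-9]+")
_BOOLEAN = re.compile(r"#t|#f|#true|#false")
_IDENT = re.compile(
    r"[A-Za-z!$%&*/:<=>?^_~][A-Za-z0-9!$%&*/:<=>?^_~+\-.@]*|[+-]|\.\.\."
)


@dataclass
class Token:
    kind: str  # 'lparen', 'rparen', 'quote', 'string', 'number', 'boolean', 'identifier'
    text: str
    pos: int


@dataclass
class LexError:
    message: str
    pos: int


def _is_delimiter(c: str) -> bool:
    return c.isspace() or c in '()";'


def _classify(text: str):
    if _NUMBER.fullmatch(text):
        return "number"
    if _BOOLEAN.fullmatch(text):
        return "boolean"
    if _IDENT.fullmatch(text):
        return "identifier"
    return None  # no token class matched


def lex(src: str) -> tuple[list[Token], list[LexError]]:
    tokens: list[Token] = []
    errors: list[LexError] = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch == ";":  # line comment: skip to end of line
            while i < n and src[i] != "\n":
                i += 1
        elif ch in "()'":
            kind = {"(": "lparen", ")": "rparen", "'": "quote"}[ch]
            tokens.append(Token(kind, ch, i))
            i += 1
        elif ch == '"':  # string literal, honoring backslash escapes
            start = i
            i += 1
            while i < n and src[i] != '"':
                i += 2 if src[i] == "\\" else 1
            if i >= n:
                errors.append(LexError("unterminated string", start))
            else:
                i += 1  # consume the closing quote
                tokens.append(Token("string", src[start:i], start))
        else:  # atom: scan to the next delimiter, then classify
            start = i
            while i < n and not _is_delimiter(src[i]):
                i += 1
            text = src[start:i]
            kind = _classify(text)
            if kind is None:
                # Panic-mode recovery: report, drop the atom, resume at the delimiter.
                errors.append(LexError(f"invalid token {text!r}", start))
            else:
                tokens.append(Token(kind, text, start))
    return tokens, errors


tokens, errors = lex('(define x 1abc) (display #q "hi")')
for e in errors:
    print(e.pos, e.message)
# 10 invalid token '1abc'
# 25 invalid token '#q'
```

Because the bad atoms are dropped rather than aborting the scan, the surrounding parentheses still come through intact, which gives a later parser a fighting chance to report structural errors too.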
Granted, Scheme has a pretty straightforward lexical structure... the super-difficult bit is certainly the code generation. I think the biggest challenge for the entire effort will be getting acceptable parse/emit/execute performance when operating in interpreter mode (the default).