Saturday, September 18, 2010

I have several positions open on my team here at Microsoft.

My team's responsibility spans multiple aspects of a research operating system’s programming model. The three main areas are concurrency, languages, and frameworks. When I say concurrency, I mean things like asynchrony and message passing, data and task parallelism, distributed parallelism, runtime scheduling and resource management, and heterogeneity and GPGPU. When I say languages, I mean type systems, mostly-functional programming, verified safe concurrency, and both front- and back-end compilation. And when I say frameworks, I mean virtually anything you could imagine wanting out of a platform framework: all things XML, data interoperability (database, web services, etc.), collections, transactions, multimaster synchronization, and even low level things, like regex, numerics, and globalization.

Our team is 100% developers, and we have an “everybody codes, everybody loves to code” culture. Even managers are expected to spend a significant amount of time prolifically writing code.

All of these components are new and built from the ground up. So self-drive and an ability to have a vision and make it happen are incredibly important.

We are always happy to hire great, hard-working people, regardless of years of experience. If you’re extremely strong in one or more of the abovementioned areas, more of a generalist, are an amazing coder, or all of the above, you’d fit in perfectly. This is the most amazing team of people I’ve ever worked with. If you are interested, please email your resume to me at joedu AT microsoft DOT com.

9/18/2010 11:42:27 AM (Pacific Daylight Time, UTC-07:00)  #   

 Monday, September 06, 2010

I can’t tell you how many times I’ve heard the age-old adage echoed inappropriately and out of context:

"We should forget about small efficiencies, say about 97% of the time; premature optimization is the root of all evil"
            -- Donald E. Knuth, Structured Programming with go to Statements

I have heard the "premature optimization is the root of all evil" statement used by programmers of varying experience at every stage of the software lifecycle, to defend all sorts of choices, ranging from poor architectures, to gratuitous memory allocations, to inappropriate choices of data structures and algorithms, to complete disregard for variable latency in latency-sensitive situations, among others.

Mostly this quip is used defend sloppy decision-making, or to justify the indefinite deferral of decision-making. In other words, laziness. It is safe to say that the very mention of this oft-misquoted phrase causes an immediate visceral reaction to commence within me... and it’s not a pleasant one.

In this short article, we’ll look at some important principles that are counter to what many people erroneously believe this statement to be saying. To save you time and suspense, I will summarize the main conclusions: I do not advocate contorting oneself in order to achieve a perceived minor performance gain. Even the best performance architects, when following their intuition, are wrong 9 times out of 10 about what matters. (Or maybe 97 times out of 100, based on Knuth’s numbers.) What I do advocate is thoughtful and intentional performance tradeoffs being made as every line of code is written. Always understand the order of magnitude that matters, why it matters, and where it matters. And measure regularly! I am a big believer in statistics, so if a programmer sitting in his or her office writing code thinks just a little bit more about the performance implications of every line of code that is written, he or she will save an entire team that time and then some down the road. Given the choice between two ways of writing a line of code, both with similar readability, writability, and maintainability properties, and yet interestingly different performance profiles, don’t be a bozo: choose the performant approach. Eschew redundant work, and poorly written code. And lastly, avoid gratuitously abstract, generalized, and allocation-heavy code, when slimmer, more precise code will do the trick.

Follow these suggestions and your code will just about always win in both maintainability and performance.

Understand the order of magnitude that matters

First and foremost, you really ought to understand what order of magnitude matters for each line of code you write.

In other words, you need to have a budget; what can you afford, and where can you afford it? The answer here changes dramatically depending on whether you’re writing a device driver, reusable framework library, UI control, highly-connected network application, installation script, etc. No single answer fits all.

I am personally used to writing code where 100 CPU cycles matters. So invoking a function that acquires a lock by way of a shared-memory interlocked instruction that may take 100 cycles is something I am apt to think hard about; even more worrisome is if that acquisition could block waiting for 100,000 cycles. Indeed this situation could become disastrous under load. As you can tell, I write a lot of systems code. If you’re working on a network-intensive application, on the other hand, most of the code you write is going to be impervious to 100 cycle blips, and more sensitive to efficient network utilization, scalability, and end-to-end performance. And if you’re writing a little one-time script, or some testing or debugging program, you may get away with ignoring performance altogether, even multi-million cycle network round-trips.

To be successful at this, you’ll need to know what things cost. If you don’t know what things cost, you’re just flailing in the dark, hoping to get lucky. This includes rule of thumb order of magnitudes for primitive operations – e.g. reading / writing a register (nanoseconds, single-digit cycles), a cache hit (nanoseconds, tens of cycles), a cache miss to main memory (nanoseconds, hundreds of cycles), a disk access including page faults (micro- or milliseconds, millions of cycles), and a network roundtrip (milliseconds or seconds, many millions of cycles) – in addition to peering beneath opaque abstractions provided by other programmers, to understand their best, average, and worst case performance.

Clearly the concerns and situations you must work to avoid change quite substantially depending on the class of code you are writing, and whether the main function of your program is delivering a user experience (where usability reigns supreme), delivering server-side throughput, etc. Thinking this through is crucial, because it helps avoid true "premature optimization" traps where a programmer ends up writing complicated and convoluted code to save 10 cycles, when he or she really needs to be thinking about architecting the interaction with the network more thoughtfully to asynchronously overlap round-trips. Understanding how performance impacts the main function of your program drives all else.

Pay attention to interoperability between layers of separately authored software that is composed together. The most common cause of hangs is that an API didn’t specify the expected performance, and so a UI programmer ended up using it in an innocuous but inappropriate way, because they couldn’t afford the range of order of magnitude cost that the API’s performance was expected to fall within. Hangs aren’t the only manifestation; O(N^2), or worse, performance can also result, if, for example, a caller didn’t realize the function called was going to enumerate a list in order to generate its results.

It is also important to think about worst case situations. What happens if that lock is held for longer than expected, because the system is under load and the scheduler is overloaded? And what if the owning thread was preempted while holding the lock, and now will not get to run again for quite some time? What happens if the network is saturated because a big news event is underway, or worse, the phone network is intermittently cutting out, the network cable has been unplugged, etc.? What about the case where, because a user has launched far too many applications at once, your memory-intensive operation that usually enjoys nice warmth and locality suddenly begins waiting for the disk on the majority of its memory accesses, due to demand paging? These things happen all the time.

In each of these situations, you can end up paying many more orders of magnitude in cost than you expected under ordinary circumstances. The lock acquisition that usually took 100 CPU cycles now takes several million cycles (as long as a network roundtrip), and the network operation that is usually measured in milliseconds is now measured in tens of seconds, as the software painfully waits for the operation to time out. And your "non-blocking" memory-intensive algorithm on the UI thread just caused a hang, because it’s paging like crazy.

You’ve experienced these problems as a user of modern software, I am sure, and it isn’t fun. An hourglass, spinning donut, unresponsive button clicks, "(Not Responsive)" title bars, and bleachy white screens. An important measurement of a programmer’s worth is how good the code they write operates under the extreme and rare circumstances. Because, truth be told, when you have a large user-base, these circumstances aren’t that rare after all. This is more of a "performance in the large" thing, but it turns out that the end result is delivered as a result of many "performance in the small" decisions adding up. A developer wrote code meant to be used in a particular way, but decided what order of magnitude was reasonable based on best case, … and gave no thought to the worst case.

Using the right data structure for the job

This is motherhood and apple pie, Computer Science 101, … bad clichés abound. And yet so many programmers get this wrong, because they simply don’t give it enough thought.

One of my favorite books on the topic ("Programming Pearls") has this to say about them:

"Most programmers have seen them, and most good programmers realize they’ve written at least one.
            They are huge, messy, ugly programs that should have been short, clean, beautiful programs."

I’ll add one adjective to the "short, clean, beautiful" list: fast.

Data structures drive storage and access behavior, both strongly affecting the size and speed of algorithms and components that make use of them. Worst case really does matter. This too is a case where the right choice will boost not only performance but also the cleanliness of the program.

I’m actually not going to spend too much time on this; when I said this is CS101, I meant it. However, it is crucial to be intentional and smart in this choice. Validate assumptions, and measure.

Ironically, in my experience, many senior programmers can make frighteningly bad data structure choices, often because they are more apt to choose a sophisticated and yet woefully inappropriate one. They may choose a linked list, for example, because they want zero-allocation element linking via an embedded next pointer. And yet they then end up with many lists traversals throughout the program, where a dense array representation would have been well worth the extra allocation. The naïve programmer would have happily new’d up a List<T>, and avoided some common pitfalls; yet, here the senior guy is working as hard as humanly possible to avoid a single extra allocation. They over-optimized in one dimension, and ignored many others that mattered more.

This same class of programmer may choose a very complicated lock-free data structure for sharing elements between threads, incurring many more object allocations (and thus increased GC pressure), and a large number of expensive interlocked operations scattered throughout the code. The sexy lure of lock-freedom tricked them into making a bad choice. Perhaps they didn’t quite understand that locks and lock-free data structures share many costs in common. Or perhaps they just hoped to get lucky and squeeze out out-of-this-world scalability thanks to lock-freedom, without actually considering the access patterns necessary to lead to such scalability and whether their program employed them.

These are often held up as examples of "premature optimization", but I hold them up as examples of "careless optimization". The double kicker here is that the time spent building the more complicated solution would have been better spent carefully thinking and measuring, and ultimately deciding not to be overly clever in the first place. This most often plagues the mid-range programmer, who is just smart enough to know about a vast array of techniques, but not yet mature enough to know when not to employ them.

A different, better-performing approach

It’s an all-too-common occurrence. I’ll give code review feedback, asking "Why didn’t you take approach B? It seems to be just as clear, and yet obviously has superior performance." Again, this is in a circumstance where I believe the difference matters, given the order of magnitude that matters for the code in question. And I’ll get a response, "Premature optimization is the root of all evil." At which point I want to smack this certain someone upside the head, because it’s such a lame answer.

The real answer is that the programmer didn’t stop to carefully consider alternatives before coding up solution A. (To be fair, sometimes good solutions evade the best of us.) The reality is that the alternative approach should have been taken; it may be true that it’s "too late" because the implications of the original decision were nontrivial and perhaps far-reaching, but that is too often an unfortunate consequence of not taking the care and thought to do it right in the first place.

These kinds of "peanut butter" problems add up in a hard to identify way. Your performance profiler may not obviously point out the effect of such a bad choice so that it’s staring you in your face. Rather than making one routine 1000% slower, you may have made your entire program 3% slower. Make enough of these sorts of decisions, and you will have dug yourself a hole deep enough to take a considerable percentage of the original development time just digging out. I don’t know about you, but I prefer to clean my house incrementally and regularly rather than letting garbage pile up to the ceilings, with flies buzzing around, before taking out the trash. Simply put, all great software programmers I know are proactive in writing clear, clean, and smart code. They pour love into the code they write.

In this day and age, where mobility and therefore power is king, instructions matter. My boss is fond of saying "the most performant instruction is the one you didn’t have to execute." And it’s true. The best way to save battery power on mobile phones is to execute less code to get the same job done.

To take an example of a technology that I am quite supportive of, but that makes writing inefficient code very easy, let’s look at LINQ-to-Objects. Quick, how many inefficiencies are introduced by this code?

int[] Scale(int[] inputs, int lo, int hi, int c) {
    var results = from x in inputs
                  where (x >= lo) && (x <= hi)
                  select (x * c);
    return results.ToArray();
}

It’s hard to account for them all.

There are two delegate object allocations, one for the call to Enumerable.Where and the other for the call to Enumerable.Select. These delegates point to potentially two distinct closure objects, each of which has captured enclosing variables. These closure objects are instances of new classes, which occupy nontrivial space in both the binary and at runtime. (And of course, the arguments are now stored in two places, must be copied to the closure objects, and then we must incur extra indirections each time we access them.) In all likelihood, the Where and Select operators are going to allocate new IEnumerable and new IEnumerator objects. For each element in the input, the Where operator will make two interface method calls, one to IEnumerator.MoveNext and the other to IEnumerator.get_Current. It will then make a delegate call, which is slightly more expensive than a virtual method call on the CLR. For each element for which the Where delegate returns ‘true’, the Select operator will have likewise made two interface method calls, in addition to another delegate invocation. Oh, and the implementations of these likely use C# iterators, which produce relatively fat code, and are implemented as state machines which will incur more overhead (switch statements, state variables, etc.) than a hand-written implementation.

Wow. And we aren’t even done yet. The ToArray method doesn’t know the size of the output, so it must make many allocations. It’ll start with 4 elements, and then keep doubling and copying elements as necessary. And we may end up with excess storage. If we end up with 33,000 elements, for example, we will waste about 128KB of dynamic storage (32,000 X 4-byte ints).

A programmer may have written this routine this way because he or she has recently discovered LINQ, or has heard of the benefits of writing code declaratively. And/or he or she may have decided to introduce a more general purpose implementation of a Scale API versus doing something specific to the use in the particular program that Scale will be immediately used in. This is a great example of why premature generalization is often at odds with writing efficient code.

Imagine an alternative universe, on the other hand, where Scale will only get used once and therefore we can take advantage of certain properties of its usage. Namely, perhaps the input array need not be preserved, and instead we can update the elements matching the criteria in place:

void ScaleInPlace(int[] inputs, int lo, int hi, int c) {
    for (int i = 0; i < inputs.Length; i++) {
        if ((inputs[i] >= lo) & (inputs[i] <= hi)) {
            inputs[i] *= c;
        }
    }
}

A quick-and-dirty benchmark shows this to be an order of magnitude faster. Again, is it an order of magnitude that you care about? Perhaps not. See my earlier thoughts on that particular topic. But if you care about costs in the 100s or 1000s of cycles range, you probably want to pay heed.

Now, I’m not trying to take potshots at LINQ. It was just an example. In fact, I spent 3 years running a team that delivered PLINQ, a parallel execution engine for LINQ-to-Objects. LINQ is great where you can afford it, and/or where the alternatives do not offer ridiculously better performance. For example, if you can’t do in-place updates, functionally producing new data is going to require allocations no matter which way you slice it. But having watched people using PLINQ, I have witnessed numerous occasions where an inordinately expensive query was made 8-times faster by parallelizing… where the trivial refactoring into a slimmed down algorithm with proper data structures would have speed the code up by 100-fold. Parallelizing a piggy piece of code to make it faster merely uses more of the machine to get the same job done, and will negatively impact power, resource management, and utilization.

Another view is that writing code in this declarative manner is better, because it’ll just get faster as the compiler and runtimes enjoy new optimizations. This sounds nice, and seems like taking a high road of some sort. But what usually matters is today: how does the code perform using the latest and greatest technology available today. And if you scratch underneath the surface, it turns out that most of these optimizations are what I call "science fiction" and unlikely to happen. If you write redundant asserts at twenty adjacent layers of your program, well, you’re probably going to pay for them. If you allocate objects like they are cheap apples growing on trees, you’re going to pay for them. True, optimizations might make things faster over time, but usually not in the way you expect and usually not by orders of magnitude unless you are lucky.

A colleague of mine used to call C a WYWIWYG language—"what you write is what you get"—wherein each line of code roughly mapped one-to-one, in a self-evident way, with a corresponding handful of assembly instructions. This is a stark contrast to C#, wherein a single line of code can allocate many objects and have an impact to the surrounding code by introducing numerous silent indirections. For this reason alone, understanding what things cost and paying attention to them is admittedly more difficult – and arguably more important – in C# than it was back in the good ole’ C days. ILDASM is your friend … as is the disassembler. Yes, good systems programmers regularly look at the assembly code generated by the .NET JIT. Don’t assume it generates the code you think it does.

Gratuitous memory allocations

I love C#. I really do. I was reading my "Secure Coding in C and C++" book for fun this weekend, and it reminded me how many of those security vulnerabilities are eliminated by construction thanks to type- and memory-safety.

But the one thing I don’t love is how easy and invisible it makes heap allocations.

The very fact that C++ memory management is painful means most C++ programmers are overly-conscious about allocation and footprint. Having to opt-in to using pointers means developers are conscious about indirections, rather than having them everywhere by default. These are qualitative, hard-to-backup statements, but in my experience they are quite true. It’s also cultural.

Because they are so easy, it’s doubly important to be on the lookout for allocations in C#. Each one adds to a hard-to-quantify debt that must be repaid later on when a GC subsequently scans the heap looking for garbage. An API may appear cheap to invoke, but it may have allocated an inordinate amount of memory whose cost is only paid for down the line. This is certainly not "paying it forward."

It’s never been so easy to read a GB worth of data into memory and then haphazardly leave it hanging around for the GC to take care of at some indeterminate point in the future as it is in .NET. Or an entire list of said data. Too many times a .NET program holds onto GBs of working set, when a more memory conscientious approach would have been to devise an incremental loading strategy, employ a denser representation of this data, or some combination of the two. But, hey, memory is plentiful and cheap! And in the worst case, paging to disk is free! Right? Wrong. Think about the worst case.

Depending on the size of the objects allocated, how long they remain live, and how many processors are being used, the subsequent GCs necessary to clean up incessant allocations may impact a program in a difficult to predict way. Allocating a bunch of very large objects that live for long enough to make it out of the nursery, but not forever, for instance, is one of the worst evils you can do. This is known as mid-life crisis. You either want really short-lived objects or really long-lived ones. But in any case, it really matters: the LINQ example earlier shows how easy it is to allocate crap without seeing it in the code.

If I could do it all over again, I would make some changes to C#. I would try to keep pointers, and merely not allow you to free them. Indirections would be explicit. The reference type versus value type distinction would not exist; whether something was a reference would work like C++, i.e. you get to decide. Things get tricky when you start allocating things on the stack, because of the lifetime issues, so we’d probably only support stack allocation for a subset of primitive types. (Or we’d employ conservative escape analysis.) Anyway, the point here is to illustrate that in such a world, you’d be more conscious about how data is laid out in memory, encouraging dense representations over sparse and pointer rich data structures silently spreading all over the place. We don't live in this world, so pretend as though we do; each time you see a reference to an object, think "indirection!" to yourself and react as you would in C++ when you see a pointer dereference.

Allocations are not always bad, of course. It’s easy to become paranoid here. You need to understand your memory manager to know for sure. Most GC-based systems, for example, are heavily tuned to make successive small object allocations very fast. So if you’re programming in C#, you’ll get less "bang for your buck" by fusing a large number of contiguous object allocations into a small number of large object allocations that ultimately occupy an equivalent amount of space, particularly if those objects are short-lived. Lots of little garbage tends to be pretty painless, at least relatively.

Variable latency and asynchrony

There aren’t many ways to introduce a multisecond delay into your program at a moment’s notice. But I/O can do just that.

Code with highly variable latency is dangerous, because it can have dramatically different performance characteristics depending on numerous variables, many of which are driven by environmental conditions outside of your program’s control. As such it is immensely important to document where such variable latency can occur, and to program defensively against it happening.

For example, imagine a team of twenty programmers building some desktop application. The team is just large enough that no one person can understand the full details of how the entire system works. So you’ve got to compose many pieces together. (As I mentioned earlier, composition can lead to hard-to-predict performance characteristics.) Programmer Alice is responsible for serving up a list of fonts, and Programmer Bob is consuming that list to paint it on the UI. Does Bob know what it takes to fetch the list of fonts? Probably not. Does Alice know the full set of concerns that Bob must think about to deliver a responsive UI, like incremental repaints, progress reporting, and cancellation? Probably not. So Alice does the best she knows how to do: she hits the cache, when the font cache is fully populated, and falls back to fetching fonts from the printer otherwise. She returns a List<Font> object from her API. Now Bob just calls the API and paints the results on the UI; the call appears to be quite snappy in his testing. Unfortunately, when the cache is not populated, the "quick" cache hit turns into a series of network roundtrips with the printer; that hangs the UI, but only for a few milliseconds. Even worse, when the printer is accidentally offline, perhaps due to a power outage, the UI freezes for twenty seconds, because that’s the hard-coded timeout. Ouch!

This situation happens all the time. It’s one of the most common causes of UI hangs.

If you’re programming in an environment where asynchrony is first class, Alice could have advertised the fact that, under worst-case circumstances, fetching the font list would take some time. If she were programming in .NET, for example, she’d return a Task<List<Font>> rather than a List<Font>. The API would then be self-documenting, and Bob would know that waiting for the task’s result is dangerous business. He knows, of course, that blocking the UI thread often leads to responsiveness problems. So he would instead use the ContinueWith API to rendezvous with the results once they become available. And Bob may now know he needs to go back and work more closely with Alice on this interface: to ensure cancellation is wired up correctly, and to design a richer interface that facilitates incremental painting and progress reporting.

Variable latency is not just problematic for responsiveness reasons. If I/O is expressed synchronously, a program cannot efficiently overlap many of them. Imagine we must make three network calls as part of completing an operation, and that each one will take 25 milliseconds to complete. If we do synchronous I/O, the whole operation will take at least 75 milliseconds. If we launch do asynchronous I/O, on the other hand, the operation may take as few as 25 milliseconds to finish. That’s huge.

If I had my druthers, all I/O would be asynchronous. But that’s not where we are today.

The concern is not limited to just I/O, of course. Compute- and memory-bound work can quickly turn into variable latency work, particularly under stressful situations like when an application is paging. So truthfully any abstraction doing "heavy lifting" should offer an asynchronous alternative.

Examples of "bad" optimizations

It is easy to take it too far. Even if you’re shaving off cycles where each-and-every cycle matters, you may be doing the wrong thing. If anything, I hope this article has convinced you to be thoughtful, and to strive to strike a healthy balance between beautiful code and performance.

Anytime the optimization sacrifices maintainability, it is highly suspect. Indeed, many such optimizations are superficial and may not actually improve the resulting code’s performance.

The worst category of optimization is one that can lead to brittle and insecure code.

One example of this is heavy reliance on stack allocation. In C and C++, doing stack allocation of buffers often leads to difficult choices, like fixing the buffer size and writing to it in place. There is perhaps no single technique that, over the years, has led to the most buffer overrun-based exploits. Not only that, but stack overflow in Windows programs is quite catastrophic, and increases in likelihood the more stack allocation that a program does. So doing _alloca in C++ or stackalloc in C# is really just playing with fire, particularly for dynamically sized and potentially big allocations.

Another example is using unsafe code in C#. I can’t tell you how many times I’ve seen programmers employ unsafe pointer arithmetic to avoid the automatic bounds checking generated by the CLR JIT compiler. It is true that in some circumstances this can be a win. But it is also true that most programmers who do this never bothered to crack open the resulting assembly to see that the JIT compiler does a fairly decent job at automatic bounds check hoisting. This is an example where the cost of the optimization outweighs the benefits in most circumstances. The cost to pin memory, the risk of heap corruption due to a failure to properly pin memory or an offset error, and the complication in the code, are all just not worth it. Unless you really have actually measured and found the routine to be a problem.

If it smells way too complicated, it probably is.

In conclusion

I’m not saying Knuth didn’t have a good point. He did. But the "premature optimization is the root of all evil" pop-culture and witty statement is not a license to ignore performance altogether. It’s not justification to be sloppy about writing code. Proactive attention to performance is priceless, especially for certain kinds of product- and systems-level software.

My hope is that this article has helped to instill a better sense, or reinforce an existing good sense, for what matters, where it matters, and why it matters when it comes to performance. Before you write a line of code, you really need to have a budget for what you can afford; and, as you write your code, you must know what that code costs, and keep a constant mental tally of how much of that budget has been spent. Don’t exceed the budget, and most certainly do not ignore it and just hope that wishful thinking will save your behind. Building up this debt will cost you down the road, I promise. And ultimately, test-driven development works for performance too; you will at least know right away once you have exceeded your budget.

Think about worst case performance. It’s not the only thing that matters, but particularly when the best and worst case differ by an order of magnitude, you will probably need to think more holistically about the composition of caller and callee while building a larger program out of constituent parts.

And lastly, the productivity and safety gains of managed code, thanks to nice abstractions, type- and memory-safety, and automatic memory management, do not have to come at the expense of performance. Indeed this is a stereotype that performance conscious programmers are in a position to break down. All you need to do is slow down and be thoughtful about each and every line of code you write. Remember, programming is as much engineering as it is an art. Measure, measure, measure; and, of course, be thoughtful, intentional, and responsible in crafting beautiful and performant code.

9/6/2010 11:38:49 AM (Pacific Daylight Time, UTC-07:00)  #   

 Sunday, July 11, 2010

That immutability facilitates increased degrees of concurrency is an oft-cited dictum. But is it true? And either way, why?

My view on this matter may be a controversial one. Immutability is an important foundational tool in the toolkit for building concurrent – in addition to reliable and predictable – software. But it is not the only one that matters. Making all your data immutable isn’t going to instantly lead to a massively scalable program. Natural isolation is also critically important, perhaps more so. And, as it turns out, sometimes mutability is just what the doctor ordered, as with large-scale data parallelism.

Isolation first; immutability second; synchronization last

Stepping back for a moment, the recipe for concurrency is rather simple. Say you’ve got multiple concurrent pieces of work running simultaneously (or have a goal of getting there); for discussion’s sake, call them tasks. Take two tasks. The first critical decision has two cases: either these tasks concurrently access overlapping data in shared-memory, or they do not. If they do not, they are isolated, and no precautions associated with racing memory updates are needed. If they do share data, on the other hand, then something else must give. If all concurrently accessed data is immutable, or all functions used to interact with data are pure, then dangerous concurrency hazards are avoided. All is well. If some data is mutable, however, then this is where things get tricky, and higher-level synchronization is needed to make accesses safe. This decision tree is straightforward and clear.

I have listed those four attributes – isolated, immutable and pure, and synchronization – in a very intentional order. Thankfully, this order mirrors the natural top-down hierarchical architecture of most modern object- and component-based programs: we have large containers that communicate through well-defined interfaces, each comprised of layers of such containers, and somewhere towards the leaves, a fair amount of intimate commingling of knowledge regarding data and invariants.

This order also reflects the order of complexity and execution-time costs, from least to most. Isolation is simple, because components depend on each other in loosely-coupled ways, and in fact scales superiorly in a concurrent program because no synchronization is necessary, the “right” data structure may be chosen for the job – immutable or not – and locality is part-and-parcel to the architecture. Immutability at least avoids the morass of synchronization, which can affect programs immensely in complexity, runtime overheads, and write-contention for shared data. It is clear that synchronization is something to avoid at all costs, particularly anything done in an ad-hoc manner like locks.

Making the concurrency

But where did all this concurrency come from, anyway?

It came from two things:

  1. The coarse-grained breakdown of a program into isolated pieces.
  2. The fine-grained data parallelism.

On #1: Program fragments that are isolated are already half-way down the road to running concurrently as tasks. The second half of this journey, of course, is teaching them to interact with one another asynchronously, most frequently through message-passing or by sticking them into a pipeline. The details of course depend on what programming language you are using. It may be through agents, actors, active objects, COM objects, EJBs, CCR receivers, web-services, something ad-hoc built with .NET tasks, or some other reification. Nevertheless the isolation is common to all these.

On #2: Data parallelism, it turns out, often works best with mutable data structures. These structures must be partitionable, of course, so that tasks comprising the data parallel operations may operate with logically isolated chunks of this data safely, even if they are parts of the same physical data structure. So chunks of them are isolated even though they don’t appear to be. This is trivially achievable with many important parallel-friendly data structures like arrays, vectors, and matrices. Capturing this isolation in the type system is of course no small task, though region typing gets close (see UIUC’s Data Parallel Java).

But you usually don’t want these structures to be immutable, because they can be modified in constant-time and space if they are their classic simple mutable forms. Programmers doing HPC-style data-parallelism a la FORTRAN, vectorization, and GPGPU know this quite well. Compare this a world where we are doing data-parallelism over immutable data structures, where modifications often necessitate allocations or more complicated big-oh times due to clever techniques meant to avoid such allocations, as with persistent immutable data structures. This is likely less ideal. It is true that some data parallel operations are not in-place against mutable data – as with PLINQ – at which point purity, but not immutability, is key. The two are related but not identical: immutability pervades the construction of data structures, whereas purity pervades the construction of functions. But if you can get by with one copy of the data, why not do it? Particularly since most datasets amenable to parallel speedups are quite large.

Immutability: the bricks, not the mortar

Notice that the concurrency did not actually come from immutable data structures in either case, however. So what are they good for?

One obvious use, which has little to do with concurrency, is to enforce characteristics of particular data structures in a program. A translation lookup table may not have been meant to be written to except for initialization time, and using an immutable data structure is a wonderful way to enforce this intent.

What about concurrency? Immutable data structures facilitate sharing data amongst otherwise isolated tasks in an efficient zero-copy manner. No synchronization necessary. This is the real payoff.

For example, say we’ve got a document-editor and would like to launch a background task that does spellchecking in parallel. How will the spellchecker concurrently access the document, given that the user may continue editing it simultaneously? Likely we will use an immutable data structure to hold some interesting document state, such as storing text in a piece-table. OneNote, Visual Studio, and many other document-editors use this technique. This is zero-cost snapshot isolation.

Not having immutability in this particular scenario is immensely painful. Isolation won’t work very well. You could model the document as a task, and require the spellchecker to interact with it using messages. Chattiness would be a concern. And, worse, the spellchecker’s messages may now interleave with other messages, like a user editing the document. Those kinds of message-passing races are non-trivial to deal with. Synchronization won’t work well either. Clearly we don’t want to lock the user out of editing his or her document just because spellchecking is occurring. Such a boneheaded design is what leads to spinning donuts, bleached-white screens, and “(Not Responding)” title bars. But clearly we don’t want to acquire a lock and then make a full copy of the entire document. Perhaps we’d try to copy just what is visible on the screen. This is a dangerous game to play.

Immutability does not solve all of the problems in this scenario, however. Snapshots of any kind lead to a subtle issue that is familiar to those with experience doing multimaster, in which multiple parties have conflicting views on what “the” data ought to be, and in which these views must be reconciled.

In this particular case, the spellchecker sends the results back to the task which spawned it, and presumably owns the document, when it has finished checking some portion of the document. Because the spellchecker was working with an immutable snapshot, however, its answer may now be out-of-date. We have turned the need to deal with message-level interleaving – as described above – into the need to deal with all of the messages that may have interleaved within a window of time. This is where multimaster techniques, such as diffing and merging come into play. Other techniques can be used, of course, like cancelling and ignoring out-of-date results. But it is clear something intentional must be done.

In conclusion

It is safe to say that immutability facilitates important concurrent architectures and algorithms. It can really help big time, for sure. But it is clearly no panacea. Whether mutability or immutability is the right choice for a particular data structure in your program, as with all things, depends.

It could be the case that choosing a piece-table for storing your text facilitates large-scales of concurrency in version two of your software application, but that in version one you have no use for it. Making that call ahead of time may pay in spades down the road, even if it comes at a marginal cost up-front. Or it could be that choosing an immutable data structure costs you in time and space, and you never end up exploiting the fact that you could have shared that particular structure in a zero-cost way across agents in your program.

One thing’s for sure: I’m glad to be programming in languages like C#, F#, Clojure, and Scala, where I’ve got a choice.

7/11/2010 8:18:30 PM (Pacific Daylight Time, UTC-07:00)  #   

 Thursday, July 01, 2010

In .NET today, readonly/initonly-ness is in the eye of the provider. Not the beholder.

Although both C# and the CLR verifier go to great pains to ensure you don't change a readonly/initonly field outside of its constructor (or class constructor, in the case of a static field), this guarantee doesn't imply what you might imagine. It means what it says: you can't change such fields except for in certain contexts.

If you try, C# won't let you, including forming byrefs to them:

v.cs(0,0): error CS0191: A readonly field cannot be assigned to (except in a constructor or a variable initializer)
v.cs(0,0): error CS0192: A readonly field cannot be passed ref or out (except in a constructor)
v.cs(0,0): error CS0198: A static readonly field cannot be assigned to (except in a static constructor or a variable initializer)
v.cs(0,0): error CS0199: A static readonly field cannot be passed ref or out (except in a static constructor)

And neither will the CLR verifier:

[IL]: Error: [c:\v.exe : C::Main][offset 0x00000001] Cannot change initonly
    field outside its .ctor.

Of course, attempting to invoke an operation on a readonly struct will make a defensive copy locally, and invoke the method against that. This ensures the readonly contents cannot change.

One unfortunate hole in this safety is with unions. You do not need unsafe code to break readonly, and yet the effect is the same as with an unverifiable program that writes to a readonly field:

struct S1 {
    public readonly int X;
}

struct S2 {
    public int X;
}

[StructLayout(LayoutKind.Explicit)]
struct S3 {
    [FieldOffset(0)]
    public S1 A;
    [FieldOffset(0)]
    public S2 B;
}

Now we can change A.X via B.X, even though A.X is supposedly readonly:

S3 s3 = ...;
int x = s3.A.X;
s3.B.X++;
ASSERT(x == s3.A.X); // false; it is +1

The same would have been true even if the field S3.A was marked readonly.

This is quite an evil trick. I have to be honest that I believe this is a CIL verification hole, and should produce unverifiable MSIL much like when you try to overlay structs containing overlapping GC references. Nevertheless, it is what it is.

Let's step back. Why does all of this matter, anyway, and what guarantees were we hoping that readonly would provide?

It would be ideal, I assert, if the guarantee was not just "the target field can only be written to in the constructor", but also "the target field, once read, cannot be observed with a different value later on". This would not be true during construction, but we'd like to say it holds at all other times.

The above example throws a wrench in this idea. As does the following example. But this new example will be more disturbing, because the solution is not a simple verifier change.

What would you expect this program to print to the console?

struct S {
    public readonly int X;

    public S(int x) { X = x; }

    public void MultiplyInto(int c, out S target) {
        System.Console.WriteLine(this.X);
        target = new S(X * c);
        System.Console.WriteLine(this.X); // same? it is, after all, readonly.
    }
}

S s = new S(42);
s.MultiplyInto(10, out s);

As you may or may not have guessed, the output is "42" followed by "420". Yes, the value of 'this.X' changes after we have assigned to 'target' inside MultiplyTo, because the caller aliases the out-param with the 'this' param. Recall that parameter passing for structs in C# is done byref, so that these two references actually physically point to the same location when that call is made. The assignment to 'target', therefore, actually replaces the entire contents of 'this' all at once. And hence this gives the illusion that readonly fields are shifting.

You might be tempted to say that this can be prevented with alias analysis. But this is deceptively difficult to do. Consider this more complicated example:

class C {
    public struct S S;
}

void M1(C c) {
    M2(c, out c.S);
}

void M2(C c, out S s) {
    c.S.MultiplyInto(10, out s);
}

It is in no way clear inside M2 that the two aliases refer to the same location. The aliasing occurred higher up in the stack. Although byrefs are restricted to stack-only passing, making the necessary alias analysis tantalizingly close to attainable, it is nontrivial to say the least. Presumably we would have had to have blocked the forming of the byref within M1, rather than its use within M2. We could fall back to runtime checks, but that is also unfortunate for numerous reasons.

The moral of the story? Structs as containers of readonly values are not to be trusted, at least not for situations that call for bulletproof safety, such as caching values in the compiler rather than rereading them, because the fields are readonly. Although C# and the CLR do a good job at verifying readonly/initonly are done right at the initialization site, there are still places where these guarantees break down. Thankfully the byref aliasing problem does not threaten thread-safety, but the union problem does. And in conclusion, I do have to imagine all of this will get fixed somewhere down the road, it's just a matter of when and where.

7/1/2010 12:41:20 PM (Pacific Daylight Time, UTC-07:00)  #   

 Sunday, June 27, 2010

Partially-constructed objects are a constant source of difficulty in object-oriented systems. These are objects whose construction has not completed in its entirety prior to another piece of code trying to use them. Such uses are often error-prone, because the object in question is likely not in a consistent state. Because this situation is comparatively rare in the wild, however, most people (safely) ignore or remain ignorant of the problem. Until one day they get bitten.

Not only are partially-constructed objects a source of consternation for everyday programmers, they are also a challenge for language designers wanting to provide guarantees around invariants, immutability and concurrency-safety, and non-nullability. We shall see examples below why this is true. The world would be better off if partially-constructed objects did not exist. Thankfully there is some interesting prior art that moves us in this direction from which to learn.

Seeing such a beast in the wild

In what situations might you see a partially-constructed object? There are two common ones in C++ and C#:

  • ‘this’ is leaked out of a constructor to some code that assumes the object has been initialized.
  • A failure partway through an object’s construction leads to its destructor or finalizer running against a partially-constructed object.

In the first case, the rule of thumb is “don’t do that.” This is easier said than done. The second case, on the other hand, is a fact of life, and the rule of thumb is “tread with care, and be intentional.” Let’s examine both more closely.

The evils of leaking ‘this’

Leaking ‘this’ during construction to code that expects to see a fully-initialized object is a terrible practice. Before moving on, it’s important to remember initialization order in C++ and C#: base constructors run first, and then more derived constructors. If I have E subclasses D subclasses C, then constructing an instance of E will run C’s constructor, and then D’s, and then lastly E’s. Destructors in C++, of course, run in the reverse order.

Member initializers, on the other hand, run in different orders in C++ versus C#. In C#, they run from most derived first, to least derived. So in the above situation, E’s initializers run, and then D’s, and then C’s. This happens fully before running the ad-hoc constructor code. In C++, however, member initializers run alongside the ordinary construction process. C’s member initializers run just before C’s ad-hoc construction code, and then D’s, and then E’s. Another difference is that C#’s initializers cannot access ‘this’, whereas C++’s initializers can.

For example, this C# program will print E_init, D_init, C_init, C_ctor, D_ctor, and then E_ctor:

using System;
class C {
    int x = M();
    public C() {
        Console.WriteLine("C_ctor");
    }
    private static int M() {
        Console.WriteLine("C_init");
        return 42;
    }
}

class D : C {
    int x = M();
    public D() : base() {
        Console.WriteLine("D_ctor");
    }
    private static int M() {
        Console.WriteLine("D_init");
        return 42;
    }
}

class E : D {
    int x = M();
    public E() : base() {
        Console.WriteLine("E_ctor");
    }
    private static int M() {
        Console.WriteLine("E_init");
        return 42;
    }
}

class Program {
    public static void Main() {
        new E();
    }
}

And this C++ program will print C_init, C_ctor, D_init, D_ctor, E_init, E_ctor, ~E, ~D, and finally ~C:

#include 
using namespace std;

struct C {
    int x;
    C() : x(M()) { cout << "C_ctor" << endl;   }
    ~C() { cout << "~C" << endl; }
    static int M() { cout << "C_init" << endl; return 42; }
};

struct D : C {
   int x;
    D(): x(M()) { cout << "D_ctor" << endl; }
    ~D() { cout << "~D" << endl; }
    static int M() { cout << "D_init" << endl; return 42; }
};

struct E : D {
   int x;
    E() : x(M()) { cout << "E_ctor" << endl; }
    ~E() { cout << "~E" << endl; }
    static int M() { cout << "E_init" << endl; return 42; }
};

static void main() {
    E e;
}

It’s interesting to note that the CLR allows constructor chaining to happen in any order. The C# compiler emits the calls to base as the first thing a constructor does, but other languages can choose to do differently. The verifier ensures that a call has occurred somewhere in the constructor body before returning.

This IL program, for example, will print E, D, and then C – the reverse of what C# gives us:

.assembly extern mscorlib { }
.assembly ctor { }

.class C {
  .method public specialname rtspecialname instance void .ctor() cil managed {
    ldstr      "C"
    call       void [mscorlib]System.Console::WriteLine(string)
    ldarg.0
    call       instance void [mscorlib]System.Object::.ctor()
    ret
  }
}

.class D extends C {
  .method public specialname rtspecialname instance void .ctor() cil managed {
    ldstr      "D"
    call       void [mscorlib]System.Console::WriteLine(string)
    ldarg.0
    call       instance void C::.ctor()
    ret
  }
}

.class E extends D {
  .method public specialname rtspecialname instance void .ctor() cil managed {
    ldstr      "E"
    call       void [mscorlib]System.Console::WriteLine(string)
    ldarg.0
    call       instance void D::.ctor()
    ret
  }
}

.class Program {
  .method public static void Main() cil managed {
    .entrypoint
    newobj     instance void E::.ctor()
    pop
    ret
  }
}

So why is leaking ‘this’ bad, anyway?

Say you’ve decided in the implementation of D’s constructor that you would like to stick ‘this’ into a global hash-map. Doing this sadly means other code could grab the pointer and begin accessing it before E’s constructor has even run. That is a race at-best and a ticking time-bomb in all likelihood. For example:

class C {
    public static Dictionary s_globalLookup;
    private readonly int m_key;
    public C(int key) {
        m_key = key;
        s_globalLookup.Add(key, this);
    }
}

Even though we have taken great care to initialize our readonly field m_key before sticking ‘this’ into a dictionary, any subclasses will not have been initialized at this point. Only if C is sealed can we be assured of this. Another piece of code that grabs the element out of the hashtable and begins calling virtual methods on it, for example, is in a race with the completion of the initialization code for subclasses.

Leaking ‘this’ isn’t always such an explicit act. Merely invoking a virtual method within the constructor may dispatch a more derived class’s override before the more derived class’s constructor has run. And therefore its state is most likely not in a place conducive to correct execution of that override. It is fairly common knowledge that invoking virtual methods during construction is an extraordinarily poor practice, and best avoided.

Failure to construct

Let’s move on to the second issue. Imagine we suffer an exception during construction of an object. Perhaps this is due to a failure to allocate resources, or maybe even due to argument validation. It is clear that a leaked ‘this’ object in such cases will be a problem, because the object will have escaped into the wild even though its initialization failed. Subsequent attempts to use the object will obviously pose problems. What is more subtle is that if the class in question declares a destructor (in C++) or finalizer (in C#), a problem may now be lurking.

Let’s say we have the original situation shown above: C derives from D derives from E. Now say an exception happens within D’s constructor. At this point in time, C’s constructor has run to completion, D’s constructor has run partially up to the point of failure, and E’s has not run at all. (And, of course, in the case of C#, all member-initializers for all classes have actually run.) What happens to the cleanup code?

In C++, only constructors that have run will have their corresponding destructors executed. In the above situation, where C, D, and E each declares a destructor, only C’s will be run during stack unwind. It is imperative, therefore, that D handles failure within its constructor rather than relying on the destructor. For example:

class D : C {
    int* m_pBuf1;
    int* m_pBuf2;
public:
    D() {
        m_pBuf1 = … alloc …;
        m_pBuf2 = … alloc …;
    }
    ~D() {
        if (m_pBuf2) … free …;
        if (m_pBuf1) … free …;
    }
}

If the allocation destined for m_pBuf2 fails by throwing an exception, the destructor for D will not run, and therefore m_pBuf1 will leak. The C++ solution to this particular example is to use smart pointers and member initializers for the allocations, because successfully initialized members do get destructed, even when the constructor (or indeed one of the member initializers) subsequently fails. This means that destructors for a particular class do not have to tolerate that class’s state not having been fully constructed, because those destructors will never run. Destructors must not, however, invoke virtuals, for two obvious reasons: (a) subclasses may not have ever been initialized, and (b) destructors run in reverse order, and so the destruction code for the subclass will have already run by the time a baseclass’s destructor runs.

In C#, finalizers will run, regardless of whether an object’s constructor ran fully, partway through, or not at all. If the object is allocated – and so long as GC.SuppressFinalize isn’t called on it – the finalizer runs. This distinctly means that C# finalizers must always be resilient to partially-constructed objects (unlike C++ destructors). Thankfully finalizers are a rare bird, and therefore this issue is seldom even noticed by .NET programmers.

This issue does not arise in the case of .NET’s IDisposable pattern. If a constructor throws, the assignment to the target local variable does not occur. And therefore the variable enclosed in, say, a using statement remains null. This means that there is no way to possibly invoke Dispose on the object. Moreover, the allocation in using occurs prior to entering the try/finally block. Hence, you really had better be writing constructors that don’t throw and/or protecting such resources with smart-pointer-like things with finalizers, a la SafeHandle.

Impediments to language support

As if these weren’t sufficient cause for concern, I also mentioned earlier – and somewhat vaguely – that partially-constructed objects interfere with language designers’ efforts to add invariants, immutability and concurrency-safety, and non-nullability to the language. And all of these are important properties to consider in our present age of complexity and concurrency, so this point is worth understanding more deeply. Let’s look at each in-order.

Invariants and safe-points

A partially-constructed object obviously may have broken invariants. By definition, invariants are meant to hold at the end of construction, and so if construction never completes, the rules of engagement are being broken.

Imagine, for example:

class C {
    int m_x;
    int m_y;
    invariant m_x < m_y;
    public C(int a) {
        m_x = a;
        m_y = a + 1;
    }
}

It is ordinarily very difficult to ensure that each instruction atomically transitions the state of an object from one invariant safe-point to another. A common technique is to define well-defined points at which invariants must hold. We might model each failure point as one such technique. But even in C#, the above program does not satisfy this constraint, because an overflow exception may be generated at the ‘m_y = a + 1’ line. Or a thread-abort exception may be generated right in the middle of those two functions. Or, if addition were implemented as a JIT helper, an out-of-memory exception could get generated due to failure to JIT the helper function.

In such cases, we’d like to say that the object does not exist. But the sad fact is that the object *does* exist, and if the object has acquired physical resources at the time of failure to construct, we must compensate and reclaim them. The ideal world looks a lot like object construction as transactions, where the end of construction is the commit-point. The state-of-the-art is very different from this, however, and so any static verification and theorem proving that depends on invariants on an object holding, well, invariantly, is subject to being broken by partially-constructed objects.

Immutability… or not

Immutability is also threatened by partially-constructed objects. Immutability is a one of many first class techniques for solving concurrency-safety in the language, so this one is quite unfortunate.

In C#, for example, we might be tempted to say that a shallowly immutable type is one whose fields are all marked readonly. (And a deeply immutable type is one whose fields are all readonly, and also refer to types that are immutable.) A readonly field cannot be written to after construction has completed. Unfortunately, if the ‘this’ leaks out during construction, we may see those readonly fields in an uninitialized or even changing state:

class C {
    public static C s_c;
    readonly int m_x;
    public C() {
        m_x = 42;
        s_c = this;
        while (true) {
            ++m_x;
        }
    }
}

This is quite evidently a terrible and malicious program. C appears to be immutable, because it only contains readonly fields, but is quite clearly not, because the value of m_x is assigned to multiple times. If we had a guarantee that all readonly fields were definitely assigned once-and-only-once before ‘this’ can escape, then we’d be close to a solution. But of course we have no such guarantee. In C#, at least.

A related issue is co-initialization of objects. This is interesting and relevant, because in such cases we actually want to leak out partially-constructed objects. Imagine we’d like to build a cyclic graph comprised of two nodes, A and B, each referring to the other. With a naïve approach to immutability, we simply cannot make this work. Either A must first refer to B, in which case A refers to a partially-constructed B; or B must first refer to A, in which case B refers to a partially-constructed A. Both are equally as bad. The two assignments are not atomic.

Cyclic data structures are a commonly cited weakness of immutability, and an argument in favor of supporting partially-instantiated objects in a first class way, although there are approaches that can work. One example is to separate edges from nodes, and represent them with different data structures. We can then build the nodes A and B, and then build the edges A->B and B->A without needing to use cycles.

It’s not-null, or at least it wasn’t supposed to be

Tony Hoare called it his billion-dollar mistake: the introduction of nulls into a programming language. I think he sells himself short, however, because the absence of nulls in an imperative programming language – however worthy a pursuit – is actually a notoriously difficult to attain.

Spec# is one example of a C-style language with non-nullability, in which T! represents a “non-null T”, for any T. Although this is done in a pleasant way conducive to C#-compatibility – a significant goal of Spec# -- I’d personally prefer to see the polarity switched: T would mean “non-null T” and T? would mean “nullable T”, for any T, reference- and value-types included. This is much more like Haskell’s Maybe monad, and is the syntax I’ll use below for illustration purposes. But I digress.

Non-nullability is a wonderful invention, because it is common for 75% or more of the contracts and assertions in modern programs to be about a pointer being non-null prior to dereferencing it, both in C# and in C++. Relying on the type-system to prove the absence of nulls is one big step towards creating programs that are robust and correct out-of-the-gate, particularly for systems software where such degrees of reliability are important. And it cuts down on all those boilerplate contracts sprinkled throughout a program. Instead of:

void M(C c, D d, E e)
    requires c != null, d != null, e != null
{
    … use(c, d, e) …
}

You simply say:

void M(C c, D d, E e)
{
    … use (c, d, e) …
}

No opportunity to miss one, and no need for runtime checks. It’s beautiful.

A problem quickly arises, however, having to do with partially-constructed objects. All of an object’s fields cannot possibly be non-null while the constructor is executing, because the object has been zero’d out and the assignments to its fields have not yet been made. Clearly constructor code needs to be treated “differently” somehow. We cannot simply live with the fact that ‘this’ escaping leads to a partially-constructed object leaking out into the program, because that could lead to serious errors. These serious errors include potential security holes, if unsafe code is manipulating the supposedly non-null pointer. One advantage to adding non-nullability is that runtime null checks can be omitted, because the type system guarantees the absence of nulls in certain places. In this situation, partially-constructed objects lead to holes in our nice type system support. Either the dynamic non-null checks are required as back-stop, or we’ll need some other coping technique.

There are related issues with non-nullability, like with partially-populated arrays. Imagine we’d like to allocate an array of 1M elements of type T, and we will proceed to populate those elements following the array’s allocation. There’s clearly a window of time during which the array contains 1M nulls, and then 1M-1 nulls, …, and if we finish 1M-1M nulls, i.e. 0 nulls. It is only at that last transition that the array can be considered to contain non-null T’s. The standard technique is to use an explicit dynamic conversion check, or to force the creation of the list to supply all of the elements of the array at construction time.

Coping techniques

There are, thankfully, some interesting ways to cope with partially-constructed objects. There is, in fact, a spectrum of techniques, ranging from escape analysis in various forms, to limitations around how objects are constructed such that a partially-constructed one can never leak, to automatic insertion of dynamic checks to prevent the use of partially-constructed objects, to static annotations that treat partially-constructed instances as first class things in the type system. And of course there’s always the technique of “deal with it”, which is the one that most C++-style languages have chosen, including our beloved C#.

The F# approach: restrictions and dynamic checks

F#, it turns out, does a very good job in preventing partially-initialized objects. A first important step is that fields in F# are readonly by-default, unless you opt-in to mutability using the mutable keyword. Therefore data structures are mostly immutable. And the construction rules are meant to make it very unlikely that you’ll expose a partially-constructed object during construction. How so? It’s simple: such fields must be initialized prior to running ad-hoc construction code, and if you attempt to initialize them multiple times, the compiler supplies an error. In other words, you really have to work hard to screw yourself, unlike C++ and C# where it’s very easy.

It’s of course possible to do some dirty tricks to publish or access a partially-initialized object, despite needing to work very hard at it. There is, however, a nice surprise awaiting us when we try. For example:

type C() =
    abstract member virt : unit -> unit
    default this.virt() = ()

type D() as this =
    inherit C()
    do this.virt()

type E =
    inherit D
    val x : int
    new() = { x = 42; }
    override this.virt() = printf "x: %d" this.x

let e = new E()

This example attempts to perform a virtual invocation from C before the more derived class has been fully initialized. This overridden virtual simply (attempts to) prints out the value of x. If we compile and run this program, however, we will notice that we get an exception: “InvalidOperationException: the initialization of an object or value resulted in an object or value being accessed recursively before it was fully initialized.”

Pretty neat. The compiler will stick in the checks necessary when ‘this’ is being accessed, to dynamically verify that an object is not being leaked before having been fully initialized. The F# approach can be summed up as trying to make things airtight as possible statically at compile-time, but admitting that there are holes – primarily due to inheritance – and dealing with them by inserting dynamic runtime checks. This tradeoff clearly makes sense for F#, because it is attempting to attain a robust level of reliability around immutability.

F# also adds non-nullability for the most part. Like Haskell’s Maybe monad, F# adds an option type that can store a single None value which lies outside of the underlying type’s domain to effectively represent null. Because F# is a .NET language it of course also needs to worry about nulls at interop boundaries with other languages like C# and VB. This is a great step forward; first class CLI support would be a nice next step.

A slight variant of the F# idea is to initialize data up the whole class hierarchy in one pass, and then run ad-hoc construction code in a second pass in the usual way. So long as readonly data can be initialized without running the ad-hoc construction code, this helps to statistically cut down on the chances for accidental leaking of ‘this’.

Type system: T versus notconstructed-T

We can model two kinds of T in the type-system: T and notconstructed-T. The constructor for any type T would then see the ‘this’ pointer as an notconstructed-T, and everything else – by default – sees ordinary T’s.

What good does this distinction do? It enables us to add verification and restrictions around the use of notconstructed-T’s and limitations to the conversion between the two types. See this paper by Manuel Fahndrich and Rustan Leino for an example of how this approach was taken in Spec#’s non-nullability work.

For example, we can prohibit conversion between T and notconstructed-T altogether, thereby disallowing escaping ‘this’ references altogether. If the type of ‘this’ within a constructor is different than all other references to type T, and they are not convertible, we’ve successfully walled the problem off in the type system. This protects against erroneous method calls, so that a constructor could not call any of its own methods, because these methods expect an ordinary T whereas the constructor only has a notconstructed-T. And because you cannot state notconstructed-T in the language, you cannot let one leak by storing it into a field.

We could add more sophisticated support, by allowing a programmer to explicitly tag certain non-field references as T-notconstructed. This makes the concept first-class in the language, and allows one to explicitly declare the fact that code is interacting with a partially-constructed T:

class C
{
    int m_x;
    public C() {
        m_x = V();
    }
    protected virtual int V() notconstructed {
        … I know to be careful …
    }
}

In this example, the programmer has annotated V with ‘notconstructed’. This enables the call from the constructor because the method’s ‘this’ is an uninitialized-T, and serves as a warning to the programmer that he or she should take care, much like the code written inside a constructor.

We must also decide whether fields are offlimits via notconstructed-T’s. If yes, we can add F#-style dynamic checks for initialization, but only for attempted accesses against notconstructed object’s fields. This is nice because it means the scope of dynamic checks are limited, and used in a pay-for-play manner. And we could even enable a programmer to sidestep the error by stating at the use-site that they understand a particular field access may be to uninitialized memory, like Field.ReadMaybeUninitialized(&m_field).

To be honest, the reason this approach has likely not yet seen widespread use is that the cost is not commensurate with the benefit. At least, in my opinion. To make something like partially-constructed objects a first-class concept in a programming language, programmers would need to want to know where they are dealing with them. For systems programmers, this makes sense. For many other programmers, it would be useless ceremony with no perceived value. And yet the initial approach where nothing new needed to be stated, but yet escaping ‘this’ was prevented, blocks certain patterns of legal use. This is where theory and practice run up against one another. There is, however, presumably a nice middle ground awaiting discovery.

Winding down

I meant for this to be a short post. But the topic really is fascinating, and has been coming up time and time again as we do language work here at Microsoft. But it is truly fascinating mainly because, like nulls, the problem is widespread yet tolerable, and most C++ and C# programmers learn the rules and make do. Partially-constructed objects are a major blemish, but not a crisis that threatens the complete abandonment of imperative programming.

I do believe language trends indicate that more will move away from C++-style object initialization and related issues, and more towards immutability and treating data and its initialization separately from imperative ad-hoc initialization code. Haskell, F#, and Clojure, for example, show us some promising paths forward. There are a plethora of other attempts at solving related problems, and I unfortunately could only scratch the surface.

Although these techniques are not new, the primary question – to me – is how close to “the metal” in systems programming these concepts can be made to work. Typically for those situations, you need to rely on a mixture of static verification and complete freedom, because the dynamic checking is too costly and the code to work around overly-limiting static verification also adds too much cost. But as soon as you add complete freedom into the picture, you run into partially-constructed objects as a consequence, and all the issues I’ve mentioned above.

Anyway, I hope it was interesting.

6/27/2010 12:52:29 PM (Pacific Daylight Time, UTC-07:00)  #   

 Sunday, May 16, 2010

My article about Transactional Memory (TM) was picked up by a few news feeds recently.

If I had known this would occur, I would have written it with greater precision.  Because my article presents a mixture of technical challenges interspersed among more subjective and cultural issues, I am sure it is difficult to tease out my intended conclusion.  To summarize, I merely believe adding TM to a shared memory architecture alone is insufficient.

Indeed, I remain a big fan of transactions.  Atomicity, consistency, and isolation, and coming up with strategies for achieving all three in tandem, are part and parcel to architecting software.

After watching Barbara Liskov's OOPSLA Turing Award reprise, I decided to reacquaint myself with some old Argus papers this weekend.  It has been some time since I last read them.  Argus was Liskov's language for distributed programming and her follow-on to CLU.  As with most research done by brilliant people, the work was way ahead of its time, has appeared in ad-hoc incarnations and permutations over time, and remains relevant today.  This research is particularly interesting to work that my team is doing right now, especially its notion of guardians.  And it is relevant to the TM discussion too.

The Argus approach of using isolation to coarsely partition state and operations into independent bubbles, and then communicating asynchronously between the so-called guardians that are responsible for this state, is an architecture that is common among most highly concurrent programs.  This aids state management and fault tolerance.  Argus makes an interesting observation that, although guardians may be sent messages concurrently -- and indeed activities themselves may even introduce local concurrency -- manipulation of state can be done safely and even in parallel thanks to transactions.

The requirement is that messages are atomic and commute.  Transactions, it turns out, are a convenient way of implementing this requirement.

You will observe a similar architecture in other places, including in some languages that have adopted TM.  Haskell has moved in this direction.  Everything is purely functional and so, of course, no state is mutated in an unsafe way by default.  However, with the introduction of concucrrency comes mutable cells for message passing and with parallelism comes indeterminism.  You can push the state management problem up indefinitely, but at the top there are almost always mutable operations on real-world state (even if it is "just I/O").  Haskell programs have a safe architecture to begin with, and it is the intentional and careful addition of specific facilities that forces one to focus on the problematic seams.  One could say that Haskell starts clean and stays clean, versus most shared memory-based languages which start dirty and try to attain cleanliness (at least when it comes to concurrency).

Why aren't transactions sufficient, then, given that the I in ACID stands for Isolation?  You wouldn't model a database as one flat table in which each row is a single byte, however, would you?  As you begin to decompose your program into isolated state, your bubbles (or guardians) are the tables, and your objects are the rows.  This is just an analogy but I find it useful to think in these terms.  Taking a bunch of intermingled state and pouring transactions on top is not going to give you this nicely partitioned separation of state which has proven to be the lifeblood of concurrency.

Even once you've attained a more isolated architecture, however, transactions are not a panacea.  They are just one of many viable state management techniques in a programmer's arsenal, hierarchical state machines being another notable example.  And in fact, many of the problems I mentioned in the TM article are still worrisome even when you start from the right place.  From within a guardian, you may wish to enlist the aid of another unrelated guardian to perform a coordinated atomic activity, because a higher level invariant relationship between them must be preserved.  Or an application which composes multiple guardians may wish to do so atomically.  Even Argus required manual compensation for such things.  This can be solved in part by DTC.  But experience suggests that continuing to push the enlistment scope one level higher eventually leads to substantial problems.  A topic for another day, I suppose.

My primary conclusion is that TM is a great complement to highly concurrent programs, but only so long as you start from the right place.  The Argus and Haskell approaches are conducive to large-scale concurrency, but it is primarily because of the natural isolation those models provide; the addition of transactions address problems that remain after taking that step.  But without that first step, they would have gotten nowhere.

5/16/2010 9:48:58 PM (Pacific Daylight Time, UTC-07:00)  #   

 Sunday, April 25, 2010

We use static analysis very heavily in my project here at Microsoft, as a way of finding bugs and/or enforcing policies that would have otherwise gone unenforced.  Many of the analyses we rely on are in fact minor extensions to the CLR type system, and verge on “effect typing”, an intriguing branch of type systems research that has matured significantly over the years.

Many of these annotations are done on methods, rather than types.  A few examples include:

  1. [MayBlock] indicates that a method is free to call methods that might block.
  2. [NoAllocations] indicates that a method is neither allowed to allocate, nor call another method that might allocate
  3. [Throws(...)] indicates that a method is allowed to throw an exception of a type in the set { … }, or call other methods that may throw exceptions in the set { … }.
  4. And so on.

It turns out there’s a general way for handling these annotations.  And indeed, you will quickly find that pursuing ad-hoc solutions to each independently leads to troubles.  We shall briefly look at the generalization.

We must first observe that each falls into one of two categories: additive or subtractive.  MayBlock and Throws are additive.  They say what is permitted.  NoAllocations, on the other hand, is subtractive, because the annotation declares what is not permitted.  This distinction, we shall see, is crucial.

First we can imagine that each distinct effect shown above has a distinct effect type.

The types EMayBlock, ENoAllocations, and EThrows correspond to the annotations above.  This will permit us to model effects using subtyping polymorphism.  We will use the usual notation, i.e. “S <: T” means “S is a subtype of T”, or “a S is substitutable in place of a T”.  For example, String <: Object.  Throws is special, because it has a type hierarchy of its own beneath the root type.  As you might guess, this hierarchy is infinite in size and is comprised of each possible permutation of exception types.

There are two special kinds of effects: the null effect (ENil), and a set of other effects (EMany).  The latter permits us to create a new, unique effect type merely by concatenating a list of other effect types.

Each method is then given an EMany effect type containing its full set of effect types.  For example:

[MayBlock, Throws(typeof(FileNotFoundException)), NoAllocations]
void M() { … }

Is given the distinct effect type EMany { EMayBlock, EThrows(typeof(FileNotFoundException)), ENoAllocations }.

We should make one generalization before moving on.  ENil ~ EMany { }.  In other words, having no effects is equivalent to a list of no effects.  Furthermore, EMany { } ~ EMany { ENil }.  In other words, having a list of no effects is equivalent to having no effects.

Now we are ready to weave everything together.  The main question confronting us is as follows: What is the subtyping relationship between the various effect types, including the null and list types?

The easiest to do away with is the EMany type.  Given two EMany types E and F, then E <: F if, for all effects T in E’s type set, there exists an effect type U in F’s type set such that T <: U.  In simpler terms, a list is a subtype of another list so long as all of its components are also subtypes of a component of the other.  This is very abstract, but we shall see soon why it is useful.

Now we get to see why the additive and subtractive distinctions are so important:

Given an additive effect type EAdditive, we say ENil <: EAdditive.

Given a subtractive effect type ESubtractive, we say ESubtractive <: ENil.

The first statement says that a method with no effects is substitutable for a method with additive effects, and the second says that a method with subtractive effects is substitutable for a method with no effects.  The corollaries are perhaps just as important.  A method with additive effects cannot take the place of a method with no effects, whereas a method with subtractive effects can.

For the simple single-effect case, effects depicted in this way represent points on a line, where ENil is zero, subtractive effects are negative integers, and additive effects are positive integers.  The lattice obviously becomes rather complicated as many effects accumulate.

Where does substitutability come up with respect to methods, anyway, you may ask?  The first is in determining which other methods can be called.  If a method M with effects E is trying to call another method N with effects F, this is permitted so long as F <: E.  The next is in virtuals and overriding.  A virtual with effects E may be overridden by a method with effects F so long as F <: E.  The following example illustrates this idea, in addition to the composition of the subtyping rules we have shown so far:

class C
{
    public virtual void M() {}
    [MayBlock, NoAllocations]
    public virtual void N() {}
}

class D : C
{
    [MayBlock, NoAllocations]
    public override void M() {}
    public override void N() {}
}

In this example, the four methods are given the following effect types:

  • C::M gets EMany { ENil }, or just ENil.
  • C::N gets EMany { MayBlock, NoAllocations }.
  • D::M gets EMany { MayBlock, NoAllocations }.
  • D::N gets EMany { ENil }, or just ENil.

What does all this gibberish mean?  Well it’s straightforward and intuitive, actually.

We are attempting to add the MayBlock and NoAllocations effects to the overridden M method which has none.  Because MayBlock is additive, this is illegal (someone might call C::M thinking the code will not block), whereas it is OK for NoAllocations (calls through D::M are assured no allocations will happen, even though calls through C::M are guaranteed no such thing).  Similarly, we are attempting to remove both effects from the overridden N method.  Because MayBlock is additive, this is OK (M isn’t required to block, even though calls through C::M may suspect it of doing so), whereas it is decidedly not OK for NoAllocations (calls through C::M will reasonable assume allocations do not happen, whereas D::M would be free to perform them).  It may take some thought to convince yourself that this is correct, but I hope that you find that it is.  All of this works because of the subtyping of effect types.

All of this works similarly with delegates.  The source delegate signature is akin to the base class in the above example, whereas the target method being bound to is like the override.

Things get a little more complicated when considering the EThrows effect.  It is additive, so it is of course true that ENil <: EThrows(*).  However, what if we have two different EThrows, and wish to inquire about substitutability of one in place of the other?  We can come up with a simple rule that is general purpose for all set-of-type kinds of effects.  Namely, consider two instances A and B of the same effect type:

Given an additive effect type EAdditive, then A <: B if, for all types T in A’s set-of-types, there exists a type U in B’s set-of-types such that T <: U.

Given a subtractive effect type ESubtractive, then A <: B if, for all types T in A’s set-of-types, there exists a type U in B’s set-of-types such that U <: T.

These sound quite similar, except that they end differently (i.e. T <: U vs. U <: T).  We may illustrate the additive case with EThrows; to illustrate the subtractive case, let us imagine we can declare a ENoAllocations effect type that specifies which precise types may not be allocated:

class A{}
class B : A{}

class C
{
    [Throws(typeof(Exception)), NoAllocations(typeof(A))]
    public virtual void M() {}
    [Throws(typeof(FileNotFoundException)), NoAllocations(typeof(B))]
    public virtual void N() {}
}

class D : C
{
    [Throws(typeof(FileNotFoundException)), NoAllocations(typeof(B))]
    public override void M() {}
    [Throws(typeof(Exception)), NoAllocations(typeof(A))]
    public override void N() {}
}

The results should not be surprising.  D::M overrides C::M’s exception list, by being more specific and declaring that FileNotFoundException is thrown instead of just Exception.  This is OK.  Whereas D::N overrides C::N’s list by being more general purpose, specifying Exception instead of FileNotFoundException.  This is clearly not OK.  The NoAllocations type works in exactly the reverse.  D::M attempts to prohibit allocations of B, but this is merely one possible subtype of the base method C::M’s declaration of A, and therefore this is illegal.  Whereas D::N ensures no instances of A are allocated, which of course subsumes the base method C::N’s declaration that no B’s are allocated.

Everything gets a little more interesting when you consider generics.  For example, how would we type a general purpose Map method?  (This pattern arises quite frequently.)  We would presumably want it to somehow “acquire” the effects of the delegate it invokes on all elements in a list.  For example:

U[] Map<T, U>(T[] input, Func<T, U> func);

This declaration is stronger than necessary.  The Func<T, U> class – prepackaged with the .NET Framework – does not have any effects on it.  So it may not, for example, bind to a method that has any additive effects like Throws on it.  This is rather unfortunate.

To solve this we could imagine treating effects with parametric polymorphism:

[Effects(E)]
U Func<T, U, [EffectParameter] E>(T x);

This fictitious syntax merely says that Func can be instantiated with an effect type E, and that the Func “method” itself acquires the effect E.  (Admittedly I should stop using faux-attribute syntax for illustrations since we’ve reached this level of language integration.)  Now Map can be declared as such:

[Effects(E)]
U[] Map<T, U, [EffectParameter] E>(T[] input, Func<T, U, E> func);

This says that Map has the same effects as the Func that is supplied as an argument.  It turns out that we may want to extend this further, by enabling symbolic manipulation of effects.  We may wish, for example, to specify that the Func is not allowed to block, by stating it does not have [MayBlock] in it.  You could imagine using something very similar to generic constraints to achieve this.  It is also interesting to allow concatenation of multiple effect types, both through partial and full specialization.  For example, Map above may clearly have effects of its own.  You also tend to want generic constraints like, 'where E : F', which of course just depends on the aforementioned subtyping rules.  And of course C# 4.0's co- and contravariance can be applied to effects too.

Anyway, I have probably gone beyond most readers’ interest level in this subject.  Things sure do get very interesting when you allow symbolic manipulation of effects.  They get even more interesting when you begin to think of types as having “permissible effects” attached to them.  However, the main thing I wanted to point out with this brief article is that this pattern arises quite frequently.  And despite everyone struggling through what seem to be odd corner cases as they develop ad-hoc solutions, there really is a sound generalization behind it all.  Many languages have first class effect typing, and I have found it liberating to think of many of these type system annotations through that lens.  Perhaps you shall too.

4/25/2010 9:35:15 AM (Pacific Daylight Time, UTC-07:00)  #   

 Sunday, February 28, 2010

Simon Peyton Jones was in town a couple weeks back to deliver a repeat of his ECOOP’09 keynote, “Classes, Jim, but not as we know them. Type classes in Haskell: what, why, and whither”, to a group of internal Microsoft language folks. It was a fantastic talk, and pulled together multiple strains of thought that I’ve been pondering lately, most notably the common thread amongst them.

In the talk, he compared polymorphism in Java-like languages (including C# which I will switch to referring to over Java hereforth) with ML and Haskell. In other words, how does a programmer commonly write code in each language that is maximally reusable? Of course, C# programmers primarily achieve this through subclassing, whereas functional programmers rely on type parameterization. Over the years, however, the former group has begun to borrow a great deal from the latter; as evidence, witness the growingly-pervasive use of generics in both Java and C# over the past decade. The talk was given mainly through the lens of this evolution, which appears to approach an interesting limit if projected far enough into the future.

Type classes came on the scene towards the end of the 1980’s, and immediately became a fertile seed for research and exploration in the relationship between subclass and parametric polymorphism. Type classes are much closer to subclass polymorphism than Haskell’s borrowed ML-style, which is to say parametric polymorphism. This is most intriguing because Haskell does not rely on subclassing, and so the mixture of two breeds new patterns.

I thought that it might be interesting to compare the mixture of subclass and parametric polymorphism in Haskell vis-à-vis type classes with the same in C# vis-à-vis a mixture of interfaces, generics, and generic constraints. Hence this post. We shall proceed by examining some basic type classes in Haskell with their equals in C#. Though similar, the dissimilarities are as stark as the similarities. And the lack of higher kinds -- particularly when combined with type classes -- means that some Haskell patterns simply are not expressible in C#.

The Simple Case: Equality (or Lack Thereof)

The most basic type class of all is Eq, which allows the comparison of two like-typed pieces of data. This may seem like a commodity if you ordinarily write code in languages like Java and C# which have a strong notion of object identity. In Haskell, however, equality is value equality over algebraic data types rather than objects, so polymorphism over equality operators is quite a bit more important. Indeed, as we shall see, Haskell’s approach is more powerful than == in Java-like languages. (Witness the neverending dichotomy between reference and value equality vis-à-vis Object.Equals in C#.) But alas, let us proceed by crawling in a series of logical steps, rather than leaping to the conclusions.

Haskell’s Eq type class is defined as such:

class Eq a where

(==), (/=) :: a -> a -> Bool

x /= y = not (x == y)

x == y = not (x /= y)

As you see, Eq provides two operators: == and /=. Default implementations of each define == as the inverse of /= and /= as the inverse of ==. Not only is this a convenience, but it also specifies the desired contract implementations ought to abide by. Other types may become members of the Eq class by mapping the one or both operators to type-specific functionality. You will immediately recognize the similarity to virtual methods in OOP languages, where the operators can be overridden by subclasses.

Of course all of the primitive data types already implement Eq, so you get value equality over numbers, strings, etc. Imagine we declared a new Coords type – comprised of two integers – and want to make it a member of Eq also – wherein equality is determined by a pairwise comparison of each’s members:

data Coords = Coords { fst :: Integer, snd :: Integer }

We make Coords a member of the Eq type class, and thereby define equality over instance, through the ‘instance Eq Coords where’ construct. This maps type class functions to real implementation functions. The example here defines them inline, though you may of course refer to existing functions instead:

instance Eq Coords where

(Coords fst1 snd1) == (Coords fst2 snd2) = (fst1 == fs2) && (snd1 == snd2)

Now we can take a ‘[Coords]’ and ask whether a particular ‘Coords’ exists within it.

A function may constrain a type variable to a certain class, and thereby access members of that class. For example, the following ‘isin’ function tests whether an instance of some type ‘a’ is contained within a list of type ‘[a]’. To do this, it demands that ‘a’ is a member of Eq using the syntax “Eq a =>”:

isin :: Eq a => a -> [a] -> Bool

x `isin` [] = False

x `isin` (y:ys) = x == y || (x `isin` ys)

The moral equivalent to the Eq type class in C# is not so easy to decide. The most obvious first guess is
the built-in == and != operators. However, we will quickly find that this is not quite right, because these operators are not polymorphic in C#. To illustrate this point, let’s try to write the ‘isin’ method in C#, using generics and the == operator, for example:

bool IsIn<T>(T x, T[] ys)

{

foreach (T y in ys) {

if (x == y)

return true;

}

}

This function will not compile. The reason is that == and != in C# are not defined over all types (specifically not for value types). You can get IsIn to compile by restricting the T to a reference type:

bool IsIn<T>(T x, T[] ys) where T : class

{

… same as above …

}

Although this code is deceptively similar to the Haskell example, it is actually quite different. The == used to compare two instances compiles into the MSIL CEQ operator, effectively hard-coding an object identity comparison. Even if an overloaded == operator for a particular instantiated T is available, the compiler will not bind to it. Why? Because it is overloading and specifically *not* overriding. For example, say we had a MyData type and an overloaded == operator for comparing two instances:

class MyClass

{

public static bool operator ==(MyClass a, MyClass b) { return true; }

public static bool operator !=(MyClass a, MyClass b) { return false; }

}

According to this, all MyClass objects are equal. However, the following call yields the answer ‘false’:

IsIn<MyClass>(new MyClass(), new MyClass[] { new MyClass() });

The same problem arises should instances of MyClass get referred to by Object references. == and != do not perform any kind of virtual dispatch; the selection of implementation is chosen statically.

Perhaps it is the Equals method inherited from System.Object, then? This, at least, is virtual. And indeed, this gets much closer to Eq. Any type may override Equals, and a generic definition defined in terms of it dispatches virtually and allows subclasses to change behavior on a type-by-type basis:

bool IsIn<T>(T x, T[] ys)

{

foreach (T y in ys) {

if (x == y || (x != null && x.Equals(y)))

return true;

}

return false;

}

(Even this is slightly different, because it assumes a certain type-agnostic behavior about nulls.)

This is cheating, however. We’ve taken advantage of the fact that someone thought to put an Equals method on System.Object, thereby giving all Ts such a method. There are clearly limits to how many crosscutting things can be added to System.Object before it becomes overwhelmed with concepts, not to mention the size (e.g. v-tables). Moreover, Equals on Object is weakly typed; a better solution is to use interfaces, like the IEquatable<T> interface that introduces a strongly typed Equals method:

public interface IEquatable<T>

{

bool Equals(T other);
}

And to use a generic type constraint on IsIn’s T, much more akin to what ‘isin’ in Haskell above did:

bool IsIn<T>(T x, T[] ys)

where T : IEquatable<T>

{

foreach (T y in ys) {

if (x == y || (x != null && x.Equals(y)))

return true;

}

return false;

}

This is cheating a little less, because we can implement an interface after-the-fact without impacting a class’s type hierarchy. This, in fact, looks remarkably similar to the Haskell ‘isin’ shown earlier, using type classes and parametric polymorphism, where here we have used interfaces in place of type classes.

We might be tempted to define a default NotEquals method over all IEquatable<T> instances, just like Haskell does by implementing the defaults for == and /= as the inverse of each other:

public static class Equatable

{

public static bool NotEquals<T>(this IEquatable<T> @this, IEquatable<T> other)

{

return !this.Equals(other);

}

}

This is not perfect. It is not polymorphic; see my previous post for an extensive discussion of this and related points. And what about nulls? If '@this' is null, the default implementation is going to AV. We’d need to bake in type-agnostic knowledge of null again. Sigh!

Sadly, it turns out this whole approach in general isn’t quite right anyway. For two reasons:

  • First, we still infect the type in question with the interface being implemented; it cannot be done completely outside of the type’s definition, as with type classes.
  • Second, type classes in Haskell do not actually require a value of the type in question to dispatch against the class’s functions, whereas we clearly do in the above example: we need to virtually dispatch against the object, and rely on this virtual dispatch to execute different code for each type. This will come up as we look at the numeric classes, but it is a critical difference.

A closer analogy is to use IEqualityComparer<T>:

public interface IEqualityComparer<T>

{

bool Equals(T x, T y);

}

(IEqualityComparer<T> in .NET also has a GetHashCode method on it. Let’s ignore that for now.)

Unfortunately, if our IsIn method were to use IEqualityComparer<T> to do its job, callers would be required to pass an instance explicitly; we cannot infer a “default” comparer based solely on the T:

bool IsIn<T>(T x, T[] ys, IEqualityComparer<T> eq)

{

foreach (T y in ys) {

if (eq.Equals(x, y))

return true;

}

return false;

}

Type classes actually function rather similarly, with two major differences:

  1. The interface object – called a dictionary – is passed and used implicitly.
  2. The mapping from types to dictionaries is done implicitly, whereas in .NET you’ll need to find an instance of the interface in question through other means.

This second difference is solved by a little hack in .NET. If you take a look at the EqualityComparer<T>.Default property, you shall see a lot of slightly gross reflection code to return an instance of IEqualityComparer<T> for any arbitrary T. The code checks some well-known types and conditions, and ultimately falls back to the aforementioned interfaces and default Equals method for the most general case. It’s not pretty, but it’s a beautiful hack given the tools at our disposal in C#.

A Harder Case: Polymorphic Numbers, on Output Parameters

The Eq type class is easy. The functions it defines are polymorphic on their inputs, but not on their outputs; both == and /= return Bool values. Once we transition to polymorphic output parameters or return values, we encounter a pattern quite different from that which is found in most .NET interfaces.

Let’s illustrate these differences by looking at Haskell’s Num type class:

class (Eq a, Show a) => Num a where

(+), (-), (*) :: a -> a -> a

negate :: a -> a

abs, signum :: a -> a

fromInteger :: Integer -> a

Here we see another feature of Haskell type classes: inheritance. Num derives from both Eq and Show – indicated by “(Eq a, Show a) => Num a” – the latter class of which we have not yet shown but is the moral equivalent to .NET’s Object.ToString method. It enables pretty printing of values, clearly something that would be expected to be common among all numeric data types. Haskell’s numeric class hierarchy is quite elegant, enabling highly polymorphic computations. A nice little tutorial of can be found here: http://www.haskell.org/tutorial/numbers.html.

But the question at hand is what the C# equivalent would be.

Our first approach would be to mimic the IEquatable<T> solution above:

interface INumeric<T>

{

T Add(T d);

T Subtract(T d);

T Multiply(T d);

T Absolute();

T FromInteger(int x);

}

This works fine, and primitive types in .NET could presumably implement it:

struct int : INumeric { .. }

struct float : INumeric { .. }

struct double : INumeric { .. }

This enables polymorphic code, like a Sum method, through the use of generic type constraints:

public static T Sum<T>(params T[] values)

where T : INumeric<T>

{

T accum = default(T);

foreach (T v in values)

accum = v.Add(accum);

return accum;

}

This example works great. Why then, you might wonder, doesn’t LINQ use this instead of providing special-case overloads of Average, Min, Max, Sum, etc. for all well-known primitive data types?

The primary reason is the performance hit taken to perform addition through O(N) interface calls versus O(N) MSIL ADD instructions. It is just a basic fact of life that today’s leading edge separate compilation techniques will not achieve parity with the hand specialized variants. While it is true that the JIT compiler *could* specialize the code for specific Ts and specific interfaces to emit more efficient instructions, like int, float, etc. over INumeric<T> calls, it will not do so today. This reduces the ability to share code – which admittedly is what we want here – and is tangled up in a judgment call based on heuristics. But I digress.

There is a larger problem that arises with other examples, at least from a language expressiveness point-of-view: the need to have an instance in hand to invoke interface methods. FromInteger, for example, is rather awkward to write. In fact, we cannot write a method with INumeric<T> like we could in Haskell:

public static T MakeT<T>(int value)

where T : INumeric<T>

{

… ? …

}

How do we invoke FromInteger, given that no T is available at the time of MakeT’s invocation? You can’t; you need to arrange for an instance to be available. There are ways out of this corner. One solution is to mandate that T has a default constructor:

public static T MakeT<T>(int value)

where T : INumeric<T>, new()

{

return new T(value).FromInteger(value);

}

That is always acceptable for structs, since they always have such a constructor; but this practice requires that classes be designed to possibly not hold invariants at all times, and so is not always acceptable or at the very least requires design accommodation.

The alternative is probably obvious. Use a similar approach to IEqualityComparer<T>:

interface INumericProvider<T>

{

T Add(T x, T y);

T Subtract(T x, T y);

T Multiply(T x, T y);

T Absolute(T x);

T FromInteger(int x);

}

And now, of course, each method that does polymorphic number crunching must accept an instance of INumericProvider<T>. That’s particularly cumbersome, so it’s more likely that .NET developers would prefer the aforementioned approach, where the type must provide a default constructor.

Admittedly, I seldom run into this particular problem in practice; but when I do, I really wish I had something like Haskell type classes to help me out.

Before moving on, it is worth pointing out one Haskell type class problem that explicit interface object passing in .NET helps to avoid. Should you need multiple implementations of a given class for the same type, as is relatively common with equality comparisons, you must disambiguate in Haskell by separation by module and being careful about what you import. This is similar to C#’s extension methods. With explicitly passed interface objects, however, it is trivial to manage and pass separate objects if you’d like.

Close, but No Cigar: Higher Kinds

There is one last feature that Haskell provides – a pretty big one, I might add – that C# simply cannot do: higher kinded types, or polymorphism over constructed types. This feature is orthogonal to type classes, but gets used pervasively in conjunction with them. An example will make this stunningly clear:

class Monad m where

(>>=) :: m a -> (a -> m b) -> m b

(>>) :: m a -> m b -> m b

return :: a -> m a

fail :: String -> m a

m >> k = m >>= \_ -> k

fail s = error s

Let’s try to transcribe the core of this class in C#, renaming >>= to Bind, and omitting the >> and ‘fail s’ operators because they have default implementations:

public interface IMonad<M, A>

{

M<B> Bind<B>(M<A> m, Func k);

M<A> Return(A a);

M<A> Fail(string s);

}

This approach is tantalizingly close. It suffers from the already-admitted problem that, for any M<A> instance, you will need to pass the appropriate IMonad<A> provider object – just as with the IEqualityComparer<T> and INumericProvider<T> examples above.

But the code of course won’t *actually* compile, because the type variable M cannot be constructed as shown here. We find references to M<A> and M<B>, which are complete nonsense to C#. M is just a plain type variable. M is required to be what Haskell calls a type constructor (* -> *), which is a generic type that must be instantiated before it is a terminal type. I’ve written about this before. Although it seems like a trivial omission in C#’s language definition, it strikes at the heart of the type system.

A fictitious syntax for expressing this in C# might be:

public interface IMonad

where M : <>

{

...

}

And if, say, M were expected to be a two- or three-parametered type, we would find, respectively:

... where M : <,>

... where M : <,,>

And so on.

This could in theory work. But C# -- and more worrisome .NET and the CLR – do not support this presently, and, to be quite honest, likely never will. It is immensely powerful, however. Life without monads is a life destined to continuous repetition. The “LINQ Pattern”, for example, is one example case in .NET where, for each ‘source’ type, we must create a “copy” of the original System.Linq.Enumerable variant. And shame on those who wish to write polymorphic code that will work for any LINQ provider.

Winding Down

Let’s wind down. I need to go grab dinner at Mama's Fish House on Maui right now.

I hope to have shown some of the similarities and dissimilarities between type classes and interfaces, and some patterns that arise when these things are mixed with parametric polymorphism. The mix of inheritance for type classes, but not for implementation types, in Haskell is unique. C#, of course, allows inheritance both amongst interfaces and implementations which is both a blessing and a curse.

I do think both camps have something to teach one another. For example, having a default interface lookup mechanism for arbitrary types in C# would be wonderful, and indeed might provide a replacement for extension methods that has more longevity. I’m sure much of this will happen with time; either “in place” as the respective languages evolve, or as new languages are created with time.

But most importantly, I hope that the blog post was educational and fun. Enjoy.

2/28/2010 8:52:59 PM (Pacific Standard Time, UTC-08:00)  #   

 Tuesday, February 09, 2010

One of my comments in the 2nd edition of the .NET Framework design guidelines (on page 164) was that you can use extension methods as a way of getting default implementations for interface methods.  We've actually begun using these techniques here on my team.  To illustrate this trick, let's rewind the clock and imagine we were designing new collections APIs from day one.

Let's say we gave the core interfaces the most general methods possible.  These may neither be the most user friendly overloads nor the ones that most people use all the time.  They would, however, be those from which all the other convenience methods could be implemented.  An INewList<T> interface that was designed with these principles in mind may look like this:

public interface INewList<T> : IEnumerable<T>

{

    int Count { get; }

    T this[int index] { get; set; }

 

    void InsertAt(int index, T item);

    void RemoveAt(int index);

}

This interface is missing all the nice convenience methods you will find on .NET's IList<T>, like Add, Clear, Contains, CopyTo, IndexOf, and Remove.  So it's not really as nice to use.  You can't write an API that takes in an INewList<T> and performs an Add against it, for example, like you can with IList<T>.

One approach to solving this might be to write a concrete class -- much like .NET's System.Collections.ObjectModel.Collection<T> -- that provides concrete implementations of all of these methods, and then other lists can simply subclass that.  But we can do better.

Instead, let's give INewList<T> default implementations of all of these methods.  How do we do this?  That's right: with extension methods.  Voila!

public static class NewListExtensions

{

    public static void Add<T>(this INewList<T> lst, T item)

    {

        lst.InsertAt(lst.Count, item);

    }

 

    public static void Clear<T>(this INewList<T> lst)

    {

        int count;

        while ((count = lst.Count) > 0) {

            lst.RemoveAt(count - 1);

        }

    }

 

    public static bool Contains<T>(this INewList<T> lst, T item)

    {

        return lst.IndexOf(item) != -1;

    }

 

    public static void CopyTo<T>(this INewList<T> lst, T[] array, int arrayIndex)

    {

        for (int i = 0; i < lst.Count; i++) {

            array[arrayIndex + i] = lst[i];

        }

    }

 

    public static int IndexOf<T>(this INewList<T> lst, T item)

    {

        var eq = EqualityComparer<T>.Default;

        for (int i = 0; i < lst.Count; i++) {

            if (eq.Equals(item, lst[i])) {

                return i;

            }

        }

        return -1;

    }

 

    public static bool Remove<T>(this INewList<T> lst, T item)

    {

        int index = lst.IndexOf(item);

        if (index == -1) {

            return false;

        }

 

        lst.RemoveAt(index);

        return true;

    }

}

Well isn't that neat.  We've now given any INewList<T> implementations all these common methods without dirtying their class hierarchies, built atop a tiny core of extensibility.  This is much like .NET's Collection<T> which exposes the core as abstract methods.  Indeed, we can go even further.  Any convenience overloads, like the multitude of CopyTos on List<T> in .NET, can be given to all INewList<T>'s also.  And yet implementing INewList<T> remains as braindead simple as it was before: two properties and two methods.  In fact, it's simpler than doing a more feature-rich IList<T>, because the convenience methods come for free.

It would be even niftier if you could add these methods straight onto INewList<T>, and have the C# compiler emit the extension methods silently for you.  In other words:

public interface INewList<T> : IEnumerable<T>

{

    ... interface methods (as above) ...

 

    void Add(T item)

    {

        InsertAt(Count, item);

    }

 

    void Clear()

    {

        int count;

        while ((count = Count) > 0) {

            RemoveAt(count - 1);

        }

    }

 

    ... and so on ...
}

Although this would just be sugar for the NewListExtensions class shown earlier, it sure saves some typing and makes it the pattern more apparent and first class.

Though cool, this whole idea is certainly not perfect.

For one, there are no extension properties.  So you can't use this trick for properties.

But the more obvious and severe downside to this approach that these methods are not specialized for the given concrete type.  For example, the Clear method is potentially far less efficient than a hand-rolled List<T>, because it does O(N) RemoveAts rather than a single O(1) fixup of the count.

Recall now that the compiler binds more tightly to instance methods than extension methods.  So we could implement our own little list class with a faster Clear method if we'd like:

class MyList : INewList<T>

{

    ... the two properties and two methods from INewList<T> ...

 

    public void Clear()

    {

         .. efficient! ...
    }

}

Now when someone calls Clear on a MyList<T> directly, the compiler will bind to the efficient Clear.

This is still not perfect.  If you pass the MyList<T> to an API that takes in an INewList<T>, any calls to Clear will fall back to the extension method.  Extension methods are not virtual in any way.  You can try to simulate virtual dispatch, but it gets messy quick.  For example, say we defined an IFasterList<T> that includes all those convenience methods that lists frequently want to make faster; we can then do a typecheck plus virtual dispatch in the extension method.

For now, let's pretend that's just the Clear method:

public interface IFasterList<T> : INewList<T>

{

    void Clear();

}

Of course, MyList<T> above would now implement IFasterList<T>.  Invocations through IFasterList<T> will automatically bind to the faster variant; but if objects that implement IFasterList<T> get passed around as IList<T>s, you lose this ability.  So the Clear extension method can now do a typecheck:

    public static void Clear<T>(this INewList<T> lst)

    {

        IFasterList<T> fstLst = lst as IFasterList<T>;

        if (fstLst != null) {

            fstLst.Clear();

            return;

        }

 

        int count;

        while ((count = lst.Count) > 0) {

            lst.RemoveAt(count - 1);

        }

    }

This works but is obviously a tedious and hard-to-maintain solution.  It would be neat if someday C# figured out a way to "magically" reconcile virtual dispatch and extension methods.  I don't know if there is a clever solution out there.  I am skeptical.  Nevertheless, despite this flaw, the above techniques are certainly thought provoking and interesting enough to play around with and consider for your own projects.  And at the very least, it's fun.  Enjoy.

2/9/2010 6:21:49 PM (Pacific Standard Time, UTC-08:00)  #   

 Friday, January 08, 2010

Sometimes you need to wait for something before proceeding with a computation.

Perhaps you need to know the value of some integer that is being computed concurrently.  Maybe you need to wait for the bytes to flush to disk before telling another process the file is consistent and ready to read.  Or you need to get that next row back from the database before painting it on the UI.  It could be that you need to wait for the missile to leave the bay before closing the bay door.  And so on.

And sometimes there’s simply nothing better to do while waiting for these things to happen other than to let the CPU halt (or let other processes on the machine run).  You need to twiddle your thumbs a bit, and exhibit a little patience.  Or at least your program does.  This is simply an unfortunate fact of life.

This manifests numerous ways in our programming models:

        1) Waiting on an event.
        2) Waiting to acquire an already-held lock.
        3) Finding that the GUI message queue is empty and doing a MsgWaitForMultipleObjectsEx.
        4) Finding that the COM RPC queue is empty and doing a CoWaitForMultipleHandles.
        5) Issuing an Ada rendezvous ‘accept’ and finding that no messages await you, thus blocking.
        6) Issuing an Erlang ‘receive’ and finding that no messages await you, thus blocking.
        7) Waiting on a .NET 4.0 task.
        8) Issuing a ContinueWith on a .NET 4.0 task.
        9) And so on.

There are three big distinctions to make about the characteristic nature of this waiting: namely, (1) what condition's establishment is being sought -- i.e. the reason for the wait, (2) whether multiple such conditions of interest may be waited on simultaneously, and, related, (3) whether waiting for said condition(s) necessarily means that the processing of some other conditions that may arise elsewhere, but require the blocked context to run, cannot occur.

I will be the first to admit that this statement is rather abstract.  But it really does matter.

For example, MsgWaitForMultipleObjectsEx is a pumping wait.  Not only do you wait for the occurrence of one of several events to get set, but the arrival of a new top-level message at the message queue (either GUI or COM RPC-related) causes immediate processing of that message, presuming the thread is blocked at that call at the time.  Although you can be deeply nested in some complicated code, you “jump” to the event loop to run the message handling code.  Vanilla WaitForMultipleObjectsEx works in a similar way vis-à-vis APCs, provided the wait is alertable.  This is quite different from a fully blocking non-pumping wait, which only waits for one or more very specific events, but does not dispatch messages simultaneously.

Win32 esoterica aside, the concepts appear elsewhere.  The moral equivalent in Ada or Erlang is to do a selective-accept or -receive, intentionally not dispatching certain messages that might arrive in the meantime.  (To be fair, you can also do this in COM with message filters.)  This often happens when you nest accepts and receives.  You may be capable of processing messages A-Z at the top-level tail recursive loop; but if that nested accept only knows about message kinds M and N, then there are 24 other kinds that will not be picked up in the meantime.

Not pumping for messages is dangerous.  And it can lead to deadlock if you pump for the wrong ones.  Like if you’re accepting M or N, yet the triggering of M or N depends on first processing some message K waiting in the queue.  COM RPCs with cycles run face first into this.  And/or not pumping can lead to responsiveness and scalability problems.  Perhaps M or N eventually does arrive, yet little old K needs to wait an indeterminate amount of time before it is seen.  Whereas we could have overlapped its processing.  This is why most STAs pump while waiting, and, similarly, why many Erlang processes consist of a main loop that is prepared to handle any message the process accepts at that top level loop.  They may seem very different but they are strikingly not.

Yet paradoxically pumping for messages is also dangerous.  You must predict all the kinds of messages that may reentrantly get executed, and your state at the point of the blocking call must be consistent enough to tolerate them.  (At least those that will actually happen.)  In COM STAs, this can be wholly unpredictable and indeed because the CLR auto-pumps on STAs the blocking points can be hidden.  Overly aggressively admitting messages may seem like the right thing to do, until you’ve wedged yourself into some unforeseen inconsistent state.  You can avoid this by making each message handler atomic; see Argus.  But if you can't or don't have the discipline to do that, or aren't quite sure, you must not pump.  You either avoid pumping altogether or you selectively pump messages that do not touch the state encapsulated by the pump.  Or you lock access to state with a non-recursive lock and run the risk of deadlock.

I have found it clarifying to think about blocking in event loop concurrency and state machine terms, advancing from one state to the next in between waits.  It’s a slippery model, but particularly when working in message passing systems that employ event loops, it can help to identify all the familiar problems with shared memory, blocking, and consistency.

Indeed it is interesting how blocking and non-blocking systems can rapidly approach each other.  Starting from either extreme tempts you to tiptoe closer and closer to the middle.  The familiarity of the other extreme tempts you.  Until, alas, you just might meet in the middle.

1/8/2010 12:11:39 AM (Pacific Standard Time, UTC-08:00)  #   

 

RSS 2.0

Me
 

Joe Send mail to the author(s) is an architect and developer on a systems incubation project at Microsoft.

Recent

Search

Browse

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2013, Joe Duffy