Personal Info:
Joe leads the architecture of an experimental OS's developer platform, where
he is also chief architect of its programming language. His current mission is to enable
writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe
founded the Parallel Extensions to .NET project.
He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife,
writing books, writing music,
studying music theory & mathematics, and doing anything involving food & wine.
Disclaimer:
The contents of this site are my own personal opinions and do
not represent my employer's view in any way.
© 2012, Joe Duffy
 Tuesday, November 04, 2008
Type classes, kinds, and higher-order polymorphism are among Haskell’s most distinctive and important contributions to the world of programming languages. They are all related, and began life as type classes in Wadler and Blott’s 1988 paper, How to make ad-hoc polymorphism less ad hoc. Later, Jones introduced the (then separate) concept of constructor classes in his 1993 paper, A system of constructor classes: overloading and implicit higher-order polymorphism. Eventually these two ideas were unified into a single, beautiful set of features (namely, type constructors and kinds) in Haskell.
In this short essay, I’ll explain what these things are and why I’m sad that we don’t have them in C#.
To take the simplest motivating example, say we want to define a generic square function:
square x = x * x
Given a Hindley-Milner type system (with type inference), how should the compiler type this function? The challenge that immediately arises is that, to know the type of x and the function’s return value, we must know something about the function * being called within the body of square. But to know something about that function, we’d need to know the type of x. We’ve entered into a cycle, and have hit a wall. Clearly the type will be something generic, but polymorphic on what?
Imagine that we could infer the type of the * function as follows:
(*) :: a -> a -> b
In other words, * is a function that takes two values, both of type a, and produces some value of another type b. We know its two arguments must be of the same type because in square we pass the same value x to it twice. Given this typing for *, we could then type square similarly as:
square :: a -> b
In other words, square takes a single value of type a and produces a value of type b. The constraint on the type a here is, of course, that some function * is available that is typed as taking an a as input. There’s no obvious way to capture this in the type system, though we might conceive of something like:
square :: (* :: a -> a -> b) => a -> b
In other words, given a type a for which some function * is defined, which takes two a’s and returns a single b, the type of square thus takes an a and produces a b. You can’t say that in Haskell, although we’ll see a bit later that type classes allow similar constraints (with “=>”) to be written.
While this hypothetical typing is extremely general purpose, it would produce considerable challenges in its implementation. Standard ML throws up its hands and resolves overloaded arithmetic operators (like *) to a default type when nothing else pins them down. That default is int, so both a and b above are inferred as int, and square is of type int -> int. Similarly, F# assumes you’re working with ints. Both Standard ML and F# have amazingly rich type inference systems, but this begins to run right up against the limits of what they can do. We’ll see some harder examples shortly.
You can probably guess that Haskell’s solution to this conundrum is to use higher order polymorphism with a feature of its type system called type classes. They allow us to classify types much in the same way types ordinarily classify objects. We can classify the set of numeric types as follows, for instance:
class Num a where
    (*) :: a -> a -> a
    … other numeric operations …
And then we can go ahead and provide concrete mappings for integers and floating point numbers:
instance Num Int where
    (*) = mulInt
    …
instance Num Float where
    (*) = mulFloat
Each instance of the type class (in this case, Num) is a bit like a dictionary mapping the named functions (in this case, just *) to other functions that are defined for the concrete type (in this case, supplied in a’s stead). With this information defined, the Haskell compiler can now infer the type of square as:
square :: Num a => a -> a
This inference really just says that the function square is defined for all types a that are in the type class Num. The “Num a =>” part is a bit like a C# generic type constraint, in that it restricts what kinds of a’s can be supplied. Given what has been stated thus far, that’s just Int and Float. So we can only call the square function with types on which multiplication is properly defined, which is exactly what we want.
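To make this concrete, here is a minimal, self-contained sketch of the class and instances above. It is an illustration, not the Prelude's actual definition: the class is named MyNum and its operation mul (both hypothetical names) so that they don't clash with the real Num and (*), and the instances simply delegate to the built-in multiplication instead of to type-specific primitives.

```haskell
-- A cut-down stand-in for the Prelude's Num class (names are hypothetical).
class MyNum a where
  mul :: a -> a -> a

-- Each instance supplies the concrete function for one type.
instance MyNum Int where
  mul = (*)

instance MyNum Float where
  mul = (*)

-- No type signature is given on purpose: the compiler infers
--   square :: MyNum a => a -> a
square x = mul x x
```

Loading this into GHCi and asking for `:type square` shows the inferred constraint, and `square (3 :: Int)` evaluates to 9. Calling square on a type with no MyNum instance, such as Bool, is rejected at compile time.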
At this point, we might want to try defining a similar thing in C# using generics. (And for this simplistic example, and others like Haskell’s Eq a type class, we will succeed.) There are two basic ways we could achieve this. The first is to define an INum<T> interface (or abstract class—pick your poison), and give it an instance method to multiply the target with another number:
interface INum<T> {
    T Mult(T x);
}
We would then have the basic numeric data types like Int32 and Float implement INum<T>:
struct Int32 : INum<Int32> {
    public Int32 Mult(Int32 x) { return value * x; }
    …
}
struct Float : INum<Float> {
    public Float Mult(Float x) { return value * x; }
    …
}
Given these definitions, it would be a breeze to write a Square method that only operates on INum<T>s:
T Square<T>(T x) where T : INum<T> { return x.Mult(x); }
Thankfully, we can recursively reference the T from within the generic type constraint.
Now, of course, there’s no way the C# compiler would infer the necessary INum<T> constraint. But given that we don’t have rich type inference (aside from for local variables) in C#, this doesn’t pose any new problems. Another slight annoyance is that you need to modify the source type to declare support for INum<T>, when a perfectly reasonable implementation could have been provided “from the outside,” but you’ll find that this will only occasionally get under your skin.
The second way we might go about this is to take an approach similar to .NET’s EqualityComparer<T> class, where we have an abstract base class that represents the ability to do something with instances of Ts. And then we only provide implementations on concrete Ts for which that ability makes sense. For example, we could have a Multiplier<T> that looks a lot like INum<T>:
abstract class Multiplier<T> {
    public abstract T Mult(T x, T y);
}
Multiplier<T> on its own isn’t usable. But we can provide implementations for Int32 and Float:
class Int32Multiplier : Multiplier<Int32> {
    public override Int32 Mult(Int32 x, Int32 y) { return x * y; }
}
class FloatMultiplier : Multiplier<Float> {
    public override Float Mult(Float x, Float y) { return x * y; }
}
// And so on …
Now we can write a slightly different Square method that takes a Multiplier<T> as an extra argument:
T Square<T>(T x, Multiplier<T> m) { return m.Mult(x, x); }
Now there isn’t any kind of generic type constraint on Square’s T, but of course we can only call it if we have a concrete instance of Multiplier<T> in hand. And by definition that means there is a Mult method defined that we can call. (This isn’t wholeheartedly true. You can of course call Square<U> for any U, passing in null as the second argument. But presumably the method would check for null and throw. This is a real limitation, however, which would likely push us back in the direction of the original interface solution. If we had non-null types, we could get closer to a fully statically verifiable solution.)
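The Multiplier<T> approach amounts to passing the type class "dictionary" around by hand. A rough Haskell rendering of the same idea is sketched below. The record type Multiplier, its field mult, and the values intMultiplier and floatMultiplier are names chosen for this sketch, not anything from a standard library.

```haskell
-- An explicit "dictionary" holding the multiplication operation for one type.
newtype Multiplier a = Multiplier { mult :: a -> a -> a }

intMultiplier :: Multiplier Int
intMultiplier = Multiplier (*)

floatMultiplier :: Multiplier Float
floatMultiplier = Multiplier (*)

-- There is no class constraint on a: the caller must supply the dictionary.
squareWith :: Multiplier a -> a -> a
squareWith m x = mult m x x
```

Here `squareWith intMultiplier 4` evaluates to 16. The difference from the type class version is purely who supplies the dictionary: with a Num-style constraint the compiler picks the instance, while here the caller passes it explicitly.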
Aside from a lot more typing, and the lack of rich type inference, we seem to have reached parity. The simple examples provided in the literature and Haskell’s Standard Prelude can be implemented in such a fashion. But we are kidding ourselves if we think these are the same thing.
The main problem is that C# doesn’t support higher-kinded type parameters. We haven’t yet seen a type class in Haskell that fully exploits this capability, but there are several. The simplest one I know about in the Haskell Standard Prelude is the Functor type class. (Monad is also a great example, but it is more complicated, and sufficiently frightening, that it will be a topic for another day.) Functor’s definition is:
class Functor f where
    fmap :: (a -> b) -> f a -> f b
The Functor type class offers a single function, fmap. It takes two things—a function that transforms a value of type a into a value of type b and some functor value of type f a—and returns some new functor value of type f b. This looks like an ordinary type class, except for one funny (and subtle) aspect. Functor abstracts over type f, but notice that we’re using f in fmap’s second argument and return type by actually constructing it with two other types a and b! In case you’re having a hard time thinking in Haskell, it’s as though we tried to write this in C# using our interface trick from earlier:
interface IFunctor<T> {
    T<B> FMap<A, B>(Func<A, B> f, T<A> a);
}
This won’t compile. We can’t refer to T in the typing of FMap as T<B> and T<A>: it’s not expressible in C# and .NET’s type system. Let’s pretend for a moment, however, that we could. What is an example of a class that might implement this? How about something that deals in terms of Nullable<T> instances?
class NullableFunctor<T> : IFunctor<Nullable<>> {
    Nullable<B> FMap<A, B>(Func<A, B> f, Nullable<A> a) {
        return new Nullable<B>(f(a.Value));
    }
}
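For comparison, here is what a working Haskell counterpart might look like. Nullable below is a hypothetical type defined just for this sketch (the Prelude's Maybe already fills this role), and unlike the pretend C# above it maps an empty value to an empty value instead of reading the value unconditionally.

```haskell
-- A hypothetical stand-in for Nullable<T>; the Prelude's Maybe plays this role.
data Nullable a = Null | Value a
  deriving (Eq, Show)

-- The class parameter f is instantiated to the type constructor Nullable
-- itself, not to a fully applied type such as Nullable Int.
instance Functor Nullable where
  fmap _ Null      = Null
  fmap g (Value x) = Value (g x)
```

With this instance, `fmap (+1) (Value 2)` evaluates to `Value 3` and `fmap (+1) Null` stays `Null`. The instance head names Nullable with no argument, which is exactly the thing the C# attempt could not express.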
All you need to do is take a close look at a 1997 paper by Simon Peyton Jones, Mark Jones, and Erik Meijer, entitled Type classes: an exploration of the design space, and you will find a plethora of even more complicated (and useful) examples that use an innocent-sounding aspect of Haskell’s type system called multi-parameter type classes. All of the types are higher-order and are merely moved around and manipulated like abstract (higher-order) symbols. The type system gracefully gets out of the way and allows you to drop abstract type parameters into any holes they fit in, without mandating that you say too much. The secret sauce—as noted earlier—is kinds.
Kinds are used in the implementation of Haskell’s type system, and you won’t see them mentioned a whole lot anywhere. They basically categorize what kind of types can appear anywhere a type is expected. A great overview (with plenty of context) can be found in Mark P. Jones’s Functional Programming with Overloading and Higher-Order Polymorphism paper and, of course, the Haskell 98 Report.
Here’s a quick rundown. Kinds appear in one of two forms:
- the symbol * represents a concrete type (a.k.a. a monotype), and,
- if k1 and k2 are kinds, then k1 -> k2 is the kind of types that take a type of kind k1 and return a type of kind k2.
Kinds are formed in many ways: the primitive types (such as Char, Int, Float, Double, etc.) are an example of the former, and are of kind *. They “bottom out.” Type constructors like Maybe, however (and more generally any f that can be made an instance of Functor), are an example of the latter, and are of kind * -> *. That is, they take a type of kind k1 (the first *) and produce a type of kind k2 (the second *). By giving some concrete type T (*) to Maybe, we get back Maybe T (also *). A type constructor is therefore a bit like a function mapping one type to another. The function type constructor (->) has a kind of * -> * -> *, because a function type is built from two types: the type of its argument (the first *) and the type of its return value (the second *). These compose, so that you might have (* -> *) -> * -> *. And so on. Thinking about kinds can take a bit of getting used to.
But the really useful thing here is that kinds allow you to write higher-order abstractions over type constructors like those we have begun to explore above, like Functors and Monads. I.e., given a type t1 of kind k1 -> k2, and a type t2 of kind k1, then t1 t2 is a type expression of kind k2. This can be applied to the occurrences of f a and f b in Functor’s fmap function. In the class Functor f, f is of kind * -> *, while a and b are of kind *, so f a and f b are each of kind *. When a concrete Functor instance is specified, e.g., by substituting a type constructor T for f, fmap’s second argument and result become T a and T b. T on its own still expects another type before bottoming out, and supplying some concrete U and V types for a and b gives T U and T V, both of kind *.
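The kind rules can be made visible with explicit kind annotations. The sketch below assumes GHC with the KindSignatures extension, where the kind * is spelled Type (from Data.Kind). The names Apply, unApply, and example are hypothetical and exist only for this illustration.

```haskell
{-# LANGUAGE KindSignatures #-}
import Data.Kind (Type)

-- Apply has kind (* -> *) -> * -> *: it takes a type constructor f and a
-- type a, and wraps a value of the applied type f a.
newtype Apply (f :: Type -> Type) (a :: Type) = Apply (f a)

unApply :: Apply f a -> f a
unApply (Apply x) = x

-- Maybe has kind * -> * and Int has kind *, so Apply Maybe Int has kind *:
-- both parameters have been supplied and the type has bottomed out.
example :: Apply Maybe Int
example = Apply (Just 42)
```

In GHCi, `:kind Apply` and `:kind Apply Maybe Int` let you inspect these kinds directly. Writing `Apply Int Int` is rejected by the kind checker, because Int is not of kind * -> *.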
Now we’re done. And, as if by magic, it all works.
 Sunday, November 02, 2008
A few months back, while writing my new book, I whipped together a tool to dump information about your processor layout using the GetLogicalProcessorInformation function from C#. You can find the code snippet in Chapter 5, Advanced Threads, of my book. (A developer on the Windows Core OS team, Adam Glass, had also written a similar tool in C++.) I will be posting code to the companion site for my book in the coming weeks, at which point you can easily get your hands on it.
Anyway, I sent the code to Mark Russinovich suggesting it might make a useful SysInternals tool, and he agreed. Now it's up on microsoft.com for download, under the name of Coreinfo: http://technet.microsoft.com/en-us/sysinternals/cc835722.aspx. When run, Coreinfo pretty-prints information about the mapping from cores to sockets, cores to NUMA nodes, and what kinds of caches are shared on the machine. Particularly for somebody like me who is always running code on different kinds of machines -- and given that parallel code performance heavily depends on memory hierarchy -- I've found this tool to be invaluable. Enjoy.
 Friday, October 31, 2008
Dan Grossman invited me to deliver a talk as part of the University of Washington's Computer Science and Engineering Colloquia series. It was recorded and will eventually air on UWTV, but has also been posted online:
Microsoft's Parallel Computing Platform: Applied Research in a Product Setting
The goal of Microsoft's Parallel Computing Platform (PCP) team is to enable the shift to modern, multi- and manycore hardware, by providing a runtime, programming models, libraries, and tools that make it easy for developers to construct correct, efficient, maintainable, and scalable programs through the use of parallelism. In doing so, decades of industry research have been combined and applied in a myriad of ways. This talk examines PCP's current progress, explicitly relating it to specific research of the past and present, in addition to surveying future efforts and possible research opportunities.
http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=768
If you're not aware of the work we're doing in Visual Studio 2010 -- both in .NET 4.0 and C++ -- this talk gives a pretty good overview of all of it. It has a researchy feel to it, with plenty of pointers to interesting prior research that has influenced our work along the way.
 Thursday, October 30, 2008
I sat down last week to record a handful of interviews with some folks from Pearson Education.
They are now live on the InformIT website:
Apologies for the Quicktime-only format.
 Thursday, October 02, 2008
The word “architect” means different things to different people in the context of software engineering. And it varies wildly depending on the kind of organization you’re in. An architect at a medium-sized IT shop might focus on connecting disparate business systems together at a high level, but without diving down into code. An architect at a startup may be more like a tech lead, checking in code like mad, but also keeping the rest of the team in check. And a software architect at Microsoft can play an even more varied set of roles because the company is so large and the diversity of projects so great.
A colleague and mentor of mine who I respect greatly says that an architect is the guy (or gal) who is in charge of making those decisions which, if made incorrectly, could sink the project.
There is a lot to be said for this. These decisions are those with the broadest, deepest, and longest-lasting impact. The decisions themselves are often made by team members initially, but the architect is responsible for providing constant and rigorous technical oversight. Architects set the high-level technical agenda, look ahead several releases, and keep the team on course. They are ultimately to blame if the technical foundation is unsound or the final solution fails to meet expectations. Their butt is on the line.
On one hand, an architect is the lead engineer with most at stake in the project. On the other hand, an architect is more like a member on the project’s board of directors, providing high level guidance and meddling as little as possible (but as much as is necessary) in the day-to-day details.
An architect’s success is measured by what he or she ships to customers, and not by the amazing ideas that were ultimately never realized. This necessarily means an architect’s success is deeply rooted in the team’s culture, work ethic, and ability. He or she needs to work through others to get things done.
There have been some great architects throughout the course of computer science, but who may not have been labeled as such. Linus Torvalds is the architect of Linux, and David Cutler the architect of Windows NT. John Backus was arguably the architect of FORTRAN, Niklaus Wirth the architect of Pascal, Bjarne Stroustrup the architect of C++, James Gosling the architect of Java, and Anders Hejlsberg the architect of C#. Bill Gates was the architect of Microsoft BASIC, and Charles Simonyi the architect of the initial versions of Microsoft Office (Word and Excel). In each case, you can see that the end result is very reflective of one person’s value system and ideas, but took a lot more than just that person to be successful. Each of these people learned to let go of their project just enough that it could achieve the scale that it was meant to achieve, but not so much that the project veered off course. Some projects have multiple architects, but the successful ones usually have one who is really in charge.
Already you can see some subjective opinion being thrown into the mix, and some of it is apt to be controversial. Although not comprehensive, I’ve put together seven guiding principles that I personally aspire to. I’ve certainly not mastered them all, but have always looked up to those people around me who seem to have. Why seven? No reason, really. Over the past few years, I’ve tried to spend as much time as possible learning from successful architects, and these stand out in my mind as the key attributes that appear to be common among them.
0. Inspire and empower people to do their best work.
Architects ultimately succeed or fail based on the quality of people on their team. Knowing how to inspire and empower these people, so that they can do their best work, is therefore one of the most important skills an architect needs in order to be successful.
You can’t do it all yourself. This can be frustrating at times, and at times you might think that you can (particularly in times of frustration). I’ve personally hacked together thousands of lines of code that I’m incredibly proud of in a weekend, and that would have taken weeks or months to get done if I had to instead explain the idea to somebody else and wait for them to write those same thousands of lines of code. And the lines they write of course wouldn’t end up being the same as the ones I’d have written. And they may decide that they don’t like the design after all, start discussing it with colleagues, stage a mutiny, and ultimately overthrow what once seemed like a great idea. This is a tough pill to swallow. But it’s a fact of life that you need to learn to be comfortable with.
The same thing would have happened if you were the one to implement the idea, of course; the difference is that somebody else needs to be empowered to take the kernel of an idea, and run with it. That entails reshaping it as necessary to make it realistic and successful.
I’m not suggesting architects don’t write code (quite the opposite: see #3 below), but you can’t write it all (except for very small projects). If you buy the argument that an architect is just the leading senior engineer on the project, then by definition the architect is probably qualified to write quality code quickly. But what about the code they don’t write? Other people on the team need to write it, and the architect needs to have enough time (where he or she isn’t hacking code) to inspire those people to write the right code. This takes energy and effort. You need to paint a compelling picture of the future, but with enough open-endedness such that the team can flex their creative muscles and fill in the details.
This is the only way to scale. And architects need to scale to achieve broad impact.
Architects should also welcome all ideas with open arms. You want to foster an open and energetic environment on your team, where intellectual debate is the norm. All ideas are fair game.
That’s not to say all ideas are good ones, and ultimately the bad ones need to die a quick and painless death before going too far, but an architect who won’t even entertain new ideas from the team (typically because of NIH, i.e., Not Invented Here, syndrome) will often drive away the best engineers. Great engineers hate to be told what to do. They don’t want to feel like they are walking in the shadows of somebody else. They want to use the skills that make them so great, which involves inventing bigger, badder, and more impactful designs. And you want them to use these skills too, because that’s why you hired them: these skills are crucial to the success of your project. Part of your role as the team’s architect is to recognize who on the team has the most potential, and to arrange for them to have as much leeway and creative freedom as possible. You don’t want to end up with a bunch of lackeys whose job is to “just implement” your ideas, because you’ll get what you paid for.
It’s a true sign of success when the culture you impart unto your team allows them to invent things in the spirit of your own design principles, but without you needing to do it yourself. Jim Gray, for example, inspired countless people to do great things. Does he get credit for each of those ideas? Of course not. But was he indirectly responsible for them to some degree, and do they all have a little Jim Gray in them? Absolutely. Being an architect on a team is similar; not every idea has to be your own. In fact, it’s far more powerful if few of them are.
1. Oversight, but not dictatorship.
That brings me to technical oversight. Because an architect is typically not a manager for his or her project (although in some cases he or she may be), arms-length influence needs to be used to get things done. In fact, the architect may have very little say over specific project management, scheduling, and budget decisions, but is typically on the senior leadership team for the project. So when I talk about “leeway” above, I’m talking about the degree to which an architect monitors and attempts to meddle with the progress of the team. While it’s tempting for an architect to set the ship sailing to sea, and then turn around to work on the next big thing, this almost never works. The initial vision and idea are far from a shipping solution, and software engineering only gets interesting once you actually try to build something. Ideas are cheap. The architect needs to help the team work through the ramifications of certain technical decisions that were made up front, and help with the continual course correction.
Because an architect’s butt is ultimately on the line, he or she needs to work as fast as possible to correct problems when something goes wrong. This implies the architect is involved enough to notice when something goes wrong, hopefully well in advance of anybody else seeing it. I’ve seen many models that work, ranging from the architect being the approver for all major design decisions, to the architect simply reviewing all major design decisions after-the-fact, to the architect delegating this responsibility to trusted advisers. For example, Linus Torvalds for the longest time required that all checkins to the Linux code base be reviewed by him. Anders Hejlsberg still effectively approves each C# language design change. In my opinion, the closer to each major decision the architect can afford to be, the better.
Left to its own devices, the team would veer off course in no time. That’s not because of malicious intent, but rather because of the sheer diversity of software engineers. This diversity is present on many levels: in skill level, taste (which is hard to measure: more on that in #2 below), motivation, work ethic, interpretation of the vision, personal beliefs and experience, and so on. An architect acts as a low-pass signal filter, smoothing out any irregularities that deviate too far from the core design principles.
In Tony Hoare’s ACM Turing Award paper of 1981, The Emperor’s Old Clothes, he explains the risk of not providing this kind of architectural oversight:
“‘You know what went wrong?’ he shouted - he always shouted - ‘You let your programmers do things which you yourself do not understand.’ I stared in astonishment. He was obviously out of touch with present day realities. How could one person ever understand the whole of a modern software product like the Elliott 503 Mark II software system? I realized later that he was absolutely right; he had diagnosed the true cause of the problem and he had planted the seed of its later solution.”
Sadly, this responsibility often entails being “the bad guy”. Sometimes you need to mercilessly kill an idea because it would put certain parts of the project at risk. Other times you need to let somewhat bad (but not too impactful) ideas go. There’s a tradeoff here, because each time you kill an idea you’re going to leave somebody feeling burned. And you may waste people’s time, depending on how much time has already been invested in that idea. Some battles are best left unfought. There is an art to be learned here: if you can get those with the idea to firmly believe that there has to be a better way, you can avoid being seen as the bad guy. “Sit back and wait” can work in some cases, but it can backfire too.
The deep involvement in the technical design details unfortunately means that the architect can become the bottleneck if he or she is not careful. This can slow the team down. Some slowdown can admittedly be a good thing, because it has the effect of forcing more thoughtfulness in each and every decision. But as the team grows, the granularity of decision oversight necessarily has to change to ensure the team is empowered to make progress. In order for this to work, you need to have trusted individuals who are involved at a finer granularity and will use the same principles and values. This takes trust and time.
2. Taste is a hard thing to measure, but is invaluable.
Software engineers like to measure. Many people try to make design decisions based on quantitative data, even though they know that engineering is more of an art than a science. But there is one common trait that, as far as I can tell, is impossible to measure, and yet common to all of the great software architects I know: good taste. And because it’s impossible to measure, those who lack it have a hard time understanding the difference between a design with good taste and one with bad taste.
There is a certain elegance and beauty to the designs created by architects with good taste. When you see it from a distance, you know it, but when viewed under a microscope—the kind of microscope used when debating the finer points with other engineers on the team—it is much harder to detect. Often it’s incredibly difficult to articulate why some particular design has good taste, which makes it even harder to justify. Eventually people are willing to trust your judgment because they begin to see it too.
In fact, good taste is perhaps one of the most important skills an architect needs to have. Bad taste leads to clunky designs that nobody likes to use. Steve Jobs knows this. And yet taste is probably the most difficult skill for an architect to develop, and one of the subtler ones that few people recognize as being necessary. Many managers think that throwing more engineers at a design problem will solve it, when in reality often all that is necessary is one person with very good taste and an eye for detail.
I’m not certain where taste comes from: an innate skill? Perhaps, but not exclusively. In my best estimation, good taste can be learned from paying close attention to the right things, taking a step back and viewing designs from afar often enough, being learned in what kinds of software have been built and were successful in the past, and having a true love of the code. That last part sounds cheesy, but is true enough to reemphasize: if you don’t feel a certain passion for your code and project, it’s a lot easier to let bad taste run rampant, because your care level isn’t as intense as it needs to be.
3. Write code and get your hands dirty.
The best architects realize that code is king. It rules all else. At the end of the day, Visio diagrams, high level vision documents, whiteboard works of art, design documents, emails, functional specifications, and so on, are all a means to an end, not the end itself. The code is your product, and if you don’t understand the code, you don’t understand the state of the project. And if you don’t understand that, you’re not in a position to know what’s working well, what isn’t working, and you can’t possibly have the deep understanding necessary to influence the engineers on the team. You’ve lost control.
The worst architects couldn’t code themselves out of a cardboard box. If you’re not writing actual product code, you’re not an architect: you’re an ivory tower has-been, and probably doing more damage than you are helping matters. Do your team a favor and move into management as quickly as possible.
Writing code also has the benefit of ensuring that you maintain credibility with the team. It’s easy to dictate crazy and grandiose ideas, but if you’re the one who has to implement such a grandiose idea, you’re apt to be more sympathetic to and mindful of the other engineers on the team. You need to keep yourself grounded, and writing real product code will help to ensure your technical decision making carefully considers the implementability and down-to-earth ramifications of your decisions.
Moreover, you need to be a programming expert. People need to respect your abilities, and you want your team to look up to you. You want them to come and ask for your advice because they want it, and enjoy it, and not force them to deal with you simply because of your position on the team. All of the great architects I’ve worked with have inspired me to grow simply because they know so damn much, and because I learn something new every time I interact with them. If they didn’t write code and understand the nitty gritty technical esoterica, this relationship would have been a shallow one.
4. The power of the dyad: know your weaknesses.
Architects need to play a dual role in understanding both business and technical needs and strategy. The degree of business savviness varies greatly among architects, although the best architects I know have a unique ability to understand both sides of the coin. But at the end of the day, they are first and foremost technology wonks, and the business angle is more of a curious hobby. In music, two notes sounding together form a dyad, while three or more form a chord. The best architects I know realize their relative weakness on the business end of things and partner up with another senior leader with complementary skills, to fill in the gaps: this forms a harmonic interval. A dyad.
The partnership needn’t entirely be “business” vs. “technical”, although in commercial software that’s more often than not the two opposing forces. For example, my impression of the development of Scheme is that Guy Steele played the role of the architect while Gerald Sussman was the more business-oriented advisor, looking at how Scheme might be used to advance the broader research agenda but not necessarily meddling in the technical design details of the project.
If an architect is 80% technology and 20% business, partnering with somebody who is 20% technology and 80% business can be a killer combination. This allows you to bounce ideas off one another, and to get a certain level of objective feedback from a different perspective. If you’ve got a great technical idea, and bounce it off another techno-nerd, you might spend hours or days debating technical details that ultimately boil down to a matter of taste. But if you take that same idea and bounce it off your business partner, he or she is likely to provide more pertinent feedback: does it make sense from a business perspective, will customers need it, will it open up new product or revenue opportunities, are there more pressing matters to focus the team on, etc. These are things that, being a technology guy (or gal), wouldn’t immediately come to mind. But remember: it’s all about the customer.
5. It's for the customer, not you.
The best engineers often succeed because they focus on scratching a personal itch. That’s what Linus Torvalds, Bjarne Stroustrup, and countless others did. This is why Donald Knuth created TeX. The idea for a new technology thus begins as a very personal and selfish act. “Build something you’d use yourself, and the customers will come” is a common (cliché) idiom. Although there is certainly truth to this, it’s true only because the very fact that it is bothersome to the founding engineer is likely indicative that it’s bothersome to a broader set of people. It’s an example, where an example is just one element in a set that is used to demonstrate some common attribute among all elements in that set. Those people are your customers.
As a technology matures, it’s important to realize—particularly when building commercial software—that actual human beings will want to use the technology. It’s important to understand and respect their needs. It’s important to, at some point, realize that you’re not, in fact, building a system entirely for your own personal use. Not realizing this point can blind you and make you neglect the need to partner with somebody who understands the business angle of things. It can also lead to a feeling of needing to develop the perfect idealized solution and never ship to customers. Hey, when there are endless technical problems to work on, who would want to ship anyway? By its very definition, shipping software means that you’ve solved all of the major technical problems within a certain scope. What fun is that?
The fun is that you’re able to make an impact on your customers’ lives, hopefully for the better. Your initial technical vision has come to fruition, and you can move on. You get to prove your ideas by having real human beings use the end product. If you never get to that state, then you’ve done some possibly interesting research (which is hopefully documented and used by somebody someday in the future to actually impact people by delivering a system based on those ideas), but you haven’t architected a product. You’re a researcher, not an architect.
6. Admit when you're wrong, fall on your sword, and then fix it.
You are going to be wrong sometimes. Trying to do big and bold things necessarily involves some risk. Being an architect requires a careful balance between sticking to your guns—your guiding principles and technical vision—and realizing when things aren’t working out and course correcting before it’s too late. It’s hard to tell when things are beginning to go off course, but when they’ve already gone off course it’s usually obvious. A common telltale sign that things are in trouble is when the team no longer believes in the vision. This may translate into attrition (often of your best engineers first), or just hallway grumblings. Listen carefully. If you’re not involved in the design decisions, writing code, and actually playing a significant role in your team’s daily lives, then you’re apt to miss this. As the architect, you are responsible for responding as quickly as possible to such situations before the shit hits the fan.
Some architects can fall into the trap of using dogma over intellect. Firm principles are of course something I’ve stressed throughout this article. But you need to be honest with yourself and admit when things are not going well. An architect who stands at the helm of a sinking ship, proclaiming that the ship stay its course because the brave new world lies ahead, will only drown (alone) when the ship finally goes underwater. Although this architect can then go around blaming his team for the failure (“if they had only seen the vision and stuck around, we would have succeeded”), the project will be long gone by then. It’s harder, but more noble, to recognize the problems proactively and do your best to fix them.
For example, Tony Hoare describes, in the same ACM Turing Award paper mentioned above, how he felt responsible for the failure of the Elliott 503 Mark II project:
“There was no escape: The entire Elliott 503 Mark II software project had to be abandoned, and with it, over thirty man-years of programming effort, equivalent to nearly one man’s active working life, and I was responsible, both as designer and as manager, for wasting it.”
It can be particularly disturbing to realize that a large number of people have been going off in the wrong direction on your watch. Yes, you wasted their time. But you have to learn what went wrong, internalize it, and commit to never making the same mistake twice. You owe it to them to respond promptly. Everybody on the team will have learned and grown from the circumstances, and if you’re lucky the situation is salvageable. Sometimes it won’t be. But in any case you will gain the respect of many around you by making the right decision; particularly if you’re the only person with the broad technical responsibility, understanding, and insight necessary to make such a decision, people will feel relieved when you make it. And if you don’t make it, people will curse you for it.
In conclusion
I’m sure there are many other laundry lists of skills people might come up with that are necessary to be an effective architect, but these are a few of the things I see and respect in the people I look up to. I’ve named some of these people throughout this article. The most common trait is that they have done great things and left their mark on the industry. Being an architect, in the end, is all about helping others to succeed. If you’re a really good architect, you’ll inspire people and rub off on them. You’ll gain a certain level of respect that is unmistakable and priceless. And that, in my opinion, is far more fulfilling than anything you could accomplish on your own working in a vacuum.
It's been quite some time since I blogged about what I've been reading. That's not because I haven't been reading -- au contraire! -- but rather because I've been busy doing so. I find these posts interesting for myself, so that I can look back and see where my interests were at a particular point in time. Given the sheer number of additions, I can’t properly rate them like I have in the past. Here are the more interesting ones, those that stick out in my mind:
Music
- Theory of Harmony, Arnold Schoenberg. 1922.
- Psychology of Music, Carl E. Seashore. 1938.
- Study of Counterpoint, John J. Fux. 1965.
- The Study of Fugue, Alfred Mann. 1987.
- Counterpoint: The Polyphonic Vocal Style of the Sixteenth Century, Knud Jeppesen. 1992.
- Johann Sebastian Bach: The Learned Musician, Christoph Wolff. 2001.
- Guitar Man: A Six-String Odyssey, or, You Love that Guitar More than You Love Me, Will Hodgkinson. 2006.
- Musicophilia: Tales of Music and the Brain, Oliver Sacks. 2008.
Mathematics
- Euclid's Elements (Books 1 - 13). 300 BC.
- The Principia: Mathematical Principles of Natural Philosophy, Isaac Newton and Andrew Motte. 1846.
- Introduction to Mathematical Logic, Alonzo Church. 1944.
- Foundations of Algebraic Topology, Samuel Eilenberg and Norman Steenrod. 1952.
- Foundations of Mathematical Logic, Haskell B. Curry. 1963.
- Diophantus of Alexandria: A Study in the History of Greek Algebra, Sir Thomas L. Heath. 1964.
- From Zero to Infinity: What Makes Numbers Interesting, Constance Reid. 1964.
- Euclid in the Rainforest: Discovering Universal Truth in Logic and Math, Joseph Mazur. 2006.
- Unknown Quantity: A Real and Imaginary History of Algebra, John Derbyshire. 2007.
- God Created the Integers: The Mathematical Breakthroughs that Changed History, Stephen Hawking. 2007.
- Infinite Ascent: A Short History of Mathematics (Modern Library Chronicles), David Berlinski. 2008.
Computers
- LISP 1.5 Programmer's Manual, John McCarthy. 1962.
- Computation: Finite and Infinite Machines, Marvin Lee Minsky. 1967.
- The Theory of Parsing, Translation, and Compiling (Volume I: Parsing), Alfred V. Aho and Jeffrey D. Ullman. 1972.
- The Theory of Parsing, Translation, and Compiling (Volume II: Compiling), Alfred V. Aho and Jeffrey D. Ullman. 1973.
- Algorithms + Data Structures = Programs, Niklaus Wirth. 1976.
- A Discipline of Programming, Edsger W. Dijkstra. 1976.
- Architecture of Concurrent Programs, Per Brinch Hansen. 1977.
- The Elements of Programming Style, Brian W. Kernighan and P. J. Plauger. 1978.
- Mindstorms: Children, Computers, And Powerful Ideas, Seymour Papert. 1980.
- Selected Writings on Computing: A Personal Perspective, Edsger W. Dijkstra. 1982.
- CLU: Reference Manual (Lecture Notes in Computer Science), B. Liskov, et al. 1983.
- Algorithms and Data Structures, Niklaus Wirth. 1985.
- Communicating Sequential Processes, C. A. R. Hoare. 1985.
- The Little LISPer, Third Edition, Daniel P. Friedman and Matthias Felleisen. 1989.
- Common LISP, The Language, Second Edition, Guy Steele. 1990.
- The High Performance FORTRAN Handbook, Charles H. Koelbel, et al. 1993.
- 201 Principles of Software Development, Alan M. Davis. 1995.
- Algol-like Languages (Progress in Theoretical Computer Science), Peter O’Hearn and Robert Tennent. 1996.
Based on this list, you might surmise that I read a lot. ;) In fact, I typically have between 3 and 5 books going simultaneously (how parallel of me), so I use the term "read" somewhat nontraditionally. I prefer to absorb the information by immersing myself in many books in the same genre simultaneously, instead of committing to a single one. This seems to be effective, but is also slightly odd and perhaps quite esoteric to other people; the result is that every room in my home is littered with books each in some possibly long-forgotten state of being "read" (along with tattered academic papers, language manuals, etc). I like it, but some people believe this is an indication that I’m a tad insane. C’est la vie.
 Wednesday, October 01, 2008
The October 2008 MSDN Magazine issue just went live with 5 articles on concurrency, plus the editor's note. Four of the articles are written by members of the Parallel Computing team here at Microsoft, including one by me:
Enjoy the text, and be careful not to overdose on the excess of parallelism goodness. This edition was timed intentionally to coincide with the PDC. I'm hoping to see you there, because we have a plethora of exciting things to show, spanning managed .NET and native C++ programming. These articles are really just teasers.
 Friday, September 26, 2008
I just returned from TechEd Australia, which was a lot of fun.
I have a fair number of additional speaking engagements coming up:
As of the PDC the book will also be readily available. Wahoo!
If you'll be at any of the conferences and want to meet up, please drop me a line.
 Sunday, September 21, 2008
The enumeration pattern in .NET unfortunately implies some overhead that makes it difficult to compete with ordinary for loops. In fact, the difference between
T[] a = …;
for (int i = 0, c = a.Length; i < c; i++) …action(a[i])…;
and
T[] a = …;
IEnumerator<T> ae = ((IEnumerable<T>)a).GetEnumerator();
while (ae.MoveNext()) …action(ae.Current)…;
is about 3X. That is, the former is 1/3rd the expense of the latter, in terms of raw enumeration overhead. Clearly as action becomes more expensive the significance of this overhead lessens. But if your plan is to invoke a small action over a large number of elements, using an enumerator instead of indexing directly into the array could in fact cause your algorithm to take 3X longer to finish.
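A back-of-the-envelope model makes the "overhead lessens" point concrete. It is a simplification rather than a measurement: assume the for loop pays 1 unit of enumeration overhead per element, the enumerator pays some multiple of that, and the action costs the same either way. The names OverheadModel and Slowdown are made up for this sketch.

```csharp
static class OverheadModel
{
    // Relative slowdown of enumerator-based iteration versus a plain for loop.
    //   actionCost     : cost of the per-element action, in units of the
    //                    for loop's per-element overhead (assumed model).
    //   overheadFactor : enumerator overhead relative to the for loop (e.g., 3).
    public static double Slowdown(double actionCost, double overheadFactor)
    {
        return (overheadFactor + actionCost) / (1.0 + actionCost);
    }
}
```

With an empty action, Slowdown(0, 3) is the full 3X. Once the action costs 27 times the loop overhead, the ratio is 30/28, or roughly 1.07X.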
There are many reasons for this problem, and they are probably obvious. Using an enumerator requires at least two interface method calls just to extract a single element from the array. Because there are O(length) of these operations, the overhead imposed is O(length) as well. Contrast that with the nice, compact for loop, which emits ldelem IL instructions that access the array directly. This ends up computing some offset (e.g., i * sizeof(T)) and dereferencing right into the array memory. The enumerator needs to do that too, of course, but only after the two interface calls are made. Additionally, it is possible for the JIT compiler to omit the bounds check on the array access if it knows ‘c’ in the predicate ‘i < c’ was computed from ‘a.Length’, because the length of a .NET array is fixed when it is allocated and cannot change.
(Strangely, it appears going through IList<T> is even slower than enumeration. In fact, it appears to be more than 3X the cost of going through IList<T>’s enumerator, and over 10X that of indexing into the array using true ldelem instructions instead of interface calls to IList<T>’s element indexer.)
All of this actually makes it somewhat difficult for those on my team building PLINQ to compete with hand written programs. That’s true of LINQ generally. In fact, LINQ tends to be worse, because you string several enumerators together to form a query, often leading to even more overhead attributed to enumeration. So you might reasonably wonder: if people care about performance, then why would they willingly start off 3X “in the hole” in hopes that they will eventually gain it back when they use machines with >= 4 cores? It’s a completely fair criticism (although you must recall that everything I’m talking about is “pure overhead” and once you begin to have sizable computations in the per-element action it matters less and less). We continually do a lot of work to try to recoup these costs.
There are actually many alternative enumeration models, and I think .NET needs to change direction in the future. In addition to the overhead associated with the pattern, .NET’s enumeration pattern is a “pull” model (versus “push”), which makes it incredibly hard to tolerate blocking within calls to MoveNext. Over time, I think we will need to pursue the push model more seriously.
I’ve thrown together a few different examples of alternative enumeration techniques. To cut to the chase, here is a simple micro-benchmark test that enumerates over 1,000,000 elements 25 times, invoking an empty (non-inlineable) method for each element. The per-element work here is quite small (although not empty) and so the results are a bit more extreme than a real workload would show:
For loop (int[])          739255 tcks   % of baseline
For loop (IList<int>)    7534609 tcks   1019.216%
ForEach loop (int[])      829617 tcks   112.2234%
int[] IEnumerator<int>   2152414 tcks   291.1599%
IEnumerator<int>         2062876 tcks   279.048%
IFastEnumerator<int>     1758992 tcks   237.9412%
IForEachable<int> [s]    1103745 tcks   149.305%
IForEachable<int> [i]     976742 tcks   132.1252%
IForEachable2<int>        957883 tcks   129.5741%
These are:
- “For loop (int[])” is an ordinary for loop over the array directly.
- “For loop (IList<int>)” is an ordinary for loop over the array’s IList<T> interface.
- “ForEach loop (int[])” is an ordinary foreach loop over the array directly.
- “int[] IEnumerator<int>” uses the array’s implementation of IEnumerator<T>.
- “IEnumerator<int>” is a custom IEnumerator<T> implementation.
- “IFastEnumerator<int>” is an implementation of a new pull interface (defined below).
- “IForEachable<int>” is an implementation of a new push interface (defined below) that uses delegates to represent the per-element action. The only difference between the “[s]” and “[i]” variants is that the delegate is bound to a static method for “[s]” and an instance method for “[i]”.
- “IForEachable2<int>” is a slight variant of IForEachable<T> (also defined below).
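For reference, the last column in these tables is just elapsed ticks divided by the baseline's ticks, times 100. The helper below is a small sketch of that arithmetic; BaselineMath and its method names are invented here, and the tick-to-millisecond conversion assumes you pass in the tick frequency (ticks per second) of the machine that produced the numbers, which Stopwatch.Frequency reports.

```csharp
static class BaselineMath
{
    // Percentage of the baseline, as printed in the tables' last column.
    public static double PercentOfBaseline(long elapsedTicks, long baselineTicks)
    {
        return 100.0 * elapsedTicks / baselineTicks;
    }

    // Converts Stopwatch ticks to milliseconds for a given tick frequency
    // (ticks per second). The frequency is machine-specific.
    public static double TicksToMilliseconds(long ticks, long ticksPerSecond)
    {
        return 1000.0 * ticks / ticksPerSecond;
    }
}
```

For example, PercentOfBaseline(957883, 739255) comes out at roughly 129.57, which is the IForEachable2<int> row above.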
Notice that with IForEachable2<T>, we’ve gotten within 30% of the efficient for loop. Unfortunately, I do get somewhat different numbers when compiling with the /o+ switch:
For loop (int[])          777746 tcks   % of baseline
For loop (IList<int>)    7569517 tcks   973.2634%
ForEach loop (int[])      735846 tcks   94.61264%
int[] IEnumerator<int>   2340361 tcks   300.9159%
IEnumerator<int>         2063039 tcks   265.2587%
IFastEnumerator<int>     1806568 tcks   232.2825%
IForEachable<int> [s]    1090644 tcks   140.2314%
IForEachable<int> [i]     946090 tcks   121.6451%
IForEachable2<int>       1234201 tcks   158.6895%
For comparison purposes, I get numbers like this if the loop body is completely empty except for accessing the current element:
For loop (int[])          452039 tcks   % of baseline
For loop (IList<int>)     422732 tcks   93.51671%
ForEach loop (int[])      461274 tcks   102.043%
int[] IEnumerator<int>   1958711 tcks   433.3058%
IEnumerator<int>         1730502 tcks   382.8214%
IFastEnumerator<int>     1372421 tcks   303.6068%
IForEachable<int> [s]    1091720 tcks   241.5101%
IForEachable<int> [i]     958401 tcks   212.0173%
IForEachable2<int>        664572 tcks   147.0165%
And this (with /o+):
For loop (int[])          262146 tcks   % of baseline
For loop (IList<int>)     263302 tcks   100.441%
ForEach loop (int[])      372924 tcks   142.2581%
int[] IEnumerator<int>   1889132 tcks   720.6412%
IEnumerator<int>         1635837 tcks   624.0175%
IFastEnumerator<int>     1479579 tcks   564.4103%
IForEachable<int> [s]    1096712 tcks   418.3592%
IForEachable<int> [i]     962261 tcks   367.0706%
IForEachable2<int>        698340 tcks   266.3935%
These numbers aren’t quite as meaningful because we have no idea what’s being optimized away by the C# and JIT compilers. For example, they may notice we’re not using the current element at all and therefore eliminate the access altogether. Nevertheless, the relative ranking of efficiency has remained nearly the same (with the notable exception of the array’s IList<T> test, which fares far better here).
(All of these numbers were gathered on a 32-bit OS on a 64-bit machine. Because the JIT compilers for 32-bit and 64-bit are so different, you can expect vastly different results across architectures.)
Anyway, here is what IFastEnumerator<T>, IForEachable<T>, and IForEachable2<T> look like:
interface IFastEnumerable<T>
{
    IFastEnumerator<T> GetEnumerator();
}
interface IFastEnumerator<T>
{
    bool MoveNext(ref T elem);
}
interface IForEachable<T>
{
    void ForEach(Action<T> action);
}
interface IForEachable2<T>
{
    void ForEach(Functor<T> functor);
}
abstract class Functor<T>
{
    public abstract void Invoke(T t);
}
I also have a data type called SimpleList<T> that implements each of these, including IEnumerable<T>. This is what the test harness uses for its benchmarking. So any boneheaded mistakes I’ve made in the implementation of this class could cause us to draw the wrong conclusions about the interfaces themselves. Hopefully there are none:
class SimpleList<T> :
    IEnumerable<T>, IFastEnumerable<T>, IForEachable<T>, IForEachable2<T>
{
    private T[] m_array;
    public SimpleList(T[] array) { m_array = array; }
    // Etc …
}
The class of course implements IEnumerable<T> in the standard way:
IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
    return new ClassicEnumerable(m_array);
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
    return new ClassicEnumerable(m_array);
}
class ClassicEnumerable : IEnumerator<T>
{
    private T[] m_a;
    private int m_index = -1;
    internal ClassicEnumerable(T[] a) { m_a = a; }
    public bool MoveNext() { return ++m_index < m_a.Length; }
    public T Current { get { return m_a[m_index]; } }
    object System.Collections.IEnumerator.Current { get { return Current; } }
    public void Reset() { m_index = -1; }
    public void Dispose() { }
}
The idea behind IFastEnumerable<T> (and specifically IFastEnumerator<T>) is to return the current element during the call to MoveNext itself. This cuts the number of interface method calls necessary to enumerate a list in half. The impact to performance isn’t huge, but it was enough to cut our overhead from about 3X to 2.3X. Every little bit counts:
IFastEnumerator<T> IFastEnumerable<T>.GetEnumerator()
{
    return new FastEnumerable(m_array);
}
class FastEnumerable : IFastEnumerator<T>
{
    private T[] m_a;
    private int m_index = -1;
    internal FastEnumerable(T[] a) { m_a = a; }
    public bool MoveNext(ref T elem)
    {
        if (++m_index >= m_a.Length)
            return false;
        elem = m_a[m_index];
        return true;
    }
}
(Update: after writing the blog post, I made a couple slight optimizations that make this a bit tighter (fewer field fetches):
class FastEnumerable : IFastEnumerator<T>
{
    private T[] m_a;
    private int m_index = -1;
    internal FastEnumerable(T[] a) { m_a = a; }
    public bool MoveNext(ref T elem)
    {
        T[] a = m_a;
        int i;
        if ((i = ++m_index) >= a.Length)
            return false;
        elem = a[i];
        return true;
    }
}
The impact to performance isn't huge, but does improve the performance to about 2.1X of the baseline.)
The IForEachable<T> interface is a push model in the sense that the caller provides a delegate and the ForEach method is responsible for invoking it once per element in the collection. ForEach doesn’t return until this is done. In addition to requiring far fewer method calls to enumerate a collection, there isn’t a single interface method call per element: the only one is the call to ForEach itself. Delegate dispatch is also much faster than interface method dispatch. The result is nearly twice as fast as the classic IEnumerator<T> pattern (when /o+ isn’t specified). Now we’re really getting somewhere!
void IForEachable<T>.ForEach(Action<T> action)
{
    T[] a = m_array;
    for (int i = 0, c = a.Length; i < c; i++)
        action(a[i]);
}
Delegate dispatch still isn’t quite the speed of virtual method dispatch. And delegates bound to static methods are actually slightly slower than those bound to instance methods, which is why you’ll notice a slight difference in the original “[s]” versus “[i]” measurements. The reason is subtle. There is a delegate dispatch stub that is meant to call the target method: when the delegate refers to an instance method, the ‘this’ reference passed in ECX (the register the x86 CLR calling convention uses for the first argument) points to the delegate object when it is invoked and the stub can simply replace it with the target object and jump; for static methods, however, all of the arguments need to be “shifted” downward, because there is no ‘this’ reference to be passed and therefore the first actual argument to the static method must take the place of the current value in ECX.
The IForEachable2<T> interface just replaces delegate calls with virtual method calls. Somebody calling it will pass an instance of the Functor<T> class with the Invoke method overridden. The implementation of ForEach then looks quite a bit like IForEachable<T>’s, just with virtual method calls in place of delegate calls:
void IForEachable2<T>.ForEach(Functor<T> functor)
{
    T[] a = m_array;
    for (int i = 0, c = a.Length; i < c; i++)
        functor.Invoke(a[i]);
}
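The benchmark only ever pushes elements into do-nothing callbacks, so it may help to see how a caller would carry state through the two push interfaces. The following is a minimal sketch, not part of the original test harness: TinyList<T>, SumFunctor, and PushModelDemo are hypothetical names, and the interfaces are restated only so the sample compiles on its own.

```csharp
using System;

// Restated from the post so the sketch is self-contained.
interface IForEachable<T> { void ForEach(Action<T> action); }
interface IForEachable2<T> { void ForEach(Functor<T> functor); }
abstract class Functor<T> { public abstract void Invoke(T t); }

// Hypothetical stand-in for SimpleList<T>, implementing only the push interfaces.
sealed class TinyList<T> : IForEachable<T>, IForEachable2<T>
{
    private readonly T[] m_array;
    public TinyList(T[] array) { m_array = array; }

    public void ForEach(Action<T> action)
    {
        T[] a = m_array;
        for (int i = 0, c = a.Length; i < c; i++)
            action(a[i]);
    }

    public void ForEach(Functor<T> functor)
    {
        T[] a = m_array;
        for (int i = 0, c = a.Length; i < c; i++)
            functor.Invoke(a[i]);
    }
}

// With Functor<T>, per-enumeration state lives in fields of the subclass.
sealed class SumFunctor : Functor<int>
{
    public long Total;
    public override void Invoke(int x) { Total += x; }
}

static class PushModelDemo
{
    // Delegate flavor: state is a captured local, hoisted into a closure object.
    public static long SumWithDelegate(IForEachable<int> source)
    {
        long total = 0;
        source.ForEach(x => total += x);
        return total;
    }

    // Functor flavor: state is read back from the functor after ForEach returns.
    public static long SumWithFunctor(IForEachable2<int> source)
    {
        SumFunctor f = new SumFunctor();
        source.ForEach(f);
        return f.Total;
    }
}
```

Both helpers return 6 for a TinyList<int> wrapping { 1, 2, 3 }. The trade-off is ergonomic: the delegate version is a one-liner, while the functor version needs a class per operation in exchange for virtual dispatch instead of delegate dispatch.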
And that’s it. Here is the program that drives the little micro-benchmark tests that I showed output for at the beginning:
// Note: these using directives belong at the top of the source file.
using System;
using System.Collections.Generic;
using System.Diagnostics;
class Program
{
    public static void Main()
    {
        const int size = 2500000;
        Random r = new Random();
        int[] array = new int[size];
        for (int i = 0; i < size; i++) array[i] = r.Next();
        SimpleList<int> list = new SimpleList<int>(array);
        const int iters = 25;
        long baseline = 0;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // Regular for loop
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
            {
                for (int j = 0, c = array.Length; j < c; j++)
                    DoNothing(array[j]);
            }
            baseline = sw.ElapsedTicks;
            Console.WriteLine("For loop (int[])\t{0} tcks\t% of baseline", baseline);
        }
        // Regular for loop (IList<int>)
        {
            Stopwatch sw = Stopwatch.StartNew();
            IList<int> ia = array;
            for (int i = 0; i < iters; i++)
            {
                for (int j = 0, c = ia.Count; j < c; j++)
                    DoNothing(ia[j]);
            }
            long elapsed = sw.ElapsedTicks;
            Console.WriteLine("For loop (IList<int>)\t{0} tcks\t{1}%",
                elapsed, 100*(elapsed / (float)baseline));
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // Regular foreach loop
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
            {
                foreach (int x in array)
                    DoNothing(x);
            }
            long elapsed = sw.ElapsedTicks;
            Console.WriteLine("ForEach loop (int[])\t{0} tcks\t{1}%",
                elapsed, 100 * (elapsed / (float)baseline));
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // The array's own IEnumerator<int>
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
            {
                IEnumerator<int> e = ((IEnumerable<int>)array).GetEnumerator();
                while (e.MoveNext())
                    DoNothing(e.Current);
            }
            long elapsed = sw.ElapsedTicks;
            Console.WriteLine("int[] IEnumerator<int>\t{0} tcks\t{1}%",
                elapsed, 100 * (elapsed / (float)baseline));
        }
        // IEnumerator<T>
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
            {
                IEnumerator<int> e = ((IEnumerable<int>)list).GetEnumerator();
                while (e.MoveNext())
                    DoNothing(e.Current);
            }
            long elapsed = sw.ElapsedTicks;
            Console.WriteLine("IEnumerator<int>\t{0} tcks\t{1}%",
                elapsed, 100 * (elapsed / (float)baseline));
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // IFastEnumerator<T>
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
            {
                int x = 0;
                IFastEnumerator<int> e = ((IFastEnumerable<int>)list).GetEnumerator();
                while (e.MoveNext(ref x))
                    DoNothing(x);
            }
            long elapsed = sw.ElapsedTicks;
            Console.WriteLine("IFastEnumerator<int>\t{0} tcks\t{1}%",
                elapsed, 100 * (elapsed / (float)baseline));
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // IForEachable<T> (delegate bound to a static method)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
            {
                Action<int> act = new Action<int>(DoNothing);
                ((IForEachable<int>)list).ForEach(act);
            }
            long elapsed = sw.ElapsedTicks;
            Console.WriteLine("IForEachable<int> [s]\t{0} tcks\t{1}%",
                elapsed, 100 * (elapsed / (float)baseline));
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // IForEachable<T> (delegate bound to an instance method)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
            {
                DoNothingClosure dnc = new DoNothingClosure();
                Action<int> act = new Action<int>(dnc.DoNothing);
                ((IForEachable<int>)list).ForEach(act);
            }
            long elapsed = sw.ElapsedTicks;
            Console.WriteLine("IForEachable<int> [i]\t{0} tcks\t{1}%",
                elapsed, 100 * (elapsed / (float)baseline));
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // IForEachable2<T>
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
            {
                DoNothingFunctor dnf = new DoNothingFunctor();
                ((IForEachable2<int>)list).ForEach(dnf);
            }
            long elapsed = sw.ElapsedTicks;
            Console.WriteLine("IForEachable2<int>\t{0} tcks\t{1}%",
                elapsed, 100 * (elapsed / (float)baseline));
        }
    }
    [System.Runtime.CompilerServices.MethodImpl(
        System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    private static void DoNothing(int x) { }
    class DoNothingClosure
    {
        [System.Runtime.CompilerServices.MethodImpl(
            System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
        public void DoNothing(int x) { }
    }
    class DoNothingFunctor : Functor<int>
    {
        public override void Invoke(int x) { DoNothing(x); }
    }
}
To summarize, .NET enumeration costs something over typical for loops that index straight into arrays. Most programs needn’t worry about these kinds of overheads. If you’re accessing a database, manipulating a large complicated object, or what have you, inside of the individual iterations, then the overheads we’re talking about here are minuscule. In fact, walking 1,000,000 elements is in the microsecond range for all of the benchmarks I showed, even the slowest ones. So none of this is anything to lose sleep over. But if you have a closed system that controls all of its enumeration, it may be worth doing some targeted replacement of enumerators with the more efficient patterns, particularly if you tend to enumerate lots and lots of elements lots and lots of times in your program.
 Wednesday, September 17, 2008
In part 2 of this series, I described a new work stealing queue data structure used for work item management. This structure allows us to push and pop elements into a thread-local work queue without heavy-handed synchronization. Moreover, this distributes a large amount of the scheduling responsibility across the threads (and hence processors). The result is that, for recursively queued work items, scalability is improved and pressure on the typical bottleneck in a thread pool (i.e., the global lock) is alleviated.
What we didn’t do last time was actually integrate the new queue into the thread pool that was shown in part 1. This extension is actually somewhat simple. We’ll continue to use the IThreadPool interface so that we can easily harness and benchmark the various thread pool implementations against each other.
We’ll add a new class LockAndWsqThreadPool, which mimics the design of the original SimpleLockThreadPool class. We’ll only need to add two fields to it:
- private WorkStealingQueue<WorkItem>[] m_wsQueues: This is an array of queues—one per thread in the pool—that will be used to store recursively queued work.
- [ThreadStatic] private static WorkStealingQueue<WorkItem> m_wsq: This represents the unique work stealing queue for a particular thread in the pool.
OK, so with these extensions there are clearly three specific changes we need to make:
- A new thread pool thread needs to allocate its work stealing queue.
- When queuing a new work item, we must check to see if we’re on a pool thread. If so, we will queue the work item into the work stealing queue instead of the global queue.
- When a pool thread looks for work, it needs to:
- First consult its local work stealing queue.
- If that fails, it then looks at the global queue.
- Lastly, if that fails, it needs to steal from other work stealing queues.
Let’s review each one individually. Later we’ll see the full code.
#1 is handled in the DispatchLoop function:
private WorkStealingQueue<WorkItem>[] m_wsQueues =
    new WorkStealingQueue<WorkItem>[Environment.ProcessorCount];
private void DispatchLoop()
{
    // Register a new WSQ.
    WorkStealingQueue<WorkItem> wsq = new WorkStealingQueue<WorkItem>();
    m_wsq = wsq; // Store in TLS.
    AddWsq(wsq);
    try
    {
        /* a whole bunch of stuff … */
    }
    finally
    {
        RemoveWsq(wsq);
    }
}
private void AddWsq(WorkStealingQueue<WorkItem> wsq)
{
    lock (m_wsQueues)
    {
        for (int i = 0; i < m_wsQueues.Length; i++)
        {
            if (m_wsQueues[i] == null)
            {
                m_wsQueues[i] = wsq;
                return; // Registered in the first free slot; stop searching.
            }
            else if (i == m_wsQueues.Length - 1)
            {
                // No free slot: grow the array and append the new queue.
                WorkStealingQueue<WorkItem>[] queues =
                    new WorkStealingQueue<WorkItem>[m_wsQueues.Length*2];
                Array.Copy(m_wsQueues, queues, i+1);
                queues[i+1] = wsq;
                m_wsQueues = queues;
                return;
            }
        }
    }
}
private void RemoveWsq(WorkStealingQueue<WorkItem> wsq)
{
    lock (m_wsQueues)
    {
        for (int i = 0; i < m_wsQueues.Length; i++)
        {
            if (m_wsQueues[i] == wsq)
            {
                m_wsQueues[i] = null;
            }
        }
    }
}
#2, of course, happens within the QueueUserWorkItem function:
public void QueueUserWorkItem(WaitCallback work, object obj)
{
    WorkItem wi = …; /* as before … */
    // Now insert the work item into the queue, possibly waking a thread.
    WorkStealingQueue<WorkItem> wsq = m_wsq;
    if (wsq != null)
    {
        // Single TLS to determine if we're on a pool thread.
        wsq.LocalPush(wi);
        if (m_threadsWaiting > 0) // OK to read lock-free.
            lock (m_queue) { Monitor.Pulse(m_queue); }
    }
    else
    {
        /* as before… queue to the global queue */
    }
}
Lastly, #3 is the most complicated. Searching the local queue is done with a call to wsq.LocalPop. If that fails, the work stealing queue is empty, and the logic then looks a lot like the original thread pool’s dispatch loop logic in that we then look for work in the global queue. If that fails, we will just iterate over the other threads’ work stealing queues, doing a TrySteal operation. If none of them had work, we go back to the global queue, try again, and then finally wait for work to arrive. (See the full code sample below for details.) Notice that there’s a fairly tricky race condition here that we’re leaving unhandled: if we search for work, try to steal, and ultimately find no work, we will then embark on a trip back to the global queue; during this trip, another pool thread might recursively queue work into its work stealing queue and we will miss it. Generally speaking, this is OK because that thread will eventually get to it (presumably) but with some clever synchronization trickery we can actually handle this case. Perhaps I will show such a solution in a subsequent part in this series.
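Stripped of waiting and shutdown handling, the search order just described can be sketched as below. This is an illustration under loud assumptions, not the pool's actual dispatch loop: StandInQueue<T> is a hypothetical, fully locked substitute for the WorkStealingQueue<T> from part 2 (whose real method signatures may differ), TryFindWork is an invented helper, and the race condition mentioned above is left unhandled here too.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for part 2's WorkStealingQueue<T>: a plain locked deque.
sealed class StandInQueue<T>
{
    private readonly LinkedList<T> m_items = new LinkedList<T>();

    public void LocalPush(T item) { lock (m_items) m_items.AddLast(item); }

    // The owning thread pops from the tail (most recently pushed item).
    public bool LocalPop(out T item)
    {
        lock (m_items)
        {
            if (m_items.Count == 0) { item = default(T); return false; }
            item = m_items.Last.Value;
            m_items.RemoveLast();
            return true;
        }
    }

    // Other threads steal from the head (oldest item).
    public bool TrySteal(out T item)
    {
        lock (m_items)
        {
            if (m_items.Count == 0) { item = default(T); return false; }
            item = m_items.First.Value;
            m_items.RemoveFirst();
            return true;
        }
    }
}

static class WorkSearch
{
    // Looks for work in the order: local queue, global queue, other threads' queues.
    // Returns false if nothing was found; a real pool would then wait for work.
    public static bool TryFindWork<T>(
        StandInQueue<T> local, Queue<T> global, StandInQueue<T>[] all, out T item)
    {
        // 1. Local work stealing queue (null when not on a pool thread).
        if (local != null && local.LocalPop(out item)) return true;

        // 2. Global queue, under its lock.
        lock (global)
        {
            if (global.Count > 0) { item = global.Dequeue(); return true; }
        }

        // 3. Steal from the other threads' queues, skipping empty slots and our own.
        foreach (StandInQueue<T> other in all)
        {
            if (other != null && other != local && other.TrySteal(out item)) return true;
        }

        item = default(T);
        return false;
    }
}
```

Note the asymmetry in the stand-in: the owner takes the newest item while thieves take the oldest, which is the usual arrangement for work stealing queues.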
Anyway, what we’re left with is code that looks something like this:
using System;
using System.Security.Permissions;
using System.Threading;

public class LockAndWsqThreadPool : IThreadPool
{
    // Constructors--
    // Two things may be specified:
    //   ConcurrencyLevel == fixed # of threads to use
    //   FlowExecutionContext == whether to capture & flow ExecutionContexts for work items

    public LockAndWsqThreadPool() :
        this(Environment.ProcessorCount, true) { }
    public LockAndWsqThreadPool(int concurrencyLevel) :
        this(concurrencyLevel, true) { }
    public LockAndWsqThreadPool(bool flowExecutionContext) :
        this(Environment.ProcessorCount, flowExecutionContext) { }

    public LockAndWsqThreadPool(int concurrencyLevel, bool flowExecutionContext)
    {
        if (concurrencyLevel <= 0)
            throw new ArgumentOutOfRangeException("concurrencyLevel");

        m_concurrencyLevel = concurrencyLevel;
        m_flowExecutionContext = flowExecutionContext;

        // If suppressing flow, we need to demand permissions.
        if (!flowExecutionContext)
            new SecurityPermission(SecurityPermissionFlag.Infrastructure).Demand();
    }

    // Each work item consists of a closure: work + (optional) state obj + context.
    struct WorkItem
    {
        internal WaitCallback m_work;
        internal object m_obj;
        internal ExecutionContext m_executionContext;

        internal WorkItem(WaitCallback work, object obj)
        {
            m_work = work;
            m_obj = obj;
            m_executionContext = null;
        }

        internal void Invoke()
        {
            // Run normally (delegate invoke) or under context, as appropriate.
            if (m_executionContext == null)
                m_work(m_obj);
            else
                ExecutionContext.Run(m_executionContext, s_contextInvoke, this);
        }

        private static ContextCallback s_contextInvoke = delegate(object obj) {
            WorkItem wi = (WorkItem)obj;
            wi.m_work(wi.m_obj);
        };
    }

    private readonly int m_concurrencyLevel;
    private readonly bool m_flowExecutionContext;
    private readonly System.Collections.Queue m_queue = new System.Collections.Queue();
    private WorkStealingQueue<WorkItem>[] m_wsQueues =
        new WorkStealingQueue<WorkItem>[Environment.ProcessorCount];
    private Thread[] m_threads;
    private int m_threadsWaiting;
    private bool m_shutdown;

    [ThreadStatic]
    private static WorkStealingQueue<WorkItem> m_wsq;

    // Methods to queue work.

    public void QueueUserWorkItem(WaitCallback work)
    {
        QueueUserWorkItem(work, null);
    }

    public void QueueUserWorkItem(WaitCallback work, object obj)
    {
        WorkItem wi = new WorkItem(work, obj);

        // If execution context flowing is on, capture the caller's context.
        if (m_flowExecutionContext)
            wi.m_executionContext = ExecutionContext.Capture();

        // Make sure the pool is started (threads created, etc).
        EnsureStarted();

        // Now insert the work item into the queue, possibly waking a thread.
        WorkStealingQueue<WorkItem> wsq = m_wsq;
        if (wsq != null) {
            // Single TLS to determine if we're on a pool thread.
            wsq.LocalPush(wi);
            if (m_threadsWaiting > 0) // OK to read lock-free.
                lock (m_queue) { Monitor.Pulse(m_queue); }
        }
        else {
            lock (m_queue) {
                m_queue.Enqueue(wi);
                if (m_threadsWaiting > 0)
                    Monitor.Pulse(m_queue);
            }
        }
    }

    // Ensures that threads have begun executing.
    private void EnsureStarted()
    {
        if (m_threads == null) {
            lock (m_queue) {
                if (m_threads == null) {
                    m_threads = new Thread[m_concurrencyLevel];
                    for (int i = 0; i < m_threads.Length; i++) {
                        m_threads[i] = new Thread(DispatchLoop);
                        m_threads[i].Start();
                    }
                }
            }
        }
    }

    private void AddWsq(WorkStealingQueue<WorkItem> wsq)
    {
        lock (m_wsQueues) {
            for (int i = 0; i < m_wsQueues.Length; i++) {
                if (m_wsQueues[i] == null) {
                    m_wsQueues[i] = wsq;
                    return; // Registered in a free slot; stop searching.
                }
                else if (i == m_wsQueues.Length - 1) {
                    // No free slot was found: double the array and append.
                    WorkStealingQueue<WorkItem>[] queues =
                        new WorkStealingQueue<WorkItem>[m_wsQueues.Length*2];
                    Array.Copy(m_wsQueues, queues, i+1);
                    queues[i+1] = wsq;
                    m_wsQueues = queues;
                    return;
                }
            }
        }
    }

    private void RemoveWsq(WorkStealingQueue<WorkItem> wsq)
    {
        lock (m_wsQueues) {
            for (int i = 0; i < m_wsQueues.Length; i++) {
                if (m_wsQueues[i] == wsq) {
                    m_wsQueues[i] = null;
                }
            }
        }
    }

    // Each thread runs the dispatch loop.
    private void DispatchLoop()
    {
        // Register a new WSQ.
        WorkStealingQueue<WorkItem> wsq = new WorkStealingQueue<WorkItem>();
        m_wsq = wsq; // Store in TLS.
        AddWsq(wsq);

        try {
            while (true) {
                WorkItem wi = default(WorkItem);

                // Search order: (1) local WSQ, (2) global Q, (3) steals.
                if (!wsq.LocalPop(ref wi)) {
                    bool searchedForSteals = false;
                    while (true) {
                        lock (m_queue) {
                            // If shutdown was requested, exit the thread.
                            if (m_shutdown)
                                return;

                            // (2) try the global queue.
                            if (m_queue.Count != 0) {
                                // We found a work item! Grab it ...
                                wi = (WorkItem)m_queue.Dequeue();
                                break;
                            }
                            else if (searchedForSteals) {
                                m_threadsWaiting++;
                                try { Monitor.Wait(m_queue); }
                                finally { m_threadsWaiting--; }

                                // If we were signaled due to shutdown, exit the thread.
                                if (m_shutdown)
                                    return;

                                searchedForSteals = false;
                                continue;
                            }
                        }

                        // (3) try to steal.
                        WorkStealingQueue<WorkItem>[] wsQueues = m_wsQueues;
                        int i;
                        for (i = 0; i < wsQueues.Length; i++) {
                            // Slots are null until a thread registers, and again after it exits.
                            WorkStealingQueue<WorkItem> other = wsQueues[i];
                            if (other != null && other != wsq && other.TrySteal(ref wi))
                                break;
                        }

                        if (i != wsQueues.Length)
                            break;

                        searchedForSteals = true;
                    }
                }

                // ...and Invoke it. Note: exceptions will go unhandled (and crash).
                wi.Invoke();
            }
        }
        finally {
            RemoveWsq(wsq);
        }
    }

    // Disposing will signal shutdown, and then wait for all threads to finish.
    public void Dispose()
    {
        m_shutdown = true;
        lock (m_queue) {
            Monitor.PulseAll(m_queue);
        }

        // m_threads is still null if no work was ever queued.
        if (m_threads != null) {
            for (int i = 0; i < m_threads.Length; i++)
                m_threads[i].Join();
        }
    }
}
I have a little harness that measures the throughput of the different thread pool implementations for varying degrees of recursively queued work. I’ll share it in a subsequent part of this series, once we have a few more variants to pit against each other. Anyway, as you’d imagine, there is very little difference between LockAndWsqThreadPool and SimpleLockThreadPool when all work is queued from external (non-pool) threads. However, when I queue 10,000 items externally and, from each of those, queue 100 items recursively, I see a 3X throughput improvement on my four-core machine. When I queue 100 items externally and, from each of those, queue 10,000 items recursively, the improvement is more than 8X. And so on. As the number of cores increases, the improvement only becomes greater.
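To give a rough idea of what such a measurement involves, here is a minimal sketch. It is not the harness mentioned above: the ThroughputHarness class, its Run method, and the queue delegate parameter are all names I've made up for illustration, and the demo drives the built-in System.Threading.ThreadPool only so that the snippet compiles on its own. Any pool with a QueueUserWorkItem-style method could be passed in instead.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical harness: all names here are illustrative assumptions.
public static class ThroughputHarness
{
    // Queues 'external' parent items from the calling thread; each parent then
    // queues 'recursive' child items from inside the pool. Returns the elapsed
    // time and reports how many child items actually ran.
    public static TimeSpan Run(Action<WaitCallback, object> queue,
                               int external, int recursive, out int executed)
    {
        if (external <= 0) throw new ArgumentOutOfRangeException("external");
        if (recursive <= 0) throw new ArgumentOutOfRangeException("recursive");

        int count = 0;
        using (CountdownEvent done = new CountdownEvent(external * recursive))
        {
            // The child does no real work; we only measure queueing overhead.
            WaitCallback child = delegate(object o) {
                Interlocked.Increment(ref count);
                done.Signal();
            };

            // The parent runs on a pool thread, so its queue calls are "recursive".
            WaitCallback parent = delegate(object o) {
                for (int j = 0; j < recursive; j++)
                    queue(child, null);
            };

            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < external; i++)
                queue(parent, null);
            done.Wait(); // Block until every child has signaled.
            sw.Stop();

            executed = count;
            return sw.Elapsed;
        }
    }

    public static void Main()
    {
        // Assumption: the CLR thread pool stands in for the pool under test.
        Action<WaitCallback, object> clrPool = delegate(WaitCallback w, object o) {
            ThreadPool.QueueUserWorkItem(w, o);
        };

        int executed;
        TimeSpan elapsed = Run(clrPool, 100, 10000, out executed);
        Console.WriteLine("{0} items in {1:F1} ms", executed, elapsed.TotalMilliseconds);
    }
}
```

Swapping the two counts (many external items with few recursive ones, or the reverse) gives the two shapes of workload compared above. A real comparison would also want warm-up runs and several iterations per configuration.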
Another aspect not shown—because of the very limited QueueUserWorkItem-style API we’re building on—is something called “wait inlining.” We do this in TPL. When you recursively queue work items in a divide-and-conquer kind of problem, there’s often more latent parallelism than will be realized. Instead of requiring all of that parallelism to consume a thread, and blocking each time a work item is waited on, we can run work items inline if they haven’t started yet.
One easy way to do this is to allow a thread to inline only those work items that sit in its own local work stealing queue. Because we are guaranteed the local pop/push methods won’t interleave with such inlines, we can just acquire the stealing lock and search the list for the particular element, e.g.:
public bool Remove(T obj)
{
    for (int i = m_tailIndex - 1; i > m_headIndex; i--) {
        if (m_array[i & m_mask] == obj) {
            lock (m_foreignLock) {
                if (m_array[i & m_mask] != obj)
                    return false; // lost a race.

                // Adjust indices or leave a null in our wake.
                if (i == m_tailIndex - 1)
                    m_tailIndex--;
                else if (i == m_headIndex + 1)
                    m_headIndex++;
                else
                    m_array[i & m_mask] = null;

                return true;
            }
        }
    }

    // Searched the whole range without finding the item.
    return false;
}
This is just a new method on the WorkStealingQueue<T> data structure. It requires that the local and foreign pop methods now check for null values and restart the relevant operation should one be found. The reason is that, if the work item to be removed is neither the head nor the tail item, we cannot move the indices past it (they must remain the same), so the slot stays visible to subsequent pops and steals, which must skip over the null left behind.
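To illustrate just the null-checking part, here is a deliberately simplified sketch. It is not the WorkStealingQueue<T> from this series: TombstoneDeque<T> is a made-up name, it has a fixed capacity, and it guards every operation with one lock instead of using the lock-free local path. The only thing it tries to show is how a pop or steal that lands on a null left by Remove restarts on the next slot.

```csharp
using System;

// Illustrative only: a single-lock deque, not the WorkStealingQueue<T> above.
public sealed class TombstoneDeque<T> where T : class
{
    private readonly T[] m_array = new T[32]; // Fixed capacity keeps the sketch short.
    private readonly int m_mask = 31;
    private readonly object m_lock = new object();
    private int m_headIndex; // Index of the oldest item.
    private int m_tailIndex; // One past the newest item.

    public void Push(T obj)
    {
        // null is reserved as the "removed" marker, so it can't be queued.
        if (obj == null) throw new ArgumentNullException("obj");
        lock (m_lock) {
            if (m_tailIndex - m_headIndex > m_mask)
                throw new InvalidOperationException("Deque is full.");
            m_array[m_tailIndex & m_mask] = obj;
            m_tailIndex++;
        }
    }

    // LIFO pop from the tail, as the owning thread would do.
    public bool TryPop(out T obj)
    {
        lock (m_lock) {
            while (m_tailIndex > m_headIndex) {
                m_tailIndex--;
                int slot = m_tailIndex & m_mask;
                obj = m_array[slot];
                m_array[slot] = null;
                if (obj != null)
                    return true;
                // Hit a null left by Remove: restart with the next item.
            }
            obj = null;
            return false;
        }
    }

    // FIFO steal from the head, as a foreign thread would do.
    public bool TrySteal(out T obj)
    {
        lock (m_lock) {
            while (m_headIndex < m_tailIndex) {
                int slot = m_headIndex & m_mask;
                m_headIndex++;
                obj = m_array[slot];
                m_array[slot] = null;
                if (obj != null)
                    return true;
                // Hit a null left by Remove: restart with the next item.
            }
            obj = null;
            return false;
        }
    }

    // Removes a specific item so that a waiter can run it inline.
    public bool Remove(T obj)
    {
        lock (m_lock) {
            for (int i = m_tailIndex - 1; i >= m_headIndex; i--) {
                if (m_array[i & m_mask] == obj) {
                    // Adjust an index at either end; in the middle, only the
                    // null written below marks the slot as removed.
                    if (i == m_tailIndex - 1) m_tailIndex--;
                    else if (i == m_headIndex) m_headIndex++;
                    m_array[i & m_mask] = null;
                    return true;
                }
            }
            return false;
        }
    }
}

public static class TombstoneDemo
{
    public static void Main()
    {
        TombstoneDeque<string> q = new TombstoneDeque<string>();
        q.Push("a"); q.Push("b"); q.Push("c");
        q.Remove("b"); // A middle item: leaves a null between "a" and "c".

        string item;
        while (q.TryPop(out item))
            Console.WriteLine(item); // The removed item is skipped, never returned.
    }
}
```

A waiting thread would call Remove on its own queue and, if it returns true, invoke the work item directly rather than blocking. The real data structure has to make the same skip-on-null decision without the luxury of holding a lock on the local path, which is where the extra care comes in.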
Next time, in part 4 of this series, we’ll take a look at what it takes to share threads among multiple instances of the LockAndWsqThreadPool class. This allows many pools to be created within a single AppDomain without requiring entirely separate sets of threads to service each one of them. This capability enables you to isolate different work queues from one another, to ensure that certain components aren’t starved by other (potentially misbehaving) ones.