
Personal Info:

Joe leads the architecture of an experimental OS's developer platform, where he is also chief architect of its programming language. His current mission is to enable writing large-scale software that is reliable, secure, and scalable by-construction. Before this, Joe founded the Parallel Extensions to .NET project. He has been granted 19 patents, with 49 pending. When not working, Joe enjoys travelling with his wife, writing books, writing music, studying music theory & mathematics, and doing anything involving food & wine.

Disclaimer:
The contents of this site are my own personal opinions and do not represent my employer's view in any way.

© 2012, Joe Duffy

 
 Wednesday, December 14, 2005

Charles, dude, I still read Programming Windows from beginning to comatose. Frequently.

Maybe I'm a dying breed.

Will anybody actually read my book from beginning to comatose? I hope so.

12/14/2005 11:00:44 AM (Pacific Standard Time, UTC-08:00)

 Tuesday, December 13, 2005

My recent {End Bracket} column, Transactions for Memory, shipped in the January MSDN Magazine. It's now been posted online: http://msdn.microsoft.com/msdnmag/issues/06/01/EndBracket/.

It's admittedly just a teaser, but hopefully strikes a good balance between hand-waviness and a useful explanation of the core ideas.

 

12/13/2005 9:46:18 AM (Pacific Standard Time, UTC-08:00)

 Saturday, December 10, 2005

A wise man once said a picture is worth a thousand words.

Alas, for those authors hoping to fill pages of text with fluffy pictures I bear sad news: A sufficiently fluffy 4" tall illustration takes up only a measly 239 words worth of space.

Perhaps "a picture is worth 239 words" is more accurate. (Or alternatively, 4.18 pictures are worth a thousand words.)

12/10/2005 5:10:23 PM (Pacific Standard Time, UTC-08:00)

 Thursday, December 01, 2005

Lots of people try to roll their own thread-pool. Many people have different (good) reasons for doing so.

If you're one of these people, please tell me why. Either leave me a comment or send me an email at joedu@microsoft.com.

But if you're interested in performance, getting a good heuristic isn't as easy as you might think. The goal of such a heuristic is to have one runnable thread per hardware thread at any given moment. (An HT thread isn't equal to a full thread, but for the sake of conversation let's pretend it is.) Achieving this goal is much more complicated than it sounds.

  • If you have a task sitting in front of you, it's hard to intelligently determine whether scheduling it on another thread is the right thing to do. It might be quicker just to execute it synchronously on the current thread. When is that the case? When the current number of running threads is equal to or greater than the number of hardware threads. And any decisions must be made statistically, because presumably concurrent tasks could be contemplating new work simultaneously.
  • Remember I said running threads. If you have blocked threads, they are not making use of the CPU and thus need to be considered differently in the heuristic. Just a count of threads isn't enough. If you have 16 tasks, 8 hardware threads, and statistically 50% of those tasks will be blocked at any given quantum, you want 16 real threads. If they block 75% of the time, you want 32 (only a quarter of them are runnable at any moment). And so forth.
  • You aren't the only code on the machine. Another process could be happily hogging as many threads as there are hardware threads, in which case your algorithm just got twice as bad (or half as good) as it was originally. This type of global data is hard to come by. (I should note that most machines have more than 2 processes running simultaneously. I currently have 67 processes running with 605 total threads. That's an average of ~9 threads per process. Clearly this is a real concern.)
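
The arithmetic behind those bullets can be sketched in a few lines. This is only an illustration in Python, assuming a simple model in which every thread is blocked the same fraction of the time; the function names (threads_needed, should_run_inline) are made up for the example and do not correspond to any real thread-pool API.

```python
import math


def threads_needed(hardware_threads: int, blocked_fraction: float) -> int:
    """Threads required so that, on average, one is runnable per hardware thread.

    Assumes every thread is blocked `blocked_fraction` of the time (a
    simplification; real workloads are rarely this uniform).
    """
    if not 0.0 <= blocked_fraction < 1.0:
        raise ValueError("blocked_fraction must be in [0, 1)")
    exact = hardware_threads / (1.0 - blocked_fraction)
    # Round first so floating-point noise (e.g. 80.00000000000001) does not
    # push the ceiling up by one.
    return math.ceil(round(exact, 9))


def should_run_inline(running_threads: int, hardware_threads: int) -> bool:
    """Run the task synchronously when every hardware thread is already busy."""
    return running_threads >= hardware_threads


if __name__ == "__main__":
    print(threads_needed(8, 0.5))    # half the threads blocked at any moment
    print(threads_needed(8, 0.0))    # nothing ever blocks
    print(should_run_inline(8, 8))   # machine saturated: don't hand off
```

Note what the sketch leaves out: it knows nothing about other processes on the machine, which is exactly the global data the last bullet says is hard to come by.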

Scheduling a task on another thread is costly. Why? For a number of reasons.

  • Because unless you have ample hardware resources to run it, this implies at least one context switch to swap the work in. If it runs longer than that, it means many more. If you have more than one long-running task competing for the same hardware thread, it means they will continually thrash the thread context in an attempt to make forward progress. As Larry puts it so eloquently, "...Context switches are BAD. They represent CPU time that the application could be spending on working for the customer."
  • And not only that (and perhaps worse), you're going to mess with the cache hierarchy. Your program might be happily working on conflict-free cache-lines, CASing right in the local cache without locking the bus, and then boom: You pass a pointer to an object to another thread (e.g. on the thread-pool), it pulls in the same lines of cache, and then you're both contending for the same lines back and forth. Your good locality goes right out the window and becomes a tax instead of a blessing. This sort of cache thrashing can kill good performance and scaling.
  • Lastly, threads aren't free, you know. Just having one around consumes 1MB of reserved stack space (0.5MB in SQL Server). Same goes for fibers.

Some people are interested in using thread-pools for other purposes. (That is: not performance.) They might want to manage a pool of work items, for example, which get scheduled fairly with respect to each other (in the fine-grained sense). No one task will complete very quickly during saturation, but at least each is guaranteed to move forward. A newly enqueued item won't sit festering in the queue while an older item continues bumbling along towards its goal. And sometimes, priorities must be used to evict lesser priority tasks when a higher priority task gets enqueued. These are all perfect cases where user-mode scheduling makes sense. Co-routines or (*cough, cough*) perhaps fibers could be used. Using threads for this simply adds way too much overhead.
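
To make the fairness idea concrete, here is a minimal sketch of user-mode round-robin scheduling using Python generators as stand-ins for co-routines. The names run_round_robin and worker are hypothetical, priorities and eviction are left out, and nothing here reflects how any real thread-pool is built.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks one step at a time, in FIFO rotation.

    `tasks` is an iterable of (name, generator) pairs. Returns the order in
    which (name, step) pairs executed.
    """
    ready = deque(tasks)
    trace = []
    while ready:
        name, task = ready.popleft()
        try:
            step = next(task)        # run until the task's next yield
        except StopIteration:
            continue                 # task finished; drop it from the rotation
        trace.append((name, step))
        ready.append((name, task))   # back of the queue: everyone gets a turn
    return trace


def worker(steps):
    """A task that does `steps` units of work, yielding after each one."""
    for i in range(steps):
        yield i                      # yield point = hand-off to the scheduler


if __name__ == "__main__":
    # The short "new" task gets its turn before the long "old" task completes.
    print(run_round_robin([("old", worker(3)), ("new", worker(1))]))
```

Everything runs on one thread, so switching between tasks costs a function call rather than a context switch; the price is that each task must yield voluntarily.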

Clearly getting this right is difficult. But the consequence of getting it horribly wrong today isn't too bad. (Although really crappy algorithms are noticeable.) When you only have 1-4 hardware threads on the average high-end machine, the difference between a great heuristic and a poor one isn't significant. That will change.

12/1/2005 11:30:04 PM (Pacific Standard Time, UTC-08:00)

 Wednesday, November 30, 2005

Classic.

If you're bored, read any of these papers. I wish I were smart too.

11/30/2005 9:05:31 PM (Pacific Standard Time, UTC-08:00)

 Sunday, November 27, 2005

Each Windows thread has a Thread Environment Block (i.e. TEB) which is a block of user-mode memory pointed at and reserved for use by the Windows kernel Thread data structure (KTHREAD). In addition to basic OS information like the active SEH filter chain, stack base and limit, and owned critical sections, applications can easily stash data into and retrieve data out of the Thread Local Storage (TLS) area of the TEB. This is done using the Win32 TlsAlloc, TlsGetValue, TlsSetValue, and TlsFree functions. You can locate a thread's TEB via the kernel debugger's !thread command and dump its contents with !teb.

(The CLR of course offers TLS functionality too, i.e. using ThreadStatics and the System.Threading.Thread's AllocateDataSlot, SetData, and GetData functions. This information does go into the TEB, but it is managed by the CLR. A call to SetData does not translate directly to a call to TlsSetValue.)

Win32--and Windows in general--makes liberal use of thread-local memory. I noted a few uses above (e.g. exception handlers) which are pervasive. Such usage creates an implicit affinity between the workload running on the thread and the physical OS thread itself. What do I mean by affinity? Simply that the work executing on a thread must continue executing on that exact physical thread for it to remain correct. This affinity isn't documented consistently nor is it easy to detect. You might be able to weasel around it by chance. But it makes it extraordinarily difficult to transfer logical work from one physical thread to another.

Imagine what would happen if we made a call to some Win32 function and then decided to swap out the logical work so that we could install new work. SetLastError might have been used to communicate a failure in a function called on either the thread the work is being swapped out of, or the destination once it gets rescheduled. But SetLastError installs the error information into the TEB. GetLastError will then either fail to retrieve information or, more likely, will retrieve somebody else's information, either of which would lead to all sorts of serious problems. Similar issues can happen if we (foolishly) tried to swap out a thread that owned a critical section, or some other thread-specific resource (like a mutex).
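
The failure mode is easy to reproduce in miniature. The sketch below uses Python's threading.local as a stand-in for the TEB slot; set_last_error and get_last_error are hypothetical helpers that only mimic the shape of the Win32 functions, and the error code 5 is arbitrary.

```python
import threading

_tls = threading.local()   # stands in for the per-thread slot in the TEB


def set_last_error(code):
    _tls.last_error = code


def get_last_error():
    # 0 if the calling thread never recorded an error of its own.
    return getattr(_tls, "last_error", 0)


def demo():
    """Record a failure on one thread, then look for it on another."""
    seen = {}

    def first_half():
        set_last_error(5)                        # arbitrary failure code
        seen["same_thread"] = get_last_error()

    def second_half():
        # The logical work "resumes" here, but on a different physical thread.
        seen["other_thread"] = get_last_error()

    for half in (first_half, second_half):
        t = threading.Thread(target=half)
        t.start()
        t.join()
    return seen


if __name__ == "__main__":
    print(demo())
```

The second half of the work never sees the error recorded by the first half. Had the second thread recorded an error of its own earlier, the work would instead have read somebody else's information, which is the more likely case described above.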

This is one major reason why fibers are still problematic as a general task scheduling solution for Windows. And it's a challenge if you even want to consider user-mode scheduling a la continuations. You just can't get around the platform's hidden thread affinity. We've done much better in managed code. Over time we are trying to use ExecutionContext as the currency for logical context information, which can be easily captured and restored by the runtime. But there are examples where we violate this (e.g. monitors), where we use the physical OS thread as the context (to be fair: we do notify hosts of such situations via Thread.Begin/EndThreadAffinity).

But you can't escape the fact that the runtime itself is built right on top of Win32.

11/27/2005 9:50:32 PM (Pacific Standard Time, UTC-08:00)

 Friday, November 25, 2005

I joined Microsoft mid-way through Whidbey's lifecycle. Mid-way means post-feature development, for the most part. There were plenty of unplanned features that I got to design and work on, but those were handled quite differently than the initial development process. The impression I formed during this period was that of a very regimented, structured, and process-heavy software engineering practice. Clearly this is good to ensure people don't screw up too badly, but it also places unnecessary constraints on your best talent.

Or, as Paul Graham said in his essay Hackers & Painters:

Big companies want to decrease the standard deviation of design outcomes because they want to avoid disasters. But when you damp oscillations, you lose the high points as well as the low.

At first, I thought this was necessary for the type of project I was working on, when compared to the projects I'd worked on in the past. But now that 2.0 is out the door, I'm enjoying myself quite a bit more.

Scrounging for change

Planning the direction for future releases is clearly a complex game, consisting of a mixture of top-down (where must the business go?) and bottom-up (what features do we want to do?) analysis. Customer needs come from all directions. At some point, somebody presumably must pull the trigger and unleash the team in a concrete and coordinated direction. Some don't have the stomach for this, but without it analysis paralysis sets in.

Of course, direction is a funny thing. It typically emerges over time rather than being planned explicitly, whether those doing the planning realize it or not. This particular case is no different. Before we even shipped Whidbey Beta2, we knew the primary focuses for Orcas, even down to the feature level in many cases. I suspect most people are already half-way down the path we'll eventually go, e.g. because they've been dreaming of and prototyping the new features they would like to implement for well over a year already.

But (in theory) somebody has to capture that, refine it, and communicate it to form a shared understanding. Presumably for purposes of making sure management is OK with what everybody wants to implement. But of course, what everybody wants to implement must first be turned into market segmentation and value propositions. OK, that statement is a tad cynical. Although I'm sure the process in this case will ferret out some thought bugs before they get put into code--which is clearly a good thing--I can't help but wonder whether the cost is worth the benefit. This deserves separate analysis, of course.

Flying under the radar

Planning aside, the projects I am most interested in as we move into the next few releases are the tiny incubation efforts. These are ordinarily small groups of individuals from across the company (including product teams and research). Such groups are a diverse mix of people with different backgrounds and goals, yet are drawn together because of a shared interest. If people are responsible for allocating their own time "on the side" to work on something, you can safely bet they are passionate about it and (more than) qualified to work on it. Being united by a shared interest can lead to fun collaboration and great end results. Generics is an example of a (big) project that evolved this way.

In many cases, there is no clear support from management in terms of funding for these projects. A head nod about its importance is about all you'll usually get at first. And in fact, often such wild-west efforts can go against the spirit of the planning. In terms of funding, research is funded to research it and obviously product teams can communicate with research. But for product teams to get something worthy of the productization stamp of approval (that's a vigorous management head-nod), clearly there's some level of prototyping that is needed. But simultaneously, most capable developers are focusing on staying inside the bounds of the aforementioned processes and fixing bugs. This is a slight catch-22.

Regardless of tedious funding problems, I enjoy these efforts the most. The path is not pre-defined--we must find it--and the ability for one individual to contribute substantially is very high. They feel like start-ups. But you don't have to worry about paying the bills and recruiting the best talent (the perks of working at a place like Microsoft). Many such projects I've been involved in have been primarily thought exercises. But a few have recently been given some level of funding. The groups of people that are working on them are, as noted above, often genuinely interested in what's being built. This is great. Very little process is needed...only enough to ensure we hit the deadlines for integration with the main product, and to report back to management to make sure they feel comfortable.

Only time will tell whether this approach will lead to better results. But something seems intuitively correct about the attitude that enabling your best people to do what they were hired for will lead to huge successes. Obvious corollaries can be drawn from this statement.

11/25/2005 12:32:54 PM (Pacific Standard Time, UTC-08:00)

 Monday, November 21, 2005

I just replied to a set of questions on Brad's blog. But I then thought perhaps the information would be more generally interesting to my readers. So here it is:

In response to the two issues you brought up on your first reply:

  1. Failure to mention "chain-to-base" for Dispose in the Framework Design Guidelines Book;
  2. Question about when it is safe to call methods on base types within a Finalize method.

Update: First, I have to mention something up front. I failed to mention that almost nobody should write Dispose/Finalize any longer. SafeHandle is the best way to protect resources that span (or outlive) a process, beginning in 2.0. This alleviates the details of implementing this pattern and gives you reliability guarantees that you otherwise wouldn't get (i.e. critical finalization). If you need to do some form of pooling or asserting on failure-to-Dispose, this is still an option; but all "real" resource cleanup should be encapsulated in a SafeHandle.

Re (1):

If we neglect to say a derived class must chain to its base class Dispose method (if it has one), that’s a book-bug/omission. If you're writing a Dispose(bool), your preference is to call base.Dispose(bool), flowing the bool value to the base method; if the base type has no Dispose(bool), a call to base.Dispose() suffices. If you're writing a plain Dispose(), you simply must call base.Dispose().

This is important because Dispose implies cleaning up resources; if you don’t chain to the base type, clearly you are going to leak those resources. And it might be worse than just nondeterministic cleanup (when the user expected deterministic). Presumably Dispose on the derived class does a GC.SuppressFinalize(this), meaning the base type will never get placed into the Finalize queue, and thus won't release its resources (well, until process exit). The user of this class would notice this as unbounded resource consumption. I suspect the bug would be incredibly hard to find, too.
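
The leak is easy to see in a toy model. What follows is a Python analogue rather than C#, so it only mirrors the shape of the pattern: dispose(disposing) stands in for Dispose(bool), super() for base, and a plain set of handle names for real resources. All class and function names are hypothetical, and finalization and GC.SuppressFinalize are not modeled at all.

```python
class Base:
    def __init__(self, handles):
        self.handles = handles             # shared registry of open handles
        self.handles.add("base-handle")

    def dispose(self, disposing=True):
        self.handles.discard("base-handle")


class ChainedDerived(Base):
    def __init__(self, handles):
        super().__init__(handles)
        self.handles.add("derived-handle")

    def dispose(self, disposing=True):
        self.handles.discard("derived-handle")
        super().dispose(disposing)         # chain to base, flowing the flag


class UnchainedDerived(Base):
    def __init__(self, handles):
        super().__init__(handles)
        self.handles.add("derived-handle")

    def dispose(self, disposing=True):
        self.handles.discard("derived-handle")
        # Bug: no super().dispose(...), so "base-handle" is never released.


def leaked_after_dispose(cls):
    """Create an instance, dispose it, and report which handles remain open."""
    handles = set()
    cls(handles).dispose()
    return sorted(handles)


if __name__ == "__main__":
    print(leaked_after_dispose(ChainedDerived))
    print(leaked_after_dispose(UnchainedDerived))
```

In the unchained case the caller did everything right, calling dispose exactly once, and the base type's handle still stays open. That is what makes this kind of bug hard to find from the outside.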

Re (2):

Generally, you should not make complex method calls from your Finalize method that could result in (accidentally) trying to use a resource which has already been disposed during the destruction process. This is ordinarily more of a concern during virtual method calls, where the most derived type's version is chosen dynamically; if the derived type has overridden a method and tries to use its own resources (which were already relinquished because of destruction ordering), bad things will happen. Calling a base method isn't nearly so dangerous, unless you do it after chaining to the base type's destructor. The original document from which the book text was derived acknowledged that Dispose's use of this practice is risky and goes against the general advice.

But we made an exception for Dispose because it is a carefully controlled pattern. (This was in the original document.) Those writing types to follow the pattern are usually more sophisticated users that will feel more comfortable analyzing the call graphs. And virtual calls during destruction aren’t nearly as dangerous as virtual calls during construction. You’re typically concerned that a resource will be used before it’s been initialized (i.e. in the construction case), but presumably code called from your Finalize will be resilient to uninitialized state and isn’t going to make further virtual method calls (which might introduce problems). Clearly if this isn't the case--and it could be difficult to verify that it is through test coverage--you will run into bugs, perhaps manifesting as crashing the Finalizer thread.

Note that the book contains an abridged (and more clear/scrubbed) version of the original document: http://www.bluebytesoftware.com/blog/PermaLink.aspx?guid=88e62cdf-5919-4ac7-bc33-20c06ae539ae. That document is actually quite a mess, doesn't stay on point, and bombards the reader with way too many details. I'm glad it got chopped up for the book.

11/21/2005 12:37:19 PM (Pacific Standard Time, UTC-08:00)

 Thursday, November 17, 2005

I haven't posted a book post in a while. So here are a few recent reads.

First, those pertaining to computers:

Microsoft Windows Internals, Fourth Edition -- Mark E. Russinovich, David A. Solomon

10 of 10.
I can't believe I never got my hands on this puppy previously. After seeing Mark and David's pre-con at PDC this year, I had to run out and buy it. Yes, I read it like a novel. And yes, it was just as suspenseful and enjoyable. If you want to learn more about Windows esoterica--including memory management, thread scheduling, I/O, and various other internals--this is the best book on the market. At least the best one I've seen so far. I can't say enough about it.

The Cathedral and the Bazaar -- Eric S. Raymond

8 of 10.
(I've owned this book for years, but picked it up for a re-read. I was surprised at how much new I gleaned from it.) OK, Eric Raymond is known as a complete MS-basher. But this book is quite well written. He makes tons of interesting claims, backed up with logical arguments (albeit little data), and challenges the traditional viewpoint on software development economics. He does so from more of an anthropologist's view than an economist's, but he does surprisingly well mixing the two. You can find a digital copy right here if you're too cheap to buy the book. ;)

Virtual Machines: Versatile Platforms for Systems and Processes -- Jim Smith, Ravi Nair

6 of 10.
This is probably a good book to have, and to skim through. I personally only read about 1/5th of it (the parts that were relevant and contained information I wasn't already entirely comfortable with) but the sections I did go through were well written. It covers various "virtual machines," from virtual execution environments--e.g. CLR and JVM--to hypervisors and more traditional virtualization (a la VirtualPC and VMWare). The content is, unfortunately, quite introductory in nature.

Next, those that have nothing to do with bits and bytes (but that I enjoyed nonetheless; non-fiction, of course):

In the Devil's Garden: A Sinful History of Forbidden Food -- Stewart Lee Allen

8 of 10.
Wow, super entertaining. This book talks about all of the foods throughout history which have been labeled "forbidden," yet for some reason always seemed to be secretly enjoyed by the more privileged classes in certain societies. Its chapters are broken up into the Deadly Sins, leading to some surprising and definitely engaging narratives throughout various cultures. I blew through this in just two sittings, mostly because it's an easy read, but also because I couldn't put it down.

Kitchen Confidential -- Anthony Bourdain

8 of 10.
OK, so this book is about food. A really good book about food. Anthony Bourdain did a brief stint as a "celebrity chef," but ultimately he's just a raw all-American cook. He is executive chef at Brasserie Les Halles in New York City. And his book details the grungy side of kitchens and the restaurant industry, but in a very intriguing and culinary-rich sense. I read it on the beach in Maui, which made it even better.

The Accidental Connoisseur: An Irreverent Journey Through the Wine World -- Lawrence Osborne

7 of 10.
This is an enlightening tale of one man's journey through the world of wine. At first, he is confronted with a dizzying array of magical words thrown together by "experts" on taste and wine, and struggles to find his ground. Over time, a subtle transformation takes place, where the comfort level with the industry, influentials, and its products gradually rises. No strong conclusion is made, but the journey is fun.

11/17/2005 10:40:37 PM (Pacific Standard Time, UTC-08:00)

 Wednesday, November 16, 2005

I wonder a few things.

How many out there write lots of multi-threaded code? If so, why; if not, why not?

For those of you that do, is there a set of standard guidelines and practices that you follow? Are they public (e.g. a white-paper, book)? How much experience do you have with them? Any empirical evidence that they are better than nothing (or even another set of guidelines)?

For those practices, do you have tools to support development consistent with those practices (e.g. static analysis, dynamic analysis)? Are they commercial or homegrown? How do you protect against race conditions? How do you protect against deadlocks? Does your locking protocol employ specific rigorous engineering practices, such as using lock hierarchies/leveling, avoidance of dynamic dispatch under a lock, etc.?

Do you do user-mode scheduling? How? (E.g. fibers?)

Do you use our ThreadPool, or did you decide to roll your own? Why?

Are you a Monitor.Enter/Exit or a Win32 kind of guy or gal? Same goes for Monitor.Wait/Pulse/PulseAll and EventWaitHandle. Was this choice based on any data, or was it simply what you're most comfortable with?

These are just some of the things I wonder. Any answers to any questions would be super-cool.

11/16/2005 5:49:17 PM (Pacific Standard Time, UTC-08:00)

 
