I don't know what's publicly available about our future ship schedules. Regardless, we begin M1, our first real coding milestone for the next version of the CLR, on Monday. There's been some work going on in the meantime, of course, limited mostly to prototyping, design, and prioritization, but it's finally time to get serious, write real product code, and start hitting dates.
One fairly large item on our schedule is revamping our thread-pool. Our primary aim there is to enable fine-grained parallelism, and to supply new scheduling features that many people have asked for in the past. Today, coarse-grained parallelism is more attractive because of the per-item cost of scheduling and dispatching work: when work items are small, that overhead can outweigh the useful work they do. We are going to change that.
This work includes the following tentative high-level items:
- Low overhead for queueing and dispatching work
- Deadlock avoidance by injecting additional threads ("surging") when 100% of pool threads are blocked
- Queue partitioning and isolation
- Prioritization of work items
- Cancellation of work items, possibly with support for Windows Vista I/O cancellation
- NUMA awareness, such as CPU affinitization and/or user-hinted node affinitization
- And, of course, enhanced debugging and diagnostics
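To make two of these items concrete, here is a minimal, single-threaded Python sketch of what prioritization and cancellation of queued work items could look like. The names `PriorityWorkQueue`, `WorkItem`, `enqueue`, `cancel`, and `drain` are hypothetical and illustrate the ideas only; they are not the CLR thread-pool's API or design. Cancellation here is cooperative: a cancelled item stays in the queue and is simply skipped when it reaches the front.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkItem:
    # Hypothetical work-item record: a callback plus a cancellation flag.
    callback: Callable[[], object]
    cancelled: bool = False

    def cancel(self) -> None:
        # Cooperative cancellation: the item is skipped at dispatch time.
        self.cancelled = True


class PriorityWorkQueue:
    """Hypothetical sketch: a lower priority number is dispatched first,
    and items with equal priority run in FIFO order."""

    def __init__(self) -> None:
        self._heap: List[tuple] = []
        # Monotonic sequence number breaks ties, preserving FIFO order.
        self._seq = itertools.count()

    def enqueue(self, callback: Callable[[], object], priority: int = 0) -> WorkItem:
        item = WorkItem(callback)
        heapq.heappush(self._heap, (priority, next(self._seq), item))
        return item

    def drain(self) -> List[object]:
        # Run every remaining non-cancelled item in priority order.
        results = []
        while self._heap:
            _, _, item = heapq.heappop(self._heap)
            if not item.cancelled:
                results.append(item.callback())
        return results


q = PriorityWorkQueue()
q.enqueue(lambda: "low", priority=5)
q.enqueue(lambda: "high", priority=0)
doomed = q.enqueue(lambda: "cancelled", priority=1)
q.enqueue(lambda: "high-2", priority=0)
doomed.cancel()
print(q.drain())  # ['high', 'high-2', 'low']
```

A real pool would also have to deal with concurrent producers and consumers and with cancelling items that are already running; the sketch covers only the queueing policy.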
We'd love any feedback on any of these, including which sound more or less important to you. And if you have an interesting problem or scenario we might not have considered, please, please, please let me know.
A colleague of mine recently referred me to the Cilk work at MIT. This paper supplies a good overview. We've been slowly arriving at a similar design, so it's great to have prior art from which to draw. The most important idea with respect to the thread-pool is how multiple queues can be backed by a single physical thread store, and how those queues are dynamically load-balanced via thread leases and work stealing.
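The work-stealing half of that idea can be sketched in a few lines. In the Cilk style, each worker owns a double-ended queue: the owner pushes and pops at one end (LIFO, which favors cache locality), while an idle worker steals the oldest task from the other end of a randomly chosen victim's deque. The Python below is a deterministic, single-threaded simulation under stated assumptions: workers take turns in round-robin "steps" instead of running on real threads, and `Worker`, `run`, and `make_tree` are hypothetical names. Real implementations use concurrent (often lock-free) deques, and thread leases are not modeled here.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Optional

# A task runs and returns the (possibly empty) list of child tasks it spawns.
Task = Callable[[], List["Task"]]


class Worker:
    def __init__(self, wid: int) -> None:
        self.wid = wid
        self.tasks: Deque[Task] = deque()
        self.executed = 0

    def push(self, task: Task) -> None:
        self.tasks.append(task)  # owner pushes at the right ("bottom") end

    def pop(self) -> Optional[Task]:
        # Owner takes its most recently pushed task (LIFO).
        return self.tasks.pop() if self.tasks else None

    def steal(self) -> Optional[Task]:
        # A thief takes the oldest task from the left ("top") end (FIFO).
        return self.tasks.popleft() if self.tasks else None


def run(root: Task, num_workers: int, seed: int = 0) -> List[int]:
    """Simulate work stealing; return how many tasks each worker executed."""
    rng = random.Random(seed)
    workers = [Worker(i) for i in range(num_workers)]
    workers[0].push(root)
    while any(w.tasks for w in workers):
        for w in workers:  # one simulated step per worker, round robin
            task = w.pop()
            if task is None:
                # Out of local work: steal from a random non-empty victim.
                victims = [v for v in workers if v is not w and v.tasks]
                if victims:
                    task = rng.choice(victims).steal()
            if task is not None:
                w.executed += 1
                for child in task():
                    w.push(child)
    return [w.executed for w in workers]


def make_tree(depth: int) -> Task:
    # Each task spawns two children until depth 0: 2**(depth + 1) - 1 tasks total.
    def task() -> List[Task]:
        return [make_tree(depth - 1), make_tree(depth - 1)] if depth > 0 else []
    return task


counts = run(make_tree(6), num_workers=4)
print(counts, sum(counts))  # the sum is always 127
```

Stealing from the top tends to hand a thief the oldest, and therefore typically largest, remaining subtree, so a single steal balances a lot of work and steals stay rare.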