RSS 2.0

Personal Info:

Joe Send mail to the author(s) is the lead developer and architect for Parallel Extensions to .NET, tinkers with type systems, and is an author and speaker.

Blogroll:
Other
News
 C|Net
 Kuro5hin
 The Register
Technology
 <?xmlhack?>
 Daily WTF
 DevX
 Hacknot
 Java Today
 Microsoft Top 10 Downloads
 MSDN
 MSDN: "Longhorn"
 MSDN: XML Developer Center
 Slashdot
 Techdirt
 theserverside.com
 W3C
 Web Pages That Suck
 XML Cover Pages
 XML Journal
 xml.com
Technology Blogs
 Aaron Skonnard [PluralSight]
 Adam Bosworth [Google]
 Andy Rich [MS/C++]
 Arpan Desai [MS/XML]
 BCL Team [MS]
 Bill Clementson [Lisp]
 Bill de hÓra
 Bruce Eckel [J]
 Bruce Tate [J]
 Casey Chestnut
 Cedric Beust [Google]
 Chris Anderson [MS/Avalon]
 Chris Lyon [MS]
 Christian Weyer
 Clemens Vasters [newtelligence]
 Craig Andera [PluralSight]
 Dan Sugalski [Parrot]
 Daniel Cazzulino
 Dave Chappel
 Dave Roberts [Lisp]
 Dave Thomas [PragProg]
 Dave Winer
 Dion Almaer [J]
 Don Demsak
 Doug Purdy [MS/Indigo]
 Drew Marsh
 Eric Gunnerson [MS]
 Eric Rudder [MS]
 Eric Sink
 Fritz Onion [PluaralSight]
 Gavin King [J/Hibernate]
 Grady Booch [IBM]
 Hervey Wilson [MS/Indigo]
 Hillel Cooperman [MS/Shell]
 Howard Lewis Ship [J/Apache]
 Ingo Rammer [PluralSight]
 James Gosling [J/Sun]
 James Strachan [J/Groovy]
 Jason Matusow [MS/OSS]
 Jeffrey Schlimmer [MS/Indigo]
 Joe Beda [Google]
 Joel Spoelsky
 Jon Udell
 Josh Ledgard [MS/Evang]
 Joshua Allen [MS]
 Lambda
 Larry Osterman [MS]
 Maoni Stephens [MS/CLR]
 Mark Fussell [MS/XML]
 Martin Fowler
 Martin Gudgin [MS/Indigo]
 Me
 Michael Howard [MS]
 Miguel de Icaza [Mono]
 Mike Clark
 Omri Gazitt [MS/Indigo]
 Pat Helland [MS/PAG]
 Pinku Surana
 Raymond Chen [MS]
 Rich Lander [MS/CLR]
 Rob Howard
 Rob Relyea [MS/Avalon]
 Robert Cringely
 S. Somasegar [MS/DevDiv]
 Sam Gentile
 Scoble [MS/Evang]
 Scott Guthrie [MS/WebNet]
 Scott Hanselman
 Sean McGrath [J]
 Simon Fell
 Stanley Lippman [MS/C++]
 Steve Maine
 Steve Swartz [MS/Indigo]
 Steve Vinoski
 Steven Clarke [MS/Usability]
 Stuart Halloway
 Ted Leung
 Ted Neward [DM]
 Tim Bray [Sun]
 Tim Ewald [Mindreef]
 Tim O'Reilly
 Werner Vogels [Amazon]
 Wintellect
 Yasser Shohoud [MS/Indigo]
Top 20
 Brad Abrams [MS/CLR]
 Chris Brumme [MS/CLR]
 Chris Sells [MS/Ultra]
 Cyrus Najmabadi [MS/C#]
 Dominic Cooney [MS/XAF]
 Don Box [MS/Ultra]
 Don Syme [MS/R]
 Guido van Rossum [Python]
 Herb Sutter [MS/C++]
 Ian Griffiths
 Jason Zander [MS/CLR]
 Jim Hugunin [MS/CLR]
 Joel Pobar [MS/CLR]
 Krzysztof Cwalina [MS/CLR]
 Patrick Logan
 Paul Graham
 Rico Mariani [MS/CLR]
 Rory Blyth [MS/DN]
 Sam Ruby
 Wesner Moise
VC/Business Blogs
 Ed Sim
 Fred Wilson
 Jonathan Schwartz [J/Sun]
 Lawrence Lessig [Stanford]
 Mark Cuban
 Michael Hyatt
 Pierre Omidyar
 Ross Mayfield
 VentureBlog
 Weekly Read
Wine, Food & Tea
 The Silk Road of Wine
 Vinography: a wine blog
 Wine Whys

Disclaimer:
The content of this site are my own personal opinions and do not represent my employer's view in anyway.

© 2009, Joe Duffy

 
 Tuesday, January 23, 2007

Joel tagged me on this silly 5 things you didn't know about me meme.  Since it's floating around, and because I’d hate to disappoint a crazy Australian who’s apt to shoot vinegar in my eyes with a Super Soaker (woof, woof), I figure I'll give it a shot:

  1. I played in a heavy metal band when I was a teenager.  I was the lead guitarist, and loved to shred it up, with a sort of { 80's metal, 90's heavy metal, neo-classical, blues } style, influenced heavily by Slayer, Metallica, Pantera, GnR, Yngwie Malmsteen, Joe Satriani, and Stevie Ray Vaughan.  (Odd combination?  You bet!)  Don’t ask for pictures, I have conveniently "lost" them.  I stopped playing guitar for several years, but picked it back up in the past year and have been playing daily (focusing on building better music theory technical depth, but still shredding: my recent obsession is Dragonforce).  I also created several independent industrial and experimental techno CDs.  I try again every now and again, but I seem to have lost the patience.
  2. I stopped playing in the heavy metal band because I "accidentally" punched a guy in the face in a mosh pit at a Sepultura concert.  This broke my hand in several places and was quite unpleasant.  (I did stay for the whole set, I'm happy to report.  The bar was kind enough to supply a plastic cup with some ice.)  The emergency room wrapped it in a cast, and then after 4 weeks, it had healed incorrectly.  So then I had to have it manually rebroken, and spent the entire summer in a new cast.  At the end, I had no hand or arm strength, my finger was crooked (making it hard to hold a pick), and my band had broken up because we couldn't play out (and they apparently didn't have the motivation to replace me temporarily).  This was the end of my dreams of being a rock star.
  3. When I was 18 years old, I weighed a whopping 260 pounds.  Most of it was muscle, resulting from me working out literally every single day, and I have the stretch marks to show it.  I was a big dude, benched a little more than my weight, and even played center for a short period in high school football (I wasn’t keen on the fact that it’s apparently protocol that the quarterback routinely lays his hands on the center’s butt).  Now I weigh about 160 and have no muscle.
  4. I started programming when I was 12.  This came about because a local gaming place (Gamer's Guild in Milford, MA) had set up a new BBS system, on which they provided several MUDs, MOOs, MUSHes, etc., on their Commodore Amiga 3000's and 4000.  I became hooked, especially when I found out you could get access to the source code and actually change it.  When CircleMUD came out, circa '92 (IIRC), I got serious about hacking code, doing the bulk of my MUD programming on a hand-assembled Slackware Linux 1.0 IBM PC.  Ironically, yours truly (a Microsoft employee) did most of his learning on the Linux OS, using GNU's C Compiler.  (I actually refused to use DOS until I started doing games programming that required graphics and found some cool DOS toolkits.  I refused Windows even longer, until I got sick of using 'lynx' to browse the WWW.  I have always found Windows API programming, with UPPER_CASE type names everywhere, rather ugly and I simply didn't get it at first.)  I set up my own 8-line BBS and later in '94 converted to co-located Internet hosting, running a heavily modified CircleMUD (3.1?) for a year and a half.  There was typically around 20 players signed on at any given point.
  5. I am VERY obsessive about things.  I want to buy as many books as I can get my hold on, read them all the time, write code every waking moment, drink wine every night, go out for 15 course dinners every night, play guitar and learn every scale, major and minor, mode, modification, chord, arpeggio, I like to work, work, work until I am barely awake, and so on.  It drives my friends completely nuts, but it keeps me busy.  I've been that way ever since I was a little kid, so I don't think it's going to change anytime soon.

I don't know many people, so I'm going to commit meme heresy and not link to anybody else.  Sorry.

1/23/2007 11:52:37 PM (Pacific Standard Time, UTC-08:00)  #    Comments [2]

 Monday, January 22, 2007

I was recently asked by a customer how to guarantee alignment of CLR data on 16-byte boundaries.  They needed this capability to interoperate with code that uses SSE vector instructions to manipulate the data (which require 16-byte alignment).  The bad news is that there’s no real good way of doing this.  That is, there isn’t any “align at N bytes” feature for the CLR in which type layout and stack and heap allocation cooperate.  The good news is that you can fake it.

(I spoke about alignment with respect to atomic cmpxchg8b instructions previously, right here, for those interested in reading about that too.)

The details of how to go about ensuring 16-byte alignment depend on whether you allocate your data on the stack or the GC heap.  For illustration purposes, imagine we’re dealing with an array of float32[]’s.  We’d like to ensure the beginning lies on a 16-byte boundary:

  1. float [] a0 = new float[N]; // GC-allocated array of N floats
  2. float * a1 = stackalloc float[N]; // stack-allocated array of N floats

If you use the former, GC allocation (1), you’re going to have a really tough time.  The GC moves objects around on you as it performs compactions, and only aligns the 1st element of the array on a 4-byte boundary.  So even if you manage to get your object allocated on a 16-byte boundary (by chance), it is apt to move during a subsequent GC.

To solve this problem, you’d have to pin the object.  Pinning causes GC fragmentation, so I really encourage you to avoid this approach and go with stack allocation, (2), if you can afford it.  A float[] on the stack is similarly aligned to begin at a 4-byte boundary, but, unlike (1), it will subsequently not move around.  Of course stack allocation is often impossible, or difficult, if you are writing a reusable library that may be called in an unknown context (where the caller may have very little stack left).  This is a tradeoff you would have to make.  If the pinning is very short lived, i.e. the duration of a single function call, it might be tolerable for you, a la P/Invoke.

Regardless of whether you choose (1) with pinning or (2) by itself, you’ve now got a stable address.  And you can use the stable address to calculate the next 16-byte element in the array from the base address, and then use that as the start of the array.  You will need some extra padding at the end for the worst case, which is base + 3, meaning at most 12 bytes, so you need to allocate 3 extra floats in the array.  Here’s an example:

void * AlignUp(void * p, ulong alignBytes) {
    ulong addr = new UIntPtr(p).ToUInt64();
    alignBytes -= 1; // adjust pointer for arithmetic
    if (((1<<(IntPtr.Size*4 - 1)) - alignBytes) <= addr) throw new Exception(“overflow”);
    ulong newAddr = (addr + alignBytes) & ~alignBytes;
    return new UIntPtr(newAddr).ToPointer();
}

float * p = stackalloc float[N + 3];
p = (float *)AlignUp(p, 16);
… use p …

Note that if you were to use an array of doubles instead, you’d have some challenges.  That’s because a 8-byte value on the 32-bit CLR is only 4-byte aligned, and therefore you can end up with a situation where the next 16-byte granularity is in the middle of a single element.  For example, 12 + 8 = 20 byte, +8 = 28 byte, +8 = 36 byte, and so on.  None of these are 16-byte aligned.  Not that it really matters, so long as you allocate enough memory, but you will need to do some casting of the array reference, as shown in the above code, to do the arithmetic.

Note also that there’s a StructLayout attribute that allows you to specify alignment, through its padding field, but sadly this doesn’t impact the GC’s heap or the JIT’s stack alignment, and so it’s useless for our purposes.  Though the relative alignment within the data structure will be correct, the absolute alignment is not guaranteed to be so.

OK, so I know all of this isn’t pretty.  But it works.

1/22/2007 8:27:07 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]

 Monday, January 08, 2007

I will be giving a PLINQ talk at the Declarative Aspects of Multicore Programming (DAMP) workshop at POPL next week:

[Update: 1/23/07 -- added a link to the slide deck below.]

PLINQ: A Query Language for Data Parallel Programming

Microsoft’s language integrated query (LINQ) technology provides an expressive syntax and suite of APIs which facilitate classic data parallel operations: filtering, mapping, reductions, loop tiling, sorts, nested loops parallelism, prefix scans, and more.  In this talk, we look at a new set of extensions to the LINQ technology, called parallel LINQ (PLINQ), which automatically optimizes and parallelizes query operations based on dynamic runtime information.  We believe that the use of a familiar SQL-like syntax will broaden the reach of PLINQ in industry, and provides programmers with a more declarative and flexible way of expressing data-intensive computations.

Download presentation (PPT).

The workshop is being chaired by Guy Blelloch from CMU, is located in Nice, France, and features some interesting recent research in the field of parallel programming.  If anybody will be in town and wants to meet up, please drop me a line at joedu AT microsoft.

1/8/2007 1:09:41 PM (Pacific Standard Time, UTC-08:00)  #    Comments [19]

 Sunday, January 07, 2007

Jim Larus, a Microsoft colleague of mine, recently co-authored a book on Transactional Memory with Ravi Rajwar from Intel, one of the most prominent authorities in the TM community.  It just became available.  You can purchase and download it online at Morgan & Claypool's website: Transactional Memory (Synthesis Lectures on Computer Architecture).  The series of which it is part is new, but there are at least a few other great books in its pipeline, including a CMP (chip multi-processor) architecture book by Kunle Olukotun, from Stanford and an architect of Sun's Niagara processor.

1/7/2007 12:12:42 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]

 Saturday, December 30, 2006

I’m a big food addict. When I first moved to the Seattle area from Boston about three years ago, I was initially quite depressed. I expected that Seattle’s food scene would be totally bland in comparison to Boston’s, and NYC was no longer a “quick” drive away (for the occasional binge). Sadly, I still can’t say that most of Seattle’s top restaurants come even close to NYC, but it’s much better than I expected. Many of the better restaurants are right up there with Boston’s. The selection is much smaller, sure, but after a few years I’ve settled on some favorites. I’ve listed these below in approximate order of my most to least favorite.

(Note that the list isn’t complete. I’ll be updating it as time goes by.)

For anybody who recently moved to the area, is visiting, or has been here for a while and hasn’t checked out the food scene, this list might come in handy when making a choice. I even gave each five 1-5 star ratings—overall experience, food taste, food presentation, wine list, value, and service, just like a famous food critic would. (My descriptions, on the other hand, make it quite apparent that I’m not one.) Overall experience is not just the average of the other four, it’s really just how I rate the restaurant’s dining experience in general. A bit arbitrary, yes. I’ve also marked each with a rough guideline on price/person: $ is $0-$20, $$ is $20-$50, $$$ is $50-$100, $$$$ is $100-$200, and $$$$$ is >$200.

All this list really indicates is my personal likes and dislikes; grab a Zagat’s Guide for more detailed and thorough ratings. You also might want to check out tastingmenu.com. It’s a wonderful site with very detailed essays on individual meals and beautiful photographs. Everybody has their own personal taste, and I happen to disagree with some of their ratings, but that’s to be expected.

5 STARS OVERALL

Mistral
http://mistralseattle.com/
New French -- Seattle, WA (Belltown)
Food taste: 5
Food presentation: 5
Wine list: N/A (pairings)
Service: 3
Price: $$$$$

Mistral rivals some of the best restaurants in the US, and certainly beats out all others I’ve eaten at in Seattle, including the Herbfarm. This is pure foodie paradise and a great way to spend 5 hours. Dishes are elegantly and classically presented, but with many new-age molecular gastronomy components and techniques weaved seamlessly throughout. While the food is clearly French-heavy, most dishes were minimalist and surprisingly light, with judicious use of cream and butter.

I recommend the Mistral Experience, which is 9 courses and 6 (or so) paired wines. I actually ended up with 8 or 9 glasses, due to a complimentary glass of Champagne, an extra glass of dessert wine, and a few surprises tossed into the mix. Mistral offers only tasting menus, no a la carte, which surprisingly caused an entire six top to get up and leave(!) the first time I were there. Perhaps it was the price, which somehow ended up at $600 or so for two people. It was well worth it, though. The chef had absolutely no problem creating a 9 course vegetarian menu on the fly. And yes, I always ask in advance about this when I make reservations, as I would feel rude dropping in to a tasting menu-only restaurant and expecting them to completely rearrange the menu to be vegetarian friendly. Most good restaurants don’t mind.

I was at first taken aback by Mistral’s very simple interior design. It’s a pale white with hardwood floors, 12 or so tables, and subtle and warm candle lighting. The ceilings are very high, and there are only a couple windows, aside from the front glass. The result is an elegant, cozy, and quaint atmosphere, but also feels a little like you’re at a cocktail party in some chic art studio located in a dark LA alleyway, with a décor which hasn’t yet been filled out. After a few courses, the initial shock passed, and I eventually found it calming.

I have to admit the wait-staff took me off guard. The Maître d' was aloof and seemingly 100% clueless, and our waiter was a bit aggressive. To be honest, I was completely put off by his demeanor and surfer-like descriptions of the food, wine, etc. (duuuude). But to be fair… after a few courses, it became evident why he works there. He was absolutely passionate about the food and wine, and was oozing with detail after detail about the cuisine. He had a story to tell about every dish and every wine. In the end, he gave a different, but good, experience compared to what I was expecting. In a phrase, decidedly unpretentious and food geeky. The 3 rating for service is primarily because of the Maître d', not him.

Sadly I don’t tend to take very good notes when I dine out. Here are some things that stand out in my mind. Robuchon-like cauliflower soup, with nice European (high fat content) butter. Ferran Adria-like carrot and cucumber foams, without feeling ridiculously out of place. Sous-vide hamachi tuna, which was absolutely delicious and unexpected. Sous-vide as a cooking technique creates an odd meaty texture for fish (particularly tuna), but cooks it perfectly and evenly throughout. Plenty of exotic fishes and meats, mallard duck, foie gras. Fingerling potato puree with chive oil. Many, many fruit reductions, including pomegranate and quince. There were two desserts, one of which was a bursting(!) with vibrant flavor passion fruit sorbet, and tasted, well, like I had just crushed open and slurped the innards of a passion fruit. It had just the right level of pleasing sourness, and thankfully wasn’t enhanced with any sweetener at all. Desserts like maple ice cream, little hand-made cookies, and pastry. Everything had a place on the plate and wandered only a little bit, keeping the palate sufficiently amused.

The paired wines are perfectly selected, jumping all over the map, and don’t feel like just an afterthought. Pace was perfect. Given the size of the meal and starting time (9 courses, starting at 7pm), you might expect to have been a little rushed, but I wasn't. And the portions were just the right size; at the end, I was no more full than if I had eaten a typical 4-course dinner at a slightly less extravagant restaurant.

Rover’s
http://www.rovers-seattle.com/
Classic French -- Seattle, WA (Madison Park)
Food taste: 5
Food presentation: 4
Wine list: 5
Service: 4
Price: $$$$$

Rover’s cranks out classic French cuisine like no other Seattle restaurant. What makes Rover’s truly special, aside from its perfect execution and taste, are that dishes are typically locally-inspired, highlighting seasonal, fresh ingredients. All of this integrated into a truly French menu that whisks you away to Paris. OK, sure they always have great diver scallops, foie gras, truffles, … and I’m not sure how much of this is from Washington and Oregon, but these focal points are at least augmented with seasonal and regional flavors. This isn’t country style comfort food… it’s high end, small-to-medium sized dishes, but with plenty of cream, heavy butter, and warm flavors.

Go for the 8-course tasting menu, I say. You’ll feel it more than most 8-course tasting menus due to the heavy French flavors, but you won’t regret it. The wine list rocks and will keep you occupied hunting for a bottle for a couple days if you’re into that kind of thing, although you’ll have a tough time finding a bottle for under $200. Rover’s never disappoints. It puts all other classic French restaurants in Seattle to shame (of which there are very few, of course).

4 STARS OVERALL

The Herbfarm
http://www.herbfarm.com/
New American --Woodinville, WA
Food taste: 4
Food presentation: 4
Wine list: 5
Service: 4
Price: $$$$$

I’ve been here many, many times, and I’ve never been disappointed. They have only one seating per night of about 100 people (just an approximate guess), and the atmosphere is very much like going our for a night at the theatre. The menu is completely prix fixe, no a la carte. It makes for a nice, 5 hour-long evening out. If you show up 30 minutes early, you get a tour of the herb garden. After doing this a couple times, it gets a little old (since the script doesn’t change at all), but you do get to munch on some really fresh herbs to whet the appetite.

What truly makes the Herbfarm great is the ever-changing, seasonal, and locally inspired menu. If you check out their website, you’ll notice the menu rotates every two weeks or so, featuring locally available and seasonally fresh ingredients. The Mycologist’s Dream is our favorite (here’s a sample menu), featuring dozens of varieties of mushrooms, often foraged the morning before the meal from all over the state of Washington. And it’s great for vegetarians.

The food is squarely in the New American category. The dishes are very good and usually vibrantly seasoned with fresh herbs. The simple, light dishes are definitely what they do best—e.g. consommés, poached fish, simple mushroom dishes—but when things get heavy, the herbs tend to get lost in the mix and the result is usually good, but not special. I must admit that I find the style to be borderline “tired,” but that’s probably a reflection of how I feel about New American restaurants in general as I write this (you'll notice there are quite a few in Washington). Because the Herbfarm's specialty is locally produced, seasonal ingredients, the menu style can fluctuate quite a bit: without these light, herby dishes, it’s hard to find a common theme from one visit to the restaurant to the next. As a result you don't really go back craving your favorite dish as you would many other restaurants.

The paired wine selections are usually good, but sometimes not stellar. They have a massive and wonderful winelist which due to the paired wines you seldom get to explore. But in addition to the 5 paired wines, you can order others from their list. Last time, I ordered a tasty flight of Madeiras, ranging from 1890 to 1960.

The Harvest Vine
http://www.harvestvine.com/
Spanish / Tapas -- Madison Park, WA
Food taste: 5
Food presentation: 4
Wine list: 3
Service: 3
Price: $$$

I’ve died and gone to… Basque, Spain! This teensy little restaurant seats probably 40 people total, 20 seats of which are available at or around the tapas bar. They always keep those seats available for walk-ins, so you can usually get lucky on a weeknight (or at the right time on weekends; I’m not telling, sorry :) ).

Foods range from pure Spanish comfort dishes, like rustic cheese, three kinds of olives prepared in three different marinades, fried piquillo peppers with sea salt, fresh white anchovies, delicious, oozing with fat Serrano ham, blood sausages, and chorizo to new-age, but Spanish-inspired, treats like foie gras, monk fish liver, squab, ox, and scallops. The wine list is good, though not great, featuring almost exclusively Spanish and Portuguese wines. I’m salivating for a glass of Jerez right now (a fortified dessert wine, made the same way as Sherry, in fact Sherry is just the English word for Jerez (similar to how Sherry is Xérès in French)).

The Harvest Vine is always a treat.

Café Juanita
http://www.cafejuanita.com/
Northern Italian -- Kirkland, WA
Food taste: 4
Food presentation: 4
Wine list: 3
Service: 3
Price: $$$$

This is one of the best Northern Italian restaurants I’ve ever eaten at. And this is having eaten at plenty of Italian restaurants in the Boston area which is famous for its Italian North End neighborhood. There’s always a good mix of hearty and light dishes, with plenty of exotic meat, foul, and fish, spanning things like wagyu beef, prosciutto, braised octopus, boar, venison, foie gras, veal sweetbreads, …, the list just goes on and on. Just about every time I’ve gone, they have had a seasonal Mediterranean fish, usually baked with a simple ragout of herbs, olives, capers, and served whole. Last week it was Branzino (a.k.a. spigola). I’ve never been disappointed. They have just an OK selection of vegetarian dishes and salads.

They also have some wonderful cheeses, which you can often get paired with an aperitif (like grappa, a balsamic vinegar martini, etc.). The desserts are also very good. I’ll never forget the olive oil ice cream that I had there last winter: surprisingly delicious! Service is spotty, sometimes great, sometimes just OK, really depending on the server. Timing is usually very good but sometimes off. The décor is pretty bland, but as you can tell by my lack of a décor rating for the restaurants, that kind of thing doesn’t really put me off that much so long as the food knocks me out.

Lark
http://www.larkseattle.com/
New American -- Seattle, WA (Capitol Hill)
Food taste: 5
Food presentation: 4
Wine list: 2
Service: 3
Price: $$$

Lark is a new favorite of mine, and one at which I find myself at least one night every other week. Like the Harvest Vine, the restaurant is very walk-in friendly and thus conducive to impulsive dining out decisions.

The food at Lark is simple and just a tiny bit bigger than tapas sized dishes. This is classic New American food, but with some interesting twists and inventive ingredient combinations. A couple weeks back I had foie gras terrine with quince jam; they frequently have fresh, seasonal mushrooms sautéed delicately with just a little herb (like parsley) and sea salt; yellowtail carpaccio; always a wonderful squab or pheasant dish, usually with something like an apple frisee or nut compote. They also have some really great classics. You can’t go wrong with some cheese, of which they have a great selection, served with fresh, local honeycomb. Rosti potatoes with clabber cream, pommes de terre “Robuchon”, gnudi, and any of their salads are always a treat.

I have two gripes about Lark. First is that the winelist is pretty much crap. Thankfully I’m usually in on a weeknight, which means I’m up for at most a glass or two of something, in which case their by-the-glass list isn’t too far off the mark. But if you’re looking for a bottle, I’d recommend bringing your own instead. The second is timing. At Lark they tend to assume that you’re going to share a lot of the dishes. When I order things in a particular order, saying clearly “as the first course”, “as the main course”, etc., I am saying this for a reason. Pay attention! The food more than makes up for both problems, thankfully.

3 STARS OVERALL

Crush
New American -- Seattle, WA (Capitol Hill)
Food taste: 3
Food presentation: 4
Wine list: 3
Service: 3
Price: $$$$

Crush is a strong, solid, very good restaurant. I can’t honestly say that the food is great enough to put it up into the 4 star category, however, mostly because the composition of ingredients and flavors is at the same time predictable and over the top. Many dishes have way too much going on, but then again many are simple and delicious. Particularly given the interior décor, which consists of plasticy white chairs located in a white-and-black decorated old house, and the difficulty in getting reservations, I get the impression they are trying to appeal to the non-foodie chic crowd a little bit too much.

I tremendously enjoyed the sautéed Hudson Valley foie gras with warm huckleberries on a slice of pumpernickel bread. It had huge, warm flavors and awesome fat content. The sea scallops and Maui onion confit was also terrific. When I think of Crush, I think of consistently good New American food where you can’t go wrong. I don’t think I’ve had a dish I didn’t like there, although their portions are often a little bit much for me.

Il Terrazzo Carmine
http://ilterrazzocarmine.com/
Italian -- Seattle, WA (Pioneer Square)
Food taste: 3
Food presentation: 3
Wine list: 4
Service: 2
Price: $$$$

Restaurant Zoë
http://www.restaurantzoe.com/
New American -- Seattle, WA (Belltown)
Food taste: 3
Food presentation: 3
Wine list: 2
Service: 3
Price: $$$

Café Campagne
http://www.campagnerestaurant.com/cafe_home.html
French (Bistro) -- Seattle, WA (Pike Place)
Food taste: 4
Food presentation: 3
Wine list: 2
Service: 2
Price: $$$

Dahlia Lounge
Pacific Northwest -- http://tomdouglas.com/dahlia/
Seattle, WA (Downtown)
Food taste: 3
Food presentation: 3
Wine list: 3
Service: 2
Price: $$$

Dahlia is Tom Douglas’s most upscale restaurant. In some ways, it feels like Tom is trying too hard with this one. Having eaten at his others, he is in his element when the dishes are very rustic with lots of complex flavors woven throughout, sort of sloppy, exemplified by the Palace Kitchen. I call it “hunting lodge food”, since there’s usually some game meat and generally themes that you’d imagine enjoying after a long day’s hunt. (I don’t hunt by the way, so this is purely inspired by movies, books, and other works of fiction on the topic. :P) Dahlia, on the other hand, tries to be a little minimalist while at the same time giving the same huge, rustic, and yes, sloppy feel to the dishes. If it sounds strange, it is. It kind of doesn't work for me, although some dishes are hits: I am completely in love with their Italian Bread Salad, with rustic bread cubes, fresh mozarella, kalamata olives, fruity olive oil, chunky strips of cured ham, etc. as a lunch.

I think everybody should try Dahlia, but it’s not the kind of place I’d go to on, say, a weekly basis. The food can get a little repetitive and is very predictable, as are all of Tom’s restaurants. It’s almost comfort food, but at that price, I have a dozen other comfort food restaurants that I’d rather hit up. It’s a matter of personal taste really. The bakery next door has some serious artesianal bread and pastries. Check it out; the bakery is definitely worthy of a trip on Sunday morning just to get a loaf to snack on.

Nishino
http://www.nishinorestaurant.com/
Japanese -- Seattle, WA (Madison Park)
Food taste: 3
Food presentation: 3
Wine list: 3
Service: 2
Price: $$$

Lola
http://tomdouglas.com/lola/
New Greek -- Seattle, WA (Downtown)
Food taste: 4
Food presentation: 3
Wine list: 2
Service: 2
Price: $$$

Veil
http://www.veilrestaurant.com/
New American -- Seattle, WA (Pioneer Square)
Food taste: 3
Food presentation: 4
Wine list: 2
Service: 3
Price: $$$

2 STARS OVERALL

Tango
http://tangorestaurant.com/
Spanish / Tapas -- Seattle, WA (Capitol Hill)
Food taste: 3
Food presentation: 3
Wine list: 3
Service: 3
Price: $$$

Palace Kitchen
http://tomdouglas.com/palace/
Pacific Northwest -- Seattle, WA (Downtown)
Food taste: 4
Food presentation: 2
Wine list: 2
Service: 3
Price: $$$

Campagne
http://www.campagnerestaurant.com/camp_home.html
Classic French -- Seattle, WA (Pike Place)
Food taste: 3
Food presentation: 3
Wine list: 4
Service: 2
Price: $$$$

Seastar
http://www.seastarrestaurant.com/
Bellevue, WA
Food taste: 3
Food presentation: 3
Wine list: 3
Service: 3
Price: $$$

Purple Café and Wine Bar
http://www.thepurplecafe.com/
Pacific Northwest -- Kirkland, WA, Woodinville, WA, and Seattle, WA (Downtown)
Food taste: 2
Food presentation: 2
Wine list: 3
Service: 3
Price: $$$

Monsoon
http://www.monsoonseattle.com/
New Vietnamese -- Seattle, WA (Capitol Hill)
Food taste: 3
Food presentation: 3
Wine list: 2
Service: 3
Price: $$$

Cactus
http://www.cactusrestaurants.com/
Mexican -- Kirkland, WA, Seattle, WA (Madison Park), and Seattle, WA (Alki Beach)
Food taste: 4
Food presentation: 3
Wine list: 1
Service: 2
Price: $$

Brasa
http://www.brasa.com/
New Mediterranean -- Seattle, WA (Belltown)
Food taste: 3
Food presentation: 2
Wine list: 2
Service: 2
Price: $$$

1 STAR OVERALL

Many of these places are comfort food and quick bite places. It’s a little unfair to label them “1 star” since comparing a great Southern Indian restaurant to somewhere like Mistral is like comparing apples to oranges. Each is amazingly good, but are for vastly different purposes (and with different price points). C’est la vie.

The Dish
Breakfast -- Seattle, WA (Ballard)
Food taste: 3
Food presentation: 1
Wine list: N/A
Service: 2
Price: $

Udupi Palace
Indian (Southern) -- Bellevue, WA
Food taste: 4
Food presentation: 1
Wine list: N/A
Service: 1
Price: $

Preet’s
http://www.preets.com/
Indian (Punjabi) -- Redmond, WA
Food taste: 3
Food presentation: 2
Wine list: N/A
Service: 2
Price: $

Kabul
http://www.kabulrestaurant.com/
Afghan -- Seattle, WA
Food taste: 3
Food presentation: 2
Wine list: N/A
Service: 2
Price: $$ (a little expensive for what you get)

Serious Pie
http://tomdouglas.com/serious/
Seattle, WA (Downtown)
Food taste: 3
Food presentation: 1
Wine list: 1
Service: 1
Price: $$

Wild Ginger
http://www.wildginger.net/
Pacific Rim -- Seattle, WA (Downtown)
Food taste: 2
Food presentation: 2
Wine list: 2
Service: 2
Price: $$$ (overpriced)

Szmania’s
http://www.szmanias.com/
German -- Seattle, WA (Magnolia)
Food taste: 3
Food presentation: 2
Wine list: 2
Service: 2
Price: $$$

Typhoon
http://www.typhoonrestaurants.com/
Thai -- Redmond, WA (Pioneer Square)
Food taste: 3
Food presentation: 3
Wine list: N/A (good tea list, though)
Service: 1
Price: $$

Mediterranean Kitchen
Middle Eastern -- Bellevue, WA and Seattle, WA (Queen Anne)
Food taste: 2
Food presentation: 2
Wine list: 1
Service: 2
Price: $$

I Love Sushi
http://www.ilovesushi.com/
Bellevue, WA and Seattle, WA (Lake Union)
Food taste: 2
Food presentation: 2
Wine list: N/A
Service: 2
Price: $

This place serves up good middle-of-the-road sushi, not too great, not too bad either. The raw fish is where it’s at here—I’m not overly crazy about the cooked foods. It gets busy on weekdays during lunch, but you can snag a Bento Box to go which is a fun treat.

Third Floor Fish Café
http://www.fishcafe.com/
Pacific Northwest -- Kirkland, WA
Food taste: 3
Food presentation: 2
Wine list: 2
Service: 2
Price: $$

Shamiana
http://shamianarestaurant.com/
Indian and Pakistani -- Kirkland, WA
Food taste: 3
Food presentation: 2
Wine list: 1
Service: 2
Price: $$

Marina Park Grill
http://www.marinaparkgrill.com/
Pacific Northwest -- Kirkland, WA
Food taste: 2
Food presentation: 2
Wine list: 2
Service: 2
Price: $$

Some pretty decent seafood dishes and seaside fare, like burgers and salads. It’s right on the waterfront and is a great place to stop by while walking around Kirkland and hanging out at the park. All around decent place with a not-too-shabby by-the-glass winelist. I wouldn’t go out of your way to eat here for dinner, but it makes a great late Sunday afternoon lunch on a warm summer day. There’s a little ice cream shop right around the corner for a nice treat afterwards while you walk through the park.

Chutney’s
http://www.chutneys.com/
Indian -- Bellevue, WA
Food taste: 2
Food presentation: 2
Wine list: 1
Service: 1
Price: $$

Malay Satay Hut
http://www.chutneys.com/
Malaysian -- Redmond, WA
Food taste: 3
Food presentation: 1
Wine list: N/A
Service: 1
Price: $

NEXT ON MY LIST…

These are restaurants I haven’t been to yet, but that I’ve heard first hand are quite good. They are listed in the order in which I plan on visiting them (or in the case of Gypsy, hope ;) ). And yes, I am rather embarrassed I haven’t tried Canlis or Lampreia yet. I suspect both will be either 4 or 5 stars. My intent is to do so in the first couple weeks of the new year.

Gypsy
Invitation only. :(

Lampreia
http://www.lampreiarestaurant.com/

Canlis
http://www.canlis.com/

Le Gourmand

La Carta de Oaxaca
http://www.lacartadeoaxaca.com/

Cascadia
http://www.cascadiarestaurant.com/html/

Carmelita
http://www.carmelita.net/

Izumi

12/30/2006 2:12:39 PM (Pacific Standard Time, UTC-08:00)  #    Comments [6]

 Friday, December 29, 2006

Deadlocks aren’t always because you’ve taken locks in the wrong order.

In many systems, tasks communicate with other tasks through shared buffers.  In a concurrent shared memory system, these buffers might be simple queues shared between many threads.  In COM and Windows GUI programs these buffers might take the form of a window’s message queue.  In any case, if some task A performs a synchronous message send to task B, and task B does a synchronous send to task A, near simultaneously, and if neither task continues to process incoming messages, both will be blocked forever.

This is the classic reentrancy versus deadlock problem.  Ensuring that both A and B continue to process incoming messages while blocked on a send will eliminate the deadlock, albeit at the cost of possible reentrancy headaches.  Better yet, you could just send messages asynchronously.

Things can get quite a bit more complicated than this, of course.  Imagine that we have three operations, A, B, and C, being run over N concurrent streams of data.  We use data parallelism to partition the input data into and replicate the operations over the N streams, such that we have A0…AN-1, B0…BN-1, and C0…CN-1 operating on disjoint input.  A0 produces data for B0 which produces data for C0, and so on.  Elements are pulled on demand from the leaves (A0…AN-1) to the root (C0…CN-1), using a single execution resource (like a thread) per stream, i.e. E0…EN-1.

This is quite a bit like many real data parallel systems, including stream processing.

Imagine that sometimes AN finds that some input data must be given to BM instead of BN (where N != M); there is a similar story for B and C too.  We might be tempted to use some form of shared buffer to perform the inter-task communication here.  In other words, when A0 finds something for B1, it sends the data to it by placing it into B1’s input buffer.  This might be done asynchronously, i.e. A0 needn’t wait for B1 to actually consume the message, hence avoiding the sort of deadlock we noted earlier.

Unfortunately, since it might take some unknowable amount of time for B1 to process its input, we might worry about excessive memory usage for these buffers.  So we could put a bound on its maximum size using an ordinary bounded buffer… but once we’ve done that, we have turned what was an asynchronous send into a possibly-synchronous one, and in doing so introduced the same deadlock problem with which I began this whole discussion.

We could solve this by ensuing that, whenever a task must block because the destination buffer is full, it also processes incoming messages in its own buffer.  In other words, we use reentrancy.  Sadly, things are not always quite so simple.

Imagine this case: A0 has found data for B1, but B1’s buffer has become full.  So A0 is now waiting for B1 to process messages from its buffer to make room.  Nobody else will produce data for A0 at this point, so it’s stuck waiting.  Sadly, B1 has too become blocked trying to send a message that it has found for C0.  Because the same execution resource E0 that must free space in C0’s buffer is currently blocked in A0 waiting for B1 to free space from its buffer, and because the execution resource E1 is also waiting for E0, we now have a very convoluted deadlock on our hands.

The solution?  There’s no reason to keep the execution resource E0 occupied in A0 waiting for B1 in this case.  E0 could instead be freed up to run C0, freeing space for B1, and untangling the system.  Reentrancy strikes again, but this time, in a good way.  Note that in a heterogeneous system where these buffers are not controlled by the same resource, this solution is difficult to realize in practice.  Maybe A uses a custom bounded buffer written in C# to communicate, B uses SendMessages and COM message queues, and C uses GUI messages.  In this case, orchestrating the waiting to be deadlock free becomes a real challenge.

12/29/2006 11:27:31 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]

 Friday, December 22, 2006

You might have noticed I'm blogging a lot more about native code than I used to.  Most of my day-to-day work still has to do with managed code, but I've been dazzled by all the new cool things in Vista that you can't do yet in managed code (and probably won't be able to for some time to come, since the CLR needs to keep running on old OSes).  And, besides, many low-level Windows changes impact managed code too, like the fairness post from last week.

My question to you is:  Do you vehemently like or dislike either category of post?  Does the split between the two work?  I'm trying hard to keep it at about 50/50. 

I'm wondering not just for blogging reasons.  As you might imagine, my book is looking to be very similar to my blog.  It is, after all, "Concurrent Programming on Windows", which sometimes involves using native code to access features that managed doesn't expose to you.  Similarly, the CLR gives you many things that Windows alone doesn't give you.  I'm trying reeeeealllly hard to ensure the prose is not schizophrenic, flows and progresses nicely, and doesn't repeat things: e.g. "here's how you do it in VC++, here's how you do it in C#, ..."  This can be a challenge for many things, but is obviously very important.  Thankfully a huge portion of the book is more about applying the technologies generally, which tends to be pretty common across both environments.

So ... what do you think?

12/22/2006 11:56:18 AM (Pacific Standard Time, UTC-08:00)  #    Comments [10]

If you need RegisterWaitForSingleObject-like behavior in the new Vista threadpool -- a great feature, by the way, that amortizes the cost of losing a thread to waiting by cramming up to 63 waits into a single threadpool thread, leading to better scalability on systems that must wait for a lot of things concurrently, sometimes for long or unpredictable times (and the magic behind ASP.NET Asynchronous Pages) -- you'll want to look at the SetThreadpoolWait and related APIs.  You get basically the same functionality, with a mostly cleaned up interface and the added benefit of having cleanup groups to take care of resource cleanup (if you so choose):

VOID WINAPI SetThreadpoolWait(PTP_WAIT pwa, HANDLE h, PFILETIME pftTimeout);

One thing annoyed me about this API when I first tried to use it.  To be truthful, it still does.  Instead of a DWORD timeout specified in 1ms intervals, you'll notice you have to supply a pointer to a FILETIME data structure.  Wha-wha-whaat?  This differs from just about every other wait API on Windows and is not as simple as it sounds.

The no-timeout case, which is typically -1 (a.k.a. INFINITE), is simple enough: just pass a NULL pointer.  The "check, but immediately timeout if unsignaled without waiting," variant, typically specified with 0, is pretty straightforward.  Just initialize your FILETIME to contain 0's.  There are many ways to do this but using a struct field initializer is probably the easiest:

FILETIME ft = {0,0};
SetThreadpoolWait(..., &ft);

But for real timeouts, how the heck do you get your hands on a properly initialized FILETIME?  OK, here's where things get nasty.  You can grab ahold of a FILETIME from many places, including from an actual file's creation or modification time, but you probably don't want to do that.  GetSystemTimeAsFileTime(&ft) will turn the current time into a FILETIME, which is useless without additional math to turn it into a time relative to now (otherwise, you'd just use a {0,0} FILETIME as noted above).  Alternatively, you could start with a SYSTEMTIME (the current can be retrieved with GetSystemTime), which allows you to define a precise year, month, day, hour, minute, second, and/or millisecond, and then turn that into a FILETIME with SystemTimeToFileTime(..., &ft).  But how many times do you really have a precise absolute date and time at which you want your wait to stop?  Yes, February 3rd, 2019, at 6:29 AM (and 39.66 seconds) please.

Whew!  At this point, you're probably bewildered.  Most wait timeouts are defined in terms of relative time.  Thankfully, you can still define a relative time when calling SetThreadpoolWait.  But sadly you have to resort to some messy casting and data conversion to do this.  How do you do that?  Simple.  If the FILETIME's contents represent a negative integer, then the threadpool adds the negation of that to the current system time during registration to come up with an absolute time.  So it does the GetSystemTimeAsFileTime/addition nonsense for you.  Well, how the heck do you do create a negative FILETIME anyway?  "Easy"!

__declspec(align(8)) FILETIME ft;
*reinterpret_cast<LONG64 *>(&ft) = -1000;
SetThreadpoolWait(..., &ft);

That causes the threadpool to use a timeout of 1000 from the current time.  1000 units of what you might ask?  FILETIME represents its time with 100ns units, so this particular timeout of -1000 is interpreted to mean 100 microseconds from now.  Generally, if you have n milliseconds, you'd calculate n * 1000 (to get microsecond units) * 10 (to get 100 nanosecond units).

[Update: 12/26/2006: I changed the examples to use __declspec(align(8)), ensuring that the FILETIME is aligned on an 8-byte boundary.  Pavel noted in comments to this post that, since FILETIME consists of two DWORDs, VC++ will align on 4-byte boundaries.  On some architectures -- like IA64 -- treating these as a single 64-bit aggregate piece of data will present alignment faults to the software.  (This is unlike the behavior you'll see on X86 and X64, where such faults are fixed transparently by the hardware and/or OS, albeit at a performance hit.  You can ask the OS to fix these up on IA64 automatically, with SetErrorMode(SEM_NOALIGNMENTFAULTEXCEPT), but this results in even worse performance than on other architectures.)  Raymond wrote about this previously.  One solution is to align manually, as shown here; another is to memcpy data to and from the FILETIME/LONG64; yet another, which is probably the cleanest, is to simply access the FILETIME's fields directly.  E.g. FILETIME ft = { -m * 1000 * 10, ~0 } or ft.dwLowDateTime = -m * 1000 * 10; ft.dwHighDateTime = ~0, for some milliseconds timeout m.]

To save you some time and frustration in this process, you can just use the following little shim in your code.  I do.  It gives you a nice, relative DWORD-based timeout interface to the SetThreadpoolWait API:

VOID SetThreadpoolWaitWithMsTimeout(PTP_WAIT pwa, HANDLE h, DWORD dwMilliseconds) {
    __declspec(align(8)) FILETIME ft;
    PFILETIME pft;

    if (dwMilliseconds == -1) {
        // Just pass NULL to the wait API to signal "no timeout".
        pft = NULL;
    } else {
        pft = &ft;

        // FILETIME uses 100ns intervals. To convert milliseconds into 100ns units,
        // we must multiply by 1000 (to get microseconds) and then by 10 (to get 100ns).
        // We take the negative of this number to specify "relative to now" time.
        *reinterpret_cast<LONG64 *>(pft) = -dwMilliseconds * 1000 * 10;
    }

    SetThreadpoolWait(pwa, h, pft);
}

To be honest, I don't know the reasoning behind the break from tradition here.  [Update: 12/26/2006: Richard pointed out, very correctly, that the NTDLL implementations have alwasy dealt in terms of FILETIMEs and performed DWORD ms to 100ns conversions; in that sense, this is hardly a "break from tradition" for Windows generally, but is certainly a new direction for KERNEL32 itself.]  Using a FILETIME clearly causes some trickiness for something that should be damn simple, but it does have one distinct advantage.  The DWORD-based APIs only permit you to specify relative time, whereas the FILETIME approach allows you to specify relative or absolute, depending on what you need.  I have to admit that I can’t think of many cases with wait registrations where you’d want an absolute timeout, but I might be unimaginative.  This also has the benefit of removing some degree of ambiguity: with relative, you always have to wonder: time relative to what?  Is it relative to the time at which an actual wait thread sees the registration and adds the HANDLE to its list of waitable objects?  Or is it relative to the time at which the call to register the wait is made?  Or something else?  In the end, I’d have preferred a separate API for absolute time… I just like my DWORD-based relative timeouts.  They are familiar and comfortable.  Continuing to use RegisterWaitForSingleObject is an option, but doesn't let you use environments, cleanup groups, etc.  Tradeoffs abound.

12/22/2006 11:41:51 AM (Pacific Standard Time, UTC-08:00)  #    Comments [5]

 Thursday, December 14, 2006

Raymond just posted a brief entry about lock convoys.  While he links to a few other relevant posts, none of them mention the new anti-convoy features that all Windows locks now use.  So I thought that I would take the chance to do just that.

Many people claim that fair locks lead to convoys.  In my experience, however, few people really know the reason why.  Before Windows Vista (client OS) and Windows Server SP1 (server OS), mutexes, critical sections, and internal locks like kernel pushlocks used a lock handoff mechanism to guarantee true fairness.  In other words, the lock would not actually become "available" when released so long as there were threads waiting to acquire it.  Instead, the thread releasing the lock would modify the lock's state so that it appeared as if the next thread in the wait queue already owned it.  The new owner thread would then simply wake up, typically via the releaser signaling an event, and the owner would find that it already owned the lock, proceeding happily.  No other thread can "sneak in" between the time the new owner is woken and the time that it is actually scheduled for execution with this design.

While this sounds nice and, well, fair, it exacerbates convoys.  Why?  Because it effectively extends the lock hold time by the communication and context switch latency required to wake and reschedule the new owner.  Context switches on Windows are anything but cheap, and tend to cost anywhere from 4,000 to 10,000+ cycles.  Assume C represents the cost of a context switch.  Then with a truly fair lock, the lock will be in an intermediary handed off state, with no thread actually running code under its protection, for about C cycles.  It's actually worse than this.  Assuming the system is busy, a thread that is woken just goes to the end of the OS's thread scheduler queue, and is required to wait until it gets allocated a timeslice in which to execute.  This can make C much larger in practice.  And of course on highly loaded systems the condition worsens, which can add insult to injury (as we see momentarily).

To cope with the possibility of a scheduling delay, Windows uses something called a priority boost for any thread waiting on an auto-reset event (or that owns a window): the boost temporarily increases the target thread's priority which subsequently decays after it gets scheduled.  Assuming no other high priority threads are runnable, this ensures the latency is very close to C.  But C's still pretty darned big...

To illustrate the problem with the fair handoff scheme, imagine we have a lock L for which a new thread arrives every 2,000 cycles.  Each such thread runs for 1,000 cycles under the protection of the lock.  No problem, right?  On average, the lock will be held for 1,000 cycles, unheld for 1,000 cycles, and so on.  Assuming the arrival rate is somewhat random, but statistically averages out to the values mentioned, then occasionally we might get some contention, requiring waiting.  But for every contentious acquire, there should also be a big gap in time where there are no owners (or where wait queues can be drained).  A system with these characteristics should balance out well.  It should survive until threads begin arriving at a frequency of more than 1,000 cycles, give or take some epsilon, which is actually a doubling in throughput.  It's not even close to capacity.  (Real systems depend on many more factors than this simplistic view, but you get the point.)

As soon as the lock is fair, however, this scheme quickly becomes untenable and will come to a grinding halt.  Imagine that thread T0 acquires L at cycle 0; if it just so happens that T1 tries to acquire it at cycle 500, then T1 will have to wait.  Remember, on average, the arrival rate is 1,000 cycles, but that's just an average.  We expect the occasional wait to occur.  This wait, unfortunately, causes a domino effect from which the system will never recover.  T0 then releases L at cycle 1,000, as expected, handing off ownership to T1; sadly, T1 doesn't actually start running inside the lock until 5,000 (assuming 4,000 for C, and assuming no scheduling delay); in the 4,000 cycles it took for T1 to wake back up and start running, we expect 2 new threads will have arrived on average; these threads would see L as owned by T1 and respond by waiting.  By the time those threads execute, another 10,000 cycles (2*(C+1,000)) will have passed, and another 4 threads will have begun waiting.  And so on.  This process repeats indefinitely, the requests pile up (hopefully with a bound), and disaster strikes.  The system simply won't scale this way.

If you remove the strict fairness policy, however, the system scales.  And that, my friends, is why all of the locks in Windows are now unfair.

Of course, Windows locks are still a teensy bit fair.  The wait lists for mutually exclusive locks are kept in FIFO order, and the OS always wakes the thread at the front of such wait queues.  (APCs regularly disturb this ordering -- a topic for another day -- which actually calls into question the merit of the original design goal of attaining fairness in the first place.)  Now when a lock becomes unowned, a FIFO waking algorithm is still used, but the lock is immediately marked as unavailable.  Another thread can sneak in and take the lock before the woken thread is even scheduled (although priority boosting is still, somewhat questionably, in the system).  If another thread steals the lock, the woken thread may subsequently have to rewait, meaning it must go to the back of the queue, again disturbing the nice FIFO ordering.

The change to unfair locks clearly has the risk of leading to starvation.  But, statistically speaking, timing in concurrent systems tends to be so volatile that each thread will eventually get its turn to run, probabilistically speaking.  Many more programs would suffer from the convoy problems resulting from fair locks than would notice starvation happening in production systems as a result of unfair locks.

12/14/2006 10:31:42 PM (Pacific Standard Time, UTC-08:00)  #    Comments [12]

 Wednesday, December 06, 2006

I took this past week off so that I could work on my book.  Well, I'm happy to report that I've been successfully writing like a madman, averaging around 15-20 solid pages per day.  I still have a long way to go, but I'm getting more confident with the passing of each day that this book will be...  well...  a book that I'd actually like to sit down and read.

[Update: 12/7/2006: Correction made -- the CLR's JITs do not generate manual alignment code.  Instead, we defer to the costly OS handler for alignment fixups.]

In the process of writing the section on data alignment, I realized there is very little documentation on the alignment policy used by the CLR.  This is in contrast to Kang Su Gatlin's wonderful MSDN treatise on the subject for VC++, which leaves absolutely nothing hidden in the closet.  Well, I still don't have all of the answers for you.  Sorry.  You'll have to wait for the book.  But in the meantime, I've discovered that there's a myth that deserves a little debunking.

In the MSDN documentation for InterlockedCompareExchange64, it says:

"The variables for this function must be aligned on a 64-bit boundary; otherwise, this function will behave unpredictably on multiprocessor x86 systems and any non-x86 systems."

I've also heard and read this from other various sources.  I've heard, for example, that LOCK CMPXCHG8B will still do a load/compare/store sequence, but that, if the address isn't 8-byte aligned, the instruction will not be atomic.  This would lead to sporadic atomicity failures, probably even more difficult to track down than a typical race.  Given that the CLR doesn't faithfully align 64-bit data types on 8-byte boundaries (as we'll see momentarily), I suddenly feared that Interlocked.CompareExchange(ref Int64, ...) was HORRIBLY broken.  Without an MP machine at home, I couldn't test this out, so I decided to do a little digging.

In the manuals for many AMD processors and older Intel X86 processors, I found no reference to CMPXCHG8B requiring an aligned address.  What I did find, however, in the Intel 64-bit and IA32 System Programmer's Manual Part A was the following (emphasis mine):

"The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand. However, it is recommend that locked accesses be aligned on their natural boundaries for better system performance:

  • Any boundary for an 8-bit access (locked or otherwise).
  • 16-bit boundary for locked word accesses.
  • 32-bit boundary for locked doubleword accesses.
  • 64-bit boundary for locked quadword accesses."

If I'm reading that right, this means the common wisdom around 8-byte alignment and LOCK CMPXCHG8B is hogwash.  (Sadly, proving the absence of some flaky processor that crashes or has unpredictable behavior under certain circumstances is rather difficult, especially if someone at some point though it was true enough to put it in the MSDN documentation.  If somebody out there knows of a real case -- and it's not just hear say -- please let me know!)  If this is true of all X86 processors, it means that Interlocked.CompareExchange(ref Int64, ...) isn't horribly broken on the CLR after all.  (Yaay.)  It would have been broken...  because, as I said earlier, the CLR does NOT align 64-bit values on 8-byte address boundaries...

Conversing briefly with Simon Hall over email, the dev that owns most (all?) of the type layout infrastructure, I've concluded the following:  CLR type layout tries to eliminate all misaligned data layout through a combination of padding and field reordering.  This means that data of >= 8-bytes on 64-bit always begins on 8-byte boundaries, and data of >= 4-bytes on 32-bit always begins (at least) on 4-byte boundaries.  I say "at least" because emperical evidence shows that type layout actually aligns many 8-byte fields on 8-byte boundaries, even on 32-bit.  (It turns out this doesn't matter much...  neither the 32-bit JIT nor the GC respect this when allocating data.)  In summary, the CLR ensures that no field that could have fit inside a single 4/8-byte segment ever spills across a boundary.  The CLR also adds necessary padding to StructLayout(Sequential) types, while still preserving the original field ordering.

Therefore, the only cases where we end up with truly misaligned data is with StructLayout(Explicit) and StructLayout(Pack=...) types.  For example the simple struct, struct S { [FieldOffset(6)] int i; }, will always be misaligned, on 32- and 64-bit alike.  In such cases, our JIT simply generates the naive code and lets the OS perform misalignment fixups.  This is actually rather costly, as Kang Su's aforementioned article explaines.  We could have, like the VC++ compiler, generate the manual alignment code using a combination of loads and shifts, but my guess is that most of our customers don't care and will never notice.

To preserve the hard work done by type layout, our JITs and the GC guarantee that all allocated data is aligned on at least 4-byte (on 32-bit) or 8-byte (on 64-bit) boundaries.  I say "at least" once again because I know, for example, that VC++ aligns stack frames on 16-byte boundaries for 64-bit.  I don't claim to understand why.  We might do something similar.

Here's an interesting program that just prints out a few field addresses, and whether things are 8-byte aligned.  You'll interestingly notice that the int/long fields that are adjacent to one another are padded with 4-bytes in between on 32- and 64-bit, but that the JIT and GC only align on 4-byte addresses on 32-bit.  I presume this is so that the layout doesn't have to change between 32- and 64-bit, but I can't say for sure:

using System;
using System.Runtime.InteropServices;

class C {
    internal S s;
}

struct S {
    internal int x;
    internal long y;
    internal byte z;
}

unsafe class P {
    static void Main(string[] args) {
        int pad = 5;
        if (args.Length > 0) pad = int.Parse(args[0]);

        Console.WriteLine("Field\t[Begin\tEnd)\t%8");
        PrintStackS(pad);
        PrintHeapS(pad);
    }

    static void PrintStackS(int x) {
        int * pad = stackalloc int[x];
        S * s = stackalloc S[1];
        PrintAddr(s);
    }

    static void PrintHeapS(int x) {
        for (int i = 0; i < x; i++) new object();
        C c = new C();
        fixed (S * pcs = &c.s) {
            PrintAddr(pcs);
        }
    }

    static unsafe void PrintAddr(S * ps) {
        ulong xa = new UIntPtr(&ps->x).ToUInt64();
        Console.WriteLine("X\t{0:X}\t{1:X}\t{2}", xa, xa + sizeof(int), xa % 8);
        ulong ya = new UIntPtr(&ps->y).ToUInt64();
        Console.WriteLine("Y\t{0:X}\t{1:X}\t{2}", ya, ya + sizeof(long), ya % 8);
        ulong za = new UIntPtr(&ps->z).ToUInt64();
        Console.WriteLine("Z\t{0:X}\t{1:X}\t{2}", za, za + sizeof(byte), za % 8);
    }
}

Running it with a few different inputs yields these results:

C:\Temp>8by
Field   [Begin  End)    %8
X       12F440  12F444  0
Y       12F448  12F450  0
Z       12F450  12F451  0
X       1273670 1273674 0
Y       1273678 1273680 0
Z       1273680 1273681 0

C:\Temp>8by 2
Field   [Begin  End)    %8
X       12F44C  12F450  4
Y       12F454  12F45C  4
Z       12F45C  12F45D  4
X       1273664 1273668 4
Y       127366C 1273674 4
Z       1273674 1273675 4

If the CLR ever decides to support a 128 CAS operation, Interlocked.CompareExchange(ref Int128, ...), which I hope we will, we would need to guarantee alignment on 16-byte boundaries.  In comparison to CMPXCHG8B, CMPXCHG16B does indeed fail when issued against an address that isn't 16-byte aligned.  Instead of failing silently, a GP fault is generated.  This is difficult, because not only must type layout respect the alignment (you can already get this with StructLayout(..., Pack=16)), but the JIT and the GC would also need to allocate correctly.  Or, of course, you could over-allocate a chunk of data and shift the start pointer to the first aligned address inside of it.  This might work for the stack, but for GC allocated data this is going to keep shifting around on you, and probably won't work very well.  Before the CLR supports Interlocked.CompareExchange(ref Int128, ...), however, I suppose we ought to provide an Int128.  :)

12/6/2006 10:06:36 PM (Pacific Standard Time, UTC-08:00)  #    Comments [3]

 

Recent Entries:

Search:

Browse by Date:
<January 2007>
SunMonTueWedThuFriSat
31123456
78910111213
14151617181920
21222324252627
28293031123
45678910

Browse by Category:

Notables:

Currently Up To:

Reading...

Listening...

Watching...