Friday, April 12, 2013

I am naturally drawn to teams that work at an insane pace. The momentum, and persistent drive to increase that momentum, generates amazing results. And it's crazy fun.

In such environments, however, I've found one thing to be a constant struggle for everybody on the team -- leaders, managers, and individual doers alike: remembering to take the necessary time to do the right thing. This sounds obvious, but it's very easy to lose sight of this important principle when deadlines loom, customers, managers, and shareholders make demands, and the overall team is running ahead at a breakneck pace.

A nice phrase I learned from a past manager of mine was, "sometimes you need to slow down to speed up."

Shortcuts taken today are attractive because they help you meet the next deadline, but you almost always pay for them down the road. You might subsequently become mired in bugs because quality was compromised from the outset. You may create a platform that others build upon, only to realize later that the architecture is wrong and in need of revamping, incurring a ripple effect on an entire software stack. You may realize that your whole system performs poorly under load, such that just when your startup was beginning to skyrocket to success, users instead flee due to the poor experience. The manifestation differs, but the root cause is the same.

The level of quality you need for a project is very specific to your technology and business. I'll admit that working on systems software demands different quality standards than web software, for example. And the quality demands change as a project matures and the focus shifts from writing reams of new code to modifying existing code... although the early phases are in fact the most challenging. This is when the most critical cultural traits are still developing rather than set, when things have the highest risk of heading off in the wrong direction, and when you are most likely to scrimp on quality due to the need to make rapid progress on a broad set of problems all at once.

So how do you ensure people end up doing the right thing? Well, I'd be lying if I didn't say it is a real challenge.

As a leader, it is important to create a culture where individuals get rewarded for doing the right thing. Nothing beats having a team full of folks who "self-police" using a shared set of demanding principles.

To achieve this, leaders need to be consistent, demanding, and hyper-aware of what's going on around them. You need to be able to recognize quality versus junk, so that you can reward the right people. You need to set up a culture where critical feedback is "okay" and "expected" when shortcuts are being taken. I've made my beliefs pretty evident in prior articles; however, I simply don't believe you can do this right in the early days without being highly technical yourself. As a team grows, your attention to technical detail may get stretched thin, in which case you need to scale by adding new technical leaders who share, recognize, and maintain or advance these cultural traits.

You also can't punish people for getting less done than they could have had they taken those shortcuts. Many cultures reward those who hammer out large quantities of poorly written code. You get what you reward.

In fact, you must do the opposite, by making an example out of the people who check in crappy code.

Facebook has this slogan, "move fast and break things." It may seem that what I'm saying above is at odds with that famous slogan. Indeed, they are somewhat contradictory; paradoxically, however, they are also highly complementary. Although you need to slow down to do the right thing, you also need to keep moving fast. If that seems impossible, it's not; but it sure is difficult to find the right balance.

I have a belief that I'm almost embarrassed to admit: I believe that most people are incredibly lazy. I think most quality compromise stems from an inherent laziness that leads to details being glossed over, even when they are consciously recognized as needing attention. The best developers maintain an almost supernatural drive that comes from somewhere deep within, and they use this drive to stave off the laziness. If you're moving fast and writing a lot of code, strive to use every ounce of intellectual horsepower you can muster -- sustained, for the entire time you are writing code. Even if that's for 16 hours straight. If at any moment a thought occurs that might save you time down the road, stop, ponder it, and course correct on the fly. This is a way of "slowing down to speed up" while still moving fast. Lazier people often let these fleeting thoughts go without exploring them fully. They will consciously do the wrong thing because doing the right thing takes more time.

I've developed odd habits over the years. As a compile runs, I literally pore over every modified line of code, wondering if there's a better way to do it. If I see something, I push it on the stack and make sure to come back to it. By the time I've actually committed some new code -- regardless of whether it's 10,000 lines of freshly written code, or a 10 line modification to some existing stuff -- chances are that I've read each line of code at least three times. I don't allow any detail I see to slip through the cracks. And my mind obsesses over all aspects of my work, even during "off times" (e.g., eating dinner, walking down the hallway, etc.). Each of these opportunities represents a chance to slow down, reflect, and course correct.

Do I still miss things? Sure I do. But that's why it's so critical to have a team around me who share the same principles and will help to identify any shortcomings that I've missed.

Another practice I encourage on my team is fixing broken windows. I'm sure folks are aware of the so-called broken windows theory, where neighborhoods in which broken windows are tolerated tend to accumulate more and more broken windows with time. It happens in code, too. If people are discouraged from stopping to fix the broken windows, you will end up with lots of them. And guess what, each broken window actually slows you down. As more and more accumulate, it can become a real chore to get anything meaningful done. I guarantee you will not be able to move very fast if too many broken windows pile up and start needing attention. Slowing down to fix them incrementally, as soon as they are noticed, speeds you up down the road.

Building a quality-focused team isn't easy. But creating a culture that slows down to do the right thing, while simultaneously moving fast, provides an enormous competitive advantage. It's not as common as you might think.

4/12/2013 11:06:33 AM (Pacific Daylight Time, UTC-07:00)

 Thursday, April 11, 2013

I mentioned a few months back that my team had collaborated with MSR to publish a paper at OOPSLA about some novel aspects of our programming language (see here and here).

I was excited when Jonathan over at InfoQ asked to interview me about this work. We had a fun back and forth, and I hope the result helps to clarify some of the design goals and decisions we made along the way.

You can check it out here: Uniqueness and Reference Immutability for Safe Parallelism.

4/11/2013 11:53:29 AM (Pacific Daylight Time, UTC-07:00)

 Saturday, March 16, 2013

It's really hard to build a great team. It can take years of hard work and an enormous amount of patience.

The reality is that there's only a finite (read: small) number of truly amazing software developers in the world, especially compared to the opportunities and exciting projects available to them.

And yet, great teams are fueled first and foremost by great people. I often liken this to the aphorism "a rising tide lifts all boats."

The original meaning of the phrase, of course, had nothing to do with software. It was the notion that focusing on growth of the overall economy's GDP would necessarily have a positive impact on the incomes of individuals within that economy. Now, of course, it's not always true, and I'm no theoretical economist; however, the basic idea in spirit is an intuitively interesting one.

Applying this thinking to teams implies that you should always strive to hire better and better people, and that by doing so, the overall quality of the team will rise. Hiring better and better people has a nonlinear impact on the culture, because a team is not just a disjoint set of nodes, but is instead a fully connected graph of individuals who have conversations and collaborate together. A greater overall quality of the team means richer connections and more powerful, higher quality innovation and software. It means your chance of truly changing the world has grown nonlinearly as well.

I strive to only hire people who are better than me, and better than people already on the team, in some interesting dimension. As soon as you let your high standards drop even an ounce, the average drops and there is a cumulative snowballing effect. The connections grow weaker, and a nonlinear drop in quality and innovation will occur. This is my nightmare scenario because it can go downhill very quickly.

This applies to an entire company as well as to individual teams, including what can happen should the tide go out. The brain drain begins as a slow drip, and can turn into a torrential downpour in an instant. It often starts from the top, because culture and hiring start from the top.

Now, I will be the first to admit that raising the tide is hard. Damn hard, in fact. I have another phrase which is "always be on." That incredible engineer you worked with ten years ago just might be the piece missing in the puzzle today, and a good way to lift the boats. Opportunities come and go when you least suspect them, and you want those people to want to join your team. I have several individuals that literally took years of effort to recruit. And the wait was well worth it. This advice applies to individual contributors as much as it does to managers. You never know if in a few years, you'll be leading a team, kicking off your own startup, or even just helping to make your own team a better place.

And as a leader you owe all of this to your existing team. By lifting the boats, your entire team benefits. They grow, learn new things, and reach new heights in their own careers.

Despite being hard work, this all pays off in the end. There is very little I find more satisfying in life than building and growing a great team, seeing the year over year improvements, and creating amazing things together. Perhaps even more than coding. (gasp)

3/16/2013 12:38:53 PM (Pacific Daylight Time, UTC-07:00)

 Sunday, March 03, 2013

The very notion of "authority" is 90% in your head. And it's one that often holds back otherwise very capable people.

This is yet another one that I got entirely wrong early in my own career.

When starting a new job, it's natural to be in what I call "understanding and assessment" mode. In fact, coming into a new job and telling everybody how they are doing things wrong is a recipe to not only get you off on the wrong foot, but also permanently poison your relationship with what would have been very important allies down the road. However, it's critical to turn the corner at some point before it's too late. The more experience you have, the less time it should take.

When I first came to Microsoft, I was suddenly surrounded by lots of smart people with momentum and energy on whatever it was that they were building. This led me to initially assume that these people knew what was going on. It led me to assume that, simply because some guy had the title of "Corporate Vice President" or "Distinguished Engineer", he knew what was going on. In fact, on my first day on the job -- and I will never forget this -- I was in a design meeting where I had the "audacity" to tell a DE (Microsoft's highest ranking technical position at the time) that I disagreed with what he was saying. I was polite about it, and to this day think I was right. However, I was pulled aside afterwards and told how stupid a move that was. And what's worse, I actually listened for my first two years and went out of my way not to rock the boat too much. This was against my better judgment. I was still young, and had come from another job where I was confident and could safely question anything; but I made the silly mistake of thinking "well, maybe things are different around here." I still wish I could get those two years back.

Allow me to let you in on a little secret. (Well, okay, it's not really a secret, but if only I could go back and tell my younger self this. And I suppose it ought to be obvious.) These people don't always know what is going on. It's probably safe to assume that these people have been rewarded in their careers because, statistically speaking, they are right more often than they are wrong. But it's still just statistics. And truthfully, if they are any good, they will like being questioned. They enjoy the technical debate. This is a critical aspect of a great team.

In fact, if they don't enjoy the debate, you are likely in the wrong place. The person who told me I was being stupid was actually right. The organization I had joined punished, rather than rewarded, people who questioned people in positions of authority. As soon as I realized this, I got out. It's a very personal preference, I suppose, but I personally prefer organizations that reward and promote people based on ability and direct impact. Sadly, in organizations where authority prevails, advancement is almost always based on who-likes-who, ass-kissing, and time-in-position. For folks looking for cozy jobs with guaranteed income, perhaps this is ideal; but you're quite unlikely to grow rapidly, build amazing things, and change the world in such places.

The employees I love the most are those that ask tons of questions and aren't afraid to tell me when I'm wrong. These people are inquisitive about everything, whether the topic is highly technical or pure business. By questioning my own views, and forcing me to articulate them, they strengthen the overall culture. Not only do I benefit as a leader by having to methodically think through and defend my approach to problem solving, but those around me also benefit because (a) often they end up influencing the organization in big ways, and (b) even if my original stance survives, they understand the rationale behind certain decisions and can grow as a result. And it's fun! -- albeit passionately heated at times.

In fact, it's painful for me to see the opposite: an employee who has been trained to blindly respect authority. I was an admittedly rebellious youngster, so I've always known that this trait is simply not in my nature. I often reflect on how lucky I am to have landed in software rather than the military. Even when making those mistakes early in my career, I knew in my heart that they were mistakes at the time. But I do know that some people feel comfortable with a hierarchy of authority. They like the structure, and questioning it simply isn't an option. Sadly, some such people are beyond repair.

These days, I have to say that kids who grew up programming in their teens are the most fearless and rebellious. I have a dirty admission: I love hiring and mentoring and growing these individuals the most, because they have yet to be "trained" to respect authority. Once a vulnerable person early in their career has been brainwashed, it is an incredibly difficult thing to reverse. And so, as their career progresses, the longer these habits sit unaddressed, the less salvageable they become. Thankfully I caught it early on. I've managed plenty of folks who didn't.

Now, you can't be an asshole about it. And you can't be arrogant. Software is all about people and collaboration, and all of this questioning must be done with a single goal in mind: to make the software, the organization, and/or its people better. Authority is there for a reason, which is that ultimately someone needs to run the business, make decisions, and have their butt on the line. Sometimes the simple reality is that a leader's intuition is extremely good, and though data may be lacking to support the decision, you can trust it. It's okay to agree to disagree, or sometimes admit that someone simply has a stronger background than you in a particular area and so maybe you aren't in a position to fully understand why a certain decision was made. I have always tried to turn such occasions into learning opportunities. Jot down a few notes, and go read about it afterwards. I always jot down and research any term I hear that I'm not totally on top of, technical or business. It happens all the time.

In your next job interview, go out of your way to question a thing or two. If the person on the other end acts offended, either you asked in the wrong way (remember: respectful but inquisitive), or you shouldn't take the job. If it's a startup, read the business plan ahead of time and come prepared with some hard questions. If it's a corporation, ask what is rewarded, find holes in the technical architecture, question areas of the engineering process that could be improved. I really do think this is one of the most important cultural traits of a well-run team. And I guarantee you'll have way more fun on such a team, perhaps the most important cultural trait of all.

3/3/2013 9:52:29 AM (Pacific Standard Time, UTC-08:00)

 Sunday, February 17, 2013

I sincerely apologize that comments are disabled on my blog right now.

I dislike one-way conversations.

However, it turns out that spammers have caught up to the technology my blog used circa 2010 to filter out nasty and wretched comments. I simply can't keep up with deleting them by hand any longer.

I am, of course, always interested in your emails and feedback. Please feel free to shoot me a note at joedu AT microsoft DOT com if you are so inclined. I promise to read and respond.

And I sincerely hope that within the next month I'll manage to upgrade my blog software. It seems like a daunting task for some reason, even though it is trivial in nature.

2/17/2013 10:33:34 PM (Pacific Standard Time, UTC-08:00)

The best people in software have an innate ability to communicate using code. They have an idea and simply code it up, thereby making it reality. In fact, the best people are, I would say, obsessed with code.

Pick somebody in software that has done great things. Bill Gates comes to mind for me, because that’s who inspired me to get started in software. He wrote code for as long as he could manage, and famously delivered code reviews even as his company grew to 1,000s of engineers. No matter who you pick, I am sure one thing rings true: they obsessed over the details. And when it comes to software, those details are in the code.

Those who cannot read and write code must spend all of their time convincing other people of their ideas, and are usually sufficiently disconnected from reality (i.e., the code) that their ideas do not work in practice. This is an awful situation to be in, particularly at a company whose primary asset is code. Worse, most people voluntarily place themselves into this category, particularly over time in their careers, because they believe that coding is "not one of their job responsibilities." What rubbish!

I have three particular pet peeve examples to give.

The first is what I like to call the "mediocre mid-level manager syndrome." I’ll admit that when you manage large enough teams, you have to give up on a bit of coding. I will personally never give it up entirely, even as I manage teams of 1,000s of engineers. I will always use the product my team is building, I will read the checkins to at least understand what’s going on and stay grounded, and, assuming I continue to manage groups building development platforms, I will write code using that platform. But for managers who assume responsibility for 10 or fewer engineers, there is absolutely no excuse for slacking in these areas. It’s just pure laziness, and the teams suffer enormously: such teams typically lack "adult supervision" in the area of engineering culture, lack a role model, and build wrong and crappy things. In short, such managers literally add negative value. I can’t tell you how many "Software Development Leads" at large corporations fall into this category. That this is often culturally accepted is totally broken; needless to say, it is not acceptable on my teams.

The second is something I call "code is beneath me." The two most prominent examples are folks late in their careers and researchers. The former often goes hand in hand with the mid-level manager problem. But I’ve seen it afflict software engineers too: "I’ve been a professional developer for 10 years, so my job is now to tell others what to do rather than doing anything myself." At this point, they might adopt the title Architect. The research issue, however, quite frankly perplexes me. Computer Science is an odd mixture of pure math and applied engineering. I get that many CS researchers are more math-oriented, and wish to basically do mathematics rather than software. I also get that much of this research bears fruit. But in my experience there is a very large contingent of researchers who do not produce "first rate mathematics," and yet resist becoming grounded in code. The idea that you can improve the state of software, whose bloodline is code, without ever writing a line of it or becoming proficient in it, is complete insanity. And yet it’s generally accepted.

The last example, which is close to home since I made this mistake myself for a couple years early in my career, is "I manage things, I don’t build them." The title of Program Manager is a specific manifestation of this problem. Most have backgrounds in CS, and have probably written a little code. But most PMs are also usually not very good at it, don’t love it, and probably haven’t written much since leaving school. And yet these people are often "in charge" of making decisions about features, prioritization, and competitive offerings. It’s true that some people have great intuition and can make some good decisions without knowing how things work. But when it comes to software, those abilities need to be grounded by the code. If that’s not interesting to them, I always encourage considering positions in sales and marketing, HR, or one of the other organizations in such companies that isn’t focused on actually building the software. I would literally abolish the PM position at my company if I could. Those who love code should become developers.

I have the utmost respect for people who have fallen into any of these traps, but then realize it and get out. Hey, I did so myself.

Even those that write code often don’t do it enough. I’ve seen so many fall into the trap of debating whether or not something would work, or how elegant it would be. Certain people are afraid of failure, or find it difficult to get motivated to "start" coding. The best people, however, realize that questions are easily answered by writing the code in prototype form. They go from 0-60 in an instant, having a vision of what they would like to build, and letting nothing get in the way. I call this "oozing code from your fingertips." I do think some of this is a skillset thing. In software, the top 20% are easily 50X more productive than the bottom 20%. But I also think these traits can be learned, given role models who exhibit and demonstrate the behavior.

Finally, I do encourage software leaders to read as much code as they can. Reading code is a great way to learn how things work, and to stay on top of what’s actually happening in your project. And it keeps your mind fresh, and often leads to new ideas.

If it isn’t obvious, I might have a slightly atypical bias here. But it’s one of the things I am most passionate about with respect to running software teams. Code speaks. Love the code.

2/17/2013 8:38:29 AM (Pacific Standard Time, UTC-08:00)

I’ve been managing software teams for several years. Perhaps more importantly, I have worked for some excellent leaders and have had the opportunity to learn from their good (and bad) habits.

Because I haven’t written a line of .NET code in a few years now, that blogging well has kind of run dry. And sadly my team is not yet ready to openly share our platform externally, so I cannot blog about that either.

As a result, I thought it would be fun to start a series about leadership in software. Not just the kind of leadership expected of managers, but also of individual developers and architects. I have no idea how frequently I’ll write something; however, just having a continuum of content to contribute to when I have a spare moment will help liven this place back up again, I’m sure. Furthermore, one lesson that’s been imparted to me over the years is that "writing is thinking"; so by writing this stuff down, I’m sure it will crystallize even further.

The series will be called "Software Leadership" because, after all, it’s about the software. I hope you enjoy it.

2/17/2013 8:35:30 AM (Pacific Standard Time, UTC-08:00)

 Saturday, December 08, 2012

I mentioned recently that a paper from my team appeared at OOPSLA in October:

Uniqueness and Reference Immutability for Safe Parallelism (ACM, MSR Tech Report [PDF])

It's refreshing that we were able to release it. Our project only occasionally gets a public shout-out, usually when something leaks by accident. But this time it was intentional.

I began the language work described about 5 years ago, and it's taken several turns of the crank to get to a good point. (Hint: several more than even what you see in the paper.) Given the novel proof work in collaboration with our intern, folks in MSR, and a visiting professor expert in the area, however, it seemed like a good checkpoint that would be sufficiently interesting to release to the public. Perhaps some day Microsoft's development community will get to try it out in earnest.

There seems to have been some confusion over the goals of this work. I wanted to take a moment to clear the air.

First, despite assertions elsewhere, the primary focus of this work was not "implicit parallelism." Instead, I would summarize our goals as:

  1. Create a single language that incorporates the best of functional and imperative programming. Both offer distinct advantages, and we envisioned marrying the two.
  2. Codify and statically enforce common shared-memory state patterns, such as immutability and isolation, with minimal runtime overhead (i.e., virtually none).
  3. Provide a language model atop which can be built provably safe data, task, and actor-oriented concurrency abstractions. Both implicit and explicit. This includes, but is not limited to, parallelism.
  4. Do all of this while still offering industry-leading code quality and performance, rivaling that of systems-level C programs. Yet still with the safety implied by the above-mentioned goals.

The language features in the paper are a vast subset of the full suite needed to achieve our overall project goals. However, these alone have exceeded our original expectations.

I've programmed a great deal in functional languages. I'm a long-time lover of LISP and ML, and my closest friends know about my hard-core dedication to Haskell (expressed in an admittedly odd manner). In fact, Haskell's elegant marriage of pure functional programming with monads, notably the state monad, was a major inspiration for the design of the type system. There are of course many other influences, such as regions, linear types, affine types, etc.; however, I'd say Haskell was the strongest.

In some sense, we have simply taken the reverse angle of Haskell with its monads: what would it be like to embed pure functional programming within an otherwise imperative language?

This first goal is proving to be my favorite aspect of the language. The ability to have "pockets of imperative mutability," familiar to programmers with C, C++, C#, and Java backgrounds, connected by a "functional tissue," is not only clarifying, but works quite well in practice for building large and complex concurrent systems. It turns out many systems follow this model. Concurrent Haskell shares this high-level architecture, as does Erlang. Well-written C# systems do the same, though the language doesn't (yet) help you to get it right.

Of course, as called out by the second goal, immutability and controlled side-effects are tremendously useful features on their own. Novel optimizations abound.

And it helps programmers declare and verify their intent. As mentioned in the paper, we have found/prevented many significant bugs this way. Did you ever want to verify that your contracts and assertions are pure, such that conditional compilation doesn't change the outcome of your program? Or that your sort comparator isn't mutating the elements while performing its comparisons? Neither has much to do with concurrency, although the latter facilitates parallel sorts. Many other systems introduce specific verification techniques to address specific problems, rather than employing a general purpose type system.
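To make the sort comparator example concrete, here is a rough analogy in Rust, which is not the language described in the paper. The standard `slice::sort_by` passes its comparator shared references, so a comparator that tries to mutate the elements it compares fails to compile. The `Employee` type and both functions are hypothetical, chosen only for illustration. Note that Rust's shared references are weaker than the deep immutability discussed above, since types such as `Cell` permit interior mutation, so treat this as an approximation.

```rust
/// Hypothetical record type, used only for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Employee {
    pub name: String,
    pub salary: u32,
}

/// Sorts employees by salary, breaking ties by name.
///
/// `sort_by` gives the comparator shared references (`&Employee`), so a line
/// such as `a.salary += 1;` inside the closure is a compile-time error: the
/// comparator can inspect the elements but not mutate them.
pub fn sort_by_salary(staff: &mut [Employee]) {
    staff.sort_by(|a, b| {
        a.salary
            .cmp(&b.salary)
            .then_with(|| a.name.cmp(&b.name))
    });
}

/// A contract check that only borrows `staff` immutably, so running it
/// (debug builds) or compiling it out (release builds, via `debug_assert!`)
/// cannot change the slice's contents.
pub fn is_sorted_by_salary(staff: &[Employee]) -> bool {
    staff.windows(2).all(|w| w[0].salary <= w[1].salary)
}
```

A caller might write `sort_by_salary(&mut staff); debug_assert!(is_sorted_by_salary(&staff));`, relying on the signature of the check rather than on discipline to know that the assertion is side-effect free with respect to `staff`.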

I would say the strength with respect to concurrency is not the type system itself, but rather what you can do with it.

The focus on implicit parallelism in the recent forum discussions was unfortunate. I guess "implicit parallelism" just makes for catchy and controversial titles. Yes, the type system makes implicit parallelism "safe and possible," some forms of which are indeed profitable, but it's not as though suddenly all of your for loops are going to run 8-times faster after a recompile. The optimization angle is an orthogonal, but very real, concern. There are decades of research and experience here.

Even when tasks are explicitly spawned, however, the fact that the type system catches unsafe mutable state capture that would lead to race conditions is, I dare say, game changing. I could never go back to the old model of instruction-level races, which nowadays feels like programming a PDP-6 to me (no insults implied). And yes, data parallel works great in this model. It may take a bit of imagination, rereading the article, and perhaps looking at related work such as Deterministic Parallel Java, to understand how, but it does.
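A minimal sketch of the same idea applied to data parallelism, again borrowing Rust as a stand-in (an assumption; this is not the paper's language or its API). The input is shared immutably with every worker, while the output is split into disjoint, uniquely owned mutable chunks. Handing the same mutable buffer to two threads would be rejected at compile time instead of becoming a race at run time. `parallel_square` and its chunking policy are hypothetical.

```rust
use std::thread;

/// Writes the square of each `input` element into `output`, splitting the
/// work across up to `workers` scoped threads.
///
/// `input` is shared immutably by every thread, while `output` is split into
/// disjoint `&mut` chunks, so each thread uniquely owns the part it writes.
/// Moving the whole `output` slice into two threads instead would be rejected
/// by the borrow checker rather than becoming a data race at run time.
pub fn parallel_square(input: &[i64], output: &mut [i64], workers: usize) {
    assert_eq!(input.len(), output.len(), "input and output must match in length");
    let workers = workers.max(1);
    // Ceiling division; clamp to 1 because `chunks` panics on a size of 0.
    let chunk = ((input.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        for (src, dst) in input.chunks(chunk).zip(output.chunks_mut(chunk)) {
            // `src` is a shared borrow and `dst` a unique one; both are Send.
            s.spawn(move || {
                for (out, x) in dst.iter_mut().zip(src) {
                    *out = x * x;
                }
            });
        }
    }); // All spawned threads are joined here, before the borrows end.
}
```

Scoped threads (`std::thread::scope`) are used so that workers can borrow from the caller's stack; the scope guarantees every worker has finished before `parallel_square` returns, which is what lets the compiler accept the borrows.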

The effort grew out of my work on Software Transactional Memory in 2004, then Parallel Extensions (TPL and PLINQ), and then my book, a few years later. I had grown frustrated that our programming languages didn't help us write correct concurrent code. Instead, these systems simply keep offering more and more unsafe building blocks and synchronization primitives. Although I admit to contributing to the mess, it continues to this day. How many flavors of tasks and blocking queues does the world need? I was also dismayed by the oft-cited "functional programming cures everything" mantra, which clearly isn't true: most languages, Haskell aside, still offer mutability. And few of them track said mutability in a way that is visible to the type system (Haskell, again, being the exception). This means that races are still omnipresent, and concurrent programs are thus expensive and error-prone to write and maintain.

Reflecting back, I am somewhat amazed that the language has taken so long to hatch. Type systems that are sound and strike the right balance of utility and approachability are hard work!

I am ecstatic that we've been able to make inroads towards solving these hard problems. My team is, quite simply, an amazing group of people, and without them the ideas would have never made it beyond the "that will never work" phase. I look forward to sharing more about our work in the years to come.

12/8/2012 11:42:47 AM (Pacific Standard Time, UTC-08:00)

 Tuesday, October 30, 2012

.NET holds an enormous advantage over C++.

Well, okay, there are a few, but I’m thinking about one in particular: A single string type.

What’s not to love about that? Anybody who has done more than an hour’s worth of Windows programming in C++ should appreciate this feature. No more zero-terminated char* vs. length-prefixed char* vs. BSTR vs. wchar_t vs. CStringA vs. CStringW vs. CComBSTR. Just System.String. Hurray!

There’s one very specific thing not to love, however: The ease with which you can allocate a new one.

I’ve been working in an environment where performance is critical, and everything is managed code, for several years now. That might sound like an oxymoron, but our system can in fact beat the pants off all the popular native programming environments. The key to success? Thought and discipline.

We, in fact, love our single string type. And yet our team has learned (the hard way) that string allocations, while seemingly innocuous and small, spell certain death.

It may seem strange to pick on string. There are dozens of other objects you might allocate, like custom data types, arrays, lists, and whatnot. But there tend to be many core infrastructural pieces that deal with string manipulation, and if you build atop the wrong abstractions then things are sure to go wrong.

Imagine a web stack. It’s all about string parsing and processing. And anything to do with distributed processing of data is most likely going to involve strings at some level. Etc.

There are landmine APIs lurking out there, like String.Split and String.Substring. Even if you’ve got an interned string in hand (often rare in a server environment where strings are built from dynamically produced data), using these APIs will allocate boatloads of tiny little strings. And boatloads of tiny little strings means collections.
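One way to see this for yourself is to measure the bytes allocated on the current thread around each call. The following is a rough sketch, not a benchmark; it assumes .NET Core 3.0 or later, where `GC.GetAllocatedBytesForCurrentThread` is available, and the `AllocationProbe` class and the sample header string are illustrative names of my own.

```csharp
using System;

static class AllocationProbe
{
    // Returns the number of bytes allocated on this thread while the action runs.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    static void Main()
    {
        string header = "gzip,deflate,br";

        // Warm up so one-time JIT and type-loading costs are not counted.
        header.Split(',');
        header.Substring(0, 4);
        header.IndexOf(',');

        long splitBytes = Measure(() => header.Split(','));
        long substringBytes = Measure(() => header.Substring(0, 4));
        long indexOfBytes = Measure(() => header.IndexOf(','));

        // Split allocates an array plus one string per field; Substring allocates
        // a new string; IndexOf only scans the existing characters.
        Console.WriteLine($"Split:     {splitBytes} bytes");
        Console.WriteLine($"Substring: {substringBytes} bytes");
        Console.WriteLine($"IndexOf:   {indexOfBytes} bytes");
    }
}
```

The exact byte counts depend on the runtime version and object layout, so treat them as relative, not absolute.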

For example, imagine I just want to perform some action for each substring in a comma-delimited string. I could of course write it as follows:

string str = ...;
string[] substrs = str.Split(',');
foreach (string substr in substrs) {
    Process(substr);
}

Or I could write it as follows:

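string str = ...;
int lastIndex = 0;
int commaIndex;
// Search for each comma starting just past the previous one.
while ((commaIndex = str.IndexOf(',', lastIndex)) != -1) {
    // Process is assumed to take the string plus a [start, end) range; no substring is allocated.
    Process(str, lastIndex, commaIndex);
    lastIndex = commaIndex + 1;
}
// Don't forget the final segment after the last comma (or the whole string if there is none).
Process(str, lastIndex, str.Length);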

The latter certainly requires a bit more thought. That’s primarily because .NET doesn’t have an efficient notion of substring – creating one requires an allocation. But the performance difference is night and day. The first one allocates an array and individual substrings, whereas the second performs no allocations. If this is, say, parsing HTTP headers on a heavily loaded server, you bet it’s going to make a noticeable difference.
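To make the comparison concrete, here is a minimal, self-contained sketch of both styles applied to one hypothetical task: counting how many comma-separated fields equal a given token. The class and method names are illustrative. The in-place version compares each `[start, end)` range against the token with `string.CompareOrdinal`, so it never materializes a substring.

```csharp
using System;

static class FieldCounting
{
    // Allocating version: Split creates an array plus one string per field.
    public static int CountWithSplit(string str, string token)
    {
        int count = 0;
        foreach (string field in str.Split(','))
        {
            if (field == token) count++;
        }
        return count;
    }

    // Non-allocating version: walk the commas with IndexOf and compare
    // each [start, end) range against the token in place.
    public static int CountInPlace(string str, string token)
    {
        int count = 0;
        int start = 0;
        while (true)
        {
            int comma = str.IndexOf(',', start);
            int end = comma == -1 ? str.Length : comma;
            if (end - start == token.Length &&
                string.CompareOrdinal(str, start, token, 0, token.Length) == 0)
            {
                count++;
            }
            if (comma == -1) break; // the last field has been examined
            start = comma + 1;
        }
        return count;
    }

    static void Main()
    {
        string header = "gzip,deflate,gzip,br";
        Console.WriteLine(CountWithSplit(header, "gzip")); // prints 2
        Console.WriteLine(CountInPlace(header, "gzip"));   // prints 2
    }
}
```

Both return the same answer; the difference is that the first allocates five objects per call here, while the second allocates none.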

Honestly, I’ve witnessed programs that should be I/O-bound turn into programs that are compute-bound, simply due to use of inefficient string parsing routines across enormous amounts of data. (Okay, the developers also did other sloppy allocation-heavy things, but string certainly contributed.) Remember, many managed programs must compete with C++, where developers are accustomed to being more thoughtful about allocations in the context of parsing. Mainly because it’s such a pain in the ass to manage ad-hoc allocation lifetimes, versus in-place or stack-based parsing where it’s trivial.

"But gen0 collections are free," you might say. Sure, they are cheaper than gen1 and gen2 collections, but they are most certainly not free. Each collection is a linked list traversal that executes a nontrivial number of instructions and trashes your cache. It’s true that generational collectors minimize the pain, but they do not completely eliminate it. This, I think, is one of the biggest fallacies that plague managed code to this day. Developers who treat the GC like their zero-cost scratch pad end up creating abstractions that poison the well for everybody.
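A quick way to observe that gen0 collections really do happen under string churn is to sample `GC.CollectionCount(0)` before and after a loop. This is a minimal sketch; the iteration count and sample line are arbitrary assumptions, and the number of collections you see will vary with the GC mode (workstation or server) and the machine's gen0 budget.

```csharp
using System;

static class Gen0Demo
{
    // Counts the gen0 collections that occur while the action runs.
    public static int Gen0CollectionsDuring(Action action)
    {
        int before = GC.CollectionCount(0);
        action();
        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        string line = "alpha,beta,gamma,delta,epsilon";
        long fields = 0;

        // Each Split call allocates an array plus five small strings, all garbage
        // immediately afterward, which steadily consumes the gen0 budget.
        int collections = Gen0CollectionsDuring(() =>
        {
            for (int i = 0; i < 5_000_000; i++)
                fields += line.Split(',').Length;
        });

        Console.WriteLine($"{fields} fields parsed, {collections} gen0 collections");
    }
}
```

Rewriting the loop in the in-place style shown earlier should drive the count toward zero, since nothing is left behind for the collector to clean up.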

Crank up .NET’s XmlReader and profile loading a modest XML document. You’ll be surprised to see that allocations during parsing add up to approximately 4X the document’s size. Many of these are strings. How did we end up in such a place? Presumably because whoever wrote these abstractions fell into the trap of believing that "gen0 collections are free." But also because layers upon layers of such things lie beneath.
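If you want to try this experiment without a full profiler, a crude version is to measure thread allocations around a read loop. This sketch assumes .NET Core 3.0 or later (for `GC.GetAllocatedBytesForCurrentThread`); the generated document shape is a hypothetical stand-in, and the ratio you observe will depend on the runtime version, the document, and which reader properties you touch, so it is not meant to reproduce any particular figure.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlAllocationProbe
{
    // Builds a simple, hypothetical test document with the given number of items.
    public static string BuildDocument(int items)
    {
        var sb = new StringBuilder("<items>");
        for (int i = 0; i < items; i++)
            sb.Append("<item id=\"").Append(i).Append("\">value ").Append(i).Append("</item>");
        return sb.Append("</items>").ToString();
    }

    // Returns the bytes allocated on this thread while reading every node and
    // touching each node's Value, roughly as a typical consumer would.
    public static long BytesAllocatedWhileReading(string xml)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        int totalValueLength = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
                totalValueLength += reader.Value.Length;
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    static void Main()
    {
        string xml = BuildDocument(10_000);
        BytesAllocatedWhileReading(xml); // warm-up: JIT and type initialization
        long allocated = BytesAllocatedWhileReading(xml);
        long documentBytes = xml.Length * sizeof(char); // UTF-16 size in memory

        Console.WriteLine($"Document:                {documentBytes:N0} bytes");
        Console.WriteLine($"Allocated while reading: {allocated:N0} bytes");
        Console.WriteLine($"Ratio:                   {(double)allocated / documentBytes:F2}x");
    }
}
```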

It doesn’t have to be this way. String does, after all, have an indexer. And it’s type-safe! So in-place parsing at least won’t lead to buffer overruns. Sadly, I have concluded that few people, at least in the context of .NET, will write efficient string parsing code. The whole platform is written to assume that strings are available, and does not have an efficient representation of a transient substring. And of course the APIs have been designed to coax you into making copy after copy, rather than doing efficient text manipulation in place. Hell, even the HTTP and ASP.NET web stacks are rife with such inefficiencies.
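As a small illustration of in-place parsing with nothing but the indexer, the sketch below pulls an integer out of an HTTP-style header line without calling Substring or int.Parse on a copy. The `InPlaceParsing` class and both methods are hypothetical; for brevity it ignores trailing whitespace and signs.

```csharp
using System;

static class InPlaceParsing
{
    // Parses a non-negative decimal integer from str[start, end) using only the
    // string indexer, so no substring is allocated. Returns false on bad input or overflow.
    public static bool TryParseInt32(string str, int start, int end, out int value)
    {
        value = 0;
        if (start >= end) return false;
        for (int i = start; i < end; i++)
        {
            char c = str[i];
            if (c < '0' || c > '9') return false;
            int digit = c - '0';
            if (value > (int.MaxValue - digit) / 10) return false; // would overflow
            value = value * 10 + digit;
        }
        return true;
    }

    // Given a line like "Content-Length: 1234", parses the value in place.
    public static bool TryGetContentLength(string line, out int length)
    {
        length = 0;
        const string name = "Content-Length";
        int colon = line.IndexOf(':');
        // Header names are case-insensitive, so compare the name range in place, ignoring case.
        if (colon != name.Length ||
            string.Compare(line, 0, name, 0, name.Length, StringComparison.OrdinalIgnoreCase) != 0)
            return false;
        int start = colon + 1;
        while (start < line.Length && line[start] == ' ') start++; // skip leading spaces
        return TryParseInt32(line, start, line.Length, out length);
    }

    static void Main()
    {
        if (TryGetContentLength("Content-Length: 1234", out int n))
            Console.WriteLine(n); // prints 1234
    }
}
```

Because every step reads characters from the original string, the only cost is the scan itself; indexing is bounds-checked, so a mistake throws rather than overrunning a buffer.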

In certain arenas, doing all of this efficiently actually pays the bills. In other arenas, it doesn’t, and I suppose it’s possible to ignore all of this and let the GC chew up 30% or more of your program’s execution time without anybody noticing. I’m baffled that such software is written, but at the same time I realize that my expectations are out of whack with respect to common practice.

The moral of the story? Love your single string type. It’s a wonderful thing. But always remember: An allocation is an allocation; make sure you can afford it. Gen0 collections aren’t free, and software written to assume they are is easily detectable. String.Split allocates an array and a substring for each element within; there’s almost always a better way.

10/30/2012 7:43:31 PM (Pacific Daylight Time, UTC-07:00)

 Sunday, October 28, 2012

A glimpse of some research we've done recently just appeared at OOPSLA last week:

Uniqueness and Reference Immutability for Safe Parallelism

A key challenge for concurrent programming is that side-effects (memory operations) in one thread can affect the behavior of another thread. In this paper, we present a type system to restrict the updates to memory to prevent these unintended side-effects. We provide a novel combination of immutable and unique (isolated) types that ensures safe parallelism (race freedom and deterministic execution). The type system includes support for polymorphism over type qualifiers, and can easily create cycles of immutable objects. Key to the system's flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking. Our type system models a prototype extension to C# that is in active use by a Microsoft team. We describe their experiences building large systems with this extension. We prove the soundness of the type system by an embedding into a program logic.

The official ACM page is here, and a tech report version is available on MSR's website.

As I said, this is just a glimpse. Its focus was mainly on the type soundness work we've done jointly with MSR, and less about the language, syntax, and uses. You'll have to use your imagination to fill in the rest ;-)

10/28/2012 3:58:30 PM (Pacific Daylight Time, UTC-07:00)


Joe is an architect and developer on a systems incubation project at Microsoft.


Disclaimer:
The content of this site is my own personal opinion and does not represent my employer's view in any way.

© 2013, Joe Duffy