Sunday, June 21, 2009

Intuition, Performance, and Scale

Intuition is a double-edged sword. A blessing and a curse, as it were. Intuition is knowing something with a reasonable sense of certainty without any justification for why you know it. It's this inability to clearly explain yourself when acting on intuition that is frustrating to you and those around you. Sometimes people will just trust your instincts, but other times they will challenge you. There are many outcomes of such challenges. You may fail to convince anybody of your instincts. But there are times when the challenge will lead you to a better understanding of your intuition. And that is exactly how I arrived at a simple realization of how performance and scale are balanced to build usable, high-volume web sites and services.

First, I should probably review performance vs. scale because there is a good deal of confusion about what the two qualities really are. Performance is a measure of responsiveness: how much time elapses while the service requested by the customer is carried out by the system. It doesn't matter whether this is a web page, a web service, or an event. Time is the unit of measure for performance.

Scale measures how much work the system can perform within a given set of performance parameters. If the maximum acceptable response time for a web service is 500 milliseconds, scale will specify the maximum transactions per second the system can reach before response time exceeds the limit. Unlike performance, which can be accurately measured in most systems, scale is always an estimate. The system paths required to perform a customer request are pushed to their limits and the peak throughput is extrapolated. This is always a bit of science and a bit of faith as bottlenecks cannot accurately be predicted, especially at very high transaction rates.
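As a rough illustration of that definition, here is a minimal Python sketch that estimates the throughput ceiling from load-test measurements. The function name, the sample numbers, and the use of linear interpolation between the two measurements that bracket the limit are all assumptions made for the example; real systems rarely degrade this smoothly, which is why the result is only an estimate.

```python
def estimate_max_tps(samples, limit_ms):
    """Estimate the highest throughput that keeps response time within limit_ms.

    samples: (transactions_per_second, response_time_ms) pairs from load tests.
    Returns an estimate in transactions per second.
    """
    last_ok = None
    for tps, response_ms in sorted(samples):
        if response_ms > limit_ms:
            if last_ok is None:
                return 0.0  # even the lightest measured load misses the limit
            ok_tps, ok_ms = last_ok
            # Assumption: response time grows linearly between the two samples.
            fraction = (limit_ms - ok_ms) / (response_ms - ok_ms)
            return ok_tps + fraction * (tps - ok_tps)
        last_ok = (tps, response_ms)
    # The limit was never exceeded, so the highest measured load is a lower bound.
    return float(last_ok[0]) if last_ok else 0.0


# Hypothetical measurements: (transactions per second, response time in ms).
load_test = [(100, 120), (200, 150), (400, 240), (800, 480), (1000, 650)]
estimate = estimate_max_tps(load_test, limit_ms=500)
print(f"Estimated scale: about {estimate:.0f} transactions per second")
```

With a 500 millisecond limit, the estimate falls between the last load that met the limit and the first one that missed it. The further the real ceiling lies beyond what was actually measured, the more faith the number requires.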

As engineers we have a tendency towards optimizing performance. I suspect that one reason is that it's more exact. I can focus on individual hot spots, improve those, and measure an improvement in performance. Scale on the other hand is a bit more difficult. It often involves making seemingly illogical changes to the system that will often decrease performance. And that last bit is where engineers really object. How can you suggest approaches that are slower when the goal is increasing the throughput of the system?

A classic case in point is the ongoing debate about the performance advantages of stored procedures vs. application-driven queries. The argument for stored procedure performance has always been based on several points. One is that stored procedures will have their statements parsed and query plans stored. This is true, but the reality is that most databases today do the same for all SQL, so frequently used application SQL loses little to stored procedures here. The other main argument is that it is more efficient to process data near the data rather than pulling it into the application. This is mostly true. I say mostly because some aggregate and ordering operations may be more efficient to do in compiled code if the data sizes involved are relatively small. In those cases the data transfer overhead is often less than the database's computational overhead.

So if stored procedures have an advantage over application SQL, why would I be opposed to them? Am I a performance heretic? No. I am opposed because stored procedures don't allow the system to scale as well as moving the workload into the application servers does. In fact, I go further than just avoiding stored procedures and reduce the complexity of queries as much as possible. I'm not a fan of ordering, grouping, or aggregation in the database, especially if the application needs the full result set anyway. Look at any system architecture and the most difficult and expensive component to scale is the database.
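To make the idea concrete, here is a minimal Python sketch of pushing grouping and ordering into the application tier. It uses an in-memory SQLite database purely as a stand-in for a real database server, and the orders table, its columns, and the function name are hypothetical. The database runs only a plain SELECT; the summing and sorting happen in application code.

```python
import sqlite3
from collections import defaultdict


def totals_by_customer(conn):
    """Return (customer, total) pairs, largest total first.

    The query has no GROUP BY or ORDER BY; that work is done here in the
    application, where adding capacity is cheaper than on the database.
    """
    totals = defaultdict(float)
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data set, small enough that the transfer cost is negligible.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 45.0)],
)

result = totals_by_customer(conn)
print(result)
```

For a single request this may well be slower than letting the database do the grouping and sorting. The trade is deliberate: the database spends its cycles only on retrieving rows, and the extra work lands on servers that are simpler to add.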

Database servers are typically attached to SAN storage. Most sites use Fibre Channel for their SAN, so even if all the other hardware is equivalent in price to the application servers, database servers require SAN interfaces. If you are using a commercial database product, then depending upon the vendor and your contract, database servers also require additional license fees. It isn't unusual for a database server to cost anywhere from 150% to 300% more than an application server of equivalent performance. So from a cost perspective alone, I would prefer to push work away from my database servers and onto my application servers.

The bigger challenge comes as the database server reaches capacity. There are only two options available when this occurs: scale out or scale up. Scaling out requires a significant engineering effort, while scaling up puts you onto the slippery slope of bigger and bigger hardware and a less than desirable step function in capacity costs. So again, to the extent that I can delay capacity challenges on the database, I can maintain a simpler and less expensive architecture. Scaling out application servers is much simpler and, for most application architectures, comes almost for free until the number of servers becomes unwieldy.

This brings us to the fundamental "aha" moment. In building web applications to scale, you will regularly trade performance for scale. That's not to say that you will never pay attention to performance. Your performance has to be good enough to keep your customers happy. It turns out, though, that providing performance beyond that level doesn't make them any happier, and if you add that performance at the cost of scale, you're painting yourself into a dark corner.

If your goal is to create a site for the masses, then your architecture will need to revolve around protecting those components that are expensive or difficult to scale. Peak performance is not your primary concern. Acceptable performance as you bring on customers or your customers increase their use of your service is the challenge. Rather than measure each design against optimal performance for one customer, measure it against acceptable performance for all current and future customers. And be willing to give up a bit of response time for an increase in scale. Ultimately most customers won't pay you more for extra performance but you will earn more by having more customers or having them do more with your service.

Sunday, October 12, 2008

Television for Software Engineers

No, this isn't going to be an article about concrete architectural practices. But it may in fact be as useful as any of those. I watch some television. Okay, sometimes I watch a lot more than I should. But mostly I listen to it while doing other things, like now as I write this article. And somewhat surprisingly, I find some television to be incredibly applicable to the field of software engineering. More interesting is that the programs I find applicable have absolutely nothing to do with software or, in most cases, even computers. Here's the programming I find very insightful.

Detective Programs

Some of these are much better than others. All software engineers are inherently intrigued by a good puzzle. Unfortunately, most of the remaining population isn't, so too many detective programs turn into juicy dramas with lame intrigue instead of good intellectual challenges. My two favorites by far are Columbo and Monk. What these two programs share is the reliance on incongruities in details to lead our hero to the solution. How does this apply to software? Well, debugging hard problems is almost always about paying attention to seemingly inconsequential details. Inconsistent behavior in an unrelated flow is often the key indicator of the root cause.

Mythbusters

I enjoy Mythbusters immensely. Yes, it is truly geeky fun. And they get to blow real things up, not just simulated explosions in mathematical models and 3D renderings. But what is a true joy to watch is how thoroughly methodical they are in solving problems. They research, design, and prototype. They fanatically analyze their results and make adjustments based on the data. Few programs have been bold enough to expose the analytical and development process so transparently. In fact, the boldest thing Adam and Jamie do is solve a problem in front of millions of people, knowing that they will receive hundreds of comments on the work they do. Think about it. Would you work on a problem in front of millions?

Modern Marvels - Engineering Disasters

This series fascinates me. I'm addicted to it. I wish they would make more because I've seen all the episodes at least 3 times each. Two things stand out in this series and are common themes across the more than 60 disasters they have presented:

  1. Catastrophic failures are always the result of compounding problems. They come about as the result of a "perfect storm". Nobody believed that the combination of events could occur within a critical time window so nobody planned for it.
  2. Engineers are an egotistical lot. We are sure we got it right and only when our creations collapse in front of us do we realize we missed something. It's not surprising though as creating what we create from nothing more than thought and will does require a good deal of egotism.

Every engineer in every discipline should watch this series. It gives you insight into the thought process required to make your creations more failure resistant. And you can see what happens when you fail to account for not just a collection of single failures, but for the simultaneous occurrence of these failures.

So that's my collection. Care to add your own?
