Ative at Work

Agile software development

Performance and Scalability Myths

"When I hear the word performance I reach for my gun".

In the fuzzy front end of a project when unknowns abound people need a sense of stable footing.

Since it usually takes a long (calendar) time to understand the requirements and the domain, discussions tend to focus on more concrete things like the application architecture, and eventually everybody becomes obsessed with performance. It is a safe harbour for “productive discussions” when the waters are full of unknown monsters: fuzzy or non-existent requirements, a vague understanding of the domain, and so on.

So performance discussions ensue. Often the customer wants to know the hardware requirements up front, so we order heaps of multi-CPU servers, SANs, middle-tier application servers, front-end application servers, load balancers and so on.

The next natural step is to think up an intricate object or component distribution strategy to ensure “scalability”.

I don’t know of anyone who ever got fired for “designing a system for high performance and scalability” and building complex, buzzword compliant distributed application architectures.

But maybe someone should be. Just to set an example.

When I worked with the outstanding architect Bjarne Hansen he would often mutter, “Fowler, page 87”.

He was referring to Martin Fowler’s book, Patterns of Enterprise Application Architecture. Page 87 has a section title in big bold letters: “The allure of distributed objects” – and it addresses precisely that.

We worked on a system designed to deliver high performance. It had a set of multi-CPU servers, a fiber SAN, a separate multi-CPU application server, handwritten OR mappers using high-performance stored procedures and a complex distributed caching object database on top to keep all the clients in sync and take the load off the database server.

As for performance…

It took some 80 people around two years to build the system. I worked with Martin Gildenpfenning for a total of about two man-weeks to optimize away the bottlenecks in the application. At that point it ran faster on a developer desktop – with all the components, including client, server, database server and GIS system, deployed on a single machine – than on the high-powered multi-server set-up.

It turned out network latency was the limiting factor.

The system had been designed to optimize a scenario that did not happen in practice.

As Donald Knuth put it, more often than not premature optimization is the root of all evil.

The optimization work we did was quite simple. We did it as the bottlenecks became evident near the end of the development cycle. At that point we had a real, working application and knew which use cases to optimize. Armed with a set of profilers, the task was quite easy.

In fact, the big lesson was that once the application is there and the real data is there, it is easy to find the bottlenecks – and in general most performance bottlenecks can be solved in a very local manner: most could be removed by refactoring a single module. Usually it was a matter of replacing an algorithm with a better (non-quadratic!) one, or introducing a bit of caching – for example, to avoid loading the same object more than once from the database during the same transaction.
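The per-transaction caching fix mentioned above can be sketched in a few lines. This is a minimal illustration, not the project's actual code; the `UnitOfWork` name and `load_from_db` parameter are hypothetical stand-ins for whatever issues the real database query:

```python
class UnitOfWork:
    """Per-transaction cache: load each object at most once.

    Hypothetical sketch; `load_from_db` stands in for whatever
    issues the actual database query."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db
        self._identity_map = {}  # object id -> already-loaded object

    def get(self, object_id):
        # Only the first lookup hits the backing store; repeats
        # within the same transaction are served from memory.
        if object_id not in self._identity_map:
            self._identity_map[object_id] = self._load_from_db(object_id)
        return self._identity_map[object_id]

# Count how many times the backing store is actually hit.
calls = []
uow = UnitOfWork(lambda oid: calls.append(oid) or {"id": oid})
for _ in range(3):
    customer = uow.get(42)
print(len(calls))  # the "database" was queried only once
```

The change is local to the data access layer, which is exactly why this kind of optimization was cheap to do late in the project.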

Discussing performance early in a project embodies all the fallacies of the waterfall project model. It does not work. It is broken. Given a good design, optimizing performance is a simple, measurement-driven task that belongs at the end of every iteration in the project.

1) “It is easier to optimize correct code than to correct optimized code.” (Bill Harlan)

2) Don’t pretend you can spec the hardware for a system before you even know the requirements.

Someone said on the Ruby on Rails IRC channel:
> “PHP5 is faster than Rails”
< “Not if you’re a programmer”

Google and O’Reilly’s “Hacker of the Year” award recipient for 2005, David Heinemeier Hansson, related how they built and deployed the first major Rails application on an 800 MHz Intel Celeron server running the full web-server, application-server and database stack. It maxed out when they had about 20,000 users on the application, and at that point it was easy to scale out.

At the same time Rails is widely claimed to “not be scalable” by the members of the J2EE “enterprise app” community. These are the same people who designed our multi-tier architecture that only maxed out the CPU idle times when it was put into production.

So, here is a checklist for your next project:

  1. Get some hard, measurable targets for performance and expected transaction volume. 
  2. If performance comes up, ask for the CPU, network, I/O utilization statistics for the current system (if it exists). At least this will provide some guidance to whether or not performance will be an issue.
  3. If you are asked to design the hardware, suggest building the app on a single server with plenty of RAM and doing some measurements later. Even if performance becomes an issue, you will benefit from some additional months of Moore’s law to get a better deal on those Itanium boxes.
  4. Create a simple design with no optimizations: use an off-the-shelf O/R-mapper, resist the urge to build complex caches, keep everything in the same process, and fight the urge to write those “performance enhancing” stored procedures. Figure out the common use cases and implement a spike with them. Even if the tools and the simple design introduce a few milliseconds of overhead, you will have saved plenty of man-months in development – enough to pay for an extra CPU to compensate.
  5. Scale the application out, not the components. The big lesson from the big web sites is that buying a load balancer and a rack of cheap servers each running the full application is good enough. In fact, for most applications you don’t even need more than one server.
  6. Measure early, measure often.
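Point 6 deserves a concrete shape. A minimal, hypothetical measurement harness might look like the sketch below (the names are illustrative; a real harness would time your actual use cases with production-sized data). Here it compares two ways of answering "is this id known?":

```python
import time

def measure(label, fn, repeats=5):
    # Best-of-N wall-clock timing; taking the best run damps scheduler noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    print(f"{label}: {best * 1000:.3f} ms")
    return best

# Two data structures for the same question, at a modest volume.
known_ids_list = list(range(20000))
known_ids_set = set(known_ids_list)

t_list = measure("list membership", lambda: 19999 in known_ids_list)
t_set = measure("set membership", lambda: 19999 in known_ids_set)
```

Running this typically shows the linear scan to be orders of magnitude slower than the hash lookup – which is exactly the kind of measured fact worth having before redesigning anything.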



Søren Lindstrøm said:

I guess the above recommendation only applies if you are capable of making a loosely coupled design. Judging from what I have seen so far, this simple principle is very, very hard to uphold.

October 21, 2006 2:50

Martin Jul said:

Good design always requires a lot of discipline.

For example, one time I was working with a guy on my team who said that it would be very easy to fix a particular problem by inserting a small hack in a certain module. And he went on and on about how to do it. I had to stop him and tell him that no matter how much he explained it, all I heard was that he said "hack".

Basically, unless there is a lot of discipline, many developers are willing to forget the principle of separation of concerns and the single responsibility principle. After that, optimization requires much more work, since you have to optimize everywhere rather than locally in a few central components. On the other hand, we have to set the baseline at the level of proper craftsmanship – if you have a bunch of legacy spaghetti-coding cowboys on your team, the first priority is not performance; it is getting to the state of a working application, with much more focus on the basics.

Even when we were working on what was then the biggest .NET project in Europe, we only spent around two weeks performance-tuning it, out of multiple years of development.

October 22, 2006 1:13

Søren Lindstrøm said:

In principle I totally agree that you should address the design issues before you address the performance problems. However...

On this massive project I am currently working on, there is a vast codebase with numerous security and performance problems. None of them are very hard to fix – theoretically. However, the codebase is so strongly coupled that they are impossible to address locally without a major and expensive refactoring effort.

So all I'm saying is this: be realistic up front! If your organisation does not have the in-house skills to produce professional software, but would like to get a piece of the pie anyway, you had better start addressing performance issues as soon as possible, because it is no trivial task later in the process.

October 23, 2006 12:08

Martin Jul said:

I think we agree - but we are talking about two separate things:

First, there is the "design for performance" trap that leads people to complex distributed designs that actually make everything slower because of the network latency etc. This is bad bad bad. Design for the simple case, then refactor as necessary.

Then, there is performance optimization on the working application. This is something to measure continuously (that means running a test with production-sized data sets on each increment, not just a minimal test set). As soon as someone puts code into the application that pushes it away from its performance target, fix it. Since we are using small increments, there is less code to optimize than if we cross our fingers and hope it will go away later. Moore's law is helpful, but since many projects specify their hardware up front, they don't even get any benefit from it. So the goal is to get to a known good state – including performance-wise – and stay there.
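The "measure continuously" idea can be expressed as an ordinary automated test that fails the build when an increment pushes a use case past its target. A hypothetical Python sketch – the target number and the workload are made up for illustration:

```python
import time

PERFORMANCE_TARGET_SECONDS = 0.5  # hypothetical target from the requirements

def process_batch(records):
    # Stand-in for the real use case under test.
    return sorted(records)

def test_stays_within_target():
    # Production-sized data, not a minimal test set.
    production_sized = list(range(200_000, 0, -1))
    start = time.perf_counter()
    process_batch(production_sized)
    elapsed = time.perf_counter() - start
    assert elapsed < PERFORMANCE_TARGET_SECONDS, f"regression: {elapsed:.2f}s"

test_stays_within_target()
print("within target")
```

Run on every increment, a test like this turns "we hope it is still fast" into a known good state you can defend.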

And as you say – since a lot of organisations lack these basic skills, there is plenty of work for us in the coming years teaching them to write well-performing software faster.

October 23, 2006 8:41

Mike Dunlavey said:

I agree with everything you've said, and I can recognize the professionalism in your approach.

I've only got one nit to pick, and that is about profilers.  

When I am looking at a single-thread program that needs optimizing, I just run it under a debugger, halt it with a pause key, examine the call stack, and do it again.  Depending on how bad the performance problem is, it can take very few samples to see it.

The reason it works is that it points to specific call instructions that can be midway up the call stack, and the fraction of call stacks they appear on is an estimate of how much time you could save by removing them.  

Profilers can only narrow you down to the routine containing those calls (with luck), leaving you with some guesswork, and guesswork is not good.  The sampling method eliminates the guesswork.

I've never seen a performance problem in single-threaded software that could not be found by this method in very short order, and it is almost never something one could have guessed ahead of time.

After you do one such optimization, the whole process can be repeated several times, culminating in large speedup ratios.
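The pause-and-look method Mike describes can even be automated in a few lines. Here is a rough Python analogue – not his tool, and the names are mine – that periodically samples a worker thread's call stack and counts where the samples land:

```python
import collections
import sys
import threading
import time

def sample_stacks(target, interval=0.01, samples=50):
    """Poor man's profiler: run target in a thread and periodically
    capture its current frame, mimicking "pause the debugger and
    look at the call stack"."""
    counts = collections.Counter()
    worker = threading.Thread(target=target)
    worker.start()
    for _ in range(samples):
        if not worker.is_alive():
            break
        frame = sys._current_frames().get(worker.ident)
        if frame is not None:
            # Record the innermost function the thread is executing.
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)
    worker.join()
    return counts

def slow_part():
    # Deliberately heavy loop standing in for the real bottleneck.
    total = 0
    for i in range(5_000_000):
        total += i * i
    return total

def workload():
    slow_part()

counts = sample_stacks(workload)
# The function dominating the samples is the one worth optimizing.
print(counts.most_common(1))
```

As in the manual version, the fraction of samples in which a function appears estimates the time you could save by optimizing it.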

July 8, 2007 7:52

Martin Jul said:

Hi Mike

Thanks for your profiling tip - I love it for its simplicity.

I think performance is all about having a full arsenal of practices at hand – profiling through the debugger like that, writing performance data to log files (e.g. response times from web service calls), or using different profilers.

I use a combination of different profilers to crack the really hard problems - some are good at analysing memory usage, others use a "sampling" approach like the one you describe - these are good for really big profiling problems without adding too much overhead to the application - and yet others use some kind of code instrumentation to get detailed line-by-line performance data. I still haven't found a single profiler that does it all well.

I totally agree with your experience that the bad code is never where we expect it to be. For example, I was profiling an application last week where we all assumed that it was slow due to calling a lot of slow web services. Yet the profiler revealed that we could also improve responsiveness by making small changes to a few frequently called internal methods for getting configuration data and resources (localisation). Having done this and measured again, we now know with certainty that it is only slow because of slow web services, and that no more significant gains are possible in the scenarios we profiled by improving the client-side performance alone.

July 10, 2007 1:22

Mike Dunlavey said:

For distributed processes like yours, I have a method which I do not claim is easy, but it is effective. I get a detailed log of all the messages, timestamped to the millisecond, and go through them one by one, making sure I understand the reason for each one and looking for suspicious time lapses. Often I find that either duplicate message exchanges are happening, or that there is an unnecessarily long delay between response and acknowledgement due to some problem on our side that can be fixed. It takes a few hours to do one iteration of this cycle, but the result is worth it.
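The timestamped-log approach can be partly mechanised: a small script can flag the suspicious time lapses so the one-by-one reading starts in the right places. A hypothetical sketch, assuming `HH:MM:SS.mmm`-stamped lines:

```python
from datetime import datetime

def find_gaps(log_lines, threshold_ms=200):
    """Scan a millisecond-timestamped message log and report pauses
    between consecutive messages that exceed the threshold."""
    gaps = []
    previous = None
    for line in log_lines:
        stamp, message = line.split(" ", 1)
        t = datetime.strptime(stamp, "%H:%M:%S.%f")
        if previous is not None:
            delta_ms = (t - previous[0]).total_seconds() * 1000
            if delta_ms > threshold_ms:
                gaps.append((delta_ms, previous[1], message))
        previous = (t, message)
    return gaps

# A made-up log fragment with one suspicious delay.
log = [
    "10:00:00.000 REQUEST get-customer",
    "10:00:00.050 RESPONSE customer",
    "10:00:00.900 ACK customer",       # 850 ms before acknowledging
    "10:00:00.910 REQUEST get-orders",
]
for delta, before, after in find_gaps(log):
    print(f"{delta:.0f} ms between '{before}' and '{after}'")
```

The flagged pairs are only candidates; as Mike says, understanding the reason for each message is still manual work.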

July 10, 2007 3:18

About Martin Jul

Building better software faster is Martin's mission. He is a partner in Ative, and helps development teams implement lean/agile software development and improve their craftsmanship, teaching hands-on development practices such as iterative design and test-first. He is known to be a hardliner on quality and likes to get things done-done.
© Ative Consulting ApS