Optimising your application

When I first released PerfectTablePlan I considered 50-200 guests as a typical event size, with 500+ guests a large event. But my customers have been using the software for ever larger events, with some as large as 3000 guests. While the software could cope with this number of guests, it wasn’t very responsive. In particular the genetic algorithm I use to optimise seating arrangements (which seats people together or apart, depending on their preferences) required running for at least an hour for the largest plans. This is hardly surprising when you consider that seating assignment is a combinatorial problem in the same NP-hard class as the notorious travelling salesman problem. The number of seating combinations for 1000 guests in 1000 seats is 1000!, which is a number with 2,658 digits. Even the number of seating combinations for just 60 guests is more than the number of atoms in the known universe. But customers really don’t care about how mathematically intractable a problem is. They just want it solved. Now. Or at least by the time they get back from their coffee. So I made a serious effort to optimise the performance in the latest release, particularly for the automatic seat assignment. Here are the results:

ptp308_vs_ptp_310.png

Total time taken to automatically assign seats in 128 sample table plans varying in size from 0 to 1500 guests

The chart shows that the new version automatically assigns seats more than 5 times faster over a wide range of table plans. The median improvement in speed is 66%, but the largest plans were solved over ten times faster. How did I do it? Mostly by straightening out a few kinks.

Some years ago I purchased my first dishwasher. I was really excited about being freed from the unspeakable tyranny of having to wash dishes by hand (bear with me). I installed it myself – how hard could it be? It took 10 hours to do a wash cycle. Convinced that the dishwasher was faulty I called the manufacturer. They sent out an engineer who quickly spotted that I had kinked the water inlet pipe as I had pushed the dishwasher into place. It was taking at least 9 hours to get enough water to start the cycle. Oops. As soon as the kink was straightened it worked perfectly, completing a cycle in less than an hour. Speeding up software is rather similar – you just need to straighten out the kinks. The trick is knowing where the kinks are. Experience has taught me that it is pretty much impossible to guess where the performance bottlenecks are in any non-trivial piece of software. You have to measure it using a profiler.

Unfortunately Visual Studio 2005 Standard doesn’t seem to include profiling tools. You have to pay for one of the more expensive versions of Visual Studio to get a profiler. This seems rather mean. But then again I was given a copy of VS2005 Standard for free by some nice Microsofties – after I had spent 10 minutes berating them on the awfulness of their “works with vista” program (shudder). So I used an evaluation version of LTProf. LTProf samples your running application a number of times per second, works out which line and function is being executed and uses this to build up a picture of where the program is spending most time.

After a bit of digging through the results I was able to identify a few kinks. Embarrassingly one of them was that the automatic seat assignment was reading a value from the Windows registry in a tight inner loop. Reading from the registry is very slow compared to reading from memory. Because the registry access was buried a few levels deep in function calls it wasn’t obvious that this was occurring. It was trivial to fix once identified. Another problem was that some intermediate values were being continually recalculated, even though none of the input values had changed. Again this was fairly trivial to fix. I also found that one part of the seat assignment genetic algorithm took time proportional to the square of the number of guests ( O(n^2) ). After quite a bit of work I was able to reduce this to a time linearly proportional to the number of guests (O(n) ). This led to big speed improvements for larger table plans. I didn’t attempt any further optimisation as I felt was getting into diminishing returns. I also straightened out some kinks in reading and writing files, redrawing seating charts and exporting data. The end result is that the new version of PerfectTablePlan is now much more usable for plans with 1000+ guests.

I was favourably impressed with LTProf and will probably buy a copy next time I need to do some optimisation. At $49.95 it is very cheap compared to many other profilers (Intel VTune is $699). LTProf was relatively simple to use and interpret, but it did have quirks. In particular, it showed some impossible call trees (showing X called by Y, where this wasn’t possible). This may have been an artefect of the sampling approach taken. I will probably also have a look at the free MacOSX Shark profiler at some point.

I also tried tweaking compiler settings to see how much difference this made. Results are shown below. You can see that there is a marked difference with and without compiler optimisation, and a noticeable difference between the -O1 and -O2 optimisations (the smaller the bar, the better, obviously):

vs2005_optimisation_speed.png

Effect of VS2005 compiler optimisation on automatic seating assignment run time

Obviously the results might be quite different for your own application, depending on the types of calculations you are doing. My genetic algorithm is requires large amounts of integer arithmetic and list traversal and manipulation.

The difference in executable sizes due to optimisation is small:

vs2005_optimisation_size.png

I tried the two other optimisation flags in addition to -O2.

  • /OPT:NOWIN98 – section alignment does not have to be optimal for Windows 98.
  • /GL – turns on global optimisation (e.g. across source files, instead of just within source files).

Neither made much noticeable difference:

vs2005_additional_opt.png

However it should be noted that most of the genetic algorithm is compiled in a single file already, so perhaps /GL couldn’t be expected to add much. I compared VC++6 and VS2005 version of the same program and found that VS2005 was significantly faster[1]:

vc6_vs_vs2005_optimisation_speed1.png

I also compared GCC compiler optimisation for the MacOSX version. Compared with VS2005 GCC has a more noticeable difference between optimised and unoptimised, but a smaller difference between the different optimisations:

gcc_optimisation_speed.png

Surprisingly -O3 was slower than -O2. Again the effect of optimisation on executable size is small.

gcc_optimisation_size2.png

I also tested the relative speeds of my 3 main development machines[2]:

relative-machine-speed.png

It is interesting to note that the XP box runs the seat assignment at near 100% CPU utilisation, but the Vista box never goes above 50% CPU utilisation. This is because the Vista box is a dual core, but my the seat assignment is currently only single threaded. I will probably add multi-threading in a future version to improve the CPU utilisation on multi-core machines.

In conclusion:

  • Don’t assume, measure. Use a profiler to find out where your application is spending all its time. It almost certainly won’t be where you expected.
  • Get the algorithm right. This can make orders of magnitude difference to the runtime.
  • Compiler optimisation is worth doing, perhaps giving a 2-4 times speed improvement over an application built without compiler optimisation. It probably isn’t worth spending too much time tweaking compiler settings though.
  • Don’t let a software engineer fit your dishwasher.

Further reading:

“Programming pearls” by Jon Bentley a classic book on programming and algorithms

“Everything is fast for small n” by Jeff Atwood on the Coding Horror blog

[1] Not strictly like-for-like as the VC++6 version used dynamic Qt libraries, while the VS2005 version used static Qt libraries.

[2] I am assuming that VS2005 and GCC produce comparably fast executables when both set to -O2.

18 thoughts on “Optimising your application

  1. Andy Brice Post author

    I have used gprof in the past. The output isn’t very easy to work with. Also my main development platform is VS2005 on Windows and gprof isn’t going to work with that.

    I could take the time to port my product to Linux, but (as you noted) Linux users don’t expect to pay for stuff.

  2. Nathan

    If you are seeing “impossible” call traces, what likely happened is a side-effect of LTProf walking the call stack (or similar) to determine callers. Function X called a function Z which tail-called Y, and Z’s stack frame no longer exists for LTProf to look at–all it sees is X’s stack frame and then Y’s stack frame, so it assumes X called Y. There’s no information there to tell it otherwise.

  3. FeepingCreature

    Fair, but you can use gprof on Windows too. There’s a port of GCC and the associated build tools called MinGW (which incorporates the Win32 headers); gprof should work with it.

    The page is at http://mingw.org/ .

    Have fun!

    –feep

  4. vanekl

    Also keep in mind that your code may not be at fault at all, but a library call may be slowing things down to a crawl because of poor implementation. I wont be naming any names, but the library name rhymes with Moost. (If you work with C++ much you’ll know what I mean.) I can still feel that hard-earned lesson.
    [Just to be clear, the “Moost” library is very useful, but contains some code that will bite you on the ass if you aren’t careful.]
    Andy’s right: don’t take anything for granted. Measure.

  5. Andy Brice Post author

    Feep,

    Time is money. Its worth far more than $60 of my time not to have to get my app working under mingw. Anyway I can probably use gprof with gcc on MacOSX. Shark might be a better choice though. It is also free.

  6. Benjamin Meyer

    Seeing that it worked on Both OS X and Windows hinted that it was written using Qt. Downloading the binary and running string over it indeed it is written in Qt :) You really should build it in Linux (which shouldn’t take much work at all sense you use Qt) and check out Valgrind/KCachegrind (http://kcachegrind.sourceforge.net/) which not only kicks ass, but is open source so you can download both the gui and the tools behind it and enhance them if you want. Valgrind includes other tools that will also automatically find memory leaks, thread issues and more. I use Valgrind/KCachegrind to profile my code and it is a fantastic tool.

  7. Andy Brice Post author

    I have used valgrid before and it is a great tool for finding memory problems. But I am not sure how much use it would be for profiling though.

    >which shouldn’t take much work at all sense you use Qt

    It’s always easy when you aren’t the one that has to do it. ;0)

    Anyway, I don’t intend to spend the hard cash required to buy a Linux licence for Qt any time soon. A 2 platform Qt licence is expensive enough thanks. Especially given that the worldwide market for commercial table planning software on Linux is approximately 0.

  8. Pingback: Top blogs, hot posts 12/19/2007 1:38:59 PM « conduong.co.cc

  9. Sean Murphy

    One implication of your work might be to segment the product into two or three “party sizes” so that you know how many folks are actually planning 1,000 seat parties. It would seem to me that they would be willing to pay more and would are benefiting the most from your optimization efforts. Do you specify a largest party size that Perfect Table Plan supports? You might think about three bands:

    1-200 (probably people don’t buy it unless they have at least
    some minimum 60? 80?)
    200-600
    600-1200

    Also it would seem that there would be an opportunity for a SaaS version of this product, especially for organizations that are doing a number of banquets and want to make the “in progress” view available.

  10. Andy Brice Post author

    Sean,

    I have been thinking quite a bit about segmentation. Watch this space.

    I am less convinced about a web version (although not ruling it out completely), but I am slowly adding web-enabled features to the software.

  11. Sean Murphy

    Our clients have found that a SaaS application allows you to learn a lot faster. It also enables different business opportunities than software appliance or “on premises software” models (although they have advantages as well). It just struck me that for certain kinds of larger parties enabling a more self-service model might be worth more.

    Also the space owner might be willing to develop a rich set of templates one for the application that could be shared across a variety of events. They might be willing to look at a “pay per use” model either as a cost they absorb to differentiate their service, or just as a pass through that would almost be de minimis for large parties. You are also enabling them, in conjunction with your existing or perhaps new printer relationships, to offer an additional service.

    Folks who make their living as wedding planners (or event just event planners) might also be interested in a “professional version” of the product. There are probably some additional ancillary revenue streams that you could enable that their clients would view as taking “friction” (e.g. delay and transcription error) out of the process.

    There is at least one player in the SaaS market http://www.toptableplanner.com/ and they segment between personal at $20 for six months and $80 for a year (multiple logins). Strangely they don’t segment on size (and claim unlimited size), where I would think they would have more trouble than you do because of the limitations of a browser based approach over a fully installed application. It doesn’t appear that they let you enter constraints on seating folks and they don’t appear to create a “placement.” I would think on the professional end in particular you would save folks a lot of time that’s more difficult to bill.

  12. Andy Brice Post author

    You make some good points Sean. But the key advantage of web apps is collaboration, and that just isn’t something my customers are asking me for. In fact, on the contrary, you really don’t want your mother-in-law meddling with your seating arrangements! Also it is instructive to note just how feature poor my web-based competitors are compared to PerfectTablePlan. I assume this is due to the current deficiencies of the web as a platform, rather than any shortcomings in the programming skills of my competitors.

  13. Andy Brice Post author

    It is ages since I used LTProf, so I don’t think I could tell you anything useful. The last time I did profiling I used Shark on the Mac.

Comments are closed.