Tag Archives: QA

TestLab² offer

The blog is being sponsored this month by TestLab², a software testing and QA company based in Ukraine. I have used TestLab² on a number of occasions for third party testing of PerfectTablePlan releases on both Windows and Mac OS X. They found a number of bugs that I hadn’t been able to find on my own (testing your own software is always problematic) and gave me additional confidence that I hadn’t let any embarrassing bugs make it through into the final binaries. Their prices are very reasonable (from $20/hour) and I have always found them to be very professional and responsive (see my previous write-up on outsourcing testing). They also have access to operating systems that I don’t have set up, e.g. Windows 8 and Mac OS X 10.8.

Special offer

Quote “successful software” when you ask for an estimate and they will give you a 20% discount. This offer is valid for first-time customers, for the next 14 days only.

TestLab².com website

Cppcheck – A free static analyser for C and C++

I got a tip from Anna-Jayne Metcalfe of C++ and QA specialists Riverblade to check out Cppcheck, a free static analyser for C and C++. I ran >100 kLOC of PerfectTablePlan C++ through it and it picked up a few issues, including:

  • variables uninitialised in constructors
  • classes passed by value, rather than as a const reference
  • variables whose scopes could be reduced
  • methods that could be made const

It only took me a few minutes from downloading to getting results. And the results are a lot less noisy than lint. I’m impressed. PerfectTablePlan is heavily tested and I don’t think any of the issues found are the cause of bugs in PerfectTablePlan, but it shows the potential of the tool.
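
To illustrate, here is the kind of (hypothetical) code that triggers the first two warnings in the list above:

    #include <string>

    class Customer
    {
        public:
            Customer() {} // warning: m_id is not initialised in the constructor
            int id() const { return m_id; }
        private:
            int m_id;
            std::string m_name;
    };

    // warning: Customer could be passed as a const reference, as passing it
    // by value copies the whole object (including the string) for no benefit
    void print( Customer customer );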

The documentation is here. But, on Windows, you just need to start the Cppcheck GUI (in C:\Program Files\Cppcheck, they appear to be too modest to add a shortcut to your desktop), select Check>Directory… and browse to the source directory you want to check. Any issues found will then be displayed.

You can also set an editor to integrate with, in Edit>Preferences>Applications. Double clicking on an issue will then display the appropriate line in your editor of choice.

Cppcheck is available with a GUI on Windows and as a command line tool on a range of platforms. There is also an Eclipse plugin. See the SourceForge page for details on platforms and IDEs supported. You can even write your own Cppcheck rules.
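
For example, the command line version can check a whole source tree with something like this (flags may vary between versions, see cppcheck --help):

    cppcheck --enable=all path/to/source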

Cppcheck could be a very valuable additional layer in my defence in depth approach to QA. I have added it to my checklist of things to do before each new release.

Outsourcing software testing

Every time I write a post for this blog I carefully check it for typos. I then get my wife to proof-read it. She always finds at least one typo. Often there will be whole words missing that my brain must have interpolated when I checked it. I read what I thought I had written. She is unencumbered by such preconceptions.

Similarly, it isn’t sufficient to do all your own testing on software you wrote, no matter how hard you try. You will tend to see what you intended to program, not what you actually programmed. Furthermore your users have different experiences, assumptions, and patterns of usage to you. Even in the unlikely event that you manage 100% code coverage in your testing, those pesky users won’t execute those lines of code in the same order you did. I have spent hours testing a program without finding a bug, only to see someone else break it within minutes or even seconds.

So it is essential to involve people other than the original programmer in testing, in addition to (but not instead of) the testing programmers do on their own code. This poses something of a challenge to one-man-bands such as my own. I don’t have other programmers, let alone QA staff, to call on. I can, and do, use volunteer customers for beta testing. But, in my experience, beta testing is not an effective substitute for professional testing:

  • It is haphazard. I never hear from ~90% of my beta testers.
  • You can’t control beta testers sufficiently. For example, you can’t set them tight deadlines, make them concentrate on a particular feature or have them test on a particular operating system.
  • The quality of bug reports from customers is often poor. Customers often don’t know how (or don’t have the patience) to describe a bug in enough detail for you to reproduce it.
  • Professional testers know how to break software.
  • The new release should be as polished as possible before any customers see it. Your beta testers will be some of your most enthusiastic customers. You don’t want to use up that goodwill by sending them buggy software.

Consequently I like to pay third party testers to test my own PerfectTablePlan product after I have finished my own testing and before I do any beta testing. Previously I have used softwareexaminer.com, but they are no longer in business. So I decided to try a couple of other offshore testing companies I had heard about:

testlab2.com
qsgsoft.com

The problem with paying a testing company is that it is hard to assess the quality of their work until it is too late. If they report few bugs it could be because there are few bugs or because they didn’t do a very good job of testing. By using two companies to test the same software release I was also testing the testers (I didn’t tell them this).

I paid each company to do approximately three days of testing on the Windows and Mac versions of PerfectTablePlan. I was very pleased with the results. Both companies found a useful number of bugs in the software. They were also able to test on platforms that I didn’t have access to at the time (64-bit Windows 7 and Mac OS X 10.6). I didn’t keep an exact score, but I would say that QSG found more bugs, while TestLab2 was more responsive.

QSG found some quite obscure bugs. They were even able to tell me how to reproduce a very rare and obscure bug that I had been trying to track down for months without success. Communications were sometimes a little slow (at least partly due to us being in different time zones) but it wasn’t a huge issue. My only real grumble is their billing. Despite several reminder emails from me I am still waiting to be invoiced for the work several months later. I like to pay my bills promptly and then forget about them.

TestLab2 didn’t find quite as many bugs, but I was impressed with their responsiveness. They installed Mac OS X 10.6 within a few days of it being released, so they could test PerfectTablePlan on it. When I emailed them on a Saturday about a last-minute bug fix for Mac OS X 10.6 they tested the fix the same day. That is great service.

TestLab2 and QSG are based in Ukraine and India, respectively. At around $15/hour they are about a third the price of equivalent US/European companies I contacted (who might also outsource the work to Eastern Europe and India, for all I know). Some people believe outsourcing work to countries with lower costs of living is evil. I’m not one of them. I sell my software worldwide and I am also happy to buy my services worldwide, especially if I can get significantly better value for money by doing so. While there are rational arguments to be made about the problems of culture, language and time zone differences when outsourcing to other countries, I didn’t find any of these to be a major issue in this case. Most of the other arguments I have heard boil down to the simple ugly fact that some westerners feel they are entitled to a disproportionate share of the global pie. But I don’t see any reason why someone in Europe or North America is any more deserving of a job than someone in Ukraine or India.

With the help of these two companies I was able to put out a really solid PerfectTablePlan v4.1.0 release, despite the large number of new features. In fact, I am only just putting out v4.1.1 with some bug fixes several months later. I plan to use both companies again. I hope readers of this blog will give them some additional work to ensure they stay in business. But not so much that they don’t have time to do my next round of testing!

stackoverflow.com goes public

Jeff Atwood and Joel Spolsky’s programmers’ Q&A site stackoverflow.com went from private beta to public beta today.

I have been one of the private beta testers. I find the badges a bit patronising (I’m a 42-year-old professional, not a boy scout), but otherwise I have been very impressed with the site. I think it is going to be a great resource for developers – assuming they can control the group dynamics of a large number of developers (the ‘herding cats’ problem) while keeping the spammers at bay. A lot of thought has gone into the reputation system, voting, badges etc so it will be interesting to see what behaviour emerges.

Go and take it for a spin. It has been designed to be ‘low friction’ – you don’t even need a login to get started.

Using defence in depth to produce high quality software

‘Defence in depth’ is a military strategy where the attacker is allowed to penetrate the defender’s lines, but is then gradually worn down by successive layers of defences. This strategy was famously used by the Soviet Army to halt the German blitzkrieg at the battle of Kursk, using a vast defensive network including trenches, minefields and gun emplacements. Defence in depth also has parallels in non-military applications. I use a defence in depth approach to detect bugs in my code. A bug has to pass through multiple layers of defences undetected before it can cause problems for my customers.

Layer 1: Compiler warnings

Compiler warnings can help to spot many potential bugs. Crank your compiler warnings up to maximum sensitivity to get the most benefit.
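
For example (the exact flags depend on your compiler and version, these are the usual ones for the GNU and Visual C++ compilers):

    g++ -Wall -Wextra -Werror myfile.cpp   # all common warnings, treated as errors
    cl /W4 /WX myfile.cpp                  # the Visual C++ equivalent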

Layer 2: Static analysis

Static analysis takes over where compiler warnings leave off, examining code in great detail looking for potential errors. An example static analyser is Gimpel PC-Lint for C and C++. PC-Lint performs hundreds of checks for known issues in C/C++ code. The flip side of its thoroughness is that it can be difficult to spot real issues amongst the vast number of warnings, and it can take some time to fine-tune the checking to a useful level.

Layer 3: Code review

A fresh set of eyes looking at your code will often spot problems that you didn’t see. There are various ways to go about this, including formal Fagan inspections, Extreme Programming style pair programming and informal reviews. There is quite a lot of documented evidence to suggest that this is one of the most effective ways to find bugs. It is also an excellent way to mentor less experienced programmers. But it is time consuming and can be hard on the ego of the person being reviewed. Also, it isn’t really an option for solo developers.

Layer 4: Self-checking

Of the vast space of states that a program can occupy, usually only a minority will be valid. For example, it makes no sense to set a zero or negative radius for a circle. We can check for invalid states in C/C++ with an assert() macro:

#include <cassert>

class Circle
{
    public:
        void setRadius( double radius );
    private:
        double m_radius;
};

void Circle::setRadius( double radius )
{
    assert( radius > 0.0 );
    m_radius = radius;
}

The program will now halt with a warning message if the radius is set inappropriately. This can be very helpful for finding bugs during testing. Assertions can also be useful for setting pre-conditions and post-conditions:

    void List::remove( Item* i )
    {
        assert( contains( i ) );  // pre-condition: the item is in the list
        ...
        assert( !contains( i ) ); // post-condition: the item has been removed
    }

Or detecting when an unexpected branch is executed:

    switch ( shape )
    {
        case Shape::Square:
            ...
        break;

        case Shape::Rectangle:
            ...
        break;

        case Shape::Circle:
            ...
        break;

        case Shape::Ellipse:
            ...
        break;

        default:
            assert( false ); // shouldn't get here
        break;
    }

Assertions are not compiled into release versions of the software, which means they don’t incur any overhead in production code. But this also means:

  • Assertions are not a substitute for proper error handling. They should only be used to check for states that should never occur, regardless of the program input.
  • Calls to an assert() must not change the program state, or the debug and release versions will behave differently (see the sketch below).
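
In C and C++ this release behaviour comes from the NDEBUG macro: when NDEBUG is defined (as it usually is for release builds) assert() expands to nothing. A minimal sketch of the side effect pitfall:

    #include <cassert>

    int main()
    {
        int count = 0;

        // WRONG: with NDEBUG defined the whole statement (including the
        // increment) compiles away, so release builds never increment count
        //assert( ++count == 1 );

        // Right: change the state first, then assert on the result
        ++count;
        assert( count == 1 ); // checked in debug builds, free in release
        return 0;
    }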

Different languages have different approaches, for example pre- and post-conditions are built into the Eiffel language.

Layer 5: Dynamic analysis

Dynamic checking usually involves automatically instrumenting the code in some way so that its runtime behaviour can be checked for potential problems such as: array bound violations, reading memory that hasn’t been written to and memory leaks. An example dynamic analyser is the excellent and free Valgrind for Linux. There are a few dynamic analysers for Windows, but they tend to be expensive. The only one I have tried in the last few years was Purify and it was flaky (do IBM/Rational actually use their own tools?).
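
As a contrived example, this snippet (hypothetical code, not from a real program) contains all three kinds of problem, and Valgrind will report each of them with a stack trace if you run the program under valgrind --leak-check=full with debug symbols built in:

    #include <cstdlib>

    int main()
    {
        int* buffer = (int*)malloc( 10 * sizeof(int) );
        if ( buffer[5] > 0 ) // reads memory that has never been written to
        {
            buffer[10] = 1;  // writes past the end of the buffer
        }
        return 0;            // buffer is never freed: a memory leak
    }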

Layer 6: Unit testing

Unit testing requires the creation of a test harness to execute various tests on a small unit of code (typically a class or function) and flag any errors. Ideally the unit tests should then be executed every time you make a change to the code. You can write your own test harnesses from scratch, but it probably makes more sense to use one of the existing frameworks, such as NUnit (.NET), JUnit (Java), QtTest (Qt) etc.
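
To show the basic shape, here is a minimal hand-rolled harness (a sketch only, the frameworks above add fixtures, test discovery and proper reporting):

    #include <iostream>

    // the unit under test
    bool isLeapYear( int year )
    {
        return ( year % 4 == 0 && year % 100 != 0 ) || year % 400 == 0;
    }

    static int failures = 0;

    void check( bool passed, const char* description )
    {
        if ( !passed )
        {
            std::cerr << "FAILED: " << description << std::endl;
            ++failures;
        }
    }

    int main()
    {
        check( isLeapYear( 2000 ),  "2000 is a leap year" );
        check( !isLeapYear( 1900 ), "1900 is not a leap year" );
        check( isLeapYear( 1996 ),  "1996 is a leap year" );
        check( !isLeapYear( 1997 ), "1997 is not a leap year" );
        return failures; // a non-zero exit code can be used to fail the build
    }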

According to the Test Driven Development approach you should write your unit tests before you write the code. This makes a lot of sense, but requires discipline.

Layer 7: Integration testing

Integration testing involves testing that different modules of the system work correctly together, particularly the interfaces between your code and hardware or third party libraries.
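
For example, a simple integration test might round-trip data through the file system, checking that your code and the underlying platform agree (a sketch, with hypothetical saveToFile/loadFromFile functions standing in for real application code):

    #include <cassert>
    #include <fstream>
    #include <string>

    // hypothetical application code that talks to the file system
    void saveToFile( const std::string& path, const std::string& data )
    {
        std::ofstream out( path.c_str() );
        out << data;
    }

    std::string loadFromFile( const std::string& path )
    {
        std::ifstream in( path.c_str() );
        std::string data;
        std::getline( in, data );
        return data;
    }

    int main()
    {
        // what we save should be exactly what we load back
        saveToFile( "test.tmp", "hello" );
        assert( loadFromFile( "test.tmp" ) == "hello" );
        return 0;
    }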

Layer 8: System testing

System testing is testing the system in its entirety, as delivered to the end-user. System testing can be done manually or automatically, using a test scripting tool.

Unit, integration and system testing should ideally be done using a coverage tool such as Coverage Validator to check that the testing is sufficiently thorough.

Layer 9: Regression testing

Regression testing involves running a series of tests and comparing the results to the same input data run on the previous release of the system. Any differences may be the result of bugs introduced since the last release. Regression testing works particularly well on systems that take a single input file and produce a single output file – the output file can just be diff’ed against the previous output.
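
For example (myprogram and the file names are hypothetical):

    ./myprogram < test1.in > test1.out
    diff expected/test1.out test1.out    # any difference may be a regression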

Layer 10: Third party testing

Different users have different patterns of usage. You might prefer drag and drop, someone else might use right-click a lot and yet another person might prefer keyboard accelerators. So it would be unwise to release a system that has only ever been tested by the developer. Furthermore, the developer inevitably makes all sorts of assumptions about how the software will be used. Some of those assumptions will almost certainly be wrong.

There are a number of companies that can be paid by the day to do third party testing. I have used softwareexaminer.com in the past with some success.

Layer 11: Beta testing

End-user systems can vary in processor speed, memory, screen resolution, video card, font size, language choice, operating system version/update level and installed software. So it is necessary to test your software on a representative range of supported hardware + operating system + installed software. Typically this is done by recruiting users who are keen to try out new features, for example through a newsletter. Unfortunately it isn’t always easy to get good feedback from beta testers.

Layer 12: Crash reporting

If each of the above 11 layers of defence catches 50% of the bugs missed by the previous layer, we would expect only 1 bug in 2,048 to make it into production code undetected. Assuming your coding isn’t spectacularly sloppy in the first place, you should end up with very few bugs in your production code. But, inevitably, some will still slip through. You can catch the ones that crash your software with built-in crash reporting. This is less than ideal for the person whose software crashed. But it allows you to get detailed feedback on crashes and consequently get fixes out much faster.

I rolled my own crash reporting for Windows and Mac OS X. On Windows the magic function call is SetUnhandledExceptionFilter. You can also sign up to the Windows Winqual program to receive crash reports via Windows’ own crash reporting. But, after my deeply demoralising encounter with Winqual as part of getting the “works with Vista” logo, I would rather take dance lessons from Steve Ballmer.
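
A minimal sketch of the Windows hook (the handler body is left as comments, a real handler should do as little as possible in the crashed process, typically writing a report or minidump and then letting the process die):

    #include <windows.h>

    LONG WINAPI crashHandler( EXCEPTION_POINTERS* info )
    {
        // write the details in info->ExceptionRecord and a stack trace
        // to a report file here, then prompt the user to submit it
        return EXCEPTION_EXECUTE_HANDLER; // then let the process terminate
    }

    int main()
    {
        SetUnhandledExceptionFilter( crashHandler );
        // ... run the application as normal ...
        return 0;
    }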

Test what you ship, ship what you test

A change of a single byte in your binaries could be the difference between a solid release and a release with a showstopper bug. Consequently you should only ship the binaries you have tested. Don’t ship the release version after only having tested the debug version and don’t ship your software after a bug fix without re-doing the QA, no matter how ‘trivial’ the fix. Sometimes it is better to ship with minor (but known) bugs than to try to fix these bugs and risk introducing new (and potentially much worse) bugs.

Cross-platform development

I find that shipping my software on Windows and Mac OS X from a single code base has advantages for QA:

  • different tools with different strengths are available on each platform
  • the GNU C++ compiler may warn about issues that the Visual Studio C++ compiler doesn’t (and vice versa)
  • a memory error that is intermittent and hard to track down on Windows might be much easier to find on Mac OS X (and vice versa)

Conclusion

For the best results you need your layers of checks to be part of your day-to-day development, not something you do just before a release. This is best done by automating them as much as possible, e.g.:

  • setting the compiler to treat warnings as errors
  • performing static analysis and unit tests on code check-in
  • running regression tests on the latest version of the code every night

Also you should design your software in such a way that it is easy to test. For example, building in log file output can make it much easier to perform regression tests.

Defence in depth can find a high percentage of bugs. But obviously the more bugs you start with the more bugs that will end up in your code. So it doesn’t remove the need for good coding practices. Quality can’t be ‘tested in’ to code afterwards.

I have used all 12 layers of defence above at some point in my career. Currently I am not using static analysis (I must update that PC-Lint licence), code review (I am a solo developer) or dynamic analysis (I don’t currently have a dynamic analyser for Windows or Mac OS X). I could also do better on unit testing. But according to my crash reporting, the latest version of PerfectTablePlan has crashed just three times in the last 5,000+ downloads (the same bug each time, somewhere deep down in the Qt print engine). Not all customers click the ‘Submit’ button to send the crash reports, and crashes aren’t the only type of bug, but I think this is indicative of a good level of quality. It is probably a lot better than most of the other consumer software my customers use[1]. Assuming the crash reporting isn’t buggy, of course…

[1]Windows Explorer and Microsoft Office crash on a daily basis on my current machine.