I was a guest on episode 21 of Bootstrapped.fm, the podcast of Andrey Butov and Ian Landsman. The discussion was very wide-ranging, touching on SAAS vs web, the Qt development environment, the royal wedding, A/B testing, capoeira, Adwords, the history of shareware, my new training course and lots more besides. I really enjoyed it. Boostrapped.fm also has a thriving discussion forum at discuss.bootstrapped.fm.
The blog is being sponsored this month by TestLab², a software testing and QA company based in the Ukraine. I have used TestLab² on a number of occasions for third party testing of PerfectTablePlan releases on both Windows and Mac OS X. They found a number of bugs that I hadn’t been able to find on my own (testing your own software is always problematic) and gave me additional confidence that I hadn’t let any embarrassing bugs make it through into the final binaries. Their prices are very reasonable (from $20/hour) and I have always found them to be very professional and responsive (see my previous write-up on outsourcing testing). They also have access to operating systems that I don’t have set-up, e.g. Windows 8 and Mac OS X 10.8.
Quote “successful software” when you ask for an estimate and they will give you a 20% discount. This offer is valid for first-time customers, for the next 14 days only.
Every time I write a post for this blog I carefully check it for typos. I then get my wife to proof-read it. She always finds at least one typo. Often there will be whole words missing that my brain must have interpolated when I checked it. I read what I thought I had written. She is unencumbered by such preconceptions.
Similarly, it isn’t sufficient to do all your own testing on software you wrote, no matter how hard you try. You will tend to see what you intended to program, not what you actually programmed. Furthermore your users have different experiences, assumptions, and patterns of usage to you. Even in the unlikely event that you manage 100% code coverage in your testing, those pesky users won’t execute those lines of code in the same order you did. I have spent hours testing a program without finding a bug, only to see someone else break it within minutes or even seconds.
So it is essential to involve people other than the original programmer in testing, in addition to (but not instead of) the testing programmers do on their own code. This poses something of a challenge to one-man-bands such as my own. I don’t have other programmers, let alone QA staff, to call on. I can, and do, use volunteer customers for beta testing. But, in my experience, beta testing is not an effective substitute for professional testing:
- It is haphazard. I never hear from ~90% of my beta testers.
- You can’t control beta testers sufficiently, for example you can’t set them tight deadlines, make them concentrate on a particular feature or do their testing on a particular operating system
- The quality of bug reports from customers is often poor. Customers often don’t understand (or don’t have the patience) to describe a bug in enough detail for you to reproduce it.
- Professional testers know how to break software.
- The new release should be as polished as possible before any customers see it. Your beta testers will be some of your most enthusiastic customers. You don’t want to use up that goodwill by sending them buggy software.
Consequently I like to pay third party testers to test my own PerfectTablePlan product after I have finished my own testing and before I do any beta testing. Previously I have used softwareexaminer.com, but they are no longer in business. So I decided to try a couple of other offshore testing companies I had heard about:
The problem with paying a testing company is that it is hard to assess the quality of their work until it is too late. If they report few bugs it could because there are few bugs or because they didn’t do a very good job of testing. By using 2 companies to test the same software release I was also testing the testers (I didn’t tell them this).
I paid each company to do approximately 3 days testing on the Windows and Mac versions of PerfectTablePlan. I was very pleased with the results. Both companies found a useful number of bugs in the software. They were also able to test on platforms that I didn’t have access to at the time (64 bit Windows 7 and Mac OS X 10.6). I didn’t keep an exact score, but I would say that QSG found more bugs, while TestLab2 was more responsive.
QSG found some quite obscure bugs. They were even able to tell me how to reproduce a very rare and obscure bug that I had been trying to track down for months without success. Communications were sometimes a little slow (at least partly due to us being in different time zones) but it wasn’t a huge issue. My only real grumble is their billing. Despite several reminder emails from me I am still waiting to be invoiced for the work several months later. I like to pay my bills promptly and then forget about them.
TestLab2 didn’t find quite as many bugs, but I was impressed with their responsiveness. They installed Mac OS X 10.6 within a few days of it being released, so they could test PerfectTablePlan on it. When I emailed them on a Saturday about a last minute bug fix for Mac OS X 10.6 they tested the fix the same day. That is great service.
TestLab2 and QSG are based in Ukraine and India, respectively. At around $15/hour they are about a third the price of equivalent US/European companies I contacted (who might also outsource the work to Eastern Europe and India, for all I know). Some people believe outsourcing work to countries with lower costs of living is evil. I’m not one of them. I sell my software worldwide and I am also happy to buy my services worldwide, especially if I can get significantly better value for money by doing so. While there are rational arguments to be made about problems caused by differences in culture, language and time zone caused by outsourcing to other countries, I didn’t find any of these to be a major issue in this case. Most of the other arguments I have heard boil down to the simple ugly fact that some westerners feel they are entitled to a disproportionate share of the global pie. But I don’t see any reason why someone in Europe or North America is any more deserving of a job than someone in Ukraine or India.
With the help of these two companies I was able to put out a really solid PerfectTablePlan v4.1.0 release, despite the large number of new features. In fact, I am only just putting out a v4.1.1 with some bugs fixes several months later. I plan to use both companies again. I hope readers of this blog will give them some additional work to ensure they stay in business. But not so much that they don’t have time to do my next round of testing!
This can be incredibly useful. So far I have used it for:
- support – Sometimes email just doesn’t cut it. If your customer has Skype, you can use screen sharing to see exactly what your customer is doing while talking to them.
- remote usability testing – Usability testing is very important. But luring a stream of fresh victims to your office to take part is a logistical headache. If you use Skype screen sharing neither of you has to leave the comfort of your own computer. I have used it successully to do usability testing with people on the other side of the world.
Skype screen sharing has its limitation. The images are bit blurry, there is some latency and you can’t interact with the remote computer (as you can with services such as Copilot). But it is good enough for most purposes, and it’s free!
If you are going to be using Skype much, then I strongly recommend buying a USB headset. It is much more comfortable than holding a phone to your ear for extended periods and it keeps your hands free for typing. I use a Logitech headset and I have been quite happy with it. I sometimes get sweaty ears during a long call, but it seems a small price to pay.
- The instrumentation can use breakpoint functionality to get better line coverage on builds with debug information enabled.
- Previous sessions can be automatically merged into new sessions.
- The default colour scheme has been toned down.
- The flashing that happened when you resized the source window has gone.
- It is now possible to mark sections of code not to be instrumented. I haven’t had time to try this yet, as it was only introduced in v3.0.4. But it should be very useful as currently I have a lot of defensive code that should never be reached (see below). Instrumenting this code skews the coverage stats and makes it harder to spot lines that should have been executed, but weren’t.
There are still a few issues:
- I had problems trying to instrument release versions of my code.
- It still fails to instrument some lines (but not many).
- I had a couple of crashes during testing that don’t seem to have been caused by my software (although I can’t prove that).
But the technical support has been very responsive and new versions are released fairly frequently. Overall version 3 is a major improvement to a very useful tool. Certainly it helped me find a few bugs during the testing of version 4 of Perfect Table Plan on Windows. I just wish there was something comparable for MacOSX.
The sink is full of washing, I am wearing odd socks and I haven’t been out of the house in days. It must be time to put out that new release. But how can I be sure my testing hasn’t missed a hideously embarrassing bug? Maybe I introduced a major bug when I made that ‘cosmetic’ change at 2am?
In an ideal world I would just run a comprehensive automated regression test suite. Unfortunately it is difficult to automate graphical user interface (GUI) testing and the majority of lines of code in most applications are GUI. I estimate that the code for my own table planner software is at least 75% GUI code (not including generated code, which would push it even higher).
So I try to manually execute every line of my application before I release it. If I have to make any changes to the code, I start over again. This is very dull, but at least I have a tool to help me: Coverage Validator. Coverage Validator instruments code and shows, in real time, which lines have been executed. Click a few buttons on your application and watch the executed lines of code change colour from pink to yellow. Execute every line in the file and all the lines change colour to cyan. No recompilation or relinking is required and it doesn’t slow down the tested application too much. This real-time feedback is incredibly powerful for testing.
Unfortunately it also has a lot of shortcomings:
- The usability isn’t great. There is a confusing plethora of options for instrumenting your code that I would rather not have to know about.
- It isn’t able to ‘hook’ (instrument) all the lines of code. Whole blocks get missed out for reasons I don’t fully understand. Single line branches are particularly likely to be missed.
- The GUI isn’t great. For example, the display flashes horribly if you resize it.
- The automatic results merging is just plain weird. At the end of a session it can merge your coverage results into a previous session. This information isn’t much use to me at the end of a session. I want to merge previous results at the start of a session so I know which lines I haven’t tested.
- The GUI is quite ugly. They really need to update those tired old icons.
However being able to see line coverage information in real time is just so incredibly useful that I am prepared to put up with the many shortcomings. I just run my application alongside Coverage Validator and, file-by-file and function-by-function, I try to turn the lines of code yellow (or, better still, cyan). Every time I have used Coverage Validator I have found at least one potentially embarrassing bug that I hadn’t discovered by any other means. The support has also been responsive. It is just a pity about the flaws, without them this would be a ‘killer app’ for testing.
Coverage Validator works with C++, Delphi and VB on Windows NT4, 2000, 2003 and XP. A single licence costs $199. A free 30-day evaluation licence is available.
I am using it on Vista currently, and it seems to work fine.