Tag Archives: software

Why you can’t parse CSV with a regular expression

Regular expressions are a very useful tool in a programmer’s toolbox. But they can’t do everything. And one of the things they can’t do is to reliably parse CSV (comma separated value) files. This is because a regular expression doesn’t store state. You need a state machine (or something equivalent) to parse a CSV file.

For example, consider this (very short) CSV file (3 double quotes + 1 comma + 3 double quotes):

""","""

This is correctly interpreted as:

quote to start the data value + escaped quote + comma + escaped quote + quote to end the data value

E.g. a single value of:

","

How each character is interpreted depends on what characters come before and after it. E.g. the first quote puts you into an ‘inside data’ state. The second quote puts you into a ‘might be an escape for the following character or might be end of data’ state. The third quote puts you back into an ‘inside data’ state.
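The states described above can be sketched as a small hand-written state machine. This is only an illustrative sketch, not a complete CSV parser: the state names and the `parse_csv` function are made up for this example, it assumes LF line endings and double-quote escaping, and it is lenient about malformed input. For real work a tested CSV library is a safer choice.

```python
from enum import Enum, auto


class State(Enum):
    START_OF_VALUE = auto()  # nothing read for this value yet
    UNQUOTED = auto()        # inside a value that did not start with a quote
    INSIDE_DATA = auto()     # inside a quoted value
    QUOTE_SEEN = auto()      # quote seen inside a quoted value: escape or end of data?


def parse_csv(text, delimiter=",", quote='"'):
    """Parse CSV text into a list of rows (hypothetical helper, LF line endings assumed)."""
    rows, row, value = [], [], []
    state = State.START_OF_VALUE

    def end_value():
        row.append("".join(value))
        value.clear()

    def end_row():
        end_value()
        rows.append(row.copy())
        row.clear()

    for ch in text:
        if state is State.INSIDE_DATA:
            if ch == quote:
                state = State.QUOTE_SEEN
            else:
                value.append(ch)  # commas and line feeds are just data here
        elif state is State.QUOTE_SEEN and ch == quote:
            value.append(quote)   # doubled quote: an escaped quote
            state = State.INSIDE_DATA
        elif ch == delimiter:
            end_value()
            state = State.START_OF_VALUE
        elif ch == "\n":
            end_row()
            state = State.START_OF_VALUE
        elif ch == quote and state is State.START_OF_VALUE:
            state = State.INSIDE_DATA
        else:
            value.append(ch)
            state = State.UNQUOTED

    # Flush the last row if the text did not end with a line feed.
    if value or row or state is not State.START_OF_VALUE:
        end_row()
    return rows


print(parse_csv('""","""'))  # a single row holding the single value ","
```

The point of the sketch is that the meaning of each quote depends on the current state, which is exactly the bookkeeping that is awkward to express in a single regex.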

No matter how complicated a regex you come up with, it will always be possible to create a CSV file that your regex can’t correctly parse. And once the parsing goes wrong, everything after that point is probably garbage.

You can write a regex that can handle CSV files where you are guaranteed there are no commas, quotes or carriage returns in the data values. But commas, quotes or carriage returns in the data values are perfectly valid in CSV files. So it is only ever going to handle a subset of all the possible well-formed CSV files.

Note that you can parse a TSV (tab separated value) file with a regex, as TSV files are (generally!) not allowed to contain tabs or carriage returns in data and therefore don’t need escaping.
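As a rough illustration of the TSV case, here is a minimal sketch using Python's `re` module. The `parse_tsv` name is made up for this example, and it assumes the common convention that values never contain tabs or line endings and that quotes have no special meaning.

```python
import re


def parse_tsv(text):
    """Split TSV text into rows of values (hypothetical helper)."""
    # Drop a single trailing line ending so it does not yield an empty last row.
    text = re.sub(r"\r?\n\Z", "", text)
    if text == "":
        return []
    # With no escaping to worry about, two simple splits are enough.
    return [re.split(r"\t", line) for line in re.split(r"\r?\n", text)]


print(parse_tsv('name\tnote\nAnn\t"hi", she said\n'))
```

Because there is no escaping, no state needs to be carried from one character to the next.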

See also on Stack Overflow:

Using regular expressions to parse HTML: why not?

Adventures in content marketing

Back in 2011 I created eventcountdown.com. It had a snazzy downloadable, PerfectTablePlan-branded countdown clock for Windows and web-based countdown clock with ads for PerfectTablePlan. Both free. The idea was people searching for countdown clocks for events (such as their wedding) would find the site via Google, find out about PerfectTablePlan and a certain percentage would then buy my event seating planner software.

I paid other people to create the Windows and web versions of the countdown clock. The web-based clock was updated from time to time to add pre-built countdowns for events like the Super Bowl, the Olympics, Christmas, Thanksgiving etc. And I fielded the occasional support email related to the Windows countdown clock.

This is the total traffic to the site from 2011 to 2023:

The peaks are mostly due to the Super Bowl. The site got 38k hits in a single day just before Super Bowl 2019! The free Windows countdown clock also drew quite a lot of traffic. In total the site got some 1.7 million page views over 12 years. Only a small percentage of these visitors clicked through to PerfectTablePlan.com, but still a useful number. Perhaps some people were also prompted to investigate PerfectTablePlan by the branding on the downloadable clock. The site might have also had some SEO benefits for PerfectTablePlan.com. Who knows.

The eventcountdown.com website is now gone (the domain redirects to PerfectTablePlan.com). It didn’t seem worth the effort to keep adding events to the web countdown clock with the traffic now so low. Also both the website and Windows clock were looking dated. But I think it was a worthwhile investment of my time and money.

I have also created various other content pages and mini-sites over the years: articles on table planning, font collections, free clipart, place card templates etc. You can see similar trajectories for some of those.

The traffic seems to reach a peak after 3-7 years and then slowly decay away. Although I have shown them with the same vertical scale here, some generated a lot more traffic than others.

I did some basic on-page SEO for these content pages. For example, looking at Adwords keyword data to choose the page title and H1. But nothing beyond that. No paid promotion or backlink building campaigns.

I tried paying people to write articles related to events. But none of these ever generated any worthwhile traffic. Google could somehow smell the insincerity.

For my data cleaning software product I have been concentrating on ‘how to’ pages and supporting videos aimed at specific topics. These are intended to both help existing customers and to attract new traffic. For example, how to clean data. I have also been posting these videos on the Easy Data Transform YouTube channel. The number of monthly hits on the YouTube videos is relatively low, but they are quite targeted and hopefully will be generating traffic for years to come.

So content marketing take-aways based on my experience are:

  • Free content can be a useful way to bring free traffic to your website.
  • The amount of traffic you get is quite hit and miss. Some content has generated a lot more traffic than expected, some a lot less.
  • The content needs to be well targeted if you want to have any chance of converting it to sales.
  • Google will grow bored of it eventually. You might be able to increase the longevity by updating the content. I’ve not been very diligent with this, but even neglected content pages can generate useful traffic over a 10+ year lifespan.

Renewing my authenticode digital certificate

The authenticode digital certificate I bought back in 2019 expired recently, so I had to get a new certificate (you can’t renew a certificate, as such, you just need to buy a new one). A few months before the expiry I emailed KSoftware.net, who I had bought previous digital certificates from and with whom I had always had a good experience in the past. No reply. I tried a couple more times, including the personal email of Mitchell, the founder. Nothing. Someone else told me they had had similar experiences. Their recent Trustpilot ratings are a horror show. And the copyright date on their website is ‘2003 – 2021’. But they were still advertising on Google Adwords. I have no idea what is happening here. If you are reading this Mitchell, I hope you are ok.

With KSoftware out of the picture I looked elsewhere. Eventually I ended up buying a new Sectigo certificate from signmycode.com. I partly chose them because they offered a 5 year certificate and the less often I have to go through the ball ache of getting a new certificate, the better. The experience was decidedly mixed.

The good:

  • The prices seem reasonable, compared to other options.
  • Support was responsive. English didn’t appear to be their first language, but it was good enough.
  • I got my new certificate within a few days and have had no issues with it so far. The change in certificate seems to have set off a few customers’ anti-virus software, but that was to be expected.

The mediocre:

  • The online guidance and documentation on the process was mediocre, at best.
  • I was a bit confused about whether I had to click ‘Buy now’ or ‘Renew now’. It seems this is more for marketing/SEO purposes and it doesn’t matter which you click.
  • I had to send them a photo of me holding a government ID. This felt pretty uncomfortable, but might be something mandated by the certificate companies.

The bad:

  • After I got my certificate I checked the expiry date and it was only 3 years. When I queried this I was told that the ‘5 year certificate’ I thought I had bought is not a 5 year certificate. It is a 3 year certificate, then I have to apply for a new pre-paid 2 year certificate in 3 years’ time.

This is what you see when you click on ‘Buy now’:

When you see this, wouldn’t you expect to get a single 5 year certificate? If there was anything explaining that this was 2 separate certificates, I didn’t notice it. It certainly didn’t mention it on their home page. This feels deceptive to me.

Who knows if this company will still be there in 3 years’ time? I emailed them and told them I wanted to keep the new 3 year certificate and for them to refund the 2 year certificate. They said they would only refund the entire order and then I would have to start the whole process all over again. They also claimed:

“renewal validation is much more easy then buying a new certificate as most of the validation part is getting carry forward.”

We’ll see. Buyer beware.

See also: The great digital certificate ripoff?

** Update 08-Mar-2023 **

Mitchell of KSoftware has contacted me to say that he is alive and kicking. Read the comments below for more details.

Winterfest 2022

Easy Data Transform and Hyper Plan Professional edition are both on sale for 25% off at Winterfest 2022. So now might be a good time to give them a try (both have free trials). There are also some other great products from other small vendors on sale, including Tinderbox, Scrivener and Devonthink. Some of the software is Mac only, but Easy Data Transform and Hyper Plan are available for both Mac and Windows (one license covers both OSs).

12 rules for software business happiness

Here are a few rules for happiness that I have learned (often the hard way) running a solo software business since 2005.

Make sure your important stuff is backed-up automatically

Any sort of manual back-up is going to get forgotten. Back-up to more than one place, at least one of which is offsite.

Stay away from the bleeding edge

Stick with tried and trusty tools and technologies, where you can. jQuery will probably be here in another 10 years, but the latest and greatest JavaScript framework might not.

Use good suppliers

You need your hosting company, payment processor and other critical suppliers to be rock solid. Think twice about going with a supplier just because they are cheap. Changing suppliers can be a pain, so ask around before trying a supplier.

Use version control for everything important

It matters less which version control system it is. Periodically making a copy of your source folder is not a version control system!

Don’t promise ship dates

Developers are notoriously bad at predicting dates. If you promise a date and get it wrong (and you will) then you either have to miss the date or cut corners. Neither is good.

Never send an email you might later regret

If you are starting to feel angry writing an email, then stop writing. Come back to it later. Or maybe write it, feel a bit better, then delete it without sending.

Write documentation as you go

Few people enjoy writing documentation. But if you leave all the documentation until you have finished programming, then you are likely to rush it and forget stuff.

Have a checklist

Automate where you can. Have checklists for everything else. Keep updating your checklists.

Get someone else to proof read everything

Typos are embarrassing, but it is impossible to proof read your own stuff. So get someone else to proof read any stuff that customers see: web pages, newsletters, documentation etc.

Never release changes just before going on holiday

You don’t want to have to be fire-fighting a new bug when you should be on the beach with your family/friends.

Don’t try to do everything yourself

You could spend weeks learning about taxes, web hosting, CSS or any number of other topics that aren’t central to your business. But why bother? Pay someone who already knows this stuff.

Embrace imperfection

If you wait for perfection, then you are never going to ship anything. Just make sure each release is better than the last. Good enough is good enough.

Summerfest 2022

Easy Data Transform and Hyper Plan Professional edition are both on sale for 25% off at Summerfest 2022. So now might be a good time to give them a try (both have free trials). There are also some other great products from other small vendors on sale, including Tinderbox, Scrivener and Devonthink. Some of the software is Mac only, but Easy Data Transform and Hyper Plan are available for both Mac and Windows (one license covers both). Sale ends 12th July.

Why isn’t there a decent file format for tabular data?

Tabular data is everywhere. I support reading and writing tabular data in various formats in all 3 of my software applications. It is an important part of my data transformation software. But all the tabular data formats suck. There doesn’t seem to be anything that is reasonably space efficient, simple and quick to parse and text based (not binary) so you can view and edit it with a standard editor.

Most tabular data currently gets exchanged as: CSV, Tab separated, XML, JSON or Excel. And they are all highly sub-optimal for the job.

CSV is a mess. One quote in the wrong place and the file is invalid. It is difficult to parse efficiently using multiple cores, due to the quoting (you can’t start parsing from part way through a file). Different quoting schemes are in use. You don’t know what encoding it is in. Use of separators and line endings is inconsistent (the separator is sometimes a comma, sometimes a semicolon). Writing a parser to handle all the different dialects is not at all trivial. Microsoft Excel and Apple Numbers don’t even agree on how to interpret some edge cases for CSV.

Tab separated is a bit better than CSV. But can’t store tabs and still has issues with line endings, encodings etc.

XML and JSON are tree structures and not suitable for efficiently storing tabular data (plus other issues).

There is Parquet. It is very efficient with its columnar storage and compression. But it is binary, so can’t be viewed or edited with standard tools, which is a pain.

Don’t even get me started on Excel’s proprietary, ghastly binary format.

Why can’t we have a format where:

  • Encoding is always UTF-8
  • Values stored in row major order (row 1, row 2 etc)
  • Columns are separated by \u001F (ASCII unit separator)
  • Rows are separated by \u001E (ASCII record separator)
  • Er, that’s the entire specification.

No escaping. If you want to put \u001F or \u001E in your data: tough, you can’t. Use a different format.

It would be reasonably compact, efficient to parse and easy to manually edit (Notepad++ shows the unit separator as a ‘US’ symbol). You could write a fast parser for it in minutes. Typing \u001F or \u001E in some editors might be a faff, but it is hardly a showstopper.
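To give a feel for how little code the proposed format needs, here is a minimal sketch in Python. The function names `write_usv` and `parse_usv` are made up for this example, and it assumes the record separator goes between rows only (no trailing separator), which the specification above doesn't spell out.

```python
UNIT_SEP = "\u001F"    # separates columns
RECORD_SEP = "\u001E"  # separates rows


def write_usv(rows):
    """Serialise rows (lists of strings) to a single string (hypothetical helper)."""
    for row in rows:
        for value in row:
            # No escaping: separators in the data are simply not allowed.
            if UNIT_SEP in value or RECORD_SEP in value:
                raise ValueError("value contains a separator character")
    return RECORD_SEP.join(UNIT_SEP.join(row) for row in rows)


def parse_usv(text):
    """Parse a string back into rows (hypothetical helper)."""
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


rows = [["name", "note"], ["Ann", 'said "hi", then left\nearly']]
assert parse_usv(write_usv(rows)) == rows  # quotes, commas and line feeds survive untouched

# On disk the text would always be UTF-8, e.g. write_usv(rows).encode("utf-8")
```

One consequence of this sketch's assumptions is that an empty string parses as a single row holding one empty value, so a real specification would need to say how an empty table is represented.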

It could be called something like “unicode separated value” (hat tip to @fakeunicode on Twitter for the name) or “unit separated value” with file extension .usv. Maybe a different extension could be used when values are stored in column major order (column 1, column 2 etc).

Is there nothing like this already? Maybe there is and I just haven’t heard of it. If not, shouldn’t there be?

And yes I am aware of the relevant XKCD cartoon ( https://xkcd.com/927/ ).

** Edit 4-May-2022 **

“Javascript” -> “JSON” in para 5.

It has been pointed out that the above will give you a single line of text in an editor, which is not great for human readability. A quick fix for this would be to make the record delimiter a \u001E character followed by an LF character. Any LF that comes immediately after an \u001E would be ignored when parsing. Any LF not immediately after an \u001E is part of the data. I don’t know about other editors, but it is easy to view and edit in Notepad++.
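The LF rule in this edit can be sketched in a couple of lines of Python. The `parse_usv_lf` name is made up for this example, and it again assumes the delimiter goes between rows, so a delimiter at the very end of the text would produce a final row holding one empty value.

```python
UNIT_SEP = "\u001F"    # separates columns
RECORD_SEP = "\u001E"  # separates rows, optionally followed by one LF


def parse_usv_lf(text):
    """Parse the LF-friendly variant into rows (hypothetical helper)."""
    # Ignore exactly one LF immediately after each record separator.
    # Any other LF is left alone, because it is part of the data.
    text = text.replace(RECORD_SEP + "\n", RECORD_SEP)
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


text = "a\u001Fb\u001E\nc\nd\u001Fe"
print(parse_usv_lf(text))  # two rows; the LF inside 'c\nd' is kept as data
```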

Positioning Software in a Crowded Market

This is a guest post from serial software entrepreneur Dennis Gurock.

Thinking about product positioning (and matching branding) is especially important if you build a product for a crowded market with many established competitors (and there are many reasons why this can be a good idea). We were in exactly this situation when we initially thought about building and marketing our new test management tool.

Positioning will allow you to better focus on a specific market segment to target, it makes it easier to build a clearer and stronger message to reach customers, and it helps develop the initial product vision and feature set.

What does successful positioning mean for software products? It can mean identifying a unique angle to focus on so you can stand out with your product among other products and competitors. Especially if you are entering a crowded market, this allows you to better communicate the key benefits and features you have to offer. It will help you reach the right customers and ensures that customers remember you when they look for a new product to try.

To come up with positioning for your new product, you can focus on a specific customer segment or niche that you think will be easier to market to or that you think is underserved by existing offerings. It can also help you limit the initial product scope, so you can go to market faster. Then rigorously optimizing for this initial customer segment allows you to establish a market presence and expand to other segments more easily later.

Why is positioning useful?

There are many benefits of coming up with and deciding on positioning for your new software product early on. Once you decide on the positioning, many marketing, product management and sales decisions become more straightforward.

  • Clear message & benefits: it is not easy to stand out in a crowded market. Positioning allows you to come up with clear messaging so you can explain and highlight unique selling points in few words.
  • Target and identify niche/marketing opportunities: it can be difficult to decide which marketing options to try, which campaigns to book and which niches to target. Focusing on a specific market segment based on the product positioning can be a great way to identify matching niches and opportunities.
  • Identify customer fit during sales: one of the most important aspects of the sales process is identifying and ensuring prospects are actually a great fit for your product. It’s wasted time for both you and for your prospects to invest a lot of effort evaluating and piloting a product if they will not benefit from it. Positioning can help you quickly filter and identify which customers to focus on.
  • Better focus on initial product vision: there are a lot of directions to choose when building a new product. If you don’t have a clear vision to guide you, it is easy to be distracted by different directions and work on too many things at the same time. Clear positioning makes it easy to focus product management on specific goals and use cases.
  • Easier to choose features: when you start working with customers, you will (hopefully) receive a lot of feedback on features you should add. Positioning helps you decide which of these features you should actually implement. Often the most successful products are developed by following strong opinions and saying ‘No’ to many requests.

Examples of software product positioning

Let’s look at a few examples of companies that use positioning to market and build their products. All these examples are from industries and product categories with many existing competitors and products.

  • Testmo: we entered a crowded market with many established testing tools when we developed our new product. Most existing offerings either focus on manual testing, or they offer a complete ALM toolset to handle the entire development lifecycle. With Testmo we had other ideas and wanted to position it differently, focusing on unified testing. This means we combine test cases, automation and exploratory testing in a single platform. At the same time it allows us to limit the scope of the product. We won’t add our own issue tracking, or CI pipelines, or existing DevOps features. Instead we focus on integrating with other tools customers already use.
  • Another example is the documentation and wiki product GitBook. They heavily focus on software developers and position themselves as the primary tool for developers to publish user docs and to document internal knowledge. With this positioning in mind, they can focus on features that primarily make sense for developers, such as Git synchronization, Markdown support and code snippets. It also allows them to more easily market directly to software developers with a clear message.
  • Then there’s the application monitoring service Checkly. There are many services and products that enable you to monitor apps and sites for downtime and notify you about issues. Checkly positions itself as a tool that enables end-to-end monitoring with flexible scripting. So it doesn’t just make simple web requests to see if a site is still live. It allows customers to write custom scripts to implement complex user flows and thereby not just check if a site is reachable, but also test the entire stack with the front-end, database, authentication and much more. This focus allows them to build more targeted features for advanced use cases and thereby provides more value to customers compared to simpler competitors.
  • The popular email marketing service Campaign Monitor also started with very focused positioning. In the first few years they concentrated on providing the best possible campaign tool for web designers and design agencies. This focus allowed them to invest more in features designers needed, such as white labeling, reusable themes and live email previews. Once they established their market presence, they started to expand their customer base to capture a larger part of the overall market for newsletter tools.

These are just some examples of companies and products that have benefited from clear positioning. Of course there are also countless examples of companies choosing not to have such clear positioning. There is nothing wrong with this and you can certainly be very successful even if you ignore these points. But more often than not positioning is a useful tool to improve focus on specific goals and customer needs, which increases your chance to build a successful software business.

Dennis Gurock is one of the founders of Testmo, a QA testing tool that unifies test case management with exploratory testing and test automation in one platform. He has been working on products that help teams improve software quality for more than 15 years.

Making explainer videos for your software

If you want to find out how to do something, such as do a mail merge in Word or fix a leaky valve on a radiator, where do you look first? Probably YouTube. Videos are an excellent way to explain something. More bandwidth than text and more scalable than a 1-to-1 demo.

I’ve done explainer videos for all 3 of my products. But I found it a real struggle. I would write a script and then try to read the script and do the screencast at the same time and do it all in one take. I would stutter and stumble and it would take multiple attempts. It took ages and results were passable at best. I got some better software to edit the stumbles out, so I didn’t have to do it in one go. But it still took me a fair few attempts and quite a bit of editing. It became one of my least favourite things to do and so I did less and less of it.

Recently, I came across these slides on video by Christian Genco. These and subsequent Twitter exchanges with Christian convinced me that I should stop being a perfectionist about video and just start cranking them out on the grounds that a ‘good enough’ video is better than no video at all (‘the perfect is the enemy of the good’) and I would get better at it over time. As Stalin supposedly said “Quantity has a quality all of its own”.

So I have ditched the scripts and the perfectionism and I’ve managed to create 13 short Easy Data Transform explainer videos in the last week or so. And I am getting faster at it and (hopefully) a bit more polished. I’m definitely not an expert on this (and probably never will be) but here are some tips I have picked up along the way:

  • Get some decent software. I use Camtasia on Windows and it seems pretty good.
  • Try to talk slower.
  • Try to sound upbeat (not easy if you are British and could voice double for Eeyore).
  • Try not to move the mouse and talk at the same time. This makes editing a lot easier. Some people like to do the audio and the visual separately, but that seems like too much hassle.
  • If you stumble, just take a deep breath, say it again and then edit the stumble out later.
  • Get a reasonable mic. I have a Snowball mic on a cantilevered stand. I covered it with a thin cloth to try to reduce pops.
  • The occasional ‘um’ is fine.
  • Have a checklist of things to do for each video, so you don’t forget anything (such as disabling your phrase expander software or muting the phone).
My setup. Note the high tech use of rubber bands.

I’m lucky to have a very quiet office, so I don’t have much background noise to contend with.

Using Camtasia I can easily add intros and outros, edit out stumbles and add various effects, such as mouse position highlighting and movement smoothing. I just File>Save as the previous project so that I don’t have to re-add the intro and outro. Unsurprisingly, Camtasia have lots of explainer videos. I wish there was a way to automatically ‘ripple delete’ any sections where there is no audio and no mouse movement (if there is, I haven’t found it). Some people recommend descript.com. It looks interesting, but I haven’t tried it.

I did an A/B test of recordings with my Sennheiser headset mic against my Snowball mic and the consensus was that the headset was ok but the Snowball mic sound quality was better.

Some people prefer to use synthetic voices, instead of their own voice. While these synthetic voices have improved a lot, they never sound quite right to me. Also it must be time consuming to type out all the text. Or you can pay to have a professional voiceover done, but this is surprisingly expensive (around $100 per minute, last time I checked) and almost certainly more time consuming than doing it yourself.

Some people aren’t confident about speaking on videos because they are not native speakers of that language and have an accent. Personally accents don’t bother me at all. In fact I like hearing English spoken with a foreign accent, as long as I can understand it. Also I think there is an authenticity to hearing a creator talk about their product in their own voice.

I’m not a big fan of music on explainer videos, so I don’t add any.

I let YouTube generate automatic captions for people that want them (which could be people in busy offices and on trains and planes, as well as the hearing impaired). They aren’t perfect, but they are good enough.

My videos are aimed at least as much at finding new users as helping existing users. So I make sure I research keyword terms (mostly in Google Adwords) before I decide which videos to make and what to title them. Currently I am targeting very specific keyword searches, such as How to convert CSV to Markdown. Easy Data Transform can do a lot more than just format conversion, but from an SEO point of view it is better to target the phrases that people are actually searching for.

I upload the videos as 1080P (1920 x 1080 pixels) on to the Easy Data Transform YouTube channel and onto my screencast.com account (which I pay a yearly fee for). I then embed the screencast.com videos on relevant easydatatransform.com pages using IFRAME embed codes created by screencast.com. I don’t use the YouTube videos on my website, because I don’t want people to be distracted by YouTube ads and ‘you may also like’ recommendations. They might be showing a competitor! I don’t host the videos on the website itself as I worry that might slow down the website. I also link to the videos in screencast.com from my help documentation, as appropriate.

Some people like to embed video of themselves in screencasts, in the hope of making it more engaging. But personally I want people to concentrate on my software, rather than being distracted by the horror of my face. And not having to comb my hair or look smart was part of what got me into running my own software business.

In the next few months I will be checking my analytics to see how many views these videos get and whether they increase the time on page and reduce the bounce rate.

If you can spare a few seconds to go to my YouTube page and ‘like’ a video or two or subscribe, that would be a big help!

Note that some of the above doesn’t apply when you are creating a demo video for your home page, rather than an explainer video. Your main demo video should be slick and polished.