Easy Data Transform and Hyper Plan Professional edition are both on sale for 25% off at Winterfest 2022. So now might be a good time to give them a try (both have free trials). There are also some other great products from other small vendors on sale, including Tinderbox, Scrivener and Devonthink. Some of the software is Mac only, but Easy Data Transform and Hyper Plan are available for both Mac and Windows (one license covers both OSs).
I have been gradually improving my data wrangling tool, Easy Data Transform, putting out 70 public releases since 2019. While the product’s emphasis is on ease of use, rather than pure performance, I have been trying to make it fast as well, so it can cope with the multi-million row datasets customers like to throw at it. To see how I was doing, I did a simple benchmark of the most recent version of Easy Data Transform (v1.37.0) against several other desktop data wrangling tools. The benchmark did a read, sort, join and write of a 1 million row CSV file. I did the benchmarking on my Windows development PC and my Mac M1 laptop.
Here is an overview of the results:
Time by task (seconds), on Windows without Power Query (smaller is better):
I have left Excel Power Query off this graph, as it is so slow you can hardly see the other bars when it is included!
Time by task (seconds) on Mac (smaller is better):
Memory usage (MB), Windows vs Mac (smaller is better):
So Easy Data Transform is nearly as fast as its nearest competitor, Knime, on Windows and a fair bit faster on an M1 Mac. It also uses a lot less memory than Knime. However, we have some way to go to catch up with the Pandas library for Python and the data.table package for R when it comes to raw performance. Hopefully I can get nearer to their performance in time. I was forbidden from including benchmarks for Tableau Prep and Alteryx by their licensing terms, which seems unnecessarily restrictive.
Looking at just the Easy Data Transform results, it is interesting to notice that a newish MacBook Air M1 laptop is significantly faster than an AMD Ryzen 7 desktop PC from a few years ago.
Apple has transitioned Macs from Intel to ARM (M1/M2) chips. In the process it has provided an emulation layer (Rosetta 2) to ensure that the new ARM Macs can still run applications created for Intel Macs. The emulation works very well, but is quoted to be some 20% slower than running native ARM binaries. That may not seem like a lot, but it is significant for processor intensive applications such as my own data wrangling software, which often processes datasets with millions of rows through complex sequences of merging, splitting, reformatting, filtering and reshaping. Also, people who have just spent a small fortune on a shiny new ARM Mac can get grumpy about not having a native ARM binary to run on it. So I have been investigating moving Easy Data Transform from an Intel binary to a Universal (‘fat’ [1]) binary containing both Intel and ARM binaries. The process is familiar from moving my seating planner software for Mac from PowerPC to Intel chips some years ago. Hopefully I will have retired before the next chip change on the Mac.
My software is built on top of the excellent Qt cross-platform framework. Qt announced support for Mac Universal binaries in Qt 6.2 and Qt 5.15.9. I am sticking with Qt 5 for now, because it better supports multiple text encodings and because I don’t see any particular advantage to switching to Qt 6 yet. But there is a wrinkle. Qt 5.15.3 and later are only available to Qt customers with commercial licenses. I want to use the QtCharts component in Easy Data Transform v2, and QtCharts requires a commercial license (or GPL, which is a no-go for me). I also want access to all the latest bug fixes for Qt 5. So I decided to switch from the free LGPL license and buy a commercial Qt license. Thankfully I was eligible for the Qt small business license, which is currently $499 per year. The push towards commercial licensing is controversial with Qt developers, but I really appreciate Qt and all the work that goes into it, so I am happy to support the business (not enough to pay the eye-watering fee for a full enterprise license though!).
Moving from producing an Intel binary using LGPL Qt to producing a Universal binary using commercial Qt involved several major stumbling blocks that took me hours and a lot of googling to sort out. I’m going to spell them out here to save you that pain. You’re welcome.
The latest Qt 5 LTS releases are not available via the Qt maintenance tool if you have open source Qt installed. After you buy your commercial licence you need to delete your open source installation and all the associated license files. Here is the information I got from Qt support:
I assume that you were previously using open source version, is that correct?
Qt 5.15.10 should be available through the maintenance tool but it is required to remove the old open source installation completely and also remove the open source license files from your system.
So, first step is to remove the old Qt installation completely. Then remove the old open source licenses which might exist. Instructions for removing the license files:
****************************
Unified installer/maintenancetool/qtcreator will save all licenses (downloaded from the used Qt Account) inside the new qtlicenses.ini file. You need to remove the following files to fully reset the license information.
Windows
"C:/Users/%USERNAME%/AppData/Roaming/Qt/qtlicenses.ini"
"C:/Users/%USERNAME%/AppData/Roaming/Qt/qtaccount.ini"
Linux
"/home/$USERNAME/.local/share/Qt/qtlicenses.ini"
"/home/$USERNAME/.local/share/Qt/qtaccount.ini"
OS X
"/Users/$USERNAME/Library/Application Support/Qt/qtlicenses.ini"
"/Users/$USERNAME/Library/Application Support/Qt/qtaccount.ini"
As a side note: If the files above cannot be found $HOME/.qt-license(Linux/macOS) or %USERPROFILE%\.qt-license(Windows) file is used as a fallback. .qt-license file can be downloaded from Qt Account. https://account.qt.io/licenses
Be sure to name the Qt license file as ".qt-license" and not for example ".qt-license.txt".
***********************************************************************
After removing the old installation and the license files, please download the new online installer via your commercial Qt Account.
You can login there at:
https://login.qt.io/login
After installing Qt with commercial license, it should be able to find the Qt 5.15.10 also through the maintenance tool in addition to online installer.
Then you need to download the commercial installer from your online Qt account and reinstall all the Qt versions you need. Gigabytes of it. Time to drink some coffee. A lot of coffee.
In your .pro file you need to add:
macx {
    QMAKE_APPLE_DEVICE_ARCHS = x86_64 arm64
}
Note that the above doubles the build time of your application, so you probably don’t want it set for day to day development.
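One way to handle that is to make the fat build opt-in. A sketch (the ‘universal’ CONFIG value is my own naming, not a standard Qt flag):

macx:universal {
    QMAKE_APPLE_DEVICE_ARCHS = x86_64 arm64
}

Then day-to-day builds only target the native architecture, and you pass CONFIG+=universal to qmake when you want a release build.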
You can use macdeployqt to create your deployable Universal .app but, and this is the critical step that took me hours to work out, you need to use <QtDir>/macos/bin/macdeployqt, not <QtDir>/clang_64/bin/macdeployqt. Doh!
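For example (the Qt install location and application name here are placeholders for your own):

~/Qt/5.15.10/macos/bin/macdeployqt MyApp.app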
You can check that the .app is Universal using the lipo command, e.g. (MyApp is a placeholder for your own application):
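% lipo -archs MyApp.app/Contents/MacOS/MyApp
x86_64 arm64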
I was able to use my existing practice of copying extra files (third party libraries, help etc) into the .app file and then digitally signing everything using codesign --deep [2]. Thankfully the only third party library I use apart from Qt (the excellent libXL library for Excel) is available as a Universal framework.
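The signing step looks something like this (the identity string is a placeholder for your own Developer ID certificate):

codesign --deep --force --sign "Developer ID Application: Your Name (TEAMID)" MyApp.app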
I did all the above on an Intel iMac using the latest Qt 5 LTS release (Qt 5.15.10) and Xcode 13.4 on macOS 12. I then tested it on an ARM MacBook Air. No doubt you can also build Universal binaries on an ARM Mac.
Unsurprisingly the Universal app is substantially larger than the Intel-only version. My Easy Data Transform .dmg file (which also includes a lot of help documentation) went from ~56 MB to ~69 MB. However that is still positively anorexic compared to many bloated modern apps (looking at you Electron).
A couple of tests I did on an ARM MacBook Air showed a ~16% improvement in performance. For example, joining two 500,000 row x 10 column tables went from 4.5 seconds to 3.8 seconds. Obviously the performance improvement depends on the task and the system. One customer reported that batch processing 3,541 JSON files and writing the results to CSV went from 12.8 to 8.1 seconds, a 37% improvement.
[1] I’m not judging.
[2] Apparently the use of --deep is frowned on by Apple. But it works (for now anyway). Bite me, Apple.
Easy Data Transform and Hyper Plan Professional edition are both on sale for 25% off at Summerfest 2022. So now might be a good time to give them a try (both have free trials). There are also some other great products from other small vendors on sale, including Tinderbox, Scrivener and Devonthink. Some of the software is Mac only, but Easy Data Transform and Hyper Plan are available for both Mac and Windows (one license covers both). Sale ends 12th July.
Tabular data is everywhere. I support reading and writing tabular data in various formats in all 3 of my software applications. It is an important part of my data transformation software. But all the tabular data formats suck. There doesn’t seem to be anything that is reasonably space efficient, simple and quick to parse, and text based (not binary), so you can view and edit it with a standard editor.
Most tabular data currently gets exchanged as: CSV, Tab separated, XML, JSON or Excel. And they are all highly sub-optimal for the job.
CSV is a mess. One quote in the wrong place and the file is invalid. It is difficult to parse efficiently using multiple cores, due to the quoting (you can’t start parsing from part way through a file). Different quoting schemes are in use. You don’t know what encoding it is in. The use of separators and line endings is inconsistent (sometimes comma, sometimes semicolon). Writing a parser to handle all the different dialects is not at all trivial. Microsoft Excel and Apple Numbers don’t even agree on how to interpret some edge cases for CSV.
Tab separated is a bit better than CSV. But it can’t store tabs and still has issues with line endings, encodings etc.
XML and JSON are tree structures and not suitable for efficiently storing tabular data (plus other issues).
There is Parquet. It is very efficient with its columnar storage and compression. But it is binary, so it can’t be viewed or edited with standard tools, which is a pain.
Don’t even get me started on Excel’s proprietary, ghastly binary format.
Why can’t we have a format where:
Encoding is always UTF-8
Values stored in row major order (row 1, row 2 etc)
Columns are separated by \u001F (ASCII unit separator)
Rows are separated by \u001E (ASCII record separator)
Er, that’s the entire specification.
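For example, a two column table with a header row and one data row would be stored as (writing the separators as ␟ and ␞ for visibility):

Name␟Age␞Alice␟30␞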
No escaping. If you want to put \u001F or \u001E in your data: tough, you can’t. Use a different format.
It would be reasonably compact, efficient to parse and easy to manually edit (Notepad++ shows the unit separator as a ‘US’ symbol). You could write a fast parser for it in minutes (see the sketch below). Typing \u001F or \u001E in some editors might be a faff, but it is hardly a showstopper.
It could be called something like “unicode separated value” (hat tip to @fakeunicode on Twitter for the name) or “unit separated value”, with file extension .usv. Maybe a different extension could be used when values are stored in column major order (column 1, column 2 etc).
Is there nothing like this already? Maybe there is and I just haven’t heard of it. If not, shouldn’t there be?
It has been pointed out that the above will give you a single line of text in an editor, which is not great for human readability. A quick fix for this would be to make the record delimiter a \u001E character followed by an LF character. Any LF that comes immediately after a \u001E would be ignored when parsing. Any LF not immediately after a \u001E is part of the data. I don’t know about other editors, but it is easy to view and edit in Notepad++.
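To show how simple parsing would be, here is a minimal sketch of a USV reader in C++ (my own code, implementing the format above including the optional LF after each record separator; readUsv is just a name I picked):

#include <istream>
#include <string>
#include <vector>

// Minimal sketch of a USV reader. Assumes UTF-8 input, \u001F between
// values, \u001E between records and an optional LF straight after each
// \u001E for readability. No escaping, as per the proposed format. This
// is UTF-8 safe, as 0x1E and 0x1F never occur inside multi-byte sequences.
std::vector<std::vector<std::string>> readUsv(std::istream& in)
{
    std::vector<std::vector<std::string>> records;
    std::vector<std::string> record;
    std::string value;
    bool afterRecordSep = false;
    char c;
    while (in.get(c)) {
        if (afterRecordSep && c == '\n') { // cosmetic LF after \u001E: skip
            afterRecordSep = false;
            continue;
        }
        afterRecordSep = false;
        if (c == '\x1f') {                 // unit separator: value ends
            record.push_back(value);
            value.clear();
        } else if (c == '\x1e') {          // record separator: record ends
            record.push_back(value);
            value.clear();
            records.push_back(record);
            record.clear();
            afterRecordSep = true;
        } else {
            value += c;                    // everything else is data
        }
    }
    if (!value.empty() || !record.empty()) { // data after the last \u001E
        record.push_back(value);
        records.push_back(record);
    }
    return records;
}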
If you want to find out how to do something, such as doing a mail merge in Word or fixing a leaky valve on a radiator, where do you look first? Probably YouTube. Videos are an excellent way to explain something. More bandwidth than text and more scalable than a 1-to-1 demo.
I’ve done explainer videos for all 3 of my products. But I found it a real struggle. I would write a script and then try to read the script and do the screencast at the same time, all in one take. I would stutter and stumble and it would take multiple attempts. It took ages and the results were passable at best. I got some better software to edit the stumbles out, so I didn’t have to do it in one go. But it still took me a fair few attempts and quite a bit of editing. It became one of my least favourite things to do and so I did less and less of it.
Recently, I came across these slides on video by Christian Genco. These and subsequent Twitter exchanges with Christian convinced me that I should stop being a perfectionist about video and just start cranking them out, on the grounds that a ‘good enough’ video is better than no video at all (‘the perfect is the enemy of the good’) and I would get better at it over time. As Stalin supposedly said, “Quantity has a quality all of its own”.
So I have ditched the scripts and the perfectionism and I’ve managed to create 13 short Easy Data Transform explainer videos in the last week or so. And I am getting faster at it and (hopefully) a bit more polished. I’m definitely not an expert on this (and probably never will be) but here are some tips I have picked up along the way:
Get some decent software. I use Camtasia on Windows and it seems pretty good.
Try to talk slower.
Try to sound upbeat (not easy if you are British and could voice double for Eeyore).
Try not to move the mouse and talk at the same time. This makes editing a lot easier. Some people like to do the audio and the visual separately, but that seems like too much hassle.
If you stumble, just take a deep breath, say it again and then edit the stumble out later.
Get a reasonable mic. I have a snowball mic on a cantilevered stand. I covered it with a thin cloth to try to reduce pops.
The occasional ‘um’ is fine.
Have a checklist of things to do for each video, so you don’t forget anything (such as disabling your phrase expander software or muting the phone).
My setup. Note the high tech use of rubber bands.
I’m lucky to have a very quiet office, so I don’t have much background noise to contend with.
Using Camtasia I can easily add intros and outros, edit out stumbles and add various effects, such as mouse position highlighting and movement smoothing. I just File>Save As the previous project so that I don’t have to re-add the intro and outro. Unsurprisingly, Camtasia has lots of explainer videos. I wish there was a way to automatically ‘ripple delete’ any sections where there is no audio and no mouse movement (if there is, I haven’t found it). Some people recommend descript.com. It looks interesting, but I haven’t tried it.
I did an A/B test of recordings with my Sennheiser headset mic against my Snowball mic and the consensus was that the headset was OK, but the Snowball mic sound quality was better.
Some people prefer to use synthetic voices, instead of their own voice. While these synthetic voices have improved a lot, they never sound quite right to me. Also it must be time consuming to type out all the text. Or you can pay to have a professional voiceover done, but this is surprisingly expensive (around $100 per minute, last time I checked) and almost certainly more time consuming than doing it yourself.
Some people aren’t confident about speaking on videos because they are not native speakers of that language and have an accent. Personally accents don’t bother me at all. In fact I like hearing English spoken with a foreign accent, as long as I can understand it. Also I think there is an authenticity to hearing a creator talk about their product in their own voice.
I’m not a big fan of music on explainer videos, so I don’t add any.
I let YouTube generate automatic captions for people who want them (which could be people in busy offices and on trains and planes, as well as the hearing impaired). They aren’t perfect, but they are good enough.
My videos are aimed at least as much at finding new users as helping existing users. So I make sure I research keyword terms (mostly in Google AdWords) before I decide which videos to make and what to title them. Currently I am targeting very specific keyword searches, such as ‘how to convert CSV to Markdown’. Easy Data Transform can do a lot more than just format conversion, but from an SEO point of view it is better to target the phrases that people are actually searching for.
I upload the videos as 1080p (1920 x 1080 pixels) to the Easy Data Transform YouTube channel and to my screencast.com account (which I pay a yearly fee for). I then embed the screencast.com videos on relevant easydatatransform.com pages using IFRAME embed codes created by screencast.com. I don’t use the YouTube videos on my website, because I don’t want people to be distracted by YouTube ads and ‘you may also like’ recommendations. They might be showing a competitor! I don’t host the videos on the website itself, as I worry that might slow down the website. I also link to the videos on screencast.com from my help documentation, as appropriate.
Some people like to embed video of themselves in screencasts, in the hope of making it more engaging. But personally I want people to concentrate on my software, rather than being distracted by the horror of my face. And not having to comb my hair or look smart was part of what got me into running my own software business.
In the next few months I will be checking my analytics to see how many views these videos get and whether they increase the time on page and reduce the bounce rate.
If you can spare a few seconds to go to my YouTube page and ‘like’ a video or two, or subscribe, that would be a big help!
Note that some of the above doesn’t apply when you are creating a demo video for your home page, rather than an explainer video. Your main demo video should be slick and polished.
Easy Data Transform and Hyper Plan Professional edition are both on sale for 25% off at Winterfest 2021. There are also some other great products from other small vendors on sale, including Tinderbox, Scrivener and Devonthink. Some of the software is Mac only, but Easy Data Transform and Hyper Plan are available for both Mac and Windows (one license covers both). Sale ends 11th January.
I started selling software online 16 years ago. Until this year I never had a forum for any of my products. I handled customer support for PerfectTablePlan and Hyper Plan by email and kept customers up-to-date with an opt-in email newsletter. But I rethought this position with my latest product, Easy Data Transform and started a forum at forum.easydatatransform.com in December 2020.
My ISP offered various forum software packages, but I really wanted Discourse, as I consider it head and shoulders above all the other forum software I have interacted with as a user (even if I find the badge system a bit patronising). I didn’t want the hassle of setting up and patching a Discourse server, so I created the forum through www.communiteq.com (previously discoursehosting.com). It was surprisingly easy to set up. And it gives the option to export everything, in case I want to part ways with them. The sheer number of options in Discourse is quite daunting, but I stuck with the defaults for the most part.
Some people use Facebook Groups for their product forums. Ugh. You have almost no control of such a forum. Facebook could even be showing ads for your competitors on your forums. Or they could just decide to shut you down and delete all the content. That is before we get on to the fact that Facebook make their money monetising hatred and abusing our privacy at an industrial scale. No thanks.
The advantages of a forum are:
Letting customers talk to each other and post content helps to create a community around the product, which in turn can add a lot of value to your product.
Customers can help each other with support questions. Sometimes they will answer before you are able to or will give a different perspective. Or even give a better answer.
If a customer asks a question that has already been asked, you can send them a link to the appropriate forum page.
It is a quick and easy channel to communicate with customers. I can post a link to a new snapshot release in a few minutes. This is much quicker than sending out an email newsletter. It is also more interactive as customers can respond on the forum and see each other’s responses.
A lively forum is ‘social proof’ that your product is worth buying.
A forum with lots of content should have a large SEO footprint.
The disadvantages of a forum are:
The time to maintain it. A forum that is broken or full of spam and unanswered questions is worse than no forum.
Disgruntled customers potentially airing their grievances in public.
The cost of the forum.
An empty forum looks bad.
Bad actors can be a pain. For example, people posting links to spam or competing products.
It probably only takes me 1-2 hours per week to post on the forum at present. Some of that is time I would have spent answering support emails. If that rises substantially then I may have to delegate it.
I try very hard to provide a good product, with good support and haven’t had any issues with negativity, so far. But I know from my experiences moderating Joel Spolsky’s Business of Software forum that moderating a busy forum can be tricky, time-consuming and emotionally draining.
The cost of the forum is currently around $20 per month, so pretty low. That may climb, but hopefully sales will be climbing as well.
I was a bit worried about whether the forum was going to look empty. To manage expectations, I warned customers that the forum was an experiment and would be closed if there wasn’t enough activity. I also created a ‘sock puppet’ account and ‘seeded’ the forum with a few support questions that I had previously been asked by email (with the permission of those that asked) and then posted answers. But I only did this a handful of times before the forum started to take off.
I have heard stories of people getting 1000+ spam posts a day on their forum. But I haven’t had any issues with bad actors, so far. I’m not sure how much of that is down to Discourse and how much of it is down to luck. But, no doubt issues will occur at some point.
I still have my product newsletter, which I send out every few weeks when there is a new production release.
Overall I am pretty happy with how the forum is going. Should you have a forum for your product? As always, it depends. I think you should consider it if:
Your customer base isn’t tiny.
You want to interact with your customers and get feedback. This might be less the case with mature products.
You have the time and energy to police and maintain it.
Your product is relatively open ended or complex. For example, if your product just checks whether websites are up or down, there is probably a very limited amount you can discuss.
The colours used in Easy Data Transform make no difference to the output. But colours are an important part of a user interface, especially when you are using a tool for significant amounts of time. First impressions of the user interface are also important from a commercial point of view.
But colour is a very personal thing. Some people are colour-blind. Some people prefer light palettes and others dark palettes. Some people like lots of contrast and others don’t. So I am going to allow the user to fully customize the Center pane colours in Easy Data Transform.
I also want to include some standard colour schemes, to get people started. Looking around at other software, it seems that the ‘modern’ trend is for pastel colours, invisible borders and subtle shadows. This looks lovely, but it is a bit low contrast for my tired old eyes. So I have tried to create a range of designs in the hope that everyone will like at least one. Below are the standard schemes I have come up with so far. They all stick with the convention pink=input, blue=transform, green=output.
Which is your favourite? (Click the images to enlarge.)
Is there a tool that you use day to day that has a particularly nice colour scheme?
I hope to also add an optional dark theme for the rest of the UI in due course (Qt allowing).
Microsoft Clarity is a new service that allows you to see, in detail, how visitors are interacting with your website. It includes:
heat maps, showing where visitors are clicking or touching or how far they are scrolling
recordings of visitor sessions, including mouse movement, clicks, touches and scrolling
You just need to get a Javascript snippet from clarity.microsoft.com (for which you need a Microsoft login) and paste it into the header of each page. You can then login to clarity.microsoft.com at a later time to see your results. The service is free.
I tried it on my www.easydatatransform.com website. Here you can see a click heatmap for the buy page:
People are clicking all over non-hyperlinked text. Hmm. Perhaps they somehow couldn’t see the effing enormous blue button next to it? Notice that numbers are starred out to avoid capturing personal information, such as credit card numbers.
You can also see how far visitors scroll down the page with scroll heatmaps:
So I can see that the buy button is appearing well above the fold.
You can also watch recordings of people interacting with the website, showing their mouse movements, clicks, touches and scrolling. This is where things start to feel a bit stalkerish. You don’t get any identifiable information on the visitor beyond their country, browser and operating system, and I’m OK with people watching me interact with their websites like this. But it still feels a bit voyeuristic. The results are also a bit strange. Some people just click all over the place and highlight random text (touches are tracked separately from clicks). There is a distinct danger that you could watch hours of sessions and come away without much actionable information.
You can filter the information in various ways, including by country or referring website. You can even filter to see sessions with ‘Rage clicks’ (where the user has clicked or tapped repeatedly in the same area).
Watching a few sessions with ‘rage clicks’ I could see that some people indeed seem completely unable to see the effing enormous blue ‘buy now’ button on the buy page. So I have also added a text hyperlink where most people are clicking in the text and will probably try changing the button colour. Perhaps to shocking pink!
Running Pingdom Website Speed Test on the Easy Data Transform home page, both with and without Clarity a few minutes apart, I can see that it had some effect on speed, but not too much.
Without Clarity script: 175.2 KB of scripts, 7 script requests.
With Clarity script: 194.3 KB of scripts, 9 script requests.
The load time was actually 0.1s faster with Clarity. That is probably just an anomaly.
I have disabled Clarity for now. But may reenable it after I have made some changes to the website, to see the effect of the changes. Overall I was quite impressed with the service and it was surprisingly easy to set-up. But the cynic in me does wonder what exactly Microsoft is getting out of it.