My main development PC is now 5 years old and the end of life for Windows 10 is looming. I might upgrade it to Windows 11 (there is apparently a BIOS hack if your chip doesn’t have the required TPM module), but it is quite crufty from 5 years of daily use. And it recently crashed and spent 20 minutes doing a spontaneous Windows repair, which is not very confidence inspiring. Plus the thought of a faster processor, more memory and a bigger SSD is always alluring. Time for a spanking new Windows 11 PC.
I ordered a PC to my own spec from pcspecialist.co.uk using their online configurator. I have used them a few times previously and have been suitably impressed with the service.
The new PC has:
Windows 11 Home
AMD Ryzen 9 9900X 12-Core processor
NVIDIA RTX 3050 graphics card
64 GB DDR5 RAM
4 TB Samsung PRO M.2 SSD
2 x Seagate Barracuda 4TB HDDs
Corsair Gold Ultra Quiet 650W power supply
Reliability is the key issue for me, so I won’t be messing around with overclocking or other tweaks.
I’ve had a power supply blow up and take out the motherboard before, so I went for a branded power supply.
I didn’t see any real need for Windows 11 Pro.
I wanted a quiet case that would sit under my desk, rather than the bling LED disco cases offered by PC Specialist. Or, even worse, a white case (god no). So I ordered a black mid size Fractal Design Define 7 case and had it delivered to them for the build.
The finished PC turned up after a couple of weeks. They did a nice job, with some neat cabling.
Now it is just the tedious job of setting it all up. Windows offered to copy across the settings from my previous machine, but I wanted a cruft-free, clean install. So I manually installed everything from scratch:
Where possible I tried to download software direct from the manufacturer’s website. In a few cases, where I didn’t want to pay to upgrade and the old version wasn’t available, I used old downloads that I had kept.
I needed Microsoft Visual Studio 2019, rather than 2022, due to compatibility issues with Qt. The 2019 version is not easy to find online, but is currently still available.
The Easy Data Transform compile time has gone from 51 seconds on the old PC to 26 seconds on the new PC.
An Easy Data Transform benchmark that inputs, joins, sorts and outputs a million row dataset has gone from 14.3 seconds on the old PC to 10.3 seconds on the new PC.
So a significant speed improvement.
Currently I have 3 PCs and 1 Mac, 3 monitors, 4 mice and 4 keyboards. It is a mess. I have tried a physical KVM switch in the past, but it felt very clunky. Following a tipoff from a friend, I am going to investigate www.sharemouse.com as a way to make this more manageable. Do you have a good way to manage multiple monitors, mice and keyboards? Please let me know in the comments.
Anyone who has a blog will be used to endless emails along the line of:
“Hey, I love your blog. I particularly love what you said about <last blog post title>. Please can I post some irrelevant and worthless garbage on it? All I ask in return for my auto-generated drivel, is some backlinks to a mafia-run gambling website.”
No. No. NO.
Who knows how much time I have spent over the last 19 years deleting crap like this.
But this email, which turned up today, stood out for the particularly low effort that went into it.
I wonder how many people this was emailed to? Hundreds? Thousands? Hundreds of thousands? What a waste of people’s attention and time. The most precious thing we have.
I see a future where more and more of people’s attention is diverted into dealing with low-effort, auto-generated garbage like this. An arms race where the scumbags have all the advantages.
Slow handclap for ‘Giovanni’. Your parents must be very proud.
I wanted to be able to read and write Excel files on Windows and Mac from my C++/Qt application, whether Excel is installed or not. I would rather commit suicide with a cheese grater than try to write my own code to parse whatever horrific format Excel is written in. So I looked around for a library.
I ended up buying a licence for LibXL, from XLWare, back in 2019. It has been working great ever since. I now also use it in PerfectTablePlan v7.
Things to like:
Available as a library for Windows, Mac, Linux and iOS (I have only used it for Windows and Mac, so far).
Accessible from lots of languages, including: C, C++, .Net, Delphi, PHP, Python, PowerBASIC and Fortran.
Example code is available in C++, C, C# and Delphi.
Good support.
Regular updates.
Reasonable pricing.
No per-user fees.
The API is a little low-level for my taste, but I guess that is inevitable when you support C as well as C++. Reading and writing is slow compared to reading and writing the same data to/from a CSV file. But, no doubt, that is due to the limitations of the Excel file format.
I don’t have any affiliation with LibXL beyond being a paying customer, and I haven’t been asked to write this. I just wanted to give a shout-out to the developer, Dmytro, for his sterling work. Long may it continue.
I released Easy Data Transform v2 today. After no fewer than 80 (!) v1 production releases since 2019, this is the first paid upgrade.
Major improvements include:
Schema versioning, so you can automatically handle changes to the column structure of an input (e.g. additional or missing columns).
A new Verify transform so you can check a dataset has the expected values.
Currently there are 48 different verification checks you can make:
At least 1 non-empty value
Contains
Don’t allow listed values
Ends with
Integer except listed special value(s)
Is local file
Is local folder
Is lower case
Is sentence case
Is title case
Is upper case
Is valid EAN13
Is valid email
Is valid telephone number
Is valid UPC-A
Match column name
Matches regular expression
Maximum characters
Maximum number of columns
Maximum number of rows
Maximum value
Minimum characters
Minimum number of columns
Minimum number of rows
Minimum value
No blank values
No carriage returns
No currency
No digits
No double spaces
No duplicate column names
No duplicate values
No empty rows
No empty values
No gaps in values
No leading or trailing whitespace
No line feeds
No non-ASCII
No non-printable
No punctuation
No symbols
No Tab characters
No whitespace
Numeric except listed special value(s)
Only allow listed values
Require listed values
Starts with
Valid date in format
You can see any fails visually, with colour coding by severity:
Side-by-side comparison of dataset headers:
Side-by-side comparison of dataset data values:
Lots of extra matching options for the Lookup transform:
Allowing you to do exotic lookups such as:
Plus lots of other changes.
In v1 there were issues related to how column-related changes cascaded through the system. This was the hardest thing to get right, and it took a fairly big redesign to fix all the issues. As a bonus, you can now disconnect and reconnect nodes, and it remembers all the column-based options (within certain limits). These changes make Easy Data Transform feel much more robust to use, as you can now make lots of changes without worrying too much about breaking things further downstream.
Easy Data Transform now supports:
9 input formats (including various CSV variants, Excel, XML and JSON)
66 different data transforms (such as Join, Filter, Pivot, Sample and Lookup)
11 output formats (including various CSV variants, Excel, XML and JSON)
56 text encodings
This allows you to snap together a sequence of nodes like Lego, to very quickly transform or analyse your data. Unlike a code-based approach (such as R or Python) or a command line tool, it is extremely visual, with pretty much instant feedback every time you make a change. Plus, no pesky syntax to remember.
Eating my own dogfood, using Easy Data Transform to create an email marketing campaign from various disparate data sources (mailing lists, licence key databases etc).
Easy Data Transform is all written in C++ with memory compression and reference counting, so it is fast and memory efficient and can handle multi-million row datasets with no problem.
While many of my competitors are transitioning to the web, Easy Data Transform remains a local tool for Windows and Mac. This has several major advantages:
Your sensitive data stays on your computer.
Less latency.
I don’t have to pay your compute and bandwidth costs, which means I can charge an affordable one-time fee for a perpetual licence.
I think privacy is only going to become ever more of a concern as rampaging AIs try to scrape every single piece of data they can find.
Usage-based fees for online data tools are no small matter. For a range of usage fee horror stories, such as enabling debug logging in a large production ETL pipeline resulting in $100k of extra costs in a week, see this Reddit post. Some of my customers have processed more than a billion rows in Easy Data Transform. Not bad for $99!
It has been a lot of hard work, but I am pleased with how far Easy Data Transform has come. I think Easy Data Transform is now a comprehensive, fast and robust tool for file-based data wrangling. If you have some data to wrangle, give it a try! It is only $99+tax ($40+tax if you are upgrading from v1) and there is a fully functional, 7-day free trial here:
I am very grateful to my customers, who have been a big help in providing feedback. This has improved the product no end. Many heads are better than one!
The next big step is going to be adding the ability to talk directly to databases, REST APIs and other data sources. I also hope at some point to add the ability to visualize data using graphs and charts. Watch this space!
I like hot sauces. My favourite is Ring of fire, a hot sauce that I first encountered in the US. It is a mix of habanero and serrano chillis, tomatoes, vinegar and spices. It is very tasty and (despite the name) not super hot.
However, it has become increasingly difficult and expensive to get ‘Ring of fire’ in the UK. And most of the hot sauces available in UK shops are lacking in either flavour or heat, or often both. So I decided to have a go at creating my own hot sauce. It is surprisingly easy. The basic process is:
Lacto-ferment chillis with your choice of veg and/or fruit in a brine solution for a couple of weeks at room temperature.
Chuck away most of the brine.
Add vinegar.
Blend it.
Simmer it in a pan to thicken.
Bottle.
Lacto-fermentation in brine enhances the flavour and should kill off any bad bacteria. Typically, you want about 2-3% salt to water. Ideally you should also have an airlock to vent any gases created in the fermentation. If you don’t vent the gases, there might be an explosion! You can buy fermentation jars with airlocks or just buy the airlock and drill a hole into an existing jar lid.
Simmering is optional. But I found that it improved the taste and consistency. It also kills off the fermentation. And you don’t really want it fermenting once it is bottled, as this could get messy when you open the lid.
So far I have tried 3 main recipes:
Lacto-fermented green chillis and onions, mixed with tinned tomato. This was really nice. Comparable to the ‘Ring of fire’ I was trying to emulate.
Lacto-fermented scotch bonnet chillis and pineapple, mixed with tinned mango. This was amazing. Even better than ‘Ring of fire’ in my modest opinion. Beginner’s luck, perhaps.
Lacto-fermented scotch bonnet chillis and banana. A hot, pink, textureless sludge. Ghastly. Went straight in the bin.
Here is my scotch bonnet, pineapple and mango sauce (rocket themed, of course).
Making the sauce yourself also means that you can tweak the flavour, heat and acidity to your own preferences.
A word of caution. If you are careless, you could end up with botulism, one of the deadliest toxins known to man! So make sure the fermentation vessel is airtight and everything is clean. A white yeast forming on the surface is probably ok. Anything furry forming is definitely not ok, and you need to throw it all away and start again. Note that garlic cloves can turn blue or green during lacto-fermentation and this is ok (I only found this out after I had thrown away a batch).
If your sauce has a sufficiently high salt and/or acid level it shouldn’t grow anything nasty. But you should keep it in the fridge and use it within a few weeks, to be on the safe side.
So far I have used shop-bought chillis. But I am now growing my own chillis as well.
There are plenty of good videos on YouTube about making hot sauces and growing chillis. I recommend ChilliChump (recommended to me by my friend John Moodie).
Nobody likes getting an email message telling them that the end result of all their hard work is a piece of garbage (or worse). It is a bit of a shock when it happens the first time. One negative piece of feedback can easily offset 10 positive ones. But, hurt feelings aside, it may not be all bad.
For a start, that person actually cared enough about your product to take the time to contact you. That is not something to be taken lightly. A large number of products fail because they solve a problem that no-one cares about. Apathy is very hard to iterate on. At least you are getting some feedback. Assuming the comments aren’t completely toxic, it might be worth replying. Sometimes you can turn someone who really hates your software into a fan. Like one of those romantic comedies where an odd couple who really dislike each other end up falling in love. Indifference is much harder to work with. The people who don’t care about your product enough to communicate with you are the dark matter of business. Non-interacting. Mysterious. Unknowable.
Negative emails may also contain a kernel of useful information, if you can look past their sometimes less-than-diplomatic phrasing. I remember having the user interface of an early version of PerfectTablePlan torn apart in a forum. Once I put my wounded pride to one side, I could see they had a point, and I ended up designing a much better user interface.
In some cases the person contacting you might just be having a bad day. Their car broke down. They are going through a messy divorce. The boss shouted at them. Your product just happened to be the nearest cat they could kick. Don’t take it personally. You need a thick skin if you are to survive in business.
But sometimes there is a fundamental clash between how someone sees the world vs the model of the world embodied in your product. I once got so angry with Microsoft Project, due to this sort of clash of weltanschauung, that I came close to throwing the computer out of a window. So I understand how frustrating this can be. In this case, it is just the wrong product for them. If they have bought a licence, refund them and move on.
While polarisation is bad for society, it can be good for a product. Consider a simple thought experiment. A large number of products are competing for sales in a market. Bland Co’s product is competent but unexciting. It is in everyone’s top 10, but no-one’s first choice. Exciting Co’s product is more polarising, last choice for many, but first choice for some. Which would you rather be? Exciting Co, surely? No-one buys their second choice. Better to be selling Marmite than one of ten types of nearly identical peanut butter. So don’t be too worried about doing things that polarise opinion. For example, I think it is amusing to use a skull and crossbones icon in my seating software to show that 2 people shouldn’t be sat together. Some people have told me that they really like this. Others have told me it is ‘unprofessional’. I’m not going to change it.
Obviously we would like everyone to love our products as much as we do. But that just isn’t going to happen. You can’t please all of the people, all of the time. And, if you try, you’ll probably end up pleasing no-one. Some of the people, most of the time is probably the best you can hope for.
I’m sure we are all familiar with the idea of a technological singularity. Humans create an AI that is smart enough to create an even smarter successor. That successor then creates an even smarter successor. The process accelerates through a positive feedback loop, until we reach a technological singularity, where puny human intelligence is quickly left far behind.
Some people seem to think that Large Language Models could be the start of this process. We train the LLMs on vast corpuses of human knowledge. The LLMs then help humans create new knowledge, which is then used to train the next generation of LLMs. Singularity, here we come!
But I don’t think so. Human nature being what it is, LLMs are inevitably going to be used to churn out vast amounts of low quality ‘content’ for SEO and other commercial purposes. LLM nature being what it is, a lot of this content is going to be hallucinated. In other words, bullshit. Given that LLMs can generate content vastly faster than humans can, we could quickly end up with an Internet that is mostly bullshit. Which will then be used to train the next generation of LLMs. We will eventually reach a bullshit singularity, where it is almost impossible to work out whether anything on the Internet is true. Enshittification at scale. Well done us.
Visual programming tools (also called ‘no-code’ or ‘low-code’) have been getting a lot of press recently. This, in turn, has generated a lot of discussion about whether visual or text based programming (coding) is ‘best’. As someone who uses text programming (C++) to create a visual programming data wrangling tool (Easy Data Transform) I have some skin in this game and have thought about it quite a bit.
At some level, everything is visual. Text is still visual (glyphs). By visual programming here I specifically mean software that allows you to program using nodes (boxes) and edges (arrows), laid out on a virtual canvas using drag and drop.
A famous example of this sort of drag and drop visual programming is Yahoo Pipes:
But there are many others, including my own Easy Data Transform:
Note that I’m not talking about Excel, Scratch or drag and drop GUI designers. Although some of the discussion might apply to them.
By text programming, I mean mainstream programming languages such as Python, Javascript or C++, and associated tools. Here is the Qt Creator Integrated Development Environment (IDE) that I use to write C++ in, to create Easy Data Transform:
The advantages of visual programming are:
Intuitive. Humans are very visual creatures. A lot of our brain is given over to visual processing and our visual processing bandwidth is high. Look at pretty much any whiteboard, at any company, and there is a good chance you will see boxes and arrows. Even in non-techie companies.
Quicker to get started. Drag and drop tools can allow you to start solving problems in minutes.
Higher level abstractions. Which means you can work faster (assuming they are the right abstractions).
Less hidden state. The connections between nodes are shown on screen, rather than you having to build an internal model in your own memory.
Less configuration. The system components work together without modification.
No syntax to remember. Which means it is less arcane for people who aren’t experienced programmers.
Fewer run-time errors, because the system generally won’t let you do anything invalid. You don’t have to worry about getting function names or parameter ordering and types right.
Immediate feedback on every action. No need to compile and run.
The advantages of text programming are:
Denser representation of information.
Greater flexibility. Easier to do things like looping and recursion.
Better tooling. There is a vast ecosystem of tools for manipulating text, such as editors and version control systems.
Less lock-in. You can generally move your C++ or Python code from one IDE to another without much problem.
More opportunities for optimization. Because you have lower-level access there is more scope to optimize speed and/or memory as required.
The advantages and disadvantages of each are two sides of the same coin. A higher level of abstraction makes things simpler, but also reduces the expressiveness and flexibility. The explicit showing of connections can make things clearer, but can also increase on-screen clutter.
The typical complaints you hear online about visual programming systems are:
It makes 95% of things easy and 5% of things impossible
Visual programming systems are not as flexible. However many visual programming systems will let you drop down into text programming, when required, to implement that additional 5%.
Jokes aside, I think this hybrid approach does a lot to combine the strengths of both approaches.
It doesn’t scale to complex systems
Managing complex systems has been much improved over the years in text programming, using techniques such as hierarchy and encapsulation. But there is no reason these same techniques can’t also be applied to visual programming.
It isn’t high enough performance
The creators of a visual programming system are making a lot of design decisions for you. If you need to tune a system for high performance on a particular problem, then you probably need the low level control that text based programming allows. But with most problems you probably don’t care if it takes a few extra seconds to run, if you can do the programming in a fraction of the time. Also, a lot of visual programming systems are pretty fast. Easy Data Transform can join 2 one million row datasets on a laptop in ~5 seconds, which is faster than base R.
I’m sure we’ve all seen examples of spaghetti diagrams. But you can also create horrible spaghetti code with text programming. Also, being able to immediately see that a visual program has been sloppily constructed might serve as a useful cue.
If you are careful to layout your nodes, you can keep things manageable (ravioli, rather than spaghetti). But it starts to become tricky when you have 50+ nodes with a moderate to high degree of connectivity, especially if there is no support for hierarchy (nodes within nodes).
Automatic layout of graphs for easier comprehension (e.g. to minimize line crossings) is hard (NP-complete, in the same class of problems as the ‘travelling salesman’).
No support for versioning
It is possible to version visual programming tools if they store the information in a text based file (e.g. XML). Trying to diff raw XML isn’t ideal, but some visual programming tools do have built-in diff and merge tools.
It isn’t searchable
There is no reason why visual programming tools should not be searchable.
Too much mousing
Professional programmers love their keyboard shortcuts. But there is no reason why visual programming tools can’t also make good use of keyboard shortcuts.
Vendor lock-in
Many visual programming tools are proprietary, which means the cost can be high for switching from one to another. So, if you are going to invest time and/or money heavily in a visual programming tool, take time to make a good choice and consider how you could move away from it if you need to. If you are doing quick and dirty one-offs to solve a particular problem that you don’t need to solve again, then this doesn’t really matter.
‘No code’ just means ‘someone else’s code’
If you are using Python+Pandas or R instead of Easy Data Transform, then you are also balancing on top of an enormous pile of someone else’s code.
We are experts, we don’t need no stinkin’ drag and drop
If you are an experienced text programmer, then you aren’t really the target market for these tools. Easy Data Transform is aimed at the analyst or business guy trying to wrangle a motley collection of Excel and CSV files, not the professional data scientist who dreams in R or Pandas. However even a professional code jockey might find visual tools faster for some jobs.
Both visual and text programming have their places. Visual programming is excellent for exploratory work and prototyping. Text based programming is almost always a better choice for experts creating production systems where performance is important. When I want to analyse some sales data, I use Easy Data Transform. But when I work on Easy Data Transform itself, I use C++.
Text programming is more mature than visual programming. FORTRAN appeared in the 1950s. Applications with graphical user interfaces only started becoming mainstream in the 1980s. Some of the shortcomings of visual programming reflect its relative lack of maturity, and I think we can expect to see continued improvements in the tooling associated with visual programming.
Visual programming works best in specific domains, such as:
3d graphics and animations
image processing
audio processing
game design
data wrangling
These domains tend to have:
A single, well defined data type. Such as a table of data (dataframe) for data wrangling.
Well defined abstractions. Such as join, to merge 2 tables of data using a common key column.
A relatively straightforward control flow. Typically a step-by-step pipeline, without loops, recursion or complex control flow.
Visual programming has been much less successful when applied to generic programming, where you need lots of different data types, a wide range of abstractions and potentially complex control flow.
I’ve been a professional software developer since 1987. People (mostly in marketing) have talked about replacing code and programmers with point and click tools for much of that time. That is clearly not going to happen. Text programming is the best approach for some kinds of problems and will remain so for the foreseeable future. But domain-specific visual programming can be very powerful and has a much lower barrier to entry. Visual programming empowers people to do things that might be out of their reach with text programming and might never get done if they have to wait for the IT department to do it.
So, unsurprisingly, the answer to ‘which is better?’ is very much ‘it depends’. Both have their place and neither is going away.
Winterfest 2023 is on. Loads of quality software for Mac and Windows from independent vendors, at a discount. This includes my own Easy Data Transform and Hyper Plan, which are on sale with a 25% discount.
This sounds like a question a programmer might ask after one medicinal cigarette too many. The computer science equivalent of “what is the sound of one hand clapping?”. But it is a question I have to decide the answer to.
I am adding indexOf() and lastIndexOf() operations to the Calculate transform of my data wrangling (ETL) software (Easy Data Transform). This will allow users to find the offset of one string inside another, counting from the start or the end of the string. Easy Data Transform is written in C++ and uses the Qt QString class for strings. There are indexOf() and lastIndexOf() methods for QString, so I thought this would be an easy job to wrap that functionality. Maybe 15 minutes to program it, write a test case and document it.
Obviously it wasn’t that easy, otherwise I couldn’t be writing this blog post.
First of all, what is the index of “a” in “abc”? 0, obviously. QString("abc").indexOf("a") returns 0. Duh. Well, only if you are a (non-Fortran) programmer. Ask a non-programmer (such as my wife) and they will say: 1, obviously. It is the first character. Duh. Excel FIND("a", "abc") returns 1.
Ok, most of my customers aren’t programmers. I can use 1-based indexing.
But then things get more tricky.
What is the index of an empty string in “abc”? 1 maybe, using 1-based indexing, or maybe empty is not a valid value to pass.
What is the index of an empty string in an empty string? Hmm. I guess the empty string does contain an empty string, but at what index? 1 maybe, using 1-based indexing, except there isn’t a first position in the string. Again, maybe empty is not a valid value to pass.
I looked at the Qt C++ QString, Javascript string and Excel FIND() function for answers. But they each give different answers and some of them aren’t even internally consistent. This is a simple comparison of the first index or last index of text v1 in text v2 in each (Excel doesn’t have an equivalent of lastIndexOf() that I am aware of):
Changing these to make the all the valid results 1-based and setting invalid results to -1, for easy comparison:
So:
Javascript disagrees with C++ QString and Excel on whether the first index of an empty string in an empty string is valid.
Javascript disagrees with C++ QString on whether the last index of an empty string in a non-empty string is the index of the last character or 1 after the last character.
C++ QString thinks the first index of an empty string in an empty string is the first character, but the last index of an empty string in an empty string is invalid.
It seems surprisingly difficult to come up with something intuitive and consistent! I think I am probably going to return an error message if either or both values are empty. This seems to me to be the only unambiguous and consistent approach.
I could return a 0 for a non-match or when one or both values are empty, but I think it is important to return different results in these 2 different cases. Also, not found and invalid feel qualitatively different to a calculated index to me, so shouldn’t be just another number. What do you think?
*** Update 14-Dec-2023 ***
I’ve been around the houses a bit more following feedback on this blog, the Easy Data Transform forum and Hacker News, and this is what I have decided:
IndexOf() v1 in v2:

v1      | v2          | IndexOf(v1,v2)
(empty) | (empty)     | 1
aba     | (empty)     | (blank)
(empty) | aba         | 1
a       | a           | 1
a       | aba         | 1
x       | y           | (blank)
world   | hello world | 7
This is the same as Excel FIND() and differs from Javascript indexOf() (ignoring the difference in 0 or 1 based indexing) only for "".indexOf("") which returns -1 in Javascript.
LastIndexOf() v1 in v2:

v1      | v2          | LastIndexOf(v1,v2)
(empty) | (empty)     | 1
aba     | (empty)     | (blank)
(empty) | aba         | 4
a       | a           | 1
a       | aba         | 3
x       | y           | (blank)
world   | hello world | 7
This differs from Javascript lastIndexOf() (ignoring the difference in 0 or 1 based indexing) only for "".lastIndexOf("") which returns -1 in Javascript.
Conceptually the index is the 1-based index of the first (IndexOf) or last (LastIndexOf) position where, if v1 is removed from the found position, it would have to be re-inserted in order to revert to v2. Thanks to layer8 on Hacker News for clarifying this.
Javascript and C++ QString return an integer and both use -1 as a placeholder value. But Easy Data Transform is returning a string (that can be interpreted as a number, depending on the transform) so we aren’t bound to using a numeric value. So I have left it blank where there is no valid result.
Now I’ve spent enough time down this rabbit hole and need to get on with something else! If you don’t like it you can always add an If with Calculate or use a Javascript transform to get the result you prefer.
*** Update 15-Dec-2023 ***
Quite a bit of debate on this topic on Hacker News.