I have been gradually improving my data wrangling tool, Easy Data Transform, putting out 70 public releases since 2019. While the product’s emphasis is on ease of use, rather than pure performance, I have been trying to make it fast as well, so it can cope with the multi-million row datasets customers like to throw at it. To see how I was doing, I did a simple benchmark of the most recent version of Easy Data Transform (v1.37.0) against several other desktop data wrangling tools. The benchmark did a read, sort, join and write of a 1 million row CSV file. I did the benchmarking on my Windows development PC and my Mac M1 laptop.
Here is an overview of the results:
Time by task (seconds), on Windows without Power Query (smaller is better):
I have left Excel Power Query off this graph, as it is so slow you can hardly see the other bars when it is included!
Time by task (seconds) on Mac (smaller is better):
Memory usage (MB), Windows vs Mac (smaller is better):
So Easy Data Transform is nearly as fast as its nearest competitor, Knime, on Windows and a fair bit faster on an M1 Mac. It also uses a lot less memory than Knime. However, we have got some way to go to catch up with the Pandas library for Python and the data.table package for R when it comes to raw performance. Hopefully I can get nearer to their performance in time. I was forbidden from including benchmarks for Tableau Prep and Alteryx by their licensing terms, which seems unnecessarily restrictive.
Looking at just the Easy Data Transform results, it is interesting to notice that a newish MacBook Air M1 laptop is significantly faster than an AMD Ryzen 7 desktop PC from a few years ago.
See the full comparison:
Comparison of data wrangling/ETL tools: R, Pandas, Knime, Power Query, Tableau Prep, Alteryx and Easy Data Transform, with benchmarks
Got some data to clean, merge, reshape or analyze? Why not download a free trial of Easy Data Transform? No sign up required.
“Macbook Air M1 laptop is significantly faster than a desktop AMD Ryzen 7 desktop PC from a few years ago.” – 1) Can I run exactly the same benchmark on my PC, to see and share the results? Can you share that 1 million row CSV? 2) Is EDT a single-core or multi-core app? If single, maybe my cheap 10th-gen i3 can beat the Mac. I’m curious.
Easy Data Transform processing currently only uses a single thread.
The details and data to replicate the test are at https://www.easydatatransform.com/data_wrangling_etl_tools.html .
OK, here are my results on a cheap (under 100 EUR) CPU and SATA SSD: https://pastebin.com/t5zkXFCa
But why do the partial results not sum to “4 item(s) processed in xxxx second(s)”?
Should I look at “4 item(s) processed in xxxx second(s)” or sum the steps:
=14.11 total but “4 item(s) processed in 18.279 second(s)”
This is clearly visible between the 1st and 2nd runs: the totals differ by almost 2.5 sec, but the per-step differences are small: ~0.1 sec + 0.1 sec + 0.3 sec + 0 sec.
>14.11 total but “4 item(s) processed in 18.279 second(s)”
This is because various housekeeping happens besides the transforms themselves (e.g. updating the GUI).
I summed the 4 individual times. So your benchmark time is 14.11s. This is similar to my Windows time (12.71s) but quite a bit slower than my M1 MacBook Air (8.31s).
>This is clearly visible between 1st and 2nd run, difference almost 2.5sec total,
This is probably the time taken to free memory in the previous run.
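To illustrate why the per-step times add up to less than the reported total (14.11s against 18.279s, a gap of about 4.2s), here is a minimal Python sketch. It is not how Easy Data Transform works internally: the column names, the `run_benchmark` function and the untimed "housekeeping" between steps are all assumptions made for the example. It times a read, sort, join and write of a small synthetic CSV individually, and also measures the overall wall-clock time.

```python
import csv
import io
import time


def make_csv(n_rows):
    """Build a small synthetic CSV in memory (hypothetical columns)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "value"])
    for i in range(n_rows):
        writer.writerow([i, (i * 7919) % n_rows])
    return buf.getvalue()


def run_benchmark(n_rows=10_000):
    """Time read, sort, join and write separately, plus the whole run."""
    text = make_csv(n_rows)
    steps = {}
    wall_start = time.perf_counter()

    # Step 1: read the CSV into a list of dicts.
    t = time.perf_counter()
    rows = list(csv.DictReader(io.StringIO(text)))
    steps["read"] = time.perf_counter() - t

    # Step 2: sort by the numeric value column.
    t = time.perf_counter()
    rows.sort(key=lambda r: int(r["value"]))
    steps["sort"] = time.perf_counter() - t

    # Untimed housekeeping (stands in for GUI updates, freeing memory, etc.).
    # It counts toward the wall-clock total but toward no individual step.
    lookup = {str(i): f"label{i}" for i in range(n_rows)}

    # Step 3: join each row to the lookup table on the id column.
    t = time.perf_counter()
    joined = [{**r, "label": lookup[r["id"]]} for r in rows]
    steps["join"] = time.perf_counter() - t

    # Step 4: write the joined rows back out as CSV.
    t = time.perf_counter()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "value", "label"])
    writer.writeheader()
    writer.writerows(joined)
    steps["write"] = time.perf_counter() - t

    total = time.perf_counter() - wall_start
    return {"steps": steps, "total": total, "rows_written": len(joined)}


if __name__ == "__main__":
    result = run_benchmark()
    step_sum = sum(result["steps"].values())
    print(f"sum of steps: {step_sum:.4f}s, wall-clock total: {result['total']:.4f}s")
```

The wall-clock total includes everything between the first and last step, so it is always at least the sum of the step times. For comparing tools, the sum of the steps is the fairer number.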