Easy Data Transform progress

I have been gradually improving my data wrangling tool, Easy Data Transform, putting out 70 public releases since 2019. While the product’s emphasis is on ease of use, rather than pure performance, I have been trying to make it fast as well, so it can cope with the multi-million row datasets customers like to throw at it. To see how I was doing, I did a simple benchmark of the most recent version of Easy Data Transform (v1.37.0) against several other desktop data wrangling tools. The benchmark did a read, sort, join and write of a 1 million row CSV file. I did the benchmarking on my Windows development PC and my Mac M1 laptop.
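For readers who want to time the same four steps in a script, here is a minimal sketch using only the Python standard library. It is not the benchmark used for the results below: the column names (`id`, `value`), the helper names and the choice of an inner join on `id` are all assumptions made for illustration.

```python
import csv
import random
import time


def make_csv(path, n_rows, seed=0):
    """Write a hypothetical test file: shuffled integer ids plus a random value."""
    rng = random.Random(seed)
    ids = list(range(n_rows))
    rng.shuffle(ids)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        for i in ids:
            writer.writerow([i, rng.randint(0, 1_000_000)])


def timed(label, func, timings):
    """Run func(), record its wall-clock duration under label, return its result."""
    start = time.perf_counter()
    result = func()
    timings[label] = time.perf_counter() - start
    return result


def run_benchmark(left_path, right_path, out_path):
    """Read two CSV files, sort the first by id, inner-join on id, write the result."""
    timings = {}

    def read():
        with open(left_path, newline="") as f:
            left_rows = list(csv.DictReader(f))
        with open(right_path, newline="") as f:
            right_rows = list(csv.DictReader(f))
        return left_rows, right_rows

    left_rows, right_rows = timed("read", read, timings)

    # Sort numerically; DictReader returns every field as a string.
    sorted_rows = timed(
        "sort", lambda: sorted(left_rows, key=lambda r: int(r["id"])), timings
    )

    def join():
        # Hash join: build a lookup on the right table, then probe it per left row.
        lookup = {r["id"]: r["value"] for r in right_rows}
        return [
            {"id": r["id"], "value": r["value"], "value_right": lookup[r["id"]]}
            for r in sorted_rows
            if r["id"] in lookup
        ]

    joined = timed("join", join, timings)

    def write():
        with open(out_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["id", "value", "value_right"])
            writer.writeheader()
            writer.writerows(joined)

    timed("write", write, timings)
    return joined, timings
```

Creating two files with `make_csv(path, 1_000_000)` (using different seeds) and passing them to `run_benchmark` returns the joined rows and a dictionary of per-step times in seconds. Plain Python lists of dictionaries are far slower and hungrier for memory than a dedicated tool, so the absolute numbers are not comparable with the charts below.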

Easy Data Transform screenshot

Here is an overview of the results:

Time by task (seconds), on Windows without Power Query (smaller is better):

data wrangling/ETL benchmark Windows

I have left Excel Power Query off this graph, as it is so slow you can hardly see the other bars when it is included!

Time by task (seconds) on Mac (smaller is better):

data wrangling/ETL benchmark M1 Mac

Memory usage (MB), Windows vs Mac (smaller is better):

data wrangling/ETL benchmark memory Windows vs Mac

So Easy Data Transform is nearly as fast as its nearest competitor, Knime, on Windows and a fair bit faster on an M1 Mac. It also uses a lot less memory than Knime. However, we have some way to go to catch up with the Pandas library for Python and the data.table package for R when it comes to raw performance. Hopefully I can get nearer to their performance in time. I was forbidden from including benchmarks for Tableau Prep and Alteryx by their licensing terms, which seems unnecessarily restrictive.

Looking at just the Easy Data Transform results, it is interesting to notice that a newish MacBook Air M1 laptop is significantly faster than an AMD Ryzen 7 desktop PC from a few years ago.

Windows vs Mac M1 benchmark

See the full comparison:

Comparison of data wrangling/ETL tools : R, Pandas, Knime, Power Query, Tableau Prep, Alteryx and Easy Data Transform, with benchmarks

Got some data to clean, merge, reshape or analyze? Why not download a free trial of Easy Data Transform? No sign up required.

4 thoughts on “Easy Data Transform progress”

  1. RP

    “Macbook Air M1 laptop is significantly faster than a desktop AMD Ryzen 7 desktop PC from a few years ago.” – 1) Can I run exactly the same benchmark on my PC, to see and share the results? Can you share that 1 million row CSV? 2) Is EDT a single-core or multi-core app? If single, maybe my cheap 10th-gen i3 can beat the Mac. I’m curious.

  2. RP

    OK, here are my results on a cheap (under 100 EUR) CPU and SATA SSD: https://pastebin.com/t5zkXFCa
    But why do the partial results not sum to “4 item(s) processed in xxxx second(s)”?
    Should I look at “4 item(s) processed in xxxx second(s)” or sum the steps:

    3.524
    1.344
    7.491
    1.751
    =14.11 total but “4 item(s) processed in 18.279 second(s)”
    ?
    This is clearly visible between 1st and 2nd run, difference almost 2.5sec total, but the differences in the individual steps are small: ~0.1sec + 0.1sec + 0.3sec + 0sec.

    1. Andy Brice Post author

      >14.11 total but “4 item(s) processed in 18.279 second(s)”

      This is because there is various housekeeping besides doing the transforms (e.g. updating the GUI).

      I summed the 4 individual times. So your benchmark time is 14.11s. This is similar to my Windows time (12.71s) but quite a bit slower than my M1 MacBook Air (8.31s).

      >This is clearly visible between 1st and 2nd run, difference almost 2.5sec total,

      This is probably the time taken to free memory in the previous run.
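The arithmetic in this thread can be checked with a few lines of Python. This is only a sketch using the figures quoted in the comments above; the variable names are made up for illustration, and the gap between the two totals is simply whatever time was not spent in the four transforms.

```python
# Per-step times (seconds) quoted in the comment above.
step_times = [3.524, 1.344, 7.491, 1.751]
# Total reported as "4 item(s) processed in 18.279 second(s)".
reported_total = 18.279

transform_time = round(sum(step_times), 3)            # 14.11
overhead = round(reported_total - transform_time, 3)  # 4.169

# Summed transform times quoted in the author's reply (seconds).
windows_time = 12.71
m1_time = 8.31
windows_vs_m1 = windows_time / m1_time  # how many times longer Windows took

print(f"transforms: {transform_time}s, other time: {overhead}s")
print(f"Windows / M1 time ratio: {windows_vs_m1:.2f}")
```

So roughly 4.2 of the 18.3 seconds fall outside the four transforms, and the Windows run took about one and a half times as long as the M1 run.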

