Gluon Digital

Offical carrier of the Strongforce

Why I absolutely HATE the Apex Data Loader.

Why I absolutely HATE the Apex Data Loader.
David Masri         Author:
David Masri
Founder & CEO

I really hate the Salesforce's Apex Data Loader, I really do. Not because it's a bad app, it's quite a good one (at least as far as utilities go), the reason I hate it is simply because it exists. Let me explain, but first a bit of background.

I recently authored an article on Salesforce Ben titled: Choosing the Right ETL Tool or Middleware for Your Salesforce Data Migration or Integration , in that article I noted:

"The APEX data loader is almost never the right tool for the job because it doesn't have any data transformation capabilities and is relatively difficult to automate."

In my book,(Developing Data Migrations and Integrations with Salesforce. ), I introduce the data loader as follows:

"The Apex Data Loader is a free utility provided by Salesforce. It's great for testing the SOAP and Bulk REST APIs under various configurations, as well as doing one-off updates and exports. I do not recommend it for a full-blown data migration or integration because of its lack of features as an ETL tool, not because of any limitations in working with the Salesforce APIs."

When discussing Best Practices (Chapter 6 of my book), Best Practice #20 is "Use Proper Tools" here is an excerpt:

Best Practice 20: Use Proper Tools

When choosing your tool set, make sure it meets all your needs from a feature point of view (for example, supports the Merge API call if needed), and can help you achieve all six attributes discussed in Chapters 4 and 5.

I want to stress (again) that the Apex Data Loader is not the right tool for any serious migration or integration work. It has no capability for data transformations, and although you can use it to connect to Java Database Connectivity (JDBC) data sources, doing so is tedious and error prone. In addition, all the success and error data can only be outputed to .csv files. Using .csv files with Excel as a data transformation tool is also not a good idea. Excel does not produce clean .csv files.*

Excel does not support relational joins, and vertical lookups (VLookup) are slow. Excel simply stops responding when dealing with large files. Most importantly, using Excel reinforces bad habits of transforming data manually rather than automating it. I am convinced that the reason this combination of Excel and .csv files is so common for data migrations is because people tend to know of the Apex Data Loader and instinctively want to use it. (It’s free! And data migrations are one-time run code!**) Then, when they want to edit the .csv files, they double-click and it opens in Excel. Don’t fall into this trap.

*I'm not saying you can't code a migration or integration using the Apex Data Loader that meets all the attributes I listed in Chapters 4 and 5. You absolutely can, and I have. You can use the command line to automate it fully, and some scripting language for all transformation code. I'm just saying it's not the right tool for the job. [Again, you can see my article on on choosing the right tool]

[**Data Migrations are NOT one time run, and thinking that it is the biggest mistakes you can make when migrating data to Salesforce.]

So, you can see from my previous statements that I'm not a fan, but why do I hate it? Because it encourages an error prone manual process. It encourages the violation of every one of the "six attributes of a good data migration" I outline in my book. Every single time I have been brought in to clean up a data migration mess it has been an Apex data loader migration. To stress how bad Data Loader data migrations are, I recently diagrammed out what the data migration process using the Apex Data Loader looks like, here it is:

The data migration process using the Apex Data Loader

This is not a joke! That is the actual process the vast majority of data migrations to Salesforce looks like! It's insane! Yet, people wonder why many Salesforce projects fail (or at least struggle to go live) simply because of the data migration process.

But here is the worse part of that process, and why I hate it so much: because it works for small data migrations, and because It works for small migrations –people keep doing it, and begin to scale up, as you scale up it begins to fail, and then it fails catastrophically.

Here are a just few of the problems that arise with this nut-so process:

  • Excel begins to crawl - and crash repeatedly
  • Excel Drops all the leading 0's from your Zip Codes
  • You run in to needed transformations that are too complex, so you tell your Client "We can't do that"
  • It's Incredibly time consuming, you need to take detailed notes on the 45 transformations you need to make on each file, and you wind up with an 8-page migration checklist
  • Every time your client generates the CSV files, they come out different
  • Your client can't deliver clean CSVs (There are quotes and line breaks where there shouldn't be, text that is not qualified, and when it is, quotes are not properly escaped)
  • Every round of QA requires that you manually redo the transformations manually
  • You end up with an 8 page "Migration Run Book" - and somehow every time you run the same migration new defects are found or defects that were previously fixed are back
  • All your rounds of QA\UAT are basically worthless because there is no code to test, you are testing a human manual process.
  • Defects are found months after go-live and there's not much you can do about it. (Many of these defects are defects that where reported before, and they were fixed, but somehow the fix never made it to Salesforce during the last manual run)
  • Minor Defects take hours to fix and often Require a ton of data analysis

I really wish people would exclusively use the APEX data loader for its intended purpose, that is one off data extractions, and simple loads as part of basic administration (Importing leads or campaign members in bulk). But as long as it exists, people will be using it to perform data migrations so complex they are doomed to fail. And that's why I hate it.

Needless to say, some people just never learn and will insist on using it anyway. They will argue with me that just because some people abuse the Data Loader, that's not the Data Loaders' fault! Here is my response from the FAQ section of my book (Chapter 13):

Why can't I use the Data Loader? I love the Apex data loader. Look, you absolutely can build data migrations and even data integrations using the Apex Data Loader while following every single best practice in this book. It's just simply not designed to be used that way and is an error-prone process. But if you insist, I recommend that you:

  • Learn the Data Loader command line and automate it fully using Windows Script (or batch) files
  • Learn how to connect the Data Loader to an ODBC source as opposed to .csv files
  • Use some scripting language to manipulate data, which can also be called from Windows Script files and be fully automated
  • Note that even though you can connect the Data Loader to an ODBC source, the success and error files can only be outputted to a .csv file. Your scripts should be able to pick these up and process them.
  • I want to stress that the difficulties you face in automating the Apex Data Loader, particularly in automating your data transformations, are not an excuse to introduce manual steps.

    I get that the Apex Data Loader is a necessary evil, but I beg you, please stop using the Apex Data Loader for your data migrations.

    This article is adapted from my book: Developing Data Migrations and Integrations with Salesforce.