Founder & CEO
Anytime you move data from one system to another or from one format to another, you run the risk that something will be lost or modified in the translation. This situation is a classic case of leaky abstractions. If the data is delivered as .csv files and you open them in Excel, Excel drops the leading zeros or may convert text to numbers, adding decimals. Oracle may convert NULL dates to January 1, 1900. Exporting to .csv files may result in files that are not formatted properly (particularly if you have commas, double quotes or line breaks in the data). You may lose text formatting. Your ETL tool may limit large text fields to 4,000 characters. The risk of this happening may be low, but with each layer, the risk adds up.
Ask your client (or IT group) to deliver the data as close to the original format as possible. If the source system is MS SQL Server, you want an MS SQL Server backup or access to the server directly. If it's an Oracle database, then you want the data in Oracle, and so on. If your client can't do this, explain the risks involved.
If you must move the data out of the original environment, automate them as part of the overall process and run all your code with the data starting at the originating source environment. In this way, when the project undergoes QA and UAT, you are testing that these translations did not affect overall data quality negatively.
Leaky abstractions are another reason I recommend you fix code not data and that your migrations be fully automated. (All of our data migrations here at CapGemini Invent are push-button).
This article is adapted from my book: Developing Data Migrations and Integrations with Salesforce.