For each of these cases, it's a mistake to try to fix the data manually. You will get updated data sets and will have to redo this work, and then you will have to re-QA it. Even if your users fix the data directly in the source, there is no guarantee that new bad data won't be created (unless new validation code is put into the source system). Not to mention the high probability of human error during the production migration!
For integrations, fixing the source data and adding validation code to the source system is the best way to go, because if that system owns the data, it should also own the validation. If this can’t be done, then fix it in your transformation code. For migrations, it's probably not worth the effort to modify code that's being retired, so doing this in your transformation code is best.
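To make "fix it in your transformation code" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the source column names (`last_name`, `phone`, `state`), the `STATE_FIXES` lookup, and the choice of US phone formatting are hypothetical, and your own rules will differ. The point is only that each fix lives in code, so it is reapplied automatically every time a new extract arrives.

```python
import re

# Hypothetical lookup of free-text state values seen in the source data.
STATE_FIXES = {"new york": "NY", "n.y.": "NY", "california": "CA", "calif.": "CA"}


def clean_phone(raw):
    """Return a US phone as (XXX) XXX-XXXX, or None if it cannot be repaired."""
    digits = re.sub(r"\D", "", raw or "")  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return None  # leave unrepairable values empty rather than guess
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def clean_state(raw):
    """Map known bad values to a two-letter code; pass unknown values through."""
    value = (raw or "").strip()
    if value.lower() in STATE_FIXES:
        return STATE_FIXES[value.lower()]
    return value.upper() if len(value) == 2 else value


def transform_contact(row):
    """Apply every fix in code so the job can be re-run on each new extract."""
    return {
        "LastName": (row.get("last_name") or "").strip(),
        "Phone": clean_phone(row.get("phone")),
        "MailingState": clean_state(row.get("state")),
    }
```

Because the rules are functions rather than hand edits, a refreshed data set only requires re-running the job, and the same code runs unchanged during the production migration.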
There are some things that just can't be done in code, and a data clean-up project may be warranted. This can be done either in the source system before go-live or in Salesforce after.*
Where possible, fix code, not data.
*In my experience, any plan to clean up data after go-live rarely comes to fruition.
This article is adapted from my book: Developing Data Migrations and Integrations with Salesforce.