Founder & CEO
Most Salesforce integration specialists know to always mark data loaded to Salesforce with an external Id, even if they are performing an Insert or Update as opposed to an Upsert. This makes perfectly good sense: an external Id gives you a mechanism to trace specific records in Salesforce back to specific records in some other system, and if you enforce uniqueness on it, it prevents you from accidentally inserting a duplicate. But not many people think about marking records with a JobId that identifies exactly which data load job created the record.
I recommend you create a field for the JobId on every object being inserted or updated by your migration or integration and populate it with an Id that indicates the job that performed the action (Insert or Update). For migrations, the JobId should be a configurable value that you can set with every run. For integrations, have some mechanism of getting a JobId from the ETL tool, middleware, or scheduler you are using. If you cannot come up with one, it's very easy to code this functionality yourself.
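As a rough illustration of coding this yourself, here is a minimal Python sketch that builds a JobId and stamps it on each outgoing record before the load. The field name Job_Id__c, the timestamp format, and the helper names are assumptions made for the example, not something prescribed by Salesforce or by any particular ETL tool.

```python
from datetime import datetime, timezone
import uuid

# Hypothetical API name of the custom JobId field; use whatever you created.
JOB_ID_FIELD = "Job_Id__c"


def make_job_id(label, now=None):
    """Build a JobId from a UTC timestamp, a label, and a short random suffix."""
    now = now or datetime.now(timezone.utc)
    suffix = uuid.uuid4().hex[:8]  # keeps two runs in the same second distinct
    return f"{now:%Y-%m-%dT%H%M%SZ}_{label}_{suffix}"


def stamp_records(records, job_id):
    """Return copies of the records with the JobId field set on each one."""
    return [{**record, JOB_ID_FIELD: job_id} for record in records]


if __name__ == "__main__":
    job_id = make_job_id("Migration")
    rows = stamp_records([{"Name": "Acme"}, {"Name": "Globex"}], job_id)
    print(job_id)
    print(rows)
```

Generating the JobId once per run and passing it to every load step means all objects touched by that run share the same value.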
Data integration and migration defects often affect every record inserted or updated during a job run, so to fix them it's important to be able to easily identify all the records that run touched. A JobId gives you a quick and easy way to identify records that were updated during the last run. The alternative is to filter records by user and modify date, which is not very precise, because other users could have updated records during or after the job run.
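To make the difference concrete, the sketch below builds both kinds of SOQL query as strings. It assumes a custom field named Job_Id__c (a hypothetical name); LastModifiedById and LastModifiedDate are standard Salesforce audit fields. The helper names are invented for the example.

```python
# Hypothetical API name of the custom JobId field.
JOB_ID_FIELD = "Job_Id__c"


def soql_quote(value):
    """Quote a string for SOQL, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def query_by_job(object_name, job_id):
    """Precise: every record last touched by one specific job run."""
    return (
        f"SELECT Id FROM {object_name} "
        f"WHERE {JOB_ID_FIELD} = {soql_quote(job_id)}"
    )


def query_by_user_and_date(object_name, user_id, start_utc, end_utc):
    """Imprecise fallback: anything the user modified inside a time window.

    SOQL datetime literals such as 2018-02-01T00:00:00Z are not quoted.
    """
    return (
        f"SELECT Id FROM {object_name} "
        f"WHERE LastModifiedById = {soql_quote(user_id)} "
        f"AND LastModifiedDate >= {start_utc} "
        f"AND LastModifiedDate <= {end_utc}"
    )


if __name__ == "__main__":
    print(query_by_job("Account", "2-1-2018_Migration"))
```

The JobId query needs no guess about when the run started or ended, and it is not affected by other edits made under the same integration user.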
This practice also gives you a roundabout way of performing a second migration without having to wipe out all the data first. Suppose on February 1 you received the production data to be migrated and you did so successfully, marking each record with a JobId of 2-1-2018_Migration. Then, for whatever reason, it was decided to push back the go-live date to March 1. Normally, you would have to do a full delete of all data in Salesforce, then redo the migration (because migration code generally does not perform deletions, and users may have deleted records from the source system during the month of February). If you have JobIds, you can perform your migration, marking all records with a JobId of 3-1-2018_Migration, then go back and delete any remaining records with the 2-1-2018_Migration JobId. (If a record's JobId was not updated, the record was not in the new record set.)
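The cleanup step after the second run can be sketched as follows. This assumes the records have already been queried back from Salesforce as dictionaries and that the JobId lives in a field named Job_Id__c (a hypothetical name); the record Ids and names are made-up illustration data.

```python
# Hypothetical API name of the custom JobId field.
JOB_ID_FIELD = "Job_Id__c"


def stale_record_ids(records, old_job_id):
    """Return Ids of records still carrying the earlier run's JobId.

    Records touched by the second run were re-stamped with the new JobId,
    so anything still marked with the old one was absent from the new data.
    """
    return [r["Id"] for r in records if r.get(JOB_ID_FIELD) == old_job_id]


if __name__ == "__main__":
    # Invented state of Salesforce after the March 1 run.
    after_second_run = [
        {"Id": "001A", "Name": "Acme", JOB_ID_FIELD: "3-1-2018_Migration"},
        {"Id": "001B", "Name": "Globex", JOB_ID_FIELD: "2-1-2018_Migration"},
        {"Id": "001C", "Name": "Initech", JOB_ID_FIELD: "3-1-2018_Migration"},
    ]
    to_delete = stale_record_ids(after_second_run, "2-1-2018_Migration")
    print(to_delete)  # candidates for deletion
```

Note that records with no JobId at all (for example, ones created by hand) are left out of the result, so only data loaded by the earlier run is considered for deletion.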
Every record created by an automated batch process should have a JobId.
This article is adapted from my book: Developing Data Migrations and Integrations with Salesforce.