Founder & CEO
It's no secret that CRM systems are often plagued with bad data. Recently I asked myself, "Are CRM systems more susceptible to bad data than other systems, and if so, why?" I think the answer is clearly YES. If I'm right (and I am), all it means is that with CRM systems you need a much stronger focus on, and resolve for, keeping data clean, using fairly traditional approaches (maybe I'll write about that some other time).
I do think it's important to understand why CRM systems are so susceptible to data quality issues. So, without further ado, here is what I believe lies at the root of CRM data quality issues:
CRM systems rely on manual data entry, much more so than most other systems. Often there are a lot of unknowns in the data being entered, so admins are forced to be lax about making fields required. You can't force a Salesperson to grill a lead on how many employees they have (in the company or target department); they need to get that info naturally, over the course of time. If you make the field required, the Salesperson will either not enter the lead or just make up a number, both of which are worse than leaving the field blank. Once the lead is entered, Salespeople rarely go back and update trivial things such as "Number of employees", unless it's directly related to calculating the size of the opportunity.
They do, however, enter lots of notes, which brings us to cause number 2.
CRM systems capture notes everywhere, at every level: Accounts, Contacts, Opportunities, Activities, Cases and, well, you get the idea. The problem is that it's incredibly difficult to report on free text fields. In fact, it has only become possible very recently, thanks to improvements in AI technologies; just a few years back, any serious reporting on free text was out of the question. Data Analysts HATE free text fields because they are essentially operational data only: they are needed to run your operation, but you can't really report on them. I once had a client who ran a large, 400-person call center, and she wanted to ban all free text fields from case records. How was she to improve her case "time to resolve" if she didn't have a solid record of what her 400 call center agents were dealing with on a day-to-day basis?
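To make the call center example concrete, here is a minimal Python sketch with made-up sample cases. It assumes each case carries a structured reason (a picklist-style value) alongside its free text notes; the `Case` class and its field names are hypothetical, not Salesforce API names. Grouping time to resolve by the structured field is straightforward, while answering the same question from `notes` would first require classifying every note, which is exactly the hard part.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Case:
    reason: str        # hypothetical structured picklist value: easy to report on
    notes: str         # free text: useful to the agent, hard to report on
    opened: datetime
    closed: datetime


def avg_hours_to_resolve(cases):
    """Average time to resolve, in hours, grouped by the structured reason."""
    by_reason = defaultdict(list)
    for case in cases:
        hours = (case.closed - case.opened).total_seconds() / 3600
        by_reason[case.reason].append(hours)
    return {reason: mean(hours) for reason, hours in by_reason.items()}


cases = [
    Case("Billing", "Customer confused by invoice",
         datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),
    Case("Billing", "Refund requested",
         datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 13)),
    Case("Login", "Password reset loop",
         datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 10)),
]
print(avg_hours_to_resolve(cases))  # {'Billing': 3.0, 'Login': 1.0}
```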
There was a time, not too long ago, before SaaS, when the people in charge of a data model could not give two sugars about the presentation layer (UI). The DBA had the final say on the data model, and he or she was only concerned with two things: data quality and data read/write performance. If you went to your DBA and asked for the data model to be changed because you wanted to use "Clicks not Code", you could expect to be met with ridicule.
Though it may be true that many DBAs have a reputation for being hard-headed, that is not why you would be met with ridicule. The reason is that your data will always outlive the system that owns it, often by decades. DBAs know that the data will be around long after the system is shut down, so they model it to reflect the needs (or natural attributes) of the data, not the delivery mechanism.
My point here is that when we model our CRM systems (particularly SaaS/cloud CRM systems), we tend to model data based on the UI. Hell, even the Salesforce data types are named after UI controls (Email, Phone, Checkbox, etc.). Traditionally this would never be allowed, because we would never put the needs of the UI layer above the needs of the data.
There are a lot of benefits to putting the UI needs ahead of the data needs: we gain a lot in terms of making the system easy to use, build and extend. But in doing so, we sacrifice some of our ability to enforce data quality. This is exactly why CRM ERDs are often so difficult to read (see the ERD of the Salesforce Sales Objects below).
One of the more serious consequences of modeling your data based on the UI needs is the resulting heavy use of circular references.
Circular references create a situation where you have two routes to the same data, and you can get a different answer depending on the route you take. That is a conflict, and a data integrity issue.
For example, take a look at the ERD below of the Salesforce Sales Objects. Suppose I wanted to know which Account an Opportunity belongs to: how many ways can I get this info?
Maybe you can argue that these routes should not always give the same result (so those examples are a non-issue), and you can argue that when they should give the same result, Salesforce has code to enforce data integrity. And you may be right, but here's my point: CRM systems actively encourage these kinds of circular relationships, and when custom code is written with circular references, integrity is rarely enforced. Add in record types and all sorts of bad things can happen if you are not careful.
This issue is not unique to Salesforce. When I migrate data from other CRMs to Salesforce, I often end up with a long list of data conflict resolution questions, conflicts that would not exist to begin with if referential integrity had been enforced.
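Producing that kind of conflict list can be partly automated. Here is a minimal Python sketch, assuming a simplified model where an Opportunity reaches an Account two ways: directly, and through its primary Contact. The classes and field names are illustrative rather than the actual Salesforce schema, and whether a mismatch is truly an error is a business decision, as discussed above.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    id: str
    account_id: str


@dataclass
class Opportunity:
    id: str
    account_id: str          # route 1: direct lookup to the Account
    primary_contact_id: str  # route 2: via the Contact's own Account (simplified)


def find_account_conflicts(opportunities, contacts):
    """Return (opportunity id, direct account, account via contact) for each
    opportunity whose two routes lead to different Accounts."""
    contacts_by_id = {c.id: c for c in contacts}
    conflicts = []
    for opp in opportunities:
        contact = contacts_by_id.get(opp.primary_contact_id)
        if contact is None:
            continue  # route 2 is broken entirely; worth a separate check
        if contact.account_id != opp.account_id:
            conflicts.append((opp.id, opp.account_id, contact.account_id))
    return conflicts


contacts = [Contact("C1", "A1"), Contact("C2", "A2")]
opportunities = [Opportunity("O1", "A1", "C1"), Opportunity("O2", "A1", "C2")]
print(find_account_conflicts(opportunities, contacts))  # [('O2', 'A1', 'A2')]
```

Each tuple is one of those conflict resolution questions: which of the two Accounts is the right one?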
Note that circular references are not necessarily bad design; they are usually done for performance reasons. But when building them, people often don't put enough thought into the controls needed to enforce referential integrity.
For fun, take a second look at the ERD: how many circular references can you find? Now check out the rest of the Salesforce ERDs!
When designing your data model, you need to add checks to maintain data integrity across circular references, not just when creating data, but when updating and deleting too.
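Here is a minimal Python sketch of what "checks on create, update and delete" can look like for one loop. It assumes a hypothetical in-memory store and a single rule (an Opportunity's Account must equal its primary Contact's Account); the class and method names are illustrative only. In Salesforce you would typically express rules like these as validation rules or Apex triggers rather than in Python.

```python
class IntegrityError(Exception):
    """Raised when a change would break the Opportunity/Contact/Account loop."""


class SalesStore:
    """Hypothetical in-memory store enforcing one rule across a circular
    reference: an Opportunity's Account must equal its primary Contact's Account."""

    def __init__(self):
        self.contact_account = {}  # contact id -> account id
        self.opportunities = {}    # opportunity id -> (account id, primary contact id)

    def save_opportunity(self, opp_id, account_id, contact_id):
        # Create and update go through the same check.
        if self.contact_account.get(contact_id) != account_id:
            raise IntegrityError(
                f"{opp_id}: contact {contact_id} is not on account {account_id}")
        self.opportunities[opp_id] = (account_id, contact_id)

    def save_contact(self, contact_id, account_id):
        # Updating the other end of the loop must not silently break opportunities.
        for opp_id, (opp_account, opp_contact) in self.opportunities.items():
            if opp_contact == contact_id and opp_account != account_id:
                raise IntegrityError(f"moving {contact_id} would break {opp_id}")
        self.contact_account[contact_id] = account_id

    def delete_contact(self, contact_id):
        # Deletes must not leave an opportunity pointing at a missing contact.
        for opp_id, (_, opp_contact) in self.opportunities.items():
            if opp_contact == contact_id:
                raise IntegrityError(f"{contact_id} is the primary contact on {opp_id}")
        self.contact_account.pop(contact_id, None)


store = SalesStore()
store.save_contact("C1", "A1")
store.save_opportunity("O1", "A1", "C1")
try:
    store.save_contact("C1", "A2")  # would make the two routes disagree
except IntegrityError as err:
    print("rejected:", err)  # rejected: moving C1 would break O1
```

Rejecting the change is only one option; you could instead cascade the new Account to the Opportunity or queue the conflict for review. What matters is that every write path on every object in the loop is covered.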
According to Informatica, CRM data decays (goes stale) at a rate of at least 30% per year. They say it's because:
"Data decays because lives change all the time. It's not only people who change contact information. Businesses also change. They buy other companies. Or merge. Email conventions change. New offices open up. Others close down. And then there are changes set by local or national governments. Streets get renamed. Area codes change. Countries change their postal address conventions (we know, we monitor 241 of them). In your own life, you may or may not see many of these things in any given year, but the data proves that, in large populations, they happen many times a day."
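To see what "at least 30% per year" adds up to, here is a small Python calculation. It assumes the rate compounds, i.e. each year 30% of the records that were still accurate go stale; that is a simplifying assumption of mine, not something the Informatica figure specifies. Under it, about 49% of records are still accurate after two years and about 34% after three.

```python
def fraction_still_accurate(annual_decay_rate, years):
    """Fraction of records still accurate after `years`, assuming a constant
    annual decay rate applied to the records that were accurate the year before."""
    return (1 - annual_decay_rate) ** years


for years in range(1, 4):
    print(years, round(fraction_still_accurate(0.30, years), 3))
```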
CRM systems are by nature snapshot-in-time systems, where the "in time" part is always now. Most other systems are used to track historical data: they capture things as they happen and then log them away. You order something, the order goes to the warehouse, they bill you, it ships, you get it. Those events never change once they happen; they are true now and will always be true. This is simply not the case for CRM data. To make matters worse, CRM data can change at will and no one will tell you about it. Just because I change jobs doesn't mean you know about it.
You should keep track of the last time data on a record was verified so at least you know what data is fresh (if not what is stale).
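A minimal Python sketch of that idea, assuming a hypothetical "Last Verified" date on each record and an arbitrary one-year threshold (pick whatever suits your data). The sample records are made up.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ContactRecord:
    name: str
    email: str
    # Hypothetical "Last Verified" date: set only when someone confirms
    # the data is correct, not on every edit.
    last_verified: Optional[date] = None


def needs_reverification(records, today, max_age_days=365):
    """Return records that were never verified, or were last verified
    more than max_age_days before `today`."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        r for r in records
        if r.last_verified is None or r.last_verified < cutoff
    ]


records = [
    ContactRecord("Ann", "ann@example.com", date(2023, 6, 1)),
    ContactRecord("Bob", "bob@example.com"),
    ContactRecord("Cy", "cy@example.com", date(2024, 6, 1)),
]
for r in needs_reverification(records, today=date(2024, 12, 1)):
    print(r.name)  # prints Ann, then Bob
```

A dedicated date is used instead of a last-modified timestamp (in Salesforce, `LastModifiedDate`) because an edit to any field resets the latter, even if nobody actually checked the phone number.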
Now, before you hang me, please hear me out. The fact of the matter is that CRM systems are generally not mission-critical systems. Sales, Marketing and Customer Service can tolerate CRM outages for periods of time that would be intolerable for systems that support other areas of a business.
Don't think this is true? Consider the following:
Not only that, but even when CRM systems are working properly, they are particularly vulnerable to bad data, because CRM data is generated to benefit the users, not because of some external need. You have situations where Salespeople "get lazy" and simply don't enter data (contact info, activity data, opportunities being worked). Most non-CRM systems generate data because of some need external to the user, and that external need enforces precision.
Consider the following:
Have a question you would like to see as part of my FAQ blog series? Email it to me: Dave@Gluon.Digital
This article is adapted from my book: Developing Data Migrations and Integrations with Salesforce.