When Salesforce is life!

Get Started with Salesforce Data Cleansing


Il’ya Dudkin is the content manager and Salesforce enthusiast at  datagroomr.com. He has more than 3 years of experience writing about Salesforce adoption, duplicate detection issues and system integrations with MuleSoft. He also works with IT outsourcing companies to facilitate the adoption of new Salesforce apps and increase user acquisition and loyalty. 


Getting started with cleaning up the data in Salesforce can be a daunting challenge, especially for companies that have hundreds of thousands or even millions of records. Even if duplicates are severely hindering your marketing and sales efforts, you can bring these issues under control and improve the overall quality of your data. If, like most organizations, you feel that your current data is preventing you from capitalizing on business opportunities, there are steps you can take today to start the data cleansing process.

Know Where Salesforce Falls Short

While your investment in Salesforce may be hefty, the deduplication functionality in the off-the-shelf product is fairly limited. For example, there is no way to conduct a cross-object duplicate search. This means that your new lead may already be in your contacts and vice versa. Also, many companies have custom objects beyond the standard Leads, Contacts, and Accounts, and Salesforce by itself will not be able to check those for you. If you are working with large volumes of data, i.e. hundreds of thousands or even millions of records, the duplicate jobs performed by Salesforce will not be enough. In fact, Salesforce itself admits this issue in the Trailblazer Community.
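To make the cross-object problem concrete, here is a minimal Python sketch of checking new leads against existing contacts by email. It assumes the records have already been exported as plain dictionaries; the `Id` and `Email` keys, the sample values, and the function names are illustrative assumptions, not part of any Salesforce API or third-party product.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so trivially different emails compare equal."""
    return email.strip().lower()


def leads_already_in_contacts(leads, contacts):
    """Return (lead_id, contact_id) pairs whose normalized emails match.

    Both arguments are lists of dicts with hypothetical "Id" and "Email" keys.
    """
    contact_by_email = {
        normalize_email(c["Email"]): c for c in contacts if c.get("Email")
    }
    matches = []
    for lead in leads:
        email = lead.get("Email")
        if not email:
            continue  # nothing to compare on
        contact = contact_by_email.get(normalize_email(email))
        if contact is not None:
            matches.append((lead["Id"], contact["Id"]))
    return matches


# Made-up sample data for illustration only.
leads = [
    {"Id": "L-1", "Email": "Jane.Doe@Example.com "},
    {"Id": "L-2", "Email": "new.person@example.com"},
]
contacts = [{"Id": "C-9", "Email": "jane.doe@example.com"}]

print(leads_already_in_contacts(leads, contacts))
```

Matching on a single normalized field is the simplest possible comparison. It only catches leads whose email is effectively identical to a contact's, which is why fuzzier duplicates need a different approach.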

Keep in mind that these are only some of the shortfalls of Salesforce’s built-in deduplication features. You can find more details about why the off-the-shelf product alone is not enough to catch all of the duplicates in this article. However, now that you are aware of the limitations of Salesforce in the deduping area, you will be in a better position to choose a third-party product that meets all of your needs. 

Choosing a Deduping Tool 

If you search the AppExchange for a deduping app, you will be inundated with products that all have their individual merits. However, each company has its own needs, which narrows the search results down to just a handful of possibilities. There are a few things you need to consider when comparing products. First of all, look for something that's easy to set up. One of the reasons the built-in deduplication features inside Salesforce are not very effective is that they are rule-based. This means that your Salesforce admins have to create a rule for each type of duplicate, which can prove impossible given the various shapes and forms of fuzzy duplicates.
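The gap between a fixed rule and a fuzzy duplicate can be sketched in a few lines of Python. This is only an illustration: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the 0.85 threshold, the function names, and the sample names are assumptions. It does not reflect how Salesforce's matching rules or any AppExchange product work internally.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and replace punctuation with spaces, then collapse whitespace."""
    kept = "".join(ch.lower() if ch.isalnum() else " " for ch in name)
    return " ".join(kept.split())


def exact_rule(a, b):
    """A naive rule: two values are duplicates only if they are identical."""
    return a == b


def fuzzy_match(a, b, threshold=0.85):
    """Treat two values as likely duplicates when their similarity is high.

    The threshold is an arbitrary assumption chosen for this example.
    """
    similarity = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return similarity >= threshold


# Made-up sample pairs for illustration only.
pairs = [
    ("Acme Corp.", "ACME Corp"),
    ("Jon Smith", "John Smith"),
    ("Acme Corp.", "Globex Inc."),
]

for a, b in pairs:
    print(a, "|", b, "| exact:", exact_rule(a, b), "| fuzzy:", fuzzy_match(a, b))
```

The exact rule misses the first two pairs even though a person would recognize them as the same record. Covering every such variation with hand-written rules is where the maintenance burden comes from.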

A much better approach is to choose a tool that uses machine learning to catch the duplicates. This offers several benefits. First of all, you eliminate the hassle of setting up rules, since the algorithm learns to identify future duplicates without being explicitly programmed to do so. You also simplify the setup process, since the product is ready to use right away. The machine learning algorithms do the heavy lifting, and all you have to do is append the field values for the master record. A lot of products also allow you to automate the duplicate checking process, which is always helpful given that new duplicates appear all the time.

Thoroughly Plan Out the Process

One of the biggest mistakes many companies make is thinking about the endgame right away instead of focusing on how data enters Salesforce. For example, if your users are manually entering or editing data in Salesforce, a simple typing mistake can easily cause all kinds of confusion. Automated data imports are not foolproof either: the data is often incomplete, and if any of the fields required by the object are missing, the import will fail. Therefore, you need to account for all of the entry points where duplicate data can appear and plan how you will address each of them.

In addition to planning out the technical aspects of implementing the deduplication tool, you will also need to take the human factor into consideration, i.e. any issues the end users will have while getting accustomed to the new product. This requires some planning as well, since you don't want to make a sudden change that interrupts your employees' workflow. Also, be sure to provide user training, since it will take your employees some time to adjust, especially if a complex setup process is involved.

Set Attainable Goals

Recent data shows that somewhere between 10% and 30% of the data inside a company's CRM is duplicate data. The key metrics you should be monitoring are accuracy, consistency, and completeness. The accuracy of the data is best measured through business interaction, since this provides you with real-time insights. If this is not possible, then you should use independent confirmation techniques. Pay close attention to the ratio of data to errors, which tracks known errors such as missing or incomplete information that could potentially be located in a duplicate record. If the processes you are implementing are proving effective, then the ratio should increase over time.

Consistency refers to the absence of conflicting data. Duplicate records usually contain several versions of the truth, and you have to append the entries to identify the master record and merge all of the duplicates. If you have conflicting data, you will not be able to get a complete view of your customer, and you could be aligning your strategies incorrectly. This is where the completeness of the data comes in. Try thinking about all of the data scattered among duplicate records as pieces of a large puzzle that give you invaluable insights about the customer. Combing through all of the records manually, or even with a rule-based application, will prove very time-consuming if not impossible, since no set of rules can fit every scenario.
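As a rough illustration of how consistency and completeness interact during a merge, the Python sketch below fills a master record's blank fields from its duplicates and reports any fields where the records disagree. The field names, sample values, and the `merge_into_master` helper are hypothetical; real deduplication tools apply their own merge logic.

```python
def merge_into_master(master, duplicates):
    """Fill blank master fields from duplicates and collect conflicting values.

    Returns (merged_record, conflicts), where conflicts maps a field name to
    the set of differing values seen across the records.
    """
    merged = dict(master)
    conflicts = {}
    for dup in duplicates:
        for field, value in dup.items():
            if field == "Id" or value in (None, ""):
                continue  # ignore record identifiers and empty values
            current = merged.get(field)
            if current in (None, ""):
                merged[field] = value  # completeness: fill in the missing piece
            elif current != value:
                # consistency: record every competing version of the truth
                conflicts.setdefault(field, {current}).add(value)
    return merged, conflicts


# Made-up sample records for illustration only.
master = {"Id": "1", "Name": "John Smith", "Phone": "", "Email": "john@example.com"}
duplicates = [
    {"Id": "2", "Name": "John Smith", "Phone": "555-0100", "Email": "j.smith@example.com"},
]

merged, conflicts = merge_into_master(master, duplicates)
print(merged)
print(conflicts)
```

In this sample the phone number is recovered from the duplicate, while the two email addresses are flagged for a person (or a smarter tool) to resolve rather than being silently overwritten.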

Constantly Collect Feedback

We mentioned the importance of monitoring some of the key metrics in your deduplication efforts, but listening to the people who use the tool on a daily basis is just as important, if not more so. They can provide you with valuable insights that data may not be able to measure. For example, they could tell you that they do not trust the tool to properly cleanse the data, or that they are still spending more time than they would like fixing duplicates manually. At the end of the day, you have to remember that the reason you are installing this particular app is to assist the people on the ground communicating with customers. If they are telling you that it just isn't working, then this should be the most important factor in deciding to make a change.

Don’t Postpone Deduping Your Salesforce 

Although the duplicate issue has snowballed into a big problem for many companies, they are often unwilling to start tackling it given its magnitude and the resources required to deal with it properly. However, you have to keep in mind that these duplicates are constantly draining your resources. As a general rule, remember the 1-10-100 ratio: it costs $1 to verify the quality of a record, $10 to eliminate each duplicate, and $100 for every duplicate that is left unchanged. If you have hundreds of thousands or millions of records, such costs can really add up, which is why you should not delay deduping your Salesforce.
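To see how quickly the 1-10-100 ratio adds up, here is a small back-of-the-envelope calculation in Python. The org size of 500,000 records is an assumption chosen for illustration, and the 10% duplicate rate is the low end of the 10% to 30% range cited earlier.

```python
COST_TO_VERIFY = 1      # dollars per record, from the 1-10-100 ratio
COST_TO_FIX = 10        # dollars per duplicate eliminated
COST_TO_IGNORE = 100    # dollars per duplicate left in place


def estimate_costs(total_records, duplicate_rate):
    """Estimate costs under the 1-10-100 rule for a given duplicate rate."""
    duplicates = round(total_records * duplicate_rate)
    return {
        "duplicates": duplicates,
        "verify_all_records": total_records * COST_TO_VERIFY,
        "fix_duplicates": duplicates * COST_TO_FIX,
        "ignore_duplicates": duplicates * COST_TO_IGNORE,
    }


# Assumed org size; 10% is the low end of the cited duplicate range.
estimate = estimate_costs(total_records=500_000, duplicate_rate=0.10)
for label, value in estimate.items():
    print(label, value)
```

Under these assumptions there are 50,000 duplicates, so eliminating them costs about $500,000, while leaving them in place costs about $5,000,000.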


1 Comment

  1. Ian

    Great article. Also think about how you optimize the user experience so they are less likely to put in dirty data so the problem does not reoccur so quickly – clean up page layouts, add validation rules, use Quick Actions. Dirty data often due to technical debt
