What is Data Cleansing?

Data cleansing is when you identify and/or remove data you don’t want in your data set because it is:

  • Inaccurate
  • Incomplete
  • Incorrect
  • Irrelevant

What is Enrichment?

Data enrichment is when you enhance, refine, or improve raw data. It is used interchangeably with the term “data appending.”

What’s the Difference Between Data Cleansing and Data Enrichment/Data Appending?

Data cleansing is when you check whether your data is accurate and, if you choose, correct inaccuracies and inconsistencies. Data enrichment/data appending is when you add information to your data.

A clear understanding of these two practices will help you appreciate how a company manages its data.

What is data cleansing?

Data cleansing is the process of detecting, reporting, and correcting errors in data to prepare it for further analysis. It can be applied to many kinds of data, including:

  • Dirty data / raw data
  • Customer data
  • First party data
  • Third party data
  • CRM data
  • Customer profiles
  • Analytics data

Duplicate entries are among the most common data errors in a database. They arise when the same information is entered more than once or is repeated across different datasets. However, data cleansing is not limited to detecting and correcting duplicates.
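As a minimal sketch of automated duplicate detection, the Python snippet below treats two customer records as duplicates when their normalized email addresses match. The field names (`name`, `email`) and the choice of email as the matching key are assumptions for illustration; real deduplication often needs fuzzier matching on several fields.

```python
def normalize_email(email: str) -> str:
    """Lowercase and trim an email so trivially different spellings compare equal."""
    return email.strip().lower()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized email (assumed matching key)."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = normalize_email(record["email"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


customers = [
    {"name": "Ana Ruiz", "email": "ana@example.com"},
    {"name": "Ana Ruiz", "email": " ANA@example.com "},  # same person, entered twice
    {"name": "Ben Okafor", "email": "ben@example.com"},
]

print([r["name"] for r in deduplicate(customers)])  # ['Ana Ruiz', 'Ben Okafor']
```

Keeping the first occurrence is one simple policy; another is to merge duplicates so that no field is lost.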

Data cleansing can also identify and/or remove data you don’t want in your data set because it is:

  • Inaccurate
  • Incomplete
  • Incorrect
  • Irrelevant
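One way to automate these checks is a set of validation rules that flag each record with a reason instead of silently deleting it, so a person can review the result. In this sketch the required fields, the deliberately simple email pattern, and the target-country list are all hypothetical. Catching truly incorrect values would need domain-specific rules that are not shown here.

```python
import re

REQUIRED_FIELDS = ("name", "email", "country")  # hypothetical schema
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simple format check only
TARGET_COUNTRIES = {"US", "CA"}  # hypothetical scope of the analysis


def find_problems(record: dict) -> list[str]:
    """Return the reasons a record may need cleansing (an empty list means none found)."""
    problems = []
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        problems.append("incomplete: missing " + ", ".join(missing))
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email):
        problems.append("inaccurate: malformed email")
    country = record.get("country")
    if country and country not in TARGET_COUNTRIES:
        problems.append("irrelevant: outside target market")
    return problems


print(find_problems({"name": "Ana", "email": "not-an-email", "country": "FR"}))
```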

This matters because wrong data can cause bugs in your systems or, worse, lead to misinformed decisions. That is why you want clean data before you start any data analysis or enrichment.

What’s an example of data cleansing?

Data cleansing can be manual or automatic. In a manual approach, imprecise or inconsistent data is corrected by hand. Automatic approaches use computer programs to analyze data and make corrections.
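An automatic approach can be as simple as standardizing inconsistent values. The sketch below collapses stray whitespace in names, fixes their capitalization, and maps several spellings of a state to one abbreviation. The alias table is a hypothetical example, and values it does not recognize are passed through unchanged so that a person can review them.

```python
# Hypothetical mapping of inconsistent spellings to one canonical value.
STATE_ALIASES = {
    "n.y.": "NY", "new york": "NY", "ny": "NY",
    "calif.": "CA", "california": "CA", "ca": "CA",
}


def standardize_state(raw: str) -> str:
    """Map a free-text state value to a canonical abbreviation, if it is known."""
    return STATE_ALIASES.get(raw.strip().lower(), raw.strip())


def clean_record(record: dict) -> dict:
    """Return a corrected copy; the original record is left untouched."""
    fixed = dict(record)
    fixed["name"] = " ".join(record["name"].split()).title()  # collapse spaces, fix case
    fixed["state"] = standardize_state(record["state"])
    return fixed


print(clean_record({"name": "  jane   DOE ", "state": "N.Y."}))
# {'name': 'Jane Doe', 'state': 'NY'}
```

Automatic rules have limits. For example, `str.title()` turns "McDonald" into "Mcdonald", which is one reason teams often combine automatic correction with manual review.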

Data cleansing also looks different depending on how you use your data. For a company that creates customer personas to run marketing simulations, data cleansing might mean identifying and removing personas that behave irrationally, such as both wanting and not wanting to purchase a particular product.
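To make the persona example concrete, here is a minimal sketch that represents each persona's preferences as sets of product names and drops any persona that both wants and rejects the same product. The `Persona` structure and its fields are hypothetical; real persona data would likely carry many more attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    persona_id: str
    wants: set[str] = field(default_factory=set)    # products the persona intends to buy
    rejects: set[str] = field(default_factory=set)  # products the persona refuses to buy


def contradictions(p: Persona) -> set[str]:
    """Products the persona both wants and rejects (an empty set means consistent)."""
    return p.wants & p.rejects


def remove_irrational(personas: list[Persona]) -> list[Persona]:
    """Drop personas that hold contradictory preferences about any product."""
    return [p for p in personas if not contradictions(p)]


personas = [
    Persona("p1", wants={"thermostat"}, rejects={"doorbell"}),
    Persona("p2", wants={"thermostat"}, rejects={"thermostat"}),  # irrational
]
print([p.persona_id for p in remove_irrational(personas)])  # ['p1']
```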

What is data enrichment?

Data enrichment is when you enhance, refine, or improve raw data. It’s used interchangeably with the term “data appending.”

It is the process of adding data to an existing data set to increase its usefulness or accuracy and improve overall data quality. It is generally performed after the data has been cleansed.
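A common form of enrichment is appending fields from a second source that shares a key with your data. The sketch below assumes a hypothetical third-party table keyed by `customer_id`. It only adds fields a record does not already have, so values that were already cleansed are never overwritten.

```python
def enrich(records: list[dict], reference: dict[str, dict], key: str) -> list[dict]:
    """Append fields from a reference source, matched on `key`.

    Fields the record already has are never overwritten; only absent fields are added.
    """
    enriched = []
    for record in records:
        merged = dict(record)  # copy so the cleansed input stays unchanged
        for field_name, value in reference.get(record[key], {}).items():
            merged.setdefault(field_name, value)
        enriched.append(merged)
    return enriched


customers = [
    {"customer_id": "c1", "name": "Ana Ruiz"},
    {"customer_id": "c2", "name": "Ben Okafor"},
]
# Hypothetical third-party data keyed by customer_id.
firmographics = {"c1": {"industry": "Energy", "employees": 250}}

print(enrich(customers, firmographics, key="customer_id"))
```

Records with no match in the reference source (here, `c2`) pass through unchanged rather than receiving guessed values.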

In the context of data mining, enrichment improves the value of mined data by adding information, weightings, or context so that data mining algorithms can make better predictions. Enrichment often relies on heuristics, such as rules that derive new fields from existing ones.
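As an illustration of heuristic enrichment, the sketch below adds two derived fields to a purchase record: a context flag for weekend purchases and a recency weight that decays exponentially with the purchase's age. The 30-day half-life and the field names are assumptions chosen for the example, not a recommended setting.

```python
from datetime import date

HALF_LIFE_DAYS = 30  # assumption: a purchase loses half its weight every 30 days


def add_context(purchase: dict, today: date) -> dict:
    """Return a copy of the purchase with heuristic fields a model can use."""
    age_days = (today - purchase["date"]).days
    return {
        **purchase,
        "is_weekend": purchase["date"].weekday() >= 5,         # Saturday=5, Sunday=6
        "recency_weight": 0.5 ** (age_days / HALF_LIFE_DAYS),  # exponential decay
    }


purchase = {"customer_id": "c1", "amount": 40.0, "date": date(2024, 3, 2)}
print(add_context(purchase, today=date(2024, 4, 1)))
# is_weekend=True (2 March 2024 was a Saturday), recency_weight=0.5 (30 days old)
```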

Enrichment is particularly important for big data and machine learning projects, but it is useful in many other cases as well.

What’s an example of data enrichment?

Data enrichment looks different depending on the particular application of the data. In the case of a company that creates customer personas, data enrichment may mean adding data points such as date of birth, race, or gender.

Or your data might be a list of US cities. To make it easier to work with, you can append each city’s state abbreviation.
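A minimal sketch of the city example, assuming a small hypothetical lookup table. A real table would come from a maintained reference dataset.

```python
# Hypothetical lookup table; a real one would come from a maintained reference dataset.
CITY_TO_STATE = {
    "Austin": "TX",
    "Boston": "MA",
    "Denver": "CO",
}


def append_state(cities: list[str]) -> list[dict]:
    """Pair each city with its state abbreviation, or None if the city is unknown."""
    return [{"city": c, "state": CITY_TO_STATE.get(c)} for c in cities]


print(append_state(["Boston", "Denver", "Springfield"]))
# [{'city': 'Boston', 'state': 'MA'}, {'city': 'Denver', 'state': 'CO'}, {'city': 'Springfield', 'state': None}]
```

City names alone are not unique keys (Springfield, for instance, exists in several states), so the sketch returns `None` instead of guessing. In practice you would match on something more specific, such as a ZIP code.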

In the case of AI.Reverie, data enrichment may mean introducing new elements into simulated environments. Let’s take our simulated wind turbines, for example.

Data enrichment may entail lowering the temperature of the environment in which we’ve situated our wind turbines to convey the effects of a deep freeze. This could be useful for regional planning around power outages.

What’s the difference between data cleansing and data enrichment/data appending?

Data cleansing and data enrichment are often confused. Both help ensure that data is accurate and relevant to your application, but there are key differences between the two.

Data cleansing checks whether your data is accurate and, where you choose to, corrects inaccuracies and inconsistencies. Data enrichment, or data appending, adds information to your data.

It’s the difference between removing inaccurate buyer personas and adding data points to them, or between removing redundant elements from a simulated environment and adding new ones to it.

Wrapping Up

Data cleansing and data enrichment are foundational practices in data management. They help ensure your data is consistent, accurate, and useful for your purposes.