Poor Data – Don’t Just Treat Symptoms, Treat The Cause.
July 2, 2011
The Data Warehousing Institute estimates that data quality problems currently cost U.S. businesses over $600 billion annually. Even with these figures to guide us, it is still very difficult to use metrics to determine the cost of poor data quality and its effects on your organization. This is because the point at which an error is introduced is often far removed from the point at which it is recognized. Errors are very hard to repair, especially when systems extend far across the enterprise, and the final impact is very unpredictable.
Have you ever considered how much time and resources your organization spends on correcting, fixing, and analyzing corrupted or erroneous data? What about the cost of delayed information exchange or lost revenue due to misplaced data or incorrect input? Evaluating data and identifying errors is a time-consuming process, not to mention the time needed to correct them. In a time of decreased budgets, some organizations may not have the resources for such projects and may not even be aware of the problem. Others may be spending all their time fixing problems, leaving no time to work on preventing them.
According to several leading data quality managers, the cost of poor data quality can be expressed as a simple formula:
Cost of Poor Data Quality = Lost Business Value + Cost to Prevent Errors + Cost to Correct Errors + Cost of Validation
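To make the formula concrete, here is a minimal sketch of how the four components add up. The dollar figures are purely hypothetical placeholders, not values from the article:

```python
# Hypothetical annual figures, in dollars (illustrative only)
lost_business_value = 250_000   # revenue lost to bad decisions and customer churn
cost_to_prevent_errors = 60_000 # governance, training, validation rules
cost_to_correct_errors = 180_000  # staff time spent fixing bad records
cost_of_validation = 40_000     # auditing and re-checking data

cost_of_poor_data_quality = (
    lost_business_value
    + cost_to_prevent_errors
    + cost_to_correct_errors
    + cost_of_validation
)
print(f"Cost of poor data quality: ${cost_of_poor_data_quality:,}")
```

The point of the formula is that correction is only one term: even an organization that never runs a cleanup project is still paying the other three.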
Lost Business Value can be HUGE and can lead to business interruptions as well. Let's use an example to illustrate the cost of fixing an element of poor data.
- A staff person spends about 40% of their time each day on this task (40% of an 8-hour day = 3.2 hours).
- There are five people performing this operation (5 x 3.2 hours = 16 staff hours per day).
- Accounting tells you that these people earn $45 per hour (payroll + benefits).
- Total annual cleanup is 4,000 hours (16 staff hours x 250 annual working days).

This means the annualized cost to fix the known poor data is 4,000 hours x $45 = $180,000.
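The arithmetic above can be sketched as a short script, using the article's own assumptions (8-hour days, 250 working days per year):

```python
# Assumptions taken from the worked example
hours_per_day = 8
cleanup_fraction = 0.40   # 40% of each staff person's day
staff_count = 5
hourly_rate = 45          # dollars, payroll + benefits
working_days = 250

daily_cleanup_hours = staff_count * hours_per_day * cleanup_fraction  # 16 staff hours/day
annual_cleanup_hours = daily_cleanup_hours * working_days             # 4,000 hours/year
annual_cost = annual_cleanup_hours * hourly_rate                      # $180,000/year

print(f"{daily_cleanup_hours:.0f} staff hours/day")
print(f"{annual_cleanup_hours:,.0f} hours/year")
print(f"${annual_cost:,.0f} annualized cost")
```

Parameterizing the calculation this way makes it easy to plug in your own organization's headcount, rates, and time estimates.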
The cost of poor data quality extends far beyond the cost to fix it. It spreads through and across the business enterprise like a virus, affecting systems from shipping and receiving to accounting and customer service. Eventually, your customers may lose patience with you, and you may lose their business.
Let's look at the traditional approach to cleansing data whenever a data quality issue is recognized in a business. The traditional approach fixes the bad data that has already been created, using data quality or ETL tools. This generally happens whenever there is an urgent need to fix the bad data, either because of needs arising from a data migration effort or because of a business problem.
This approach suffers from three problems:
First, data cleansing and repository building are almost always carried out on a project-by-project basis. Even if the project is successful and bad data is transformed into good data, the repository starts to degrade in the absence of any ongoing data quality sustenance measures. More and more newly created bad data will creep into the system, and the data already cleansed starts getting stale. Data has a shelf life and needs constant care and feeding. Without addressing how bad data is created, these solutions are costly and unsustainable.
Second, it’s difficult to get the business side fully committed to and involved in these projects. Without a change of mindset, data continues to be seen as IT’s responsibility. And to exacerbate the problem, the software tools used were meant for an IT user base, which leaves the business without a way to directly participate in the process. Without full and sustained business engagement, these projects often do not yield anticipated benefits.
Third, it is very, very hard to fix bad data using technical tools alone. A computer algorithm for data cleansing, no matter how cleverly constructed, can only address a small subset of data problems. For many classes of errors, a data cleansing package would not even be able to detect that there is a problem, let alone fix it.
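A tiny hypothetical example illustrates the limit. Rule-based cleansing can verify that a value is well-formed, but a value that is well-formed and still wrong sails straight through. The record, field names, and rule below are all invented for illustration:

```python
import re

# A typical rule a cleansing tool can enforce: a US ZIP code is five digits.
ZIP_PATTERN = re.compile(r"^\d{5}$")

def passes_cleansing_rules(record: dict) -> bool:
    """Return True if the record satisfies every format rule we know how to write."""
    return bool(ZIP_PATTERN.match(record["zip"]))

# Well-formed but wrong: the customer actually lives in Seattle,
# yet someone keyed in a Boston ZIP code. Every format rule passes.
record = {"name": "A. Customer", "city": "Seattle", "zip": "02108"}
print(passes_cleansing_rules(record))  # True -- the tool sees no problem
```

Catching this kind of error requires business knowledge (which customer lives where), which is exactly why the business, not just IT, has to be engaged.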
However, by and large these efforts treat the symptoms of the disease that surfaced rather than addressing the root cause. Strictly speaking, these projects represent a cost of bad data, in addition to the degradation of business performance. An organization can take them as an opportunity to find the root causes of bad data and identify the people, process, or technology issues behind them. Once the root causes are identified, there MUST be a data governance strategy sponsored, implemented, and owned by the business.
The bottom line is that data ownership and data content shouldn't be IT's responsibility; the business must own its data. With data volume and complexity exploding, the treadmill is spinning faster than the traditional approach can keep up.