

Why you should keep data observability separate from data cleansing

By Eric Gerstner



As a principal for data quality, I enjoy taking time to work with our customer base. Since joining Collibra, I had the privilege of speaking to over 60 companies in just a few months. These organizations demonstrate perspectives and prioritization that show great promise in the industry of data. 

I often hear questions about commingling the Observability and Resolution features to solve for data quality, typically summarized as "Why can't our tool both detect and resolve data quality issues?" This desire to commingle the solutions likely stems from the Gartner definition published in the Magic Quadrant for Data Quality Solutions, which rightly states that data quality requires the "identification, understanding[,] and correcting [of] flaws in data."

We agree that correcting flaws is as important as identifying them. However, while a robust data quality platform should correct flaws, platform leaders should be wary of asking a single product or project to combine observability and correction. Observability pairs well with correction, but merging inspection and action often introduces further complications.

Consider the examples from our peer industries:

Operational Risk. For financial services, data governance found its roots in risk. As CROs prioritized operational risk after the 2007–2008 financial crisis, those leaders also looked to centralized guidance such as the Basel Accords. Basel II lists seven risk event types with clear reference to data governance, given "data entry" and "data maintenance" (Event Type 7). It prescribes effective controls, cautions against "conflicting responsibilities," and suggests that we consider "independent monitoring" (Principle 6).

How can a tool that checks the quality of an ETL process also effectively and independently monitor its own ETL?

If you were undergoing an operational risk review of your IT systems, would providing a single program or standalone product that both detects and cleanses data quality issues be effective enough? Or would the controls team suggest you need yet another quality tool to monitor that cleansing?

Cyber Security. A close peer to data governance, technology is the foundation of our tradecraft, so much so that CDOs may find themselves reporting to CIOs given the executive priority of tying information technology to good data governance. That proximity, though, brings heightened attention to cyber mandates and awareness. Consider NIST's guidance on cyber safety and separation of duties, the "principle that no user should be given enough privileges to misuse the system on their own."

How can a tool balance the requirement for ‘read-all’ access in observability with ‘write-all’ permission that would come with correction? 

If you were presenting your DQ program to a CISO, would allowing a single program access to alter all systems be considered cyber safe? Or would you be forced to build a robust people, process, and access model to manage those changes, minimizing the benefit of combining the functions in the first place?
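The separation-of-duties principle above can be sketched as a minimal permission model. This is purely illustrative (the role names, scope strings, and `can` helper are hypothetical, not any vendor's API): the observability identity only ever holds read scopes, while correction requires a distinct identity holding write scopes.

```python
from dataclasses import dataclass

# Illustrative separation-of-duties sketch: observation and correction
# run under separate roles, so neither can do the other's job.

@dataclass(frozen=True)
class Role:
    name: str
    scopes: frozenset  # e.g. {"read:orders", "write:orders"}

# The observer holds read-only scopes across datasets; the corrector
# holds write scopes on a narrow set, under a separate credential.
OBSERVER = Role("dq-observer", frozenset({"read:orders", "read:customers"}))
CORRECTOR = Role("dq-corrector", frozenset({"write:orders"}))

def can(role: Role, action: str, dataset: str) -> bool:
    """Check whether a role holds the scope for an action on a dataset."""
    return f"{action}:{dataset}" in role.scopes

# The observer can inspect but never alter; correction demands a
# distinct identity, keeping the two functions independently auditable.
assert can(OBSERVER, "read", "orders")
assert not can(OBSERVER, "write", "orders")
assert can(CORRECTOR, "write", "orders")
```

A tool that combined both functions would need the union of these scopes under one credential, which is exactly the over-privileged profile the NIST principle warns against.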

Audit & Professional Services. As you build your data governance program, you may find yourself under the advice of some great partners in data governance. The Big Four accounting firms have robust professional services arms that design and deliver strong data solutions, but you may also notice that they rarely perform such work for their audit clients. The SEC mandates this under the Sarbanes-Oxley Act of 2002, stating that firms cannot be in a "position of auditing their own work."

How can a tool that is meant to audit the effectiveness of data quality also perform its own service effectively?

If you were being audited by a Big Four firm, would presenting standalone software that both observes and corrects sit well with auditors who are keen not to audit their own work? Or would a qualified opinion force you to segregate the functions of the tool?

We believe there is a way to observe data quality, understand it, and correct it safely. However, as you evaluate vendor solutions, remember that any single product claiming to do everything for data quality may well do nothing for data quality trust.

The post Why you should keep data observability separate from data cleansing appeared first on Collibra.