What is data lineage? By Collibra
The post What is data lineage? appeared first on Collibra.
Data lineage describes how data transforms and flows as it is transported from source to destination, across its entire data lifecycle. It helps organizations get the full story behind their data so they can use their data to make impactful business decisions.
Why is data lineage important?
Data lineage is important because it helps ensure that an organization’s data is accurate and trusted. Without data lineage, business analysts have no visibility into the correctness of their data and could therefore be basing important decisions on inaccurate or incomplete data. Data lineage enables business analysts to see where their data comes from so that they can be sure they are using the right data to drive business decisions. It also helps IT and data engineers by automating lineage extraction, so they no longer have to map data lineage manually in Excel spreadsheets, which frees up IT’s time for strategic initiatives. With complete data lineage, data engineers can quickly identify the impact of any changes they plan to make.
More specifically, data lineage is important because it results in four key benefits that affect the entire business. Data lineage helps organizations in the following ways:
- Comply with regulations: support BCBS 239, GDPR, CCPA, and other compliance efforts by tracking how data flows through various systems from source to destination
- Automate data mapping efforts: save time by automatically extracting lineage from various source systems and keeping it up to date so IT can spend less time manually mapping data and more time on strategic initiatives
- Better understand and trust your data: understand the full context of your data, including the source of the data, how data sets are built and aggregated, the quality of data sets, and any transformations along the data journey, so you can make informed, data-driven decisions
- Save time doing manual impact analysis: enable impact analysis at a granular level (column, table, or business report) to see how any change affects downstream systems
These benefits help the Chief Data Officer, the business analyst, and IT do their jobs effectively and efficiently, thus enabling the organization to become data-driven.
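To make "automatically extracting lineage" a little more concrete, here is a minimal sketch of the idea at table level. It is not how any particular product works: the SQL statement, the table names, and the `extract_table_lineage` helper are all hypothetical, and the regular expressions only cope with this simple `INSERT INTO ... SELECT` shape. Real lineage tools rely on full SQL parsers and system metadata.

```python
import re

# Hypothetical ETL statement; table names are made up for illustration.
SQL = """
INSERT INTO dwh.sales
SELECT o.id, o.amount, c.region
FROM erp.orders o
JOIN crm.customers c ON o.customer_id = c.id
"""


def extract_table_lineage(sql):
    """Return (source_table, target_table) pairs found in one simple statement."""
    target = re.search(r"INSERT\s+INTO\s+([\w.]+)", sql, re.IGNORECASE)
    if target is None:
        return []
    # Every table named after FROM or JOIN is treated as a source.
    sources = re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql, re.IGNORECASE)
    return [(source, target.group(1)) for source in sources]


lineage = extract_table_lineage(SQL)
for source, target in lineage:
    print(f"{source} -> {target}")
```

Running a step like this on every job, rather than maintaining a spreadsheet by hand, is what keeps the mapping up to date as pipelines change.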
What is a data lineage tool?
A data lineage tool automatically maps relationships between data points to show how data moves from system to system and how data sets are built, aggregated, sourced, and used, providing complete, end-to-end lineage visualization. An enterprise-grade data lineage tool should include features such as:
- Automated lineage extraction: discover and extract lineage automatically from source systems for an end-to-end view of your data with visibility into full data context
- Summary business lineage: trace data flows with an interactive data map that shows summary lineage from data source to report
- Detailed technical lineage: view transformations, drill down into table, column, and query-level lineage, and navigate through your data pipelines
- Indirect lineage: view direct data flows across assets as well as participating indirect relationships that influence the movement of data, such as conditional statements and joins
- In-line context of code: easily identify and drill down into relevant table and column-level code within the lineage diagram
- Export lineage diagrams: extract lineage state diagrams in different file formats for reporting and regulatory purposes (PDF, PNG, CSV, etc.)
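The difference between direct and indirect lineage is easier to see on a small example. The sketch below assumes lineage is stored as a list of column-to-column edges tagged `direct` or `indirect`; the column names, the edge list, and the `upstream` function are hypothetical and only illustrate the concept.

```python
from collections import defaultdict

# Hypothetical column-level lineage edges: (source, target, kind).
# "direct": the source value flows into the target.
# "indirect": the source only influences the target (join key, filter, condition).
EDGES = [
    ("crm.customers.id", "dwh.orders_enriched.customer_id", "direct"),
    ("erp.orders.amount", "dwh.orders_enriched.amount", "direct"),
    ("erp.orders.customer_id", "dwh.orders_enriched.amount", "indirect"),
    ("dwh.orders_enriched.amount", "report.revenue.total", "direct"),
    ("dwh.orders_enriched.customer_id", "report.revenue.total", "indirect"),
]


def upstream(target, edges, include_indirect=True):
    """Return every column that feeds `target`, walking edges backwards."""
    parents = defaultdict(list)
    for source, dest, kind in edges:
        if include_indirect or kind == "direct":
            parents[dest].append(source)

    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for source in parents[node]:
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


direct_only = upstream("report.revenue.total", EDGES, include_indirect=False)
everything = upstream("report.revenue.total", EDGES)
print(sorted(direct_only))
print(sorted(everything))
```

With indirect edges switched off, only the columns whose values actually flow into the report are returned; switching them on also surfaces the join keys and filters that shape the result.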
Data lineage use cases
Large enterprises with huge volumes of data spread across numerous databases and systems use data lineage in a number of ways. Data lineage can help the Chief Data Officer comply with regulations, the business analyst make more accurate decisions, and IT spend less time manually mapping data and more time on strategic initiatives. In particular, data lineage can help a large enterprise with six distinct use cases:
- Regulatory compliance: help comply with regulations such as BCBS 239, GDPR, and CCPA by understanding data for regulatory purposes
- Self-service analytics: enable more accurate analytics and decision-making by providing important context around the data
- Data exploration and viability: improve discovery capabilities to ensure more accurate analytics and decision-making
- Rationalization and cloud migration: assist planning and execution of data modernization initiatives (e.g., DWH to cloud) by identifying and documenting the critical data elements for cloud migration
- Asset management: identify the least and most usable (and certified) data assets across the enterprise
- Impact analysis: conduct impact analysis at a granular level (column, table, or business report) of any changes on downstream systems
As these six use cases show, data lineage supports digital transformation across the enterprise by providing the context needed to unlock the value of an organization’s data.
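Of these use cases, impact analysis is the most mechanical, so it lends itself to a short illustration. The sketch below assumes lineage is available as column-to-column edges named `system.table.column`; the edges and both helper functions are hypothetical, not part of any specific tool.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges between columns, written as "system.table.column".
EDGES = [
    ("erp.orders.amount", "dwh.sales.net_amount"),
    ("erp.orders.currency", "dwh.sales.net_amount"),
    ("dwh.sales.net_amount", "report.revenue.total"),
    ("dwh.sales.net_amount", "report.margin.gross_margin"),
    ("erp.orders.order_date", "dwh.sales.order_month"),
]


def impacted_columns(changed, edges):
    """Breadth-first walk downstream from a changed column."""
    children = defaultdict(list)
    for source, dest in edges:
        children[source].append(dest)

    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dest in children[node]:
            if dest not in seen:
                seen.add(dest)
                queue.append(dest)
    return seen


def impacted_tables(columns):
    """Roll column-level impact up to the table or report level."""
    return {column.rsplit(".", 1)[0] for column in columns}


columns = impacted_columns("erp.orders.currency", EDGES)
tables = impacted_tables(columns)
print(sorted(columns))
print(sorted(tables))
```

The same walk answers the question at each level of granularity mentioned above: the raw result is column level, and grouping by prefix gives the affected tables and reports.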
Types of data lineage
There are two types of data lineage: business lineage and technical lineage. Rudimentary data lineage solutions offer only business lineage; more advanced data lineage tools offer both. Business lineage provides only a summary view. It shows an interactive map that traces data flows from source to report.
Business lineage is an important tool for business analysts who want to see where their data is coming from to ensure they are using data from a reliable source, but do not want to be bogged down by every alteration in the data.
In contrast, detailed technical lineage allows IT and data architects to view transformations, drill down into table, column, and query-level lineage, and navigate through their data pipelines. Together, business lineage and technical lineage provide a holistic view of an organization’s data so that data citizens in all departments and roles can use data to make accurate business decisions.
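One way to picture how the two views relate is to treat business lineage as a roll-up of technical lineage. That is an assumption made here for illustration, not a description of how any product builds its summary view; the edge list and the `business_lineage` function are hypothetical.

```python
# Hypothetical technical lineage: column-level edges plus the transformation applied.
TECHNICAL = [
    ("crm.customers.email", "dwh.dim_customer.email", "LOWER(email)"),
    ("crm.customers.id", "dwh.dim_customer.customer_id", "CAST(id AS BIGINT)"),
    ("dwh.dim_customer.customer_id", "bi.churn_report.customer_id", "SELECT"),
    ("erp.invoices.total", "bi.churn_report.lifetime_value", "SUM(total)"),
]


def system_of(column):
    """Take the leading 'system' part of a 'system.table.column' name."""
    return column.split(".", 1)[0]


def business_lineage(technical_edges):
    """Collapse column-level edges into distinct system-to-system flows."""
    flows = set()
    for source, dest, _transformation in technical_edges:
        source_system, dest_system = system_of(source), system_of(dest)
        if source_system != dest_system:
            flows.add((source_system, dest_system))
    return sorted(flows)


summary = business_lineage(TECHNICAL)
for source_system, dest_system in summary:
    print(f"{source_system} -> {dest_system}")
```

The summary drops the transformations and column names that a data architect needs but keeps the source-to-report path that a business analyst cares about.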
Why your organization needs data lineage
Without automated data lineage, IT must manually maintain lineage in Excel spreadsheets. This means someone must build the mappings and keep them up to date, which takes a massive amount of time, especially for enterprises with large amounts of data scattered across databases and systems. This lost time can result in financial loss and impede innovation. With automated data lineage, organizations can avoid this headache.
Because data lineage provides visibility into data relationships, business analysts can ensure that trustworthy data is used in business analysis, building confidence in data and extracting value from it across the organization.