Introduction to metadata
Metadata is organised information that describes, locates or otherwise makes it easier to retrieve information. It began as catalogue cards in libraries and is now used mainly in digital form. Metadata is everywhere: every webpage, file, picture and piece of software has metadata that describes what it is, when it was created, what size it is and, more generally, everything you or a computer needs to know to find information efficiently.
There are two main types of metadata: descriptive and structural. Descriptive metadata is information used to identify or discover a resource, such as a title, abstract, author or keywords. Structural metadata usually refers to the properties of an object, such as its format, size, media and creation date.
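As a rough illustration of the two types, the sketch below builds a metadata record for a temporary file. The descriptive values (title, author, keywords) and the helper name `structural_metadata` are invented for the example. The modification time stands in for the creation date, which the standard library does not expose portably.

```python
import os
import tempfile
from datetime import datetime, timezone


def structural_metadata(path):
    """Collect properties of the file itself (hypothetical helper)."""
    info = os.stat(path)
    return {
        # File extension used as a simple stand-in for the format.
        "format": os.path.splitext(path)[1].lstrip(".") or "unknown",
        "size_bytes": info.st_size,
        # Last-modified time; creation time is not available on every platform.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


# Descriptive metadata is supplied by people or tools rather than read from the file system.
descriptive = {
    "title": "Quarterly sales report",  # hypothetical values
    "author": "A. Example",
    "keywords": ["sales", "revenue"],
}

# Create a small sample file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    handle.write(b"hello metadata")
    sample_path = handle.name

record = {
    "descriptive": descriptive,
    "structural": structural_metadata(sample_path),
}
os.remove(sample_path)
print(record)
```

The descriptive part helps someone find the resource, while the structural part tells a person or program what kind of object it is before opening it.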
Metadata provides important benefits to a business including:
- Consistency – metadata holds information that helps business users understand the difference between related business terms, such as clients and consumers, or revenue and sales.
- Understanding of relationships – metadata helps business users resolve inconsistencies when determining whether business terms are associated across the data environment. For example, if the same entity is declared as a delegate in one form and as a guest in another, metadata helps to resolve the mismatch.
- Clarity of data lineage – metadata usually records the origins of a data set and helps determine where it came from and how it was created. It can also hold auditable information about who created, changed, deleted or moved data, with exact timestamps.
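The relationship and lineage points above can be sketched in a few lines. This is a minimal illustration, not a real metadata tool: the glossary contents, the canonical term "attendee", the data set and source names, and the `LineageRecord` class are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical business glossary: system-specific terms map to one agreed term.
GLOSSARY = {
    "delegate": "attendee",
    "guest": "attendee",
}


def canonical_term(term):
    """Return the agreed business term, or the lower-cased input if it is not listed."""
    key = term.lower()
    return GLOSSARY.get(key, key)


@dataclass
class LineageRecord:
    """Where a data set came from, plus an audit trail of who touched it."""
    dataset: str
    source: str
    events: list = field(default_factory=list)

    def log(self, user, action, when=None):
        # Record the user, the action and a UTC timestamp.
        when = when or datetime.now(timezone.utc)
        self.events.append({"user": user, "action": action, "at": when.isoformat()})


# Two forms use different words for the same entity; the glossary reconciles them.
same_entity = canonical_term("Delegate") == canonical_term("guest")

lineage = LineageRecord(dataset="event_attendees", source="registration_form_export")
lineage.log("alice", "created")
lineage.log("bob", "changed")

print(same_entity)
for event in lineage.events:
    print(event["user"], event["action"], event["at"])
```

In practice a glossary like this would be maintained by the business rather than hard-coded, but the lookup shows how shared metadata lets two systems agree that a delegate and a guest are the same entity.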
To manage metadata at enterprise scale, it is common to create a metadata repository. There are three main approaches to building one: centralised, distributed, and federated (or hybrid).
A centralised metadata repository is the traditional approach. It scales well as new metadata is captured, gives good access to information and performs fairly well. However, it risks becoming a single point of failure and a performance bottleneck.
A distributed metadata repository lets a business user access up-to-date metadata from all systems in real time. This approach offers better data quality because the data is always current. However, because every system must be available in real time, a single system failure can potentially bring the metadata repository down.
Lastly, the hybrid approach tries to combine the best of both worlds. It supports real-time access to data from source systems while centrally maintaining metadata definitions, or reference paths to the locations that hold the accurate definitions, thus improving performance and quality.
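The difference between the three approaches can be shown with a toy model. The sketch below is a simplification under stated assumptions: source systems are plain dictionaries, and the class names, the `crm` and `billing` systems and the row counts are all hypothetical. It shows only how each approach reads metadata, not how a production repository is built.

```python
# Hypothetical source systems, each holding its own live metadata.
SOURCES = {
    "crm": {"customers": {"rows": 1200}},
    "billing": {"invoices": {"rows": 5400}},
}


class CentralisedRepository:
    """Copies metadata into one store; reads are local but can go stale."""

    def __init__(self, sources):
        self.store = {}
        self.refresh(sources)

    def refresh(self, sources):
        # Take a snapshot of every source system's metadata.
        self.store = {
            name: dict(meta)
            for system in sources.values()
            for name, meta in system.items()
        }

    def lookup(self, name):
        return self.store[name]


class DistributedRepository:
    """Queries the source systems at lookup time, so results are always current."""

    def __init__(self, sources):
        self.sources = sources

    def lookup(self, name):
        for system in self.sources.values():
            if name in system:
                return system[name]
        raise KeyError(name)


class HybridRepository:
    """Keeps definitions centrally and fetches volatile details from the source."""

    def __init__(self, sources, definitions):
        self.live = DistributedRepository(sources)
        self.definitions = definitions

    def lookup(self, name):
        return {"definition": self.definitions[name], **self.live.lookup(name)}


central = CentralisedRepository(SOURCES)
distributed = DistributedRepository(SOURCES)
hybrid = HybridRepository(SOURCES, {"customers": "People or firms that buy from us"})

# The source system changes after the centralised snapshot was taken.
SOURCES["crm"]["customers"]["rows"] = 1250

stale = central.lookup("customers")
live = distributed.lookup("customers")
combined = hybrid.lookup("customers")
print(stale, live, combined)
```

The centralised lookup still reports the snapshot value until `refresh` is called again, while the distributed and hybrid lookups see the change immediately. The hybrid result also carries the centrally maintained definition.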
Efficient metadata management is crucial, especially for large businesses, which risk major costs if they have no strategy or solutions in place to maintain massive repositories.