Easing Cloud Migrations with Tamr
By fabienvaucheret
With all the buzz around cloud technology, it’s not uncommon to come across a team or organization that is planning to migrate, or is already migrating, data from one or more legacy systems to a cloud infrastructure. And this is for good reason. In fact, this may be exactly what you’re working on: trying to determine which cloud to migrate to, what resources it will take, and how long it will take, among many other factors. Moving to the cloud offers an opportunity to make use of data in many different ways by many different consumers (e.g. data scientists, analytics teams).
While these are all important considerations, it is critical to think about cleaning and curating your data before any migration. An all-too-common scenario is for organizations to lift and shift their data without any thought about improving its quality before it’s moved. So while the data now resides in a new location (the cloud), it is still as unusable as before, and the data consumers won’t have an intimate knowledge of the data or how it got there.
Why? The data is still sitting in several places (albeit in the cloud), and is not organized into logical entities that can be easily digested by the business.
Think of this issue like moving into a new apartment. Would you pack up the mess in your old place and ship it to your new apartment in boxes, still a mess? More likely, you would neatly organize and pack the things you plan to bring, so that unpacking is easy and you give yourself a fresh start. The same logic applies to data migrations: figure out what is most important to shift, then logically organize, master, and curate it before migrating to the new infrastructure. Taking the time to organize and curate data into logical entities gives you a known baseline to plug into existing applications today and new ones tomorrow.
ETL Alone Cannot Solve Migration Issues
A typical migration process may rely heavily on an ETL tool to move data from multiple source systems. In addition, ETL services are used to build the required logical entities (customers, accounts, etc.) for reporting, and to further process the data before it is stored in a downstream data warehouse for consumption. Additional reporting logic may be needed to power downstream analytics and reporting systems, as well as any future migration efforts. Some of the common pitfalls with this strategy are:
- Data quality may be poor
- The up-front time required to write the ETL logic will cause delays
- Too many resources may be required
- Time to value will be far too long
The Tamr Advantage
Tamr can be used to generate the logical entities for reporting via accelerated schema mapping and data mastering. This can provide a trusted view of each entity for reporting, combining data from multiple sources, from Day 1. Reporting systems can use this data along with any other data that may not need to go through Tamr (e.g. transactions).
The need for complex additional ETL is greatly reduced thanks to Tamr’s machine learning-based capabilities for reporting and migration.
In this workflow, multiple disparate internal and external data sources are brought together into a landing zone. From here, Tamr’s Schema Mapping provides accelerated entity mapping to reduce the time it takes to align common attributes and build target data models. Once the logical entities are defined, Tamr’s Record Mastering and Golden Records capabilities match and deduplicate the data to provide a new curated layer for the logical entities. These cleansed and mastered datasets are then sent to the new cloud infrastructure, which also feeds downstream reporting and analytics applications.
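To make the three stages concrete, here is a minimal Python sketch of schema mapping, matching, and golden-record consolidation on toy data. It is an illustration only and does not use Tamr’s API: the source names, the column names in SCHEMA_MAP, the exact-email match rule, and the "longest value wins" consolidation rule are all assumptions made for this example, whereas Tamr itself relies on machine learning rather than hand-written rules like these.

```python
from collections import defaultdict

# Hypothetical source-to-target attribute mappings (the schema mapping step).
SCHEMA_MAP = {
    "crm": {"cust_name": "name", "mail": "email", "tel": "phone"},
    "billing": {"full_name": "name", "email_addr": "email", "phone_no": "phone"},
}


def map_to_target(source, record):
    """Rename a source record's attributes to the shared target schema."""
    mapping = SCHEMA_MAP[source]
    return {target: record.get(src, "") for src, target in mapping.items()}


def match_key(record):
    """Toy match rule (an assumption): same email after trimming and lowercasing."""
    return record["email"].strip().lower()


def golden_record(cluster):
    """Consolidate matched records: keep the longest trimmed value per attribute."""
    attributes = cluster[0].keys()
    return {a: max((r[a].strip() for r in cluster), key=len) for a in attributes}


def curate(sources):
    """Map every source to the target schema, cluster matches, and consolidate."""
    clusters = defaultdict(list)
    for source, records in sources.items():
        for record in records:
            unified = map_to_target(source, record)
            clusters[match_key(unified)].append(unified)
    return [golden_record(cluster) for cluster in clusters.values()]


# Made-up landing-zone data from two hypothetical source systems.
sources = {
    "crm": [{"cust_name": "Ann Lee", "mail": "Ann.Lee@example.com", "tel": ""}],
    "billing": [
        {"full_name": "Ann M. Lee", "email_addr": "ann.lee@example.com ", "phone_no": "555-0100"},
        {"full_name": "Bob Roy", "email_addr": "bob@example.com", "phone_no": ""},
    ],
}

golden = curate(sources)
for entity in golden:
    print(entity)
```

The two Ann Lee records collapse into a single customer entity that carries the fuller name and the only known phone number, which is the kind of curated layer that would then be loaded into the cloud target.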
Under the hood, Tamr is able to map, enrich, match, classify, and consolidate data at scale thanks to its patented human-guided machine learning technology. Business data experts contribute directly to the mastering model by answering simple match or no-match questions about the data. This iterative process makes developing accurate, curated datasets as part of the migration much faster, since no traditional rule development is required.
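The feedback loop described above can be sketched as a tiny active-learning routine. This is not Tamr’s algorithm, which the post does not describe in detail. It is a simplified stand-in under stated assumptions: the "model" is a single similarity threshold, similarity comes from Python’s difflib, the question put to the expert is always the pair the model is least sure about, and ask_expert is a hypothetical callback that here simulates a human reviewer.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """String similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fit_threshold(labeled, fallback):
    """Put the threshold midway between the weakest match and the strongest non-match."""
    matches = [score for score, is_match in labeled if is_match]
    non_matches = [score for score, is_match in labeled if not is_match]
    if not matches or not non_matches:
        return fallback  # not enough feedback yet to move the threshold
    return (min(matches) + max(non_matches)) / 2


def train(records, ask_expert, rounds=6):
    """Ask the expert about the most uncertain pair each round, then refit."""
    pairs = list(combinations(records, 2))
    scores = {pair: similarity(*pair) for pair in pairs}
    labeled, asked, threshold = [], set(), 0.5
    for _ in range(min(rounds, len(pairs))):
        candidates = [p for p in pairs if p not in asked]
        pair = min(candidates, key=lambda p: abs(scores[p] - threshold))
        asked.add(pair)
        labeled.append((scores[pair], ask_expert(*pair)))  # match / no-match answer
        threshold = fit_threshold(labeled, threshold)
    return threshold, scores


def simulated_expert(a, b):
    """Stand-in for a business expert: same company if the first word agrees."""
    return a.split()[0].lower() == b.split()[0].lower()


records = ["Acme Corp", "ACME Corporation", "Globex Inc", "Globex Incorporated", "Initech"]
threshold, scores = train(records, simulated_expert)
predicted = sorted(pair for pair, score in scores.items() if score > threshold)
print(predicted)
```

Six yes/no answers are enough here to separate the two real duplicate pairs from look-alikes such as "ACME Corporation" and "Globex Incorporated", without anyone writing a matching rule by hand.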
Getting Started with Tamr
With this approach, the target cloud infrastructure has data that is ready to use from the get-go. This allows you to immediately start benefiting from all the new cloud features that drove your decision to migrate in the first place. Tamr’s solution meets this requirement with machine learning technology that works at scale, cleansing and organizing your data during the migration.
To learn more about Tamr’s data migration services, download our full white paper, Migration, Unified. And as always, please contact us with any questions or request a demo to see Tamr’s solution in action.