Technology may run on software, but software needs fuel. That fuel is data. And unless properly configured for a specific application, technology won't run very well. Like putting the wrong gas in your car makes it run badly (or not at all), bad input means bad output. Input = data and that makes clean data migration really important to the success of your ETL (extract, transfer, load) project.
The reality is that data migration is actually the last step in the process. We've sat through hundreds of data-mapping sessions and validated thousands of datasets. From that experience, we've learned a few things about data migration and its partner, data validation. In this post, we define the four big truths about data migration, but if you think you know them, skip to the Readiness Quiz and find out if your team is truly ready.
Truth #1: Begin at the Beginning: Source Data
Data migration begins by understanding the source data. What it represents? How is it generated? How is it used today? And what controls it? A new technologically advanced system may not fall into the same DB storage or structure. So How much of Source Data is really of use? That needs to be migrated?
Particularly if your shop is migrating off legacy technology, what many project managers find out is that Jane or Ron or Jeff are the only people who understand its data. Uh oh, Houston, we have a problem!
Now you’re faced with new technology that needs its version of data, called target data. What happens when a new application with new programs and new configuration rules starts running? You’re about to find out and it might not be what you’d expect. That’s where data profiling comes in.
Data profiling is a process used to define source data. It’s comprised of three definitional components:
We believe this first step is essential to ensure a high quality ETL process. For example, to address identified quality issues, you might need to modify the conversion code. Another example is the opportunity data profiling offers to cleanse your files of incorrect or unused data. Particularly if you’re migrating from a legacy system in place for many years, there could be thousands of errors in the data that will bog down the schedule with testing errors.
Manually performing this kind of data profiling is possible, but bringing in a data migration expert will accelerate your ability to pinpoint data issues through the use of sophisticated query tools. In addition, expert data profiling ensures that you’ve captured the most important data issues, especially those that might put your organization at risk post-conversion.
Correct profiling will filter out unnecessary data, extra validations and legacy data linkages, dependencies.
Truth #2: How Good is Your Data?
Data is generally considered high quality if it is fit for use. "Data cleansing" is the process to improve data quality.
Data cleansing represents both small changes, like ensuring dates are in the correct format, and complex changes, such as de-duping records. Data cleansing is often seen as an opportunity by organizations, to clean up files before a technology conversion. However, defining, prioritizing, and executing against it can be a real challenge.
Here’s some of the questions to guide your data migration team’s analysis:
- How much of your existing data is accurate & inaccurate?
- What are the data update intervals and which process(es) update the data?
- Is your data compliant with current regulations?
- Are data attributes consistent across all datasets?
- Should you convert all data, regardless if the original use case still exists?
- Are data formats consistent across all datasets ? (for e.g. Date Format, Currency Decimal places etc.)
Truth #3: The False Promise of Lift and Shift
Can you imagine plugged in, turned on, and ready to go business technologies? That’s just not ever going to happen. Then why do we believe data can be lifted and shifted?
Lift and Shift is a just myth and if anyone guarantees that data migration from source to target will work by L & S method is simply telling a lie. The truth is that successful data validation requires a thorough output mapping between the source and target systems. Ensuring your team has a deep understanding of its new technology, either through vendor training or working with a third-party expert can be a critical factor to a successful conversion. Once completed, a comparison between the source and target system outputs is defined, variances noted, and rank ordered.
Now, armed with real information, your organization can better define its data validation scope. In our experience, time spent testing data unnecessarily, without prioritizing data migration risk, is one of the major causes of implementation delays. This causes coding, implementation delays, additional regression testing cycles, and increases the likelihood the product launch schedule will be interrupted.
Truth #4: Testing is Subjective
The truth is that it may not be possible to completely test a new application. You are going to end up going live with some degree of errors. We understand the dynamics of financial institutions and the desire for an error-free implementation. These demands often result in over-testing where every single field, attribute, and condition is a test case. This kind of approach is not just inefficient, but also time consuming and does little to improve implementation quality.
The question is then, is how do you decide what to test? This is where all the previous activities bear fruit. With a proper understanding of your source system data, assurance of data quality, and effective output-mapping, you’ll be ready to make these decisions with confidence. However, at this point in the process, if you still lack confidence, consider the following:
- If the target system is mature, consider presuming that all core functionality works as defined and focus on any customization or areas of high processing complexity.
- If the target system is immature, consider running the source and target systems in parallel for some period in order to focus testing on variances between the two.
- At a minimum, clean the source system data of inaccuracies, unused fields, and inactive records.
- Consider bringing in an outside domain expert to work in partnership with your team.
Are You Ready? Take the Quiz and Find Out
This is our Readiness Quiz. It will help you determine how closely your perception of readiness is to your actual readiness. The way to use it is to rate your readiness on a scale of Very Ready to Not Ready. Once completed, it will help you to see more clearly the strengths and weaknesses of your organization relative to its ETL plan.
Feel free to cut this quiz out and share it with your planning/steering committees. It’s based on our decades-long experience of guiding organizations to better understand their readiness for the big work ahead of them.
The four truths about data migration/validation are:
Organizations undertake data migrations for any number of reasons. These include everything from an entire system upgrade to establishing a new data warehouse to merging new data from an acquisition.
In today’s competitive environment, data is not only the fuel that drives your processes, but it’s the key to unlocking the value in new technologies. The question to ask yourselves is this: are you ready?
In our experience, we’ve found that organizations often under-estimate the effort to effectively migrate data and minimize data validation risks. This is usually the case because rigorous data quality protocols were either not put in place or have degraded over time. Another challenge is ensuring your team has access to knowledgeable domain and application experts that can effectively put data into its proper context for analysis.
Our Readiness Quiz should give you a better picture of how well your organization understands its source data and how prepared you are for the ETL. If you’d like to discuss its results with our team of experts, we’re here to help make sense of it all.