Data conversion is delicate but essential: do it right from the beginning
Data has become an essential part of your business. Unfortunately, it is not static:
- Data changes over time,
- Systems change over time,
- New technologies come on the horizon to help you perform better.
To stay up to date and avoid getting stuck in legacy systems, data migration and conversion are a must. The same holds when you want different systems to exchange data, which is often needed when building and working with AI solutions.
At Kentivo we deal with data conversions continuously. They require careful treatment of the data; after all, garbage in is garbage out. In our experience it is very important, for example, that data flows seamlessly from one system to another so that predictions can be based on multiple data sources. One of our best lessons learned: conversion is more than mapping fields between tables in different databases, so plan your steps carefully.
Real-life conversion cases
Here are some real-life examples that show the importance of a well-designed data conversion pipeline:
Aderant Templates: Together with Clickt we created a pipeline that automatically converts 95% of invoice templates to the new template formats in the financial system. This pipeline can be reused for numerous Aderant implementations. For organisations that use Aderant, this means they can quickly convert the bulk of their templates and focus on the edge cases, which improves the quality of the migration enormously.
Market Intelligence Migration: Together with Media Digitaal we converted 20 years of business information to a new platform without interrupting production at the editorial desk. This migration, and the conversion it required, helps Media Digitaal move to a new platform, use AI, and improve its service. The conversion transformed a classic relational dataset of tagged articles into a dataset that can leverage ontologies for its market analytics.
Product Management Pipeline: For continuous analysis of trends in the product portfolio, we connected the pipeline to a data warehouse. This was a classic conversion project, in which the right data from different sources had to feed the AI engine so that it could predict the Product Life Cycle of a product. The pipelines make it easy to create different tasks that run at specified intervals, while also supporting the loading of a small required data set as an Excel spreadsheet.
With every project, we make sure that the well-designed data conversion pipeline can easily be adapted to new cases.
Approach to a well-designed data conversion pipeline
In most data science projects, 80% of the time is spent preparing the data. At Kentivo, our prime focus is to continuously reduce this number, which allows us to build solutions faster and with higher quality. A well-designed data conversion pipeline is essential for this.
Coming from the AI angle, we set up a proper framework for data conversions. Because cases can differ dramatically in their dynamics, the pipelines needed to be very flexible and extensible.
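To make "flexible and extendable" a bit more concrete, here is a minimal Python sketch of a pipeline built from small, interchangeable steps. It is an illustration only, not the Kentivo framework itself: the names build_pipeline, strip_whitespace and rename_fields, as well as the field names, are hypothetical.

```python
from typing import Callable, Iterable

Record = dict
Step = Callable[[Record], Record]


def build_pipeline(*steps: Step) -> Callable[[Iterable[Record]], list]:
    """Chain independent steps into one reusable conversion function."""
    def run(records: Iterable[Record]) -> list:
        converted = []
        for record in records:
            # Apply every step in order; each step returns a new record.
            for step in steps:
                record = step(record)
            converted.append(record)
        return converted
    return run


def strip_whitespace(record: Record) -> Record:
    """Hypothetical cleaning step: trim all string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def rename_fields(mapping: dict) -> Step:
    """Hypothetical mapping step: rename source fields to target fields."""
    def step(record: Record) -> Record:
        return {mapping.get(k, k): v for k, v in record.items()}
    return step


# A new case reuses the same steps and only swaps or adds what differs.
pipeline = build_pipeline(strip_whitespace, rename_fields({"cust_nm": "customer_name"}))
result = pipeline([{"cust_nm": "  Acme  ", "total": 10}])
```

Because every step has the same shape (record in, record out), adapting the pipeline to a new case is mostly a matter of choosing a different list of steps.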
What is needed to build a well-designed data conversion pipeline? Before starting, one must ask the following questions:
- Is it a one-off conversion or will the conversion have to run many times?
- Will both systems continue to exist next to each other?
- Is the data dynamic, that is, will it continue to change over time?
- Is all data required in the conversion?
- How will the data be employed in the new system?
- What can be done through conversion to optimize performance in how the data is used afterward?
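The answers to these questions often translate directly into pipeline parameters. The sketch below, in Python, shows one possible way to do that: it keeps only the fields that are actually required and, for a conversion that runs repeatedly on changing data, skips records that have not changed since the previous run. The function select_records, the updated_at field and the sample data are assumptions made for this illustration.

```python
from datetime import datetime


def select_records(records, required_fields, changed_since=None):
    """Keep only required fields, and only records changed after the last run."""
    selected = []
    for record in records:
        # Assumes every source record carries an 'updated_at' timestamp.
        if changed_since is not None and record["updated_at"] <= changed_since:
            continue  # already handled in an earlier run
        selected.append({field: record[field] for field in required_fields})
    return selected


source = [
    {"id": 1, "title": "Old", "draft_notes": "x", "updated_at": datetime(2020, 1, 1)},
    {"id": 2, "title": "New", "draft_notes": "y", "updated_at": datetime(2023, 6, 1)},
]

# One-off conversion: take everything, but drop fields nobody needs.
full_run = select_records(source, ["id", "title"])

# Repeated conversion: only pick up what changed since the previous run.
repeat_run = select_records(source, ["id", "title"], changed_since=datetime(2022, 1, 1))
```

For a true one-off conversion the changed_since logic would be needless effort, which is exactly why the questions are worth asking first.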
Starting with a clear view of the purpose afterward helps in setting up the pipelines and avoids over-engineering. Conversion goes beyond mapping fields between different tables: it means converting the data to a representation that fits the context of the new system. This may require additional calculations or formats, as well as logic to better interpret the context of the data for predictions.
Our efforts resulted in a well-designed conversion that feeds data consistently into our Genie platform. The Genie platform incorporates various pipelines that can also be combined. They have been used as part of AI solutions, but also stand-alone to migrate data or convert document templates. Although they were not designed for those stand-alone cases, the robustness and flexibility needed for AI turned out to make them suitable for these as well.
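As a small illustration of the difference between plain field mapping and conversion in context, consider the Python sketch below. It maps two fields, normalises a date format, and adds a calculated value that a prediction model could use. The field names, the source date format and the age_in_days calculation are hypothetical examples, not taken from an actual project.

```python
from datetime import datetime

# Plain field mapping: source column name -> target field name (hypothetical).
FIELD_MAP = {"art_no": "sku", "intro_dt": "introduced_on"}


def convert_product(source: dict, today: datetime) -> dict:
    """Map fields, normalise formats and add a derived value."""
    # Step 1: the classic part, renaming fields.
    target = {new: source[old] for old, new in FIELD_MAP.items()}

    # Step 2: a format change; we assume the old system stores DD-MM-YYYY.
    introduced = datetime.strptime(source["intro_dt"], "%d-%m-%Y")
    target["introduced_on"] = introduced.date().isoformat()

    # Step 3: an additional calculation that only makes sense in the new context.
    target["age_in_days"] = (today - introduced).days
    return target


converted = convert_product(
    {"art_no": "A-100", "intro_dt": "01-03-2020"},
    today=datetime(2020, 3, 31),
)
```

Only the first step is field mapping; the other two exist because of how the data will be used in the new system.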
Take-aways
So, make sure you consider the following before you start converting data:
- Conversion is more than mapping fields between tables in different databases; plan your steps carefully.
- Design your conversion pipeline with clear principles so you can leverage it for many different use cases without having to engineer everything up front.
- Carefully consider the dynamics of the data and the purpose of the conversion. Just converting everything will lead to needless effort, not only in building but also in the steps after the data conversion.
Last but not least, the answer is not in the tooling. Doing it effectively comes from discussion and from thinking things through in terms of importance, context, and dynamics. If you need help with your data conversion, feel free to contact us.