Standardizing data from heterogeneous sources
Discover how our solution standardizes data from heterogeneous sources, simplifying the management and integration of information for better quality and a unified view of your data.
The need
Our customer, a major player in the passenger and freight transport sector, wanted to reduce the time spent gathering the input data needed to complete a project, which could take several weeks or even months.
The customer's data teams therefore began designing an intranet portal where in-house project managers could find the data they needed for their projects in just a few clicks.
The problem: each department producing potentially reusable data published a data sheet describing it in its own format. As a result, several hundred formats were in use.
The portal's raison d'être was to enable cross-functional searches, i.e. searches across datasets produced by different departments. Harmonizing the data sheets was therefore an essential prerequisite for the success of the portal project.
Proposed solution
Definition of a single target (pivot) format for the data sheets.
Format import: Tale of Data uses the target format to automatically suggest to the user the data transformations required to move from the current format to the target format.
Use of Tale of Data by the customer's data team to create, for each input data sheet format, the list of data transformations required to produce an output data sheet in the target format.
Automation of the entire process: every day, the various departments deposit new data sheets on the customer's private cloud (Microsoft Azure). Tale of Data retrieves these data sheets and automatically applies the corresponding transformations, selected according to the originating department and the type of data sheet.
Once in pivot format, the records are deduplicated, then sent by Tale of Data to the portal (via API), where they are indexed and made available for search.
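To make the flow above more concrete, here is a minimal Python sketch of the general idea: choose a transformation according to the originating department and the type of sheet, convert each sheet to the pivot format, then deduplicate. It is an illustration only and does not reflect Tale of Data's actual interface. The field names, department names, sheet types and sample values are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical pivot (target) format: these field names are assumptions.
PIVOT_FIELDS = ("title", "department", "location")


def from_freight_v1(sheet: Record) -> Record:
    """Map a hypothetical freight-department sheet to the pivot format."""
    return {
        "title": sheet["Name"].strip(),
        "department": "freight",
        "location": sheet["Site"].strip().title(),
    }


def from_passenger_v1(sheet: Record) -> Record:
    """Map a hypothetical passenger-department sheet to the pivot format."""
    return {
        "title": sheet["dataset_title"].strip(),
        "department": "passenger",
        "location": sheet["place"].strip().title(),
    }


# One transformation per (originating department, sheet type) pair.
TRANSFORMS: Dict[Tuple[str, str], Callable[[Record], Record]] = {
    ("freight", "v1"): from_freight_v1,
    ("passenger", "v1"): from_passenger_v1,
}


def to_pivot(sheet: Record, department: str, kind: str) -> Record:
    """Apply the transformation registered for this department and type."""
    return TRANSFORMS[(department, kind)](sheet)


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record for each case-insensitive pivot key."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field].casefold() for field in PIVOT_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def run_daily_batch(batch: List[Tuple[Record, str, str]]) -> List[Record]:
    """Convert one day's sheets to pivot format, then deduplicate them."""
    return deduplicate([to_pivot(sheet, dept, kind) for sheet, dept, kind in batch])


if __name__ == "__main__":
    # Made-up sample sheets; the first two describe the same dataset.
    batch = [
        ({"Name": " Track wear survey ", "Site": "north depot"}, "freight", "v1"),
        ({"Name": "Track wear survey", "Site": "North Depot"}, "freight", "v1"),
        ({"dataset_title": "Station footfall", "place": "central station"}, "passenger", "v1"),
    ]
    for record in run_daily_batch(batch):
        print(record)
```

Keeping the transformations in a lookup table keyed by department and sheet type means a new input format can be supported by registering one more function, without touching the deduplication or delivery steps.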
Benefits
Tens of millions of euros saved thanks to a spectacular reduction in start-up times for new projects.
The portal is now systematically used by project managers to gather the data they need for their projects.
The rate of data reuse has risen sharply, and far fewer datasets are purchased from external service providers. Previously, project managers often bought such datasets because they had no way of knowing that the company already held them.
Standardized locations (construction sites, warehouses, depots, etc.) enable precise geospatial searches to be carried out on datasets within the portal.
The risk of project failure has fallen sharply, as projects get off the ground faster and with the right input data.