Product catalog data reliability
Product catalog quality
Optimization of a repository / database
Identification of duplicates
The need
Our customer is a major player in the retail sector.
The quality of the information contained in its product repository is the prerequisite for the success of all its other projects, such as those involving its customer and supplier databases and its e-commerce website.
Improving the quality of the data in its product database, and maintaining its reliability over time, is therefore a major challenge.
The Tale of Data solution has made it possible to make this data reliable and to solve the following problems:
- Deduplicate products using specific business rules written in natural language
- Detect outliers and normalize or rectify them, without writing code
- Standardize product descriptions: color, material, units, etc., which may differ from one supplier to another
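To make the standardization problem concrete, here is a minimal Python sketch of what harmonizing supplier values can look like. It is not how Tale of Data works internally (the solution is no-code); the synonym table, the unit table, and the function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical synonym table: supplier spellings mapped to one canonical value.
COLOR_SYNONYMS = {
    "navy": "blue",
    "navy blue": "blue",
    "bleu": "blue",
    "grey": "gray",
    "gris": "gray",
}

# Hypothetical unit table: factor used to convert each unit to grams.
UNIT_TO_GRAMS = {"g": 1.0, "kg": 1000.0, "mg": 0.001}


def normalize_color(raw: str) -> str:
    """Lowercase, trim, collapse whitespace, then map known synonyms."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return COLOR_SYNONYMS.get(key, key)


def weight_in_grams(raw: str) -> float:
    """Parse strings such as '1,5 kg' or '250G' and convert them to grams."""
    match = re.fullmatch(r"\s*(\d+(?:[.,]\d+)?)\s*([a-zA-Z]+)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized weight: {raw!r}")
    value = float(match.group(1).replace(",", "."))
    unit = match.group(2).lower()
    if unit not in UNIT_TO_GRAMS:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * UNIT_TO_GRAMS[unit]
```

With these assumed tables, "Navy Blue" and "bleu" from two different suppliers both end up as "blue", and "1,5 kg" and "1500 g" both become the same weight in grams.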
Proposed solution
Tracking down duplicates is the first step in any data quality project.
Thanks to its embedded Artificial Intelligence engine, Tale of Data can automatically match texts with similar spelling. This is one of its special features, impossible with traditional tools.
To illustrate, an exact comparison in Excel does not recognize "logiciel" and "logitiel" (a misspelling of the French word for "software") as duplicates; matching such near-identical spellings is the added value of the Tale of Data solution.
Tale of Data integrates a range of strategies and algorithms for this initial duplicate-detection work: approximate matching, consonant or vowel frequency, fragmentation, and automatic word weighting.
All these methods can help you find hidden duplicates!
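As an illustration of approximate matching in general, the sketch below scores string similarity with Python's standard difflib module. This is a generic technique, not Tale of Data's own engine; the function names and the 0.85 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(names, threshold=0.85):
    """Compare every pair of names and keep those at or above the threshold.

    The threshold is an assumed value; in practice it would be tuned
    against the catalog being cleaned.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# "logiciel" and "logitiel" share 7 of their 8 characters in order,
# so their score is 2 * 7 / 16 = 0.875.
print(likely_duplicates(["logiciel", "logitiel", "clavier"]))
```

Comparing every pair is quadratic in the number of names, so this sketch only suits small lists; a real catalog would need candidates to be grouped first.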
Once duplicates, triplicates and quadruplicates have been detected, data quality operations are automated: rectification, homogenization, fuzzy name joins, deduplication.
The desired validation rules are implemented via Tale of Data's business rules engine.
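In Tale of Data these rules are written through the interface, without code. For readers who think in code, a validation rule can be pictured as a named check applied to each product record. The sketch below is only an analogy: the rule names, field names, and allowed units are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record is valid


# Hypothetical set of accepted units, for illustration only.
ALLOWED_UNITS = {"g", "kg", "l", "ml"}

# Hypothetical rules; a real catalog would define its own.
RULES = [
    Rule("description is not empty",
         lambda p: bool((p.get("description") or "").strip())),
    Rule("price is positive",
         lambda p: isinstance(p.get("price"), (int, float)) and p["price"] > 0),
    Rule("unit is allowed",
         lambda p: p.get("unit") in ALLOWED_UNITS),
]


def violations(product: dict) -> list[str]:
    """Return the names of all rules that the product record breaks."""
    return [rule.name for rule in RULES if not rule.check(product)]
```

A record such as {"description": " ", "price": -1, "unit": "kg"} would be reported as breaking the first two rules, which is the kind of outlier that can then be rectified or sent for review.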
No scripting specialists were required: everything was done via the solution's interface and the rules written via the menus. The use of a no-code tool containing ready-to-use functions enabled the business teams to work quickly, without needing to mobilize skills outside their own.
Finally, alert dashboards are created to prevent any decline in data quality.
The ability to automate and schedule processing guarantees the continuity of corrections and prevents data degradation over time.
Data quality is sustainable.
Benefits
Harmonizing the product catalog through deduplication and business rules has produced a high-quality repository available to the whole company, a prerequisite and foundation for other data projects.
Production went live in just a few weeks, and the monitoring system was automated.
This essential step has sped up the publication of corrections on the e-commerce website, a project that had not previously seemed feasible.