

Quality control of a product catalog
Optimization of a repository / database
Identification of duplicates


Reliability of the data in a 'Products' catalog


Our client is a major player in distribution.

The quality of the information contained in its product repository is the prerequisite for the success of all other projects, such as those on the customer and supplier databases and its commercial website.

Improving the quality of the data in its product database and maintaining its reliability over time is therefore a major challenge.

The Tale of Data solution has helped solve the following problems:

  • Deduplicate products using the client's specific rules ("business rules"), written in natural language

  • Detect outliers in order to normalize or adjust them, without writing code

  • Standardize product descriptions (color, material, units, and so on), which sometimes differ from one supplier to another
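
To make the standardization problem concrete, here is a minimal Python sketch of what normalizing supplier-specific values can look like. It is not how Tale of Data works internally (the solution is no-code and its rules are not described here); the synonym table, the unit table, and both function names are assumptions made up for illustration.

```python
import re

# Hypothetical lookup tables: a real catalog would maintain its own.
COLOR_SYNONYMS = {"bleu": "blue", "blu": "blue", "rouge": "red", "rd": "red"}
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def standardize_color(raw: str) -> str:
    """Map a supplier-specific color label to one canonical spelling."""
    key = raw.strip().lower()
    return COLOR_SYNONYMS.get(key, key)


def standardize_length_mm(raw: str):
    """Convert strings such as '14 cm' or '0,5 m' to millimetres.

    Returns None when the value cannot be parsed, so that it can be
    reviewed by hand instead of being silently guessed.
    """
    match = re.fullmatch(r"\s*(\d+(?:[.,]\d+)?)\s*(mm|cm|m)\s*", raw.lower())
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    return value * UNIT_TO_MM[match.group(2)]
```

With these tables, `standardize_color(" Bleu ")` and `standardize_color("blue")` both give the same canonical value, which is what allows two suppliers' descriptions of the same pen to be compared at all.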


Harmonizing the product catalog through deduplication and the creation of business rules made it possible to build a high-quality repository, available to the entire company: a prerequisite and foundation for other data projects.

The move to production was completed in a few weeks, and the monitoring system was automated.

This essential step accelerated the rollout of corrections on the commercial website, a project that had not seemed achievable at the time.

Proposed solution

Duplicate tracking is the first step in a data quality project.


Thanks to its Artificial Intelligence engine, Tale of Data can automatically match texts whose spellings are similar. This is one of its distinguishing features, something traditional tools cannot do.

For example, Excel cannot detect that the words "logiciel" and "logitiel" are duplicates; this is where the Tale of Data solution adds value.
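
To show why an exact comparison misses this pair while an approximate one catches it, here is a minimal Python sketch using the standard library's `difflib.SequenceMatcher`. This is a generic stand-in, not Tale of Data's engine; the function names and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two labels as likely duplicates (the threshold is an assumption)."""
    return similarity(a, b) >= threshold


print("logiciel" == "logitiel")                      # False: exact comparison fails
print(similarity("logiciel", "logitiel"))            # 0.875
print(is_probable_duplicate("logiciel", "logitiel")) # True
```

The two words share 7 of their 8 characters in order, hence the score of 0.875. In practice the threshold has to be tuned on real catalog data: too low and distinct products are merged, too high and misspellings slip through.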

For this first stage of duplicate detection, Tale of Data integrates a range of strategies and algorithms: approximate matching, consonant or vowel frequency, fragmentation, and automatic word weighting.
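
The exact algorithms used by Tale of Data are not described here, so the Python sketch below only illustrates two of the general ideas named above with common textbook techniques: fragmentation is approximated by character trigrams compared with a Jaccard index, and automatic word weighting by an IDF-style weight that gives rare words more influence. All function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(text: str, n: int = 3) -> set:
    """Fragment a label into overlapping character n-grams (padded with spaces)."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard index of the two n-gram sets: shared fragments / all fragments."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def word_weights(labels: list) -> dict:
    """Weight each word by rarity across the catalog (IDF-style)."""
    doc_freq = Counter(w for label in labels for w in set(label.lower().split()))
    total = len(labels)
    return {w: math.log(1 + total / df) for w, df in doc_freq.items()}


def weighted_overlap(a: str, b: str, weights: dict) -> float:
    """Share of total word weight that the two labels have in common."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    shared = sum(weights.get(w, 1.0) for w in wa & wb)
    total = sum(weights.get(w, 1.0) for w in wa | wb)
    return shared / total if total else 0.0
```

Word weighting matters in a product catalog because generic words such as "pen" appear in thousands of labels and say little, whereas a rare brand or reference that two labels share is strong evidence of a duplicate.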

All these methods allow you to find well-hidden duplicates!

Once duplicates, triplicates, and quadruplicates have been detected, the data quality operations are automated: rectification, homogenization, fuzzy joins on names, and deduplication.
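
Duplicates do not always come in pairs, so pairwise matches have to be merged into groups before one record per group can be kept. The Python sketch below shows one common way to do this, a union-find over all pairs whose similarity exceeds a threshold. It is an illustrative assumption, not the product's actual method; `group_duplicates` and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def group_duplicates(labels: list, threshold: float = 0.85) -> list:
    """Group labels that are linked, directly or transitively, by a fuzzy match."""
    parent = list(range(len(labels)))

    def find(i: int) -> int:
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; fine for a sketch, too slow for a very large catalog.
    for i, j in combinations(range(len(labels)), 2):
        score = SequenceMatcher(None, labels[i].lower(), labels[j].lower()).ratio()
        if score >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(find(i), []).append(label)
    # Keep only groups with at least two members: those are the duplicates.
    return [g for g in groups.values() if len(g) > 1]


print(group_duplicates(["logiciel", "logitiel", "logicial", "stylo"]))
# [['logiciel', 'logitiel', 'logicial']]
```

Here "logitiel" and "logicial" are not similar enough to match each other directly, but both match "logiciel", so the three end up in one group: a triplicate. Whether such transitive grouping is desirable depends on the catalog, since long chains of near-matches can link records that are in fact different products.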


The validation rules are implemented via Tale of Data's business rules.


No scripting specialist was required: everything was done through the solution's interface, and the rules were written via its menus. Using a no-code tool with ready-to-use functions and rules allowed the business teams to work quickly, without needing to call on skills outside their own.

Finally, alert dashboards are created to prevent any decline in data quality.

The ability to automate and schedule processing guarantees that corrections last and prevents data degradation over time.

The quality of the data is sustainable.

Other applications are possible
Do not hesitate to contact us
