Complete your data on millions of lines
Enrich your data in several ways and choose the strategy that best fits your context and needs at the time:
enrichment from repositories,
enrichment by joining two sources,
enrichment using fuzzy logic and/or phonetics.
Enrich your data with repositories
Repositories are, in essence, enrichment data.
Data enrichment from repositories allows you to cross-reference and complete your data with internal or external information. It is an important step toward better data quality.
Expanding your data in a few clicks, without ever writing a single line of code, is one of the distinctive strengths of Tale of Data.
Leverage all available repositories
your own, by cross-referencing information from different sources,
your organization's internal repositories, produced by other departments,
or repositories external to your organization, such as commercial databases or open data sets. Tale of Data provides its users with a large catalog of public data (SIRENE, IBAN, LEI, ...).
Repository-based reconciliation in Tale of Data allows you to use several matching strategies:
Exact matches: this is the ideal case. The two sources share a common piece of information, so you can easily build a bridge between your data and the repository.
Fuzzy matches (e.g. phonetic, similarity score): this is the option to choose if your data is likely to contain spelling errors.
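To illustrate the two strategies, here is a minimal Python sketch. It is not how Tale of Data works internally: the repository content, the `enrich` function and the 0.85 cutoff are hypothetical, and the standard-library `difflib` module stands in for the fuzzy matching engine.

```python
import difflib

# Hypothetical repository: company name -> identifier (illustrative values only).
REPOSITORY = {
    "ACME INDUSTRIES": "ID-001",
    "GLOBEX CORPORATION": "ID-002",
    "INITECH": "ID-003",
}

def enrich(name, cutoff=0.85):
    """Return (identifier, strategy) for a name, or (None, "none")."""
    key = name.strip().upper()
    # Exact match: the cleaned name is a key of the repository.
    if key in REPOSITORY:
        return REPOSITORY[key], "exact"
    # Fuzzy match: closest repository key whose similarity reaches the cutoff.
    candidates = difflib.get_close_matches(key, list(REPOSITORY), n=1, cutoff=cutoff)
    if candidates:
        return REPOSITORY[candidates[0]], "fuzzy"
    return None, "none"

print(enrich("Acme Industries"))    # found through the exact strategy
print(enrich("Globex Corporaton"))  # misspelled, found through the fuzzy strategy
print(enrich("Unknown Ltd"))        # no match in the repository
```

Trying the exact strategy first keeps the fuzzy comparison, which is more expensive and less certain, for the records that really need it.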
Make the most of your repositories with Tale of Data
Tale of Data repositories have two major advantages:
they can be shared with other users,
they offer excellent performance because they are automatically indexed.
A piece of information can be retrieved in a few milliseconds in a repository containing hundreds of millions of lines.
This makes it possible to enrich data in bulk in a short time.
Enrichment by join
Join enrichment is a solution that allows you to join multiple files using a common key.
This function is easy to use and offers the user many combinations:
types of joins (see illustration on the right)
join conditions: equal, different, greater than (strictly or not), less than (strictly or not), ...
Thanks to this function, you can easily enrich your data with additional information from different sources, without having to write a script. Tale of Data lets you combine, in a single process, Excel or CSV files with database tables.
This feature lets non-technical or less technical users process large data sets quickly and efficiently, without programming skills.
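For readers who want to see what a join type and a join condition mean in practice, here is a small pure-Python sketch. The data, the column names and the `join` helper are hypothetical and only illustrate the concepts. In Tale of Data the same result is obtained without code.

```python
# Hypothetical data: customers from a CSV/Excel file, orders from a database.
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 3, "name": "Chloe"},
]
orders = [
    {"order_id": "A", "customer_id": 1, "amount": 120},
    {"order_id": "B", "customer_id": 1, "amount": 80},
    {"order_id": "C", "customer_id": 3, "amount": 45},
]
tiers = [
    {"tier": "standard", "min_amount": 0},
    {"tier": "premium", "min_amount": 100},
]

def join(left, right, condition, how="inner"):
    """Join two lists of dicts; condition(l, r) decides whether two rows pair up.

    how="inner" keeps only paired rows; how="left" also keeps left rows
    without a partner (the right-hand columns are then simply absent).
    """
    result = []
    for l in left:
        matches = [r for r in right if condition(l, r)]
        if matches:
            result.extend({**r, **l} for r in matches)
        elif how == "left":
            result.append(dict(l))
    return result

def same_customer(l, r):
    # "Equal" condition on a common key.
    return l["customer_id"] == r["customer_id"]

def reaches_tier(order, tier):
    # "Greater than or equal" condition, no common key involved.
    return order["amount"] >= tier["min_amount"]

inner = join(customers, orders, same_customer, how="inner")  # Bob is dropped
left = join(customers, orders, same_customer, how="left")    # Bob is kept, without order columns
tiered = join(orders, tiers, reaches_tier)                   # each order paired with every tier it reaches
```

This nested-loop version compares every pair of rows, which is fine for a demonstration but not for millions of lines.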
Fuzzy logic enrichment
Fuzzy logic is a complementary method to data enrichment by join.
Whereas the join strategy requires a common key between your datasets, fuzzy logic frees you from this constraint.
Reconcile and enrich your data using similar, rather than identical, values, still without writing a line of code.
Approximate spelling (one or more differences), phonetics, ignoring case, accents, spaces, ...: whatever strategy and function you use, Tale of Data detects 'approximate' terms and correlates data from different sources, even without a common key.
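As an illustration of two of these strategies, the sketch below normalizes case, accents and spaces, and computes a phonetic key. It assumes the classic American Soundex algorithm as the phonetic method, which is a common choice and not necessarily the one Tale of Data uses. The function names are hypothetical.

```python
import unicodedata

def normalize(text):
    """Ignore case, accents and spaces: 'Société Générale' -> 'societegenerale'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(without_accents.casefold().split())

# Soundex digit for each consonant; vowels, h, w and y have no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

def soundex(word):
    """Classic American Soundex key: first letter followed by three digits."""
    letters = [c for c in normalize(word) if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Adjacent letters with the same digit are coded only once.
        if digit and digit != previous:
            code += digit
        # h and w do not separate two letters that share a digit.
        if ch not in "hw":
            previous = digit
    return (code + "000")[:4]

# Two spellings that differ in case, accents and spacing normalize to the same key.
print(normalize("  Société  Générale ") == normalize("SOCIETE GENERALE"))
# Two names that sound alike share the same phonetic key.
print(soundex("Robert"), soundex("Rupert"))
```

Records from two sources can then be grouped by normalized value or by phonetic key, which gives them a shared key they did not have in their raw form.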
The advantage of the confidence index in matching
The confidence index measures the reliability of a fuzzy join. It ranges from 0 to 1.
If the index is 1, the join between your two sources is 100% reliable: all the joined/replaced fields are identical.
If the index is between 0.85 and 0.99, the reconciliations proposed by the solution should be reviewed and decided on a case-by-case basis. There may be, for example, only one letter of difference (Smith and Smyth) and, in spite of this difference, the two records refer to the same data. It then makes sense to match them.
Finally, if the index is below 0.85, the join is unreliable. The reconciled fields differ widely and reviewing them is unlikely to be worthwhile. Tale of Data lets you leave these records unmatched.
In other cases, a one-letter difference is legitimate and does not come from an input error. This is the case, for example, with Vitalis and Vitalys: the confidence index will be high if you match on the name alone, even though they are two different companies.
The confidence index thus helps the user make reconciliation decisions.
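The decision rule described above can be sketched in a few lines of Python. This is only an illustration: the similarity ratio from the standard-library `difflib` module stands in for the confidence index, whose actual formula in Tale of Data is not described here, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

def confidence(a, b):
    """Stand-in confidence index between 0 and 1 (difflib similarity ratio)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def decision(index, review_threshold=0.85):
    """Map a confidence index to the three cases described in the text."""
    if index == 1.0:
        return "accept"   # identical fields, fully reliable join
    if index >= review_threshold:
        return "review"   # close match, to be decided case by case
    return "reject"       # unreliable join, records left unmatched

for left, right in [("Smith", "Smith"), ("Vitalis", "Vitalys"), ("Vitalis", "Globex")]:
    index = confidence(left, right)
    print(left, right, round(index, 3), decision(index))
```

With this stand-in measure, Vitalis and Vitalys fall into the review band. A high score on the name alone does not prove that two records describe the same company, so a person still has to make the final call.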