By Jean-Christophe Bouramoué
Effective data monitoring has become indispensable for anticipating anomalies and guaranteeing consistent quality. With two new features, advanced statistics and custom natures, Tale of Data strengthens your data observability. These tools help you understand the health of your data, quickly identify significant deviations and configure alerts tailored to your needs. As a result, you have more precise and proactive control over your information systems.
I - Advanced statistics in the Mass Discovery module
We've added an advanced statistics calculation to Tale of Data's data discovery module.
In concrete terms, when you run an analysis on thousands or even millions of tables, Tale of Data collects, for each column, the number of distinct values, the minimum, maximum, average, standard deviation and various percentiles. This information complements the data quality statistics and semantic analysis (e.g. identification of sensitive or personal data) that the Data Discovery module already provides.
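To make the list of per-column statistics concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not Tale of Data's implementation: the function name `column_profile`, the dictionary keys, the choice of percentiles (5th, 50th, 95th) and the use of the population standard deviation are all assumptions.

```python
import statistics


def column_profile(values):
    """Summarize one numeric column; None marks a missing cell (assumption)."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")
    # quantiles(n=100) returns the 99 cut points P1..P99.
    pct = statistics.quantiles(present, n=100, method="inclusive")
    return {
        "distinct": len(set(present)),
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        "stdev": statistics.pstdev(present),  # population standard deviation
        "p05": pct[4],
        "p50": pct[49],
        "p95": pct[94],
    }


profile = column_profile([1, 2, 2, 3, 4, None, 5])
```

A profile of this kind, stored at each analysis run, is what makes it possible to compare a column against its own history.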
This new feature offers two advantages:
A much more accurate snapshot (by column) of the actual state of your data.
A new range of possibilities in terms of Data Observability: one of the aims of Data Observability is to provide a precise mapping of the health of your data, and to trigger alerts when quality indicators exceed certain thresholds.
In Tale of Data, you can now trigger alerts on much more specific events. Here are a few examples:
I want to receive an alert when the number of distinct values in a column falls below twenty, indicating that something has gone wrong with a data import process.
I want to receive an alert when the 95th percentile of my column (the value above which the highest 5% of values lie) rises above a given limit, which suggests that outliers have appeared in my dataset.
I want to receive an alert when the standard deviation for a given column (numerical or date type) has fallen sharply. This may mean that some processing has led to a regression that has produced an unusual distribution on that column.
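The three example alerts can be sketched as rules evaluated against a column's previous and current statistics. This is a hypothetical illustration: `AlertRule`, `evaluate`, the dictionary keys and the numeric thresholds (1000 for the 95th percentile, a 50% drop for the standard deviation) are assumptions chosen for the example, not Tale of Data settings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    name: str
    # Receives (previous_stats, current_stats) and returns True when the alert fires.
    triggered: Callable[[dict, dict], bool]


RULES = [
    AlertRule("distinct values below 20",
              lambda prev, cur: cur["distinct"] < 20),
    AlertRule("95th percentile above 1000",
              lambda prev, cur: cur["p95"] > 1000),
    AlertRule("stdev dropped by more than half",
              lambda prev, cur: cur["stdev"] < 0.5 * prev["stdev"]),
]


def evaluate(prev, cur, rules=RULES):
    """Return the names of all rules that fire for this column."""
    return [rule.name for rule in rules if rule.triggered(prev, cur)]


previous = {"distinct": 48, "p95": 870.0, "stdev": 120.0}
current = {"distinct": 12, "p95": 910.0, "stdev": 40.0}
alerts = evaluate(previous, current)
```

Passing both the previous and the current statistics lets a rule express either an absolute threshold or a relative change, such as a sharp fall in standard deviation.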
II - Adding custom natures
As standard, Tale of Data is capable of recognizing nearly fifty "natures" of data. In fact, by analyzing thousands of structured files or database tables, Tale of Data automatically provides a precise mapping of the columns in which telephone numbers, e-mails, IBANs, surnames, first names, etc. are present.
Tale of Data also provides quality statistics on these columns: the percentage of missing data and the percentage of invalid data (e.g. malformed emails).
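As a rough sketch of these two quality statistics, the snippet below computes the share of missing and invalid cells in an email column. The function `quality_stats`, the deliberately simple `EMAIL_RE` pattern, and the choice to express both percentages over the total number of cells are assumptions made for illustration.

```python
import re

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email(cell):
    return EMAIL_RE.fullmatch(cell) is not None


def quality_stats(cells, is_valid):
    """Percentage of missing and invalid cells, both over all cells."""
    total = len(cells)
    if total == 0:
        return {"missing_pct": 0.0, "invalid_pct": 0.0}
    filled = [c.strip() for c in cells if c is not None and c.strip() != ""]
    missing = total - len(filled)
    invalid = sum(1 for c in filled if not is_valid(c))
    return {
        "missing_pct": 100 * missing / total,
        "invalid_pct": 100 * invalid / total,
    }


stats = quality_stats(["a@example.com", "", None, "not-an-email"], is_email)
```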
What's new is that you can now define your own data natures and benefit from the massive analysis and statistics offered by Tale of Data.
Tale of Data offers three ways of defining custom natures:
Specify a list of values: For example, you can define a "Color" type for which the list of permitted values is white, yellow, orange, red, blue, green, brown, gray and black. Tale of Data will then identify columns of type "Color" and report the number of cells whose value does not belong to the specified list.
Specify a regular expression: For example, if detecting license plates in datasets scattered across your information system is important to you, you can specify in Tale of Data that a French license plate consists of 2 letters followed by a dash, then 3 digits, another dash, then 2 letters. Tale of Data will then be able to search tens of thousands of datasets for columns containing license plates.
Provide a script: This last option matters when calculations are needed to establish that the data is valid. For example, if you're looking for datasets containing intra-Community VAT numbers, a script has to check several rules to identify and rigorously validate this type of data. In France, the VAT number consists of the code FR followed by 11 digits: a 2-digit check key (to be verified with an algorithm) followed by the company's 9-digit SIREN number.
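The three kinds of definition can be sketched in Python as three validators. This is an illustration under stated assumptions, not the syntax Tale of Data expects. All function names are hypothetical. The plate pattern is a simplified version of the current French format (two letters, three digits, two letters, e.g. AB-123-CD) and ignores excluded letter combinations. The VAT check uses the commonly documented key formula, key = (12 + 3 × (SIREN mod 97)) mod 97, handles numeric keys only, and does not validate the SIREN itself.

```python
import re

# 1. List of values: the "Color" nature from the example above.
COLORS = {"white", "yellow", "orange", "red", "blue",
          "green", "brown", "gray", "black"}


def is_color(cell):
    return cell.strip().lower() in COLORS


# 2. Regular expression: simplified French plate, e.g. AB-123-CD.
PLATE_RE = re.compile(r"[A-Z]{2}-[0-9]{3}-[A-Z]{2}")


def is_french_plate(cell):
    return PLATE_RE.fullmatch(cell.strip()) is not None


# 3. Script: French VAT number = "FR" + 2-digit key + 9-digit SIREN.
VAT_RE = re.compile(r"FR([0-9]{2})([0-9]{9})")


def is_french_vat(cell):
    match = VAT_RE.fullmatch(cell.replace(" ", "").upper())
    if match is None:
        return False
    key, siren = int(match.group(1)), int(match.group(2))
    # Check key derived from the SIREN (numeric keys only).
    return key == (12 + 3 * (siren % 97)) % 97


def invalid_count(cells, is_valid):
    """Number of cells that do not satisfy the nature's validator."""
    return sum(1 for cell in cells if not is_valid(cell))


bad_colors = invalid_count(["red", "purple", "Blue"], is_color)
```

The first two validators only test membership or shape, while the third has to compute something from the value, which is why a script is needed for cases like VAT numbers.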
Customized natures enable you to adapt Tale of Data's analysis and monitoring capabilities to your data typology. This gives you a powerful means of triggering alerts on data anomalies specific to your business before they impact the smooth running of your company.
Conclusion: proactive monitoring for optimal, risk-free data quality
Advanced statistics and custom nature features bring a new dimension to Data Observability, enabling you to examine your data with even greater precision. Thanks to these tools, you can not only monitor the evolution of your data in real time, but also configure specific alerts based on defined criteria, such as anomalies in standard deviations or outliers. This level of monitoring enables you to anticipate potential problems before they impact your business processes, guaranteeing optimum data quality while reducing risks.
Tale of Data offers you a proactive way of controlling your data, improving decision-making and limiting downtime due to anomalies. To find out more about the importance of data quality in data governance and its strategic role, read our article on data quality, a major pillar of data governance.