
Industry
Sensor data (IoT), financial transactions, plant information, anomaly detection

Monitoring and improving the quality of PI data
Our client, an industrial group with hundreds of subsidiaries worldwide, wanted to monitor and improve the quality of its PI data (PI = Plant Information, i.e. data emitted by sensors installed on production sites).
There were several aims:
- PI nomenclatures (assets, attributes, tags) that have clear naming rules and no duplicates, for better re-use of tags and cross-site analyses
- Implement a powerful monitoring system for PI Tags (= time series): real-time detection of missing and outlier data, defective sensor identification, etc.
- Feed the teams of Data Scientists with reliable data, which is an essential prerequisite for building consistent and efficient predictive models (forecasting, predictive maintenance, etc.).
Proposed solution
Harmonization of sensor nomenclature:
Tale of Data automatically matches texts (name, description, etc.) that differ in spelling, using advanced fuzzy-matching algorithms: phonetics (English / French), consonant (or vowel) frequency, word fragmentation (N-grams), and automatic word weighting, in which the least discriminating words are given a low weight.
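To make two of these ideas concrete, here is a minimal Python sketch that combines N-gram fragmentation with word weighting. It is an illustration only, not Tale of Data's implementation: the function names, the use of Jaccard similarity on character trigrams, the IDF-style weighting formula and the sample sensor descriptions are all assumptions chosen for this example.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, padded with spaces."""
    padded = f" {word.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def word_weights(descriptions):
    """IDF-style weights: words found in many descriptions get a low weight."""
    total = len(descriptions)
    doc_freq = Counter(w for d in descriptions for w in set(d.lower().split()))
    return {w: math.log(1 + total / df) for w, df in doc_freq.items()}


def directed_similarity(a, b, weights, default_weight=1.0):
    """Weighted average, over the words of a, of their best n-gram match in b."""
    words_a, words_b = a.lower().split(), b.lower().split()
    if not words_a or not words_b:
        return 0.0
    num = den = 0.0
    for wa in words_a:
        w = weights.get(wa, default_weight)
        num += w * max(ngram_similarity(wa, wb) for wb in words_b)
        den += w
    return num / den


def description_similarity(a, b, weights):
    """Symmetric score: mean of the two directed similarities."""
    return 0.5 * (directed_similarity(a, b, weights)
                  + directed_similarity(b, a, weights))


# Hypothetical tag descriptions; the second contains a spelling error.
descriptions = [
    "Pump inlet temperature sensor",
    "Pump inlet temprature sensor",
    "Boiler outlet pressure sensor",
    "Compressor vibration sensor",
]
weights = word_weights(descriptions)
score_typo = description_similarity(descriptions[0], descriptions[1], weights)
score_other = description_similarity(descriptions[0], descriptions[2], weights)
```

In this sketch the word "sensor", which appears in every description, receives the lowest weight, so it contributes little to the score, while the misspelled "temprature" still matches "temperature" closely because the two words share most of their trigrams.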
Monitoring of sensor data with Tale of Data's time series analysis algorithms:
- For each type of sensor, appropriate alert thresholds are set for measured values (temperature, pressure, etc.). Thresholds are derived from automatic analysis of several years of history
- For each type of sensor, appropriate alert thresholds are set for the time difference between two consecutive measurements. Thresholds are derived from automatic analysis of several years of history
- Automatic alerts are triggered when preset thresholds are exceeded or data is missing
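The monitoring logic above can be sketched in a few lines of Python. The source does not say which statistical rule is used to derive thresholds from history, so this example assumes a simple interquartile-range (IQR) rule; the function names, the `k` parameter and the sample data are hypothetical.

```python
import statistics
from datetime import datetime, timedelta


def iqr_bounds(values, k=1.5):
    """Return (low, high): the quartiles widened by k times the IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def learn_thresholds(history, k=1.5):
    """Derive thresholds from history, a time-sorted list of (timestamp, value)."""
    values = [v for _, v in history]
    gaps = [(t2 - t1).total_seconds()
            for (t1, _), (t2, _) in zip(history, history[1:])]
    value_low, value_high = iqr_bounds(values, k)
    _, max_gap = iqr_bounds(gaps, k)
    return {"value_low": value_low, "value_high": value_high,
            "max_gap_s": max_gap}


def check_readings(readings, thresholds):
    """Return (timestamp, reason) alerts for a time-sorted list of readings."""
    alerts = []
    previous = None
    for ts, value in readings:
        if not thresholds["value_low"] <= value <= thresholds["value_high"]:
            alerts.append((ts, "outlier value"))
        if (previous is not None
                and (ts - previous).total_seconds() > thresholds["max_gap_s"]):
            alerts.append((ts, "missing data"))
        previous = ts
    return alerts


# Hypothetical history: one reading per minute, values between 20.0 and 22.0.
start = datetime(2024, 1, 1)
history = [(start + timedelta(minutes=i), 20 + (i % 5) * 0.5)
           for i in range(100)]
thresholds = learn_thresholds(history)

# New readings: one abnormal value, then a ten-minute silence.
t0 = start + timedelta(minutes=100)
new_readings = [
    (t0, 21.0),
    (t0 + timedelta(minutes=1), 35.0),
    (t0 + timedelta(minutes=11), 21.0),
]
alerts = check_readings(new_readings, thresholds)
```

With a perfectly regular history, as here, the learned maximum gap equals the sampling interval, so any delay triggers an alert; a production system would presumably add a tolerance or let users adjust the thresholds.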
Gains
Harmonizing and deduplicating the labels made it possible to create a shared repository of PI metadata: Assets, Attributes, Tags.
This PI metadata repository, with its clear naming rules, has opened up many possibilities:
- Consistent system representation: items that represent the same type of equipment have the same set of attributes, with standardized names, descriptions and units of measure
- Easier multi-point analysis: the metadata allow aggregation or comparison of time series for monitoring, reporting or predictive analysis (Machine Learning)
In just a few weeks, time series analysis produced a fully automated monitoring system that continuously analyzes data from tens of thousands of sensors.
Highly targeted alerts have been set up, for example for sensors producing incorrect values or for anomalies in the time intervals between two measurements. Business users can reconfigure the alerts at any time without having to write any code.