Generative AI and data quality: a virtuous circle for innovation
Industry
Monitoring and improving data quality from sensors
Sensor data (IoT), Financial transactions, Plant Information, Anomaly detection
The need
Our customer, an industrial group with hundreds of subsidiaries worldwide, wanted to monitor and improve the quality of its PI data (PI = Plant Information: data emitted by sensors installed on production sites).
The objectives were manifold:
- Establish PI nomenclatures (Assets, Attributes, Tags) with clear naming rules and no duplicates, to enable better tag reuse and cross-site analysis.
- Set up a high-performance monitoring system for PI Tags (= time series): real-time detection of missing or aberrant data, identification of faulty sensors, etc.
- Supply teams of Data Scientists with reliable data, an essential prerequisite for building coherent, high-performance predictive models (forecasting, predictive maintenance, etc.).
Proposed solution
Harmonization of sensor nomenclature:
Tale of Data automatically matches texts (name, description, etc.) with spelling differences using advanced fuzzy matching algorithms: phonetics (English/French), consonant (or vowel) frequency, word fragmentation (N-Gram), and automatic word weighting: less discriminating words are given a lower weight.
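To make the idea concrete, here is a minimal sketch of two of the ingredients mentioned above: word fragmentation into character N-grams, and automatic word weighting. It is not Tale of Data's implementation. The function names (`char_ngrams`, `word_weights`, `similarity`), the IDF-style weighting formula and the weighted Jaccard score are all assumptions chosen for illustration, and the phonetic and letter-frequency techniques are left out.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Split a word into overlapping character n-grams (padded with spaces)."""
    padded = f" {word} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def word_weights(labels):
    """Assumed IDF-style weighting: words found in many labels weigh less."""
    doc_freq = Counter()
    for label in labels:
        doc_freq.update(set(label.lower().split()))
    total = len(labels)
    weights = {w: math.log((1 + total) / (1 + df)) + 1.0
               for w, df in doc_freq.items()}
    # Words never seen in the corpus are treated as maximally discriminating.
    default = math.log(1 + total) + 1.0
    return weights, default


def similarity(a, b, weights, default, n=3):
    """Weighted Jaccard score over character n-grams, between 0 and 1."""
    def weighted_grams(label):
        grams = {}
        for word in label.lower().split():
            w = weights.get(word, default)
            for gram in char_ngrams(word, n):
                # If two words share an n-gram, keep the higher weight.
                grams[gram] = max(grams.get(gram, 0.0), w)
        return grams

    ga, gb = weighted_grams(a), weighted_grams(b)
    keys = set(ga) | set(gb)
    if not keys:
        return 0.0
    shared = sum(min(ga.get(k, 0.0), gb.get(k, 0.0)) for k in keys)
    overall = sum(max(ga.get(k, 0.0), gb.get(k, 0.0)) for k in keys)
    return shared / overall


# Hypothetical tag descriptions, invented for this example.
labels = [
    "Pump inlet temperature",
    "Pump inlet temprature",   # same tag, misspelled
    "Pump outlet pressure",
    "Pump motor speed",
]
weights, default = word_weights(labels)
typo_score = similarity(labels[0], labels[1], weights, default)
other_score = similarity(labels[0], labels[2], weights, default)
```

In this sketch the word "pump" appears in every label and so contributes little to the score, while the misspelled description still shares most of its N-grams with the correct one. A real deployment would typically combine several such scores and apply a match threshold tuned on the customer's data.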
Monitoring sensor data with Tale of Data's time series analysis algorithms:
- Determination, by sensor type, of appropriate alert thresholds for measured values (temperature, pressure, etc.): these thresholds were obtained by running an automatic analysis over several years of historical data.
- Determination, by sensor type, of appropriate alert thresholds for time gaps between two measurements: these thresholds were obtained by running an automatic analysis over several years of historical data.
- Automatic alerts when thresholds are exceeded or data is missing
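The three steps above can be sketched as follows. The source does not say how the thresholds are derived from the historical data, so this example assumes a common statistical choice, interquartile-range (IQR) fences. The function names (`iqr_bounds`, `check_sensor`), the factor `k=1.5` and the sample values are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(history, k=1.5):
    """Derive (low, high) alert thresholds from historical observations.

    Assumption: values outside Q1 - k*IQR .. Q3 + k*IQR count as abnormal.
    """
    q1, _, q3 = quantiles(history, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def check_sensor(readings, value_bounds, max_gap_seconds):
    """Scan time-ordered (timestamp_seconds, value) pairs and collect alerts."""
    low, high = value_bounds
    alerts = []
    previous_ts = None
    for ts, value in readings:
        # Missing data: too much time elapsed since the previous measurement.
        if previous_ts is not None and ts - previous_ts > max_gap_seconds:
            alerts.append((ts, "gap", ts - previous_ts))
        # Aberrant data: measured value outside the learned thresholds.
        if not low <= value <= high:
            alerts.append((ts, "value", value))
        previous_ts = ts
    return alerts


# Hypothetical history for one sensor type (temperatures and gaps in seconds).
past_values = [20.0, 20.5, 21.0, 19.5, 20.2, 20.8, 19.9, 20.4]
past_gaps = [58, 60, 61, 59, 60, 62, 60, 59]

value_bounds = iqr_bounds(past_values)
_, max_gap = iqr_bounds(past_gaps)  # only the upper fence matters for gaps

new_readings = [(0, 20.1), (60, 20.3), (120, 35.0), (900, 20.2)]
alerts = check_sensor(new_readings, value_bounds, max_gap)
```

With this sample data, the reading of 35.0 is flagged as an out-of-range value and the 780-second silence before the last reading is flagged as a gap. In practice the thresholds would be computed once per sensor type over several years of history, then applied to incoming measurements.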
Benefits
The harmonization of wording and the removal of duplicates have enabled the creation of a shared PI metadata repository: Assets, Attributes, Tags.
This PI metadata repository, with clear naming rules, has opened up many new possibilities:
- Consistent system representation: same set of attributes for elements representing the same type of equipment, with standardized names, descriptions and units of measurement.
- Easier "multi-point" analyses: standardized metadata enables time series to be aggregated or compared, whether for monitoring, reporting or predictive analysis (Machine Learning).
By analyzing time series, we were able to put a fully automated monitoring system into production in just a few weeks, continuously analyzing data from tens of thousands of sensors.
Alerts on very specific conditions have been set up (sensors emitting erroneous values or showing anomalies in the time intervals between two measurements). These alerts can be reconfigured at any time by business users, without writing any code.