By Jean-Christophe Bouramoué
After the generative AI shockwave, Jean-Christophe Bouramoué, creator of the data quality platform Tale of Data, is more pragmatic than ever.
Impact and reflection on generative AI
The earthquake caused by the spectacular progress of generative AI at the end of 2022 seems to have subsided somewhat, giving way to more pragmatic reflection and action on the part of those involved in and witnessing this revolution. After the understandable phase of amazement, the right questions are starting to be asked: what is the real impact of generative AI on productivity? What are the risks associated with putting these technologies into production? The answers are complex and highly domain-dependent, not to mention ethically challenging. It is clear, however, that we have reached a milestone in our thinking about our relationship with AI.
Tale of Data advances in 2024: integrating generative AI
So what about Tale of Data, the French data quality software vendor? Version 2024 of Tale of Data integrates generative AI, offering advanced features for auditing, analyzing, correcting and transforming data. Until now, our solution analyzed dataset columns entirely automatically; checking the consistency of rows, however, required the user to create business rules.
With the integration of generative AI, Tale of Data can now detect inconsistencies within a row without human intervention: for example, the Tale of Data audit can point out that an individual's postal address is not located in the same country as the one indicated by their telephone number. This is not to say that there should be no human intervention, quite the contrary, but the primary analysis can be carried out by the AI. Another important point: Tale of Data uses the power of generative AI only on a small sample of data (fewer than a hundred rows, which can be anonymized). Our platform then interprets and verifies the AI's responses in order to perform control and remediation processing on potentially massive data (billions of rows).
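To make the idea concrete, here is a minimal Python sketch of the pattern described above: a small, anonymized sample is sent to a generative AI, and its finding is then turned into a deterministic check that can be applied at scale. The helper names, the sample size and the anonymization logic are assumptions made for this example, not Tale of Data's actual implementation.

```python
# Illustrative sketch only, not Tale of Data's code: the sample size, the
# anonymize() step and the call_llm() helper are assumptions for this example.
import random

def anonymize(row: dict) -> dict:
    """Mask direct identifiers so only non-sensitive fields leave the platform (assumed step)."""
    masked = dict(row)
    for field in ("name", "email"):
        if field in masked:
            masked[field] = "***"
    return masked

def call_llm(prompt: str) -> str:
    """Stand-in for a call to a generative AI provider; returns a canned finding here."""
    return "Inconsistency: address country does not match the phone number's country code."

def audit_sample(rows: list[dict], sample_size: int = 100) -> str:
    """Ask the AI to flag inconsistencies on a small, anonymized sample only."""
    sample = random.sample(rows, min(sample_size, len(rows)))
    prompt = (
        "For each record below, report any internal inconsistency, for example a "
        "postal address located in a different country than the phone number suggests:\n"
        + "\n".join(str(anonymize(r)) for r in sample)
    )
    return call_llm(prompt)

# The platform then turns the finding into a deterministic check that can run on
# billions of rows without calling the AI again.
PHONE_PREFIX_TO_COUNTRY = {"+33": "FR", "+49": "DE", "+34": "ES"}  # illustrative subset

def phone_matches_address(row: dict) -> bool:
    return PHONE_PREFIX_TO_COUNTRY.get(row["phone"][:3]) == row["address_country"]

def flag_inconsistent_rows(rows: list[dict]) -> list[dict]:
    return [r for r in rows if not phone_matches_address(r)]
```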
The strengths of the approach we have chosen lie first and foremost in reconciling users' ability to express their needs in natural language with the requirement for data confidentiality. This is made possible by the fact that the user exposes only a tiny fraction of his or her data. Secondly, the cost of using ChatGPT and other generative AI services is kept under control, which is no mean feat given the input and output costs of API "tokens" (put simply, generative AI providers charge in proportion to the number of words contained in the questions asked and the answers produced). Bills from generative AI providers can quickly become prohibitive, whereas the cost of the Tale of Data solution is fixed. Finally, generalizing diagnosis and remediation operations from a small sample yields gains in execution speed and productivity far superior to those obtained by relying solely on generative AI.
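As an illustration of why sampling keeps the bill bounded, here is a back-of-envelope calculation. The per-token price and the tokens-per-row figure are assumptions chosen for the example, not actual provider rates.

```python
# Rough illustration of token-based pricing; all figures below are assumptions.
TOKENS_PER_ROW = 50          # assumed average prompt size per record
PRICE_PER_1K_TOKENS = 0.01   # assumed blended input/output price, in dollars

def llm_cost(rows_sent: int) -> float:
    return rows_sent * TOKENS_PER_ROW / 1000 * PRICE_PER_1K_TOKENS

print(llm_cost(100))            # a 100-row sample: $0.05
print(llm_cost(1_000_000_000))  # the full billion-row dataset: $500,000.00
```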
Users, aware of the strengths and weaknesses of generative AI, will be able to share the results of their work and interact with other teams in the company; this collaboration remains one of our strengths. With Tale of Data, all transformations are auditable and testable by users, and by the IT department if necessary. This avoids surprises during production rollouts and protects against the risk of LLM (Large Language Model) hallucinations.
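One way to picture this safeguard, as a sketch rather than Tale of Data's actual data model: each correction suggested by the AI is captured as an explicit, named rule with its own test cases, so it can be reviewed and validated before any production rollout.

```python
# Minimal sketch, assuming a rule-based design: an AI suggestion is never applied
# directly, it is first recorded as an explicit rule with its own tests. The Rule
# class and the example mapping below are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    description: str                    # human-readable, reviewable statement of intent
    apply: Callable[[str], str]
    test_cases: list[tuple[str, str]]   # (input, expected output) pairs

    def passes_tests(self) -> bool:
        return all(self.apply(given) == expected for given, expected in self.test_cases)

normalize_country = Rule(
    name="normalize_country_code",
    description="Map free-text country names to ISO 3166-1 alpha-2 codes",
    apply=lambda value: {"france": "FR", "germany": "DE"}.get(value.strip().lower(), value),
    test_cases=[("France", "FR"), (" germany ", "DE"), ("FR", "FR")],
)

# Reject the rule if a hallucinated mapping breaks one of its tests.
assert normalize_country.passes_tests()
```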
Towards the democratization of data use
Looking ahead, we can imagine other uses that will reinforce and extend our solution's many functions. That is what we are working on right now: democratizing usage by enabling users to express their expectations in their own words, while giving them all the necessary means of control.