By Jean-Christophe Bouramoué
As the amount of data generated by businesses continues to grow exponentially, the traditional centralized approach to managing this data is becoming less and less appropriate.
Until now, the methodologies employed involved a centralized team responsible for collecting, storing and maintaining data, and a set of data consumers who used this data to make decisions.
The centralized approach has reached its limits for many organizations, as it leads to:
Data silos, making it difficult, if not impossible, to reuse data in other contexts.
Insufficient or even inconsistent data quality, as data quality is assessed by centralized teams independently of any context of use. In reality, data quality is meaningless in absolute terms: it depends on the context and needs of data consumers.
Excessive difficulties and delays for consumers in finding and retrieving the data they need. This means that the centralized approach becomes impractical as the quantity of datasets increases (i.e. failure to scale).
In recent years, a new paradigm of data organization has emerged, known as Data Mesh.
Zhamak Dehghani introduced the Data Mesh concept in 2018 and was the first to propose a paradigm shift in big data management based on data decentralization.
What is Data Mesh?
Data Mesh is a new way of organizing data that aims to overcome the limitations of traditional approaches. Rather than having a centralized team in charge of managing all the data, the Data Mesh proposes a decentralized approach by transferring ownership of the data to the data producers, who are the real experts in the field.
Data producers are grouped by functional area. Each functional area is responsible for managing its own data.
In this context, the term functional area should be understood in the business sense. These include, for example, marketing, sales, customer service, human resources, risk or compliance departments, anti-fraud departments, finance departments, etc.
Each functional area has its own data team, responsible for collecting, storing and maintaining data relating to that area. This data team works closely with the domain's business experts to ensure that data is accurate, relevant and up-to-date.
Data Mesh introduces the concept of Data as a Product. Instead of using data as a by-product of a process, data becomes THE product.
A Data as a Product is a self-contained, reusable unit of data.
Self-contained means that the product includes everything it needs to be directly usable: the dataset itself, its metadata (i.e. information about the data), information about its quality, as well as the infrastructure required to operate it (i.e. to keep it continuously updated and accessible).
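To make this definition more concrete, here is a minimal Python sketch of what such a self-contained unit could look like. It is an illustration only, not a reference implementation: the class name DataAsAProduct, its fields, the load_customers function and the "completeness" indicator are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class DataAsAProduct:
    """Hypothetical container bundling a dataset with what makes it usable."""

    name: str
    owner_domain: str                  # functional area that owns the product
    description: str                   # metadata: what the data represents
    schema: dict[str, str]             # metadata: column name -> type name
    load: Callable[[], list[dict]]     # infrastructure: how to fetch fresh rows
    quality: dict[str, float] = field(default_factory=dict)  # quality indicators
    refreshed_at: Optional[datetime] = None
    rows: list[dict] = field(default_factory=list)

    def refresh(self) -> None:
        """Reload the rows and recompute a simple completeness indicator."""
        self.rows = self.load()
        total = len(self.rows) * len(self.schema)
        filled = sum(
            1 for row in self.rows for col in self.schema if row.get(col) is not None
        )
        self.quality["completeness"] = filled / total if total else 0.0
        self.refreshed_at = datetime.now(timezone.utc)


def load_customers() -> list[dict]:
    # Stand-in for a real source (database, API, file); values are invented.
    return [
        {"customer_id": 1, "email": "a@example.com"},
        {"customer_id": 2, "email": None},
    ]


customers = DataAsAProduct(
    name="customers",
    owner_domain="sales",
    description="One row per known customer",
    schema={"customer_id": "int", "email": "str"},
    load=load_customers,
)
customers.refresh()
print(customers.quality)  # {'completeness': 0.75}
```

The point of the sketch is that the dataset never travels alone: its description, its schema, a quality indicator and the means of refreshing it are all carried by the same object.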
Examples of Data as a Product:
A raw dataset.
A prepared dataset (standardized, enriched, etc.).
A dataset resulting from a process that takes several datasets as input and performs a series of transformations and calculations.
A dataset resulting from the application of a predictive model obtained through machine learning: the model acts as a processor capable of automatically classifying the data presented to it. For example, given information on a customer, the predictive model indicates whether this customer is a good candidate for the purchase of a particular product, or whether they are likely to buy a similar product from a competitor.
Any other type of data asset that can be packaged and delivered to data consumers within an organization.
Beware of confusing Data as a Product with Data Product. In a Data Mesh context, the former refers to a published, reusable dataset: the data IS the product. The latter refers to a digital product that solves a business problem using input data (e.g., a dashboard for tracking sales data: the dashboard IS the product).
Data as a Product is designed to be easily discovered and consumed by different teams within the organization.
A Data as a Product is published on a unified platform, providing a standardized means of accessing all the company's Data as a Product. Each functional domain team creating a Data as a Product becomes its owner. They are responsible for its quality and consistency, and for ensuring that the data is accurate and up-to-date.
Data as a Product standards for discoverability, security and interoperability are defined on the basis of a federated, i.e. decentralized, governance model.
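The publishing platform and its shared standards can be pictured with a small Python sketch. Everything here is hypothetical: the Catalog and CatalogEntry classes, the two publishing rules (a product must be documented and tagged) and the sample entries are assumptions chosen for illustration, not the API of any real data catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    owner_domain: str
    description: str
    tags: tuple[str, ...]
    contains_personal_data: bool


class Catalog:
    """Hypothetical in-memory stand-in for the unified publishing platform."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def publish(self, entry: CatalogEntry) -> None:
        # Illustrative governance rules agreed on by all domains (assumptions).
        if not entry.description.strip():
            raise ValueError("a published product must be documented")
        if not entry.tags:
            raise ValueError("a published product needs at least one tag")
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already published")
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Return the names of products whose description or tags match."""
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.description.lower() or any(kw == t.lower() for t in e.tags)
        )


catalog = Catalog()
catalog.publish(
    CatalogEntry("customers", "sales", "One row per known customer", ("customer", "crm"), True)
)
catalog.publish(
    CatalogEntry(
        "campaign_results",
        "marketing",
        "Outcome of each marketing campaign per customer",
        ("campaign",),
        False,
    )
)
print(catalog.search("customer"))  # ['campaign_results', 'customers']
```

Each domain team publishes and owns its own entries, while the rules enforced in publish are the same for everyone: this is the federated part of the governance model.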
Data Mesh vs. Data Lake
What is a Data Lake?
Data Lakes are an approach to data management that focuses on storing data in raw, unprocessed form. They are used to ingest data that does not yet have a defined purpose.
What is the role of the Data Lake?
Like the Data Mesh, Data Lakes aim to reduce data silos and improve accessibility. However, Data Lakes always have a centralized data team responsible for data management, unlike Data Mesh, which decentralizes data management.
Data Mesh vs. Data Lake: incompatible?
Data Mesh and Data Lake are not mutually exclusive concepts. Data Mesh is an architectural and organizational approach to data management. As a storage system for very large volumes of data, a data lake can be just as much a part of a data mesh as a data warehouse or cloud storage system.
⚙️ Unlike Data Lakes, Data Mesh is intrinsically designed, through Data as a Product, to extend access to data to non-technical audiences, first and foremost business users.
What are the advantages of Data Mesh?
Data Mesh offers several advantages over centralized approaches to data management. Here are just a few examples.
Reducing data silos:
One of the biggest problems with traditional approaches to data management is the creation of silos, where data is stored in different systems and not easily accessible by other teams within the organization.
Data Mesh helps reduce these data silos by making Data as a Product:
Interoperable: Data as a Product is standardized so that one Data as a Product can easily reuse data from others, without having to worry about the technical details of where this data is physically stored (which could just as easily be in a company database as in the cloud).
Easily discovered and consumed by other teams.
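The interoperability idea can be sketched in Python as a shared read contract. This is a simplified illustration under stated assumptions: the ReadableProduct protocol, the two storage classes and the orders_per_customer function are hypothetical names, and the CSV text stands in for a file that might live in cloud storage.

```python
import csv
import io
from typing import Iterable, Protocol


class ReadableProduct(Protocol):
    """Hypothetical shared contract: every product exposes rows the same way."""

    def read(self) -> Iterable[dict]: ...


class InMemoryProduct:
    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def read(self) -> Iterable[dict]:
        return iter(self._rows)


class CsvProduct:
    """Reads from CSV text; a real product might read from cloud storage."""

    def __init__(self, csv_text: str) -> None:
        self._csv_text = csv_text

    def read(self) -> Iterable[dict]:
        return csv.DictReader(io.StringIO(self._csv_text))


def orders_per_customer(
    customers: ReadableProduct, orders: ReadableProduct
) -> dict[str, int]:
    """Build a derived dataset without knowing where the inputs are stored."""
    counts = {str(c["customer_id"]): 0 for c in customers.read()}
    for order in orders.read():
        key = str(order["customer_id"])
        if key in counts:  # ignore orders from unknown customers
            counts[key] += 1
    return counts


customers = InMemoryProduct([{"customer_id": 1}, {"customer_id": 2}])
orders = CsvProduct("order_id,customer_id\n10,1\n11,1\n12,3\n")
print(orders_per_customer(customers, orders))  # {'1': 2, '2': 0}
```

The consuming function depends only on the read contract, so either input could move to another storage system without any change on the consumer side.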
Improving data quality:
In a centralized approach, Data Quality teams have no choice but to prepare data "blindly", without any idea of its actual use.
In practice, this doesn't work, as data quality management is highly dependent on the context and needs of data consumers.
In the Data Mesh approach, each functional area team is responsible for managing its own data and Data as a Product. The problem of data quality and accuracy is therefore much easier to manage.
Indeed, the teams working on a Data as a Product are perfectly familiar with the context in which the data is used, since they work in close collaboration with business experts in the field.
They therefore know exactly which remediation and transformation operations are relevant for a Data as a Product to deliver results that live up to expectations.
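A short Python sketch shows why quality cannot be judged without a context of use. The sample rows, the two sets of rules and the pass_rate function are invented for this example; a real domain team would define its own rules with its business experts.

```python
from typing import Callable

Row = dict
Rule = Callable[[Row], bool]


def pass_rate(rows: list[Row], rules: list[Rule]) -> float:
    """Share of rows that satisfy every rule of a given consumer."""
    if not rows:
        return 0.0
    ok = sum(1 for row in rows if all(rule(row) for rule in rules))
    return ok / len(rows)


# Invented sample data, for illustration only.
customers = [
    {"id": 1, "email": "a@example.com", "postal_address": None},
    {"id": 2, "email": None, "postal_address": "1 Main St"},
    {"id": 3, "email": "c@example.com", "postal_address": "2 High St"},
    {"id": 4, "email": "d@example.com", "postal_address": None},
]

# An e-mail campaign only needs an e-mail address...
email_campaign_rules = [lambda r: r["email"] is not None]
# ...whereas invoicing by post only needs a postal address.
postal_invoicing_rules = [lambda r: r["postal_address"] is not None]

print(pass_rate(customers, email_campaign_rules))    # 0.75
print(pass_rate(customers, postal_invoicing_rules))  # 0.5
```

The same four rows are 75% fit for one use and 50% fit for the other, which is why a team that knows the intended use is better placed to decide which remediation is worth doing.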
Increased agility:
By decentralizing data management, Data Mesh enables organizations to be more agile and responsive to changing business needs. Functional domain teams are able to make changes to their Data as a Product quickly and easily, without having to go through a centralized data team.
Shorter time-to-market:
The division into business domains, with more compact teams taking responsibility for managing their own data, means that requests can be processed much more quickly. Functional domain teams can therefore deliver Data as a Product (or evolutions of existing Data as a Product) faster and more efficiently.
Improving collaboration:
The Data Mesh encourages collaboration between functional domain teams and data consumers. This leads to a better understanding of data and its use within the organization.
New problems caused by Data Mesh
While Data Mesh offers a number of advantages over traditional data management approaches, it also brings with it a number of difficulties inherent in decentralized approaches, which need to be taken into consideration.
Increased complexity:
Because of its decentralized nature, the Data Mesh introduces a new level of complexity into data management.
The fact that each functional area team is responsible for managing its own Data as a Product can lead to data governance and coordination problems.
It's important to understand that the Data Mesh doesn't eliminate the need for a centralized data engineering team. However, their responsibility needs to focus more on:
determining the best data infrastructure solutions for publishing, sharing and reusing Data as a Product,
defining interoperability standards between the Data as a Product created by different functional domain teams,
protecting sensitive information.
The division into self-managed business domains leads to an increase in the number of teams in charge of Data as a Product.
Every functional area team needs data skills, including data modeling, architecture, engineering and governance. This can be difficult for organizations without a wide range of technical talent.
Small businesses are unlikely to benefit from a Data Mesh approach. Indeed:
their data is not as complex as that of large organizations,
their workforce does not allow them to create dedicated data teams for each business area.
Data security and privacy:
The division into teams, each dedicated to a specific business area, increases the risk of data security and confidentiality problems.
It's important that organizations put in place robust security and confidentiality measures to protect sensitive data.
Data Mesh represents a major shift in the way organizations approach data management and analysis.
It is more of an organizational change than a technological one, based on a decentralized approach in which data is broken down by functional area (e.g. Marketing, Sales, HR, Compliance, etc.).
One of the most innovative aspects of the Data Mesh is that data is seen as a product in its own right, referred to as Data as a Product.
Data as a Product has certain fundamental characteristics: it is discoverable, documented, reliable, interoperable and secure.
Data Mesh enables organizations to unleash the full potential of their data assets:
by breaking down data silos, thanks to the interoperability of Data as a Product,
by increasing team agility and autonomy, as these teams are smaller and each specializes in a specific functional area,
by improving the efficiency of data quality management: data quality is managed, within each functional area, by a team that has an in-depth understanding of the data and its context of use.
However, the adoption of Data Mesh is not without its problems.
Companies must be prepared to invest time and money in the infrastructure, tools and processes needed to support a decentralized data architecture. They must also be prepared to empower data and business teams to work together to ensure data quality, consistency and security.
Despite these challenges, the potential benefits of the Data Mesh are considerable. By enabling organizations to democratize access to data, making it easier to discover, reuse and exploit, the Data Mesh has the potential to revolutionize the way we think about data and its role in business success. It is therefore likely that interest in and adoption of Data Mesh will continue to grow in the years to come.