The Next Generation of Data Platforms is Data Mesh [2025 edition]
As enterprises become more agile, data centralization increasingly looks like a thing of the past. But pure decentralization failed too, so a third way had to emerge.
Data centralization belongs to a past world, a waterfall world. The world has changed, and we are entering a product-oriented data engineering & management world.
I wrote the first version of this article back in August 2022. Since then, things have evolved, so I decided to rejuvenate this fundamental article.
Evolution of data platforms
Before diving into the details of Data Mesh, let’s review how the information industry came to this situation.
Sixty-five million years ago, dinosaurs… no, I will not go that far away in time. In 1971, Edgar Codd invented the third normal form, the key to relational databases. Soon after that, enterprises started seeing the benefits of aggregating data, which opened the way to the creation of data warehouses.
With the advent of data warehouses, the need for more rigorous data management practices emerged. Just as a logistics warehouse requires clear aisles, shelves, and space identification, a data warehouse needs the same level of organization. You had to design the data warehouse to accommodate incoming data and build ETL (extract, transform, and load) processes to fill the warehouse. Enterprises were now capable of performing analytics in a new dimension. Unfortunately, warehouses are not very flexible, and with the increasing number of data sources, onboarding data has become complex.
Let’s imagine a retail company, Great Parts, with B2C and B2B activities. They have a few thousand stores across North America, a loyalty program, and they accept returns.
Now, imagine Great Parts decides to expand its loyalty program to B2B; you will have to build an ETL process between the B2B returns and your customer space. Add a new data source, such as clickstream data from your mobile application, web applications, or B2B sites, and it gets increasingly complicated: you are now managing ETL spaghetti.
As is often the case in our industry, Great Parts decided to completely shift from the data warehouse to a data lake. The pendulum just shifted drastically. In a data lake, you collect all the data you want and store it. Wherever. You can see where this is going.
Now, it gets a little tricky when you try to consume your data again. Storing is easy; however, reading is complex. You can access the data by creating small data warehouses (databases or data marts) for your analytics loads, but you’re back to the ETL spaghetti. It is the same dilemma when it comes to operational processing through microservices.
More recent architectures, like the data lakehouse, are trying to combine the best of the lake and warehouse. However, they still lack the data quality, governance, and self-service features to ensure compliance with the enterprise and regulatory standards.
Opportunities
Unfortunately for some technologists, projects are not happening for the sake of technology; opportunities and challenges drive them. Let’s look at the opportunities that presented themselves as Great Parts leadership considered a new data platform.
Great Parts pioneered self-service analytics, offering business analysts and data scientists access to its data warehouse much earlier than most companies. The success of this initiative, combined with Great Parts’ willingness to move to the cloud, drove a need for a different type of data platform.
In addition to self-service, data scientists’ needs have evolved toward more data discovery capabilities. As with many companies, data sources have multiplied, whether internally, through acquisitions, or even from external sources such as data providers.
As the business became more and more complex, a major driver was increased compliance and auditability. The challenge became marrying big data, self-service discovery, experimentation, compliance, and governance, while providing a clear path from data experimentation to production.
The team at Great Parts settled on the Data Mesh paradigm as the one best suited to their customers’ needs.
The four principles of the Data Mesh
In May 2019, a brilliant engineer, Zhamak Dehghani, published a paper laying the groundwork for Data Mesh. In her paper, Dehghani set out ideas that, over the following years, were refined into the Data Mesh’s four core principles. I like to compare those principles to how the agile manifesto disrupted the waterfall-based lifecycle in software engineering: Data Mesh brings to data engineering many of the concepts you may already be familiar with from agile software engineering.

Let’s discover together those four principles.
1. Principle of Domain Ownership
The term “domain” has been so overused in the last decades that its meaning is almost gibberish. Nevertheless, let’s try to tame the domain and ownership in this context.
A domain is a specific area of business you are focusing on. If you are in the healthcare industry, it can be a hospital or a particular department, such as radiology. Identifying the domain sets the boundaries and helps you avoid falling into scope-creep situations (as in, let’s also include the hospital cafeteria in the project).
If you are familiar with domain-driven design (DDD), this principle will come naturally to you.
It is common sense: don’t try to boil the ocean. Find the people who know a domain best, and associate them with a data architect. The decentralized team has precious domain expertise: they know more about the data sources, data producers, rules, history, and evolution of systems than a centralized team that switches from domain to domain. Adding the data architect to the mix brings the security, rules, and global governance needed to stay compliant with enterprise policies.
2. Principle of Data as a Product
In software engineering, agile replaced the project with the product. It was only a matter of time before data became a product as well. Let’s see what a data product can bring.
Focusing on a data product will enable you to switch from a project planning perspective to a customer-centric approach, and it's only FAIR, as data must be:
Findable,
Accessible,
Interoperable, and
Reusable.
In the previous version of this article, I used the DAUNTIVS¹ acronym, but it was both overwhelming and confusing. FAIR covers the attributes more straightforwardly.
The atomic unit in Data Mesh is a data product.
I also used the term “data quantum”: in software architecture, the smallest deployable element is called a quantum. When applied to data architecture, the data quantum is the smallest deployable element bringing value. However, the term created confusion, and it was unnecessary to detail the distinction between a data quantum and a data product. Although the data quantum has nothing to do with quantum computing, it added a bit of unneeded spiciness to the overall comprehension.

You’re probably wondering, “Hey, how is that different from my data lake with a couple of data governance tools?” The answer is that size matters: instead of an entire enterprise-level lake, you focus on a single domain. It’s definitely more “byte size” and chewable.
Thanks to its smaller size and scope, implementation is faster, and the value from data is reinjected into the company a lot faster.
3. Principle of the Self-Serve Data Platform
When I was a kid, in France, I loved going to the local supermarket with my parents as it had a cafeteria where I could put on a tray all the food I wanted. The self-service empowered me to make (bad) food choices. But what does it mean when it comes to a data platform?
Since its inception in 2001, Agile has proven to be a working methodology. Agile software engineering empowered software engineers. The way to empower data scientists is to give them access to data.
Data scientists and analysts spend (too much) time in their data discovery phase. In many situations, they find a piece of data in a random column in a table somewhere and take a bet on the fact that this is what they need. Sometimes it works, sometimes your PB&J toast doesn’t fall on the jelly side².
Empowering the data scientists means that you must give them access to not only a basic catalog of fields but precise definitions, active and passive metadata, feedback loops, and much more. They are your customers; you want to be this 5-star Yelp cafeteria, not this crappy 1-star shack.
It's not the only component, but you probably see the link with discovery tools & marketplaces.
4. Principle of Federated Computational Governance
Every word of this principle carries an important meaning. Let me try to convey their interpretation to you.
Information technology has become ubiquitous in our day-to-day lives, and states and governments have developed laws to manage how personal data is handled and used. Famous examples include Europe’s GDPR (2016), California’s CCPA (2018), and France’s National Commission on Informatics and Liberty (CNIL, 1978). Of course, those constraints are not the only push towards governance in enterprises; companies like Great Parts often have data governance rules and protections that go beyond what the law requires.
But why a push towards computational governance and not just data governance? Because data governance is too limiting. Even when you include metadata in your governance (and of course, you do), you are still missing the entire ecosystem of computational resources linked to your systems. In a modern, multicloud & hybrid world, you must account for many more assets. It made sense to extend from data to computational governance.
You also rely on computational resources to establish governance, such as connectors and scanners that, among other things, enable anomaly detection.
Your data governance team creates policies applicable to the entire organization, which the domain team will follow to achieve enterprise-level consistency and compliance. However, the domain team owns the local governance at the product and contract level, maximizing the team’s expertise.
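To make the “computational” part concrete, here is a minimal sketch of what a global policy expressed as code could look like, applied to each data product’s contract metadata. The policy itself, the field names (retention_days, pii_fields), and the thresholds are assumptions I made for this illustration; they are not part of any standard.

```python
# A minimal sketch of an enterprise-wide policy expressed as code.
# Field names and thresholds are illustrative, not from any standard.

def check_retention_policy(contract: dict, max_days: int = 365) -> list[str]:
    """Return the list of violations of the global retention policy."""
    violations = []
    retention = contract.get("retention_days")
    if retention is None:
        violations.append("retention_days is not declared")
    elif retention > max_days:
        violations.append(f"retention_days={retention} exceeds the {max_days}-day limit")
    # PII columns must declare a masking strategy: the strategy itself is a
    # local, domain-owned choice, but its presence is enforced globally.
    for column in contract.get("pii_fields", []):
        if not column.get("masking"):
            violations.append(f"PII field '{column['name']}' has no masking strategy")
    return violations


if __name__ == "__main__":
    loyalty_contract = {
        "retention_days": 730,
        "pii_fields": [{"name": "email", "masking": None}],
    }
    for violation in check_retention_policy(loyalty_contract):
        print("Policy violation:", violation)
```

The global team owns the check; the domain team owns the values that make it pass.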
Four principles
Like Alexandre Dumas’ Three Musketeers, who were four, the four principles of Data Mesh are intertwined.
Each principle influences the others, and as you design and build Data Mesh, you cannot look at one principle in isolation: you need to progress on all four fronts at the same time. It is easier than it seems, as you will see when Great Parts builds its Data Mesh.
Building Our First Data Product
Now that you have read about the motivation, opportunities, and governing principles, it seems about time to build your first data product, or, more precisely, architect it before you implement it.
Before building an entire implementation of Data Mesh, you will need to focus on each data product.
You can divide the data product into five subcomponents:
• The dictionary services,
• The observability services,
• The control services,
• The data onboarding (the old ETL), and
• The interoperable data.
The dictionary interface is your open sesame to the data product’s passive metadata. Your data product users can connect to the dictionary without authentication. Their data discovery becomes much simpler: they can browse the dictionary interactively, without specific permissions, with rich descriptions and access to data lineage. When they find what they need, they can easily check that they have access to the data or request it.
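As an illustration, here is a minimal sketch of what browsing such a dictionary could look like over REST with Python’s requests library. The base URL, the paths (/dictionary/fields, /dictionary/lineage), and the response fields are assumptions made for this sketch, not a published specification.

```python
import requests

# Hypothetical data product base URL, invented for this sketch.
BASE_URL = "https://dataproducts.greatparts.example/loyalty"


def search_fields(keyword: str) -> list[dict]:
    """Browse the dictionary (no authentication needed) for matching fields."""
    resp = requests.get(f"{BASE_URL}/dictionary/fields", params={"q": keyword})
    resp.raise_for_status()
    return resp.json()  # e.g. [{"name": "return_reason", "description": "..."}]


def lineage(field_name: str) -> dict:
    """Fetch the upstream lineage of a given field."""
    resp = requests.get(f"{BASE_URL}/dictionary/lineage", params={"field": field_name})
    resp.raise_for_status()
    return resp.json()


for field in search_fields("return"):
    print(field["name"], "-", field.get("description", "no description"))
```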
The observability plane provides an interface between the data product’s built-in observability and REST clients. It allows a data scientist to gauge the quality of the data within the data product and decide whether the data product will match their SLO (service-level objective) expectations.
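Here is a hedged sketch of that decision in code: fetch the product’s published metrics and compare them with your own SLO expectations. The endpoint and the metric names are invented for the example.

```python
import requests

BASE_URL = "https://dataproducts.greatparts.example/loyalty"  # hypothetical

# SLO expectations a data scientist might hold for their use case.
SLO = {"freshness_hours": 24, "completeness_pct": 99.0}


def meets_slo() -> bool:
    """Compare the product's published observability metrics with our SLOs."""
    resp = requests.get(f"{BASE_URL}/observability/metrics")
    resp.raise_for_status()
    metrics = resp.json()  # e.g. {"freshness_hours": 6, "completeness_pct": 99.7}
    return (
        metrics["freshness_hours"] <= SLO["freshness_hours"]
        and metrics["completeness_pct"] >= SLO["completeness_pct"]
    )


print("Good enough for my use case:", meets_slo())
```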
The control plane offers access to a REST API where you can control the onboarding and the data store(s). If you want to create a new version of your dataset in the data product, then there is an API call for that. Do you need to control which data quality rules should be applied to your data onboarding? There is an API call for that. This interface is mainly oriented towards data engineers managing the data products.
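A quick sketch of what those calls could look like for a data engineer; once again, the paths and payloads are illustrative assumptions, not an actual API.

```python
import requests

BASE_URL = "https://dataproducts.greatparts.example/loyalty"  # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}  # the control plane is restricted

# Create a new version of a dataset inside the data product.
resp = requests.post(
    f"{BASE_URL}/control/datasets/returns/versions",
    json={"version": "2.1.0", "description": "Adds the B2B return channel"},
    headers=HEADERS,
)
resp.raise_for_status()

# Attach a data quality rule to the onboarding of that dataset.
resp = requests.put(
    f"{BASE_URL}/control/datasets/returns/quality-rules/non-negative-amount",
    json={"expression": "refund_amount >= 0", "severity": "error"},
    headers=HEADERS,
)
resp.raise_for_status()
```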
As you can imagine, the three sets of APIs are similar for each data product: there is no need to learn a new API for each data product. To simplify your usage, you can wrap your REST APIs in a Python API accessible via a notebook or your BI tool's SDK.
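Because every data product exposes the same planes, a thin wrapper is enough. The class below is a hypothetical sketch of such a wrapper, not an existing SDK; only the base URL changes from one data product to another.

```python
import requests


class DataProduct:
    """Thin, hypothetical wrapper around a data product's standard REST planes."""

    def __init__(self, base_url, token=None):
        self.base_url = base_url.rstrip("/")
        self.headers = {"Authorization": f"Bearer {token}"} if token else {}

    def _get(self, path, **params):
        resp = requests.get(f"{self.base_url}/{path}", params=params, headers=self.headers)
        resp.raise_for_status()
        return resp.json()

    def search_fields(self, keyword):
        return self._get("dictionary/fields", q=keyword)

    def metrics(self):
        return self._get("observability/metrics")


# Same API whatever the data product; only the base URL changes.
loyalty = DataProduct("https://dataproducts.greatparts.example/loyalty")
b2b_returns = DataProduct("https://dataproducts.greatparts.example/b2b-returns")
```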
The data onboarding component is your old data pipeline on steroids. In many (if not all) pre-Data Mesh data engineering projects, the focus was on the data pipeline. Data Mesh puts the pipeline back in its place: the pipeline is essential, but it is only one element of the data onboarding process, which also includes observability and the application of data quality rules. Bundling all those functions into this component hardens the classic, often failing, fragile ETL process. Yup, the days when the pipeline was the quarterback of the team are behind us.
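To show how the pipeline becomes one element among others, here is a minimal onboarding sketch that applies quality rules and emits observability metrics along the way. The rules and metric names are invented for this example.

```python
from datetime import datetime, timezone

# Invented quality rules for the sketch: each returns True when the row passes.
QUALITY_RULES = {
    "non_negative_amount": lambda row: row["refund_amount"] >= 0,
    "store_id_present": lambda row: bool(row.get("store_id")),
}


def onboard(rows: list[dict]) -> dict:
    """Run the onboarding step: apply quality rules and emit observability metrics."""
    accepted, rejected = [], []
    for row in rows:
        failures = [name for name, rule in QUALITY_RULES.items() if not rule(row)]
        (rejected if failures else accepted).append(row)
    metrics = {
        "processed_at": datetime.now(timezone.utc).isoformat(),
        "rows_in": len(rows),
        "rows_accepted": len(accepted),
        "completeness_pct": 100.0 * len(accepted) / max(len(rows), 1),
    }
    # In a real data product, accepted rows land in the interoperable data store
    # and the metrics feed the observability plane; here we simply return both.
    return {"data": accepted, "metrics": metrics, "rejected": rejected}


result = onboard([
    {"refund_amount": 12.5, "store_id": "QC-042"},
    {"refund_amount": -3.0, "store_id": None},
])
print(result["metrics"])
```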
Last, but not least, the interoperable model exposes your critical data in a consumable way. In some cases, the data is your existing dataset (or data mart) wrapped into a data product. I could have represented this component as the classic cylinder you see in older architecture diagrams, but remember that the data exposed by a data product is not always relational.
The promise of the data product is to separate the application from the data. This has an impact on the data modeling inside the data product.
Welcome to the Mesh
So far, you have learned a lot about the data product. Hopefully, you see the value of the data product, but what additional value would a data mesh bring over a crowd of data products?
A set of data contracts governs each data product: the primary data contract defines the relationship between the data product and its users. It also describes the interoperable model and SLA (service-level agreement) details. This consumer-oriented data contract can also be called an output or user data contract.
Figure 8 illustrates the role of the data contract. A data product may have several data contracts as input and offer a data contract for its consumer.
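To make the data contract a little less abstract, here is a simplified, illustrative sketch of the kind of information a consumer-facing contract carries. The structure is deliberately condensed for this article; refer to ODCS for the actual standard.

```python
# A simplified, illustrative data contract for the loyalty data product.
# The keys below are chosen for readability, not taken from the ODCS layout.
loyalty_returns_contract = {
    "id": "great-parts.loyalty.returns",
    "version": "2.1.0",
    "owner": "loyalty-domain-team",
    "schema": [
        {"name": "return_id", "type": "string", "description": "Unique return identifier"},
        {"name": "refund_amount", "type": "decimal", "description": "Refund in local currency"},
        {"name": "channel", "type": "string", "description": "B2C or B2B"},
    ],
    "sla": {
        "freshness_hours": 24,     # data is at most one day old
        "availability_pct": 99.5,  # uptime commitment of the serving interface
    },
}
```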

When data products are meshed, as in Figure 9, the resulting data product inherits the data contracts from its source data products. This mechanism simplifies interoperability, increases data quality, and decreases time to market.
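Here is a hedged sketch of one practical consequence of inheriting contracts: a downstream data product cannot promise fresher data than its sources deliver, so its freshness commitment can be derived from the upstream contracts. The computation is illustrative.

```python
def derive_freshness(upstream_contracts: list[dict], processing_hours: float) -> float:
    """Downstream freshness = worst upstream freshness + own processing time."""
    return max(c["sla"]["freshness_hours"] for c in upstream_contracts) + processing_hours


b2c_returns = {"sla": {"freshness_hours": 24}}
b2b_returns = {"sla": {"freshness_hours": 48}}

# The meshed "all returns" product can promise, at best, roughly 50 hours.
print(derive_freshness([b2c_returns, b2b_returns], processing_hours=2))
```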

Iterative with Additional Value at Every Step of the Way
Let me introduce you to the Data Holy Trinity. This Trinity is composed of data contracts, data products, and Data Mesh. At times, you may want to remind yourself why we are doing all this work. Hopefully, your answer is the constant journey for extracting value from your data.
Although the average data warehouse or data lake project can take months or years to get value, following a product-oriented data engineering & management (PODEM) methodology can reduce the time to value to weeks.
Standardized Icons
As those concepts are maturing, we also need new symbols. The Bitol team & users gathered and settled on those shapes.
The data contract is an equilateral Triangle (rotated 90°), symbolizing the juncture of schema, business meaning, and SLAs. The symbol ▸ is called BLACK RIGHT-POINTING SMALL TRIANGLE, and its Unicode attributes are: U+25B8, UTF-8: E2 96 B8.
The data product is a horizontal Hexagon, symbolizing interconnectedness, multiple output ports, and usability. It is inherited from the hexagonal architecture. The symbol ⬣ is called HORIZONTAL BLACK HEXAGON, and its Unicode attributes are: U+2B23, UTF-8: E2 AC A3.
The Aegean symbol representing 90,000 is used for Data Mesh. It illustrates the vast quantity of interconnected data products you can have in your instance of Data Mesh. The symbol 𐄳 is called AEGEAN NUMBER NINETY THOUSAND, and its Unicode attributes are: U+10133, UTF-8: F0 90 84 B3.
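If you want to reuse the symbols in your own documentation or tooling, they are plain Unicode characters:

```python
# The three symbols, printed from their Unicode code points.
print("\u25B8")      # ▸ data contract (U+25B8)
print("\u2B23")      # ⬣ data product  (U+2B23)
print("\U00010133")  # 𐄳 Data Mesh     (U+10133)
```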
Those symbols are starting to be more widespread, but we are still early in their adoption.
Challenges
The road to building a data product is paved with good ideas, but the devil is clearly in the implementation.
Nevertheless, here is some advice to get started.
As with any disruptive technologies and methodologies, be prepared to guide your users through this transition. Many data engineers live by the sacrosanct data pipeline, and reducing their idol to a mere component in a mesh can be traumatizing.
Prepare your leadership for the time to stand up a new platform: they should not expect results two weeks after you start (or even three…).
As with all product development, identify clearly who your users are and what tools they currently use. You may need to transition or extend their tooling, and this may create friction and resistance.
And the truth is, there is no “Data Mesh product” out there yet (I just joined Actian, give me a few weeks). There might be bricks, elements, or components that can be assembled to help you build your mesh (Spark remains a fantastic engine to perform your data transformation at scale, and more). However, there is nothing like an OTS (off-the-shelf) platform, whether commercial or open source.
The lack of software vendors in the field fosters innovation, but also chaos. The next few months will tell us whether we made the right choices in terms of user experience, technology, and implementation.
Jean-Georges “jgp” Perrin is a data & AI leader, author, inventor, and Lifetime IBM Champion. He leads product-oriented data engineering and management at Actian.
Extra resources
There are three excellent books:
• Implementing Data Mesh: Design, Build, and Implement Data Contracts, Data Products, and Data Mesh (O’Reilly, 2024), which I co-authored.
• Data Mesh for all ages (2023), by me.
• The reference book: Data Mesh: Delivering Data-Driven Value at Scale (O’Reilly, 2022), by Zhamak Dehghani.
Video:
Yours truly, humbly featured in The next generation of Data Platforms is the Data Mesh — DataFriday 2x07.
Communities:
Definitely pay a visit to Data Mesh Learning, where you will find links to the newsletter and a fast-growing Slack community.
Bitol is a Linux Foundation project working on fostering open standards for data contracts (ODCS), data products (ODPS), and more. Data Mesh Learning and Bitol share the same Slack workspace.
¹ DAUNTIVS stands for: Discoverable, Addressable, Understandable, Natively accessible, Trustworthy and truthful, Interoperable and composable, Valuable on its own, and Secure.
² From the French saying: from time to time, the jam toast does not fall on the jam side.