
Data contracts within a Data mesh?

Rudy Nausch

Without high-quality data, every AI and analytics initiative will be underwhelming at best and actively damaging to the business at worst.


To overcome this problem, data producers must be willing to take ownership of production data and collaborate with data consumers to support high-value use cases.


Under a data mesh governance structure, this kind of producer ownership becomes the default paradigm, and the data contract is an interesting way to put it into practice.


Data contracts are API-based agreements between producers and consumers designed to solve exactly that problem. They could be implemented in the following steps:

  1. Identify use cases for your data

  2. Create requirements around the schema and values of the data

  3. Document the expected semantics of the dataset

  4. Collaborate with data producers to define the potential value of the use cases

  5. Identify masking policies based on regulatory and privacy requirements

  6. Define the contract and infrastructure, as code, in a source control repository

  7. Validate a robust curation layer above raw change data capture (CDC) events

  8. Automate schema compatibility checks in the CI/CD workflow

  9. Make the data available through the business-facing layer, with role-based masking

  10. Write integration tests to verify semantic validity

  11. Create data tests to verify data correctness

  12. Generate monitors to alert on shifts in semantics & anomalies

  13. Push all contracts to a catalogue for discoverability and re-use
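
To make a few of these steps concrete, here is a minimal sketch of steps 6, 8 and 11: a contract defined as code, a compatibility check that could run in CI/CD, and a simple data test. Everything in it is an assumption made for illustration. The names `Field`, `DataContract`, `breaking_changes` and `validate_row`, the three type labels, and the rules for what counts as a breaking change are hypothetical and not taken from the original post or from any particular data contract tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Field:
    """One column in the contract (hypothetical structure)."""
    name: str
    dtype: str            # one of "string", "int", "float" in this sketch
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """A versioned schema agreement for one dataset (step 6)."""
    dataset: str
    version: int
    fields: Tuple[Field, ...]


def breaking_changes(old: DataContract, new: DataContract) -> List[str]:
    """Step 8: list changes in `new` that could break consumers of `old`.

    Assumed rules: removing a field, changing its type, or making a
    required field nullable is breaking; adding a field is not.
    """
    new_by_name = {f.name: f for f in new.fields}
    problems = []
    for f in old.fields:
        candidate = new_by_name.get(f.name)
        if candidate is None:
            problems.append(f"field removed: {f.name}")
        elif candidate.dtype != f.dtype:
            problems.append(
                f"type changed: {f.name} {f.dtype} -> {candidate.dtype}"
            )
        elif candidate.nullable and not f.nullable:
            problems.append(f"field became nullable: {f.name}")
    return problems


def validate_row(contract: DataContract, row: dict) -> List[str]:
    """Step 11: check a single record against the contract's schema."""
    python_types = {"string": str, "int": int, "float": float}
    errors = []
    for f in contract.fields:
        value = row.get(f.name)
        if value is None:
            if not f.nullable:
                errors.append(f"missing required field: {f.name}")
        elif not isinstance(value, python_types[f.dtype]):
            errors.append(f"wrong type for {f.name}: expected {f.dtype}")
    return errors


# Example contracts for a hypothetical "orders" dataset
orders_v1 = DataContract(
    "orders", 1, (Field("order_id", "string"), Field("amount", "float"))
)
orders_v2 = DataContract(
    "orders",
    2,
    (
        Field("order_id", "string"),
        Field("amount", "int"),                      # type change: breaking
        Field("currency", "string", nullable=True),  # new field: allowed
    ),
)

print(breaking_changes(orders_v1, orders_v2))
print(validate_row(orders_v1, {"order_id": "A1"}))
```

In a CI/CD workflow, a non-empty result from `breaking_changes` could fail the build, which forces the producer and consumers to talk before the change ships. A real setup would more likely lean on a schema registry or an established contract format than on hand-written classes like these.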


This framework treats data as code and results in an explosion of conversation and collaboration around which data is meaningful, what it means, how it is used, and where it should be used.
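
As one illustration of what treating data as code can look like, the role-based masking from steps 5 and 9 could live in the same repository as a small, reviewable policy. The sketch below is hypothetical throughout: the `MASKING_POLICY` structure, the role names, the column names and the two masking rules are assumptions chosen for the example, not something the original post prescribes.

```python
import hashlib

# Hypothetical policy: for each sensitive column, the masking rule to apply
# and the roles that are allowed to see the value in the clear.
MASKING_POLICY = {
    "email": {"rule": "hash", "clear_roles": {"support"}},
    "card_number": {"rule": "redact", "clear_roles": set()},
}


def mask_value(value: str, rule: str) -> str:
    """Apply one masking rule to a single value."""
    if rule == "hash":
        # Same input always gives the same token, so joins on the masked
        # column still work. An unsalted hash is only a placeholder here;
        # a production policy would likely use a keyed hash or tokenisation.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    if rule == "redact":
        return "****"
    raise ValueError(f"unknown masking rule: {rule}")


def apply_masking(row: dict, role: str, policy: dict = MASKING_POLICY) -> dict:
    """Return a copy of `row` with sensitive columns masked for `role`."""
    masked = {}
    for column, value in row.items():
        rule = policy.get(column)
        if rule is None or value is None or role in rule["clear_roles"]:
            # Column is not sensitive, is empty, or the role may see it
            masked[column] = value
        else:
            masked[column] = mask_value(str(value), rule["rule"])
    return masked


row = {"order_id": "A1", "email": "a@example.com", "card_number": "4111"}
print(apply_masking(row, "analyst"))   # email hashed, card number redacted
print(apply_masking(row, "support"))   # email visible, card number redacted
```

Because the policy is plain data under source control, a change to who can see which column goes through the same review as any other change to the contract.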


Data contracts are not a new concept. They are simply new implementations of a very old idea: that producers and consumers should work together to generate high-quality, semantically valid data from the ground up, instead of insisting on modelling poor-quality data only after it lands in a data lake.


Data contracts could eventually form a backbone of production-quality data and are important in driving AI/ML, advanced analytics, and other high-value use cases.


Original post by Chad Sanderson, with some edits and added ideas by me.


©2024 by Eudaimonic. All Rights Reserved.
