“Boldly go where no one has gone before…!”
MLOps maturity models are highly valuable when it comes to adopting AI in the enterprise. Microsoft's and Google's versions provide a clear, transparent roadmap, help locate a company's current stage, and point out the next possible steps forward.
In countless cases this has proven extremely useful, as having a clear target picture helps to avoid dead ends in a landscape as complex and fast-moving as data science and AI. However, there is a problem with today's MLOps maturity models: they focus on internal MLOps processes only.
While this seems perfectly fine for most organizations today that are still operationalizing MLOps in their own business, the most innovative frontrunners are now starting to reach high levels of internal ML sophistication. These innovators are now looking for opportunities for exponential innovation outside their own four walls but have little in the way of maturity reference points to guide them on their external, federated MLOps journeys.
Such is the nature of being a first mover!
Thought leaders in management and technology consultancies such as McKinsey, BCG, Gartner, Deloitte, and Capgemini all agree that data sharing and data collaborations involving multiple companies are the means to further extend the competitive edge in AI. More specifically, privacy-enhancing technologies (PETs) such as Federated Learning, Differential Privacy, or Synthetic Data are expected to be the driving force behind the creation of these collaborative data ecosystems.
This set of technologies aims to reduce friction and simplify today's manual, time-consuming processes for aligning legal, compliance, and infrastructure requirements, enabling effective data sharing and trusted collaboration between organizations.
Yet there is no roadmap for bringing such technologies into production environments and moving beyond experimentation. That is why we are going to extend the known MLOps maturity models with another dimension: the PET Adoption Stages.
What All PETs Have in Common: Complexity
Just as with other emerging technologies, Data Scientists and Data Leaders face uncertainty and ambiguity when dealing with PETs. The report "Privacy Enhancing Technologies: Categories, Use Cases, and Considerations" by the Federal Reserve Bank of San Francisco names the following challenges:
- Fundamental lack of internal capacity and specialized expertise within organizations
- High variability in the configuration needed to deploy PETs
- Complex integration and maintenance within existing tech stacks and data pipelines
- Different stages of maturity: some PETs and frameworks are still in early phases of development
- Using PETs does not guarantee privacy, and even enhanced techniques can be reversed or compromised
The inherent complexity of PETs, data, and AI is causing companies to suffer heavy setbacks as they try to replace the previously manual processes for setting up data collaborations with third parties.
Today, we can observe three maturity stages:
- Level 1: Experimentation
- Level 2: Implementation
- Level 3: Scaling
Let us look at each of these stages in detail and see why only very few companies make it beyond a successful implementation.
Experimentation: Which PET Should I Leverage for My Use Case?
At the first stage, we often see Data Scientists experimenting with one or more PETs, mainly using open-source frameworks. The article “An Overview of Approaches to Privacy-Preserving Data Sharing” does a great job of giving a comprehensive overview of PETs, classifying them by privacy level, data utility, and maturity stage. Another great resource is the PETs Adoption Guide by the UK Centre for Data Ethics and Innovation, where a decision tree helps to identify the right PET for your use case.
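To give a flavor of how such a decision aid works, here is a deliberately simplified Python sketch. The questions and the PET mappings are our own illustrative assumptions and do not reproduce the CDEI decision tree.

```python
def suggest_pets(data_split_across_parties: bool,
                 recipients_need_record_level_data: bool,
                 output_is_aggregate: bool) -> list[str]:
    """Toy PET selector -- illustrative only, not the CDEI decision tree."""
    suggestions = []
    if data_split_across_parties:
        # Train where the data lives instead of pooling raw records
        suggestions.append("Federated Learning")
    if output_is_aggregate:
        # Add calibrated noise to released statistics or model updates
        suggestions.append("Differential Privacy")
    if recipients_need_record_level_data:
        # Share realistic but artificial records instead of the originals
        suggestions.append("Synthetic Data")
    return suggestions or ["no obvious fit -- revisit the use case"]

print(suggest_pets(data_split_across_parties=True,
                   recipients_need_record_level_data=False,
                   output_is_aggregate=True))
# ['Federated Learning', 'Differential Privacy'] -- usually a combination, not a single PET
```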
The main insight at this stage is that using only one PET, for example Federated Learning, does not guarantee an improvement in data privacy and security. In real-life scenarios, what is needed is a combination and orchestration of different PETs, together with access controls and additional security layers. How such a production environment must be built and secured while maintaining data usability depends on many different variables, which makes it extremely difficult to scope a project beyond experimentation.
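As a minimal sketch of what such layering can look like, the Python snippet below combines federated averaging (raw data never leaves a partner) with clipping and Gaussian noise on the shared updates, in the spirit of differential privacy. It is framework-agnostic toy code under our own assumptions; a real deployment would need a properly calibrated DP mechanism, secure aggregation, and access controls on top.

```python
import numpy as np

def local_step(weights, X, y, lr=0.1):
    """One local gradient step (logistic regression) on a partner's private data."""
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

def protect_update(update, clip_norm=1.0, noise_std=0.05, rng=None):
    """Clip and noise the update -- the only artifact that ever leaves the data silo."""
    rng = np.random.default_rng() if rng is None else rng
    scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    return update * scale + rng.normal(0.0, noise_std, size=update.shape)

def federated_round(global_w, partners, rng=None):
    """One round: each partner trains locally and shares only a protected update."""
    updates = [protect_update(local_step(global_w, X, y) - global_w, rng=rng)
               for X, y in partners]                      # raw X, y stay with each partner
    return global_w + np.mean(updates, axis=0)            # coordinator aggregates updates only

# Toy run with two partners holding synthetic data
rng = np.random.default_rng(0)
true_w = np.array([1.0, -1.0, 0.5])
partners = []
for _ in range(2):
    X = rng.normal(size=(200, 3))
    y = (X @ true_w > 0).astype(float)
    partners.append((X, y))

w = np.zeros(3)
for _ in range(50):
    w = federated_round(w, partners, rng=rng)
print("global weights after 50 rounds:", np.round(w, 2))
```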
Implementation: How Do I Implement PETs Into Production?
Typically, the complex reality of implementing PETs overshadows the initial ambition. Multiple stakeholders are involved, such as data scientists, DevOps, IT architects, and MLOps engineers, all with different expectations and ideas. A common realization at this point is that the scope of the project has grown far beyond the original plan.
At the technology level, we see familiar “anti-patterns”. Just as in the popular illustration from the Google research paper "Hidden Technical Debt in Machine Learning Systems", where the small ‘ML Code’ block receives disproportionate attention compared with the vast and complex infrastructure surrounding it, so it is with privacy-enhancing technologies. To use PETs sensibly, and with a strong focus on data science usability, an entire environment is required, one that must be extended and customized depending on the use case, the assets that need to be protected, and the number of partners you want to collaborate with.
The paper “Building Trusted Research Environments - Principles and Best Practices; Towards TRE ecosystems” gives guidance regarding the high-level architecture and how to safeguard such an environment.
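As a rough illustration of the kind of safeguard such a trusted environment enforces, the hypothetical Python sketch below gates every request against a per-asset policy before any job runs. The policy fields and thresholds are our own assumptions, not the paper's specification.

```python
from dataclasses import dataclass, field

@dataclass
class AccessPolicy:
    """Hypothetical policy attached to a data asset inside a trusted environment."""
    approved_projects: set = field(default_factory=set)
    allowed_operations: set = field(default_factory=lambda: {"aggregate", "train"})
    min_rows_for_output: int = 10  # crude disclosure-control threshold

@dataclass
class AccessRequest:
    project_id: str
    operation: str         # e.g. "train", "aggregate", "export_raw"
    output_row_count: int  # size of the result the requester wants to take out

def authorize(request: AccessRequest, policy: AccessPolicy) -> tuple[bool, str]:
    """Check a request against the asset's policy before any job is scheduled."""
    if request.project_id not in policy.approved_projects:
        return False, "project not approved for this asset"
    if request.operation not in policy.allowed_operations:
        return False, f"operation '{request.operation}' not permitted"
    if request.output_row_count < policy.min_rows_for_output:
        return False, "output too small; disclosure risk"
    return True, "approved"

policy = AccessPolicy(approved_projects={"oncology-study-7"})
ok, reason = authorize(AccessRequest("oncology-study-7", "export_raw", 500), policy)
print(ok, reason)  # False -- raw export is blocked by the default policy
```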
Still, great uncertainties remain. Given the available data management platforms, MLOps tools, and PET frameworks, Data Scientists and Data Engineers face almost infinite design choices. Trying to replicate the setup of a previously successful use case in a new scenario is difficult, if not impossible, due to the customizations each one requires. This can overwhelm even the best data science and engineering teams, and very few manage to overcome this complexity.
The great ambiguity and lack of standards at the earlier stages lead to the uncontrolled development of bespoke methods, frameworks, and environments, shaped by the many different variables across enterprises (a sketch after this list illustrates how quickly this configuration surface grows):
- Type of systems and applications, data formats, and ML algorithms
- Number and type of data and ML pipelines
- Other data analysis platforms and technologies involved
- Regulation and governance that must be observed
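The hypothetical sketch below pins down just a few of these variables for a single collaboration; every field is our own assumption, and real environments carry many more, which is why a setup that worked for one partnership rarely transfers to the next unchanged.

```python
from dataclasses import dataclass

@dataclass
class CollaborationConfig:
    """Hypothetical snapshot of the variables one bespoke PET environment has to pin down."""
    data_formats: list[str]        # e.g. ["parquet", "DICOM"]
    ml_frameworks: list[str]       # e.g. ["PyTorch", "scikit-learn"]
    pipelines: int                 # number of data / ML pipelines to wire in
    analysis_platforms: list[str]  # other platforms already in use
    regulations: list[str]         # e.g. ["GDPR", "HIPAA"]
    pets_in_use: list[str]         # e.g. ["federated learning", "differential privacy"]

partner_a = CollaborationConfig(
    data_formats=["parquet", "DICOM"],
    ml_frameworks=["PyTorch"],
    pipelines=4,
    analysis_platforms=["Databricks"],
    regulations=["GDPR"],
    pets_in_use=["federated learning", "differential privacy"],
)
# Each new partner arrives with a different combination of these values,
# so the bespoke environment has to be re-engineered again and again.
print(partner_a)
```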
Such systems become more and more difficult to control and govern as the complexity increases. Ultimately, this leads to great inefficiencies and unacceptable security risks, especially regarding the privacy and IP of sensitive data and ML models.
Scaling: How Do I Engage in Collaborative Data Ecosystems On Demand and At Scale?
To truly enable highly dynamic collaborative data ecosystems, we require a robust framework that drives consistency with integrated workflows of tools and infrastructure across organizations, without having to reinvent the wheel every time. Federated MLOps is the urgently needed evolution of DevOps and MLOps that focuses on a data-centric and collaborative approach to AI while keeping the data privacy and IP protection of all assets as the highest priority. With Federated MLOps, enterprises can finally:
- Increase the quality and effectiveness of collaborative ML with standard processes
- Drive consistency with integrated workflows across all tools and infrastructure, without sacrificing flexibility
- Manage privacy-preserving data science as an enterprise capability
- Safely and universally manage privacy-enhancing technologies and data science with enterprise controls across companies
- Unlock privacy-preserving and federated MLOps at enterprise scale
- Enable secure, continuous, and scalable feedback loops between testing and operationalizing ML models or AI-enabled data products (a sketch of such a loop follows below)
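What such a standardized, cross-organization feedback loop could look like is sketched below as hypothetical Python. The stage names, gates, and thresholds are our own illustrative assumptions about the steps a Federated MLOps workflow would automate, not the specification of any particular platform.

```python
# Hypothetical stages of one Federated MLOps iteration. Each stage returns
# (ok, ctx) so the loop can stop at the first failing gate and feed results back.

def check_data_contracts(ctx):
    """Verify every partner still satisfies the agreed schema/policy before training starts."""
    return all(p["contract_valid"] for p in ctx["partners"]), ctx

def run_local_training(ctx):
    """Trigger training inside each partner's environment; only metrics and updates come back."""
    ctx["updates"] = [{"partner": p["name"], "auc": p["local_auc"]} for p in ctx["partners"]]
    return True, ctx

def aggregate_and_evaluate(ctx):
    """Aggregate the protected updates and gate the release on a minimum global metric."""
    ctx["global_auc"] = sum(u["auc"] for u in ctx["updates"]) / len(ctx["updates"])
    return ctx["global_auc"] >= ctx["min_auc"], ctx

def register_model(ctx):
    """Push the approved model version to a shared registry together with its audit trail."""
    ctx["registered_version"] = ctx.get("registered_version", 0) + 1
    return True, ctx

PIPELINE = [check_data_contracts, run_local_training, aggregate_and_evaluate, register_model]

def run_round(ctx):
    for stage in PIPELINE:
        ok, ctx = stage(ctx)
        if not ok:
            print(f"stopped at {stage.__name__} -- feedback goes back to the partners")
            break
    return ctx

ctx = run_round({
    "partners": [{"name": "org_a", "contract_valid": True, "local_auc": 0.82},
                 {"name": "org_b", "contract_valid": True, "local_auc": 0.79}],
    "min_auc": 0.75,
})
print("global AUC:", ctx.get("global_auc"), "| registered version:", ctx.get("registered_version"))
```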
The best-of-breed approach to a safe, rapid, and managed implementation at this scale is to use open platforms for federated and privacy-preserving data science. The core capabilities of such a platform provide an operational blueprint for scalable, secure data collaborations with multiple partners, while offering flexibility by supporting already existing tech stacks, pipelines, data lakes, and data products. In our Buyer’s Guide we go into more detail on how to implement Federated MLOps in the enterprise and include plenty of valuable insights for the next step on your AI journey.
There Has Never Been a Better Time for Collaborative Data Ecosystems
Admittedly, building sustainable and collaborative data ecosystems is a long way off for most, and many have previously failed in the attempt. But the time is now. Never in the history of data science and AI have so many technologies converged that could enable the most innovative among us to push the boundaries of MLOps: moving from the internal to the external to establish truly secure and collaborative data ecosystems on a global scale.