Myth #1: Federated learning is only applicable to mobile devices
When first researching federated learning, many people encounter examples involving edge devices. This reflects how the field developed, particularly in its earliest days.
In 2016, a team of Google researchers introduced a paradigm-changing concept: the ability to train machine learning models from user interaction with mobile devices.
Since then, this concept has been developed further. Some papers split federated learning into two categories, "cross-device federated learning" and "cross-silo federated learning", and the approach is applicable at all scales of federation.
At Apheris, we differentiate between several scales: internally across several business units of a company, in a one-to-one collaboration with a partner in your supply chain, as part of a multi-party industry collaboration, or even across entire data ecosystems such as those envisioned by Gaia-X or Catena-X.
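Regardless of scale, the underlying mechanism is the same: each participant trains on its own data and shares only model parameters, which a coordinator averages. The following is a minimal sketch of that idea (federated averaging on a toy linear model), not the implementation used by any particular platform. The function names, the synthetic data, and the hyperparameters are all assumptions made for illustration.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps for y = w*x + b on one client's private data."""
    w, b = weights
    for _ in range(epochs):
        # Mean-squared-error gradients, computed only from this client's data.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)  # only the parameters leave the client, never the data

def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by each client's number of samples."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def make_client_data(n, rng):
    """Synthetic data (assumption): y = 3x + 1 plus a little noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
# Three hypothetical participants: devices, business units, or partner companies.
clients = [make_client_data(n, rng) for n in (20, 50, 30)]

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])

print(global_weights)  # should end up close to (3, 1)
```

The same loop applies whether the "clients" are millions of phones (cross-device) or a handful of organizations (cross-silo); what changes in practice is the number of participants, their reliability, and the surrounding security requirements.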
Myth #2: Only data scientists have to care about federated learning
One of federated learning's key features is the ability to do data science on data that is not visible.
This sounds like it is only relevant for data scientists, doesn't it?
But federated learning is far more than that. If done right, federated learning can be a scalable, technical solution to today's legal and business problems.
When combined with other privacy-enhancing technologies, FL enables companies to solve some of the most complex and time-consuming processes within their organization. They can stay compliant, protect the IP of sensitive data assets, maintain privacy and keep data sovereignty when leveraging data to create business value.
Today, we can solve all of this with a single, scalable solution.
These issues would otherwise occupy thousands of highly skilled employees in companies across all industries!
Data Scientist
Access to more data
Trains AI models of higher quality
Brings AI successfully into production
Compliance
Knows that the IP of data and models stays protected
Data sovereignty is ensured
Compliance, security and privacy are maintained
Executive
Enhances data & partnership potential
Increases the ROI of AI and data science projects
De-risks investments into AI
Myth #3: Federated learning preserves the privacy of data
Not quite: Federated learning on its own does not protect the privacy and IP of your data sets. Of course, it does have considerable privacy-preservation advantages compared to traditional, centralized machine learning approaches, since it enables the training of a model while the personal training data stays where it was collected, on the device or on the data owner's own servers.
Nevertheless, model parameters can still carry sensitive features that can be exploited to reconstruct or infer related personal information. To mitigate this, additional privacy-preserving technologies such as Differential Privacy, together with comprehensive IT security measures, must be employed to protect the IP and privacy of data and models.
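To make the role of Differential Privacy more concrete, here is a minimal sketch of one common pattern: bounding each participant's model update by clipping its norm, then adding Gaussian noise before the average is released. The function names and parameters (max_norm, noise_multiplier) are assumptions for illustration. The sketch does not compute a formal privacy guarantee; translating a noise level into an epsilon value requires separate privacy accounting that is not shown here.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale an update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def private_average(updates, max_norm, noise_multiplier, rng):
    """Average clipped updates after adding Gaussian noise to their sum.

    Clipping limits how much any single participant can influence the result;
    the noise (std = noise_multiplier * max_norm) masks individual contributions.
    """
    clipped = [clip_update(u, max_norm) for u in updates]
    n = len(clipped)
    dim = len(clipped[0])
    sigma = noise_multiplier * max_norm
    noisy_sum = [
        sum(u[i] for u in clipped) + rng.gauss(0, sigma)
        for i in range(dim)
    ]
    return [s / n for s in noisy_sum]

rng = random.Random(42)
# Hypothetical model updates from three participants.
updates = [[3.0, 4.0], [0.3, 0.4], [-0.5, 0.2]]
print(private_average(updates, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

There is a trade-off in the choice of noise_multiplier: more noise gives stronger protection of individual contributions but reduces the utility of the resulting model.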
Myth #4: Federated learning is a theoretical concept and not applied in production
In a published paper, a team of Google researchers explains how they use federated learning in a commercial, global-scale setting to train, evaluate and deploy models to improve search suggestion quality without direct access to the underlying user data. Companies such as Apple and Samsung have already followed suit with similar use cases.
Besides that, there are plenty of other examples across all industries:
Pharma
MELLODDY is a project from a large consortium in the pharma space, involving companies such as Amgen, Bayer, MERCK, Novartis, AstraZeneca, and more. It aims to leverage the world’s largest collection of small molecules with known biochemical or cellular activity to enable more accurate predictive models and increase efficiencies in drug discovery.
Manufacturing
The European project MUSKETEER is focused on two industrial scenarios - smart manufacturing and healthcare. MUSKETEER aims to create a validated, federated, privacy-preserving machine learning platform that is interoperable, scalable, and efficient enough to be deployed in real use cases.
Healthcare
In a recent example, a team of NVIDIA researchers published a sensational use case for federated learning - predicting clinical outcomes in patients with COVID-19. They showed that it is possible to reach an impactful result that would otherwise be unachievable using only local data and centralized training.
Enterprise-grade Federated Learning Platforms
While there are federated learning systems used in production, the complexity of deploying and maintaining such a solution is still very high.
To date, there are only a few enterprise-grade solutions that apply this concept - one of them being Apheris.
In this context, we have made an interesting observation: People might have heard about federated learning, but they still have little knowledge of how to assess tools and platforms within this emerging discipline.
To support the evaluation process and to accelerate the industry-wide adoption of federated learning, we created the "Buyer's Guide to Secure Multi-Partner Data Collaboration". The guide contains a list of must-have features for any enterprise-grade platform, so you can make the best possible selection for your company and your use cases.
Myth #5: Federated learning doesn’t work on heterogeneous data and data must be highly standardized
This is only partly true.
Common data models (CDM) such as CDISC for clinical trials, OMOP for Electronic Medical Records, or OPC-UA for Industry 4.0 are definitely essential to be able to collaborate on data. Companies also need to have a sufficiently high data governance maturity, ensuring that data is of high quality and suitable for machine learning.
But in reality, even with a CDM, data from multiple parties is rarely clean and harmonized.
Enterprise-grade platforms like Apheris open up an entirely new discipline within federated & privacy-preserving data science.
Apheris supports the entire data science workflow, from data exploration to individual preprocessing pipelines for each dataset, running statistical analysis, testing, and validation of models - and all of that in a privacy-preserving manner.
Myth #6: Federated learning requires more computational resources than centralized learning
Recent studies have shown that training large models in conventional data centers can produce significant CO2eq emissions. Federated learning can be a more carbon-friendly way to train neural networks and can help reduce carbon emissions.
The website ML CO2 Impact does a great job in raising awareness around this topic and even allows you to calculate your GPU's emissions.
In addition, a team of researchers at the University of Cambridge published a paper in July 2021 that helps quantify the CO2eq emissions of training deep learning models in data centers versus on the edge.
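A common first-order way to estimate training emissions is to multiply the energy consumed by the carbon intensity of the electricity used. The sketch below shows that arithmetic. It is a simplification, not the method of the calculator or the paper mentioned above, and every number in it (power draw, duration, PUE, carbon intensity) is an illustrative placeholder rather than a measurement. Communication overhead, which matters for federated setups, is deliberately left out.

```python
def training_emissions_kg(power_watts, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """Estimate CO2eq emissions in kg for a training run.

    power_watts: average power draw of the hardware
    hours: training duration
    carbon_intensity_kg_per_kwh: kg CO2eq emitted per kWh of electricity
    pue: power usage effectiveness (data-center overhead such as cooling);
         1.0 means no overhead, as assumed here for edge devices
    """
    energy_kwh = power_watts / 1000 * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh

# Placeholder scenario A: 8 GPUs at 300 W each for 24 hours in a data center.
datacenter = training_emissions_kg(
    power_watts=8 * 300, hours=24, carbon_intensity_kg_per_kwh=0.4, pue=1.5
)

# Placeholder scenario B: 1,000 edge devices at 5 W each, training for 2 hours.
edge = training_emissions_kg(
    power_watts=1000 * 5, hours=2, carbon_intensity_kg_per_kwh=0.4
)

print(f"data center: {datacenter:.2f} kg CO2eq, edge: {edge:.2f} kg CO2eq")
```

Which setup comes out ahead depends entirely on the inputs: hardware efficiency, the number of training rounds, and the local energy mix can shift the result in either direction.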
On a cross-device level, the massive increase in smart IoT devices will have a significant impact on carbon emissions. Federated learning is a more sustainable way to apply artificial intelligence to the Internet of Things.
In cross-silo scenarios, at Apheris we are currently working with manufacturers and their supply chain partners to help them achieve their long-term sustainability goals. With federated and privacy-preserving data science, multiple partners can securely leverage data from production and machine settings, which results in increased product sustainability and a lower CO2 footprint.
Myth #7: Open-Source frameworks allow companies to build up secure multi-partner data collaborations
Bringing federated learning to life and into production requires much more than just technology or frameworks. This is especially true because it operates on data - one of the most important, sensitive, and yet somehow intangible assets that companies have today.
Federated learning is an amazing tool that enables data collaborations - but you have to see it as one of many building blocks.
There are many other requirements that enterprise customers expect:
Secure Federated Architecture that ensures only computation results move between isolated and confidential environments
Support of the full data science workflow and integration of any Python library
Enterprise-grade security, such as access management, traceability & auditability, data encryption, and the highest of standards in development and staff security
Additional state-of-the-art privacy-enhancing technologies that protect data privacy and IP
A privacy approval process that enables optimal model utility while preserving data privacy
Legal & contractual support in the form of comprehensive compliance frameworks and streamlined processes
Only in combination with enterprise-grade features and contractual frameworks can cross-company collaboration thrive and leverage federated learning to its full potential.
One Fact: Federated learning is a key technology of our future
Last but not least, we're going to close this article with a fact.
Mona Flores, Global Head of Medical AI at NVIDIA, recently published an article that covers the results of a great paper on federated learning in healthcare.
The headline says it all: "Medical AI Needs Federated Learning, So Will Every Industry"
At Apheris, we are convinced that all projects that involve federated learning are extremely valuable. We are proud to contribute to a more ethical and sustainable future and are looking forward to further developments in the field.
Do you want to discuss how you can securely collaborate on data with partners? Let us know.