When thinking about using federated learning, there are several open-source frameworks and software options available. The right choice is highly dependent on the purpose and nature of the use case.
The most important questions you should ask yourself are:
How often do you want to apply federated learning?
How standardized does the setup need to be?
How much support do you need with implementation and maintenance?
What governance, security, and privacy capabilities do you need for your use case?
In this piece, we're excited to share with you a snapshot of some of the most beloved open-source frameworks out there.
It's truly inspiring to witness the dedication of the various teams striving to make the world's data more interconnected. At Apheris, we're on a similar mission, using NVIDIA FLARE as our core federation engine while incorporating all the essentials for high-stakes scenarios that demand top-notch security, privacy, and governance.
We've spent quite some time delving into and experimenting with these frameworks. By sharing our experiences, we aim to make it easier for everyone to find their way through the fast-expanding universe of federated learning.
Open-Source Software for Federated Learning
As always, the open-source community has done fantastic work over the past years. Kudos to all teams! We can't list all frameworks available here but want to highlight a few popular examples that we had a closer look at a year ago before we decided on our new federation engine. All projects listed below have an Apache 2 license (Data taken from GitHub, April 2024).
Project Name | Maintainer | # of Stargazers | # of Contributors |
---|---|---|---|
NVIDIA FLARE | NVIDIA | 523 | 34 |
Flower | Flower | 4100 | 121 |
Substra | Owkin | 267 | 36 |
FATE | WeBank | 5500 | 86 |
PySyft | OpenMined | 9200 | 424 |
OpenFL | Linux Foundation | 654 | 78 |
TensorFlow Federated | Google | 2300 | 107 |
Each of the communities, engineers, and data scientists involved in these projects has made a fantastic effort to further push the research and development in federated learning and privacy-preserving data science. Thank you all for the great work!
Let’s take a look at each project.
NVIDIA FLARE
NVIDIA FLARE (FLARE) is maintained by the one and only NVIDIA team. The federation engine may not have the largest open-source community in our selection yet, but it offers a tremendous number of well-designed, security- and governance-relevant features, a security-hardened architecture, and a domain-agnostic design. Furthermore, FLARE makes it easy to use models from MONAI and Hugging Face, and it lets ML engineers connect to existing ML workflows (PyTorch, RAPIDS, NeMo, TensorFlow).
Features we particularly liked are:
FL Simulator for rapid development and prototyping
Privacy Preservation with differential privacy, homomorphic encryption plus more good stuff
and the specification-based API for extensibility.
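The differential-privacy idea behind features like these is easy to sketch: before an update leaves a site, it is clipped to a maximum norm and perturbed with calibrated noise. The snippet below is a minimal, framework-free illustration of that pattern, not FLARE's actual API; the function name and parameters are our own:

```python
import random

def clip_and_noise(update, clip_norm=1.0, sigma=0.1, rng=None):
    """Clip an update vector to clip_norm, then add Gaussian noise (DP-SGD style)."""
    rng = rng or random.Random(0)
    norm = sum(x * x for x in update) ** 0.5
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    # Noise is scaled to the clipping bound, as in standard DP mechanisms.
    return [x + rng.gauss(0.0, sigma * clip_norm) for x in clipped]

# A client-side update of norm 5 gets clipped to norm 1, then noised.
noisy_update = clip_and_noise([3.0, 4.0])
```

In a real deployment the noise level would be derived from a privacy budget; here sigma is just an illustrative knob.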
The FLARE project came out of NVIDIA Clara, a suite of computing platforms for highly sensitive industries like life sciences, genomics, and drug discovery. FLARE has been battle-hardened in many projects and has been implemented by several companies.
Flower
Flower is not only a “Friendly Federated Learning Framework” but also an extremely friendly community. Join their open Slack channel and you’ll see everyone is very kind & supportive. The Flower team has gathered the second largest contributor base of our selection and does a fantastic job driving federated learning forward.
For this, they follow clear principles:
Customizability: Flower is highly customizable and can be adapted to each individual use case
Extendability: Many components can be extended and overridden to build new state-of-the-art systems
Framework-agnostic: Stay in your preferred framework: PyTorch, TensorFlow, scikit-learn, you name it. Best quote from the repo: "even raw NumPy for users who enjoy computing gradients by hand". You see, a very friendly project!
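In that spirit, here is what "computing gradients by hand" actually looks like: a toy linear regression fitted with a handwritten gradient, using nothing but the standard library. This is purely illustrative; the variable names and data are ours:

```python
# Fit y = w * x by hand-derived gradient descent, no ML framework needed.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with true w = 2

w = 0.0
lr = 0.02
for _ in range(200):
    # For L = mean((w*x - y)^2), the gradient is dL/dw = mean(2 * (w*x - y) * x).
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

# w has converged very close to the true slope of 2.
```

A Flower client would send weights like `w` to the server after local training; the math underneath is no more mysterious than this.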
Substra
Substra is federated learning software initially developed by a multi-partner research project around Owkin. Owkin contributed Substra to the Linux Foundation, which now hosts the project.
Substra focuses on the medical field, with an emphasis on data ownership and privacy. It supports a wide variety of interfaces for different types of users: a Python library for data scientists, a command-line interface for admins, and graphical user interfaces for project managers and other high-level users. In terms of deployment, Substra requires a fairly complex Kubernetes setup on every node.
The key features of Substra are:
Privacy: Substra uses trusted execution environments (also called enclaves), which enable setting aside private regions for code and data
Traceability: Substra writes all operations on the platform to an immutable ledger
Security: Substra encrypts model updates, data at rest, and network communication
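The traceability idea is essentially a hash chain: each recorded operation embeds the hash of the previous record, so any tampering breaks the chain. The following is a toy sketch of that principle, not Substra's actual ledger implementation:

```python
import hashlib
import json

GENESIS = "0" * 64

def append_entry(chain, operation):
    """Append an operation to a toy append-only ledger, chaining SHA-256 hashes."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {"op": operation, "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    chain.append(record)

def verify(chain):
    """Recompute every hash and check each link back to the genesis value."""
    for i, rec in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS
        body = {"op": rec["op"], "prev": rec["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != expected_prev or rec["hash"] != digest:
            return False
    return True

ledger = []
append_entry(ledger, "register_dataset:hospital_a")
append_entry(ledger, "train:model_v1")
```

Editing any earlier entry invalidates every later hash, which is what makes the audit trail trustworthy.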
PySyft
PySyft is an open-source Python 3 library that enables federated learning for research purposes, combining FL with differential privacy and encrypted computation. It was developed by the OpenMined community and works mainly with deep learning frameworks such as PyTorch and TensorFlow.
PySyft supports two types of computations:
Dynamic computations over data that cannot be seen
Static computations, which are graphs of computations that can be executed later on in a different computing environment
PySyft defines objects, machine learning algorithms, and abstractions. On its own, however, PySyft cannot tackle real data science problems that involve communication across networks; that requires a companion library called PyGrid.
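The "computations over data that cannot be seen" idea can be illustrated with a toy pointer object: operations accumulate into a plan, and only the data owner ever executes that plan against the real value. This is a conceptual sketch, not PySyft's actual pointer API:

```python
class Pointer:
    """Toy stand-in for a PySyft-style pointer: method calls build up a plan
    instead of touching the remote value directly."""

    def __init__(self, owner, key):
        self.owner = owner  # the data owner's store; we never read it eagerly
        self.key = key
        self.ops = []       # deferred plan of operations

    def add(self, n):
        self.ops.append(("add", n))
        return self

    def mul(self, n):
        self.ops.append(("mul", n))
        return self

    def get(self):
        # Only here does the data owner execute the plan on the real value.
        value = self.owner[self.key]
        for op, n in self.ops:
            value = value + n if op == "add" else value * n
        return value

remote_store = {"secret": 5}            # lives with the data owner
ptr = Pointer(remote_store, "secret")
result = ptr.add(3).mul(2).get()        # (5 + 3) * 2 = 16
```

In real PySyft the `get()` step is gated by the data owner's permissions, which is exactly where governance hooks in.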
PyGrid implements federated learning on web, mobile, edge devices, and different types of terminals. PyGrid is the API to manage and deploy PySyft at scale. It can be controlled using PyGrid Admin.
PyGrid consists of three different components:
Domain: A Flask based application used to store private data and models for federated learning
Worker: An ephemeral compute instance managed by domain components to perform computations on data
Network: A Flask based application to monitor and control different domain components
FATE
FATE (Federated AI Technology Enabler) is an open-source project that aims to support a secure and federated AI ecosystem. FATE is available for standalone and cluster deployment setups. The open-source framework is backed by WeBank, a privately owned neobank based in Shenzhen, China.
To use FATE and write custom models for it, you need some knowledge of protocol buffers.
OpenFL
Intel® Open Federated Learning (OpenFL) is a Python 3 open-source project developed by Intel to implement FL on sensitive data. OpenFL ships deployment scripts in bash and leverages certificates for securing communication, but requires users of the framework to handle most of this themselves.
The library consists of two components: the collaborator, which uses a local dataset to train global models, and the aggregator, which receives the model updates and combines them to create the global model. OpenFL comes with a Python API and a command-line interface.
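The aggregator's combination step is typically federated averaging (FedAvg): a dataset-size-weighted mean of the collaborators' weight vectors. A minimal pure-Python sketch of that logic (not OpenFL's actual aggregation code; names are ours):

```python
def fed_avg(client_updates):
    """Combine client weight vectors by dataset-size-weighted averaging (FedAvg).

    client_updates: list of (weight_vector, num_local_samples) pairs.
    """
    total_samples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_samples
        for i in range(dim)
    ]

# Two collaborators; the second trained on 3x as much data, so it counts 3x.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = fed_avg(updates)  # [2.5, 3.5]
```

OpenFL lets you swap this logic out, which is what "customize aggregation logic" means in practice.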
The communication between the nodes is done using mTLS, and hence certificates are required. It is necessary to certify each node in the federation. OpenFL supports lossy and lossless compression of data to reduce communication costs. OpenFL allows developers to customize logging, data split methods, and aggregation logic.
The OpenFL design philosophy is based on the Federated Learning (FL) Plan: a YAML file that defines the required collaborators, aggregators, connections, models, data, and any other configuration. OpenFL runs in Docker containers to isolate federation environments.
TensorFlow Federated
TensorFlow Federated (TFF) is a Python 3 open-source framework for federated learning developed by Google. The main motivation behind TFF was Google's need to implement mobile keyboard predictions and on-device search. TFF is actively used at Google to support customer needs.
TFF consists of two main API layers:
Federated Core (FC) API
FC is a programming environment for implementing distributed computations. Each computation performs complex tasks and communicates over the network to coordinate. It uses pseudocode-like abstractions to express programs that can be executed in various target runtimes (mobile devices, sensors, desktop computers, embedded systems, etc.), and a performant multi-machine runtime is included with TFF.
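The flavor of such a federated computation can be mimicked in plain Python: each client reduces its raw values locally and shares only aggregates, which the server then combines. This toy analogue is not TFF's API, just an illustration of the placement idea behind something like a federated mean:

```python
def federated_mean(client_values):
    """Toy analogue of a federated computation: each client contributes only
    a (sum, count) pair; the server never sees individual raw values."""
    # This part conceptually runs on each client.
    partials = [(sum(vals), len(vals)) for vals in client_values]
    # This part conceptually runs on the server, over aggregates only.
    total, count = map(sum, zip(*partials))
    return total / count

# Three clients with locally held values; the server learns only the mean.
avg = federated_mean([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])  # 21 / 6 = 3.5
```

TFF's FC API expresses exactly this split between client-placed and server-placed logic, but compiles it for real distributed runtimes.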
Federated Learning (FL) API
This high-level API enables plugging existing machine learning models into TFF without deep-diving into how federated learning algorithms work. The FL API is built on top of the FC API.
Federated Learning API consists of three main parts:
Models: Classes and helper functions that enable the wrapping of existing models with TFF
Federated Computation Builders: Helper functions to construct federated computations
Datasets: Canned collections of data to use for simulation scenarios
The separation between the FL and FC layers is intended to serve different users: the Federated Learning API helps machine learning developers apply FL to TensorFlow models and lets FL researchers introduce new algorithms, while the Federated Core API targets systems researchers.
When Federated Learning is not enough
Federated learning is an innovative approach that allows for the training of machine learning models in a way that protects privacy, without the need to exchange raw data. This method is particularly valuable in fields like healthcare, where patient information is highly sensitive, and in manufacturing, where protecting intellectual property is crucial.
However, a straightforward federated learning setup might not always suffice, especially when strict regulatory or internal compliance standards are in play. To address this, the Apheris team has introduced the Compute Gateway. This technology operates at the edge within a Data Provider's environment, empowering them to oversee the processing of their sensitive data on an algorithmic level.
This enhanced level of oversight is what we refer to as federated Computational Governance, a flexible approach applicable across various types of data and algorithms. At its core, federated learning is all about collaboration on data - the Compute Gateway adds the capability to shake hands on a computational level.
Our solution today integrates with NVIDIA FLARE which means you get all the brilliant features, ecosystem integrations as well as other goodies on top. But we are, of course, very happy to see the collaborative spirit of the open-source community also among federated learning projects and the recently announced collaboration between Flower and NVIDIA FLARE. Great things to come!
We go more into depth on what it takes to implement federated learning into enterprise MLOps pipelines in our whitepaper.
Feel free to reach out to us if you want to learn more.