Compute Specs

What is a Compute Spec

A Compute Spec is a contract between a Data Scientist and a Data Custodian organization that specifies computational access to the Data Custodian's sensitive data. A Data Scientist can submit a Compute Spec using the Apheris CLI, and a Data Custodian can see it in the Apheris Governance Portal.

The Compute Spec specifies the particular model and parameters for securely running statistics functions and machine learning models on specified datasets. Think of it as a blueprint that includes:

  • Dataset ID: Each Compute Spec is linked to one or more datasets by their identifiers. These IDs determine which Compute Gateways the code runs on.
  • Model: Each Compute Spec specifies one model in our Model Registry to run on the specified data. Generally, a model is defined through a Docker image that contains the pre-configured environment in which the code will execute. It ensures consistency and reproducibility by packaging the code, runtime, system tools, system libraries, and settings.
  • Max Compute resources: The Compute Spec details the required computing resources (the number of virtual CPUs, the amount of RAM, and any GPU requirements) for computations on both the Gateway (client) side and the Orchestrator (server) side. Data Scientists and ML Engineers specify the resources they want to allocate so that their computations meet the task's demands. The resources are allocated while a Compute Spec is activated and are freed when it is deactivated. Data Custodians control the resources they make available to the Gateway, so Data Scientists and ML Engineers can only allocate resources within these pre-determined boundaries.

In essence, a Compute Spec is a comprehensive contract between a Data Scientist and a Data Custodian that defines how, where, and on what data model code can execute within the respective Compute Gateway.
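To make the three parts of this blueprint concrete, here is a minimal Python sketch. It is only an illustration: the class and field names (`ComputeSpec`, `Resources`, `fits_within`, and the example IDs) are hypothetical and are not the Apheris CLI or its data model.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical names for illustration only; not the Apheris API.


@dataclass(frozen=True)
class Resources:
    vcpus: float
    memory_mb: int
    gpus: int = 0

    def fits_within(self, limit: "Resources") -> bool:
        """True if every requested resource is within the given limit."""
        return (
            self.vcpus <= limit.vcpus
            and self.memory_mb <= limit.memory_mb
            and self.gpus <= limit.gpus
        )


@dataclass(frozen=True)
class ComputeSpec:
    dataset_ids: Tuple[str, ...]   # determines which Gateways are involved
    model: str                     # one model from the registry
    client_resources: Resources    # requested per Gateway (client side)
    server_resources: Resources    # requested on the Orchestrator (server side)


spec = ComputeSpec(
    dataset_ids=("dataset-a", "dataset-b"),
    model="example-model:1.0",
    client_resources=Resources(vcpus=2, memory_mb=4000),
    server_resources=Resources(vcpus=1, memory_mb=2000),
)

# Boundary that a Data Custodian might make available on a Gateway (assumed values).
gateway_limit = Resources(vcpus=4, memory_mb=8000, gpus=1)
print(spec.client_resources.fits_within(gateway_limit))
```

The `fits_within` check mirrors the idea that a request is only acceptable if it stays inside the boundaries the Data Custodian has set for the Gateway.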

Creating Compute Specs

As a Data Scientist, you create the Compute Spec using the Apheris CLI. For details on how to create Compute Specs, please refer to our tutorials.

When you create a Compute Spec, it is submitted from your Apheris CLI to the Apheris Orchestrator. The Data Custodian organization that owns the datasets in a Compute Spec can view incoming Compute Specs in the Apheris Governance Portal by clicking "Compute Spec" in the navigation on the left.

Here you can see a table of Compute Specs, showing who created each Compute Spec, which organization they belong to, and which datasets and model they requested to use.

Figure: Apheris Compute Spec overview

Important

Please note that a federated computation across multiple datasets requires that each dataset resides in a different Gateway.
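The constraint above can be expressed as a simple check. This is a sketch under assumptions: the mapping from dataset ID to Gateway and the function name `gateways_are_distinct` are hypothetical and only illustrate the rule.

```python
from collections import Counter
from typing import Dict


def gateways_are_distinct(dataset_to_gateway: Dict[str, str]) -> bool:
    """True if no two datasets in the mapping reside in the same Gateway."""
    counts = Counter(dataset_to_gateway.values())
    return all(n == 1 for n in counts.values())


# Two datasets in different Gateways: a federated computation is possible.
print(gateways_are_distinct({"dataset-a": "gateway-1", "dataset-b": "gateway-2"}))

# Two datasets in the same Gateway: the rule is violated.
print(gateways_are_distinct({"dataset-a": "gateway-1", "dataset-b": "gateway-1"}))
```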

Approval of Compute Specs

For any Compute Spec that a Data Scientist creates, the Apheris product validates whether the Compute Spec stays within the computational access that the corresponding Data Custodian has granted to this particular Data Scientist in the Asset Policy.

Imagine that you, as a Data Custodian, created an Asset Policy in which you gave Data Scientist X permission to run model Y with specific parameters on your dataset Z. Any Compute Spec from a Data Scientist that doesn't satisfy this (or another) Asset Policy you have defined is rejected, and any Compute Spec that satisfies the defined Asset Policies is automatically approved. There is therefore no manual approval step for you as a Data Custodian: you control the computational access you grant to a Data Scientist through Asset Policies.
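The automatic approval rule can be pictured as a lookup against the defined Asset Policies. The sketch below is a simplified illustration, not the actual validation logic: the names `AssetPolicy`, `SpecRequest`, and `is_approved` are hypothetical, and model parameters are left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical names; model parameters are omitted in this simplified sketch.


@dataclass(frozen=True)
class AssetPolicy:
    data_scientist: str
    dataset_id: str
    model: str


@dataclass(frozen=True)
class SpecRequest:
    data_scientist: str
    dataset_ids: Tuple[str, ...]
    model: str


def is_approved(request: SpecRequest, policies: Iterable[AssetPolicy]) -> bool:
    """Approve only if every requested dataset is covered by a policy
    granting this Data Scientist this model."""
    granted = {(p.data_scientist, p.dataset_id, p.model) for p in policies}
    return all(
        (request.data_scientist, dataset_id, request.model) in granted
        for dataset_id in request.dataset_ids
    )


policies = [AssetPolicy("scientist-x", "dataset-z", "model-y")]

print(is_approved(SpecRequest("scientist-x", ("dataset-z",), "model-y"), policies))
print(is_approved(SpecRequest("scientist-x", ("dataset-z",), "other-model"), policies))
```

The first request matches the policy and would be approved; the second asks for a model that was never granted and would be rejected.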

Activating Compute Specs

An approved Compute Spec allows the Data Scientist to run actual computations on the data. To do this, you as a Data Scientist first activate the Compute Spec. Activation provisions the resources specified in the Compute Spec and the model code, on both the Orchestrator and the Gateway, so that you are ready to run computations.

Concretely, when you activate a Compute Spec, the Apheris product provisions one NVFlare server on the Orchestrator and one NVFlare client on each Compute Gateway that contains datasets specified in the Compute Spec. Generally, a Gateway can have multiple activated Compute Specs in parallel (as many as the resources of the infrastructure the Gateway is deployed in allow).

When you as a Data Scientist first activate a Compute Spec, there is a short delay (up to 2-3 minutes on the first run) while the corresponding resources are provisioned. Once the Compute Spec is successfully activated, you can submit as many jobs as you like using the Apheris CLI.

Once you are done with running jobs on the data, you can deactivate the Compute Spec, which will shut down any resources. As long as the Compute Spec is approved, you can activate and deactivate the same Compute Spec as often as you want.
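The activate, run, deactivate cycle described above behaves like a small state machine. The following sketch models it in plain Python; the class `ComputeSpecLifecycle` and its methods are hypothetical and do not correspond to Apheris CLI commands.

```python
from enum import Enum


class State(Enum):
    APPROVED = "approved"  # validated, but no resources provisioned
    ACTIVE = "active"      # resources provisioned, jobs can be submitted


class ComputeSpecLifecycle:
    """Toy model of one approved Compute Spec (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.state = State.APPROVED
        self.jobs_submitted = 0

    def activate(self) -> None:
        if self.state is State.ACTIVE:
            raise RuntimeError("already activated; only one activation at a time")
        self.state = State.ACTIVE

    def submit_job(self) -> None:
        if self.state is not State.ACTIVE:
            raise RuntimeError("activate the Compute Spec before submitting jobs")
        self.jobs_submitted += 1

    def deactivate(self) -> None:
        # Frees the resources; the spec stays approved and can be activated again.
        self.state = State.APPROVED


lifecycle = ComputeSpecLifecycle()
lifecycle.activate()
lifecycle.submit_job()
lifecycle.submit_job()
lifecycle.deactivate()
lifecycle.activate()  # re-activation is allowed while the spec remains approved
print(lifecycle.state, lifecycle.jobs_submitted)
```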

Scalability of Compute Specs

For a specific Compute Spec ID, you can have only one activation at a time, so there can never be two parallel activations of the same Compute Spec ID. If you want to scale horizontally, i.e. run computations in parallel, you need to create further Compute Specs with the same content as the first one. Each Compute Spec you create gets its own Compute Spec ID, and you can activate each one, which allows you to run multiple computations in parallel (if the resources on the Compute Gateway infrastructure allow for that).

Within one Compute Spec, you can submit as many jobs as you like. They are queued using NVFlare mechanisms and run sequentially; jobs within a single Compute Spec cannot run in parallel.
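The difference between sequential jobs within one Compute Spec and parallel work across several Compute Specs can be illustrated with a toy queue. This is not how NVFlare implements its queueing; `SpecQueue` and the job names are hypothetical and only model the observable behavior.

```python
from collections import deque
from typing import Deque, Optional


class SpecQueue:
    """Toy model: at most one running job per Compute Spec, the rest wait in order."""

    def __init__(self, spec_id: str) -> None:
        self.spec_id = spec_id
        self.pending: Deque[str] = deque()
        self.running: Optional[str] = None

    def submit(self, job: str) -> None:
        self.pending.append(job)
        self._start_next()

    def finish_current(self) -> Optional[str]:
        done, self.running = self.running, None
        self._start_next()
        return done

    def _start_next(self) -> None:
        # Start the next queued job only if nothing is running on this spec.
        if self.running is None and self.pending:
            self.running = self.pending.popleft()


# One spec: the second job waits until the first has finished.
spec_1 = SpecQueue("spec-1")
spec_1.submit("job-a")
spec_1.submit("job-b")
print(spec_1.running, list(spec_1.pending))

# A second spec with the same content has its own queue, so its job runs in parallel.
spec_2 = SpecQueue("spec-2")
spec_2.submit("job-c")
print(spec_1.running, spec_2.running)
```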

To scale vertically, change the hardware requirements in the Compute Spec you create.

Culling Idle Compute Specs

The Global Environment Culling feature automatically terminates idle Compute Specs across all Gateways. Apheris can configure this feature upon customer request during deployment.

A Compute Spec is deemed idle and eligible for termination if it has not had any running Compute Spec Jobs within a specified duration.

This duration, referred to as the 'idle threshold', is configurable to suit different operational needs.

Configuration

The environment culling feature can be globally configured with the following options:

environmentCulling:
  enabled: true
  schedule: "0 23 * * *"
  timezone: "UTC"
  threshold: "12h"

Enabled: Set to true to enable automatic cleanup of idle Compute Specs globally.

Schedule: Specifies when the cleanup process runs. It follows cron syntax; the example value "0 23 * * *" runs the cleanup every day at 23:00.

Timezone: Defines the timezone in which the schedule should be interpreted.

Threshold: Determines how long a Compute Spec can remain idle before it is eligible for termination.
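The idle check that the threshold controls can be sketched as follows. This is an illustration under assumptions, not the product's implementation: the accepted duration format is assumed to be a whole number followed by `h`, `m`, or `s` (only "12h" appears in the example above), the comparison is assumed to be inclusive, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; the real configuration may accept other formats.
_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_threshold(text: str) -> timedelta:
    """Parse a duration such as '12h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hms])", text.strip())
    if match is None:
        raise ValueError(f"unsupported threshold: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_idle(last_job_running_at: datetime, now: datetime, threshold: timedelta) -> bool:
    """True if no job has been running for at least the threshold duration."""
    return now - last_job_running_at >= threshold


threshold = parse_threshold("12h")
# A cleanup run at 23:00 UTC, matching the example schedule "0 23 * * *".
cleanup_time = datetime(2024, 1, 2, 23, 0, tzinfo=timezone.utc)

# Last job activity 14 hours before the cleanup: eligible for termination.
print(is_idle(datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc), cleanup_time, threshold))

# Last job activity 8 hours before the cleanup: kept running.
print(is_idle(datetime(2024, 1, 2, 15, 0, tzinfo=timezone.utc), cleanup_time, threshold))
```

Because the check only happens when the schedule fires, a Compute Spec can in practice stay idle for longer than the threshold before it is terminated.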

Configure your Environment

Please contact your Apheris representative if you want to change the default configuration for this feature in your environment.