Derived Datasets

Apheris lets data scientists create Derived Datasets so that pre-processed data can be reused. This optional feature requires Data Persistence to be enabled on the Gateway.

As the owner of the original Dataset, you can view, but not edit, the Derived Datasets created from it. The Datasets used to create a Derived Dataset are listed in the "derived from" section of its dataset view.

Figure: Viewing a Derived Dataset shows its inheritance, i.e. the Datasets it was derived from.

Multi-Stage Derivation

A new Derived Dataset can be created from regular Datasets, from existing Derived Datasets, or from a mix of both. Regular Datasets are created and managed by Data Custodians.

Derived Datasets do not have asset policies directly associated with them. Instead, they inherit the policies of the regular Datasets they are derived from, either directly or indirectly (i.e., when another Derived Dataset is used as input).

When a Compute Spec uses existing Derived Datasets to create new Derived Datasets during its execution, the newly created Datasets store the following information:

List of input Datasets: can include a mix of regular Datasets and Derived Datasets.
List of policy-providing Datasets: includes only regular Datasets (i.e., the root Datasets of the derivation tree).
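The relationship between input Datasets and policy-providing Datasets can be sketched in Python. This is a minimal illustration, not the Apheris API: the `Dataset` record, its fields, and the `derive` helper are hypothetical names chosen for this example. It assumes each Derived Dataset stores its direct inputs plus the set of regular (root) Datasets it inherits policies from, so that policy-providing Datasets never need to be recomputed by walking the whole tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical Dataset record for illustration; not the Apheris API."""
    dataset_id: str
    # Direct inputs; empty for regular Datasets (assumption of this sketch).
    input_ids: tuple[str, ...] = ()
    # Root regular Datasets whose asset policies apply to this Dataset.
    policy_provider_ids: frozenset[str] = frozenset()

    @property
    def is_derived(self) -> bool:
        return bool(self.input_ids)


def derive(new_id: str, inputs: list[Dataset]) -> Dataset:
    """Create a hypothetical Derived Dataset record from regular and/or Derived inputs."""
    if not inputs:
        raise ValueError("a Derived Dataset needs at least one input Dataset")
    providers: set[str] = set()
    for ds in inputs:
        if ds.is_derived:
            # A Derived input contributes its own root (regular) Datasets.
            providers |= ds.policy_provider_ids
        else:
            # A regular input provides its own asset policy.
            providers.add(ds.dataset_id)
    return Dataset(
        dataset_id=new_id,
        input_ids=tuple(ds.dataset_id for ds in inputs),
        policy_provider_ids=frozenset(providers),
    )


if __name__ == "__main__":
    hospital_a = Dataset("hospital_a")
    hospital_b = Dataset("hospital_b")
    cleaned_a = derive("cleaned_a", [hospital_a])
    joined = derive("joined", [cleaned_a, hospital_b])
    print(joined.input_ids)                    # ('cleaned_a', 'hospital_b')
    print(sorted(joined.policy_provider_ids))  # ['hospital_a', 'hospital_b']
```

Because `joined` takes the Derived Dataset `cleaned_a` as input, its input list mixes regular and Derived Datasets, while its policy-providing list resolves to the root Dataset `hospital_a` rather than to `cleaned_a` itself.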

Managing Access to Derived Datasets

Derived Datasets inherit permissions from their policy-providing Datasets. For new models and users to use a newly created Derived Dataset, the asset policies of all contributing regular Datasets must permit that combination of models and users.

Note that a newly created Derived Dataset is visible only to users who have permission to access all of its policy-providing Datasets.
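The two access rules above amount to an "all policies must agree" check. The Python sketch below is a simplified model under stated assumptions, not the Apheris policy engine: `AssetPolicy`, its `allowed_users`/`allowed_models` fields, and the `can_use_derived`/`is_visible_to` helpers are hypothetical, and real asset policies are richer than a pair of allow-lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetPolicy:
    """Hypothetical, simplified asset policy; not the Apheris policy schema."""
    allowed_users: frozenset[str]
    allowed_models: frozenset[str]

    def permits(self, user: str, model: str) -> bool:
        return user in self.allowed_users and model in self.allowed_models


def _provider_policies(
    policy_provider_ids: frozenset[str], policies: dict[str, AssetPolicy]
) -> list[AssetPolicy]:
    # A Derived Dataset always has at least one regular root Dataset.
    if not policy_provider_ids:
        raise ValueError("a Derived Dataset must have policy-providing Datasets")
    # Raises KeyError if a policy-providing Dataset has no known policy.
    return [policies[ds_id] for ds_id in policy_provider_ids]


def can_use_derived(
    policy_provider_ids: frozenset[str],
    policies: dict[str, AssetPolicy],
    user: str,
    model: str,
) -> bool:
    """Every policy-providing Dataset must permit this user/model combination."""
    return all(p.permits(user, model) for p in _provider_policies(policy_provider_ids, policies))


def is_visible_to(
    policy_provider_ids: frozenset[str],
    policies: dict[str, AssetPolicy],
    user: str,
) -> bool:
    """The user must have access to all policy-providing Datasets."""
    return all(user in p.allowed_users for p in _provider_policies(policy_provider_ids, policies))


if __name__ == "__main__":
    policies = {
        "hospital_a": AssetPolicy(frozenset({"alice", "bob"}), frozenset({"model_a"})),
        "hospital_b": AssetPolicy(frozenset({"alice"}), frozenset({"model_a", "model_b"})),
    }
    providers = frozenset({"hospital_a", "hospital_b"})
    print(can_use_derived(providers, policies, "alice", "model_a"))  # True
    print(can_use_derived(providers, policies, "alice", "model_b"))  # False
    print(is_visible_to(providers, policies, "bob"))                 # False
```

Here `model_b` is rejected because the `hospital_a` policy does not allow it, even though `hospital_b` does, and `bob` cannot see the Derived Dataset because he lacks access to `hospital_b`.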

Important

Data from outside the Gateway must not be stored in Derived Datasets, including aggregate data coming from other Gateways or sensitive data present in the model's Docker image.

Figure: Asset policies list the Datasets and Derived Datasets they are applied to.