Regression Models Reference
The Apheris Regression Models codebase is a toolkit for running regression models on Apheris.
It currently supports two models: logistic regression and Cox regression.
Remote inference is currently not supported for logistic regression, because returning predictions could allow a user to reverse-engineer the input features.
regression.logistic_regression.api_client
fit_lr(datasets, session, feature_cols, target_col, validation_set_col=None, validation_split=None, feature_selector_direction=None, num_rounds=5, num_steps_per_round=2)
Trains a logistic regression model on the specified datasets using the given session.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasets | Union[Iterable[FederatedDataFrame], FederatedDataFrame] | The datasets to be used for training. | required |
session | Union[SupervisedMLSession, LocalDebugMLSession] | The session object that defines compute_spec and dataset IDs. | required |
num_rounds | int | The number of training rounds to perform. | 5 |
feature_cols | List[Union[int, float, str]] | Columns to be used as features in the model. | required |
target_col | Union[int, float, str] | Column to be used as the target variable. | required |
validation_set_col | Optional[Union[int, float, str]] | Column to be used for the validation set. | None |
validation_split | float | Fraction of the data to be used for validation. If None, a fraction of 0.2 is used. | None |
feature_selector_direction | Optional[str] | Direction for feature selection ('forward' or 'backward'). | None |
num_steps_per_round | int | Number of steps to perform per round. | 2 |
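The two values accepted by feature_selector_direction are the classic directions of greedy sequential feature selection (scikit-learn's SequentialFeatureSelector uses the same 'forward'/'backward' vocabulary). The sketch below shows the generic technique on plain Python data, assuming a user-supplied score function where higher is better. It is not the Apheris implementation: this reference does not say how fit_lr scores candidate feature sets or when it stops, and the names sequential_select and toy_score are hypothetical. This variant stops as soon as no single addition or removal improves the score.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_select(
    features: Sequence[str],
    score_fn: Callable[[FrozenSet[str]], float],
    direction: str = "forward",
) -> List[str]:
    """Greedy sequential feature selection (hypothetical helper, not the Apheris API).

    forward:  start with no features and repeatedly add the one that most improves the score.
    backward: start with all features and repeatedly drop the one whose removal most
              improves the score.
    Stops when no single change improves the score.
    """
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    selected = frozenset() if direction == "forward" else frozenset(features)
    best = score_fn(selected)
    while True:
        if direction == "forward":
            candidates = [selected | {f} for f in features if f not in selected]
        else:
            candidates = [selected - {f} for f in selected]
        if not candidates:
            break
        candidate = max(candidates, key=score_fn)
        candidate_score = score_fn(candidate)
        if candidate_score <= best:  # no improvement: stop
            break
        selected, best = candidate, candidate_score
    # Report the selection in the original column order.
    return [f for f in features if f in selected]


# Toy score: reward informative columns, penalize noise columns.
useful = {"age", "bmi"}


def toy_score(s: FrozenSet[str]) -> float:
    return len(s & useful) - 0.5 * len(s - useful)


print(sequential_select(["age", "noise", "bmi"], toy_score, "forward"))   # ['age', 'bmi']
print(sequential_select(["age", "noise", "bmi"], toy_score, "backward"))  # ['age', 'bmi']
```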
Returns:
Name | Type | Description |
---|---|---|
results | dict | Dictionary containing the model parameters resulting from the training process. |
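The num_rounds and num_steps_per_round parameters point to a round-based federated training loop. The NumPy sketch below illustrates one common scheme, federated averaging: in each round every site takes a few local gradient-descent steps on its own data, and the server averages the resulting parameters, weighted by site size. The aggregation rule, the learning rate, and the helper names (sigmoid, local_steps, fit_federated_lr) are assumptions for illustration only; this reference does not state how fit_lr trains or aggregates, and the real function returns a dict rather than an array.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_steps(w, X, y, num_steps, lr):
    """Hypothetical site-side update: num_steps of full-batch gradient descent on log-loss."""
    w = w.copy()
    for _ in range(num_steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)  # gradient of the mean log-loss
        w -= lr * grad
    return w


def fit_federated_lr(sites, num_rounds=5, num_steps_per_round=2, lr=0.5):
    """Hypothetical FedAvg-style logistic regression.

    sites: list of (X, y) pairs, one per site; X already contains a column of ones
    for the intercept, y holds 0/1 labels.
    """
    w = np.zeros(sites[0][0].shape[1])
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    for _ in range(num_rounds):
        local_ws = [local_steps(w, X, y, num_steps_per_round, lr) for X, y in sites]
        # Server side: size-weighted average of the sites' parameter vectors.
        w = np.average(local_ws, axis=0, weights=sizes)
    return w


rng = np.random.default_rng(0)
sites = []
for n in (200, 120):  # two sites of different sizes
    x = rng.normal(size=(n, 1))
    y = (x[:, 0] > 0).astype(float)
    sites.append((np.hstack([np.ones((n, 1)), x]), y))
w = fit_federated_lr(sites, num_rounds=50, num_steps_per_round=2)
print(w)  # [intercept, slope]
```

Weighting by site size is a deliberate choice: with num_steps_per_round=1, the weighted average of the local updates equals a single gradient step on the pooled data, so more local steps per round trade communication for some drift between sites.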
validate_lr(datasets, session, feature_cols, ground_truth_col, modelparameter, validation_metric='accuracy')
Validates the model based on the specified validation dataset.
Args:
datasets (FederatedDataFrame): The dataset to be used for prediction.
session (Union[SupervisedMLSession, LocalDebugMLSession]): The session object that defines compute_spec and dataset IDs.
feature_cols (List[Union[int, float, str]]): Columns to be used as features in the model.
ground_truth_col (Union[int, float, str]): Column to be used as the ground truth.
modelparameter (dict): The model parameters to be used for prediction, as a dictionary. This can be the output of the fit_lr function.
validation_metric (str): The metric to be used for validation. Can be any scorer name returned by sklearn.metrics.get_scorer_names() or "confusion_matrix". Defaults to "accuracy".
Returns:
results: Dictionary containing the predicted values.
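For intuition about the two simplest validation_metric options, the sketch below computes accuracy and a binary confusion matrix locally in plain Python. It assumes 0/1 labels and follows scikit-learn's layout for a confusion matrix (rows are true labels, columns are predicted labels, in sorted label order). The helper names are hypothetical; this is not the code validate_lr runs, and the layout of its results dictionary is not shown here.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the ground truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Binary confusion matrix [[TN, FP], [FN, TP]] for labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true label, column = predicted label
    return matrix


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(accuracy(y_true, y_pred))          # 0.6
print(confusion_matrix(y_true, y_pred))  # [[1, 1], [1, 2]]
```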
regression.cox.api_client
fit_coxph(datasets, session, time_col, target_col, validation_set_col=None, max_time=-1, num_rounds=5, num_steps_per_round=2)
Runs a training job that fits a Cox regression model to the given federated datasets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasets | Union[Iterable[FederatedDataFrame], FederatedDataFrame] | A single FederatedDataFrame, or an iterable of FederatedDataFrames, pointing to the datasets to be used for training. | required |
session | Union[SupervisedMLSession, LocalDebugMLSession] | Session object that defines compute_spec and dataset IDs. | required |
num_rounds | int | Number of training rounds. | 5 |
time_col | Union[int, float, str] | Name of the integer-valued time column. | required |
target_col | Union[int, float, str] | Name of the ground-truth column. | required |
validation_set_col | Optional[Union[int, float, str]] | Name of a boolean-valued column indicating validation-set membership. If None, a validation set is obtained by a train/test split that holds out 20% of the data. | None |
max_time | int | Maximum time over all datasets. If not given, it is computed in a preliminary max computation. | -1 |
num_steps_per_round | int | Number of steps per federated round. | 2 |
Returns:
Type | Description |
---|---|
dict | A dictionary with three keys: |
dict | |
dict | |
dict | |
Raises:
Type | Description |
---|---|
RuntimeError | If the job cannot be created. |
TimeoutError | If the job takes longer than the supplied timeout. |
ResultsNotFound | If the job did not complete due to an error. In this case, check the supplied logs for more details. |
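Two of the defaults above describe small preliminary steps: with max_time=-1 the maximum time over all datasets is computed first, and without validation_set_col a 20% validation split is drawn. The sketch below illustrates both on plain Python lists, assuming that each site reports only its local maximum and that the split is represented as a random per-row boolean indicator like the one validation_set_col expects. The function names are hypothetical, and the exact procedure fit_coxph uses is not documented here.

```python
import random
from typing import List, Sequence


def global_max_time(site_times: Sequence[Sequence[int]]) -> int:
    """Hypothetical preliminary step: each site shares its local maximum, the server takes the max."""
    local_maxima = [max(times) for times in site_times]
    return max(local_maxima)


def validation_indicator(n_rows: int, fraction: float = 0.2, seed: int = 0) -> List[bool]:
    """Hypothetical boolean column marking round(fraction * n_rows) random rows as validation."""
    rng = random.Random(seed)
    n_val = round(fraction * n_rows)
    val_rows = set(rng.sample(range(n_rows), n_val))
    return [i in val_rows for i in range(n_rows)]


print(global_max_time([[3, 7, 2], [5, 11], [1]]))  # 11
flags = validation_indicator(10)
print(sum(flags))  # 2 rows flagged for validation
```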
validate_cox(datasets, session, time_col, ground_truth_col, modelparameter)
Validates the model on the specified validation datasets, using column ground_truth_col as the ground truth. For validation, the default scoring function of the lifelines CoxPH model, the average partial log-likelihood, is used.
Args:
datasets (FederatedDataFrame): The dataset to be used for prediction.
session (Union[SupervisedMLSession, LocalDebugMLSession]): The session object that defines compute_spec and dataset IDs.
time_col (Union[int, float, str]): Column to be used as the time column for Cox inference. The time column should be integer-valued.
ground_truth_col (Union[int, float, str]): Column to be used as the ground truth.
modelparameter (dict): The model parameters to be used for prediction, as a dictionary. This can be the output of the fit_coxph function.
Returns:
Name | Type | Description |
---|---|---|
results | dict | Dictionary containing the predicted values. |
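The score used by validate_cox, the average partial log-likelihood, can be written for coefficients beta as the sum over rows i with an event of [x_i·beta − log Σ_{j: T_j ≥ T_i} exp(x_j·beta)], divided by the number of rows. The NumPy sketch below computes this quantity, assuming Breslow's handling of tied event times and averaging over all rows. lifelines' CoxPHFitter.score may differ in details such as normalization, so treat this as an illustration of the formula rather than a reimplementation; the function name and arguments are hypothetical.

```python
import numpy as np


def average_partial_log_likelihood(X, T, E, beta):
    """Breslow partial log-likelihood of a Cox model, averaged over all rows (hypothetical helper).

    X: (n, p) covariates, T: (n,) observed times, E: (n,) event indicators (1 = event,
    0 = censored), beta: (p,) coefficients.
    """
    X = np.asarray(X, dtype=float)
    T = np.asarray(T, dtype=float)
    E = np.asarray(E)
    beta = np.asarray(beta, dtype=float)
    eta = X @ beta  # linear predictor x_i . beta
    total = 0.0
    for i in np.flatnonzero(E == 1):
        risk = eta[T >= T[i]]  # risk set: rows still under observation at time T[i]
        m = risk.max()  # shift for a numerically stable log-sum-exp
        total += eta[i] - (m + np.log(np.sum(np.exp(risk - m))))
    return total / len(T)


X = [[0.5], [1.0], [-0.3]]
T = [1, 2, 3]
E = [1, 1, 1]
print(average_partial_log_likelihood(X, T, E, [0.0]))  # equals -log(6)/3 when beta is zero
```

With beta set to zero, each event contributes minus the log of its risk-set size, which makes the result easy to check by hand.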
regression.session
LocalDebugMLSession
Bases: LocalDebugSimpleStatsSession
Local session object that connects the regression models with the NVFlare simulator and supports running a simulation of a SupervisedMLJobDefinition.
SupervisedMLSession
Bases: SimpleStatsSession
Session object that connects the regression models with the job API and supports running a SupervisedMLJobDefinition.
It can be instantiated manually for a running compute spec, but is typically created using the provision function from regression.session.provision.
regression.session.provision
provision(dataset_ids, client_n_cpu=0.5, client_memory=1000, server_n_cpu=0.5, server_memory=1000, modelversion=None)
Create and activate a compute spec to run remote regression models on Apheris.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_ids | List[str] | A list of Apheris dataset IDs. | required |
client_n_cpu | float | The fractional number of CPUs to request in the compute spec for the Compute Gateways. Consider increasing this if your computation takes too long. | 0.5 |
client_memory | int | The amount of memory to request in the compute spec for the Compute Gateways. Consider increasing this if your computation runs out of memory. | 1000 |
server_n_cpu | float | The fractional number of CPUs to request in the compute spec for the Orchestrator. Consider increasing this if aggregation takes too long. | 0.5 |
server_memory | int | The amount of memory to request in the compute spec for the Orchestrator. Consider increasing this if your computation runs out of memory in the Orchestrator. | 1000 |
modelversion | Optional[str] | The version of the regression models to use for this session. Defaults to the latest available version. | None |
Returns:
Type | Description |
---|---|
SupervisedMLSession | A |
misc
ResultsNotFound
Bases: Exception