
Regression Models Reference🔗

The Apheris Regression Models codebase is a toolkit for running regressions on Apheris.

We currently support two models: the general logistic regression and the Cox Regression.

Remote inference is not currently supported for the logistic regression as this would allow a user to reverse engineer the input features from predictions.

regression.logistic_regression.api_client🔗

fit_lr(datasets, session, feature_cols, target_col, validation_set_col=None, validation_split=None, feature_selector_direction=None, num_rounds=5, num_steps_per_round=2) 🔗

Trains a logistic regression model using the specified datasets and session.

Parameters:

Name Type Description Default
datasets Union[Iterable[FederatedDataFrame], FederatedDataFrame]

The datasets to be used for training.

required
session Union[SupervisedMLSession, LocalDebugMLSession]

The session object that defines compute_spec and dataset ids.

required
num_rounds int

The number of training rounds to perform.

5
feature_cols List[Union[int, float, str]]

Columns to be used as features in the model.

required
target_col Union[int, float, str]

Column to be used as the target variable.

required
validation_set_col Optional[Union[int, float, str]]

Column to be used for the validation set. Defaults to None.

None
validation_split float

Fraction of the data to be used for validation. The signature default is None; a fraction of 0.2 is presumably applied in that case.

None
feature_selector_direction Optional[str]

Direction for feature selection ('forward' or 'backward'). Defaults to None.

None
num_steps_per_round int

Number of steps to perform per round. Defaults to 2.

2

Returns:

Name Type Description
results dict

Dictionary containing the model parameters as results of the training process.
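The keyword arguments above can be collected and sanity-checked before a training job is submitted. The following is a minimal sketch, not part of the library: `build_fit_lr_kwargs` and `ALLOWED_DIRECTIONS` are hypothetical names, and the range checks on `validation_split`, `num_rounds` and `num_steps_per_round` are assumptions rather than documented constraints. Only the allowed values of `feature_selector_direction` are taken from the table above.

```python
# Hypothetical helper; only the parameter names and defaults come from fit_lr.
ALLOWED_DIRECTIONS = {None, "forward", "backward"}


def build_fit_lr_kwargs(
    feature_cols,
    target_col,
    validation_set_col=None,
    validation_split=None,
    feature_selector_direction=None,
    num_rounds=5,
    num_steps_per_round=2,
):
    """Collect keyword arguments for fit_lr and reject obviously invalid values."""
    if not feature_cols:
        raise ValueError("feature_cols must contain at least one column")
    # Documented: the direction is 'forward', 'backward' or None.
    if feature_selector_direction not in ALLOWED_DIRECTIONS:
        raise ValueError("feature_selector_direction must be 'forward', 'backward' or None")
    # Assumption: a validation fraction lies strictly between 0 and 1.
    if validation_split is not None and not 0 < validation_split < 1:
        raise ValueError("validation_split must be between 0 and 1")
    # Assumption: round and step counts must be positive.
    if num_rounds < 1 or num_steps_per_round < 1:
        raise ValueError("num_rounds and num_steps_per_round must be at least 1")
    return {
        "feature_cols": list(feature_cols),
        "target_col": target_col,
        "validation_set_col": validation_set_col,
        "validation_split": validation_split,
        "feature_selector_direction": feature_selector_direction,
        "num_rounds": num_rounds,
        "num_steps_per_round": num_steps_per_round,
    }
```

The resulting dictionary could then be passed on as `fit_lr(datasets=..., session=..., **kwargs)`.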

validate_lr(datasets, session, feature_cols, ground_truth_col, modelparameter, validation_metric='accuracy') 🔗

Validates the model based on the specified validation dataset.

Parameters:

Name Type Description Default
datasets FederatedDataFrame

The dataset to be used for prediction.

required
session Union[SupervisedMLSession, LocalDebugMLSession]

The session object that defines compute_spec and dataset ids.

required
feature_cols List[Union[int, float, str]]

Columns to be used as features in the model.

required
ground_truth_col Union[int, float, str]

Column to be used as the ground truth.

required
modelparameter dict

The model parameters to be used for prediction, as a dictionary. This can be the output of the fit_lr function.

required
validation_metric str

The metric to be used for validation. Can be any scorer listed in sklearn.metrics.get_scorer_names or "confusion_matrix".

'accuracy'

Returns:

Name Type Description
results dict

Dictionary containing the predicted values.
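Since modelparameter can be the output of fit_lr, training and validation chain together naturally. The sketch below shows that call pattern with the documented keyword names. To keep it runnable without Apheris installed, the two API functions are passed in as arguments (`fit_fn`, `validate_fn`); the wrapper `fit_then_validate` and its parameter names are hypothetical.

```python
def fit_then_validate(
    fit_fn,
    validate_fn,
    train_data,
    validation_data,
    session,
    feature_cols,
    label_col,
    metric="accuracy",
):
    """Train with fit_fn, then score the resulting parameters with validate_fn.

    fit_fn and validate_fn are expected to accept the keyword arguments
    documented for fit_lr and validate_lr respectively.
    """
    # Step 1: train and keep the returned model parameters.
    model_params = fit_fn(
        datasets=train_data,
        session=session,
        feature_cols=feature_cols,
        target_col=label_col,
    )
    # Step 2: hand the parameters to validation unchanged.
    return validate_fn(
        datasets=validation_data,
        session=session,
        feature_cols=feature_cols,
        ground_truth_col=label_col,
        modelparameter=model_params,
        validation_metric=metric,
    )
```

In practice `fit_fn` and `validate_fn` would be `fit_lr` and `validate_lr` from the module documented above.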

regression.cox.api_client🔗

fit_coxph(datasets, session, time_col, target_col, validation_set_col=None, max_time=-1, num_rounds=5, num_steps_per_round=2) 🔗

Runs a training job that fits a Cox regression model to the given federated datasets.

Parameters:

Name Type Description Default
datasets Union[Iterable[FederatedDataFrame], FederatedDataFrame]

List of FederatedDataFrame or single FederatedDataFrame that point to the datasets to be used for training.

required
session Union[SupervisedMLSession, LocalDebugMLSession]

The session object that defines compute_spec and dataset ids.

required
num_rounds int

The number of training rounds to perform.

5
time_col Union[int, float, str]

Column name of the integer-valued time column.

required
target_col Union[int, float, str]

Column name of the ground truth.

required
validation_set_col Optional[Union[int, float, str]]

Column name of the boolean-valued validation set indicator. If None, a validation set is obtained by a 20% train-test split.

None
max_time int

Maximum time over all datasets. If not given, it is determined in a preliminary max computation.

-1
num_steps_per_round int

Number of steps per federated round.

2

Returns:

Type Description
dict

A dictionary with three keys:

  • coef: The regression coefficients (betas) for each covariate in the model.
  • baseline_hazard: The baseline hazard function, that is, the risk of the event happening at a particular time point for a baseline individual (one with all covariates equal to zero).
  • cumulative_hazard: The integral of the baseline hazard over time, representing the total accumulated risk of the event up to a specific time point.
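The three returned quantities are related through the standard Cox model: on integer-valued times the cumulative hazard is the running sum of the baseline hazard, and the survival probability for an individual with covariates x is exp(-H0(t) * exp(beta · x)). The sketch below illustrates this under assumptions: the exact layout of the returned dictionaries is not specified here, so it uses plain mappings from time to hazard and from covariate name to coefficient, and both function names are hypothetical.

```python
import math
from itertools import accumulate


def cumulative_from_baseline(baseline_hazard):
    """Running sum of a discrete baseline hazard given as {time: hazard}."""
    times = sorted(baseline_hazard)
    sums = accumulate(baseline_hazard[t] for t in times)
    return dict(zip(times, sums))


def survival_probability(coef, cumulative_hazard, covariates, t):
    """S(t | x) = exp(-H0(t) * exp(beta . x)) for one individual."""
    # Linear predictor: sum of coefficient times covariate value.
    linear = sum(coef[name] * covariates[name] for name in coef)
    return math.exp(-cumulative_hazard[t] * math.exp(linear))


# Illustrative numbers only, not output of a real training run.
baseline = {1: 0.1, 2: 0.2, 3: 0.3}
cumulative = cumulative_from_baseline(baseline)
baseline_survival = survival_probability({"age": 0.5}, cumulative, {"age": 0.0}, 3)
```

For a baseline individual (all covariates zero) the survival probability reduces to exp(-H0(t)); a positive coefficient combined with a positive covariate value lowers it.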

Raises:

Type Description
RuntimeError

If the job cannot be created

TimeoutError

If the job takes longer than the supplied timeout

ResultsNotFound

If the job did not complete due to an error. In this case, please check the supplied logs for more details.
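The three documented exceptions can be handled separately, since each points to a different failure stage. The following is a minimal sketch: the `ResultsNotFound` class defined here is a local stand-in so the example runs on its own (the real class is listed under misc), and `run_training` and the status strings are hypothetical.

```python
class ResultsNotFound(Exception):
    """Local stand-in for the library's ResultsNotFound exception."""


def run_training(train_fn):
    """Call train_fn() and map the documented exceptions to a status dict."""
    try:
        return {"status": "ok", "result": train_fn()}
    except TimeoutError as err:
        # The job took longer than the supplied timeout.
        return {"status": "timeout", "detail": str(err)}
    except ResultsNotFound as err:
        # The job did not complete due to an error; the logs hold the details.
        return {"status": "failed", "detail": str(err)}
    except RuntimeError as err:
        # The job could not be created.
        return {"status": "not_created", "detail": str(err)}
```

Here `train_fn` would typically be a zero-argument callable wrapping a `fit_coxph` call, for example a `lambda` or a `functools.partial`.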

validate_cox(datasets, session, time_col, ground_truth_col, modelparameter) 🔗

Validates the model based on the specified validation datasets and column ground_truth_col. For validation, the default scoring function from the lifelines CoxPH model, which is the average partial log-likelihood, is used.

Parameters:

Name Type Description Default
datasets FederatedDataFrame

The dataset to be used for prediction.

required
session Union[SupervisedMLSession, LocalDebugMLSession]

The session object that defines compute_spec and dataset ids.

required
time_col Union[int, float, str]

Column to be used as time column for the Cox inference. The time column should be integer valued.

required
ground_truth_col Union[int, float, str]

Column to be used as the ground truth.

required
modelparameter dict

The model parameters to be used for prediction, as a dictionary. This can be the output of the fit_coxph function.

required
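To make the scoring function concrete, the average partial log-likelihood can be sketched in plain Python. This is a simplified illustration and not the lifelines implementation: it assumes no tied event times, averages over all rows (an assumption about the normalisation), and uses a hypothetical function name and input layout.

```python
import math


def average_partial_log_likelihood(times, events, covariates, coef):
    """Cox partial log-likelihood divided by the number of rows.

    times:      observed times, one per row
    events:     1 if the event occurred, 0 if the row is censored
    covariates: list of {name: value} dicts, one per row
    coef:       {name: coefficient}
    """
    if not times:
        raise ValueError("at least one row is required")
    # Linear predictor beta . x for every row.
    scores = [sum(coef[name] * row[name] for name in coef) for row in covariates]
    total = 0.0
    for i, (t_i, event_i) in enumerate(zip(times, events)):
        if not event_i:
            continue  # censored rows only contribute through risk sets
        # Risk set: all rows still under observation at time t_i.
        risk_sum = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += scores[i] - math.log(risk_sum)
    return total / len(times)
```

Higher (less negative) values indicate that the coefficients rank the observed events better within their risk sets.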

Returns:

Name Type Description
results dict

Dictionary containing the predicted values.

regression.session🔗

LocalDebugMLSession 🔗

Bases: LocalDebugSimpleStatsSession

Local session object that connects the regression models with the NVFlare simulator and supports running a simulation of a SupervisedMLJobDefinition.

SupervisedMLSession 🔗

Bases: SimpleStatsSession

Session object that connects the regression models with the job API and supports running a SupervisedMLJobDefinition.

Can be instantiated manually for a running compute spec, but typically will be created using the provision function from regression.session.provision.

regression.session.provision🔗

provision(dataset_ids, client_n_cpu=0.5, client_memory=1000, server_n_cpu=0.5, server_memory=1000, modelversion=None) 🔗

Create and activate a compute spec to run remote regression models on Apheris.

Parameters:

Name Type Description Default
dataset_ids List[str]

A list of Apheris dataset IDs

required
client_n_cpu float

The fractional number of CPUs to request in the compute spec for the Compute Gateways. Consider increasing this if your computation takes too long.

0.5
client_memory int

The amount of client memory to request in the compute spec for the Compute Gateways. Consider increasing this if your computation runs out of memory.

1000
server_n_cpu float

The fractional number of CPUs to request in the compute spec for the Orchestrator. Consider increasing this if your computation takes too long during aggregation.

0.5
server_memory int

The amount of server memory to request in the compute spec for the Orchestrator. Consider increasing this if your computation runs out of memory in the Orchestrator.

1000
modelversion Optional[str]

The version of regression models to use for this session. Defaults to the latest available version.

None

Returns:

Type Description
SupervisedMLSession

A SupervisedMLSession that should be used as input to the regression functions, such as fit_coxph.
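The returned session is meant to be passed straight into a regression function. The sketch below shows that hand-off with the documented argument names. So that it runs without Apheris installed, `provision` and `fit_coxph` are injected as `provision_fn` and `fit_fn`; the wrapper name `provision_and_fit_cox` is hypothetical.

```python
def provision_and_fit_cox(
    provision_fn,
    fit_fn,
    dataset_ids,
    datasets,
    time_col,
    target_col,
    client_memory=1000,
    server_memory=1000,
):
    """Provision a session for the given dataset IDs and fit a Cox model with it."""
    # Create and activate the compute spec; the defaults mirror the table above.
    session = provision_fn(
        dataset_ids,
        client_memory=client_memory,
        server_memory=server_memory,
    )
    # Reuse the same session for the training job.
    return fit_fn(
        datasets=datasets,
        session=session,
        time_col=time_col,
        target_col=target_col,
    )
```

If a job runs out of memory, the same wrapper can be called again with larger `client_memory` or `server_memory` values, following the guidance in the parameter descriptions.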

misc🔗

ResultsNotFound 🔗

Bases: Exception