Regression Models Reference🔗

The Apheris Regression Models codebase is a toolkit for running regression models on Apheris.

It currently provides Cox proportional hazards and logistic regression models; future versions of Apheris are expected to add further regression models to this package.

regression.logistic_regression.api_client🔗

fit_lr(datasets, session, feature_cols, target_col, validation_set_col=None, validation_split=None, feature_selector_direction=None, num_rounds=5, num_steps_per_round=2) 🔗

Trains a logistic regression model using the specified datasets and session.

Parameters:

Name Type Description Default
datasets Union[Iterable[FederatedDataFrame], FederatedDataFrame]

The datasets to be used for training.

required
session Union[SupervisedMLSession, LocalDebugMLSession]

The session object that defines the compute spec and dataset IDs.

required
num_rounds int

The number of training rounds to perform.

5
feature_cols List[Union[int, float, str]]

Columns to be used as features in the model.

required
target_col Union[int, float, str]

Column to be used as the target variable.

required
validation_set_col Optional[Union[int, float, str]]

Column to be used for the validation set. Defaults to None.

None
validation_split Optional[float]

Fraction of the data to be used for validation. Defaults to None, in which case a split of 0.2 is used.

None
feature_selector_direction Optional[str]

Direction for feature selection ('forward' or 'backward'). Defaults to None.

None
num_steps_per_round int

Number of steps to perform per round. Defaults to 2.

2

Returns:

Name Type Description
results dict

Dictionary containing the model parameters resulting from the training process.
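
For orientation, a minimal usage sketch (session, the FederatedDataFrame objects, and the column names are placeholders, not values from this reference):

from regression.logistic_regression.api_client import fit_lr

# session: a SupervisedMLSession from regression.session.provision.provision,
# or a LocalDebugMLSession for local simulation (both documented below).
# fdf_a, fdf_b: FederatedDataFrame objects pointing at the training datasets.
model_params = fit_lr(
    datasets=[fdf_a, fdf_b],
    session=session,
    feature_cols=["age", "weight"],  # placeholder feature columns
    target_col="outcome",            # placeholder target column
    validation_split=0.2,
    num_rounds=5,
)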

Source code in .env/lib/python3.10/site-packages/regression/logistic_regression/api_client.py
def fit_lr(
    datasets: Union[Iterable[FederatedDataFrame], FederatedDataFrame],
    session: Union[SupervisedMLSession, LocalDebugMLSession],
    feature_cols: List[Union[int, float, str]],
    target_col: Union[int, float, str],
    validation_set_col: Optional[Union[int, float, str]] = None,
    validation_split: Optional[float] = None,
    feature_selector_direction: Optional[str] = None,
    num_rounds: int = 5,
    num_steps_per_round: int = 2,
) -> dict:
    """
    Trains a logistic regression model using the specified datasets and session.

    Args:
        datasets (Union[Iterable[FederatedDataFrame], FederatedDataFrame]):
            The datasets to be used for training.
        session (Union[SupervisedMLSession, LocalDebugMLSession]):
            The session object that defines compute_spec and dataset ids.
        num_rounds (int): The number of training rounds to perform.
        feature_cols (List[Union[int, float, str]]):
            Columns to be used as features in the model.
        target_col (Union[int, float, str]):
            Column to be used as the target variable.
        validation_set_col (Optional[Union[int, float, str]], optional):
            Column to be used for the validation set. Defaults to None.
        validation_split (Optional[float], optional):
            Fraction of the data to be used for validation. Defaults to None, in
            which case a split of 0.2 is used.
        feature_selector_direction (Optional[str], optional):
            Direction for feature selection ('forward' or 'backward'). Defaults to None.
        num_steps_per_round (int, optional):
            Number of steps to perform per round. Defaults to 2.

    Returns:
        results: Dictionary containing the model parameters as results of the training
            process.
    """

    training_params = {
        "model": "logistic_regression",
        "task": AppConstants.TASK_TRAIN,
        "num_rounds": num_rounds,
        "feature_cols": feature_cols,
        "target_col": target_col,
        "feature_selector_direction": feature_selector_direction,
        "num_steps_per_round": num_steps_per_round,
        "validation_split": validation_split,
    }
    # training_params is always a dict here, so only validation_set_col needs checking.
    if validation_set_col is not None:
        training_params["validation_set_col"] = validation_set_col

    validate_job_params(training_params)
    _validate_datasets(datasets)
    _validate_session(session)
    return _run_supervised_ml(
        datasets=datasets,
        session=session,
        job_params=training_params,
    )

validate_lr(datasets, session, feature_cols, ground_truth_col, modelparameter) 🔗

Validates the model based on the specified validation dataset.

Parameters:

Name Type Description Default
datasets Union[Iterable[FederatedDataFrame], FederatedDataFrame]

The dataset to be used for prediction.

required
session Union[SupervisedMLSession, LocalDebugMLSession]

The session object that defines the compute spec and dataset IDs.

required
feature_cols List[Union[int, float, str]]

Columns to be used as features in the model.

required
ground_truth_col Union[int, float, str]

Column to be used as the ground truth.

required
modelparameter dict

The model parameters to be used for prediction as a dictionary. This can be the output of the fit_lr function.

required

Returns:

Name Type Description
results dict

Dictionary containing the predicted values.
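
A minimal sketch of validating a fitted model (placeholder names as in the fit_lr example above; fdf_train and fdf_holdout are hypothetical FederatedDataFrame objects):

from regression.logistic_regression.api_client import fit_lr, validate_lr

model_params = fit_lr(
    datasets=fdf_train,
    session=session,
    feature_cols=["age", "weight"],
    target_col="outcome",
)
# Reuse the fitted parameters to score the model on held-out data.
results = validate_lr(
    datasets=fdf_holdout,
    session=session,
    feature_cols=["age", "weight"],
    ground_truth_col="outcome",
    modelparameter=model_params,
)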

Source code in .env/lib/python3.10/site-packages/regression/logistic_regression/api_client.py
def validate_lr(
    datasets: Union[Iterable[FederatedDataFrame], FederatedDataFrame],
    session: Union[SupervisedMLSession, LocalDebugMLSession],
    feature_cols: List[Union[int, float, str]],
    ground_truth_col: Union[int, float, str],
    modelparameter: dict,
) -> dict:
    """
    Validates the model based on the specified validation dataset.

    Args:
        datasets (FederatedDataFrame): The dataset to be used for prediction.
        session (Union[SupervisedMLSession, LocalDebugMLSession]): The session object
            that defines compute_spec and dataset ids.
        feature_cols (List[Union[int, float, str]]): Columns to be used as features in
            the model.
        ground_truth_col (Union[int, float, str]): Column to be used as the ground truth.
        modelparameter (dict): The model parameters to be used for prediction as a
            dictionary. This can be the output of the fit_lr function.
    Returns:
        results: Dictionary containing the predicted values.
    """
    _validate_datasets(datasets)
    _validate_session(session)
    validation_params = {
        "model": "logistic_regression",
        "task": AppConstants.TASK_VALIDATION,
        "feature_cols": feature_cols,
        "ground_truth_col": ground_truth_col,
        "modelparameter": modelparameter,
    }
    return _run_supervised_ml(
        datasets=datasets, session=session, job_params=validation_params
    )

regression.cox.api_client🔗

fit_coxph(datasets, session, time_col, target_col, validation_set_col=None, max_time=-1, num_rounds=5, num_steps_per_round=2) 🔗

Training job that fits a Cox proportional hazards regression model to the given federated datasets.

Parameters:

Name Type Description Default
datasets Union[Iterable[FederatedDataFrame], FederatedDataFrame]

List of FederatedDataFrame or single FederatedDataFrame that point to the datasets to be used for training.

required
session Union[SupervisedMLSession, LocalDebugMLSession]

The session object that defines the compute spec and dataset IDs.

required
num_rounds int

Number of training rounds.

5
time_col Union[int, float, str]

Column name of the integer-valued time column.

required
target_col Union[int, float, str]

Column name of the ground truth column.

required
validation_set_col Optional[Union[int, float, str]]

Column name of a boolean-valued validation set indicator. If None, a validation set is obtained by a 20% train/test split.

None
max_time int

Maximum time over all datasets. If not given, it is computed in a preliminary max computation.

-1
num_steps_per_round int

Number of steps per federated round.

2

Returns:

Type Description
dict

A dictionary with three keys:

• coef: The regression coefficients (betas) for each covariate in the model.
• baseline_hazard: The baseline hazard function, that is, the risk of the event occurring at a particular time point for a baseline individual (one with all covariates equal to zero).
• cumulative_hazard: The integral of the baseline hazard over time, representing the total accumulated risk of the event up to a specific time point.

Raises:

Type Description
RuntimeError

If the job cannot be created

TimeoutError

If the job takes longer than the supplied timeout

ResultsNotFound

If the job did not complete due to an error. In this case, please check the supplied logs for more details.
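
A minimal sketch, assuming a provisioned session and a FederatedDataFrame fdf whose time column is integer-valued (all column names are placeholders):

from regression.cox.api_client import fit_coxph

result = fit_coxph(
    datasets=fdf,
    session=session,
    time_col="time_to_event",  # placeholder, integer-valued
    target_col="event",        # placeholder ground truth column
    num_rounds=5,
)
coefs = result["coef"]                    # regression coefficients (betas)
baseline = result["baseline_hazard"]      # baseline hazard function
cumulative = result["cumulative_hazard"]  # accumulated risk over time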

Source code in .env/lib/python3.10/site-packages/regression/cox/api_client.py
def fit_coxph(
    datasets: Union[Iterable[FederatedDataFrame], FederatedDataFrame],
    session: Union[SupervisedMLSession, LocalDebugMLSession],
    time_col: Union[int, float, str],
    target_col: Union[int, float, str],
    validation_set_col: Optional[Union[int, float, str]] = None,
    max_time: int = -1,
    num_rounds: int = 5,
    num_steps_per_round: int = 2,
) -> dict:
    """
    Training job that fits a Cox proportional hazards regression model to the
    given federated datasets.

    Args:
        datasets: List of FederatedDataFrame or single FederatedDataFrame that point
            to the remote datasets to be used for training and contain possible
            preprocessing operations.
        session: The session object that defines compute_spec and dataset ids.
        num_rounds: Number of training rounds.
        time_col: Column name of the integer-valued time column.
        target_col: Column name of the ground truth column.
        validation_set_col: Column name of a boolean-valued validation set indicator.
            If None, a validation set is obtained by a 20% train/test split.
        max_time: Maximum time over all datasets. If not given, it is computed in a
            preliminary max computation.
        num_steps_per_round: Number of steps per federated round.

    Returns:
        A dictionary with three keys:

        * coef: The regression coefficients (betas) for each covariate in the model
        * baseline_hazard: The baseline hazard function, that is the risk of the event
            happening at a particular time point for a baseline individual (one with
            all covariates equal to zero).
        * cumulative_hazard: The integral of the baseline hazard over time, representing
            the total accumulated risk of the event up to a specific time point.

    Raises:
        RuntimeError: If the job cannot be created
        TimeoutError: If the job takes longer than the supplied timeout
        ResultsNotFound: If the job did not complete due to an error. In this case, please
            check the supplied logs for more details.
    """
    training_params = {
        "model": "coxph",
        "task": AppConstants.TASK_TRAIN,
        "num_rounds": num_rounds,
        "time_col": time_col,
        "target_col": target_col,
        "max_time": max_time,
        "num_steps_per_round": num_steps_per_round,
    }
    validate_job_params(training_params)
    _validate_datasets(datasets)
    _validate_session(session)
    if validation_set_col is not None:
        training_params["validation_set_col"] = validation_set_col
    return _run_supervised_ml(
        datasets=datasets,
        session=session,
        job_params=training_params,
    )

validate_cox(datasets, session, time_col, ground_truth_col, modelparameter) 🔗

Validates the model based on the specified validation datasets and the column ground_truth_col. For validation, the default scoring function from the lifelines CoxPH model is used, which is the average partial log-likelihood.

Parameters:

Name Type Description Default
datasets Union[Iterable[FederatedDataFrame], FederatedDataFrame]

The dataset to be used for prediction.

required
session Union[SupervisedMLSession, LocalDebugMLSession]

The session object that defines the compute spec and dataset IDs.

required
time_col Union[int, float, str]

Column to be used as the time column for the Cox inference. The time column should be integer-valued.

required
ground_truth_col Union[int, float, str]

Column to be used as the ground truth.

required
modelparameter dict

The model parameters to be used for prediction as a dictionary. This can be the output of the fit_coxph function.

required

Returns:

Name Type Description
results dict

Dictionary containing the predicted values.
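
A minimal sketch that feeds the output of fit_coxph back into validation (placeholder names as in the fit_coxph example above):

from regression.cox.api_client import fit_coxph, validate_cox

model_params = fit_coxph(
    datasets=fdf,
    session=session,
    time_col="time_to_event",
    target_col="event",
)
results = validate_cox(
    datasets=fdf_holdout,  # hypothetical held-out FederatedDataFrame
    session=session,
    time_col="time_to_event",
    ground_truth_col="event",
    modelparameter=model_params,
)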

Source code in .env/lib/python3.10/site-packages/regression/cox/api_client.py
def validate_cox(
    datasets: Union[Iterable[FederatedDataFrame], FederatedDataFrame],
    session: Union[SupervisedMLSession, LocalDebugMLSession],
    time_col: Union[int, float, str],
    ground_truth_col: Union[int, float, str],
    modelparameter: dict,
) -> dict:
    """
    Validates the model based on the specified validation datasets and the column
    ground_truth_col. For validation, the default scoring function from the
    lifelines CoxPH model, which is the average partial log-likelihood, is used.

    Args:
        datasets (FederatedDataFrame): The dataset to be used for prediction.
        session (Union[SupervisedMLSession, LocalDebugMLSession]): The session
            object that defines compute_spec and dataset ids.
        time_col (Union[int, float, str]): Column to be used as the time column for
            the Cox inference. The time column should be integer-valued.
        ground_truth_col (Union[int, float, str]): Column to be used as the ground
            truth.
        modelparameter (dict): The model parameters to be used for prediction as a
            dictionary. This can be the output of the fit_coxph function.

    Returns:
        results: Dictionary containing the predicted values.
    """
    _validate_datasets(datasets)
    _validate_session(session)
    validation_params = {
        "model": "cox",
        "task": AppConstants.TASK_VALIDATION,
        "time_col": time_col,
        "ground_truth_col": ground_truth_col,
        "modelparameter": modelparameter,
    }
    return _run_supervised_ml(
        datasets=datasets, session=session, job_params=validation_params
    )

regression.session🔗

LocalDebugMLSession 🔗

Bases: LocalDebugSimpleStatsSession

Local session object that connects the regression models with the NVFlare simulator and supports running a simulation of a SupervisedMLJobDefinition.

Parameters:

Name Type Description Default
datasets List[LocalDebugDataset]

A list of LocalDebugDatasets that should be included in the session. Each dataset will be assigned to its own simulated Compute Gateway.

required
workspace Optional[Union[str, Path]]

The path to the workspace where the results will be stored.

None
max_threads Optional[int]

The maximum number of threads to be used in the computation. By default this is set to the number of Compute Gateways in the computation, but can be set to 1 for improved debugging with PDB.

None
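
A minimal sketch of constructing a debug session, assuming LocalDebugDataset instances already exist (their construction is not covered by this reference):

from regression.session import LocalDebugMLSession

# debug_datasets: a list of LocalDebugDataset objects, one per simulated
# Compute Gateway.
session = LocalDebugMLSession(
    datasets=debug_datasets,
    workspace="./debug_workspace",  # optional; simulation results are kept here
    max_threads=1,                  # single-threaded for easier debugging with PDB
)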
Source code in .env/lib/python3.10/site-packages/regression/session/session.py
class LocalDebugMLSession(LocalDebugSimpleStatsSession):
    """
    Local session object that connects the regression models with the nvflare
    simulator and supports running a simulation of a SupervisedMLJobDefinition.

    Args:
        datasets: A list of LocalDebugDatasets that should be included in the session.
            Each dataset will be assigned to its own simulated Compute Gateway.
        workspace: The path to the workspace where the results will be stored.
        max_threads: The maximum number of threads to be used in the computation. By
            default this is set to the number of Compute Gateways in the computation,
            but can be set to 1 for improved debugging with PDB.
    """

    def __init__(
        self,
        datasets: List[LocalDebugDataset],
        workspace: Optional[Union[str, Path]] = None,
        max_threads: Optional[int] = None,
    ):
        secure.register_decomposers()
        super().__init__(datasets, workspace, max_threads=max_threads)

    def get_stats_session(self) -> LocalDebugSimpleStatsSession:
        return LocalDebugSimpleStatsSession(self.datasets)

    def run(self, job_definition: SupervisedMLJobDefinition):

        os.environ["aph_local_run"] = "1"
        os.environ["aph_dataset_paths"] = json.dumps(self.get_dataset_fpaths())
        os.environ["aph_client_names"] = json.dumps(self.get_client_names())
        os.environ["aph_policies"] = json.dumps(self.get_policies())
        os.environ["aph_permissions"] = json.dumps(self.get_permissions())

        ctx = nullcontext() if self.workspace else tempfile.TemporaryDirectory()

        with ctx as tmp_workspace, tempfile.TemporaryDirectory() as tmp_dir:
            job_dir = Path(tmp_dir) / "job"
            job_dir.mkdir(parents=True, exist_ok=True)
            create_job_dir(job_dir=Path(job_dir), **job_definition.to_dict())

            # The simulator requires all clients to be involved. So, we don't start a
            # client for each gateway from the session, but only for the ones that are
            # used in the FederatedDataFrames.
            clients = list(
                set(self.get_client_names()).intersection(
                    set(job_definition.mapped_fdfs.keys())
                )
            )

            if self.max_threads is None or self.max_threads > len(clients):
                num_threads = len(clients)
            else:
                num_threads = self.max_threads

            workspace = self.workspace if self.workspace else tmp_workspace
            simulator = SimulatorRunner(
                job_folder=str(job_dir),
                workspace=str(workspace),
                n_clients=len(clients),
                threads=num_threads,
                clients=",".join(clients),
            )
            _ = simulator.run()

            if job_definition.pre_run:
                result_path = Path(workspace) / "simulate_job/models/results.bin"
            elif job_definition.job_params.get("task") in ["predict", "validate"]:
                results = {}
                for client_name in job_definition.mapped_fdfs.keys():

                    result_path = (
                        Path(workspace) / f"simulate_job/app_server/{client_name}.secure"
                    )
                    results[client_name] = _load_result(result_path, job_definition)

                return results
            else:
                result_path = (
                    Path(workspace) / "simulate_job/app_server/model_param.secure"
                )

        return _load_result(result_path, job_definition)

SupervisedMLSession 🔗

Bases: SimpleStatsSession

Session object that connects the regression models with the job API and supports running a SupervisedMLJobDefinition.

Can be instantiated manually for a running compute spec, but typically will be created using the provision function from regression.session.provision.

Parameters:

Name Type Description Default
compute_spec_id UUID

The UUID of the compute spec running the Regression Models model.

required
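
A SupervisedMLSession is typically created via provision (documented below), but for an already-running compute spec it can be instantiated directly; a sketch with a placeholder ID:

from uuid import UUID
from regression.session import SupervisedMLSession

# Placeholder UUID; use the ID of your running compute spec.
session = SupervisedMLSession(UUID("12345678-1234-5678-1234-567812345678"))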
Source code in .env/lib/python3.10/site-packages/regression/session/session.py
class SupervisedMLSession(SimpleStatsSession):
    """
    Session object that connects the regression models with the job API and
    supports running a SupervisedMLJobDefinition.

    Can be instantiated manually for a running compute spec, but typically will be
    created using the `provision` function from `regression.session.provision`.

    Args:
        compute_spec_id: The UUID of the compute spec running the Regression Models
            model.
    """

    def __init__(self, compute_spec_id: UUID):
        """
        Create SupervisedMLSession for a given compute_spec_id.
        """
        self.compute_spec_id = compute_spec_id
        secure.register_decomposers()
        super().__init__(compute_spec_id)

    def get_stats_session(self) -> SimpleStatsSession:
        return SimpleStatsSession(self.compute_spec_id)

    def run(self, job_definition: SupervisedMLJobDefinition):
        from aphcli.api import job

        job_id = job.submit(job_definition.to_dict(), self.compute_spec_id)
        timeout_job_seconds = _get_job_timeout()

        print(f"Computation submitted under Job ID: {job_id}")

        start = time.time()
        while True:
            status = job.status(job_id, self.compute_spec_id)
            if "finished" in status.lower():
                break
            if time.time() - start > timeout_job_seconds:
                raise TimeoutError(
                    f"Computation did not finish within {timeout_job_seconds}s."
                )
            time.sleep(2)

        download_path = result_base_dir / str(self.compute_spec_id) / str(job_id)
        job.download_results(download_path, job_id, self.compute_spec_id)

        if job_definition.pre_run:
            result_path = download_path / "workspace" / "models" / "results.bin"

        elif job_definition.job_params.get("task") in ["predict", "validate"]:
            results = {}
            for client_name in job_definition.mapped_fdfs.keys():
                result_path = download_path / f"workspace/app_server/{client_name}.secure"
                results[client_name] = _load_result(result_path, job_definition)

            return results

        else:
            result_path = (
                download_path / "workspace" / "app_server" / "model_param.secure"
            )

        return _load_result(result_path, job_definition)

__init__(compute_spec_id) 🔗

Create SupervisedMLSession for a given compute_spec_id.

Source code in .env/lib/python3.10/site-packages/regression/session/session.py
def __init__(self, compute_spec_id: UUID):
    """
    Create SupervisedMLSession for a given compute_spec_id.
    """
    self.compute_spec_id = compute_spec_id
    secure.register_decomposers()
    super().__init__(compute_spec_id)

regression.session.provision🔗

provision(dataset_ids, client_n_cpu=0.5, client_memory=1000, server_n_cpu=0.5, server_memory=1000, modelversion=None) 🔗

Create and activate a compute spec to run remote regression models on Apheris.

Parameters:

Name Type Description Default
dataset_ids List[str]

A list of Apheris dataset IDs

required
client_n_cpu float

The fractional number of CPUs to request in the compute spec for the Compute Gateways. Consider increasing this if your computation takes too long.

0.5
client_memory int

The amount of client memory to request in the compute spec for the Compute Gateways. Consider increasing this if your computation runs out of memory.

1000
server_n_cpu float

The fractional number of CPUs to request in the compute spec for the Orchestrator. Consider increasing this if your computation takes too long during aggregation.

0.5
server_memory int

The amount of memory to request in the compute spec for the Orchestrator. Consider increasing this if your computation runs out of memory in the Orchestrator.

1000
modelversion Optional[str]

The version of regression models to use for this session. Defaults to the latest available version.

None

Returns:

Type Description
SupervisedMLSession

A SupervisedMLSession that should be used as input to the regression functions, such as fit_coxph.
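
A minimal sketch (the dataset IDs are placeholders for real Apheris dataset IDs):

from regression.session.provision import provision

session = provision(
    dataset_ids=["dataset-id-1", "dataset-id-2"],  # placeholder IDs
    client_n_cpu=1.0,
    client_memory=2000,
)
# The returned SupervisedMLSession can be passed to fit_lr, fit_coxph, etc.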

Source code in .env/lib/python3.10/site-packages/regression/session/provision.py
def provision(
    dataset_ids: List[str],
    client_n_cpu: float = 0.5,
    client_memory: int = 1000,
    server_n_cpu: float = 0.5,
    server_memory: int = 1000,
    modelversion: Optional[str] = None,
) -> SupervisedMLSession:
    """
    Create and activate a compute spec to run remote regression models on Apheris.

    Args:
        dataset_ids: A list of Apheris dataset IDs
        client_n_cpu: The fractional number of CPUs to request in the compute spec for
            the Compute Gateways. Consider increasing this if your computation takes too
            long.
        client_memory: The amount of client memory to request in the compute spec for the
            Compute Gateways. Consider increasing this if your computation runs out of
            memory.
        server_n_cpu: The fractional number of CPUs to request in the compute spec for
            the Orchestrator. Consider increasing this if your computation takes too
            long during aggregation.
        server_memory: The amount of memory to request in the compute spec for the
            Orchestrator. Consider increasing this if your computation runs out of
            memory in the Orchestrator.
        modelversion: The version of regression models to use for this session. Defaults
            to the latest available version.

    Returns:
        A `SupervisedMLSession` that should be used as input to the regression functions,
            such as `fit_coxph`.
    """
    from aphcli.api import compute
    from aphcli.api.compute import wait_until_running

    validate_login_status()
    validate_dataset_ids(dataset_ids)
    if not modelversion:
        modelversion = version("apheris-regression-models")

    logger.info(f"Create compute_spec for model {modelversion}")
    compute_spec_id = compute.create_from_args(
        dataset_ids=dataset_ids,
        client_n_cpu=client_n_cpu,
        client_n_gpu=0,
        client_memory=client_memory,
        server_n_cpu=server_n_cpu,
        server_n_gpu=0,
        server_memory=server_memory,
        model_id="apheris-regression-models",
        model_version=modelversion,
    )
    print(f"compute_spec_id: {compute_spec_id}")
    provisioning_timeout = int(
        os.environ.get(
            "APH_TIMEOUT_PROVISIONING_SECONDS", DEFAULT_TIMEOUT_PROVISIONING_SECONDS
        )
    )

    compute.activate(compute_spec_id)
    try:
        wait_until_running(compute_spec_id, timeout=provisioning_timeout)
    except TimeoutError as e:
        compute.deactivate(compute_spec_id)
        raise e
    print("\nSuccessfully activated ComputeSpec!")
    return SupervisedMLSession(compute_spec_id)

misc🔗

ResultsNotFound 🔗

Bases: Exception

Source code in .env/lib/python3.10/site-packages/apheris_stats/simple_stats/_core/stats_session.py
class ResultsNotFound(Exception):
    pass