
XGBoost Reference🔗

The Apheris XGBoost model is an integration with the NVIDIA FLARE XGBoost federation layer. You can see more details about it in our XGBoost Model Registry guide.

apheris_xgboost.api_client🔗

fit_xgboost(num_rounds, target_col, cols, session, datasets=None, eta=None, max_depth=None, feature_types=None, num_class=None, quantile_alpha=None, objective='reg:squarederror', eval_metric=[], num_parallel_tree=None, enable_categorical=False, tree_method='hist', early_stopping_rounds=None, nthread=None, timeout=300) 🔗

Fit an XGBoost model to the given dataset IDs. If no dataset IDs are provided, the model will be fit to all datasets in the session.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| num_rounds | int | The number of rounds to train the model for. num_rounds * num_parallel_tree is the total number of trees. | required |
| target_col | Union[str, int, float] | The target column to predict. | required |
| cols | List[Union[str, int, float]] | The columns to use as features. | required |
| session | XGBoostSession | The session to use for training. | required |
| datasets | Optional[List[FederatedDataFrame]] | Optional list of the datasets as FederatedDataFrames to fit the model to. If not provided, the session's dataset mapping is used without any preprocessing. | None |
| eta | Optional[float] | The learning rate. | None |
| max_depth | Optional[int] | The maximum depth of the trees. | None |
| feature_types | Optional[List[str]] | The feature types to use if categorical features are enabled. | None |
| num_class | Optional[int] | The number of classes for multi-class classification. | None |
| quantile_alpha | Optional[float] | The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used. | None |
| objective | Optional[BoosterObjectives] | The objective function. Default is reg:squarederror. | 'reg:squarederror' |
| eval_metric | Union[EvaluationMetric, List[EvaluationMetric]] | The evaluation metric. | [] |
| num_parallel_tree | Optional[int] | The number of parallel trees. | None |
| enable_categorical | Optional[bool] | Whether to enable categorical features. | False |
| tree_method | Literal['exact', 'approx', 'hist'] | The tree method to use. | 'hist' |
| early_stopping_rounds | Optional[int] | The number of communication rounds to wait for early stopping. | None |
| nthread | Optional[int] | The number of threads to use. | None |
| timeout | int | The timeout for the NVFlare task. For remote training jobs, this is also used as the Apheris job timeout, with an additional buffer of 60 seconds. | 300 |
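
The sketch below shows a typical training call. The import paths are assumed from the module headings on this page, the compute spec ID, dataset IDs, and column names are placeholders, and the return value is assumed to be the trained model dictionary produced by the session's run method (see the session classes below).

```python
# Minimal sketch: train a regression model on all datasets of a remote session.
# Import paths follow the module headings on this page; IDs and column names
# are placeholders for your own compute spec, datasets, and schema.
from apheris_xgboost.api_client import fit_xgboost
from apheris_xgboost.session import XGBoostRemoteSession

session = XGBoostRemoteSession(
    compute_spec_id="<compute-spec-id>",
    dataset_ids=["<dataset-id-1>", "<dataset-id-2>"],
)

model = fit_xgboost(
    num_rounds=50,
    target_col="target",
    cols=["feature_1", "feature_2"],
    session=session,
    eta=0.1,
    max_depth=6,
    objective="reg:squarederror",
)
```

The returned model dictionary can then be passed as model_parameter to predict_xgboost and evaluate_xgboost.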

predict_xgboost(session, model_parameter, feature_cols, datasets=None, objective='reg:squarederror', num_class=None, quantile_alpha=None, timeout=300) 🔗

Predict using an XGBoost model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| session | XGBoostSession | The session to use for prediction. | required |
| model_parameter | Dict[str, Any] | The model parameter dictionary used to initialize the model for prediction. | required |
| feature_cols | Optional[List[Union[str, int, float]]] | The feature columns to use for prediction. | required |
| datasets | Optional[List[FederatedDataFrame]] | The datasets to predict on. If not specified, all datasets from the compute spec are used. | None |
| objective | Optional[BoosterObjectives] | The objective function, which determines the required prediction format. Default is reg:squarederror. | 'reg:squarederror' |
| num_class | Optional[int] | The number of classes for multi-class classification. | None |
| quantile_alpha | Optional[float] | The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used. | None |
| timeout | int | The timeout for the NVFlare task. For remote training jobs, this is also used as the Apheris job timeout, with an additional buffer of 60 seconds. | 300 |
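
A minimal sketch of a prediction call, reusing the session and model dictionary from the fit_xgboost example above; the feature column names are placeholders.

```python
from apheris_xgboost.api_client import predict_xgboost

# Reuses `session` and `model` from the fit_xgboost sketch above.
predictions = predict_xgboost(
    session=session,
    model_parameter=model,
    feature_cols=["feature_1", "feature_2"],
    objective="reg:squarederror",
)
```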

evaluate_xgboost(session, model_parameter, target_col, feature_cols, datasets=None, eval_metric=[], additional_eval_metric=[], objective='reg:squarederror', num_class=None, quantile_alpha=None, timeout=300) 🔗

Evaluate an XGBoost model with remote data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| session | XGBoostSession | The session to use for evaluation. | required |
| model_parameter | Dict[str, Any] | The model parameter dictionary to use for initializing the model to be evaluated. | required |
| target_col | Union[str, int, float] | The target column as ground truth. | required |
| feature_cols | Optional[List[Union[str, int, float]]] | The feature columns to use for prediction. | required |
| datasets | Optional[List[FederatedDataFrame]] | The evaluation datasets. If not specified, all datasets from the compute spec are used. | None |
| eval_metric | Union[EvaluationMetric, List[EvaluationMetric]] | The XGBoost evaluation metric to use. Can be a single string or a list. | [] |
| additional_eval_metric | Union[ScikitMetric, List[ScikitMetric]] | Additional scikit-learn evaluation metrics to use. Must be used with objective="multi:softmax". | [] |
| objective | Optional[BoosterObjectives] | The objective function. Default is reg:squarederror. | 'reg:squarederror' |
| num_class | Optional[int] | The number of classes for multi-class classification. | None |
| quantile_alpha | Optional[float] | The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used. | None |
| timeout | int | The timeout for the NVFlare task. For remote training jobs, this is also used as the Apheris job timeout, with an additional buffer of 60 seconds. | 300 |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, List[float]] | The evaluation results as a dictionary. |
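
A sketch of an evaluation call under the same assumptions as the examples above. The eval_metric value is written as a plain string for illustration; the accepted EvaluationMetric values are defined by the package.

```python
from apheris_xgboost.api_client import evaluate_xgboost

# Reuses `session` and `model` from the fit_xgboost sketch above.
# "rmse" stands in for a valid EvaluationMetric value.
results = evaluate_xgboost(
    session=session,
    model_parameter=model,
    target_col="target",
    feature_cols=["feature_1", "feature_2"],
    eval_metric=["rmse"],
)
# `results` is a Dict[str, List[float]] containing the evaluation results.
```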

apheris_xgboost.session🔗

XGBoostSession 🔗

Bases: ABC

Abstract class for XGBoost sessions. Subclasses must implement the run and get_dataset_mapping methods.
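
As an illustration of this contract only (not part of the package), a custom session would need to provide at least these two methods; the import path is assumed from the module heading above and the bodies are stubs.

```python
# Illustrative stub: the two methods a custom XGBoostSession subclass must provide.
from apheris_xgboost.session import XGBoostSession

class MyXGBoostSession(XGBoostSession):
    def get_dataset_mapping(self):
        # Return a mapping of gateway IDs to dataset paths (placeholder values).
        return {"gateway_1": "path/to/dataset"}

    def run(self, xgb_job, **kwargs):
        # Execute the given XGBoost job and return the resulting model dictionary.
        raise NotImplementedError
```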

XGBoostLocalSession 🔗

Bases: XGBoostSession

__init__(dataset_mapping, dataset_root, ds2gw_dict=None, workspace=None) 🔗

XGBoost session for local execution.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset_mapping | Dict | Mapping of gateway IDs to local dataset subpaths. | required |
| dataset_root | str | Root directory for the local datasets. | required |
| ds2gw_dict | Optional[Dict[str, str]] | Optional mapping of dataset IDs to gateway IDs. If not provided, it is generated automatically with the dataset IDs dataset_id_1 ... dataset_id_n. | None |
| workspace | Optional[Union[str, Path]] | Optional workspace directory. If not provided, a temporary directory is used. | None |
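
A minimal sketch of constructing a local session. The gateway IDs, file subpaths, and root directory are placeholders, and the assumption that the subpaths point to CSV files is illustrative only.

```python
from apheris_xgboost.session import XGBoostLocalSession

# Placeholders: gateway IDs and dataset subpaths relative to dataset_root.
local_session = XGBoostLocalSession(
    dataset_mapping={
        "gateway_1": "site_a/data.csv",
        "gateway_2": "site_b/data.csv",
    },
    dataset_root="/path/to/local/data",
)
print(local_session.get_dataset_mapping())
```

A local session can be passed as the session argument to fit_xgboost, predict_xgboost, or evaluate_xgboost in place of a remote session.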

get_dataset_mapping() 🔗

Returns the dataset mapping of gateway IDs to local dataset subpaths.

run(xgb_job, **kwargs) 🔗

Runs the XGBoost job locally in NVFlare simulator mode.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xgb_job | Union[TrainingJob, PredictionJob, ValidationJob] | XGBoost job to run. | required |

Returns: The trained XGBoost model as a dictionary.

XGBoostRemoteSession 🔗

Bases: XGBoostSession

__init__(compute_spec_id, dataset_ids, dataset_root='/workspace/input') 🔗

XGBoost session for remote execution.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| compute_spec_id | str | Compute spec ID. | required |
| dataset_ids | List[str] | List of dataset IDs. | required |
| dataset_root | Optional[str] | Optional root directory for the remote datasets. Default is "/workspace/input". | '/workspace/input' |
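
A sketch of constructing a remote session with the default dataset_root made explicit; the compute spec ID and dataset IDs are placeholders.

```python
from apheris_xgboost.session import XGBoostRemoteSession

# Placeholders for the compute spec and the dataset IDs registered with it.
remote_session = XGBoostRemoteSession(
    compute_spec_id="<compute-spec-id>",
    dataset_ids=["<dataset-id-1>", "<dataset-id-2>"],
    dataset_root="/workspace/input",
)
# Gateway-ID-to-path mapping resolved from the dataset references.
print(remote_session.get_dataset_mapping())
```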

get_dataset_mapping() 🔗

Returns the dataset mapping of gateway IDs to dataset paths on the remote side. This information is retrieved from the dataset references of the dataset IDs.

run(xgb_job, **kwargs) 🔗

Runs the XGBoost job remotely on Apheris Gateways.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xgb_job | Union[TrainingJob, ValidationJob, PredictionJob] | XGBoost job to run. | required |

Returns: The trained XGBoost model as a dictionary.