
XGBoost Reference🔗

The Apheris XGBoost model is an integration with the NVIDIA FLARE XGBoost federation layer. You can see more details about it in our XGBoost Model Registry guide.

apheris_xgboost.api_client🔗

fit_xgboost(num_rounds, target_col, cols, session, datasets=None, eta=None, max_depth=None, feature_types=None, num_class=None, quantile_alpha=None, objective='reg:squarederror', eval_metric=[], num_parallel_tree=None, enable_categorical=False, tree_method='hist', early_stopping_rounds=None, nthread=None, timeout=300) 🔗

Fit an XGBoost model to the given dataset IDs. If no dataset IDs are provided, the model will be fit to all datasets in the session.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| num_rounds | int | The number of rounds to train the model for. num_rounds * num_parallel_tree is the total number of trees. | required |
| target_col | Union[str, int, float] | The target column to predict. | required |
| cols | List[Union[str, int, float]] | The columns to use as features. | required |
| session | XGBoostSession | The session to use for training. | required |
| datasets | Optional[List[FederatedDataFrame]] | Optional list of the datasets as FederatedDataFrames to fit the model to. If not provided, the session's dataset mapping is used without any preprocessing. | None |
| eta | Optional[float] | The learning rate. | None |
| max_depth | Optional[int] | The maximum depth of the trees. | None |
| feature_types | Optional[List[str]] | The feature types to use if categorical features are enabled. | None |
| num_class | Optional[int] | The number of classes for multi-class classification. | None |
| quantile_alpha | Optional[float] | The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used. | None |
| objective | Optional[BoosterObjectives] | The objective function. Default is reg:squarederror. | 'reg:squarederror' |
| eval_metric | Union[EvaluationMetric, List[EvaluationMetric]] | The evaluation metric. | [] |
| num_parallel_tree | Optional[int] | The number of parallel trees. | None |
| enable_categorical | Optional[bool] | Whether to enable categorical features. | False |
| tree_method | Literal['exact', 'approx', 'hist'] | The tree method to use. | 'hist' |
| early_stopping_rounds | Optional[int] | The number of communication rounds to wait for early stopping. | None |
| nthread | Optional[int] | The number of threads to use. | None |
| timeout | int | The timeout for the NVFlare task. For remote training jobs, this is also used as the Apheris job timeout, with an additional buffer of 60 seconds. | 300 |
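
The sketch below shows a typical training call. The import paths are assumed from the module headings on this page, the compute spec ID, dataset IDs, and column names are placeholders, and the return value is assumed to be the trained model dictionary produced by the session's run method (see the session classes below).

```python
# Minimal sketch: train a regression model on all datasets of a remote session.
# Import paths follow the module headings on this page; IDs and column names
# are placeholders for your own compute spec, datasets, and schema.
from apheris_xgboost.api_client import fit_xgboost
from apheris_xgboost.session import XGBoostRemoteSession

session = XGBoostRemoteSession(
    compute_spec_id="<compute-spec-id>",
    dataset_ids=["<dataset-id-1>", "<dataset-id-2>"],
)

model = fit_xgboost(
    num_rounds=50,
    target_col="target",
    cols=["feature_1", "feature_2"],
    session=session,
    eta=0.1,
    max_depth=6,
    objective="reg:squarederror",
)
```

The returned model dictionary can then be passed as model_parameter to predict_xgboost and evaluate_xgboost.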

predict_xgboost(session, model_parameter, feature_cols, datasets=None, objective='reg:squarederror', num_class=None, quantile_alpha=None, timeout=300) 🔗

Predict using an XGBoost model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| session | XGBoostSession | The session to use for prediction. | required |
| model_parameter | Dict[str, Any] | The model parameter dictionary used to initialize the model for prediction. | required |
| feature_cols | Optional[List[Union[str, int, float]]] | The feature columns to use for prediction. | required |
| datasets | Optional[List[FederatedDataFrame]] | The datasets to predict on. If not specified, all datasets from the compute spec are used. | None |
| objective | Optional[BoosterObjectives] | The objective function, which determines the required prediction format. Default is reg:squarederror. | 'reg:squarederror' |
| num_class | Optional[int] | The number of classes for multi-class classification. | None |
| quantile_alpha | Optional[float] | The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used. | None |
| timeout | int | The timeout for the NVFlare task. For remote training jobs, this is also used as the Apheris job timeout, with an additional buffer of 60 seconds. | 300 |
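
A minimal sketch of a prediction call, reusing the session and model dictionary from the fit_xgboost example above; the feature column names are placeholders.

```python
from apheris_xgboost.api_client import predict_xgboost

# Reuses `session` and `model` from the fit_xgboost sketch above.
predictions = predict_xgboost(
    session=session,
    model_parameter=model,
    feature_cols=["feature_1", "feature_2"],
    objective="reg:squarederror",
)
```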

evaluate_xgboost(session, model_parameter, target_col, feature_cols, datasets=None, eval_metric=[], additional_eval_metric=[], objective='reg:squarederror', num_class=None, quantile_alpha=None, timeout=300) 🔗

Evaluate an XGBoost model with remote data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| session | XGBoostSession | The session to use for evaluation. | required |
| model_parameter | Dict[str, Any] | The model parameter dictionary to use for initializing the model to be evaluated. | required |
| target_col | Union[str, int, float] | The target column as ground truth. | required |
| feature_cols | Optional[List[Union[str, int, float]]] | The feature columns to use for prediction. | required |
| datasets | Optional[List[FederatedDataFrame]] | The evaluation datasets. If not specified, all datasets from the compute spec are used. | None |
| eval_metric | Union[EvaluationMetric, List[EvaluationMetric]] | The XGBoost evaluation metric to use. Can be a single string or a list. | [] |
| additional_eval_metric | Union[ScikitMetric, List[ScikitMetric]] | Additional scikit-learn evaluation metrics to use. Must be used with objective="multi:softmax". | [] |
| objective | Optional[BoosterObjectives] | The objective function. Default is reg:squarederror. | 'reg:squarederror' |
| num_class | Optional[int] | The number of classes for multi-class classification. | None |
| quantile_alpha | Optional[float] | The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used. | None |
| timeout | int | The timeout for the NVFlare task. For remote training jobs, this is also used as the Apheris job timeout, with an additional buffer of 60 seconds. | 300 |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, List[float]] | The evaluation results as a dictionary. |
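
A sketch of an evaluation call under the same assumptions as the examples above. The eval_metric value is written as a plain string for illustration; the accepted EvaluationMetric values are defined by the package.

```python
from apheris_xgboost.api_client import evaluate_xgboost

# Reuses `session` and `model` from the fit_xgboost sketch above.
# "rmse" stands in for a valid EvaluationMetric value.
results = evaluate_xgboost(
    session=session,
    model_parameter=model,
    target_col="target",
    feature_cols=["feature_1", "feature_2"],
    eval_metric=["rmse"],
)
# `results` is a Dict[str, List[float]] containing the evaluation results.
```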

apheris_xgboost.session🔗

XGBoostSession 🔗

Bases: ABC

Abstract class for XGBoost sessions. Subclasses must implement the run and get_dataset_mapping methods.
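
As an illustration of this contract only (not part of the package), a custom session would need to provide at least these two methods; the import path is assumed from the module heading above and the bodies are stubs.

```python
# Illustrative stub: the two methods a custom XGBoostSession subclass must provide.
from apheris_xgboost.session import XGBoostSession

class MyXGBoostSession(XGBoostSession):
    def get_dataset_mapping(self):
        # Return a mapping of gateway IDs to dataset paths (placeholder values).
        return {"gateway_1": "path/to/dataset"}

    def run(self, xgb_job, **kwargs):
        # Execute the given XGBoost job and return the resulting model dictionary.
        raise NotImplementedError
```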

XGBoostLocalSession 🔗

Bases: XGBoostSession

__init__(dataset_mapping, dataset_root, ds2gw_dict=None, workspace=None) 🔗

XGBoost session for local execution.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset_mapping | Dict | Mapping of gateway IDs to local dataset subpaths. | required |
| dataset_root | str | Root directory for the local datasets. | required |
| ds2gw_dict | Optional[Dict[str, str]] | Optional mapping of dataset IDs to gateway IDs. If not provided, it is generated automatically with the dataset IDs dataset_id_1 ... dataset_id_n. | None |
| workspace | Optional[Union[str, Path]] | Optional workspace directory. If not provided, a temporary directory is used. | None |
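
A minimal sketch of constructing a local session. The gateway IDs, file subpaths, and root directory are placeholders, and the assumption that the subpaths point to CSV files is illustrative only.

```python
from apheris_xgboost.session import XGBoostLocalSession

# Placeholders: gateway IDs and dataset subpaths relative to dataset_root.
local_session = XGBoostLocalSession(
    dataset_mapping={
        "gateway_1": "site_a/data.csv",
        "gateway_2": "site_b/data.csv",
    },
    dataset_root="/path/to/local/data",
)
print(local_session.get_dataset_mapping())
```

A local session can be passed as the session argument to fit_xgboost, predict_xgboost, or evaluate_xgboost in place of a remote session.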

get_dataset_mapping() 🔗

Returns the dataset mapping of gateway IDs to local dataset subpaths.

run(xgb_job, **kwargs) 🔗

Runs the XGBoost job locally in NVFlare simulator mode.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xgb_job | Union[TrainingJob, PredictionJob, ValidationJob] | XGBoost job to run. | required |

Returns: The trained XGBoost model as a dictionary.

XGBoostRemoteSession 🔗

Bases: XGBoostSession

__init__(compute_spec_id, dataset_ids, dataset_root='/workspace/input') 🔗

XGBoost session for remote execution.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| compute_spec_id | str | Compute spec ID. | required |
| dataset_ids | List[str] | List of dataset IDs. | required |
| dataset_root | Optional[str] | Optional root directory for the remote datasets. Default is "/workspace/input". | '/workspace/input' |
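
A sketch of constructing a remote session with the default dataset_root made explicit; the compute spec ID and dataset IDs are placeholders.

```python
from apheris_xgboost.session import XGBoostRemoteSession

# Placeholders for the compute spec and the dataset IDs registered with it.
remote_session = XGBoostRemoteSession(
    compute_spec_id="<compute-spec-id>",
    dataset_ids=["<dataset-id-1>", "<dataset-id-2>"],
    dataset_root="/workspace/input",
)
# Gateway-ID-to-path mapping resolved from the dataset references.
print(remote_session.get_dataset_mapping())
```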

get_dataset_mapping() 🔗

Returns the dataset mapping of gateway IDs to dataset paths on the remote side. This information is retrieved from the dataset references of the dataset IDs.

run(xgb_job, **kwargs) 🔗

Runs the XGBoost job remotely on Apheris Gateways.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xgb_job | Union[TrainingJob, ValidationJob, PredictionJob] | XGBoost job to run. | required |

Returns: The trained XGBoost model as a dictionary.