XGBoost Reference🔗
The Apheris XGBoost model is an integration with the NVIDIA FLARE XGBoost federation layer. For more details, see our XGBoost Model Registry guide.
apheris_xgboost.api_client🔗
fit_xgboost(num_rounds, target_col, cols, session, datasets=None, eta=None, max_depth=None, feature_types=None, num_class=None, quantile_alpha=None, objective='reg:squarederror', eval_metric=[], num_parallel_tree=None, enable_categorical=False, tree_method='hist', early_stopping_rounds=None, nthread=None, timeout=300)🔗
Fit an XGBoost model to the given dataset IDs. If no dataset IDs are provided, the model will be fit to all datasets in the session.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_rounds | int | The number of rounds to train the model for. | required |
target_col | Union[str, int, float] | The target column to predict. | required |
cols | List[Union[str, int, float]] | The columns to use as features. | required |
session | XGBoostSession | The session to use for training. | required |
datasets | Optional[List[FederatedDataFrame]] | Optional list of the datasets as FederatedDataFrames to fit the model to. If not provided, the session's dataset mapping is used without any preprocessing. | None |
eta | Optional[float] | The learning rate. | None |
max_depth | Optional[int] | The maximum depth of the trees. | None |
feature_types | Optional[List[str]] | The feature types to use if categorical features are enabled. | None |
num_class | Optional[int] | The number of classes for multi-class classification. | None |
quantile_alpha | Optional[float] | The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used. | None |
objective | Optional[BoosterObjectives] | The objective function. | 'reg:squarederror' |
eval_metric | Union[EvaluationMetric, List[EvaluationMetric]] | The evaluation metric. | [] |
num_parallel_tree | Optional[int] | The number of parallel trees. | None |
enable_categorical | Optional[bool] | Whether to enable categorical features. | False |
tree_method | Literal['exact', 'approx', 'hist'] | The tree method to use. | 'hist' |
early_stopping_rounds | Optional[int] | The number of communication rounds to wait for early stopping. | None |
nthread | Optional[int] | The number of threads to use. | None |
timeout | int | The timeout for the NVFlare task. For remote training jobs this is also used as the Apheris job timeout, with an additional buffer of 60 s. | 300 |
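As a minimal sketch of how these parameters fit together, a local training run might look like the following. All dataset paths, gateway IDs, and column names are hypothetical placeholders, and running this requires the `apheris_xgboost` package and local dataset files:

```python
from apheris_xgboost.api_client import fit_xgboost
from apheris_xgboost.session import XGBoostLocalSession

# Hypothetical layout: two gateways, each mapped to a file under dataset_root.
session = XGBoostLocalSession(
    dataset_mapping={"gw1": "site_a/data.csv", "gw2": "site_b/data.csv"},
    dataset_root="/tmp/xgb_data",
)

# Train a regression model; target and feature column names are placeholders.
model = fit_xgboost(
    num_rounds=10,
    target_col="label",
    cols=["age", "bmi", "blood_pressure"],
    session=session,
    eta=0.3,
    max_depth=6,
    tree_method="hist",
)
```

The return value is a model dictionary, which (as the signatures of `predict_xgboost` and `evaluate_xgboost` suggest) can be passed on as `model_parameter`.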
predict_xgboost(session, model_parameter, feature_cols, datasets=None, objective='reg:squarederror', num_class=None, quantile_alpha=None, timeout=300)🔗
Predict using an XGBoost model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
session | XGBoostSession | The session to use for prediction. | required |
model_parameter | Dict[str, Any] | The model parameter dictionary to use for initializing the model to be evaluated. | required |
feature_cols | Optional[List[Union[str, int, float]]] | The feature columns to use for prediction. | required |
datasets | Optional[List[FederatedDataFrame]] | The datasets to predict on. If not specified, all datasets from compute_spec are used. | None |
objective | Optional[BoosterObjectives] | The objective function for the required prediction format. | 'reg:squarederror' |
num_class | Optional[int] | The number of classes for multi-class classification. | None |
quantile_alpha | Optional[float] | The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used. | None |
timeout | int | The timeout for the NVFlare task. For remote training jobs this is also used as the Apheris job timeout, with an additional buffer of 60 s. | 300 |
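A hedged prediction sketch, assuming the dictionary returned by `fit_xgboost` is used as `model_parameter` (session paths and column names below are hypothetical):

```python
from apheris_xgboost.api_client import fit_xgboost, predict_xgboost
from apheris_xgboost.session import XGBoostLocalSession

# Hypothetical local session; dataset layout is a placeholder.
session = XGBoostLocalSession(
    dataset_mapping={"gw1": "site_a/data.csv"},
    dataset_root="/tmp/xgb_data",
)

model = fit_xgboost(
    num_rounds=10,
    target_col="label",
    cols=["age", "bmi"],
    session=session,
)

# Reuse the trained model dictionary for prediction on the session's datasets.
predictions = predict_xgboost(
    session=session,
    model_parameter=model,
    feature_cols=["age", "bmi"],
    objective="reg:squarederror",
)
```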
evaluate_xgboost(session, model_parameter, target_col, feature_cols, datasets=None, eval_metric=[], additional_eval_metric=[], objective='reg:squarederror', num_class=None, quantile_alpha=None, timeout=300)🔗
Evaluate an XGBoost model with remote data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
session | XGBoostSession | The session to use for evaluation. | required |
model_parameter | Dict[str, Any] | The model parameter dictionary to use for initializing the model to be evaluated. | required |
target_col | Union[str, int, float] | The target column as ground truth. | required |
feature_cols | Optional[List[Union[str, int, float]]] | The feature columns to use for prediction. | required |
datasets | Optional[List[FederatedDataFrame]] | The evaluation datasets. If not specified, all datasets from compute_spec are used. | None |
eval_metric | Union[EvaluationMetric, List[EvaluationMetric]] | The XGBoost evaluation metric to use. Can be a single string or a list. | [] |
additional_eval_metric | Union[ScikitMetric, List[ScikitMetric]] | Additional scikit-learn evaluation metrics to use. Must be used with … | [] |
objective | Optional[BoosterObjectives] | The objective function. | 'reg:squarederror' |
num_class | Optional[int] | The number of classes for multi-class classification. | None |
quantile_alpha | Optional[float] | The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used. | None |
timeout | int | The timeout for the NVFlare task. For remote training jobs this is also used as the Apheris job timeout, with an additional buffer of 60 s. | 300 |
Returns:
Type | Description |
---|---|
Dict[str, List[float]] | The evaluation results as a dictionary. |
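A hedged evaluation sketch; the session layout, column names, and metric name are placeholders, and passing the `fit_xgboost` result as `model_parameter` is an assumption based on the signatures above:

```python
from apheris_xgboost.api_client import evaluate_xgboost, fit_xgboost
from apheris_xgboost.session import XGBoostLocalSession

# Hypothetical local session.
session = XGBoostLocalSession(
    dataset_mapping={"gw1": "site_a/data.csv"},
    dataset_root="/tmp/xgb_data",
)

model = fit_xgboost(
    num_rounds=10,
    target_col="label",
    cols=["age", "bmi"],
    session=session,
)

# Evaluate against the ground-truth column; "rmse" is a placeholder metric.
results = evaluate_xgboost(
    session=session,
    model_parameter=model,
    target_col="label",
    feature_cols=["age", "bmi"],
    eval_metric=["rmse"],
)
# results is a Dict[str, List[float]] of metric names to values.
```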
apheris_xgboost.session🔗
XGBoostSession🔗
Bases: ABC
Abstract class for XGBoost sessions. Subclasses must implement the run and get_dataset_mapping methods.
XGBoostLocalSession🔗
Bases: XGBoostSession
__init__(dataset_mapping, dataset_root, ds2gw_dict=None, workspace=None)🔗
XGBoost session for local execution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_mapping | Dict | Mapping of gateway IDs to local dataset subpaths. | required |
dataset_root | str | Root directory for the local datasets. | required |
ds2gw_dict | Optional[Dict[str, str]] | Optional mapping of dataset IDs to gateway IDs. If not provided, it will be generated automatically from the dataset IDs. | None |
workspace | Optional[Union[str, Path]] | Optional workspace directory. If not provided, a temporary directory is used. | None |
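A small sketch of how `dataset_mapping` and `dataset_root` relate; the directory layout and gateway IDs below are hypothetical:

```python
from apheris_xgboost.session import XGBoostLocalSession

# Hypothetical layout on disk:
#   /tmp/xgb_data/site_a/data.csv   <- gateway "gw1"
#   /tmp/xgb_data/site_b/data.csv   <- gateway "gw2"
# Subpaths in dataset_mapping are resolved relative to dataset_root.
session = XGBoostLocalSession(
    dataset_mapping={"gw1": "site_a/data.csv", "gw2": "site_b/data.csv"},
    dataset_root="/tmp/xgb_data",
)

# Returns the gateway-ID-to-subpath mapping passed in above.
mapping = session.get_dataset_mapping()
```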
get_dataset_mapping()🔗
Returns the dataset mapping of gateway IDs to local dataset subpaths.
run(xgb_job, **kwargs)🔗
Runs the XGBoost job locally in NVFlare simulator mode.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
xgb_job | Union[TrainingJob, PredictionJob, ValidationJob] | XGBoost job to run. | required |
Returns: The trained XGBoost model as a dictionary.
XGBoostRemoteSession🔗
Bases: XGBoostSession
__init__(compute_spec_id, dataset_ids, dataset_root='/workspace/input')🔗
XGBoost session for remote execution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
compute_spec_id | str | Compute spec ID. | required |
dataset_ids | List[str] | List of dataset IDs. | required |
dataset_root | Optional[str] | Optional root directory for the remote datasets. | '/workspace/input' |
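A hedged remote-training sketch; the compute spec ID and dataset IDs are hypothetical placeholders, and running this requires access to an Apheris environment:

```python
from apheris_xgboost.api_client import fit_xgboost
from apheris_xgboost.session import XGBoostRemoteSession

# Both IDs below are placeholders for real Apheris identifiers.
session = XGBoostRemoteSession(
    compute_spec_id="cs-1234",
    dataset_ids=["dataset-a", "dataset-b"],
)

model = fit_xgboost(
    num_rounds=20,
    target_col="label",
    cols=["feature_1", "feature_2"],
    session=session,
    timeout=600,  # also used as the Apheris job timeout, plus a 60 s buffer
)
```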
get_dataset_mapping()🔗
Returns the dataset mapping of gateway IDs to dataset paths on the remote side. This information is retrieved from the dataset references of the dataset IDs.
run(xgb_job, **kwargs)🔗
Runs the XGBoost job remotely on Apheris Gateways.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
xgb_job | Union[TrainingJob, ValidationJob, PredictionJob] | XGBoost job to run. | required |
Returns: The trained XGBoost model as a dictionary.