
XGBoost Reference

The Apheris XGBoost model is an integration with the NVIDIA FLARE XGBoost federation layer. You can see more details about it in our XGBoost Model Registry guide.

apheris_xgboost.api_client

fit_xgboost(num_rounds, target_col, cols, session, datasets=None, eta=None, max_depth=None, feature_types=None, num_class=None, quantile_alpha=None, objective=None, eval_metric=[], num_parallel_tree=None, enable_categorical=False, tree_method='hist', nthread=None)

Fit an XGBoost model to the given dataset IDs.

Parameters:

Name Type Description Default
num_rounds int

The number of boosting rounds to train the model for. The total number of trees is num_rounds * num_parallel_tree, multiplied by num_class for multi-class objectives.

required
target_col Union[str, int, float]

The target column to predict.

required
cols List[Union[str, int, float]]

The columns to use as features.

required
session XGBoostSession

The session to use for training.

required
datasets Optional[List[FederatedDataFrame]]

Optional list of the datasets as FederatedDataFrames to fit the model to. If not provided, the session's dataset mapping will be used without any preprocessing.

None
eta Optional[float]

The learning rate.

None
max_depth Optional[int]

The maximum depth of the trees.

None
feature_types Optional[List[str]]

The feature types to use if categorical features are enabled.

None
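num_class Optional[int]

The number of classes for multi-class classification.

None
quantile_alpha Optional[float]

The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used.

None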
objective Optional[BoosterObjectives]

The objective function.

None
eval_metric Union[EvaluationMetric, List[EvaluationMetric]]

The evaluation metric to use. Can be a single metric or a list.

[]
num_parallel_tree Optional[int]

The number of parallel trees.

None
enable_categorical Optional[bool]

Whether to enable categorical features.

False
tree_method Literal['exact', 'approx', 'hist']

The tree method to use.

'hist'
nthread Optional[int]

The number of threads to use.

None
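Most optional arguments of fit_xgboost share their names with standard XGBoost booster parameters. The following is a minimal sketch, assuming the integration forwards only the arguments that are actually set and accepts eval_metric as either a single name or a list; the helpers build_booster_params and total_trees are hypothetical and not part of apheris_xgboost.

```python
from typing import Any, Dict, Optional, Sequence, Union


def build_booster_params(
    eta: Optional[float] = None,
    max_depth: Optional[int] = None,
    num_class: Optional[int] = None,
    quantile_alpha: Optional[float] = None,
    objective: Optional[str] = None,
    eval_metric: Union[str, Sequence[str]] = (),
    num_parallel_tree: Optional[int] = None,
    tree_method: str = "hist",
    nthread: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the arguments that were set into an XGBoost params dict."""
    params: Dict[str, Any] = {
        "eta": eta,
        "max_depth": max_depth,
        "num_class": num_class,
        "quantile_alpha": quantile_alpha,
        "objective": objective,
        "num_parallel_tree": num_parallel_tree,
        "tree_method": tree_method,
        "nthread": nthread,
    }
    # Drop unset (None) values so XGBoost falls back to its own defaults.
    params = {key: value for key, value in params.items() if value is not None}
    # eval_metric may be a single metric name or a list; normalize to a list.
    metrics = [eval_metric] if isinstance(eval_metric, str) else list(eval_metric)
    if metrics:
        params["eval_metric"] = metrics
    return params


def total_trees(num_rounds: int, num_parallel_tree: int = 1, num_class: int = 1) -> int:
    """Each boosting round adds num_parallel_tree trees per output class."""
    return num_rounds * num_parallel_tree * num_class


params = build_booster_params(eta=0.1, max_depth=4, objective="binary:logistic", eval_metric="auc")
print(params)
# {'eta': 0.1, 'max_depth': 4, 'objective': 'binary:logistic', 'tree_method': 'hist', 'eval_metric': ['auc']}
print(total_trees(num_rounds=100, num_parallel_tree=2))  # 200
```

Dropping None values rather than sending explicit defaults keeps XGBoost's own defaults authoritative, which is why the sketch treats every optional argument the same way.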

predict_xgboost(session, model_parameter, feature_cols, datasets=None, objective=None, num_class=None, quantile_alpha=None)

Predict using an XGBoost model.

Parameters:

Name Type Description Default
session XGBoostSession

The session to use for prediction.

required
model_parameter Dict[str, Any]

The model parameter dictionary to use for initializing the model to be evaluated.

required
feature_cols Optional[List[Union[str, int, float]]]

The feature columns to use for prediction.

required
datasets Optional[List[FederatedDataFrame]]

The datasets to predict on. If not specified, all datasets from compute_spec are used.

None
objective Optional[BoosterObjectives]

The objective function for the required prediction format.

None
num_class Optional[int]

The number of classes for multi-class classification.

None
quantile_alpha Optional[float]

The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used.

None
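With objective reg:quantileerror, XGBoost fits the quantile_alpha quantile of the target instead of its mean by minimizing the pinball (quantile) loss. The pure-Python sketch below shows what that loss measures for a given alpha, which can help when sanity-checking quantile predictions. The helper pinball_loss is hypothetical and independent of the apheris_xgboost API.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], alpha: float) -> float:
    """Mean pinball (quantile) loss for quantile level alpha in (0, 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        residual = truth - pred
        # Under-prediction (residual >= 0) costs alpha per unit;
        # over-prediction (residual < 0) costs (1 - alpha) per unit.
        total += alpha * residual if residual >= 0 else (alpha - 1.0) * residual
    return total / len(y_true)


# With alpha = 0.9, under-predicting by 2 costs about 1.8,
# while over-predicting by 2 costs only about 0.2.
print(pinball_loss([10.0], [8.0], alpha=0.9))
print(pinball_loss([10.0], [12.0], alpha=0.9))
```

The asymmetry is what pushes the fitted values toward the upper tail for large alpha and toward the lower tail for small alpha.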

evaluate_xgboost(session, model_parameter, target_col, feature_cols, datasets=None, eval_metric=[], additional_eval_metric=[], objective=None, num_class=None, quantile_alpha=None)

Evaluate an XGBoost model with remote data.

Parameters:

Name Type Description Default
session XGBoostSession

The session to use for evaluation.

required
model_parameter Dict[str, Any]

The model parameter dictionary to use for initializing the model to be evaluated.

required
target_col Union[str, int, float]

The target column as ground truth.

required
feature_cols Optional[List[Union[str, int, float]]]

The feature columns to use for prediction.

required
datasets Optional[List[FederatedDataFrame]]

The evaluation datasets. If not specified, all datasets from compute_spec are used.

None
eval_metric Union[EvaluationMetric, List[EvaluationMetric]]

The XGBoost evaluation metric to use. Can be a single string or a list.

[]
additional_eval_metric Union[ScikitMetric, List[ScikitMetric]]

Additional scikit-learn evaluation metrics to use. Must be used with objective="multi:softmax".

[]
objective Optional[BoosterObjectives]

The objective function.

None
num_class Optional[int]

The number of classes for multi-class classification.

None
quantile_alpha Optional[float]

The quantile alpha for quantile regression. Only needed if the objective reg:quantileerror is used.

None

Returns:

Type Description
Dict[str, List[float]]

The evaluation results as a dictionary.
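The restriction of additional_eval_metric to objective="multi:softmax" is presumably because scikit-learn classification metrics compare class labels: multi:softmax yields one label per row, whereas multi:softprob yields one probability per class. The sketch below shows that conversion and a label-based metric in pure Python; probs_to_labels and accuracy are hypothetical stand-ins (accuracy mirrors what scikit-learn's accuracy_score computes), not the apheris_xgboost implementation.

```python
from typing import List, Sequence


def probs_to_labels(probabilities: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most probable class per row; ties resolve to the lowest class index."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the ground truth."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Per-class probabilities, as multi:softprob would return them for 3 classes.
softprob = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
labels = probs_to_labels(softprob)
print(labels)  # [0, 2, 1]
print(accuracy([0, 2, 2], labels))
```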

apheris_xgboost.session

XGBoostSession

Bases: ABC

Abstract class for XGBoost sessions. Subclasses must implement the run and get_dataset_mapping methods.
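The contract described above can be illustrated with Python's abc module. This is a hypothetical sketch, not the apheris_xgboost source: the class names SessionSketch and InMemorySession, the type hints, and the toy return value are assumptions; only the method names run and get_dataset_mapping come from this reference.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class SessionSketch(ABC):
    """Hypothetical stand-in for the XGBoostSession abstract interface."""

    @abstractmethod
    def get_dataset_mapping(self) -> Dict[str, str]:
        """Return a mapping of gateway IDs to dataset paths."""

    @abstractmethod
    def run(self, xgb_job: Any) -> Dict[str, Any]:
        """Execute the job and return the trained model as a dictionary."""


class InMemorySession(SessionSketch):
    """Toy subclass: it implements both abstract methods, so it can be instantiated."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self._mapping = dict(mapping)

    def get_dataset_mapping(self) -> Dict[str, str]:
        return dict(self._mapping)

    def run(self, xgb_job: Any) -> Dict[str, Any]:
        # A real session would train here; this toy only echoes its inputs.
        return {"job": xgb_job, "gateways": sorted(self._mapping)}


session = InMemorySession({"gw-b": "b/train.csv", "gw-a": "a/train.csv"})
print(session.run("demo-job"))  # {'job': 'demo-job', 'gateways': ['gw-a', 'gw-b']}
```

Instantiating SessionSketch directly raises TypeError, which is how abc enforces that both methods are provided.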

XGBoostLocalSession

Bases: XGBoostSession

__init__(dataset_mapping, dataset_root, ds2gw_dict=None, workspace=None)

XGBoost session for local execution.

Args:
dataset_mapping: Mapping of gateway IDs to local dataset subpaths.
dataset_root: Root directory for the local datasets.
ds2gw_dict: Optional mapping of dataset IDs to gateway IDs. If not provided, it is generated automatically with the dataset IDs dataset_id_1 ... dataset_id_n.
workspace: Optional workspace directory. If not provided, a temporary directory is used.
timeout: Timeout in seconds for the simulator run, used to kill all remaining processes in case of a client error.
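The defaults described above can be sketched as follows, assuming the automatic dataset IDs are numbered in the iteration order of dataset_mapping and that subpaths are joined onto dataset_root. The helpers default_ds2gw, resolve_local_paths, and resolve_workspace, and the example gateway names, are hypothetical and only illustrate the documented behavior.

```python
import os
import tempfile
from typing import Dict, Optional


def default_ds2gw(dataset_mapping: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical: name datasets dataset_id_1..dataset_id_n, one per gateway ID."""
    return {
        f"dataset_id_{index}": gateway_id
        for index, gateway_id in enumerate(dataset_mapping, start=1)
    }


def resolve_local_paths(dataset_mapping: Dict[str, str], dataset_root: str) -> Dict[str, str]:
    """Join each gateway's dataset subpath onto dataset_root."""
    return {gw: os.path.join(dataset_root, subpath) for gw, subpath in dataset_mapping.items()}


def resolve_workspace(workspace: Optional[str]) -> str:
    """Fall back to a fresh temporary directory; the caller is responsible for cleanup."""
    return workspace if workspace is not None else tempfile.mkdtemp(prefix="xgb_sim_")


mapping = {"gateway_a": "hospital_a/train.csv", "gateway_b": "hospital_b/train.csv"}
print(default_ds2gw(mapping))  # {'dataset_id_1': 'gateway_a', 'dataset_id_2': 'gateway_b'}
print(resolve_local_paths(mapping, "/data"))
```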

get_dataset_mapping()

Returns the dataset mapping of gateway IDs to local dataset subpaths.

run(xgb_job)

Runs the XGBoost job locally in NVIDIA FLARE simulator mode.

Args:
xgb_job: XGBoost job to run.

Returns:
The trained XGBoost model as a dictionary.

XGBoostRemoteSession

Bases: XGBoostSession

__init__(compute_spec_id, dataset_ids, dataset_root='/workspace/input')

XGBoost session for remote execution.

Args:
compute_spec_id: Compute spec ID.
dataset_ids: List of dataset IDs.
dataset_root: Optional root directory for the remote datasets. Defaults to "/workspace/input".

get_dataset_mapping()

Returns the dataset mapping of gateway IDs to dataset paths on remote side. This information is retrieved from the dataset references of the dataset IDs.

run(xgb_job)

Runs the XGBoost job remotely.

Args:
xgb_job: XGBoost job to run.

Returns:
The trained XGBoost model as a dictionary.
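Both run methods return the trained model as a dictionary, and predict_xgboost and evaluate_xgboost accept a model_parameter dictionary. Assuming the returned dictionary is JSON-serializable (this reference does not state its schema), a trained model could be persisted between sessions with the standard json module. The helpers save_model and load_model, the file name, and the placeholder dictionary are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_model(model: Dict[str, Any], path: str) -> None:
    """Write the model dictionary to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(model, handle)


def load_model(path: str) -> Dict[str, Any]:
    """Read a model dictionary previously written by save_model."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Placeholder standing in for the dictionary returned by session.run(...);
# it is not a real XGBoost model.
model = {"placeholder": {"num_trees": 3, "note": "illustration only"}}
path = os.path.join(tempfile.mkdtemp(), "xgb_model.json")
save_model(model, path)
restored = load_model(path)
```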