
XGBoost Reference🔗

The Apheris XGBoost model is an integration with the NVIDIA FLARE XGBoost federation layer. For more details, see our XGBoost Model Registry guide.

apheris_xgboost.api_client🔗

fit_xgboost(num_rounds, target_col, cols, session, datasets=None, eta=None, max_depth=None, objective=None, eval_metric=None, num_parallel_tree=None, enable_categorical=False, tree_method='hist', nthread=None) 🔗

Fit an XGBoost model to the given dataset IDs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `num_rounds` | `int` | The number of rounds to train the model for. `num_rounds * num_parallel_tree` is the total number of trees. | *required* |
| `target_col` | `Union[str, int, float]` | The target column to predict. | *required* |
| `cols` | `List[Union[str, int, float]]` | The columns to use as features. | *required* |
| `session` | `XGBoostSession` | The session to use for training. | *required* |
| `datasets` | `Optional[List[FederatedDataFrame]]` | Optional list of the datasets as `FederatedDataFrame`s to fit the model to. If not provided, the session's dataset mapping is used without any preprocessing. | `None` |
| `eta` | `Optional[float]` | The learning rate. | `None` |
| `max_depth` | `Optional[int]` | The maximum depth of the trees. | `None` |
| `objective` | `Optional[Literal['reg:squarederror', 'reg:squaredlogerror', 'reg:logistic', 'reg:pseudohubererror', 'reg:absoluteerror', 'reg:quantileerror', 'binary:logistic', 'binary:logitraw', 'binary:hinge', 'count:poisson', 'survival:cox', 'survival:aft', 'multi:softmax', 'multi:softprob', 'rank:ndcg', 'rank:map', 'rank:pairwise', 'reg:gamma', 'reg:tweedie']]` | The objective function. | `None` |
| `eval_metric` | `Optional[Literal['rmse', 'rmsle', 'mae', 'mape', 'mphe', 'logloss', 'error', 'error@t', 'merror', 'mlogloss', 'auc', 'aucpr', 'pre', 'ndcg', 'map']]` | The evaluation metric. | `None` |
| `num_parallel_tree` | `Optional[int]` | The number of parallel trees. | `None` |
| `enable_categorical` | `Optional[bool]` | Whether to enable categorical features. | `False` |
| `tree_method` | `Literal['exact', 'approx', 'hist']` | The tree method to use. | `'hist'` |
| `nthread` | `Optional[int]` | The number of threads to use. | `None` |
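As a hedged sketch, a call might look like the following. The column names and hyperparameter values are purely illustrative, and the session setup is covered under `apheris_xgboost.session`. It also shows the tree-count relationship stated above: `num_rounds * num_parallel_tree` is the total number of trees.

```python
# Illustrative hyperparameters for fit_xgboost (example values, not
# recommendations). num_rounds * num_parallel_tree gives the total number
# of trees in the final model.
num_rounds = 50
num_parallel_tree = 4
total_trees = num_rounds * num_parallel_tree  # 200 trees overall

fit_kwargs = dict(
    num_rounds=num_rounds,
    target_col="label",                     # hypothetical target column
    cols=["age", "bmi", "blood_pressure"],  # hypothetical feature columns
    eta=0.1,
    max_depth=6,
    objective="binary:logistic",
    eval_metric="auc",
    num_parallel_tree=num_parallel_tree,
)

# With a session in place (see apheris_xgboost.session), training would be:
# model = fit_xgboost(session=session, **fit_kwargs)
```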

apheris_xgboost.session🔗

XGBoostSession 🔗

Bases: ABC

Abstract base class for XGBoost sessions. Subclasses must implement the `run` and `get_dataset_mapping` methods.

XGBoostLocalSession 🔗

Bases: XGBoostSession

__init__(dataset_mapping, dataset_root, ds2gw_dict=None, workspace=None) 🔗

XGBoost session for local execution.

Parameters:

- `dataset_mapping`: Mapping of gateway IDs to local dataset subpaths.
- `dataset_root`: Root directory for the local datasets.
- `ds2gw_dict`: Optional mapping of dataset IDs to gateway IDs. If not provided, it is generated automatically with the dataset IDs `dataset_id_1` ... `dataset_id_n`.
- `workspace`: Optional workspace directory. If not provided, a temporary directory is used.
- `timeout`: Timeout in seconds for the simulator run, used to kill all remaining processes in case of a client error.

get_dataset_mapping() 🔗

Returns the dataset mapping of gateway IDs to local dataset subpaths.

run(xgb_job) 🔗

Runs the XGBoost job locally in nvflare simulator mode.

Parameters:

- `xgb_job`: XGBoost job to run.

Returns: The trained XGBoost model as a dictionary.
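To illustrate the automatic `ds2gw_dict` generation described above, the following sketch builds the equivalent explicit mapping by hand. The gateway names and dataset subpaths are hypothetical, and the session construction itself is shown in comments since it requires the Apheris SDK:

```python
# Hypothetical mapping of gateway IDs to local dataset subpaths, as passed
# to XGBoostLocalSession(dataset_mapping=..., dataset_root=...).
dataset_mapping = {
    "gateway_a": "site_a/train.csv",
    "gateway_b": "site_b/train.csv",
}

# When ds2gw_dict is omitted, dataset IDs dataset_id_1 ... dataset_id_n are
# generated automatically; the explicit equivalent would be:
ds2gw_dict = {
    f"dataset_id_{i}": gateway_id
    for i, gateway_id in enumerate(dataset_mapping, start=1)
}

# session = XGBoostLocalSession(
#     dataset_mapping=dataset_mapping,
#     dataset_root="/data/local",       # hypothetical root directory
#     ds2gw_dict=ds2gw_dict,
# )
# model = session.run(xgb_job)          # runs in nvflare simulator mode
```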

XGBoostRemoteSession 🔗

Bases: XGBoostSession

__init__(compute_spec_id, dataset_ids, dataset_root='/workspace/input') 🔗

XGBoost session for remote execution.

Parameters:

- `compute_spec_id`: Compute spec ID.
- `dataset_ids`: List of dataset IDs.
- `dataset_root`: Optional root directory for the remote datasets. The default is `"/workspace/input"`.

get_dataset_mapping() 🔗

Returns the dataset mapping of gateway IDs to dataset paths on the remote side. This information is retrieved from the dataset references of the given dataset IDs.

run(xgb_job) 🔗

Runs the XGBoost job remotely.

Parameters:

- `xgb_job`: XGBoost job to run.

Returns: The trained XGBoost model as a dictionary.
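A minimal sketch of remote usage, assuming a hypothetical compute spec ID and dataset IDs. The path layout under the default `dataset_root` is an assumption for illustration only; the actual mapping is retrieved from the dataset references via `get_dataset_mapping()`.

```python
# Hypothetical identifiers for a remote run on the Apheris platform.
dataset_root = "/workspace/input"  # default dataset_root for remote sessions
dataset_ids = ["hospital-a_data", "hospital-b_data"]

# Assumed layout: datasets resolved under dataset_root (illustrative only;
# get_dataset_mapping() returns the authoritative paths).
illustrative_paths = [f"{dataset_root}/{ds}" for ds in dataset_ids]

# session = XGBoostRemoteSession(
#     compute_spec_id="cs-1234-abcd",   # hypothetical compute spec ID
#     dataset_ids=dataset_ids,
# )
# mapping = session.get_dataset_mapping()  # gateway ID -> remote dataset path
# model = session.run(xgb_job)             # trained model as a dictionary
```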