XGBoost Reference🔗
The Apheris XGBoost model is an integration with the NVIDIA FLARE XGBoost federation layer. For more detail, see our XGBoost Model Registry guide.
apheris_xgboost.api_client🔗
fit_xgboost(num_rounds, target_col, cols, session, datasets=None, eta=None, max_depth=None, objective=None, eval_metric=None, num_parallel_tree=None, enable_categorical=False, tree_method='hist', nthread=None)
🔗
Fit an XGBoost model to the given dataset IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_rounds` | `int` | The number of rounds to train the model for. | *required* |
| `target_col` | `Union[str, int, float]` | The target column to predict. | *required* |
| `cols` | `List[Union[str, int, float]]` | The columns to use as features. | *required* |
| `session` | `XGBoostSession` | The session to use for training. | *required* |
| `datasets` | `Optional[List[FederatedDataFrame]]` | Optional list of the datasets, as `FederatedDataFrame`s, to fit the model to. If not provided, the session's dataset mapping is used without any preprocessing. | `None` |
| `eta` | `Optional[float]` | The learning rate. | `None` |
| `max_depth` | `Optional[int]` | The maximum depth of the trees. | `None` |
| `objective` | `Optional[Literal['reg:squarederror', 'reg:squaredlogerror', 'reg:logistic', 'reg:pseudohubererror', 'reg:absoluteerror', 'reg:quantileerror', 'binary:logistic', 'binary:logitraw', 'binary:hinge', 'count:poisson', 'survival:cox', 'survival:aft', 'multi:softmax', 'multi:softprob', 'rank:ndcg', 'rank:map', 'rank:pairwise', 'reg:gamma', 'reg:tweedie']]` | The objective function. | `None` |
| `eval_metric` | `Optional[Literal['rmse', 'rmsle', 'mae', 'mape', 'mphe', 'logloss', 'error', 'error@t', 'merror', 'mlogloss', 'auc', 'aucpr', 'pre', 'ndcg', 'map']]` | The evaluation metric. | `None` |
| `num_parallel_tree` | `Optional[int]` | The number of parallel trees. | `None` |
| `enable_categorical` | `Optional[bool]` | Whether to enable categorical features. | `False` |
| `tree_method` | `Literal['exact', 'approx', 'hist']` | The tree method to use. | `'hist'` |
| `nthread` | `Optional[int]` | The number of threads to use. | `None` |
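A minimal usage sketch of `fit_xgboost`, assuming `apheris_xgboost` is installed and a session has already been created. The column names and hyperparameter values below are illustrative, not taken from this reference:

```python
def train_fraud_model(session) -> dict:
    # Import is local so the sketch can be read without apheris_xgboost
    # installed; `session` is any XGBoostSession (local or remote).
    from apheris_xgboost.api_client import fit_xgboost

    # All column names and hyperparameter values here are assumptions
    # chosen for illustration.
    return fit_xgboost(
        num_rounds=100,                 # boosting rounds
        target_col="is_fraud",          # label column
        cols=["amount", "age", "n_prior_claims"],  # feature columns
        session=session,
        eta=0.1,                        # learning rate
        max_depth=6,
        objective="binary:logistic",    # one of the Literal values above
        eval_metric="auc",
        tree_method="hist",             # default; 'exact' and 'approx' also allowed
    )
```

The `objective` and `eval_metric` strings must be drawn from the `Literal` values listed in the table; any other string is a type error.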
apheris_xgboost.session🔗
XGBoostSession
🔗
Bases: ABC
Abstract base class for XGBoost sessions. Subclasses must implement the `run` and `get_dataset_mapping` methods.
XGBoostLocalSession
🔗
Bases: XGBoostSession
__init__(dataset_mapping, dataset_root, ds2gw_dict=None, workspace=None)
🔗
XGBoost session for local execution.
Args:
dataset_mapping: Mapping of gateway IDs to local dataset subpaths.
dataset_root: Root directory for the local datasets.
ds2gw_dict: Optional mapping of dataset IDs to gateway IDs. If not provided, it is generated automatically with the dataset IDs `dataset_id_1` ... `dataset_id_n`.
workspace: Optional workspace directory. If not provided, a temporary directory is used.
timeout: Timeout in seconds for the simulator run, after which all remaining processes are killed in case of a client error.
get_dataset_mapping()
🔗
Returns the dataset mapping of gateway IDs to local dataset subpaths.
run(xgb_job)
🔗
Runs the XGBoost job locally in NVFlare simulator mode.

Args:
xgb_job: XGBoost job to run.

Returns:
The trained XGBoost model as a dictionary.
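Putting the pieces together, a local simulator run might look like the sketch below. The gateway IDs, subpaths, and column names are assumptions, not taken from this reference:

```python
def simulate_locally(data_root: str) -> dict:
    # Imports are local so the sketch can be read without apheris_xgboost installed.
    from apheris_xgboost.api_client import fit_xgboost
    from apheris_xgboost.session import XGBoostLocalSession

    # Map each (hypothetical) gateway ID to a dataset subpath under data_root.
    session = XGBoostLocalSession(
        dataset_mapping={
            "gateway-1": "site_a/train.csv",
            "gateway-2": "site_b/train.csv",
        },
        dataset_root=data_root,
        # ds2gw_dict omitted: per the docs above, dataset IDs
        # dataset_id_1, dataset_id_2 are generated automatically.
    )

    # fit_xgboost presumably assembles the XGBoost job and invokes
    # session.run(), which executes it in NVFlare simulator mode.
    return fit_xgboost(
        num_rounds=10,
        target_col="outcome",
        cols=["feature_1", "feature_2"],
        session=session,
    )
```

A local session is useful for validating a pipeline end-to-end on one machine before submitting the same job against a remote session.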
XGBoostRemoteSession
🔗
Bases: XGBoostSession
__init__(compute_spec_id, dataset_ids, dataset_root='/workspace/input')
🔗
XGBoost session for remote execution.

Args:
compute_spec_id: Compute spec ID.
dataset_ids: List of dataset IDs.
dataset_root: Optional root directory for the remote datasets; defaults to "/workspace/input".
get_dataset_mapping()
🔗
Returns the dataset mapping of gateway IDs to dataset paths on remote side. This information is retrieved from the dataset references of the dataset IDs.
run(xgb_job)
🔗
Runs the XGBoost job remotely.

Args:
xgb_job: XGBoost job to run.

Returns:
The trained XGBoost model as a dictionary.
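A remote session needs only a compute spec ID and the dataset IDs; the dataset mapping is resolved on the remote side from the dataset references. A sketch, with hypothetical IDs and columns:

```python
def train_remotely(compute_spec_id: str, dataset_ids: list) -> dict:
    # Imports are local so the sketch can be read without apheris_xgboost installed.
    from apheris_xgboost.api_client import fit_xgboost
    from apheris_xgboost.session import XGBoostRemoteSession

    # dataset_root is left at its default, "/workspace/input", on the remote side.
    session = XGBoostRemoteSession(
        compute_spec_id=compute_spec_id,
        dataset_ids=dataset_ids,
    )

    # Column names and hyperparameters below are illustrative.
    return fit_xgboost(
        num_rounds=50,
        target_col="label",
        cols=["x1", "x2", "x3"],
        session=session,
        objective="reg:squarederror",
        eval_metric="rmse",
    )
```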