Notable training methods

Simplex-Constrained Sparse Bagging: Transitioning from Uniform Priors to Sparse Posteriors in Ensemble Learning

Meher Sai Preetam, Meher Bhaskar

Published: Jun 11, 2026 — 17:11 UTC

Problem
The paper addresses the limitations of traditional bagging ensembles, such as Random Forests and Bagged Neural Networks, which utilize uniform voting power across base estimators. This approach neglects the varying competence of individual estimators, leading to overconfidence in predictions. The authors propose a mathematically rigorous framework, Simplex-Constrained Sparse Bagging (SCSB), to improve model performance by optimizing the voting weights based on the local competence of each estimator. This work is a preprint and has not undergone peer review.

Method
SCSB formulates ensemble pruning and probability calibration as a joint optimization problem constrained to the probability simplex. The authors minimize the Out-Of-Bag (OOB) loss to achieve this. To address the “L1-simplex paradox,” where the L1 norm remains constant on the simplex and fails to induce sparsity, they introduce a concave quadratic penalty. This penalty effectively encourages sparsity in the voting weights assigned to the base estimators. The method is model-agnostic, meaning it can be applied to various ensemble architectures without modification.

Results
The authors demonstrate that SCSB can achieve up to 96% compression of the ensemble while maintaining or enhancing generalization accuracy. Specifically, they report significant improvements in Expected Calibration Error (ECE), indicating better probability calibration compared to standard bagging methods. The paper compares SCSB against traditional bagging approaches, showing that it yields linear inference speedups, which is critical for real-time applications. The results are validated across multiple benchmarks, although specific baseline models and datasets are not detailed in the summary.

Limitations
The authors acknowledge that while SCSB significantly improves ensemble performance, it may not be universally applicable to all types of ensemble methods, particularly those that do not rely on bootstrap sampling. Additionally, the computational overhead introduced by the optimization process may be a concern in resource-constrained environments. The paper does not explore the impact of hyperparameter tuning on the performance of SCSB, which could be a potential area for further investigation.

Why it matters
The introduction of SCSB has significant implications for ensemble learning, particularly in applications requiring both high accuracy and efficient inference. By addressing the overconfidence issue inherent in traditional bagging methods, SCSB enhances the reliability of predictions, which is crucial in fields such as healthcare and finance where decision-making is sensitive to probability estimates. This work lays the groundwork for future research into more adaptive ensemble methods that leverage local estimator competence, as published in arXiv.

By Callan Zhang · Jun 11, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.LG