The theory of boosting provides a computational framework for aggregating approximate weak learning algorithms, which perform marginally better than a random predictor, into an accurate strong learner. In the realizable case, the success of the boosting approach is underscored by the remarkable fact that the resulting sample complexity matches that of a computationally demanding alternative, namely Empirical Risk Minimization (ERM). This in particular implies that the realizable boosting methodology has the potential to offer computational relief without compromising on sample efficiency. Despite recent progress, in agnostic boosting, where assumptions on the conditional distribution of labels given feature descriptions are absent, ERM remains quadratically more sample efficient than all known agnostic boosting algorithms. In this paper, we make progress on closing this gap, and give a substantially more sample-efficient agnostic boosting algorithm than those known, without compromising on the computational (or oracle) complexity. A key feature of our algorithm is that it leverages the ability to reuse samples across multiple rounds of boosting.
Adaptive Boosting with Dynamic Weight Adjustment is an enhancement of traditional Adaptive Boosting, commonly known as AdaBoost, a powerful ensemble learning technique. The technique improves efficiency and accuracy by updating instance weights in proportion to each instance's prediction error, rather than uniformly as in traditional AdaBoost. Adaptive Boosting with Dynamic Weight Adjustment outperforms AdaBoost because it can capture more complex data relations and cope better with class imbalance and noise, leading to more accurate and balanced predictions. The proposed model provides a more flexible and effective approach to boosting, particularly in challenging classification tasks.
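As a rough illustration of the weight-update idea (a minimal sketch only; the exact update rule, base learner, and normalization used by the proposed method are assumptions here), one round of an error-proportional re-weighting scheme could look like:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def dynamic_boost_round(X, y, w, max_depth=1):
    # One boosting round in which each instance's weight grows in proportion
    # to its own prediction error, instead of the uniform "misclassified vs. not"
    # update of classical AdaBoost. Sketch only: the exact rule is an assumption.
    stump = DecisionTreeClassifier(max_depth=max_depth).fit(X, y, sample_weight=w)
    proba = stump.predict_proba(X)
    true_col = np.searchsorted(stump.classes_, y)
    err_i = 1.0 - proba[np.arange(len(y)), true_col]    # per-instance error in [0, 1]
    eps = np.clip(np.sum(w * err_i), 1e-10, 1 - 1e-10)  # weighted error of this round
    alpha = 0.5 * np.log((1 - eps) / eps)               # learner's vote, as in AdaBoost
    w = w * np.exp(alpha * err_i)                       # error-proportional re-weighting
    w = w / w.sum()
    return stump, alpha, w

Here each instance's weight grows with its own prediction error, whereas classical AdaBoost multiplies every misclassified instance by the same factor.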
Single LDR to HDR reconstruction remains challenging for over-exposed regions where traditional methods often fail due to complete information loss. We present a training-free approach that enhances existing indirect and direct HDR reconstruction methods through diffusion-based inpainting. Our method combines text-guided diffusion models with SDEdit refinement to generate plausible content in over-exposed areas while maintaining consistency across multi-exposure LDR images. Unlike previous approaches requiring extensive training, our method seamlessly integrates with existing HDR reconstruction techniques through an iterative compensation mechanism that ensures luminance coherence across multiple exposures. We demonstrate significant improvements in both perceptual quality and quantitative metrics on standard HDR datasets and in-the-wild captures. Results show that our method effectively recovers natural details in challenging scenarios while preserving the advantages of existing HDR reconstruction pipelines. Project page: https://github.com/EusdenLin/HDR-Reconstruction-Boosting
We study the cost of parallelizing weak-to-strong boosting algorithms for learning, following the recent work of Karbasi and Larsen. Our main results are two-fold: - First, we prove a tight lower bound, showing that even "slight" parallelization of boosting requires an exponential blow-up in the complexity of training. Specifically, let $γ$ be the weak learner's advantage over random guessing. The famous \textsc{AdaBoost} algorithm produces an accurate hypothesis by interacting with the weak learner for $\tilde{O}(1 / γ^2)$ rounds, where each round runs in polynomial time. Karbasi and Larsen showed that "significant" parallelization must incur exponential blow-up: any boosting algorithm either interacts with the weak learner for $Ω(1 / γ)$ rounds or incurs an $\exp(d / γ)$ blow-up in the complexity of training, where $d$ is the VC dimension of the hypothesis class. We close the gap by showing that any boosting algorithm either has $Ω(1 / γ^2)$ rounds of interaction or incurs a smaller exponential blow-up of $\exp(d)$. - Complementing our lower bound, we show that there exists a boosting algorithm using $\tilde{O}(1/(t γ^2))$ rounds that suffers a blow-up of only $\exp(d \cdot t^2)$. P
Boosting is a learning scheme that combines weak prediction rules to produce a strong composite estimator, with the underlying intuition that one can obtain accurate prediction rules by combining "rough" ones. Although boosting has been proved to be consistent and resistant to overfitting, its numerical convergence rate is relatively slow. The aim of this paper is to develop a new boosting strategy, called re-scale boosting (RBoosting), to accelerate the numerical convergence rate and, consequently, improve the learning performance of boosting. Our studies show that RBoosting possesses an almost optimal numerical convergence rate in the sense that, up to a logarithmic factor, it can reach the minimax nonlinear approximation rate. We then use RBoosting to tackle both classification and regression problems, and deduce a tight generalization error estimate. The theoretical and experimental results show that RBoosting outperforms boosting in terms of generalization.
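For intuition, a minimal least-squares sketch of the re-scaling idea is given below; the Frank-Wolfe style schedule $β_k = 2/(k+2)$, the line search, and the tree base learner are illustrative assumptions, not the exact scheme analyzed in the paper.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def rescale_boost(X, y, n_rounds=200, max_depth=2):
    # Re-scaled boosting for least squares (sketch): before each new weak
    # learner is added, the current composite estimator is shrunk by (1 - beta_k).
    F = np.zeros(len(y))
    learners = []
    for k in range(1, n_rounds + 1):
        beta = 2.0 / (k + 2.0)              # illustrative Frank-Wolfe style schedule
        resid = y - (1 - beta) * F          # residuals of the re-scaled ensemble
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, resid)
        h = tree.predict(X)
        alpha = float(h @ resid) / max(float(h @ h), 1e-12)  # least-squares line search
        F = (1 - beta) * F + alpha * h
        learners.append((1 - beta, alpha, tree))
    return F, learners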
Boosting is a commonly used technique to enhance the performance of a set of base models by combining them into a strong ensemble model. Though widely adopted, boosting is typically used in supervised learning, where the data is labeled accurately. However, in weakly supervised learning, where most of the data is labeled through weak and noisy sources, it remains nontrivial to design effective boosting approaches. In this work, we show that the standard implementation of the convex combination of base learners performs poorly in the presence of noisy labels. Instead, we propose $\textit{LocalBoost}$, a novel framework for weakly-supervised boosting. LocalBoost iteratively boosts the ensemble model along two dimensions, i.e., intra-source and inter-source. The intra-source boosting introduces locality to the base learners and enables each base learner to focus on a particular feature regime by training new base learners on granularity-varying error regions. For the inter-source boosting, we leverage a conditional function to indicate the weak source where a sample is more likely to appear. To account for the weak labels, we further design an estimate-then-modify approach to comp
Gradient Boosting Decision Tree (GBDT) is one of the most popular machine learning models in various applications. However, in the traditional setting, all data must be accessed simultaneously during training: it is not possible to add or delete data instances after training. In this paper, we propose an efficient online learning framework for GBDT supporting both incremental and decremental learning. To the best of our knowledge, this is the first work that considers in-place, unified incremental and decremental learning on GBDT. To reduce the learning cost, we present a collection of optimizations for our framework, so that it can add or delete a small fraction of data on the fly. We theoretically characterize the relationship between the hyper-parameters of the proposed optimizations, which enables trading off accuracy and cost in incremental and decremental learning. Results on backdoor attacks show that our framework can successfully inject and remove a backdoor in a well-trained model using incremental and decremental learning, and empirical results on public datasets confirm the effectiveness and efficiency of our proposed online learning framework and optimizations.
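A minimal sketch of the in-place idea for a single leaf under squared loss is given below; real GBDT frameworks must also decide when a split itself has to be re-learned, and the names and interface here are illustrative only.

class IncrementalLeaf:
    # Sketch of in-place incremental/decremental updates for one regression-tree
    # leaf: leaf statistics (gradient/hessian sums) absorb added or removed
    # instances without retraining the tree. Names and defaults are illustrative.
    def __init__(self, reg_lambda=1.0):
        self.grad_sum = 0.0
        self.hess_sum = 0.0
        self.reg_lambda = reg_lambda

    def value(self):
        # Standard regularized leaf weight: -G / (H + lambda)
        return -self.grad_sum / (self.hess_sum + self.reg_lambda)

    def add(self, grad, hess):       # incremental learning: a new instance arrives
        self.grad_sum += grad
        self.hess_sum += hess

    def delete(self, grad, hess):    # decremental learning: an instance is removed
        self.grad_sum -= grad
        self.hess_sum -= hess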
This article is a comment on N. Chapuis and L. Bocquet, Sustainable Energy Fuels, 2025, DOI: 10.1039/D4SE01366B, entitled "Boosting large scale capacitive harvesting of osmotic power by dynamical matching of ion exchange kinetics". In that work, the authors present an experimental process that shows how it is possible to set up a reverse electrodialysis cell capable of achieving power densities of 5 W/m2. This value is the profitability threshold. Our work challenges this claim and questions whether the proposed technique can be scaled up, as well as the modeling of the process.
Boosting algorithms are frequently used in applied data science and in research. To date, the distinction between boosting with either gradient descent or second-order Newton updates is often not made, in either applied or methodological research, and it is thus implicitly assumed that the difference is irrelevant. The goal of this article is to clarify this situation. In particular, we present gradient and Newton boosting, as well as a hybrid variant of the two, in a unified framework. We compare these boosting algorithms with trees as base learners using various datasets and loss functions. Our experiments show that Newton boosting outperforms gradient and hybrid gradient-Newton boosting in terms of predictive accuracy on the majority of datasets. We also present evidence that the reason for this is not faster convergence of Newton boosting. In addition, we introduce a novel tuning parameter for tree-based Newton boosting which is interpretable and important for predictive accuracy.
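For concreteness, under the logistic loss the two variants differ only in the working response and the sample weights handed to the tree learner; the following is a minimal sketch of one iteration (step size and tree settings are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_step(X, y, F, variant="newton", lr=0.1, max_depth=3):
    # One functional-descent step for the logistic loss, y in {0, 1}.
    # Gradient boosting fits the negative gradient with an unweighted tree;
    # Newton boosting fits -gradient/hessian with hessian sample weights.
    p = 1.0 / (1.0 + np.exp(-F))            # current probability estimates
    grad = p - y                            # dL/dF
    hess = np.maximum(p * (1.0 - p), 1e-12) # d2L/dF2
    if variant == "gradient":
        target, weight = -grad, None
    else:                                   # "newton"
        target, weight = -grad / hess, hess
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, target, sample_weight=weight)
    return F + lr * tree.predict(X), tree

One common hybrid choice grows the tree on the negative gradient (as in the "gradient" branch) while computing the leaf values with the Newton formula.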
Boosting is a fundamental approach in machine learning that enjoys both strong theoretical and practical guarantees. At a high level, boosting algorithms cleverly aggregate weak learners to generate predictions with arbitrarily high accuracy. In this way, boosting algorithms convert weak learners into strong ones. Recently, Brukhim et al. extended boosting to the online agnostic binary classification setting. A key ingredient in their approach is a clean and simple reduction to online convex optimization, one that efficiently converts an arbitrary online convex optimizer to an agnostic online booster. In this work, we extend this reduction to multiclass problems and give the first boosting algorithm for online agnostic multiclass classification. Our reduction also enables the construction of algorithms for statistical agnostic, online realizable, and statistical realizable multiclass boosting.
Excellent ranking power along with well-calibrated probability estimates are needed in many classification tasks. In this paper, we introduce a technique, Calibrated Boosting-Forest, that captures both. This novel technique is an ensemble of gradient boosting machines that can support both continuous and binary labels. While offering superior ranking power over any individual regression or classification model, Calibrated Boosting-Forest is able to preserve well-calibrated posterior probabilities. Along with these benefits, we provide an alternative to the tedious step of tuning gradient boosting machines. We demonstrate that tuning Calibrated Boosting-Forest can be reduced to a simple hyper-parameter selection. We further establish that increasing this hyper-parameter improves the ranking performance with diminishing returns. We examine the effectiveness of Calibrated Boosting-Forest on ligand-based virtual screening, where both continuous and binary labels are available, and compare the performance of Calibrated Boosting-Forest with logistic regression, gradient boosting machine and deep learning. Calibrated Boosting-Forest achieved an approximately 48% improvement compared to a s
Projected Gradient Ascent (PGA) is one of the most commonly used optimization schemes in machine learning and operations research. Nevertheless, numerous studies and examples have shown that PGA methods may fail to achieve a tight approximation ratio for continuous DR-submodular maximization problems. To address this challenge, we present a boosting technique in this paper, which can efficiently improve the approximation guarantee of standard PGA to \emph{optimal} with only small modifications to the objective function. The fundamental idea of our boosting technique is to exploit non-oblivious search to derive a novel auxiliary function $F$, whose stationary points are excellent approximations to the global maximum of the original DR-submodular objective $f$. Specifically, when $f$ is monotone and $γ$-weakly DR-submodular, we propose an auxiliary function $F$ whose stationary points can provide a better $(1-e^{-γ})$-approximation than the $(γ^2/(1+γ^2))$-approximation guaranteed by the stationary points of $f$ itself. Similarly, for the non-monotone case, we devise another auxiliary function $F$ whose stationary points can achieve an optimal $\frac{1-\min_{\boldsymbol{x}\in
Ensemble methods, particularly boosting, have established themselves as highly effective and widely embraced machine learning techniques for tabular data. In this paper, we aim to leverage the robust predictive power of traditional boosting methods while enhancing fairness and interpretability. To achieve this, we develop Fair MP-Boost, a stochastic boosting scheme that balances fairness and accuracy by adaptively learning features and observations during training. Specifically, Fair MP-Boost sequentially samples small subsets of observations and features, termed minipatches (MP), according to adaptively learned feature and observation sampling probabilities. We devise these probabilities by combining loss functions, or by combining feature importance scores, to address accuracy and fairness simultaneously. Hence, Fair MP-Boost prioritizes important and fair features along with challenging instances, to select the most relevant minipatches for learning. The learned probability distributions also yield intrinsic interpretations of feature importance and important observations in Fair MP-Boost. Through empirical evaluation of simulated and benchmark datasets, we showcase the interpret
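A minimal sketch of the minipatch sampling loop is shown below; how Fair MP-Boost actually combines the loss and fairness/feature-importance scores into the two probability vectors is simplified here (the fairness term is omitted), and all sizes and rates are illustrative.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def minipatch_boost(X, y, n_rounds=100, n_obs=64, n_feat=8, lr=0.1):
    # Minipatch boosting sketch: each round samples a small block of rows and
    # columns according to adaptively updated probabilities and fits a shallow
    # tree on it. The fairness component of Fair MP-Boost is omitted.
    n, d = X.shape
    p_obs = np.full(n, 1.0 / n)              # observation sampling probabilities
    p_feat = np.full(d, 1.0 / d)             # feature sampling probabilities
    F = np.zeros(n)
    trees = []
    for _ in range(n_rounds):
        rows = np.random.choice(n, size=min(n_obs, n), replace=False, p=p_obs)
        cols = np.random.choice(d, size=min(n_feat, d), replace=False, p=p_feat)
        resid = y - F
        tree = DecisionTreeRegressor(max_depth=2).fit(X[np.ix_(rows, cols)], resid[rows])
        F += lr * tree.predict(X[:, cols])
        trees.append((cols, tree))
        loss = (y - F) ** 2 + 1e-8           # put more mass on hard observations
        p_obs = loss / loss.sum()
        imp = np.zeros(d)
        imp[cols] += tree.feature_importances_
        p_feat = 0.9 * p_feat + 0.1 * imp / max(imp.sum(), 1e-12)
        p_feat /= p_feat.sum()
    return F, trees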
Boosting combines weak (biased) learners to obtain effective learning algorithms for classification and prediction. In this paper, we show a connection between boosting and kernel-based methods, highlighting both theoretical and practical applications. In the context of $\ell_2$ boosting, we start with a weak linear learner defined by a kernel $K$. We show that boosting with this learner is equivalent to estimation with a special {\it boosting kernel} that depends on $K$, as well as on the regression matrix, noise variance, and hyperparameters. The number of boosting iterations is modeled as a continuous hyperparameter, and fit along with other parameters using standard techniques. We then generalize the boosting kernel to a broad new class of boosting approaches for more general weak learners, including those based on the $\ell_1$, hinge and Vapnik losses. The approach allows fast hyperparameter tuning for this general class, and has a wide range of applications, including robust regression and classification. We illustrate some of these applications with numerical examples on synthetic and real data.
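For intuition, the equivalence is easiest to see in the simplest setting of a fixed linear weak learner and no intercept: if the weak learner returns fitted values $S_K y$ for a smoother matrix $S_K$ built from $K$, then $\ell_2$ boosting with step size $\nu$, started from zero, satisfies $\hat{y}_m = \big(I - (I - \nu S_K)^m\big)\, y$ after $m$ iterations. The boosted estimator is thus itself a linear smoother whose equivalent kernel depends on $K$, $\nu$ and $m$; the boosting kernel above refines this picture by additionally folding in the regression matrix, noise variance and hyperparameters, and by treating $m$ as a continuous quantity.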
We study the connection between multicalibration and boosting for squared error regression. First we prove a useful characterization of multicalibration in terms of a ``swap regret'' like condition on squared error. Using this characterization, we give an exceedingly simple algorithm that can be analyzed both as a boosting algorithm for regression and as a multicalibration algorithm for a class H that makes use only of a standard squared error regression oracle for H. We give a weak learning assumption on H that ensures convergence to Bayes optimality without the need to make any realizability assumptions -- giving us an agnostic boosting algorithm for regression. We then show that our weak learning assumption on H is both necessary and sufficient for multicalibration with respect to H to imply Bayes optimality. We also show that if H satisfies our weak learning condition relative to another class C then multicalibration with respect to H implies multicalibration with respect to C. Finally we investigate the empirical performance of our algorithm experimentally using an open source implementation that we make available. Our code repository can be found at https://github.com/Declanc
Modern gradient boosting software frameworks, such as XGBoost and LightGBM, implement Newton descent in a functional space. At each boosting iteration, their goal is to find the base hypothesis, selected from some base hypothesis class, that is closest to the Newton descent direction in a Euclidean sense. Typically, the base hypothesis class is fixed to be all binary decision trees up to a given depth. In this work, we study a Heterogeneous Newton Boosting Machine (HNBM) in which the base hypothesis class may vary across boosting iterations. Specifically, at each boosting iteration, the base hypothesis class is chosen, from a fixed set of subclasses, by sampling from a probability distribution. We derive a global linear convergence rate for the HNBM under certain assumptions, and show that it agrees with existing rates for Newton's method when the Newton direction can be perfectly fitted by the base hypothesis at each boosting iteration. We then describe a particular realization of a HNBM, SnapBoost, that, at each boosting iteration, randomly selects between either a decision tree of variable depth or a linear regressor with random Fourier features. We describe how SnapBoost is imp
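A minimal sketch of the heterogeneous sampling idea (shown for squared loss, where the gradient and Newton directions coincide) is given below; the mixing probability, depth range and random-feature count are illustrative, not SnapBoost's actual settings.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

def heterogeneous_boost(X, y, n_rounds=100, lr=0.1, p_tree=0.8):
    # At each iteration the base hypothesis class is drawn at random: here
    # either a decision tree of random depth or a ridge regressor on random
    # Fourier features. All constants are illustrative assumptions.
    F, learners = np.zeros(len(y)), []
    for _ in range(n_rounds):
        resid = y - F                        # squared-loss pseudo-residuals
        if np.random.rand() < p_tree:
            base = DecisionTreeRegressor(max_depth=np.random.randint(1, 6))
        else:
            base = make_pipeline(RBFSampler(n_components=100), Ridge(alpha=1.0))
        base.fit(X, resid)
        F += lr * base.predict(X)
        learners.append(base)
    return F, learners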
The aim of boosting is to convert a sequence of weak learners into a strong learner. At their heart, these methods are fully sequential. In this paper, we investigate the possibility of parallelizing boosting. Our main contribution is a strong negative result, implying that significant parallelization of boosting requires an exponential blow-up in the total computing resources needed for training.
As an adaptive, interpretable, robust, and accurate meta-algorithm for arbitrary differentiable loss functions, gradient tree boosting is one of the most popular machine learning techniques, though its computational expense severely limits its usage. Stochastic gradient boosting can be adopted to accelerate gradient boosting by uniformly sampling training instances, but its estimator can have high variance. This motivates us to optimize gradient tree boosting. We combine gradient tree boosting with importance sampling, which achieves better performance by reducing the stochastic variance. Furthermore, we use a regularizer to improve the diagonal approximation in the Newton step of gradient boosting. The theoretical analysis shows that our strategies achieve a linear convergence rate on the logistic loss. Empirical results show that our algorithm achieves a 2.5x--18x acceleration on two different gradient boosting algorithms (LogitBoost and LambdaMART) without appreciable performance loss.
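The sampling idea can be sketched as follows for the logistic loss: instances are drawn with probability proportional to the magnitude of their gradient and re-weighted by the inverse probability so that the fitted tree stays (nearly) unbiased; the exact sampling distribution and the Newton-step regularizer of the paper are not reproduced here.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def importance_sampled_step(X, y, F, sample_frac=0.2, lr=0.1, max_depth=3):
    # One gradient-boosting step for the logistic loss, y in {0, 1}, with
    # importance sampling of training instances. Sketch under stated assumptions.
    p = 1.0 / (1.0 + np.exp(-F))
    grad = p - y
    probs = np.abs(grad) + 1e-12              # sample hard instances more often
    probs = probs / probs.sum()
    m = max(1, int(sample_frac * len(y)))
    idx = np.random.choice(len(y), size=m, replace=False, p=probs)
    iw = 1.0 / (len(y) * probs[idx])          # inverse-probability importance weights
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X[idx], -grad[idx], sample_weight=iw)
    return F + lr * tree.predict(X), tree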
Distributionally Robust Optimization (DRO) has been shown to provide a flexible framework for decision making under uncertainty and statistical estimation. For example, recent works in DRO have shown that popular statistical estimators can be interpreted as the solutions of suitably formulated data-driven DRO problems. In turn, this connection is used to select tuning parameters optimally, following a principled approach informed by robustness considerations. This paper contributes to this growing literature connecting DRO and statistics by showing how boosting algorithms can be studied via DRO. We propose a boosting-type algorithm, named DRO-Boosting, as a procedure to solve our DRO formulation. Our DRO-Boosting algorithm recovers Adaptive Boosting (AdaBoost) in particular, thus showing that AdaBoost is effectively solving a DRO problem. We apply our algorithm to a financial dataset on credit card default payment prediction. We find that our approach compares favorably to alternative boosting methods which are widely used in practice.
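For reference, the classical AdaBoost scheme that DRO-Boosting is shown to recover maintains a distribution $D_t$ over training examples and, given a weak classifier $h_t$ with weighted error $\epsilon_t$, sets $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ and $D_{t+1}(i) \propto D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}$, predicting with $\mathrm{sign}\big(\sum_t \alpha_t h_t(x)\big)$. Read through the DRO lens, the exponential re-weighting plays the role of an adversarially chosen worst-case distribution, which is the spirit of the connection drawn above.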
Boosting is a highly successful ML-born optimization setting in which one is required to computationally efficiently learn arbitrarily good models given access to a weak learner oracle, which provides classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first-order information about a loss, yet the decades-long history of boosting has quickly evolved it into a first-order optimization setting -- sometimes even wrongfully defining it as such. Owing to recent progress extending gradient-based optimization to use only a loss' zeroth ($0^{th}$) order information to learn, this begs the question: what loss functions can be efficiently optimized with boosting, and what information is really needed for boosting to meet the original boosting blueprint's requirements? We provide a constructive formal answer essentially showing that any loss function can be optimized with boosting, and thus boosting can achieve a feat not yet known to be possible in the classical $0^{th}$ order setting, since loss functions are not required to be convex, nor differentiable.