
Bagging and Boosting

 

What is an Ensemble Method?

An ensemble method is a machine learning technique in which multiple models, often called ‘weak learners’, are trained to solve the same problem and then combined to obtain better results. When weak learners are combined correctly, they produce a more accurate and robust model.
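To make this concrete, here is a minimal sketch of an ensemble that combines three simple learners by majority vote. It assumes Python with scikit-learn and a synthetic dataset; none of these specifics come from the article itself.

```python
# Illustrative ensemble: three weak learners combined by majority (hard) vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, chosen only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Combine three different simple learners with hard (majority) voting.
ensemble = VotingClassifier(
    estimators=[
        ("stump", DecisionTreeClassifier(max_depth=1)),
        ("nb", GaussianNB()),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))
```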

Bagging

Bagging, short for ‘Bootstrap Aggregation’, is used to decrease the variance of a prediction model. Bagging is a parallel method: the individual learners are fit independently of one another, which makes it possible to train them simultaneously.

Bagging generates additional training data from the original dataset by random sampling with replacement. Because sampling is done with replacement, some observations may be repeated in each new training set while others are left out. Every observation is equally likely to appear in each new dataset.
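As a rough illustration of this sampling step, the following sketch (assuming Python with NumPy; the array names and sizes are made up for illustration) draws bootstrap samples with replacement from a small dataset:

```python
# Bootstrap sampling: each new training set is drawn with replacement,
# so some rows repeat and others are left out entirely.
import numpy as np

rng = np.random.default_rng(seed=0)
X = np.arange(10).reshape(10, 1)  # 10 original observations
y = np.arange(10)

n_models = 3
for i in range(n_models):
    # Sample indices with replacement; every observation is equally likely.
    idx = rng.integers(0, len(X), size=len(X))
    X_boot, y_boot = X[idx], y[idx]
    print(f"bootstrap sample {i}: indices {sorted(idx.tolist())}")
```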

These bootstrapped datasets are used to train multiple models in parallel. For regression, the average of the individual models’ predictions is taken; for classification, the final class is decided by majority vote. In this way, Bagging decreases variance and tunes the prediction toward the expected outcome.
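A minimal hand-rolled sketch of this procedure might look as follows (assuming Python with NumPy and scikit-learn, and a synthetic dataset chosen purely for illustration):

```python
# Hand-rolled Bagging: one decision tree per bootstrap sample,
# predictions combined by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(seed=0)
n_estimators = 25
predictions = []
for _ in range(n_estimators):
    # Draw a bootstrap sample (with replacement) and fit one tree on it.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
    predictions.append(tree.predict(X_test))

# Majority vote across the ensemble (binary labels 0/1 here).
votes = np.mean(predictions, axis=0)
y_pred = (votes >= 0.5).astype(int)
print("bagged accuracy:", accuracy_score(y_test, y_pred))
```

In practice, scikit-learn’s BaggingClassifier wraps these steps, but spelling them out shows where the bootstrap sampling and the majority vote happen.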

Example of Bagging:

The Random Forest model uses Bagging over decision trees, which are high-variance models. In addition to bootstrapping the data, it selects a random subset of features when growing each tree. Many such randomized trees together make up a Random Forest.
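A short sketch of this, assuming Python with scikit-learn and an illustrative synthetic dataset and hyperparameters:

```python
# Random Forest: Bagging over decision trees plus random feature selection.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of bagged trees
    max_features="sqrt",  # random feature subset considered at each split
    random_state=0,
)
forest.fit(X_train, y_train)
print("random forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))
```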

Boosting

Boosting is a sequential ensemble method that iteratively adjusts the weights of the observations based on the previous classification: if an observation is incorrectly classified, its weight is increased. In layman’s terms, ‘Boosting’ refers to algorithms that convert a weak learner into a stronger one. It decreases the bias error and builds strong predictive models.

In each iteration, the mispredicted data points are identified and their weights are increased. The Boosting algorithm also assigns a weight to each resulting model during training: a learner that predicts the training data well receives a higher weight. When adding a new learner, Boosting keeps track of the errors made so far.
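The following sketch illustrates this weight-update idea in an AdaBoost-style loop (assuming Python with NumPy and scikit-learn; the toy data, the number of rounds, and the small epsilon are illustrative choices, not part of the article):

```python
# AdaBoost-style updates: misclassified points get larger sample weights,
# and each learner receives a weight (alpha) based on its error.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # labels in {-1, +1}

sample_weights = np.full(len(X), 1.0 / len(X))  # start with uniform weights
for t in range(3):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=sample_weights)
    pred = stump.predict(X)

    # Weighted error and the resulting learner weight (alpha).
    err = np.sum(sample_weights[pred != y]) / np.sum(sample_weights)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))

    # Increase the weight of misclassified observations, then renormalize.
    sample_weights *= np.exp(-alpha * y * pred)
    sample_weights /= sample_weights.sum()
    print(f"round {t}: error={err:.3f}, learner weight={alpha:.3f}")
```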

Example of Boosting: 

AdaBoost uses Boosting, where each weak learner must achieve an error rate below 50% (i.e., better than random guessing) to be kept in the model. A learner that does not meet this requirement is discarded, and the iteration is repeated until a better learner is found.
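A minimal sketch using scikit-learn’s AdaBoostClassifier (the dataset and hyperparameters are illustrative assumptions, not values from the article):

```python
# AdaBoost over decision stumps (shallow one-split trees).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(
    # Weak learner; this parameter is named base_estimator in older
    # scikit-learn versions (before 1.2).
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=0.5,
    random_state=0,
)
ada.fit(X_train, y_train)
print("adaboost accuracy:", accuracy_score(y_test, ada.predict(X_test)))
```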

Similarities and Differences between Bagging and Boosting

Bagging and Boosting are both popular methods and share the fundamental property of being ensemble methods. Below we first highlight their similarities, since understanding these makes the differences easier to grasp.

Bagging and Boosting: Similarities

  1. Bagging and Boosting are both ensemble methods that build N learners from a single base learner.
  2. Both generate several training datasets by random sampling.
  3. Both arrive at the final decision by averaging the N learners’ predictions or by taking a majority vote among them.
  4. Both aim to provide higher stability and a lower error than a single learner.

Bagging and Boosting: Differences

As noted above:

Bagging is a method of merging the same type of predictions. Boosting is a method of merging different types of predictions.

Bagging decreases variance, not bias, and solves over-fitting issues in a model. Boosting decreases bias, not variance.

In Bagging, each model receives an equal weight. In Boosting, models are weighed based on their performance.

Models are built independently in Bagging. New models are affected by a previously built model’s performance in Boosting.

In Bagging, training data subsets are drawn randomly with replacement from the full training dataset. In Boosting, every new subset comprises the elements that were misclassified by previous models.

Bagging is usually applied where the classifier is unstable and has a high variance. Boosting is usually applied where the classifier is stable and simple and has high bias.
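To see these differences in code, the following sketch (again assuming Python with scikit-learn and a synthetic dataset) fits both methods on the same data, using deep trees for Bagging and shallow stumps for Boosting:

```python
# Side-by-side comparison: Bagging averages independently trained deep trees,
# while Boosting fits shallow trees sequentially with reweighting.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: deep (unstable, high-variance) trees, combined to reduce variance.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(), n_estimators=50, random_state=0
)
# Boosting: shallow (simple, high-bias) stumps, combined to reduce bias.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0
)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```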
