
Remove Outliers with IQR

  Interquartile Range (IQR)

Steps:

  • 1: Put the numbers in order.
    1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27.
  • 2: Find the median.
    1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27. The median is 9.
  • 3: Place parentheses around the numbers above and below the median.
    Not necessary statistically, but it makes Q1 and Q3 easier to spot.
    (1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27).
  • 4: Find Q1 and Q3
    Think of Q1 as a median in the lower half of the data and think of Q3 as a median for the upper half of data.
    (1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27). Q1 = 5 and Q3 = 18.
  • 5: Subtract Q1 from Q3 to find the interquartile range.
    18 – 5 = 13.

What if I Have an Even Set of Numbers?

Example question: Find the IQR for the following data set: 3, 5, 7, 8, 9, 11, 15, 16, 20, 21.

  • Step 1: Put the numbers in order.
    3, 5, 7, 8, 9, 11, 15, 16, 20, 21.
  • Step 2: Make a mark in the center of the data:
    3, 5, 7, 8, 9, | 11, 15, 16, 20, 21.
  • Step 3: Place parentheses around the numbers above and below the mark you made in Step 2. This makes Q1 and Q3 easier to spot.
    (3, 5, 7, 8, 9), | (11, 15, 16, 20, 21).
  • Step 4: Find Q1 and Q3
    Q1 is the median (the middle) of the lower half of the data, and Q3 is the median (the middle) of the upper half of the data.
    (3, 5, 7, 8, 9), | (11, 15, 16, 20, 21). Q1 = 7 and Q3 = 16.
  • Step 5: Subtract Q1 from Q3.
    16 – 7 = 9.
    This is your IQR.
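
Both worked examples above follow the same procedure, so they can be sketched as one small function. This is a minimal illustration using only the Python standard library; the function name `iqr_median_of_halves` is made up for this sketch and is not part of any library. It assumes the "median of halves" convention used in the steps above, where the median itself is left out of both halves when the count is odd.

```python
from statistics import median

def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves steps shown above."""
    data = sorted(values)              # Step 1: put the numbers in order
    half = len(data) // 2              # size of each half (median excluded if count is odd)
    lower = data[:half]                # numbers below the median / centre mark
    upper = data[len(data) - half:]    # numbers above the median / centre mark
    q1 = median(lower)                 # Q1 is the median of the lower half
    q3 = median(upper)                 # Q3 is the median of the upper half
    return q1, q3, q3 - q1

print(iqr_median_of_halves([1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]))  # (5, 18, 13)
print(iqr_median_of_halves([3, 5, 7, 8, 9, 11, 15, 16, 20, 21]))     # (7, 16, 9)
```

Other quartile conventions exist (for example, interpolating between neighbouring values), so a library call may return slightly different Q1 and Q3 for the same data.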
Steps to detect outliers by finding the lower and upper bounds of the data:
1. Arrange your data in ascending order.
2. Calculate Q1 (the first quartile).
3. Calculate Q3 (the third quartile).
4. Find IQR = Q3 - Q1.
5. Find the lower bound = Q1 - (1.5 * IQR).
6. Find the upper bound = Q3 + (1.5 * IQR).
Any value below the lower bound or above the upper bound is treated as an outlier.
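
As a quick check of the two bound formulas, here is a small sketch that plugs in Q1 = 5 and Q3 = 18 from the first worked example. The helper names `iqr_fences` and `is_outlier` are hypothetical, and the value 40 is invented purely to show a point that falls outside the bounds.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower bound, upper bound) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, lower, upper):
    """A value is an outlier if it lies strictly outside the bounds."""
    return value < lower or value > upper

# Q1 = 5 and Q3 = 18 come from the first worked example (IQR = 13)
lower, upper = iqr_fences(5, 18)
print(lower, upper)                  # -14.5 37.5
print(is_outlier(27, lower, upper))  # False: the largest value in that data set is inside
print(is_outlier(40, lower, upper))  # True: 40 is a hypothetical value, not from the data
```

With these bounds, none of the eleven numbers in the first example would be removed.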

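For the body fat dataset, we use the IQR method.

# df is assumed to be the body fat DataFrame loaded earlier in the notebook
# Box plot of every column before removing outliers
df.boxplot(figsize=(12,8))

# Remove outliers with the IQR method
Q1 = df.quantile(0.25)            # first quartile of each column
Q3 = df.quantile(0.75)            # third quartile of each column
IQR = Q3 - Q1
lower_range = Q1 - 1.5 * IQR
upper_range = Q3 + 1.5 * IQR
# Keep only the rows in which no column falls outside its bounds
df_out = df[~((df < lower_range) | (df > upper_range)).any(axis=1)]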

[Figure: box plot of the columns before outlier removal]

Check whether the outliers were removed:

[Figure: box plot of the columns after outlier removal]

From the box plot above, we can see that outliers were removed from some columns, but other columns still show outliers.
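
One likely reason is that the box plot recomputes the quartiles on the filtered data. Once the extreme rows are gone, the IQR shrinks, the bounds tighten, and points that were inside the old bounds can fall outside the new ones. The sketch below illustrates this on a made-up list of numbers. It uses `statistics.quantiles` from the standard library as a stand-in for `df.quantile`, and the helper name `drop_iqr_outliers` is hypothetical.

```python
from statistics import quantiles

def drop_iqr_outliers(values, k=1.5):
    """Return the values that lie inside the IQR bounds computed from `values`."""
    # "inclusive" interpolates linearly between neighbouring sorted values
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

sample = [10, 11, 12, 13, 14, 20, 40]        # made-up values for illustration
first_pass = drop_iqr_outliers(sample)       # bounds 3.25 and 25.25: only 40 is dropped
second_pass = drop_iqr_outliers(first_pass)  # bounds 7.5 and 17.5: now 20 is dropped too
print(first_pass)   # [10, 11, 12, 13, 14, 20]
print(second_pass)  # [10, 11, 12, 13, 14]
```

Whether to repeat the filter until nothing changes is a judgment call, since each pass discards more data.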
