
Remove Outliers with IQR

  Interquartile Range (IQR)

Steps:

  • 1: Put the numbers in order.
    1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27.
  • 2: Find the median.
    1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27. The median is 9.
  • 3: Place parentheses around the numbers above and below the median.
    Not necessary statistically, but it makes Q1 and Q3 easier to spot.
    (1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27).
  • 4: Find Q1 and Q3.
    Think of Q1 as the median of the lower half of the data and Q3 as the median of the upper half.
    (1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27). Q1 = 5 and Q3 = 18.
  • 5: Subtract Q1 from Q3 to find the interquartile range.
    18 – 5 = 13.
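
The steps above can be sketched in a few lines of Python. This is only an illustration of the "median of each half" method described here; the function name `quartiles_by_halves` is made up for this example, and it assumes the middle value is left out of both halves when the number of data points is odd.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)          # Step 1: put the numbers in order
    half = len(ordered) // 2
    lower = ordered[:half]            # values below the median
    # With an odd count, skip the median itself; with an even count, split evenly.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)

odd_data = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
q1, q3 = quartiles_by_halves(odd_data)
iqr = q3 - q1
print(q1, q3, iqr)  # 5 18 13
```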

What if I Have an Even Number of Data Points?

Example question: Find the IQR for the following data set: 3, 5, 7, 8, 9, 11, 15, 16, 20, 21.

  • Step 1: Put the numbers in order.
    3, 5, 7, 8, 9, 11, 15, 16, 20, 21.
  • Step 2: Make a mark in the center of the data:
    3, 5, 7, 8, 9, | 11, 15, 16, 20, 21.
  • Step 3: Place parentheses around the numbers above and below the mark you made in Step 2. This makes Q1 and Q3 easier to spot.
    (3, 5, 7, 8, 9), | (11, 15, 16, 20, 21).
  • Step 4: Find Q1 and Q3.
    Q1 is the median (the middle) of the lower half of the data, and Q3 is the median (the middle) of the upper half of the data.
    (3, 5, 7, 8, 9), | (11, 15, 16, 20, 21). Q1 = 7 and Q3 = 16.
  • Step 5: Subtract Q1 from Q3.
    16 – 7 = 9.
    This is your IQR.
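
Software does not always use this hand method. The sketch below compares it with interpolated quartiles from Python's standard `statistics.quantiles` using `method="inclusive"`, which should correspond to the linear interpolation that pandas `quantile` uses by default. The variable names are chosen only for this example.

```python
from statistics import median, quantiles

data = [3, 5, 7, 8, 9, 11, 15, 16, 20, 21]

# Hand method from the steps above: medians of the two halves.
half = len(data) // 2
hand_q1, hand_q3 = median(data[:half]), median(data[half:])

# Interpolated quartiles (the middle value returned is the median, unused here).
interp_q1, _, interp_q3 = quantiles(data, n=4, method="inclusive")

print(hand_q1, hand_q3, hand_q3 - hand_q1)          # 7 16 9
print(interp_q1, interp_q3, interp_q3 - interp_q1)  # 7.25 15.75 8.5
```

Both results are valid quartile definitions, so a small difference between a hand calculation and a library result is not necessarily a mistake.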

Steps to detect outliers by finding the lower and upper bounds of the data:
1. Arrange your data in ascending order.
2. Calculate Q1 (the first quartile).
3. Calculate Q3 (the third quartile).
4. Find IQR = Q3 - Q1.
5. Find the lower bound = Q1 - (1.5 * IQR).
6. Find the upper bound = Q3 + (1.5 * IQR).
Any value below the lower bound or above the upper bound is treated as an outlier.
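
As a rough sketch of steps 4 to 6, the bounds can be computed from Q1 and Q3 and then used to flag values. The helper names `iqr_bounds` and `find_outliers` and the `readings` list are hypothetical, introduced only for illustration; Q1 = 7 and Q3 = 16 come from the even-sized example above.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return (lower, upper) bounds: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, lower, upper):
    """Return the values strictly outside the [lower, upper] range."""
    return [v for v in values if v < lower or v > upper]

lower, upper = iqr_bounds(7, 16)
print(lower, upper)  # -6.5 29.5

readings = [-7, 2, 29.5, 30]  # made-up values to test against the bounds
print(find_outliers(readings, lower, upper))  # [-7, 30]
```

A value exactly on a bound (29.5 here) is kept, which matches the strict `<` and `>` comparisons used in the pandas code in this post.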

For the body fat dataset, we use the IQR method.

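# df is the body fat DataFrame, loaded in an earlier cell
# Box plot of every numeric column before removing outliers
df.boxplot(figsize=(12, 8))

# Remove outliers with the IQR method (quartiles are computed per column)
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
lower_range = Q1 - 1.5 * IQR
upper_range = Q3 + 1.5 * IQR
# Keep only the rows where no column falls outside its bounds
df_out = df[~((df < lower_range) | (df > upper_range)).any(axis=1)]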

[Figure: box plot of the dataset before outlier removal]

Check whether the outliers were removed:

[Figure: box plot of the dataset after outlier removal]

From the box plot above, we can see that the outliers in some columns were removed, but some columns still have outliers.
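
One likely reason outliers still show up is that a box plot recomputes the quartiles from the filtered data, so points that were inside the original bounds can fall outside the new, narrower ones. The sketch below shows this on a single made-up list using only the standard library. The function names and the `sample` values are assumptions for illustration, not part of the body fat analysis.

```python
from statistics import quantiles

def remove_outliers_once(values, k=1.5):
    """Apply the IQR rule a single time and return the values that are kept."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

def remove_outliers_repeatedly(values, k=1.5):
    """Repeat the IQR rule until a pass removes nothing."""
    current = list(values)
    while True:
        filtered = remove_outliers_once(current, k)
        if len(filtered) == len(current):
            return filtered
        current = filtered

sample = [10, 11, 12, 13, 14, 15, 16, 17, 24, 50, 60, 70]  # made-up data
print(remove_outliers_once(sample))        # [10, 11, 12, 13, 14, 15, 16, 17, 24, 50]
print(remove_outliers_repeatedly(sample))  # [10, 11, 12, 13, 14, 15, 16, 17]
```

After one pass, 24 and 50 are still present and would appear as outliers in a new box plot. Repeating the filter removes them, but repeated trimming can also discard legitimate data, so whether to do it is a judgment call.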
