Interquartile Range (IQR)
Steps:
- 1: Put the numbers in order.
1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27.
- 2: Find the median.
1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27. The median is 9, the sixth of the eleven values.
- 3: Place parentheses around the numbers above and below the median.
This is not statistically necessary, but it makes Q1 and Q3 easier to spot.
(1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27).
- 4: Find Q1 and Q3.
Think of Q1 as the median of the lower half of the data and Q3 as the median of the upper half.
(1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27). Q1 = 5 and Q3 = 18.
- 5: Subtract Q1 from Q3 to find the interquartile range.
18 – 5 = 13.
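The hand method above can be sketched in Python as follows. This is a minimal illustration, assuming the "median of each half" convention used in the steps (when the count is odd, the overall median belongs to neither half); `median` and `quartiles_by_halves` are hypothetical helper names, not library functions.

```python
def median(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def quartiles_by_halves(data):
    """Return (Q1, Q3) using the median-of-halves method from the steps above."""
    values = sorted(data)                 # Step 1: put the numbers in order
    half = len(values) // 2
    lower = values[:half]                 # numbers below the median
    # For an odd count, skip the middle value so it is in neither half
    upper = values[half + 1:] if len(values) % 2 == 1 else values[half:]
    return median(lower), median(upper)  # Step 4: Q1 and Q3


data = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
q1, q3 = quartiles_by_halves(data)
print(q1, q3, q3 - q1)  # Step 5: IQR -> 5 18 13
```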
What if I Have an Even Set of Numbers?
Example question: Find the IQR for the following data set: 3, 5, 7, 8, 9, 11, 15, 16, 20, 21.
- Step 1: Put the numbers in order.
3, 5, 7, 8, 9, 11, 15, 16, 20, 21.
- Step 2: Make a mark in the center of the data:
3, 5, 7, 8, 9, | 11, 15, 16, 20, 21.
- Step 3: Place parentheses around the numbers above and below the mark you made in Step 2. This makes Q1 and Q3 easier to spot.
(3, 5, 7, 8, 9), | (11, 15, 16, 20, 21).
- Step 4: Find Q1 and Q3.
Q1 is the median (the middle) of the lower half of the data, and Q3 is the median (the middle) of the upper half of the data.
(3, 5, 7, 8, 9), | (11, 15, 16, 20, 21). Q1 = 7 and Q3 = 16.
- Step 5: Subtract Q1 from Q3.
16 – 7 = 9.
This is your IQR.
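The same median-of-halves idea covers the even case: the halves are simply the first five and last five values. As a hedged aside, software often uses a different quartile convention. Python's standard-library `statistics.quantiles` with `method="inclusive"` interpolates between neighbouring values (the same linear rule NumPy's percentile and pandas' `DataFrame.quantile` use by default), so it reports slightly different quartiles for this data set. The sketch below compares the two; `quartiles_by_halves` is a hypothetical helper.

```python
import statistics


def quartiles_by_halves(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded if n is odd)."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[half + 1:] if len(values) % 2 == 1 else values[half:]
    return statistics.median(lower), statistics.median(upper)


data = [3, 5, 7, 8, 9, 11, 15, 16, 20, 21]

q1, q3 = quartiles_by_halves(data)
print("hand method:", q1, q3, q3 - q1)  # 7 16 9

# Interpolating convention: returns the three cut points [Q1, Q2, Q3]
iq1, _, iq3 = statistics.quantiles(data, n=4, method="inclusive")
print("inclusive method:", iq1, iq3, iq3 - iq1)  # 7.25 15.75 8.5
```

Neither answer is wrong; they are different definitions of a quartile, so it is worth knowing which one a tool uses before comparing results with a hand calculation.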
Steps to perform outlier detection by identifying the lower and upper bounds of the data:
1. Arrange your data in ascending order.
2. Calculate Q1 (the first quartile).
3. Calculate Q3 (the third quartile).
4. Find IQR = Q3 - Q1.
5. Find the lower bound = Q1 - (1.5 * IQR).
6. Find the upper bound = Q3 + (1.5 * IQR).
7. Treat any value below the lower bound or above the upper bound as an outlier.
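The outlier steps can be sketched in plain Python as below. This is an illustrative sketch, assuming Q1 and Q3 come from the median-of-halves method used earlier (a library such as pandas interpolates and may give slightly different bounds); `iqr_bounds` and `find_outliers` are hypothetical names, and the extra value 60 in the usage is invented purely to show a point being flagged.

```python
import statistics


def iqr_bounds(data, k=1.5):
    """Return (lower_bound, upper_bound) = (Q1 - k*IQR, Q3 + k*IQR)."""
    values = sorted(data)                        # 1. ascending order
    half = len(values) // 2
    lower_half = values[:half]
    upper_half = values[half + 1:] if len(values) % 2 == 1 else values[half:]
    q1 = statistics.median(lower_half)           # 2. first quartile
    q3 = statistics.median(upper_half)           # 3. third quartile
    iqr = q3 - q1                                # 4. IQR
    return q1 - k * iqr, q3 + k * iqr            # 5 and 6. bounds


def find_outliers(data, k=1.5):
    """Values strictly below the lower bound or strictly above the upper bound."""
    low, high = iqr_bounds(data, k)
    return [x for x in data if x < low or x > high]


data = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
print(iqr_bounds(data))            # (-14.5, 37.5)
print(find_outliers(data))         # []
print(find_outliers(data + [60]))  # [60]  (60 is a made-up value for illustration)
```

The multiplier `k = 1.5` is the conventional choice from the steps above; exposing it as a parameter makes it easy to try a stricter or looser fence.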
For the body fat dataset, we use the IQR method:
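# Assumes df is a pandas DataFrame already holding the body fat data, numeric columns only
# (df.boxplot also needs matplotlib installed).

# Visual check: one box per column; points beyond the whiskers are candidate outliers
df.boxplot(figsize=(12, 8))

# Remove outliers with the IQR method, column by column
Q1 = df.quantile(0.25)  # first quartile of each column
Q3 = df.quantile(0.75)  # third quartile of each column
IQR = Q3 - Q1
lower_range = Q1 - 1.5 * IQR
upper_range = Q3 + 1.5 * IQR

# Keep only rows where no column falls outside its [lower_range, upper_range] bounds
df_out = df[~((df < lower_range) | (df > upper_range)).any(axis=1)]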
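To see what the pandas filter does without the body fat file, here is a hedged, self-contained run on a tiny made-up DataFrame (the name `toy`, its columns `a` and `b`, and their values are all hypothetical). `DataFrame.quantile` interpolates linearly by default, so its quartiles can differ from the hand method; the row-removal logic is the same as in the code above.

```python
import pandas as pd

# Hypothetical toy data (not the body fat dataset): column "b" has one extreme value
toy = pd.DataFrame({
    "a": [10, 12, 11, 13, 12, 11],
    "b": [1.0, 1.2, 0.9, 1.1, 9.0, 1.0],
})

q1 = toy.quantile(0.25)  # per-column Series, linear interpolation by default
q3 = toy.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Comparing a DataFrame with a Series aligns the Series index to the columns,
# so each column is checked against its own bounds.
is_outlier_row = ((toy < lower) | (toy > upper)).any(axis=1)
toy_out = toy[~is_outlier_row]
print(toy_out)  # row 4 (b = 9.0) is dropped
```

Because `.any(axis=1)` is used, a single extreme column value removes the entire row, which is worth keeping in mind on a dataset with many columns.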