Box plot: Plot, explained and Robust scaler.

Rajan Lagah
2 min read · Aug 13, 2020


A box plot gives a very good understanding of the data and also reveals outliers. Let's see how.

Plot box plot

import seaborn as sns

sns.set(style="whitegrid")
data = [0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]

# Pass the values as x to draw a single horizontal box plot
ax = sns.boxplot(x=data)

But the plot alone does not explain much. Let me mark the details.

First we sort the list.

The sorted array is

[ 0., 0., 1., 11., 12., 15., 20., 22., 22., 26., 31., 34., 70.]

Q2 => The median of the whole data (which is 20).

Q1 => The median of the data from the smallest value (first in the sorted list) up to and including Q2 (which is 11).

Q3 => The median of the data from Q2 up to and including the largest value (last in the sorted list), which is 26.

IQR => The difference between Q3 and Q1 (Q3 - Q1, i.e. 26 - 11 = 15).
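The quartile values above can be checked with a small sketch. It uses Python's standard `statistics` module, which is an assumption on my side (the article itself only uses seaborn and scikit-learn). The `method="inclusive"` option is chosen because, for this data, it reproduces the values 11, 20 and 26 worked out by hand.

```python
import statistics

data = [0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]

# quantiles() sorts the data itself; n=4 returns the three quartile cut points.
# method="inclusive" assumes the data already contains its own min and max.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 11.0 20.0 26.0 15.0
```

Other quartile conventions (for example the default `method="exclusive"`) can give slightly different numbers on small data sets.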

Then we calculate

Q1 - 1.5*IQR => Everything in the data smaller than this is considered an outlier.

Q3 + 1.5*IQR => Everything in the data larger than this is considered an outlier.

For example, 70 is an outlier because

Q3 + 1.5*IQR => 26 + 1.5*15 => 48.5

70 > 48.5, so it is considered an outlier.

Similarly

Q1 - 1.5*IQR => 11 - 1.5*15 => -11.5

Anything less than -11.5 in our data would be considered an outlier. The smallest value is 0, so there are no outliers on the low side.
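The outlier rule can be sketched in a few lines. This again uses the standard `statistics` module as an assumption, and the names `lower_fence` and `upper_fence` are only illustrative.

```python
import statistics

data = [0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Limits beyond which a point counts as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)  # -11.5 48.5 [70]
```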

If you are still confused, watch this video for a more in-depth explanation.

Robust Scaler

We have a lot of options for scaling our data in scikit-learn. One is MinMaxScaler, which subtracts the minimum value and then divides every value by the range (maximum minus minimum).

But why Robust Scaler?

Another option is Robust Scaler. It differs from the others in that it has 2 steps:

  • Subtract the median from the data
  • Divide each value by the IQR (Q3 - Q1)

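But still, why?

In Robust Scaler, as you can see, we divide each value by the IQR (15), not by the range (70 here, since the minimum is 0) as MinMaxScaler does. With MinMaxScaler, the single value 70 pushes most of the other values toward 0. Because the median and the IQR are barely affected by one extreme value, Robust Scaler reduces the effect of outliers on the scaling.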
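To make the difference concrete, here is a minimal sketch that applies both formulas by hand to the same data. It is plain Python rather than scikit-learn, and the variable names are illustrative, not taken from any library.

```python
import statistics

data = [0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lo, hi = min(data), max(data)

# Robust scaling: center on the median, divide by the IQR
robust = [(x - median) / iqr for x in data]
# Min-max scaling: shift by the minimum, divide by the range
min_max = [(x - lo) / (hi - lo) for x in data]

for x, r, m in zip(data, robust, min_max):
    print(f"{x:5.1f}  robust={r:6.2f}  min_max={m:5.2f}")
```

With min-max scaling, every value except 70 ends up below 0.5. With robust scaling, the non-outlier values spread from about -1.33 to 0.93, and 70 stands apart at about 3.33.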

Implementation

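import numpy as np
from sklearn.preprocessing import RobustScaler

# RobustScaler expects a 2D array of shape (n_samples, n_features)
data = np.array([0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]).reshape(-1, 1)
transformer = RobustScaler().fit(data)
scaledData = transformer.transform(data)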
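As a sanity check, the fitted scaler can be compared with the manual formula. The sketch below assumes the default `RobustScaler` settings (centering and scaling on, 25th to 75th percentile range) and reads its fitted `center_` and `scale_` attributes.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

data = np.array([0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]).reshape(-1, 1)

scaler = RobustScaler().fit(data)

# center_ holds the per-column median, scale_ the per-column IQR
print(scaler.center_, scaler.scale_)

# Manual version of the same two steps
q1, q3 = np.percentile(data, [25, 75])
manual = (data - np.median(data)) / (q3 - q1)

print(np.allclose(scaler.transform(data), manual))
```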

To learn more about scalers, visit this link.

Thank you …

If you liked this, please applaud, and if you have any suggestions for improvement, please mail me at rajanlagah@gmail.com
