Box Plot Explained, and the Robust Scaler
A box plot gives a good overview of how data is spread out, and it also shows outliers. Let's see how.
Plotting a box plot
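import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
data = [0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]  # no trailing comma, otherwise data becomes a tuple
ax = sns.boxplot(data=data)
plt.show()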
The plot by itself does not explain much, so let's work out each part of it by hand.
First, sort the list. The sorted array is:
[0., 0., 1., 11., 12., 15., 20., 22., 22., 26., 31., 34., 70.]
Q2 => the median of the whole data set (20).
Q1 => the median of the lower half, from the smallest value up to Q2 (11).
Q3 => the median of the upper half, from Q2 up to the largest value (26).
IQR => the interquartile range, Q3 - Q1 (26 - 11 = 15).
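The quartile steps above can be sketched in plain Python. This is a minimal sketch that follows the convention used here, where each half includes the median when the list has an odd length; the helper name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3, IQR); hypothetical helper using 'halves include the median'."""
    s = sorted(float(v) for v in values)
    mid = len(s) // 2
    q2 = median(s)
    if len(s) % 2 == 1:
        # Odd length: both halves share the middle value, as in the walkthrough above
        lower, upper = s[: mid + 1], s[mid:]
    else:
        # Even length: split cleanly into two halves
        lower, upper = s[:mid], s[mid:]
    q1, q3 = median(lower), median(upper)
    return q1, q2, q3, q3 - q1

data = [0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]
q1, q2, q3, iqr = quartiles(data)
print(q1, q2, q3, iqr)  # 11.0 20.0 26.0 15.0
```

Quartile conventions differ between textbooks and libraries. For this data, `numpy.percentile` with its default linear interpolation gives the same Q1 = 11 and Q3 = 26, but a convention that excludes the median from each half would give Q1 = 6 and Q3 = 28.5.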
Then we calculate two fences:
Q1 - 1.5*IQR => any value smaller than this is considered an outlier.
Q3 + 1.5*IQR => any value larger than this is considered an outlier.
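The two fence rules translate directly into a small function. A minimal sketch: `outlier_fences` is a hypothetical helper, and it takes the quartiles from `numpy.percentile` with its default interpolation, which matches the hand-computed Q1 = 11 and Q3 = 26 for this data.

```python
import numpy as np

def outlier_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR (hypothetical helper)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]
lower, upper = outlier_fences(data)
print(lower, upper)  # -11.5 48.5
```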
For example, 70 is an outlier because
Q3 + 1.5*IQR => 26 + 1.5*15 => 48.5
and 70 > 48.5, so it is considered an outlier.
Similarly,
Q1 - 1.5*IQR => 11 - 1.5*15 => -11.5
Anything less than -11.5 in our data would be considered an outlier. No value is that small, so there are no outliers on the low end.
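To check these numbers against what the plot actually draws, you can ask matplotlib for its box-plot statistics. Seaborn draws its box plots with matplotlib, and `matplotlib.cbook.boxplot_stats` returns the quartiles, whisker ends and outliers ("fliers") it computes. A minimal sketch, assuming a reasonably recent matplotlib. Note that the whiskers stop at the most extreme data points inside the fences (0 and 34 here), not at -11.5 and 48.5 themselves.

```python
from matplotlib.cbook import boxplot_stats

data = [0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]

# whis=1.5 is the 1.5 * IQR rule described above (it is also the default)
stats = boxplot_stats(data, whis=1.5)[0]

print("Q1:", stats["q1"], "median:", stats["med"], "Q3:", stats["q3"])
print("whiskers:", stats["whislo"], "to", stats["whishi"])
print("outliers:", list(stats["fliers"]))
```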
If this is still confusing, watch this video for a deeper explanation.
Robust Scaler
scikit-learn offers several ways to scale data. For example, MinMaxScaler subtracts the minimum value and then divides every value by the range (maximum - minimum), so the data ends up between 0 and 1.
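As a reference point, min-max scaling maps each value x to (x - min) / (max - min). A minimal sketch comparing the formula by hand with scikit-learn's `MinMaxScaler`, which expects a 2-D array (hence the `reshape`):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.])

# By hand: subtract the minimum, divide by the range (max - min)
manual = (data - data.min()) / (data.max() - data.min())

# scikit-learn: a single feature, so reshape to (n_samples, 1) and flatten the result
sk = MinMaxScaler().fit_transform(data.reshape(-1, 1)).ravel()

print(np.allclose(manual, sk))  # True
print(manual.round(3))
```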
But why use the Robust Scaler?
The Robust Scaler is one of these options, and it differs from the others in that it has two steps:
- Subtract the median from each value
- Divide each value by the IQR (Q3 - Q1)
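The two steps amount to (x - median) / IQR. A minimal sketch with NumPy, using the median (20) and IQR (15) from the box plot section; the quartiles come from `numpy.percentile`, which matches the hand calculation for this data.

```python
import numpy as np

data = np.array([0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.])

median = np.median(data)                 # step 1: the centre (20.0)
q1, q3 = np.percentile(data, [25, 75])   # 11.0 and 26.0
iqr = q3 - q1                            # step 2: the scale (15.0)

robust = (data - median) / iqr
print(robust.round(3))
```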
But why does that help?
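Notice that the Robust Scaler divides each value by the IQR (15), which is computed from the middle half of the data, instead of by the range (70 - 0 = 70 here) as MinMaxScaler does. With min-max scaling, the single outlier 70 stretches the range, so all the other values get squeezed into the lower half of [0, 1] (0 to about 0.49). The median and IQR do not depend on how extreme the outlier is, so the Robust Scaler removes most of the outlier's effect on the scaling.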
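One way to see the difference is to make the outlier even more extreme and watch what happens to the other twelve values. A minimal sketch; the value 700 is made up purely to exaggerate the outlier.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

original = np.array([0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]).reshape(-1, 1)
extreme = original.copy()
extreme[10] = 700.0  # hypothetical: push the outlier from 70 to 700

mask = np.arange(len(original)) != 10  # every index except the outlier
results = {}
for name, scaler in [("MinMax", MinMaxScaler()), ("Robust", RobustScaler())]:
    a = scaler.fit_transform(original).ravel()[mask]  # inliers scaled with 70 present
    b = scaler.fit_transform(extreme).ravel()[mask]   # inliers scaled with 700 present
    results[name] = (a, b)
    print(name, "| inliers changed:", not np.allclose(a, b),
          "| inlier range with 700:", b.min().round(3), "to", b.max().round(3))
```

With min-max scaling the twelve ordinary values get squeezed into roughly 0 to 0.05, while the robust-scaled values do not change at all: the median and quartiles depend only on the order of the values, and 700 is still simply the largest one.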
Implementation
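import numpy as np
from sklearn.preprocessing import RobustScaler
data = np.array([0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]).reshape(-1, 1)  # scalers expect 2-D input: (n_samples, n_features)
transformer = RobustScaler().fit(data)
scaledData = transformer.transform(data)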
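After fitting, the scaler keeps what it learned: `center_` holds the median and `scale_` holds the IQR of each column, and `inverse_transform` undoes the scaling. A minimal sketch, repeating the setup so it runs on its own:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

data = np.array([0., 1, 12, 15, 11, 0., 20., 31., 34, 22, 70, 22, 26.]).reshape(-1, 1)
transformer = RobustScaler().fit(data)  # defaults: centre on the median, scale by the 25th-75th percentile range
scaledData = transformer.transform(data)

print(transformer.center_, transformer.scale_)  # median and IQR per column
restored = transformer.inverse_transform(scaledData)
print(np.allclose(restored, data))  # True
```

The `quantile_range` parameter (default `(25.0, 75.0)`) controls which percentiles define the scale, if you want something wider or narrower than the IQR.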
To learn more about scalers, visit this link.
Thank you …
If you liked this, please applaud, and if you have any suggestions for improvement, please email me at rajanlagah@gmail.com