
Posts

Showing posts from August, 2021

Remove Outliers with IQR

Interquartile Range (IQR)

Steps:
1: Put the numbers in order. 1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27.
2: Find the median. 1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27. The median is 9.
3: Place parentheses around the numbers above and below the median. This is not necessary statistically, but it makes Q1 and Q3 easier to spot. (1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27).
4: Find Q1 and Q3. Think of Q1 as the median of the lower half of the data and Q3 as the median of the upper half. (1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27). Q1 = 5 and Q3 = 18.
5: Subtract Q1 from Q3 to find the interquartile range. 18 - 5 = 13.

What if I Have an Even Set of Numbers?

Example question: Find the IQR for the following data set: 3, 5, 7, 8, 9, 11, 15, 16, 20, 21.
Step 1: Put the numbers in order. 3, 5, 7, 8, 9, 11, 15, 16, 20, 21.
Step 2: Make a mark in the center of the data: 3, 5, 7, 8, 9, | 11, 15, 16, 20, 21.
Step 3: Place...
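The median-of-halves steps above can be sketched in Python. This is a minimal sketch, not the post's own code: the function names are hypothetical, and the outlier filter assumes the common 1.5 × IQR fence rule, which the excerpt does not spell out. Note that library routines such as `numpy.percentile` use interpolation by default and may give slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    For an odd number of values the overall median is left out of both
    halves, matching the parentheses step in the text.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(upper)


def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1


def remove_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is an assumed default (the usual fence rule), not taken from the post.
    """
    q1, q3 = quartiles(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


odd_set = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
print(quartiles(odd_set))  # (5, 18)
print(iqr(odd_set))        # 13
```

The same `iqr` call works for the even-sized example set, since the split mark in the center simply divides the data into two equal halves.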

Kaggle dataset download in Colab

import os
os.environ['KAGGLE_CONFIG_DIR'] = '/content/drive/MyDrive/Kaggle_Data'

%cd /content/drive/MyDrive/Kaggle_Data
/content/drive/MyDrive/Kaggle_Data

!ls
kaggle.json

!kaggle competitions download -c 30-days-of-ml  # API command copied from Kaggle
Warning: Looks like you're using an outdated API Version, please consider updating (server 1.5.12 / client 1.5.4)
Downloading sample_submission.csv.zip to /content/drive/My Drive/Kaggle_Data
100% 470k/470k [00:00<00:00, 32.6MB/s]
Downloading test.csv.zip to /content/drive/My Drive/Kaggle_Data
100% 25.2M/25.2M [00:00<00:00, 98.3MB/s]
Downloading train.csv.zip to /content/drive/My Drive/Kaggle_Data
100% 40.3M/40.3M [00:00<00:00, 100MB/s]

!mv train.csv.zip 30-days-of-ml  # moving file into folder...
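Outside of notebook magics, the same setup can be wrapped in plain Python. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and it assumes the `kaggle` command-line tool is installed and that `kaggle.json` already sits in the config folder, as the `!ls` output above suggests. It only reuses the `kaggle competitions download -c` command shown in the post.

```python
import os
import subprocess
from pathlib import Path


def configure_kaggle(config_dir):
    """Point the Kaggle CLI at config_dir after checking kaggle.json exists."""
    config_path = Path(config_dir)
    token = config_path / "kaggle.json"
    if not token.is_file():
        raise FileNotFoundError(f"kaggle.json not found in {config_path}")
    os.environ["KAGGLE_CONFIG_DIR"] = str(config_path)
    return token


def download_command(competition):
    """Build the CLI call used in the notebook cell as an argument list."""
    return ["kaggle", "competitions", "download", "-c", competition]


def download_competition(config_dir, competition):
    """Configure credentials, then download the competition files into config_dir."""
    configure_kaggle(config_dir)
    # cwd plays the role of the %cd cell; check=True raises if the CLI fails.
    subprocess.run(download_command(competition), cwd=config_dir, check=True)
```

In Colab, with Drive mounted, a call such as `download_competition('/content/drive/MyDrive/Kaggle_Data', '30-days-of-ml')` would stand in for the first three cells. Checking for `kaggle.json` up front gives a clearer error than letting the CLI fail on missing credentials.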