Introduction
Email spam, also referred to as junk email or simply spam, consists of unsolicited messages sent in bulk by email. Email spam has grown steadily since the early 1990s, and by 2014 it was estimated to account for around 90% of total email traffic. Most spam messages are commercial in nature. Whether commercial or not, many are not only annoying as a form of attention theft but also dangerous, because they may contain links to phishing websites or to sites hosting malware, or carry malware as file attachments. That is why it is necessary to filter spam messages to protect users. In this project we use several classification algorithms to classify emails as spam or not spam, based on features extracted from the emails, such as the percentage occurrence of particular words and characters.
```python
import pandas as pd

# load the list of variable names that accompanies the dataset
names = pd.read_csv('/content/names.csv')
names.index = range(1, 58)
names.T
```

| Variables | Type |
|---|---|
| 1-48: `word_freq_WORD:` for WORD in {make, address, all, 3d, our, over, remove, internet, order, mail, receive, will, people, report, addresses, free, business, email, you, credit, your, font, 000, money, hp, hpl, george, 650, lab, labs, telnet, 857, data, 415, 85, technology, 1999, parts, pm, direct, cs, meeting, original, project, re, edu, table, conference} | continuous |
| 49-54: `char_freq_CHAR:` for CHAR in {; ( [ ! $ #} | continuous |
| 55-57: `capital_run_length_average:`, `capital_run_length_longest:`, `capital_run_length_total:` | continuous |
The first 48 variables give the percentage of particular words in the email:

100 * (number of times the word appears in the email / total number of words in the email)

The next 6 variables give the percentage of particular symbols in the email:

100 * (number of times the character appears in the email / total number of characters in the email)

`capital_run_length_average` is the average length of the uninterrupted sequences of capital letters.

`capital_run_length_longest` is the length of the longest uninterrupted sequence of capital letters.

`capital_run_length_total` is the sum of the lengths of all uninterrupted sequences of capital letters.

`Class` is the target variable [0: not spam, 1: spam].

In short, the columns are the frequencies of given words and characters observed in the email, and we predict whether the email is spam or not based on these columns. (A small illustrative sketch of how such features are computed follows.)
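To make the feature definitions concrete, here is a minimal sketch of how such features could be computed from a raw email body. The helper functions and the sample email are purely illustrative (they are not part of the dataset pipeline, which was prepared by the UCI donors):

```python
import re

def capital_runs(text):
    """Lengths of all uninterrupted sequences of capital letters."""
    return [len(run) for run in re.findall(r'[A-Z]+', text)]

def word_freq(text, word):
    """100 * (occurrences of `word`) / (total words), as in the dataset."""
    words = re.findall(r'[A-Za-z0-9]+', text.lower())
    return 100 * words.count(word) / len(words) if words else 0.0

def char_freq(text, char):
    """100 * (occurrences of `char`) / (total characters)."""
    return 100 * text.count(char) / len(text) if text else 0.0

email = "FREE money!!! Click NOW to claim your FREE prize $$$"
print(word_freq(email, 'free'))   # percentage of the word 'free'
print(char_freq(email, '!'))      # percentage of '!' characters
runs = capital_runs(email)        # here: [4, 1, 3, 4]
print(sum(runs) / len(runs), max(runs), sum(runs))  # average, longest, total
```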
Dataset source : http://www.ics.uci.edu/~mlearn/MLRepository.html
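If the local CSV copies are not available, the same data can also be fetched straight from the UCI repository. The file URL below is an assumption about the repository layout and may change:

```python
import pandas as pd

# spambase.data has no header row; spambase.names documents the 57 features + label
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
df = pd.read_csv(url, header=None)
df.shape  # expected: (4601, 58)
```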
Importing required libraries
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# build column names from the names dataframe: strip the 'word_freq_' prefix
# and the trailing ':' for the first 48 words, keep the rest as-is, append 'Class'
final_var = [var.split('_')[-1][:-1] for var in names.Variables[:48]]
final_var.extend([var[:-1] for var in names.Variables[48:]])
final_var.append('Class')

# load the dataset
df = pd.read_csv("/content/spam.csv")
df.columns = final_var
df.head()
```

(output: the first five rows of the 58-column dataframe; for example, row 0 has `make` = 0.21, `you` = 3.47, `capital_run_length_total` = 1028 and `Class` = 1.)
```python
# check datatypes
df.info()
```

(output: a `RangeIndex` of 4600 entries; all 58 columns are non-null; dtypes are float64(55) and int64(3), the int64 columns being `capital_run_length_longest`, `capital_run_length_total` and `Class`; memory usage 2.0 MB.)
```python
# check the shape of the dataset
df.shape
```

```
(4601, 58)
```
The dataset has 4601 rows and 58 columns (57 features plus the `Class` target).
```python
# check for null values
df.isna().sum().sum()
```

```
0
```
No null values are present in the dataset.
```python
# average percentage of each word in the spam emails
data1 = df[df['Class'] == 1]
data = data1.loc[:, :'conference']
x = data.columns
y = [data[col].mean() for col in data.columns]

plt.figure(figsize=[15, 10])
plt.title('Average percentage of the words in the spam emails', fontdict={'fontsize': 18})
sns.barplot(x=x, y=y)
plt.xlabel('Words in the email', fontdict={'fontsize': 14})
plt.ylabel('Average percentage', fontdict={'fontsize': 14})
plt.xticks(rotation=90)
plt.show()

# same plot for the special characters
data = data1.loc[:, 'char_freq_;':'char_freq_#']
x = data.columns
y = [data[col].mean() for col in data.columns]

plt.title('Average percentage of the characters in the spam emails', fontdict={'fontsize': 16})
sns.barplot(x=x, y=y)
plt.xlabel('Characters in the email', fontdict={'fontsize': 14})
plt.ylabel('Average percentage', fontdict={'fontsize': 14})
plt.xticks(rotation=90)
plt.show()

# correlation of each feature with the target
plt.figure(figsize=(8, 15))
heatmap = sns.heatmap(df.corr()[['Class']].sort_values(by='Class', ascending=False),
                      vmin=-1, vmax=1, annot=True, cmap='BrBG')
heatmap.set_title('Features Correlating with Class', fontdict={'fontsize': 18}, pad=16)
```

(figures: bar plots of the average word and character percentages in spam emails, and a heatmap of each feature's correlation with `Class`.)

```python
df.columns
```

```
Index(['make', 'address', 'all', '3d', 'our', 'over', 'remove', 'internet',
       'order', 'mail', 'receive', 'will', 'people', 'report', 'addresses',
       'free', 'business', 'email', 'you', 'credit', 'your', 'font', '000',
       'money', 'hp', 'hpl', 'george', '650', 'lab', 'labs', 'telnet', '857',
       'data', '415', '85', 'technology', '1999', 'parts', 'pm', 'direct',
       'cs', 'meeting', 'original', 'project', 're', 'edu', 'table',
       'conference', 'char_freq_;', 'char_freq_(', 'char_freq_[',
       'char_freq_!', 'char_freq_$', 'char_freq_#',
       'capital_run_length_average', 'capital_run_length_longest',
       'capital_run_length_total', 'Class'],
      dtype='object')
```

```python
df["Class"].unique()
```

```
array([1., 0.])
```
```python
# describe the data
df.describe()
```

(output: an 8 x 58 table of summary statistics, giving count, mean, std, min, 25%, 50%, 75% and max for every column; the mean of `Class` is 0.3939, so roughly 39% of the emails are spam.)
Min-max scaling maps each feature x to (x - min) / (max - min), so every feature column ends up in the range [0, 1]:

```python
from sklearn.preprocessing import MinMaxScaler

mm = MinMaxScaler()
# scale every feature column to [0, 1]; leave the Class label untouched
df.iloc[:, :-1] = mm.fit_transform(df.iloc[:, :-1])
df.head(3)
```

(output: the first three rows, with all feature values now scaled to [0, 1].)
df["Class"].value_counts()0.0 2788 1.0 1812 Name: Class, dtype: int64
sns.catplot(data=df,x="Class",kind="count")<seaborn.axisgrid.FacetGrid at 0x7feab0e48150>
```python
# assign x and y
x = df.drop(columns='Class')
y = df['Class']

# split the data
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=1)

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.metrics import precision_recall_fscore_support as score

acc = accuracy_score(ytest, ypred)
print("Accuracy is :", acc)
cm = confusion_matrix(ytest, ypred)
print(cm)
print(classification_report(ytest, ypred))
sns.heatmap(cm, annot=True)
plt.show()
```

```
Accuracy is : 0.7869565217391304
[[381 182]
 [ 14 343]]
              precision    recall  f1-score   support

         0.0       0.96      0.68      0.80       563
         1.0       0.65      0.96      0.78       357

    accuracy                           0.79       920
   macro avg       0.81      0.82      0.79       920
weighted avg       0.84      0.79      0.79       920
```
The baseline logistic regression reaches an accuracy of about 0.787. To improve on this we turn to hyperparameter tuning.
```python
# model
import warnings
warnings.filterwarnings("ignore")
model = LogisticRegression()

# parameters
penalty = ['l1', 'l2', 'elasticnet']
C = [10, 1, 0.1, 0.001, 0.0001]
solver = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']

# grid
grid = dict(solver=solver, C=C, penalty=penalty)

# cv
from sklearn.model_selection import RepeatedStratifiedKFold
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# grid search cv
from sklearn.model_selection import GridSearchCV
gridcv = GridSearchCV(estimator=model, param_grid=grid, cv=cv, scoring="accuracy", error_score=0)
result = gridcv.fit(x, y)
print(result.best_score_)
print(result.best_params_)
```

```
0.9243662171083656
{'C': 10, 'penalty': 'l1', 'solver': 'liblinear'}
```
Hyperparameter tuning raises the cross-validated accuracy to 0.9244, with best parameters {'C': 10, 'penalty': 'l1', 'solver': 'liblinear'}.
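As an aside, `GridSearchCV` keeps a refit copy of the best model (with the default `refit=True`), so the best parameters do not have to be copied into a fresh model by hand, as the next cell does. A minimal sketch:

```python
# the grid search already refit the best configuration on its training data
best_model = result.best_estimator_
ypred = best_model.predict(xtest)
# caveat: this particular search was fit on all of x and y, so xtest is
# not a clean hold-out set for best_model
print("Accuracy is :", accuracy_score(ytest, ypred))
```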
```python
model = LogisticRegression(C=10, penalty='l1', solver='liblinear')
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# model evaluation
lr_pre, lr_recall, lr_fsc, support = score(ytest, ypred, average='macro')
lr_acc = accuracy_score(ytest, ypred)
print("Accuracy is :", lr_acc)
cm = confusion_matrix(ytest, ypred)
print(cm)
sns.heatmap(cm, annot=True)
print(classification_report(ytest, ypred))
```

```
Accuracy is : 0.9206521739130434
[[534  29]
 [ 44 313]]
              precision    recall  f1-score   support

         0.0       0.92      0.95      0.94       563
         1.0       0.92      0.88      0.90       357

    accuracy                           0.92       920
   macro avg       0.92      0.91      0.92       920
weighted avg       0.92      0.92      0.92       920
```
Retraining the logistic regression model on the best parameters gives an accuracy of 0.9207.
```python
from sklearn.naive_bayes import GaussianNB

Gmodel = GaussianNB()
Gmodel.fit(xtrain, ytrain)
ypred = Gmodel.predict(xtest)

# model evaluation
NB_pre, NB_recall, NB_fsc, support = score(ytest, ypred, average='macro')
NB_acc = accuracy_score(ytest, ypred)
print("Accuracy is :", NB_acc)
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
Accuracy is : 0.7869565217391304
              precision    recall  f1-score   support

         0.0       0.96      0.68      0.80       563
         1.0       0.65      0.96      0.78       357

    accuracy                           0.79       920
   macro avg       0.81      0.82      0.79       920
weighted avg       0.84      0.79      0.79       920
```
Naive Bayes classification gives an accuracy of 0.7870.
```python
from sklearn.svm import SVC

model = SVC()
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# evaluation
acc = accuracy_score(ytest, ypred)
print("Accuracy is: ", acc)
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
Accuracy is:  0.9207383279044516
              precision    recall  f1-score   support

         0.0       0.90      0.98      0.94       564
         1.0       0.96      0.83      0.89       357

    accuracy                           0.92       921
   macro avg       0.93      0.90      0.91       921
weighted avg       0.92      0.92      0.92       921
```
The support vector machine reaches an accuracy of 0.9207.
```python
# model
model = SVC()

# parameters
kernel = ['linear', 'poly', 'rbf', 'sigmoid']
C = [1, 0.1, 0.01, 0.001]
gamma = ['scale', 'auto']

# grid
grid = dict(kernel=kernel, C=C, gamma=gamma)

# cv
from sklearn.model_selection import RepeatedStratifiedKFold
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

from sklearn.model_selection import GridSearchCV
grid_cv = GridSearchCV(estimator=model, param_grid=grid, cv=cv, scoring="accuracy")

# result
res = grid_cv.fit(xtrain, ytrain)
print(res.best_params_)
print(res.best_score_)
```

```
{'C': 1, 'gamma': 'scale', 'kernel': 'rbf'}
0.9031702898550723
```
Hyperparameter tuning of the SVM finds the best parameters {'C': 1, 'gamma': 'scale', 'kernel': 'rbf'} with a cross-validated accuracy of 0.9032.
```python
# retrain with the best parameters
from sklearn.svm import SVC

model = SVC(C=1, gamma='scale', kernel='rbf')
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# model evaluation
SVM_pre, SVM_recall, SVM_fsc, support = score(ytest, ypred, average='macro')
SVM_acc = accuracy_score(ytest, ypred)
print("Accuracy is: ", SVM_acc)
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
Accuracy is:  0.9043478260869565
              precision    recall  f1-score   support

         0.0       0.89      0.96      0.92       563
         1.0       0.93      0.82      0.87       357

    accuracy                           0.90       920
   macro avg       0.91      0.89      0.90       920
weighted avg       0.91      0.90      0.90       920
```
Retraining the SVM on the best parameters gives an accuracy of 0.9043.
```python
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=5)
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# model evaluation
acc = accuracy_score(ytest, ypred)
print("Accuracy is: ", acc)
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
Accuracy is:  0.925
              precision    recall  f1-score   support

         0.0       0.92      0.96      0.94       563
         1.0       0.94      0.86      0.90       357

    accuracy                           0.93       920
   macro avg       0.93      0.91      0.92       920
weighted avg       0.93      0.93      0.92       920
```
The KNN algorithm gives an accuracy of 0.925.
```python
# model
model = KNeighborsClassifier()

# parameter grid: n_neighbors, weights, metric
n_neighbors = range(1, 31)
weights = ['uniform', 'distance']
metric = ["minkowski", "euclidean", "manhattan"]
grid = dict(n_neighbors=n_neighbors, weights=weights, metric=metric)

# cv
from sklearn.model_selection import RepeatedStratifiedKFold
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

# GridSearchCV
from sklearn.model_selection import GridSearchCV
grid_cv = GridSearchCV(estimator=model, param_grid=grid, cv=cv, scoring="accuracy")
res = grid_cv.fit(xtrain, ytrain)
print(res.best_params_)
print(res.best_score_)
```

```
{'metric': 'manhattan', 'n_neighbors': 6, 'weights': 'distance'}
0.9193840579710144
```
```python
from sklearn.neighbors import KNeighborsClassifier

# note: the grid search selected n_neighbors=6; n_neighbors=10 is used here
model = KNeighborsClassifier(n_neighbors=10, metric='manhattan', weights='distance')
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# model evaluation
KNN_pre, KNN_recall, KNN_fsc, support = score(ytest, ypred, average='macro')
KNN_acc = accuracy_score(ytest, ypred)
print("Accuracy is: ", KNN_acc)
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
Accuracy is:  0.925
              precision    recall  f1-score   support

         0.0       0.92      0.96      0.94       563
         1.0       0.94      0.86      0.90       357

    accuracy                           0.93       920
   macro avg       0.93      0.91      0.92       920
weighted avg       0.93      0.93      0.92       920
```
Retraining the KNN model on the tuned parameters gives an accuracy of 0.925.
```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# evaluation
acc = accuracy_score(ytest, ypred)
print("Accuracy is: ", acc)
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
Accuracy is:  0.9174809989142236
              precision    recall  f1-score   support

         0.0       0.93      0.93      0.93       564
         1.0       0.90      0.89      0.89       357

    accuracy                           0.92       921
   macro avg       0.91      0.91      0.91       921
weighted avg       0.92      0.92      0.92       921
```
The decision tree reaches an accuracy of 0.9175.
```python
# model
model = DecisionTreeClassifier()

# parameters
criterion = ["gini", "entropy"]
splitter = ["best", "random"]
max_features = ["auto", "sqrt", "log2"]
max_depth = range(1, 11)
grid = dict(criterion=criterion, splitter=splitter, max_depth=max_depth, max_features=max_features)

# cv
from sklearn.model_selection import RepeatedStratifiedKFold
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# grid search cv
from sklearn.model_selection import GridSearchCV
grid_cv = GridSearchCV(estimator=model, param_grid=grid, cv=cv, scoring="accuracy")
res = grid_cv.fit(xtrain, ytrain)
print(res.best_params_)
print(res.best_score_)

# retrain with the best parameters
model = DecisionTreeClassifier(criterion='gini', max_depth=10, max_features='sqrt', splitter='best')
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# model evaluation
DT_pre, DT_recall, DT_fsc, support = score(ytest, ypred, average='macro')
DT_acc = accuracy_score(ytest, ypred)
print("Accuracy is: ", DT_acc)
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
Accuracy is:  0.8804347826086957
              precision    recall  f1-score   support

         0.0       0.89      0.91      0.90       563
         1.0       0.86      0.83      0.84       357

    accuracy                           0.88       920
   macro avg       0.88      0.87      0.87       920
weighted avg       0.88      0.88      0.88       920
```
Retraining the decision tree on the best parameters gives an accuracy of 0.8804.
```python
from sklearn.ensemble import BaggingClassifier

model = BaggingClassifier()
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# model evaluation
print("accuracy is :", accuracy_score(ytest, ypred))
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
accuracy is : 0.9370249728555917
              precision    recall  f1-score   support

         0.0       0.94      0.96      0.95       564
         1.0       0.94      0.90      0.92       357

    accuracy                           0.94       921
   macro avg       0.94      0.93      0.93       921
weighted avg       0.94      0.94      0.94       921
```
The bagging meta-estimator reaches an accuracy of 0.9370.
```python
# model
model = BaggingClassifier()
n_estimators = [10, 50, 100, 1000]

# grid
grid = dict(n_estimators=n_estimators)

# cv
from sklearn.model_selection import RepeatedStratifiedKFold
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

# GridSearchCV
from sklearn.model_selection import GridSearchCV
grid_cv = GridSearchCV(estimator=model, param_grid=grid, cv=cv, scoring='accuracy')

# results
res = grid_cv.fit(xtrain, ytrain)
print("best parameters are :", res.best_params_)
print("best accuracy is :", res.best_score_)

# retrain with the best parameters
model = BaggingClassifier(n_estimators=1000)
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# model evaluation
BM_pre, BM_recall, BM_fsc, support = score(ytest, ypred, average='macro')
BM_acc = accuracy_score(ytest, ypred)
print("accuracy is :", BM_acc)
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
print(classification_report(ytest, ypred))
```

```
accuracy is : 0.9543478260869566
              precision    recall  f1-score   support

         0.0       0.96      0.97      0.96       563
         1.0       0.95      0.93      0.94       357

    accuracy                           0.95       920
   macro avg       0.95      0.95      0.95       920
weighted avg       0.95      0.95      0.95       920
```
Retraining the bagging model on the best parameters gives an accuracy of 0.9543.
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# model evaluation
print("accuracy is :", accuracy_score(ytest, ypred))
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
accuracy is : 0.9609120521172638
              precision    recall  f1-score   support

         0.0       0.96      0.98      0.97       564
         1.0       0.97      0.93      0.95       357

    accuracy                           0.96       921
   macro avg       0.96      0.96      0.96       921
weighted avg       0.96      0.96      0.96       921
```
The random forest reaches an accuracy of 0.9609.
```python
# model
model = RandomForestClassifier()
n_estimators = [10, 50, 100, 1000]
criterion = ["gini", "entropy"]
max_features = ["auto", "sqrt", "log2"]

# grid
grid = dict(n_estimators=n_estimators, criterion=criterion, max_features=max_features)

# cv
from sklearn.model_selection import RepeatedStratifiedKFold
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

# GridSearchCV
from sklearn.model_selection import GridSearchCV
grid_cv = GridSearchCV(estimator=model, param_grid=grid, cv=cv, scoring='accuracy')

# results
res = grid_cv.fit(xtrain, ytrain)
print("best parameters are :", res.best_params_)
print("best accuracy is :", res.best_score_)
```

```
best parameters are : {'criterion': 'entropy', 'max_features': 'log2', 'n_estimators': 1000}
best accuracy is : 0.9520833333333332
```
Hyperparameter tuning of the random forest gives a best cross-validated accuracy of 0.9521 with parameters {'criterion': 'entropy', 'max_features': 'log2', 'n_estimators': 1000}.
```python
model = RandomForestClassifier(criterion='entropy', max_features='log2', n_estimators=1000)
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# evaluation
RF_pre, RF_recall, RF_fsc, support = score(ytest, ypred, average='macro')
RF_acc = accuracy_score(ytest, ypred)
print("Accuracy is: ", RF_acc)
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
Accuracy is:  0.9554347826086956
              precision    recall  f1-score   support

         0.0       0.95      0.98      0.96       563
         1.0       0.96      0.92      0.94       357

    accuracy                           0.96       920
   macro avg       0.96      0.95      0.95       920
weighted avg       0.96      0.96      0.96       920
```
Retraining the random forest on the best parameters gives an accuracy of 0.9554.
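A fitted random forest also exposes which features drive the classification through its impurity-based importance scores. This small addition is not in the original notebook; it assumes `model` still holds the tuned forest just trained:

```python
# rank the 57 features by the forest's impurity-based importance scores
importances = pd.Series(model.feature_importances_, index=x.columns)
print(importances.sort_values(ascending=False).head(10))
```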
```python
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier()
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

print("Accuracy is :", accuracy_score(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
print(classification_report(ytest, ypred))
```

```
Accuracy is : 0.9348534201954397
              precision    recall  f1-score   support

         0.0       0.95      0.95      0.95       564
         1.0       0.91      0.92      0.92       357

    accuracy                           0.93       921
   macro avg       0.93      0.93      0.93       921
weighted avg       0.93      0.93      0.93       921
```
AdaBoost reaches an accuracy of 0.9349.
```python
# model
model = AdaBoostClassifier()
n_estimators = [10, 50, 100, 1000]
learning_rate = [0.1, 1]
algorithm = ["SAMME", "SAMME.R"]

# grid
grid = dict(n_estimators=n_estimators, learning_rate=learning_rate, algorithm=algorithm)

# cv
from sklearn.model_selection import RepeatedStratifiedKFold
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

# GridSearchCV
from sklearn.model_selection import GridSearchCV
grid_cv = GridSearchCV(estimator=model, param_grid=grid, cv=cv, scoring='accuracy')

# results
res = grid_cv.fit(xtrain, ytrain)
print("best parameters are :", res.best_params_)
print("best accuracy is :", res.best_score_)

# retrain with the best parameters
model = AdaBoostClassifier(algorithm='SAMME.R', learning_rate=0.1, n_estimators=1000)
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

Ada_pre, Ada_recall, Ada_fsc, support = score(ytest, ypred, average='macro')
Ada_acc = accuracy_score(ytest, ypred)
print("Accuracy is :", Ada_acc)
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
print(classification_report(ytest, ypred))
```

```
Accuracy is : 0.9478260869565217
              precision    recall  f1-score   support

         0.0       0.95      0.96      0.96       563
         1.0       0.94      0.93      0.93       357

    accuracy                           0.95       920
   macro avg       0.95      0.94      0.94       920
weighted avg       0.95      0.95      0.95       920
```
Retraining AdaBoost on the best parameters gives an accuracy of 0.9478.
```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(n_estimators=100)
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# evaluation
GradBoost_pre, GradBoost_recall, GradBoost_fsc, support = score(ytest, ypred, average='macro')
GradBoost_acc = accuracy_score(ytest, ypred)
print("Accuracy is :", GradBoost_acc)
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
print(classification_report(ytest, ypred))
```

```
Accuracy is : 0.9445652173913044
              precision    recall  f1-score   support

         0.0       0.95      0.96      0.96       563
         1.0       0.94      0.91      0.93       357

    accuracy                           0.94       920
   macro avg       0.94      0.94      0.94       920
weighted avg       0.94      0.94      0.94       920
```
The gradient boosting classifier reaches an accuracy of 0.9446.
```python
from xgboost import XGBClassifier

model = XGBClassifier()
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

print("Accuracy is :", accuracy_score(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
print(classification_report(ytest, ypred))
```

```
Accuracy is : 0.9445652173913044
              precision    recall  f1-score   support

         0.0       0.95      0.96      0.96       563
         1.0       0.94      0.91      0.93       357

    accuracy                           0.94       920
   macro avg       0.94      0.94      0.94       920
weighted avg       0.94      0.94      0.94       920
```
The XGBoost classifier reaches an accuracy of 0.9446.
```python
# model
model = XGBClassifier()
n_estimators = [10, 50, 100]
learning_rate = [0.1, 1]

# grid
grid = dict(n_estimators=n_estimators, learning_rate=learning_rate)

# cv
from sklearn.model_selection import RepeatedStratifiedKFold
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

# GridSearchCV
from sklearn.model_selection import GridSearchCV
grid_cv = GridSearchCV(estimator=model, param_grid=grid, cv=cv, scoring='accuracy')

# results
res = grid_cv.fit(xtrain, ytrain)
print("best parameters are :", res.best_params_)
print("best accuracy is :", res.best_score_)

# retrain with the best parameters
model = XGBClassifier(learning_rate=1, n_estimators=100)
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# evaluation
XGB_pre, XGB_recall, XGB_fsc, support = score(ytest, ypred, average='macro')
XGB_acc = accuracy_score(ytest, ypred)
print("Accuracy is :", XGB_acc)
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
print(classification_report(ytest, ypred))
```

```
Accuracy is : 0.9456521739130435
              precision    recall  f1-score   support

         0.0       0.95      0.96      0.96       563
         1.0       0.93      0.92      0.93       357

    accuracy                           0.95       920
   macro avg       0.94      0.94      0.94       920
weighted avg       0.95      0.95      0.95       920
```
Retraining XGBoost on the best parameters gives an accuracy of 0.9457.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier

models = [
    ("lr", LogisticRegression()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("GNB", GaussianNB()),
    ("RF", RandomForestClassifier(n_estimators=50)),
    ("ABC", AdaBoostClassifier(n_estimators=50)),
    ("GBC", GradientBoostingClassifier(n_estimators=50)),
    ("SVM", SVC(C=0.1, probability=True)),
]

from sklearn.ensemble import VotingClassifier
model = VotingClassifier(estimators=models, voting="soft")
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# evaluation
voting_pre, voting_recall, voting_fsc, support = score(ytest, ypred, average='macro')
voting_acc = accuracy_score(ytest, ypred)
print("Accuracy is :", voting_acc)
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
print(classification_report(ytest, ypred))
```

```
Accuracy is : 0.95
              precision    recall  f1-score   support

           0       0.96      0.96      0.96       563
           1       0.93      0.94      0.94       357

    accuracy                           0.95       920
   macro avg       0.95      0.95      0.95       920
weighted avg       0.95      0.95      0.95       920
```
The soft-voting classifier reaches an accuracy of 0.95.
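With `voting="soft"` the ensemble averages the per-class probabilities of its members and predicts the class with the highest mean probability. A minimal sketch of the idea, with three illustrative (made-up) member outputs for a single email:

```python
import numpy as np

# each row: one member's predicted [P(not spam), P(spam)] for the same email
member_probas = np.array([
    [0.30, 0.70],
    [0.45, 0.55],
    [0.10, 0.90],
])
mean_proba = member_probas.mean(axis=0)  # -> [0.2833, 0.7167]
prediction = mean_proba.argmax()         # -> 1 (spam)
print(mean_proba, prediction)
```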
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# base models
base_models = [
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svm', SVC(C=0.1, kernel='linear')),
    ('DT', DecisionTreeClassifier()),
]

# final model
from sklearn.linear_model import LogisticRegression
final_model = LogisticRegression()

# stacking classifier
from sklearn.ensemble import StackingClassifier
model = StackingClassifier(estimators=base_models, final_estimator=final_model)
model.fit(xtrain, ytrain)
ypred = model.predict(xtest)

# evaluation
stacking_pre, stacking_recall, stacking_fsc, support = score(ytest, ypred, average='macro')
stacking_acc = accuracy_score(ytest, ypred)
print("Accuracy is ", stacking_acc)
print(classification_report(ytest, ypred))
cm = confusion_matrix(ytest, ypred)
sns.heatmap(cm, annot=True)
```

```
Accuracy is  0.9271739130434783
              precision    recall  f1-score   support

         0.0       0.93      0.95      0.94       563
         1.0       0.92      0.89      0.90       357

    accuracy                           0.93       920
   macro avg       0.93      0.92      0.92       920
weighted avg       0.93      0.93      0.93       920
```
The stacking classifier reaches an accuracy of 0.9272.
```python
models = ['Logistic Regression', 'Naive Bayes', 'SVM', 'KNN', 'Decision Tree',
          'Bagging metaestimator', 'Random forest', 'AdaBoost', 'Gradient Boost',
          'XGBoost', 'Voting classifier', 'Stacking classifier']
accuracy = [lr_acc, NB_acc, SVM_acc, KNN_acc, DT_acc, BM_acc, RF_acc, Ada_acc,
            GradBoost_acc, XGB_acc, voting_acc, stacking_acc]
precision = [lr_pre, NB_pre, SVM_pre, KNN_pre, DT_pre, BM_pre, RF_pre, Ada_pre,
             GradBoost_pre, XGB_pre, voting_pre, stacking_pre]
recall = [lr_recall, NB_recall, SVM_recall, KNN_recall, DT_recall, BM_recall, RF_recall,
          Ada_recall, GradBoost_recall, XGB_recall, voting_recall, stacking_recall]
fscore = [lr_fsc, NB_fsc, SVM_fsc, KNN_fsc, DT_fsc, BM_fsc, RF_fsc, Ada_fsc,
          GradBoost_fsc, XGB_fsc, voting_fsc, stacking_fsc]

Evaluation = pd.DataFrame({'No.': [x + 1 for x in range(len(models))],
                           'Model': models,
                           'Accuracy': accuracy,
                           'Precision': precision,
                           'Recall': recall,
                           'F-score': fscore})
Evaluation.style.highlight_max(subset=['Accuracy'], color='lightgreen')
```

| No. | Model | Accuracy | Precision | Recall | F-score |
|---|---|---|---|---|---|
| 1 | Logistic Regression | 0.920652 | 0.919540 | 0.912620 | 0.915793 |
| 2 | Naive Bayes | 0.786957 | 0.808945 | 0.818758 | 0.786592 |
| 3 | SVM | 0.904348 | 0.909773 | 0.888537 | 0.896853 |
| 4 | KNN | 0.925000 | 0.928127 | 0.913611 | 0.919765 |
| 5 | Decision Tree | 0.880435 | 0.875942 | 0.871049 | 0.873322 |
| 6 | Bagging metaestimator | 0.954348 | 0.953269 | 0.950401 | 0.951788 |
| 7 | Random forest | 0.955435 | 0.956214 | 0.949751 | 0.952758 |
| 8 | AdaBoost | 0.947826 | 0.945911 | 0.944047 | 0.944958 |
| 9 | Gradient Boost | 0.944565 | 0.944095 | 0.938821 | 0.941299 |
| 10 | XGBoost | 0.945652 | 0.943613 | 0.941759 | 0.942665 |
| 11 | Voting classifier | 0.950000 | 0.946590 | 0.948386 | 0.947467 |
| 12 | Stacking classifier | 0.927174 | 0.925117 | 0.921024 | 0.922967 |
```python
plt.figure(figsize=(15, 8))
sns.barplot(x=accuracy, y=models)
plt.xlabel('Accuracy', fontdict={'fontsize': 16})
plt.ylabel('Models', fontdict={'fontsize': 16})
plt.title('Evaluation of models', fontdict={'fontsize': 18})
```

(figure: horizontal bar plot of the accuracy of each model.)
From the bar plot above, the random forest and bagging models perform best on this data.
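Since the random forest came out on top, a natural last step is to persist that model for reuse on new emails. A minimal sketch using joblib; the file name is arbitrary, and `model` is assumed to hold the tuned `RandomForestClassifier` from above (e.g. after re-running that cell):

```python
import joblib

# save the fitted classifier to disk
joblib.dump(model, "spam_classifier.joblib")

# later / elsewhere: load it and classify new feature vectors
clf = joblib.load("spam_classifier.joblib")
# new_features must be a 2-D array with the same 57 columns,
# scaled with the same min-max parameters as the training data
# prediction = clf.predict(new_features)
```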