Enhanced Support Vector Machine Methods Using Stochastic Gradient Descent and Its Application to Heart Disease Dataset

Ghadeer Mahdi
Seror Faeq Mohammed
Md Kamrul Hasan Khan

Abstract

Support vector machines (SVMs) are supervised learning models that analyze data in order to classify observations or predict a dependent variable. An SVM is typically used for classification: it finds the hyperplane that best separates two classes. On very large datasets, however, training becomes time-consuming and computationally inefficient. This research modifies the SVM by training it with a stochastic gradient descent method. The new approach, the extended stochastic gradient descent SVM (ESGD-SVM), was tested on two simulated datasets and compared with other classification approaches: logistic regression, naive Bayes, k-nearest neighbors, and random forest. The results show that ESGD-SVM achieves very high accuracy and is quite robust. ESGD-SVM was then used to analyze a heart disease dataset downloaded from Harvard Dataverse. The entire analysis was performed in R version 4.3.
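
To make the idea of training an SVM with stochastic gradient descent concrete, the following is a minimal sketch of a generic linear soft-margin SVM fitted by SGD on the L2-regularized hinge loss. It is not the authors' ESGD-SVM, whose details are not given in the abstract, and it is written in Python for illustration although the paper's analysis was done in R. The function names (train_sgd_svm, predict, make_toy_data), the step-size schedule, the hyperparameter values, and the toy data are all assumptions introduced here.

```python
import random


def train_sgd_svm(X, y, lam=0.01, eta0=0.1, epochs=20, seed=0):
    """Fit a linear SVM by SGD on the L2-regularized hinge loss.

    Objective (approximately minimized):
        lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))
    Labels must be +1 or -1. The decaying step size is an assumed
    schedule for this sketch, not the one used in the paper.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order
        for i in order:
            eta = eta0 / (1.0 + lam * eta0 * t)
            t += 1
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            margin = y[i] * score
            # The L2 penalty shrinks w on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # The hinge term contributes only when the margin is below 1.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]  # the bias is left unregularized
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


def make_toy_data(n_per_class=20, seed=1):
    """Hypothetical well-separated two-class data, not the paper's simulations."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            X.append([2.0 * label + rng.uniform(-1, 1),
                      2.0 * label + rng.uniform(-1, 1)])
            y.append(label)
    return X, y


X, y = make_toy_data()
w, b = train_sgd_svm(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(f"w={w}, b={b:.3f}, training accuracy={accuracy:.2f}")
```

Each update touches a single observation, so the cost per step does not grow with the size of the dataset. This is the usual motivation for SGD on large problems, in line with the efficiency concern raised in the abstract.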

How to Cite
Mahdi, G. et al. 2024. Enhanced Support Vector Machine Methods Using Stochastic Gradient Descent and Its Application to Heart Disease Dataset. Ibn AL-Haitham Journal For Pure and Applied Sciences. 37, 1 (Jan. 2024), 412–428. DOI: https://doi.org/10.30526/37.1.3467
Section
Mathematics
