A Scoping Review of Machine Learning Techniques and Their Utilisation in Predicting Heart Diseases

Heart diseases are diverse, common, and dangerous conditions that impair the heart's function. They arise from genetic factors or unhealthy habits, and they are the leading cause of mortality worldwide. Cardiovascular diseases seriously affect the health and activity of the heart by narrowing the arteries and reducing the amount of blood the heart receives, which leads to high blood pressure and high cholesterol. Healthcare workers and physicians therefore need intelligent technologies that help them analyse patients' data and make predictions for the early detection of heart diseases, so that appropriate treatment can be found; these diseases often develop without pain or noticeable symptoms, which leads to severe complications such as heart failure, stroke, and kidney failure. In this regard, the authors highlight the literature they consider most practical in utilising machine learning techniques to predict heart disease. Twenty articles were chosen out of the fifty gathered and are summarised in table form. The main goal is to make this article a reference that healthcare workers can use in the future to study these techniques with ease, saving them time and effort. This article concludes that machine learning techniques play a significant and influential role in analysing disease data, predicting heart disease, and assisting decision-making. In addition, these techniques can analyse data from cohorts numbering in the millions.

using artificial intelligence techniques to get their work done. Personal health data are accumulated from multiple technology platforms such as web servers, electronic health records (EHRs), genetic databases, personal computers, smartphones, mobile applications, wearable devices and sensors, demographic information, and healthcare systems. In brief, artificial intelligence can go beyond the cognitive capacity of humans to manage information effectively: it can efficiently extract observations from x-ray images [12], laboratory results, genetic data, and so on, helping healthcare workers determine the patient's condition accurately. Machine learning techniques have developed significantly in recent years. Physicians and healthcare workers have utilised them to support the analysis and prediction of heart diseases of all types because of their excellent ability to produce results that inform decisions about the patient's condition. The principal contribution of this article is to summarise the role of machine learning in predicting heart disease by focusing on the 20 most influential papers issued between 2017 and 2021. The authors aim to make this an essential reference for future use. In addition, the article shows the vital role of machine learning in analysing heart images and in understanding heart diseases and their impact on human life.
This article is organised as follows: machine learning and its techniques are detailed in Section two; Section three presents the articles selected by the authors, with their details given in a table; finally, a conclusion is drawn in Section four.

Machine Learning Techniques
Artificial intelligence, machine learning, and deep learning are concepts that are often confused with one another. Chiefly, artificial intelligence is the ability of programs to learn and act like humans, whereas machine learning comprises the algorithms written for that purpose. Machine learning has attracted tremendous interest in the last few years, and this interest continues to grow steadily [13]. Nevertheless, many people regard machine learning as an advanced technology that can be developed, applied, and accessed only by experts. This view is becoming less common, and more categories of professionals are now interested in using or adopting machine learning and other artificial intelligence methods and tools to support their investigations and work [14]. Machine learning is built on sub-domains of mathematics such as probability, statistics, and optimisation, and it is a science that will continue to be an ever-expanding domain. There are several reasons for this. First, separate research communities in symbolic machine learning, computational learning theory (CoLT), statistics, neural networks, and pattern recognition discovered each other and began to work together. Second, machine learning techniques can be applied to new problems such as knowledge discovery in databases, robot control, language processing, and co-optimization, as well as to traditional problems such as speech recognition, handwriting recognition, face recognition, and medical problems. In healthcare, researchers are now more interested in adopting and using artificial intelligence and machine learning techniques to gain more knowledge and support (see Figure 2). For instance, machine learning techniques can be a good means of extracting hidden knowledge from medical data and records, helping society reduce the number of trials required to diagnose a person's disease accurately.
A survey shows that half of the organisations classified as healthcare organisations are using, or planning to use, artificial intelligence in imaging. Machine learning techniques are naturally divided into supervised and unsupervised [15]. Supervised learning is based on training on a data sample from the data source with the correct, pre-defined classification, while self-organising neural networks use an unsupervised learning algorithm to identify hidden patterns in unlabelled input data. Unsupervised here refers to learning and organising information without an error signal with which to evaluate a solution. Supervised learning is further divided into classification and regression, and different algorithms can be used according to the dataset's content. The most influential machine learning techniques are briefly addressed in this section.

Naïve Bayes
This classification algorithm relies on a set of strong hypotheses about the independence of the covariates, combined with Bayes' theorem [16]. The algorithm assumes that the predictor variables are independent conditional on the response, and that numeric predictors follow a Gaussian distribution whose mean and standard deviation are calculated from the training dataset. This classifier is one of the simplest supervised machine learning methods, yet it is employed by few data mining practitioners, who favour traditional methods such as decision trees or logistic regression [17]. In addition, the benefit of this

IHJPAS. 5 3 (3)2022
technique is the simplicity of programming, ease of use, and speed of parameter estimation (even on massive databases). Despite these benefits, its limited use in practice stems partly from the absence of a simple explicit model (an interpretation of the conditional probabilities), which calls the practical usefulness of the technique into question.
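As an illustration of the per-class mean and standard deviation estimation described above, a Gaussian Naïve Bayes classifier can be sketched in plain Python. This is a minimal sketch, not code from any of the reviewed studies; all function names are our own.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior plus a per-feature mean and standard deviation per class."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            stats.append((mean, math.sqrt(var) or 1e-9))  # avoid a zero deviation
        model[label] = (len(rows) / len(y), stats)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class maximising log prior + sum of log Gaussian likelihoods."""
    best, best_score = None, float("-inf")
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, sd) in zip(x, stats):
            score += -math.log(sd * math.sqrt(2 * math.pi)) - (v - mean) ** 2 / (2 * sd ** 2)
        if score > best_score:
            best, best_score = label, score
    return best
```

The simplicity noted above is visible here: training is a single pass of summary statistics, with no iterative optimisation at all.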

Support Vector Machine
The SVM algorithm was originally designed to solve classification problems but has been extended and reworked over the years. Its best-known variants include SVM for regression, SVM for solving integral equations, SVM for estimating density support, and SVMs that use different soft-margin costs and parameters. It is considered one of the most widely used machine learning algorithms within the cardiac domain. For instance, SVM can classify data in binary problems by finding the optimal boundary that separates the data points of one class from those of the other. It is a straightforward algorithm capable of modelling complex nonlinear relationships, making it very suitable for heart disease prediction systems containing patient records with binary outcomes [18]. The authors of [19] report that SVM achieves 90% accuracy in predicting in-stent restenosis from plasma metabolite levels. This technique seeks the optimal hyperplane that maximises the distance to the nearest training data points of any class; it is widely used in classification problems because of its strong ability to generalise to new, unseen data items, its flexible non-linear decision boundaries, and its dependence on few hyper-parameters.
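The soft-margin idea above can be illustrated with a deliberately simplified linear SVM trained by full-batch sub-gradient descent on the hinge loss. Production SVM solvers use quadratic programming or specialised optimisers; this sketch, with our own function names and parameter values, only shows the margin mechanics.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Sub-gradient descent on the soft-margin hinge loss with an L2 penalty.

    Labels in y must be -1 or +1; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [lam * wj for wj in w]          # gradient of the L2 penalty term
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:    # point inside the margin: hinge is active
                gw = [gj - yi * xj / n for gj, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def svm_predict(w, b, x):
    """The sign of the decision function gives the predicted class."""
    return 1 if dot(w, x) + b >= 0 else -1
```

Only points on the wrong side of the margin contribute to the gradient, which is exactly why the solution depends on the nearest training points (the support vectors) mentioned above.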

Logistic Regression
This technique is one of the most expressive, versatile, and widely used techniques for analysing clinical and epidemiological data. It is a statistical model for interpreting the relations between a set of qualitative variables: a generalised linear model that uses the logistic function as its link function. Moreover, it is utilised to predict the probability of an event occurring, such as the presence of a particular disease in the human body [20], for example:
- A patient dies or not before discharge.
- A person stops smoking or not after treatment.
- In a retrospective study, an individual is either a case or a control.
- Whether or not an HIV-positive patient is in stage IV.
Besides, logistic regression is a worthwhile quantitative procedure for problems where the dependent variable takes values in a finite set. It was created in the 1960s by three scholars, Cornfield, Gordon, and Smith [21], but its actual use began in the 1980s thanks to improved computational facilities.
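The logistic link function and the predicted event probability described above can be sketched in plain Python with simple gradient descent on the log-loss. This is an illustrative sketch with our own names and settings, not a reference implementation.

```python
import math

def train_logistic_regression(X, y, lr=0.1, epochs=3000):
    """Fit weights by gradient descent on the log-loss; y holds 0/1 labels."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # the logistic link function
            err = p - yi                     # gradient of the log-loss wrt the logit
            gw = [gj + err * xj / n for gj, xj in zip(gw, xi)]
            gb += err / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict_proba(w, b, x):
    """Probability that the event occurs (e.g. the patient dies before discharge)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

The output is a probability in (0, 1), which is what makes the technique so natural for the binary clinical outcomes listed above.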

K-Nearest Neighbours
The goal of the K-nearest neighbours technique is to classify query points whose class is unknown according to their respective distances to the points of a learning set [22]. This technique supposes that each example in the learning set is a random vector. In short, its purpose is to classify quantitative or qualitative datasets based on four kinds of distance measure, namely metrics, kernels, overlap metrics, and value difference metrics. It is an essential and straightforward classification technique, used especially when there is little or no prior information about the data distribution. It is also a non-parametric technique [23], meaning that it makes no presumptions about the distribution of the data used in the analysis. It can handle dimensionality-reduction tasks in a unified

manner and is suitable for realistic environments where the available data do not follow theoretical distributions such as the normal distribution. KNN therefore makes no generalisation and keeps all the data, since its training step is quick.
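The majority vote over the nearest neighbours described above fits in a few lines of plain Python. This minimal sketch uses the Euclidean metric; the names are our own.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify a query point by a majority vote among its k nearest neighbours."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    # sort the whole learning set by distance to the query and keep the k closest
    neighbours = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Note that there is no training function at all: the "model" is the stored learning set itself, which is exactly the sense in which KNN makes no generalisation.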

Random Forest
It is considered one of the most widely used techniques in prediction and data analysis; most studies that employ machine learning techniques include the random forest technique. It is a supervised machine learning technique applied to classification and regression problems. Furthermore, a random forest is an ensemble classifier [24]; that is, it consists of a large number of individual decision trees. Each tree provides a prediction for a given input, and the class with the most votes is taken as the overall prediction. Each tree in the forest is trained on a random sample drawn from the dataset with replacement; this process is called bagging (bootstrap aggregation) [25]. The out-of-bag score gives a useful description of the model's performance during the training phase.
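The bagging-and-voting procedure above can be illustrated with depth-1 decision trees (stumps) standing in for full trees; a real random forest also subsamples features at each split, which this sketch omits. All names are our own.

```python
import random
from collections import Counter

def train_stump(X, y):
    """Fit the best single-feature threshold split (a depth-1 decision tree)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # degenerate bootstrap sample: fall back to the majority class
        majority = Counter(y).most_common(1)[0][0]
        return lambda x: majority
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def train_bagged_forest(X, y, n_trees=25, seed=0):
    """Bagging: fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def forest_predict(forest, x):
    """The class with the most votes across the ensemble wins."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]
```

The rows left out of each bootstrap sample are what an out-of-bag score would be computed on; this sketch leaves that bookkeeping out for brevity.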

Linear Regression
This technique is considered one of the most widely used statistical techniques for verifying or estimating the relationship between a dependent variable and a set of independent descriptive variables within a dataset [26]. In general, this technique is employed in data analysis, and it is not limited by the size of the database, as it can model and analyse many variables. The dependent variable is the predicted or described element and can be viewed as the result of, or response to, a particular query within a large dataset. Moreover, this technique allows the relationship between two or more variables to be examined and the most significant influences on a topic of interest to be identified, which is especially useful for medical data because such data are extensive. Variables are divided into two types: the dependent variable, which is the factor one tries to understand or predict, and the independent variable, which is the factor that affects the dependent variable.
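For the single-predictor case, the relationship described above reduces to the closed-form least-squares formulas, sketched below in plain Python (our own names; not from the reviewed studies).

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for one independent variable.

    slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With several independent variables the same idea generalises to the matrix normal equations, but the one-variable form already shows how the dependent variable is expressed as a linear response to the independent one.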

Linear Discriminant Analysis
It is a standard feature extraction technique in pattern classification problems [27], defined as a linear model for classification and dimensionality reduction. This technique projects the data from the original feature space to a lower-dimensional space so as to increase the variability between categories and reduce the variability within categories. In short, the most crucial feature of this technique is its use of a symmetric squared distance. This technique is an alternative to logistic regression when the qualitative variable has more than two levels. In addition, it is distinguished from logistic regression by the following:
- If the categories are well separated, the parameter estimates of the logistic regression model are unstable; the LDA method does not suffer from this issue.
- If the number of observations is low and the distribution of the predictors is nearly normal in each category, LDA will be more stable than logistic regression.

Learning Vector Quantisation
It is a supervised machine learning method and a type of artificial neural network inspired by biological models of neural systems [28]. This technique can self-organise its network during training and deals with multi-category classification problems. It contains two layers, input and output. In general, LVQ is considered a prototype-based classification model that offers a promising alternative to deep networks, relying mainly on Euclidean metrics to compare data vectors with prototype vectors. It is a well-known algorithm for classifying patterns or selecting prototypes through a network that depends on nearest-neighbour patterns in its classification process, and it is appropriate for solving non-linear separation problems and for classifying large amounts of data [29]. Using the smallest Euclidean distance, the LVQ algorithm determines the winning vector, which is then selected for updating [30].
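The winner-selection and update rule above corresponds to the classic LVQ1 variant, sketched here in plain Python: the winning prototype is pulled toward a sample of the same label and pushed away otherwise. Names, initial prototypes, and learning-rate schedule are our own choices.

```python
import random

def train_lvq1(X, y, prototypes, proto_labels, lr=0.1, epochs=30, seed=0):
    """LVQ1: move the winning prototype toward same-label samples, away otherwise."""
    rng = random.Random(seed)
    protos = [list(p) for p in prototypes]
    for epoch in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        rate = lr * (1 - epoch / epochs)   # linearly decaying learning rate
        for i in order:
            # winner = prototype with the smallest squared Euclidean distance
            win = min(range(len(protos)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(protos[j], X[i])))
            sign = 1 if proto_labels[win] == y[i] else -1
            protos[win] = [p + sign * rate * (xv - p)
                           for p, xv in zip(protos[win], X[i])]
    return protos

def lvq_predict(protos, proto_labels, x):
    """Classify by the label of the nearest prototype."""
    win = min(range(len(protos)),
              key=lambda j: sum((a - b) ** 2 for a, b in zip(protos[j], x)))
    return proto_labels[win]
```

Because only a small set of prototypes is kept rather than the whole learning set, classification stays cheap even for large datasets, which is the property noted above.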

Machine Learning in Cardiology
Machine learning has an extraordinary ability to predict vital information from large sets of data [31], and physicians can make decisions based on the results of these techniques. The medical field is rich in information arising from laboratory examinations and clinical and physiological observations. Physicians and healthcare workers have started analysing patient data with organised algorithms that depend on constantly updated datasets to improve the ability to diagnose a disease or predict patient outcomes. In addition, machine learning cannot replace the physician, but physicians who utilise machine learning techniques will replace traditional physicians who do not keep pace with artificial intelligence techniques. This section lists contributions of machine learning techniques in diverse fields of cardiology in the five years from 2017 to 2021 that we consider of paramount importance, with a detailed description of each study in terms of the type of disease predicted, the number of patients, and the techniques employed in the prediction process, together with the conclusions of the study, as illustrated in Table 1. To clarify, the literature was selected based on two main factors: the number of techniques applied and the availability of accurate data about patients (patient cohorts).

Conclusion
More often than not, machine learning appears to some physicians as complicated mathematics, so their responses are not optimistic when they hear the term machine learning. However, there are also positive responses and a great desire to utilise machine learning techniques. These technologies have a promising future in predicting heart diseases, and they require cardiologists to interact and cooperate fully with their practice. For this collaboration to grow, cardiologists need high-level knowledge of the role played by these technologies, awareness of machine learning efforts in detecting and analysing heart disease, whether from patient data or from images, and knowledge of the potential shortcomings of machine learning techniques. Most published studies use machine learning techniques for analysing heart images and related tasks.
Nevertheless, these techniques play a significant role in extracting patient data and predicting disease, regardless of how large the patient data become or how much the training dataset grows in size and variety. Despite their significant effect, no published clinical trials have compared these techniques to human evaluation of the same datasets. Future controlled clinical trials are therefore required to demonstrate the proficiency and efficacy of these techniques in clinical practice. In addition, validation should be carried out not only on data from the same cohort employed in training but also on data from other cohorts, so that these techniques can be assessed more rigorously. In the future, further studies will examine the use of machine learning techniques in analysing heart data and predicting potential human diseases.