JOURNAL OF LIAONING TECHNICAL UNIVERSITY
(NATURAL SCIENCE EDITION)
LIAONING GONGCHENG JISHU DAXUE XUEBAO (ZIRAN KEXUE BAN)
辽宁工程技术大学学报(自然科学版)
The Role of Data Pre-Processing Techniques in Improving Machine Learning Accuracy for Predicting Coronary Heart Disease
Osamah Sami, Yousef Elsheikh, Fadi Almasalha
Abstract: In today's fast-paced world, many people work long hours to maintain a good standard of living and, as a result, pay little attention to a healthy lifestyle, including their diet and physical activity. These people are among the most likely to develop coronary heart disease. The heart pumps oxygen-rich blood to the rest of the body, and the coronary arteries supply the heart muscle itself with blood. A blockage or narrowing in one of these arteries can therefore cut off the blood supply to the heart, impair its ability to pump blood to the rest of the body, and cause a heart attack. Early prediction of coronary heart disease is thus important: it can prompt people at risk to adopt a healthier lifestyle and eating habits, helping them prevent the disease and avoid death. In this paper, we improve the accuracy of machine learning techniques in predicting coronary heart disease by applying feature processing techniques. Feature processing improves the performance of a machine learning model by improving the quality of its input features. The widely used Framingham Heart Study dataset was used for validation. The results indicate that feature processing improved the predictive accuracy of weakly performing classifiers and yielded satisfactory performance in determining the risk of coronary heart disease. Specifically, the Decision Tree classifier achieved a predictive accuracy of 91.39%, an increase of 1.39% over previous work; the Random Forest classifier achieved 92.80%, an increase of 2.7%; the KNN classifier achieved 92.68%, an increase of 3.64%; the Multilayer Perceptron Neural Network (MLP) classifier achieved 92.64%, an increase of 2.68%; and the Naïve Bayes classifier achieved 90.56%, an increase of 0.66%.
Keywords: Coronary heart disease, heart, machine learning, feature processing, classification technique.