Supporting Real World Decision Making in Coronary Diseases Using Machine Learning

Peter Kokol1, Jan Jurman2, Tajda Bogovič2, Tadej Završnik3, Jernej Završnik4, Helena Blažun4
1University of Maribor, FERI, 2University of Maribor Faculty of Electrical Engineering and Computer Science, 3University Clinical Medical Centre Maribor Maribor, 4Community Healthcare Centre Dr. Adolf Drolc Maribor


Abstract

CVD are one of the leading global causes of death. Following the positive experiences with machine learning in medicine we performed a study in which we assessed how machine learning can support decision making regarding coronary artery diseases. While a plethora of studies reported high accuracy rates of machine learning algorithms (MLA) in medical applications, the majority of the studies used the cleansed medical data bases without the presence of the “real world noise”. Contrary, the aim of our study was to perform machine learning on the routinely collected Anonymous Cardiovascular Database (ACD), extracted directly from a hospital information system of the University Medical Centre Maribor). Many studies used tens of different machine learning approaches with substantially varying results regarding accuracy (ACU), hence they were not usable as a base to validate the results of our study. Thus, we decided, that our study will be performed in the two phases. During the first phase we trained the different MLAs on a comparable University of California Irvine UCI Heart Disease Dataset. The aim of this phase was first to define the “standard” ACU values and second to reduce the set of all MLAs to the most appropriate candidates to be used on the ACD, during the second phase. Seven MLAs were selected and the standard ACUs for the two-class diagnosis were 0.85. Surprisingly, the same MLAs achieved the ACUs around 0.96 on the ACD. A general comparison of both databases revealed that different machine learning algorithms performance differ significantly. The accuracy on the ACD reached the highest levels using decision trees and neural networks while Liner regression and AdaBoost performed best in UCI database. This might indicate that decision trees based algorithms and neural networks are better in coping with real world not “noise free” clinical data.