Improving the detection of acute coronary syndrome using machine learning of blood biomarkers

Khaled Rjoob1, Victoria McGilligan2, Raymond Bond1, Steven Watterson3, Melody Chemaly2, Roisin McAlister2, Tiago De Melo Malaquias2, Stephen Leslie4, Charles Knoery5, Aleeha Iftikhar6, Anne McShane7, Anthony Bjourson2, Aaron Peace8
1Ulster University, 2Centre for Personalised Medicine/Northern Ireland Centre for Stratified Medicine, Ulster University, 3Centre for Personalised Medicine/Northern Ireland Centre for Stratified Medicine, Ulster University, Northern Ireland, 4Department of Diabetes & Cardiovascular Science, University of the Highlands and Islands, Centre for Health Science, Inverness, 5Department of Diabetes and Cardiovascular Science, University of the Highlands and Islands, 6Faculty of Computing, Engineering & Built Environment, Ulster University, 7Emergency Department, Letterkenny University Hospital, 8Western Health and Social Care Trust, C-TRIC, Ulster University


Abstract

Background: Acute coronary syndrome (ACS) is one of the main causes of death worldwide. The 12-lead electrocardiogram (ECG) is used to help diagnose ACS, along with clinical risk factors (smoking, diabetes mellitus, hypertension, and a positive family history of ACS). These methods, however, have many limitations, which result in variable sensitivity and specificity, so new methods are needed to improve ACS diagnosis. The aim of this study was to use a machine learning (ML) approach to identify an optimum panel of blood protein biomarkers capable of independently diagnosing ACS.

Methods: Hybrid feature selection (a backward algorithm combined with a filter method) and two ML classifiers, 1) a decision tree (DT) and 2) logistic regression, were applied to protein biomarker data (367 proteins) collected from patients with ACS (n=91) or stable angina (n=97).

Results: Using this approach, a panel of 20 of the 367 proteins distinguished ACS from stable angina patients under 5-fold cross-validation. Logistic regression achieved ROC-AUC=0.8 (confidence interval (CI) [0.69, 0.9]) and accuracy=82.5% (CI [0.72, 0.92]); the DT achieved ROC-AUC=0.6 (CI [0.47, 0.72]) and accuracy=64.9% (CI [0.52, 0.77]). The other 347 proteins were excluded as they did not improve performance.

Conclusion: Logistic regression outperformed the DT and showed promising results, uncovering a panel of 20 protein biomarkers (including those associated with progressive atherosclerotic plaques, myocardial injury and inflammation) able to discriminate between patients with ACS and stable angina. Validation is now required using other ML algorithms and new feature selection methods to confirm performance.
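The hybrid feature selection described in the Methods can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not state the filter criterion, the wrapper's scoring rule, or the software used, so the sketch assumes a Pearson-correlation filter, a backward elimination stage scored by 5-fold cross-validated accuracy, and a plain gradient-descent logistic regression. All function names are hypothetical and the data are synthetic (one informative feature plus five noise features), not the study's protein measurements.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_stage(X, y, keep):
    """Filter method (assumed criterion): keep the features most correlated with the label."""
    ranked = sorted(
        range(len(X[0])),
        key=lambda j: abs(pearson([row[j] for row in X], y)),
        reverse=True,
    )
    return sorted(ranked[:keep])


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Logistic regression fitted by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - label
            gb += err
            for j, xi in enumerate(row):
                gw[j] += err * xi
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def cv_accuracy(X, y, features, folds=5):
    """Accuracy pooled over k interleaved folds, using only the given feature columns."""
    Xs = [[row[j] for j in features] for row in X]
    correct = 0
    for k in range(folds):
        test = list(range(k, len(Xs), folds))  # every folds-th sample, offset k
        held_out = set(test)
        train = [i for i in range(len(Xs)) if i not in held_out]
        w, b = fit_logistic([Xs[i] for i in train], [y[i] for i in train])
        for i in test:
            z = b + sum(wi * xi for wi, xi in zip(w, Xs[i]))
            correct += int((z > 0) == (y[i] == 1))
    return correct / len(Xs)


def backward_elimination(X, y, features, target):
    """Wrapper stage: repeatedly drop the feature whose removal leaves the best-scoring subset."""
    current = list(features)
    while len(current) > target:
        candidates = [[f for f in current if f != drop] for drop in current]
        current = max(candidates, key=lambda subset: cv_accuracy(X, y, subset))
    return current


# Synthetic demonstration: feature 0 carries the signal, features 1-5 are noise.
random.seed(0)
y = [i % 2 for i in range(60)]
X = [[3.0 * label + random.gauss(0, 1)] + [random.gauss(0, 1) for _ in range(5)]
     for label in y]

kept = filter_stage(X, y, keep=4)                    # filter: 6 -> 4 features
panel = backward_elimination(X, y, kept, target=2)   # wrapper: 4 -> 2 features
print("after filter:", kept)
print("final panel:", panel, "CV accuracy:", round(cv_accuracy(X, y, panel), 3))
```

In this sketch the inexpensive filter first narrows the candidate set, so the costlier wrapper stage only has to refit the classifier on a small number of subsets. Because the features here are chosen using the same samples that are later scored, the resulting accuracy is optimistic; a stricter design would repeat the selection inside each training fold.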