Demystifying Heart Failure with Mid-Range Ejection Fraction using Machine Learning

Achal Dixit and Soumi Chattopadhyay
Indian Institute of Information Technology Guwahati


The treatment of heart failure (HF) patients with mid-range ejection fraction (HFmrEF) is challenging due to the prognostic uncertainty and transitional behaviour of HFmrEF, often referred to as the “grey zone”. In this study, we aim to address the uncertainty in the prognosis of HFmrEF through Machine Learning (ML). The dataset used in this study includes 496 patients (267 HF with preserved EF (HFpEF), 117 HF with reduced EF (HFrEF), and 112 HFmrEF) with 11 clinical attributes, admitted between December 2016 and June 2019 to Zigong Fourth People’s Hospital, Sichuan, China. HFrEF is oversampled to 267 using adaptive synthetic minority oversampling (ADASYN) to resolve class imbalance. Various studies indicate that HFmrEF patients not only exhibit characteristics similar to the primary phenotypes, HFpEF and HFrEF, but also transition into them. Thus, we classify HFmrEF into the primary phenotypes based on data from clinical attributes through ML. We formulate this as a semi-supervised classification problem and develop an Active Learning (AL) based model to solve it. Furthermore, we compare it with approaches developed using Logistic Regression (LR) and Random Forest (RF). For AL, RF and Multi-Layer Perceptron (MLP) are used as base estimators. Through AL, we establish better generalizability via a semi-supervised labelling strategy with significantly less data by querying only uncertain labels and, thereby, assigning HFmrEF samples to their respective class. Models were evaluated using 5-fold cross-validation. The LR and RF models had accuracies of 85% and 88% and ROC-AUC of 0.86 and 0.89, respectively. AL with RF had an accuracy of 88% and ROC-AUC of 0.88. AL with MLP had an accuracy of 86% and ROC-AUC of 0.85. Both AL estimators required 43% less data with validation. The top predictors were left ventricular end-diastolic diameter and brain natriuretic peptide levels.
The proposed ML model effectively classifies HFmrEF patients, who are otherwise considered to be in the “grey zone”, into the primary phenotypes to determine their prognosis.
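
The querying strategy described above (request labels only for the samples the model is least certain about, then assign the remaining samples to a class) can be illustrated with a minimal pool-based uncertainty-sampling sketch. Everything in it is an assumption made for illustration, not the study's implementation: the data are synthetic two-feature clusters standing in for the two primary phenotypes, a small hand-written logistic model replaces the RF and MLP base estimators, and the names `active_learn`, `oracle`, and `grey_zone` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, epochs=300, lr=0.1):
    """Fit a tiny logistic regression by batch gradient descent.

    Stand-in base estimator (assumption); the study uses RF and MLP.
    """
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability that sample x belongs to class 1."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def active_learn(X_pool, oracle, seed_idx, n_queries):
    """Pool-based active learning with least-confidence querying."""
    labeled = list(seed_idx)
    for _ in range(n_queries):
        model = fit_logistic([X_pool[i] for i in labeled],
                             [oracle[i] for i in labeled])
        known = set(labeled)
        unlabeled = [i for i in range(len(X_pool)) if i not in known]
        # Query the sample whose predicted probability is closest to 0.5,
        # i.e. the one the current model is least certain about.
        query = min(unlabeled,
                    key=lambda i: abs(predict_proba(model, X_pool[i]) - 0.5))
        labeled.append(query)
    # Refit once more so the final model uses every queried label.
    model = fit_logistic([X_pool[i] for i in labeled],
                         [oracle[i] for i in labeled])
    return model, labeled


# Synthetic stand-in data (assumption): two Gaussian clusters, 100 samples each.
rng = random.Random(0)
X_pool = [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(100)]
X_pool += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)]
oracle = [0] * 100 + [1] * 100  # true labels, revealed only when queried

model, labeled = active_learn(X_pool, oracle,
                              seed_idx=[0, 1, 100, 101], n_queries=20)

predictions = [int(predict_proba(model, x) >= 0.5) for x in X_pool]
accuracy = sum(p == t for p, t in zip(predictions, oracle)) / len(oracle)

# Hypothetical "grey-zone" samples assigned to one of the two classes.
grey_zone = [[-1.5, -1.0], [1.0, 2.0]]
grey_labels = [int(predict_proba(model, x) >= 0.5) for x in grey_zone]

print(f"labels used: {len(labeled)} of {len(X_pool)}, accuracy: {accuracy:.2f}")
print(f"grey-zone assignments: {grey_labels}")
```

In this sketch the model sees only 24 of the 200 available labels (4 seed samples plus 20 queries), which mirrors the idea of reaching comparable performance with less labelled data; the figures it prints say nothing about the results reported in the study.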