Interpretability Analysis of Machine Learning Algorithms in the Detection of ST-Elevation Myocardial Infarction

Matteo Bodini1, Massimo W Rivolta2, Roberto Sassi3
1Università degli Studi di Milano, 2Dipartimento di Informatica, Università degli Studi di Milano, 3Università degli Studi di Milano, Dipartimento di Informatica


Aims: Recent studies suggest that ST-Elevation Myocardial Infarction (STEMI) can be detected from the ECG using machine learning (ML) algorithms. However, most ML algorithms lack interpretability, since they do not provide any justification for their decisions. In this study, we provide an interpretability analysis of Random Forest models trained for STEMI classification.

Methods: We employed ECG signals from the PTB database, which included 148 STEMI patients whose infarct anatomical position was annotated and 52 healthy controls (HC). After standard ECG preprocessing, signals from the anterior, antero-lateral, antero-septal, inferior, and infero-lateral infarct classes and from the HC class were selected. We computed multi-lead average ECG beats and ST segments to obtain two different feature vector representations. We trained five Random Forest (RF) models in a binary classification approach: for each model, we considered one STEMI infarct class and the HC class, using the two vector representations as input. Then, we used the Local Interpretable Model-agnostic Explanations (LIME) method to highlight the parts of the input that contributed most to the classification. LIME interpretations were validated against the annotated anatomical position of the myocardial infarction.
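The pipeline described above (an RF binary classifier followed by a local surrogate explanation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic noise rather than PTB recordings, the 12 x 8 feature layout, the choice of informative leads, and all hyperparameters are hypothetical, and the explanation step is a simplified hand-written LIME-style surrogate (Gaussian perturbations, exponential proximity kernel, weighted ridge regression) rather than the LIME library itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

N_LEADS, N_POINTS = 12, 8          # assumed layout: 12 leads x 8 ST-segment samples
INFORMATIVE_LEADS = [6, 7, 8, 9]   # hypothetical leads carrying the simulated elevation

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the PTB recordings): Gaussian noise plus an
# offset on the informative leads for the simulated "infarct" class.
n_subjects = 200
X = rng.normal(size=(n_subjects, N_LEADS * N_POINTS))
y = rng.integers(0, 2, size=n_subjects)
for lead in INFORMATIVE_LEADS:
    X[y == 1, lead * N_POINTS:(lead + 1) * N_POINTS] += 1.5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
accuracy = rf.score(X_test, y_test)


def lime_style_weights(predict_proba, x, feature_scale,
                       n_perturb=2000, ridge=1.0, seed=0):
    """Fit a locally weighted ridge surrogate around x; return its coefficients."""
    local_rng = np.random.default_rng(seed)
    d = x.size
    Z = local_rng.normal(size=(n_perturb, d))    # perturbations in standardized units
    Z[0] = 0.0                                   # keep the unperturbed instance
    target = predict_proba(x + Z * feature_scale)[:, 1]
    kernel_width = 0.75 * np.sqrt(d)
    w = np.exp(-np.sum(Z ** 2, axis=1) / kernel_width ** 2)  # proximity weights
    A = np.hstack([np.ones((n_perturb, 1)), Z])  # intercept column + features
    penalty = ridge * np.eye(d + 1)
    penalty[0, 0] = 0.0                          # do not penalize the intercept
    coef = np.linalg.solve(A.T @ (w[:, None] * A) + penalty, A.T @ (w * target))
    return coef[1:]                              # drop the intercept


# Explain one simulated "infarct" test instance and aggregate relevance per lead.
x0 = X_test[y_test == 1][0]
weights = lime_style_weights(rf.predict_proba, x0, X_train.std(axis=0))
lead_relevance = np.abs(weights).reshape(N_LEADS, N_POINTS).sum(axis=1)
top_leads = np.argsort(lead_relevance)[::-1][:4]
print(f"test accuracy: {accuracy:.2f}")
print("most relevant leads (indices):", top_leads.tolist())
```

Summing absolute surrogate coefficients per lead is one simple way to turn a per-sample explanation into a per-lead relevance score; other aggregations (signed sums, per-segment maxima) are equally plausible choices.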

Results: RF models achieved a high test set accuracy (ranging from 0.84 to 0.92) for both feature representations. When average ST segments were used, RF models relied on leads that were anatomically related to the considered infarct. In contrast, when average beats were used, for antero-septal and inferior infarcts LIME identified areas within the QRS complexes, rather than within the ST segment as expected, as the most relevant ones for the RF decision.
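One way to quantify whether the leads highlighted by an explanation match the infarct location is to compare the top-ranked leads with the leads conventionally associated with that location. The sketch below is illustrative only: the lead-to-location mapping is a commonly used textbook convention assumed here (the abstract does not list the leads it used), the function name `lead_agreement` is hypothetical, and the relevance values are made-up toy numbers, not results from the study.

```python
# Assumed textbook-style mapping from infarct location to ECG leads.
# The abstract does not specify this mapping; treat it as a placeholder.
EXPECTED_LEADS = {
    "anterior": {"V1", "V2", "V3", "V4"},
    "antero-septal": {"V1", "V2", "V3", "V4"},
    "antero-lateral": {"I", "aVL", "V3", "V4", "V5", "V6"},
    "inferior": {"II", "III", "aVF"},
    "infero-lateral": {"II", "III", "aVF", "V5", "V6"},
}


def lead_agreement(relevance, expected):
    """Fraction of the top-k most relevant leads that fall in `expected`,
    where k is the number of expected leads."""
    k = len(expected)
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & expected) / k


# Toy per-lead relevance scores (fabricated for illustration only).
toy_relevance = {
    "I": 0.02, "II": 0.30, "III": 0.25, "aVR": 0.05, "aVL": 0.03, "aVF": 0.20,
    "V1": 0.04, "V2": 0.01, "V3": 0.015, "V4": 0.025, "V5": 0.035, "V6": 0.045,
}

inferior_score = lead_agreement(toy_relevance, EXPECTED_LEADS["inferior"])
anterior_score = lead_agreement(toy_relevance, EXPECTED_LEADS["anterior"])
print(inferior_score, anterior_score)
```

A score near 1 indicates that the explanation concentrates on the anatomically expected leads, while a low score flags a model that may be relying on unrelated leads. A lead-level score of this kind cannot by itself detect the within-beat issue reported above (relevance falling on the QRS complex instead of the ST segment); that would require the same comparison over time windows within the beat.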

Conclusion: Our study suggests that, despite high test set accuracy, ML algorithms for STEMI classification trained on small or unbalanced/biased populations may rely on features that are not clinically significant. In this regard, interpretability algorithms like LIME may help reveal such pitfalls.