This study proposes a machine learning algorithm, built on feature engineering, for the classification of eight abnormalities in 12-lead ECG recordings. Special attention is paid to the rhythm and morphological changes induced by these abnormalities.
An ensemble of filters was created to remove baseline wander, high-frequency noise and artefacts from the ECG leads. The best-quality lead was then selected, based on signal-to-noise ratios, and used for R-peak detection. The feature-engineering phase produced two sets of features, namely global and per-beat features. These sets include commonly used features as well as novel handcrafted features that exploit, in different ways, the spatio-temporal information hidden in the 12-lead ECG recordings. To this end, two new representations are proposed. First, the 3D autocorrelation of the ECG, projected onto the vectorcardiogram space, is used to extract short- and long-term correlations within the recordings. Second, features are extracted from different tensor representations and low-rank decompositions of the 12-lead ECG. This resulted in a total of 2607 features, of which the 200 most discriminative were selected based on the importance measured by the RUSBoost algorithm, which is designed to handle the classification of imbalanced data. The selected features were used to train a support vector machine classifier. Using the scoring function provided by the Challenge 2020, G-mean values of 0.68 and 0.59 were achieved for the training dataset with 5-fold cross-validation and for the hidden test set, respectively.
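Two steps of the pipeline above, lead selection by signal-to-noise ratio and importance-based feature selection, can be illustrated with a minimal sketch. It is not the implementation used in this study, and it rests on several assumptions: the SNR is approximated as the ratio of spectral power inside a 5 to 40 Hz band to the power outside it, the function names `band_power_snr`, `select_best_lead` and `top_k_features` are hypothetical, the 12-lead recording is synthetic, and the importance scores are random placeholders standing in for the scores that RUSBoost would provide.

```python
import numpy as np


def band_power_snr(x, fs, band=(5.0, 40.0)):
    """Crude SNR proxy in dB: power inside `band` over power outside it.

    The 5-40 Hz band is an assumption of this sketch, not a value
    taken from the study.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the DC offset
    power = np.abs(np.fft.rfft(x)) ** 2       # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    signal_power = power[in_band].sum()
    noise_power = power[~in_band].sum() + 1e-12   # avoid division by zero
    return 10.0 * np.log10(signal_power / noise_power)


def select_best_lead(ecg, fs):
    """Return the index of the lead with the highest SNR proxy, and all SNRs.

    `ecg` has shape (n_leads, n_samples).
    """
    snrs = np.array([band_power_snr(lead, fs) for lead in ecg])
    return int(np.argmax(snrs)), snrs


def top_k_features(importances, k=200):
    """Return the sorted column indices of the k largest importance scores."""
    order = np.argsort(importances)[::-1]     # descending importance
    return np.sort(order[:k])


# Synthetic 12-lead recording: a 10 Hz oscillation stands in for the ECG,
# corrupted by 0.3 Hz baseline wander and white noise. Lead 7 is made the
# cleanest lead on purpose.
rng = np.random.default_rng(0)
fs = 500.0
t = np.arange(5000) / fs
noise_scale = np.ones(12)
noise_scale[7] = 0.05
clean = np.sin(2 * np.pi * 10.0 * t)
wander = np.sin(2 * np.pi * 0.3 * t)
ecg = clean + noise_scale[:, None] * (wander + rng.standard_normal((12, t.size)))

best_lead, snrs = select_best_lead(ecg, fs)

# Placeholder importances for 2607 features; keep the 200 highest.
importances = rng.random(2607)
selected = top_k_features(importances, k=200)
print(best_lead, selected.shape)
```

In a full pipeline, the placeholder `importances` would be replaced by the scores of a fitted boosting model, and the columns indexed by `selected` would be passed to the support vector machine classifier.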
Although our short-term goal is to improve classification performance, we will focus on the interpretability and usability of the new features for the study of each abnormality separately. In this way, considering the spatio-temporal properties of these ECG recordings may unveil insights into the morphological and rhythm changes.