Classification of 12-lead ECGs using Gradient Boosting on features acquired with domain-specific and domain-agnostic methods

Durmus Umutcan Uguz, Felix Berief, Steffen Leonhardt, Christoph Hoog Antink
RWTH Aachen University


This year, the objective of the Computing in Cardiology (CinC) challenge was the classification of 12-lead electrocardiograms (ECGs). The 12-lead ECG is an important tool for detecting cardiac abnormalities and allows for a variety of analysis methods. The approach presented in this paper is separated into two parts: feature extraction and classification.

First, domain-specific features are extracted from each ECG lead separately using statistical techniques, wavelet transforms, and other algorithms. Second, a set of domain-agnostic features is obtained by fusing the 12 leads with a convolutional neural network. All features are then combined and classified by gradient-boosted trees. Both feature extraction and classification were implemented in Python using several open-source software packages.
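The combination of per-lead and fused features can be illustrated with a minimal sketch. The abstract does not list the individual features, so the summary statistics below (mean, standard deviation, minimum, maximum) are placeholders, and the function names `lead_features` and `recording_features` are hypothetical.

```python
import statistics


def lead_features(samples):
    """Summary statistics for one lead (placeholder feature set, an assumption)."""
    return [
        statistics.fmean(samples),
        statistics.pstdev(samples),
        min(samples),
        max(samples),
    ]


def recording_features(leads, agnostic_features=()):
    """Concatenate the features of every lead, then append domain-agnostic ones.

    `leads` is a sequence of per-lead sample sequences (12 for a 12-lead ECG);
    `agnostic_features` stands in for features produced by a separate model.
    """
    vector = []
    for samples in leads:
        vector.extend(lead_features(samples))
    vector.extend(agnostic_features)
    return vector
```

With four placeholder statistics per lead, a 12-lead recording yields 48 domain-specific values, followed by however many domain-agnostic values are appended.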

Because the problem is both multi-class and multi-label, a One-vs-Rest scheme is used: a separate binary classifier is trained for each class and decides whether a sample belongs to that class. This yields a highly imbalanced training set for each classifier, and the imbalance is mitigated by assigning a higher weight to the positive samples. The classifiers were trained using the XGBoost library with default parameters except for "max_depth" (5) and "scale_pos_weight" (20). Because they currently cause overfitting, the features of the convolutional neural network are not included in the classifiers of the unofficial-phase submission.
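The One-vs-Rest scheme described above can be sketched as follows. This is a minimal illustration and not the submitted code: the helper names `one_vs_rest_targets` and `train_one_vs_rest` are hypothetical, and only the two parameter values are taken from the text. The sketch assumes the `xgboost` and `numpy` packages are installed and that every class has both positive and negative samples.

```python
# Non-default parameters named in the text; everything else stays at its default.
XGB_PARAMS = {"max_depth": 5, "scale_pos_weight": 20}


def one_vs_rest_targets(label_sets, classes):
    """Turn multi-label annotations into one binary target list per class."""
    return {
        cls: [1 if cls in labels else 0 for labels in label_sets]
        for cls in classes
    }


def train_one_vs_rest(features, label_sets, classes):
    """Fit one binary gradient-boosted classifier per class (hypothetical helper)."""
    # Imported lazily so the label handling above works without these packages.
    import numpy as np
    from xgboost import XGBClassifier

    X = np.asarray(features, dtype=float)
    models = {}
    for cls, target in one_vs_rest_targets(label_sets, classes).items():
        clf = XGBClassifier(**XGB_PARAMS)
        clf.fit(X, np.asarray(target))
        models[cls] = clf
    return models
```

In XGBoost, "scale_pos_weight" scales the weight of the positive class, so a value of 20 makes each positive sample count 20 times as much as a negative one during training.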

Using 5-fold cross-validation, scores of 0.81 (f_beta) and 0.612 (g_beta) were achieved on the public data, and scores of 0.781 (f_beta) and 0.58 (g_beta) on the private data. These results will serve as a baseline for improvements with more complex features in the official phase.
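The abstract does not define the two metrics. As a hedged sketch, the code below assumes the definitions f_beta = (1 + beta^2) TP / ((1 + beta^2) TP + FP + beta^2 FN) and a Jaccard-style g_beta = TP / (TP + FP + beta FN), with beta = 2 as an assumed default. It omits any per-recording normalization an official scoring script may apply, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from true positives, false positives, and false negatives."""
    denom = (1 + beta**2) * tp + fp + beta**2 * fn
    return (1 + beta**2) * tp / denom if denom else 0.0


def g_beta(tp, fp, fn, beta=2.0):
    """Jaccard-style measure that penalizes false negatives by a factor beta."""
    denom = tp + fp + beta * fn
    return tp / denom if denom else 0.0


def macro_average(metric, counts_per_class, beta=2.0):
    """Average a metric over classes; each entry is a (tp, fp, fn) tuple."""
    scores = [metric(tp, fp, fn, beta) for tp, fp, fn in counts_per_class]
    return sum(scores) / len(scores)
```

For example, a class with 8 true positives, 2 false positives, and 1 false negative gives f_beta = 40/46 and g_beta = 8/12 under these assumed definitions.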