Semi-supervised Learning for ECG Classification

Rui Rodrigues
FCT Nova


In the context of the 2021 PhysioNet/CinC Challenge, we built a Deep Learning classifier for cardiac abnormalities. Our model is divided into two parts: an ECG Encoder followed by a Classifier. The Encoder produces a code for each ECG segment. This code is learned so that an auxiliary module can use it for two tasks: detecting the QRS location in the given segment and predicting the ECG waveform that follows the segment. The Classifier receives, for each ECG, the code produced by the Encoder and outputs the list of cardiac abnormalities present in the ECG. The Classifier is trained on the 43,101 labeled examples provided by the Challenge. The Encoder, in contrast, can be trained on a much larger number of examples, because once a QRS detector has been applied, any ECG segment can serve as a labeled training example. This approach is inspired by the semi-supervised learning techniques used in Deep Learning for Natural Language Processing, and its objective is to reduce overfitting of the model to the training dataset. To handle variable ECG lengths, both the Encoder and the Classifier use a recurrent neural network, namely a stack of LSTMs. All ECG features used by the Encoder are learned jointly with the whole model through a 2-layer 1-D convolutional neural network. We train a different Encoder and the corresponding Classifier for each subset of ECG leads considered in the Challenge (12, 6, 3 and 2 leads). We use half of the Georgia dataset, 5,344 records, as validation data. In the unofficial phase, our model used only two leads, either II and V5 or I and II; therefore, the results are the same for all lead combinations. We report the Challenge metric scores: 0.287 on the training dataset, 0.369 on the validation dataset, and 0.35 on the test dataset (public Challenge leaderboard, team matFCT).
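The key point behind the semi-supervised Encoder training is that unlabeled ECG records become supervised examples once QRS locations are known. The sketch below shows one possible way to build such auxiliary examples. It is a minimal illustration under stated assumptions, not the author's implementation: the function name `make_auxiliary_examples`, the window length `seg_len`, the prediction `horizon`, the `stride`, and the encoding of the QRS target as a binary mask of `qrs_halfwidth` samples around each detected complex are all hypothetical choices. The QRS detector itself is assumed to be run beforehand, and its output sample indices are passed in as `qrs_idx`.

```python
import numpy as np


def make_auxiliary_examples(ecg, qrs_idx, seg_len=256, horizon=64,
                            stride=128, qrs_halfwidth=5):
    """Turn one unlabeled ECG record into auxiliary training examples.

    Hypothetical sketch: parameter values and target encodings are assumptions.
    ecg:     array of shape (n_leads, n_samples)
    qrs_idx: sample indices of QRS complexes from an external QRS detector
    Returns:
      segments     (N, n_leads, seg_len)  input windows for the Encoder
      qrs_targets  (N, seg_len)           1.0 near a detected QRS, else 0.0
      future       (N, n_leads, horizon)  waveform following each window
    """
    ecg = np.asarray(ecg, dtype=np.float32)
    n_leads, n_samples = ecg.shape

    # Assumed QRS target: a binary mask marking samples close to each detected QRS.
    mask = np.zeros(n_samples, dtype=np.float32)
    for q in np.asarray(qrs_idx, dtype=int):
        lo = max(q - qrs_halfwidth, 0)
        hi = min(q + qrs_halfwidth + 1, n_samples)
        mask[lo:hi] = 1.0

    segments, qrs_targets, future = [], [], []
    # Each window needs seg_len input samples plus `horizon` samples to predict.
    for start in range(0, n_samples - seg_len - horizon + 1, stride):
        end = start + seg_len
        segments.append(ecg[:, start:end])
        qrs_targets.append(mask[start:end])
        future.append(ecg[:, end:end + horizon])

    if not segments:  # record too short for a single window
        return (np.empty((0, n_leads, seg_len), np.float32),
                np.empty((0, seg_len), np.float32),
                np.empty((0, n_leads, horizon), np.float32))
    return np.stack(segments), np.stack(qrs_targets), np.stack(future)


# Usage with a synthetic two-lead record and hand-placed QRS indices.
rng = np.random.default_rng(0)
record = rng.standard_normal((2, 1000))
segs, qrs_t, fut = make_auxiliary_examples(record, qrs_idx=[100, 400, 700])
print(segs.shape, qrs_t.shape, fut.shape)
```

Because the targets come only from the signal and the QRS detector, this step can be applied to any ECG collection, labeled or not, which is what allows the Encoder to see far more data than the 43,101 labeled records used to train the Classifier.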