Rhythm classification of 12-lead ECGs using deep neural networks and class-activation maps for improved explainability

Sebastian Goodfellow¹, Dmitrii Shubin², Danny Eytan², Andrew Goodwin², Anusha Jega², Azadeh Assadi², Mjaye Mazwi², Robert Greer², Sujay Nagaraj², Christian Esposito², Peter Laussen², William Dixon²
¹The Hospital for Sick Children; ²Hospital for Sick Children


In this study, a deep convolutional neural network was trained to classify 12-lead ECG waveforms into one of 9 rhythm classes. The dataset consisted of 6,877 labelled ECG segments and was split 80/20 into training and validation sets, stratified by rhythm class. Our approach had two objectives: first, to build a classifier that performs competitively with the top submissions to the 2020 PhysioNet/CinC Challenge; and second, to extract class activation maps that show which regions of the waveform, and which leads, the model focuses on when making a classification.
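The abstract does not describe how the stratified 80/20 split was implemented. The sketch below shows one straightforward way to do it, assuming exactly one rhythm label per segment: segment indices are grouped by class, shuffled with a fixed seed, and each class is cut at 80%. `stratified_split` is a hypothetical helper written for illustration; scikit-learn's `train_test_split(..., stratify=labels)` is a common library alternative.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.8, seed=0):
    """Return (train_idx, val_idx) so that each class is split train_frac / (1 - train_frac).

    labels: one rhythm-class label per ECG segment (single-label assumption).
    Hypothetical helper for illustration, not the authors' code.
    """
    # Group segment indices by their rhythm class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, val_idx = [], []
    for label in sorted(by_class):  # deterministic class order
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Cut each class separately so class proportions match in both sets.
        n_train = round(train_frac * len(idxs))
        train_idx.extend(idxs[:n_train])
        val_idx.extend(idxs[n_train:])
    return sorted(train_idx), sorted(val_idx)


labels = ["AF"] * 10 + ["Normal"] * 5
train, val = stratified_split(labels)
print(len(train), len(val))  # 12 3
```

Because each class is rounded separately, the overall split can deviate slightly from exactly 80/20; with 9 classes the total deviation is at most a few segments.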

The input to our model is a 12-channel array, 60 seconds in duration, sampled at 500 Hz (30,000 samples per lead). Segments shorter than 60 seconds were symmetrically zero-padded. The model stem consists of three layers that downsample the input signal to reduce GPU memory usage. The output of the stem is fed into a series of 7 residual layers modelled after those of WaveNet, the only difference being that the convolutions are non-causal. The 7 skip connections are summed and passed to a series of output layers consisting of 1D convolution, ReLU activation, and dropout. Class activation maps were generated using a global average pooling layer placed before the final sigmoid layer. The resulting class activation maps have a temporal resolution of 8 ms.
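The input handling and the class activation map (CAM) computation can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the extra zero of an odd padding is assumed to go on the right, and the CAM follows the standard formulation in which each class map is a weighted sum of the final feature maps plus a bias, so that global average pooling of a class map equals that class's logit. At 500 Hz each input sample spans 2 ms, so an 8 ms CAM resolution implies an overall downsampling factor of 4, i.e. 7,500 CAM steps for a 60 s input. The feature maps, which in the model would come from the stem, residual layers, and output layers, are simply taken as given here.

```python
import math

FS_HZ = 500                       # sampling rate stated in the abstract
DURATION_S = 60                   # fixed input duration stated in the abstract
N_SAMPLES = FS_HZ * DURATION_S    # 30,000 samples per lead
CAM_STEP_S = 0.008                # CAM temporal resolution stated in the abstract (8 ms)


def symmetric_zero_pad(leads, target_len=N_SAMPLES):
    """Zero-pad each lead equally on both sides up to target_len samples.

    Assumption: when the amount of padding is odd, the extra zero goes on the right.
    """
    padded = []
    for lead in leads:
        missing = target_len - len(lead)
        if missing < 0:
            raise ValueError("segment is longer than the model input")
        left = missing // 2
        right = missing - left
        padded.append([0.0] * left + list(lead) + [0.0] * right)
    return padded


def class_activation_maps(features, weights, biases):
    """Compute CAM_c(t) = b_c + sum_k w[c][k] * f_k(t) for every class c.

    features: K feature maps of equal length T (last layer before pooling).
    weights: C x K class weights; biases: C class biases (hypothetical shapes).
    """
    n_steps = len(features[0])
    cams = []
    for w_c, b_c in zip(weights, biases):
        cam = [b_c + sum(w * f[t] for w, f in zip(w_c, features)) for t in range(n_steps)]
        cams.append(cam)
    return cams


def predict(cams):
    """Global average pooling over time, then an independent sigmoid per class."""
    logits = [sum(cam) / len(cam) for cam in cams]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def peak_time_s(cam):
    """Time in seconds of the largest CAM value, at 8 ms per CAM step."""
    t_idx = max(range(len(cam)), key=cam.__getitem__)
    return t_idx * CAM_STEP_S


# Usage: 12 leads of 58 s dummy signal padded to the 60 s model input.
x = symmetric_zero_pad([[1.0] * 29_000 for _ in range(12)])
print(len(x), len(x[0]))  # 12 30000

# Toy example with K = 2 feature maps, T = 4 steps, C = 2 classes.
feats = [[0.0, 1.0, 3.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
cams = class_activation_maps(feats, weights=[[2.0, 0.0], [0.0, -1.0]], biases=[0.0, 0.5])
print(cams[0])  # [0.0, 2.0, 6.0, 0.0]
```

Because CAM times are measured from the start of the padded array, the left padding (`missing // 2` samples at 2 ms each) would need to be subtracted to align a CAM peak with the original segment.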

We have submitted an unofficial entry that achieved an F-measure of 0.055; however, this entry used an untrained model and was submitted only to meet the unofficial-phase deadline. The class activation maps will give clinicians some degree of interpretability, which may be important for the adoption of this technology, and will help data scientists identify architecture modifications that improve model performance.