Deep Multi-task Learning with Unsupervised Representation Learning for Improved Generalization Across Datasets in 12-Lead ECG Classification

Kuk Jin Jang, Renukanandan Tumu, Nicole Chiou
University of Pennsylvania


Abstract

In recent years, data-driven approaches and deep neural networks applied to the medical domain have been shown to perform at the level of expert clinicians in tasks such as medical image diagnosis. Despite such successes in limited settings, generalizing performance across different tasks and datasets remains difficult for automated detection and classification of cardiac abnormalities using the electrocardiogram (ECG).

In this work, to address generalization across tasks, we formulate cardiac abnormality detection as a multi-task learning (MTL) problem in which each task is the detection of one type of abnormality. The intuition is that relationships exist between tasks and can be exploited to improve overall performance. We will compare classical hard parameter sharing with more recent deep neural network-based approaches such as deep relationship networks and cross-stitch networks, which use soft parameter sharing. To improve generalization across datasets, we will incorporate unsupervised representation learning methods such as variational auto-encoders to learn a context variable derived from the latent space. This context variable can be used to fine-tune the weights of a trained network and improve performance on a specific dataset. Finally, the architecture itself can greatly affect performance, and optimizing the structure of the network and its hyperparameters is a non-trivial task. We will apply neural architecture search methods such as AutoML and gradient-free swarm optimization approaches to further improve the performance of MTL.
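
To make the two parameter-sharing styles concrete, the following is a minimal sketch and not the network used in this work. It assumes a flat feature vector per recording, and all names and sizes (`HardSharingModel`, `cross_stitch`, 12 input features, 8 hidden units, 3 tasks) are hypothetical choices for illustration. Hard parameter sharing is shown as one shared trunk feeding a separate binary head per abnormality; a cross-stitch unit is shown as a 2x2 linear mixing of two task-specific activations.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class HardSharingModel:
    """Hard parameter sharing: a shared trunk plus one binary head per task."""

    def __init__(self, n_features, n_hidden, n_tasks, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(n_hidden, n_features, rng)  # shared by all tasks
        self.heads = [init_layer(1, n_hidden, rng) for _ in range(n_tasks)]

    def predict(self, x):
        shared = relu(linear(x, *self.trunk))
        # Each head outputs the probability of one abnormality.
        return [sigmoid(linear(shared, *head)[0]) for head in self.heads]


def cross_stitch(act_a, act_b, alpha):
    """Soft sharing: mix two task-specific activations with a 2x2 matrix."""
    mixed_a = [alpha[0][0] * a + alpha[0][1] * b for a, b in zip(act_a, act_b)]
    mixed_b = [alpha[1][0] * a + alpha[1][1] * b for a, b in zip(act_a, act_b)]
    return mixed_a, mixed_b


# Hypothetical sizes: 12 features, 8 hidden units, 3 abnormality tasks.
model = HardSharingModel(n_features=12, n_hidden=8, n_tasks=3)
probs = model.predict([0.1] * 12)
```

In a trained cross-stitch network the entries of `alpha` would be learned along with the other weights; an identity matrix keeps the two tasks fully separate, while equal entries share activations evenly.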

Our first attempts achieved an average score of 0.074/0.030 (F-beta measure/G-beta measure) in cross-validation during training and 0.072/0.029 on the validation set. We expect significant improvements as more sophisticated approaches are attempted.
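
For readers unfamiliar with the two scores, the sketch below shows one common way to compute them from per-class counts of true positives, false positives, and false negatives. The definitions are assumptions and are not taken from this abstract: the F-beta measure is the standard one, the G-beta measure is taken to be a Jaccard-style ratio that weights false negatives by beta, and beta = 2 and the zero-denominator convention are illustrative choices. The exact scoring procedure behind the numbers above may differ, for example in how classes or recordings are weighted.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """Standard F-beta measure from per-class counts."""
    denom = (1 + beta ** 2) * tp + fp + beta ** 2 * fn
    # Assumed convention: score 0.0 when the class has no positives at all.
    return (1 + beta ** 2) * tp / denom if denom else 0.0


def g_beta(tp, fp, fn, beta=2.0):
    """Assumed G-beta: Jaccard-style ratio with false negatives weighted by beta."""
    denom = tp + fp + beta * fn
    return tp / denom if denom else 0.0


def macro_average(score_fn, counts, beta=2.0):
    """Unweighted mean of a score over classes; `counts` holds (tp, fp, fn)."""
    return sum(score_fn(tp, fp, fn, beta) for tp, fp, fn in counts) / len(counts)


# Hypothetical counts for two abnormality classes.
example_counts = [(10, 5, 5), (0, 3, 7)]
avg_f = macro_average(f_beta, example_counts)
avg_g = macro_average(g_beta, example_counts)
```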