Estimating the Minimal Size of Training Datasets Required for the Development of Linear ECG-Lead Transformations

Daniel Guldenring1, Ali Rababah2, Dewar Finlay2, Raymond Bond2, Alan Kennedy3, Michael Jennings2, Khaled Rjoob2, James McLaughlin2
1HS Kempten, 2Ulster University, 3PulseAI Ltd


Linear electrocardiographic lead transformations (LELTs) are used to estimate unrecorded ECG leads by multiplying a set of recorded leads by a LELT matrix. Such LELT matrices are commonly developed using a training dataset, and the size of that dataset influences the estimation performance of the resulting LELT. However, an estimate of the minimal training dataset size required for the development of LELTs has not previously been reported.
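
As an illustration of the idea, a LELT can be sketched as a single matrix multiplication whose coefficients are fitted on paired recordings. The sketch below is not the authors' implementation: it assumes an ordinary least-squares fit, assumes 8 input leads and 3 output leads, and uses synthetic data. The names `fit_lelt` and `apply_lelt` are hypothetical.

```python
import numpy as np

def fit_lelt(recorded, target):
    """Fit a LELT matrix A so that target ≈ A @ recorded.

    recorded: array of shape (n_input_leads, n_samples)
    target:   array of shape (n_output_leads, n_samples)
    Assumption: ordinary least squares; the source does not name the fitting method.
    """
    # lstsq solves recorded.T @ A.T ≈ target.T for A.T
    a_transposed, *_ = np.linalg.lstsq(recorded.T, target.T, rcond=None)
    return a_transposed.T

def apply_lelt(lelt_matrix, recorded):
    """Estimate the unrecorded leads from the recorded ones."""
    return lelt_matrix @ recorded

# Synthetic stand-in data (assumption): 8 input leads, 3 output leads.
rng = np.random.default_rng(0)
true_matrix = rng.normal(size=(3, 8))
recorded = rng.normal(size=(8, 500))
target = true_matrix @ recorded

lelt_matrix = fit_lelt(recorded, target)
estimated = apply_lelt(lelt_matrix, recorded)
```

With noise-free synthetic data the fitted matrix matches the generating matrix up to numerical precision; with real recordings the fit is only approximate, which is why the amount of training data matters.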

The aim of this research was to determine such an estimate. We generated LELT matrices from training datasets of different sizes (n = 10 to n = 540 subjects, in steps of 10 subjects). The LELT matrices and the 12-lead ECG data of a testing dataset (n = 186 subjects) were used to estimate Frank vectorcardiograms (VCGs). Root-mean-squared-error (RMSE) values between the recorded and estimated Frank leads of the testing dataset were used to quantify the estimation performance associated with each training dataset size.
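
The evaluation procedure described above can be sketched as a loop over training set sizes. This is a minimal sketch under stated assumptions, not the study's code: the subjects are synthetic, the fit is ordinary least squares, training subjects are taken in a fixed order, and the helper names (`make_subject`, `rmse_vs_training_size`) are hypothetical. Only the subject counts (10 to 540 in steps of 10 for training, 186 for testing) come from the text.

```python
import numpy as np

def fit_lelt(recorded, target):
    """Least-squares LELT matrix A with target ≈ A @ recorded (assumed method)."""
    a_transposed, *_ = np.linalg.lstsq(recorded.T, target.T, rcond=None)
    return a_transposed.T

def rmse(estimated, reference):
    """Root-mean-squared error over all leads and samples."""
    return float(np.sqrt(np.mean((estimated - reference) ** 2)))

def rmse_vs_training_size(train, test, sizes):
    """Fit a LELT on the first n training subjects and score it on the test set."""
    test_x = np.hstack([x for x, _ in test])
    test_y = np.hstack([y for _, y in test])
    results = {}
    for n in sizes:
        train_x = np.hstack([x for x, _ in train[:n]])
        train_y = np.hstack([y for _, y in train[:n]])
        lelt_matrix = fit_lelt(train_x, train_y)
        results[n] = rmse(lelt_matrix @ test_x, test_y)
    return results

def make_subject(rng, base_matrix, n_samples=50):
    """Synthetic subject (assumption): a subject-specific lead relationship plus noise."""
    subject_matrix = base_matrix + 0.1 * rng.normal(size=base_matrix.shape)
    x = rng.normal(size=(8, n_samples))                              # stand-in recorded leads
    y = subject_matrix @ x + 0.05 * rng.normal(size=(3, n_samples))  # stand-in Frank leads
    return x, y

rng = np.random.default_rng(0)
base_matrix = rng.normal(size=(3, 8))
train = [make_subject(rng, base_matrix) for _ in range(540)]
test = [make_subject(rng, base_matrix) for _ in range(186)]

curve = rmse_vs_training_size(train, test, range(10, 541, 10))
```

Plotting `curve` (test RMSE against training set size) gives the kind of learning curve from which a minimal dataset size can be read off; the values produced here reflect only the synthetic data and say nothing about real ECG recordings.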

After an initial phase of improvement, the performance of the LELT matrices was found to improve only marginally with further increases in the size of the training dataset. Our findings suggest that the training dataset should contain at least 170 subjects when developing LELTs that use the 12-lead ECG to estimate unrecorded ECG leads.