Massive ECG Data: Patterns and Variability

William Mateus, Marco Paluszny, Marianela Lentini
Universidad Nacional de Colombia


Automatic pattern detection is essential for massive ECG datasets, which are increasingly common because of low-cost portable devices. Besides their size, the main problem with data gathered by portable devices is noise, which may be of various kinds and come from unpredictable sources.

In this study we propose an algorithmic procedure for the definition, identification and comparison of salient features. Taking into account the increasing clinical use of portable devices and their possible future use as a prognostic tool, we center our attention on normal and arrhythmia MIT-BIH data.

The algorithm is as follows:

• For any lead with a recognizable QRS complex, we remove the baseline wander with a filter due to Fasano and Villani.

• We use a peak counting function to estimate automatically a threshold for the detection of R-spikes.

• We identify the cycles using the R-spikes and compare the result with Pan-Tompkins-like algorithms.

• We split each cycle into several zones, to be associated with the P wave, the QRS complex and the T wave.
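
The detection and splitting steps above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the baseline wander has already been removed. It takes the "peak counting function" to be the number of local maxima above a given level, and it places the threshold in the middle of the widest level range over which that count stays constant. The zone boundaries (midpoints between R-spikes and a fixed `qrs_half_width`) and all function names are hypothetical choices made for illustration.

```python
def local_maxima(signal):
    """Indices i where the signal rises into i and does not rise after it."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def estimate_threshold(signal):
    """Assumed rule: the count of peaks above a level is a step function of
    the level; its widest plateau is the largest gap between sorted peak
    heights, and the threshold is placed at the middle of that gap."""
    heights = sorted({signal[i] for i in local_maxima(signal)})
    if len(heights) < 2:
        raise ValueError("need at least two distinct peak heights")
    _, k = max((heights[j + 1] - heights[j], j)
               for j in range(len(heights) - 1))
    return (heights[k] + heights[k + 1]) / 2


def detect_r_spikes(signal):
    """Local maxima that exceed the automatically estimated threshold."""
    level = estimate_threshold(signal)
    return [i for i in local_maxima(signal) if signal[i] > level]


def split_cycles(signal, r_spikes, qrs_half_width=5):
    """Cut one cycle around each interior R-spike and split it into zones.

    A cycle runs from the midpoint to the previous R-spike up to the
    midpoint to the next one (a hypothetical convention)."""
    cycles = []
    for prev, r, nxt in zip(r_spikes, r_spikes[1:], r_spikes[2:]):
        start, end = (prev + r) // 2, (r + nxt) // 2
        cycles.append({
            "r": r,
            "P": signal[start:r - qrs_half_width],
            "QRS": signal[r - qrs_half_width:r + qrs_half_width + 1],
            "T": signal[r + qrs_half_width + 1:end],
        })
    return cycles


def toy_signal(beats=5, period=100):
    """Synthetic, noise-free beats: small P bump, tall R spike, medium T bump."""
    signal = []
    for _ in range(beats):
        beat = [0.0] * period
        beat[30] = 0.15                              # P wave
        beat[49], beat[50], beat[51] = 0.4, 1.0, 0.4  # QRS complex
        beat[75] = 0.3                               # T wave
        signal.extend(beat)
    return signal


sig = toy_signal()
spikes = detect_r_spikes(sig)
cycles = split_cycles(sig, spikes)
```

On real recordings the widest-plateau rule can be misled by isolated artifacts, so a comparison against a Pan-Tompkins-like detector remains a useful check.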

We automatically detect persistent morphologies by studying the variability of each of these zones over sequences of consecutive cycles. Repetitions of the durations and voltages of these zones across sequences of cycles provide information on dynamic changes in the QRS complex and the T and P waves. The time stamps of the repetitive regions and of the highly variable ones provide information for correlation with activity patterns.
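
One simple way to make "persistence" concrete is sketched below. It assumes each cycle has been reduced to a single number per zone (for example, the peak voltage of the T zone). It then looks for maximal runs of consecutive cycles whose value stays within a tolerance of the first value of the run. The abstract does not specify the variability measure; the tolerance `tol`, the minimum run length `min_len` and the function name are hypothetical.

```python
def persistent_runs(values, tol, min_len=3):
    """Return (start, end) index pairs, end exclusive, of maximal runs of
    consecutive values that stay within `tol` of the run's first value."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the data or when a value
        # drifts away from the first value of the run.
        if i == len(values) or abs(values[i] - values[start]) > tol:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


# Hypothetical per-cycle T-zone peak voltages (arbitrary units).
t_peaks = [0.30, 0.31, 0.29, 0.30, 0.55, 0.10, 0.42, 0.41, 0.40]
stable = persistent_runs(t_peaks, tol=0.03)
```

Cycles outside every returned run are the highly variable ones. Mapping run indices back to the R-spike sample positions gives the time stamps mentioned above.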

This also leads to the extraction of emerging patterns from the cycles of the given ECG dataset (no comparison with templates is required); these may be displayed as averages of highly similar consecutive cycles. There is mounting evidence that such analysis is valuable in the context of personalized early detection and prevention.
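
Averaging a run of similar cycles can be sketched as follows. The sketch assumes the cycles are aligned on their R-spikes and cut to a fixed window so that they have equal length. The window size `half_width` and the function name are hypothetical, and the alignment convention is an assumption, not something stated in the abstract.

```python
def average_pattern(signal, r_spikes, half_width):
    """Pointwise mean of fixed-size windows centered on the given R-spikes.

    Spikes too close to either end of the signal are skipped."""
    windows = [signal[r - half_width:r + half_width + 1]
               for r in r_spikes
               if r - half_width >= 0 and r + half_width < len(signal)]
    if not windows:
        raise ValueError("no complete window around the given R-spikes")
    n = len(windows)
    # zip(*windows) walks the windows sample by sample.
    return [sum(samples) / n for samples in zip(*windows)]


# Two toy cycles that differ only in R amplitude.
toy = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0]
pattern = average_pattern(toy, r_spikes=[2, 7], half_width=2)
```

Because the average is built only from cycles of the recording itself, no external template enters the computation.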