Using mel-frequency cepstrum and amplitude-time heart variability as XGBoost handcrafted features for heart disease detection

Sergey Krivenko1, Anatolii Pulavskyi1, Liudmyla Kryvenko2, Olha Krylova2, Stanislav Krivenko3
1HealthEntire, 2Kharkiv National Medical University, 3Kharkiv National University of Radio Electronics


Cardiovascular disease has been the world's leading cause of death for many years. Early diagnosis can reduce both fatalities and complications, and the most common method for early diagnosis is electrocardiogram (ECG) analysis. The goal of the CINC-2021 competition is to create effective models for predicting 27 pathological conditions from 2, 3, 6 and 12 leads of a standard ECG. Our models are based on handcrafted features obtained using two independent approaches: the mel-frequency cepstrum and amplitude-time heart rate variability (HRV). The mel-frequency cepstrum is widely used in sound processing, but by varying the parameters used to compute it, one can also extract characteristic features of other signals, including periodic ones such as the ECG. The resulting spectrogram can be used either as a two-dimensional array of values or interpreted as an image. The second group of features consists of amplitude-time HRV parameters. The generally accepted short-term ECG length for obtaining most time-domain HRV parameters is a 5-minute segment. The competition, however, features ultra-short segments, most of which are 10-30 seconds long. Nevertheless, a number of non-linear parameters remain significant for identifying certain properties even at this length. In addition, including the amplitude component makes it possible to account for the phase relationships between successive heart beats. These features are then fed into the XGBoost algorithm. Our team competed under the name Sunset. The official scores were -0.36, -0.16, -0.19 and -0.19 for the 12-, 6-, 3- and 2-lead models, respectively. We attribute these scores to technical nuances of our submission. The local values of the metric, obtained using hold-out cross-validation, are higher: 0.515, 0.513, 0.513 and 0.509 for the 12-, 6-, 3- and 2-lead models, respectively.
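As an illustration of what amplitude-time HRV features on an ultra-short segment might look like, the following is a minimal sketch and not the authors' implementation. It assumes R peaks have already been detected, and the function name `hrv_features`, its inputs and the chosen descriptors (SDNN, RMSSD, pNN50, the Poincaré SD1 and SD2 as examples of non-linear parameters, and beat-to-beat R-amplitude changes as an example of the amplitude component) are all hypothetical choices; the abstract does not specify the actual feature set.

```python
import math
import statistics


def hrv_features(r_times_s, r_amplitudes_mv):
    """Compute a few amplitude-time HRV descriptors from detected R peaks.

    r_times_s: R-peak times in seconds, ascending (assumed input).
    r_amplitudes_mv: R-peak amplitudes in mV, same length (assumed input).
    Returns a flat dict that could serve as one row of a feature table.
    """
    if len(r_times_s) != len(r_amplitudes_mv):
        raise ValueError("times and amplitudes must have the same length")
    if len(r_times_s) < 4:
        raise ValueError("need at least 4 beats")

    # RR intervals in milliseconds and their successive differences
    rr = [(b - a) * 1000.0 for a, b in zip(r_times_s, r_times_s[1:])]
    d_rr = [b - a for a, b in zip(rr, rr[1:])]

    mean_rr = statistics.fmean(rr)
    sdnn = statistics.stdev(rr)
    rmssd = math.sqrt(sum(d * d for d in d_rr) / len(d_rr))
    pnn50 = sum(abs(d) > 50.0 for d in d_rr) / len(d_rr)

    # Poincaré plot descriptors (a common non-linear HRV measure)
    sd1 = math.sqrt(0.5 * statistics.variance(d_rr))
    sd2 = math.sqrt(max(2.0 * sdnn ** 2 - sd1 ** 2, 0.0))

    # Amplitude component: beat-to-beat change in R-peak amplitude
    d_amp = [b - a for a, b in zip(r_amplitudes_mv, r_amplitudes_mv[1:])]
    amp_rmssd = math.sqrt(sum(d * d for d in d_amp) / len(d_amp))

    return {
        "mean_rr_ms": mean_rr,
        "heart_rate_bpm": 60000.0 / mean_rr,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
        "pnn50": pnn50,
        "sd1_ms": sd1,
        "sd2_ms": sd2,
        "mean_r_amp_mv": statistics.fmean(r_amplitudes_mv),
        "r_amp_rmssd_mv": amp_rmssd,
    }
```

A dictionary like this, computed per lead and concatenated with cepstral features, could form one input row for a gradient-boosted classifier such as XGBoost; how the authors actually combine the two feature groups is not stated here.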