Session SA4.3

Avoiding Pitfalls with Approximate and Sample Entropy: A Computational Recipe

R Sassi*, G Manis

Università degli Studi di Milano
Milano, Italy

Nearly 20 years ago, S. Pincus proposed a regularity measure for time series, which he termed “Approximate Entropy” (ApEn). ApEn quantifies the unpredictability of fluctuations in a time series. The word “approximate” in its name reflects the fact that the index was derived from an estimate of the Kolmogorov-Sinai entropy, a theoretical quantity used in the study of nonlinear dynamical systems. To date, hundreds of published papers have employed this metric, at first praising its qualities and, over the years, also exposing its limits. These weaknesses have two main origins. First, the way in which the metric is actually computed introduces a bias. Second, the computation rests on estimating multivariate marginal probability densities and therefore necessarily depends on the length of the series under analysis. The first problem can be resolved by computing the metric in a slightly different way, which is the direction Richman and Moorman followed when they introduced a related index, “Sample Entropy” (SampEn). Unfortunately, the dependence on the length of the series is inherent to ApEn (and to SampEn as well) and cannot be avoided. This dependence manifests itself as poor self-consistency of the metric. ApEn depends on two parameters, r (a kind of noise-rejection threshold) and m (related to the dimension of the marginal densities that must be estimated). Given two different series, one may appear more regular than the other for some values of these parameters and less regular for others.
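As a hedged illustration of the computational difference between the two indices, the following Python sketch implements their textbook definitions. ApEn counts each template as matching itself, so no logarithm is ever taken of zero, at the price of a bias toward regularity. SampEn excludes self-matches and uses the same N-m templates for lengths m and m+1. The function names, the maximum-norm (Chebyshev) distance with a `<= r` match criterion, and returning infinity when SampEn is undefined are assumptions of this sketch. Here r is an absolute tolerance, usually chosen as a fraction (e.g. 0.2) of the series' standard deviation.

```python
import math


def _chebyshev(u, i, j, m):
    """Maximum-norm distance between templates u[i:i+m] and u[j:j+m]."""
    return max(abs(u[i + k] - u[j + k]) for k in range(m))


def apen(u, m, r):
    """Approximate Entropy: self-matches are counted, so every C_i >= 1/n."""
    def phi(length):
        n = len(u) - length + 1              # number of templates of this length
        total = 0.0
        for i in range(n):
            # c includes j == i (the self-match), the source of ApEn's bias
            c = sum(1 for j in range(n) if _chebyshev(u, i, j, length) <= r)
            total += math.log(c / n)
        return total / n

    return phi(m) - phi(m + 1)


def sampen(u, m, r):
    """Sample Entropy: self-matches excluded, N - m templates for both lengths."""
    n = len(u) - m
    b = a = 0                                # B: length-m matches, A: length-(m+1)
    for i in range(n):
        for j in range(i + 1, n):            # i < j, so no self-matches
            if _chebyshev(u, i, j, m) <= r:
                b += 1
                # extend both templates by one point to test length m+1
                if abs(u[i + m] - u[j + m]) <= r:
                    a += 1
    if a == 0 or b == 0:
        return float("inf")                  # assumption: report undefined as inf
    return -math.log(a / b)
```

On a white-noise series a few hundred points long, this sketch typically gives ApEn well below SampEn for the same m and r, which is the self-match bias at work.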
Starting from a Holter recording of a healthy adult, we built two autoregressive models that mimic heart rate variability during the day and at night. The two models spare us the problems related to non-stationarity of the data (we can generate very long time series) and allow a precise estimate of the reference values of the metrics. We then generated time series from 300 to 20000 points long and computed ApEn (and SampEn) for different values of m and r. For large N, night-time values of ApEn are larger than the corresponding daytime ones, but for, e.g., N=300, m=2 and r=0.2 (typical parameter values in HRV studies) this ordering is reversed. The inversion is clearly related to the fact that so few points do not allow a proper estimate of the marginal densities of order 3 (i.e., m+1), and it disappears when m=1.
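The abstract does not report the orders or coefficients of the day and night models, so the Python sketch below only illustrates the generic simulation step: generating a stationary autoregressive process x[t] = a1*x[t-1] + ... + ap*x[t-p] + e[t] of any desired length N while discarding an initial transient. The function name `ar_series`, the Gaussian innovations, the 1000-sample burn-in and any example coefficients are hypothetical and are not the models described here.

```python
import random


def ar_series(coeffs, n, noise_sd=1.0, burn_in=1000, seed=None):
    """Simulate x[t] = sum_k coeffs[k] * x[t-1-k] + e[t], e ~ N(0, noise_sd^2).

    Hypothetical helper: the first `burn_in` samples are discarded so the
    returned n samples are (approximately) drawn from the stationary process,
    assuming the coefficients describe a stationary model.
    """
    rng = random.Random(seed)
    p = len(coeffs)
    x = [0.0] * p                            # zero initial conditions
    for _ in range(burn_in + n):
        e = rng.gauss(0.0, noise_sd)
        # x[-1 - k] is the sample k+1 steps in the past
        x.append(sum(c * x[-1 - k] for k, c in enumerate(coeffs)) + e)
    return x[len(x) - n:]                    # keep only the last n samples
```

For example, `for n in (300, 1000, 5000, 20000): x = ar_series([0.5, -0.2], n, seed=n)` would produce series of increasing length from the same (hypothetical, stationary) AR(2) model. ApEn and SampEn can then be evaluated on each series over a grid of m and r, with r scaled to that series' standard deviation.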
Our simulations suggest an effective recipe for checking that ApEn is computed properly, given the length of the time series at hand: several values of m and r need to be explored concurrently to avoid pitfalls. Selecting a single pair of standard values, such as m=2 and r=0.15 (or 0.20), might lead to unexpected results. Finally, since evaluating the metric concurrently for different values of m is computationally intensive, we also suggest an algorithm that does not increase the computational time, because it performs each necessary comparison only once.
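The abstract does not describe the proposed algorithm, so the Python sketch below shows only one possible way, assumed here, to reach the same goal for ApEn. Each pair of samples (u[a], u[a+d]) is compared exactly once. Walking along each lag d while tracking the length of the current run of consecutive matches yields the match counts for all template lengths simultaneously. The function name `apen_all_m`, the `<= r` criterion with the maximum-norm distance, and the returned dictionary are hypothetical choices.

```python
import math


def apen_all_m(u, m_max, r):
    """ApEn(m, r) for every m = 1..m_max from a single sweep over sample pairs.

    For a fixed lag d, a run of R consecutive matches ending at position a
    means that, for every length L <= R, the templates starting at a-L+1 and
    a-L+1+d match. Each such (pair, L) is therefore counted exactly once,
    at the position where the pair's templates end.
    """
    n = len(u)
    max_len = m_max + 1                      # ApEn(m) also needs length m+1
    # counts[L][i]: matches of the length-L template starting at i,
    # initialised to 1 for the self-match that ApEn includes
    counts = [None] + [[1] * (n - L + 1) for L in range(1, max_len + 1)]
    for d in range(1, n):                    # lag between the two templates
        run = 0
        for a in range(n - d):               # a: end index of the first template
            run = run + 1 if abs(u[a] - u[a + d]) <= r else 0
            for L in range(1, min(run, max_len) + 1):
                i = a - L + 1                # start of the first template
                counts[L][i] += 1
                counts[L][i + d] += 1        # the match relation is symmetric

    def phi(L):
        c = counts[L]
        return sum(math.log(x / len(c)) for x in c) / len(c)

    return {m: phi(m) - phi(m + 1) for m in range(1, m_max + 1)}
```

In this sketch the number of sample comparisons stays at about N^2/2 whatever m_max is. Only the counter updates grow with the number of template lengths, and memory is O(N * m_max).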
(Abstract Control Number: 164)