Aims: This work describes an entry for the Computing in Cardiology
Challenge 2019, early prediction of sepsis from clinical data. There are
multiple challenges in modeling this data, including the time-series
nature of the data, the many missing values, and the great imbalance
between sepsis and non-sepsis patients.
Methods: The relatively naive approach implemented here involves a crude
translation of the time-series data to a supervised-learning task. The
data is preprocessed to add new columns to capture the "delta" of each of
the 40 attributes from time t-1 to time t (from the previous measurement
to the current one); additionally, each missing value is replaced with
the last known value for that attribute, or with zero when no earlier
value exists. The learning model used
is a straightforward application of the Python sklearn MLPClassifier
(multi-layer perceptron); all data is scaled to the [0,1] range for use
with this model.
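
The preprocessing described above can be sketched roughly as follows. This
is a minimal illustration, not the submitted code: the function names
(fill_and_add_deltas, min_max_scale) are hypothetical, the sketch assumes
that filling happens before the deltas are computed and that the first row
of each patient gets zero deltas, and it scales only the example rows
rather than fitting the scaling on a training set.

```python
from typing import List, Optional

Row = List[Optional[float]]  # one time point; None marks a missing value


def fill_and_add_deltas(records: List[Row]) -> List[List[float]]:
    """Forward-fill one patient's records, then append per-attribute deltas.

    Assumptions (not stated in the abstract): values are filled before the
    deltas are taken, and the first row's deltas are all zero.
    """
    if not records:
        return []
    n_attrs = len(records[0])
    last_known = [0.0] * n_attrs  # zero until an attribute is first observed
    previous: Optional[List[float]] = None
    out: List[List[float]] = []
    for row in records:
        # Replace each missing value with the last known value (or zero).
        filled = [last_known[j] if v is None else float(v)
                  for j, v in enumerate(row)]
        last_known = filled
        if previous is None:
            deltas = [0.0] * n_attrs
        else:
            # Change from time t-1 to time t for every attribute.
            deltas = [cur - prev for cur, prev in zip(filled, previous)]
        out.append(filled + deltas)
        previous = filled
    return out


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column to [0, 1]; constant columns map to 0."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [[(v - lo) / sp if sp else 0.0
             for v, lo, sp in zip(row, lows, spans)]
            for row in rows]


# Toy patient with two attributes instead of the 40 in the challenge data.
patient = [
    [80.0, None],
    [None, 36.5],
    [90.0, None],
]
features = fill_and_add_deltas(patient)
scaled = min_max_scale(features)
```

Each row of the result holds the filled attributes followed by their
deltas, so the 40 challenge attributes would yield 80 inputs per time
point. Rows in this form could then be passed to sklearn's MLPClassifier.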
Results: Scoring from the competition was not available at the time of
abstract submission. Evaluation was done using a training set composed of
all time points for a random 90% of sepsis patients plus a random 10% of
non-sepsis patients; because it is relatively easy to predict non-sepsis
patients, the testing set for results reported here was the 10% of sepsis
patients not in the training data (these patients' records include both
sepsis and non-sepsis time points). This was repeated 5 times. Training
accuracy was 92.3%, 94.0%, 96.3%, 82.7%, and 92.9% across the five runs,
and testing accuracy was 72.1%, 66.2%, 59.6%, 63.8%, and 53.3%, giving an
average training accuracy of 91.6% and an average testing accuracy of
63.0%.
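
The patient-level split and the reported averages can be illustrated with
a short sketch. The function split_patients, its seed parameter, and the
integer patient identifiers are assumptions made for this example only;
the accuracy values are the ones reported above.

```python
import random
from statistics import mean


def split_patients(sepsis_ids, non_sepsis_ids, seed=0):
    """Return (train_ids, test_ids) following the split described above.

    Training: a random 90% of sepsis patients plus a random 10% of
    non-sepsis patients. Testing: the remaining 10% of sepsis patients.
    """
    rng = random.Random(seed)
    sepsis = list(sepsis_ids)
    rng.shuffle(sepsis)
    n_train = round(0.9 * len(sepsis))
    train_sepsis, test_sepsis = sepsis[:n_train], sepsis[n_train:]
    non_sepsis = list(non_sepsis_ids)
    train_non_sepsis = rng.sample(non_sepsis, round(0.1 * len(non_sepsis)))
    return train_sepsis + train_non_sepsis, test_sepsis


# Hypothetical cohort: 100 sepsis patients and 200 non-sepsis patients.
train_ids, test_ids = split_patients(range(100), range(100, 300))

# Per-run accuracies (in percent) as reported in the text.
train_acc = [92.3, 94.0, 96.3, 82.7, 92.9]
test_acc = [72.1, 66.2, 59.6, 63.8, 53.3]
avg_train = round(mean(train_acc), 1)  # 91.6
avg_test = round(mean(test_acc), 1)    # 63.0
```

Splitting by patient rather than by row keeps all time points of a given
patient on one side of the split, so no patient contributes rows to both
training and testing.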
Conclusion: This initial implementation classifies each line of data
separately; the sequence of predictions for a specific patient has not
yet been examined as a whole. However, this naive approach appears to be
a promising start.