Sepsis is a life-threatening response to infection that can lead to tissue damage, organ failure and death. Effective, long-range early prediction of sepsis from health data may prevent adverse health outcomes and reduce unnecessary costs. In this paper, we propose a new two-stage approach to disease prediction. It consists of a generative model, which generates synthetic data for the next few time steps, and a predictive model, which makes long-range predictions based on both the observed and the generated data. We implement the two-stage algorithm with long short-term memory (LSTM) networks for both the generative and predictive models, yielding a scheme we call generative LSTM (GLSTM). We also explore other two-stage configurations in which the generative step uses various shallow and deep learning models.
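The data flow of the two-stage scheme can be sketched as follows. This is a minimal illustration, not the GLSTM itself. A least-squares autoregressive (AR) model stands in for the generative LSTM, and a hypothetical threshold rule (`predict_alarm`) stands in for the predictive LSTM. All function names, the AR order, the window and the threshold are assumptions chosen for illustration. The point is the structure: the generator is rolled forward one step at a time, each synthetic value is fed back as input, and the predictor then sees the observed data extended by the generated steps.

```python
import numpy as np


def fit_ar(series: np.ndarray, order: int) -> np.ndarray:
    """Least-squares AR(order) fit with an intercept (stand-in for the generative LSTM)."""
    rows = [series[t - order:t] for t in range(order, len(series))]
    X = np.column_stack([np.array(rows), np.ones(len(rows))])
    y = series[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # last entry is the intercept


def generate(observed: np.ndarray, coef: np.ndarray, steps: int) -> np.ndarray:
    """Stage 1: roll the generator forward, feeding each synthetic value back in."""
    order = len(coef) - 1
    history = list(observed)
    synthetic = []
    for _ in range(steps):
        window = np.array(history[-order:])  # most recent `order` values, oldest first
        nxt = float(window @ coef[:-1] + coef[-1])
        synthetic.append(nxt)
        history.append(nxt)
    return np.array(synthetic)


def predict_alarm(extended: np.ndarray, window: int, threshold: float) -> bool:
    """Stage 2 (hypothetical stand-in): alarm if the mean of the last `window` values exceeds `threshold`."""
    return bool(extended[-window:].mean() > threshold)


def two_stage(observed, coef, steps, window, threshold):
    """Generate `steps` synthetic values, then predict on observed + generated data."""
    synthetic = generate(observed, coef, steps)
    extended = np.concatenate([observed, synthetic])
    return synthetic, predict_alarm(extended, window, threshold)


# Usage on a toy heart-rate trace (illustrative data, not from the paper).
t = np.arange(60)
hr = 80 + 5 * np.sin(0.3 * t)
coef = fit_ar(hr, order=2)
synthetic, alarm = two_stage(hr, coef, steps=10, window=5, threshold=100.0)
```

In a real setting, the generator would be trained on historical patient data rather than on the window being extended. Because each generated step is fed back as input, errors in the generator compound over the horizon, which is consistent with the observation that more accurate generated data leads to more accurate long-range predictions.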
The Physionet dataset contains multiple health features with missing values. Based on preliminary experiments on diverse datasets, we use Gaussian process regression to impute the missing data. Our experiments on a private vital-sign dataset indicate that the proposed GLSTM outperforms a diverse range of strong benchmark models, both with and without the two-stage configuration. As expected, the results also indicate that more accurate generated data leads to more accurate long-range predictions. We therefore explore several methods to improve the accuracy of the generated data, including a better way to train the generative model. We report cross-validation results on the private vital-sign dataset and on the publicly available Physionet 2019 training dataset. On the private dataset, we predict the patient’s heart rate and blood pressure ten minutes in advance with a mean absolute percentage error (MAPE) below 9% and 5.4%, respectively. On the Physionet 2019 dataset, our initial experiments achieve an accuracy of 0.7 and an F1-score of 0.8. We expect to improve the results on the Physionet dataset in the coming months.
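For readers unfamiliar with Gaussian process (GP) imputation, the sketch below fills gaps in a single vital-sign series with the GP posterior mean. It also computes the MAPE metric used to report the results. It assumes a squared-exponential (RBF) kernel with fixed, hand-picked hyperparameters (`length_scale`, `noise`), a constant prior mean, and independent treatment of each feature. None of these choices are specified by the abstract. A fuller implementation would typically fit the hyperparameters by maximizing the marginal likelihood. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential kernel: variance * exp(-(a - b)^2 / (2 * length_scale^2))."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_impute(times, values, length_scale=3.0, noise=1e-2):
    """Replace NaNs in a 1-D series with the GP posterior mean (fixed hyperparameters)."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    t_obs, y_obs = times[observed], values[observed]
    mean = y_obs.mean()      # constant prior mean
    variance = y_obs.var()   # signal variance estimated from the observed data
    K = rbf_kernel(t_obs, t_obs, length_scale, variance) + noise * np.eye(len(t_obs))
    alpha = np.linalg.solve(K, y_obs - mean)
    K_star = rbf_kernel(times[~observed], t_obs, length_scale, variance)
    filled = values.copy()
    filled[~observed] = mean + K_star @ alpha  # observed entries are left untouched
    return filled


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


# Usage on a toy heart-rate trace with three missing readings (illustrative data).
t = np.arange(60, dtype=float)
truth = 80 + 5 * np.sin(0.3 * t)
hr = truth.copy()
hr[30:33] = np.nan
filled = gp_impute(t, hr)
gap_error = mape(truth[30:33], filled[30:33])
```

Note that MAPE divides by the true value, so it is well defined for vital signs such as heart rate and blood pressure, which are strictly positive.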