Early Prediction of Sepsis from Clinical Data Using a Specialized Hidden Markov Model

Morteza Amini
University of Tehran


A specialized hidden Markov model is introduced and applied to a data set of patients with and without sepsis during their hospitalization period. The observations consist of continuous vital-sign and laboratory variables along with a binary indicator variable of sepsis. For this data set, a 3-state model is considered with hidden states "healthy=1", "ill=2" and "sepsis=3". The observation model is factorized into two terms: a Gaussian model for the continuous observations and a known matrix for the distribution of the binary variable, given the states. The missing observations are treated as latent by the EM algorithm, alongside the three-state hidden variable. A specialized imputation method for initializing the hidden Markov model is also proposed. Application of the proposed method to the sepsis data is performed in three phases. In the estimation phase, a subset of the subjects is selected randomly from both sepsis and non-sepsis patients, such that a given proportion of the sample comes from the sepsis patients. The second phase is the tuning or cross-validation phase, in which the optimal threshold for the conditional probabilities of future states given the observations is chosen from a set of candidate values in order to optimize the validity criteria of the model. The third phase is the prediction phase, in which the conditional probabilities of future states given the observations are computed for the whole training and test data sets and the sepsis labels are predicted using the optimized threshold value. Applying the proposed model to the sepsis data set through 3-fold cross-validation on a subset of the training data set resulted in a maximum cross-validated utility of approximately 0.88.
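
The prediction phase described above can be illustrated with a minimal sketch. It is not the author's implementation: the function name `predict_sepsis`, the parameter names (`A`, `pi`, `means`, `variances`, `B`, `threshold`) and all numeric values are hypothetical. It also assumes a diagonal Gaussian covariance, so that a missing (NaN) measurement can be marginalized by simply dropping it, and it codes an unobserved sepsis label as -1. The sketch runs a forward filter with the factorized observation model and thresholds the one-step-ahead probability of the "sepsis" state.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log-density of independent Gaussians, ignoring missing (NaN) entries."""
    observed = ~np.isnan(x)
    diff = x[observed] - mean[observed]
    return -0.5 * np.sum(np.log(2 * np.pi * var[observed]) + diff**2 / var[observed])

def predict_sepsis(X, y, A, pi, means, variances, B, threshold):
    """Forward filtering followed by one-step-ahead P(next state = sepsis).

    X: (T, d) continuous observations, NaN where missing.
    y: (T,) binary sepsis label, -1 where unobserved (assumed coding).
    A: (3, 3) transition matrix; pi: (3,) initial distribution.
    B: (3, 2) known matrix with B[k, j] = P(label = j | state = k).
    """
    T, K = X.shape[0], len(pi)
    alpha = np.asarray(pi, dtype=float)
    probs = np.empty(T)
    for t in range(T):
        if t > 0:
            alpha = alpha @ A  # propagate the filtered distribution one step
        log_e = np.array([gaussian_logpdf(X[t], means[k], variances[k]) for k in range(K)])
        emis = np.exp(log_e - log_e.max())  # rescale to avoid underflow
        if y[t] >= 0:
            emis = emis * B[:, y[t]]  # second factor: known label distribution
        alpha = alpha * emis
        alpha = alpha / alpha.sum()  # filtered P(state_t | observations up to t)
        probs[t] = (alpha @ A)[2]  # state index 2 corresponds to "sepsis=3"
    return probs, probs >= threshold

# Illustrative parameters only; these are not estimates from the sepsis data.
A = np.array([[0.90, 0.08, 0.02], [0.10, 0.70, 0.20], [0.00, 0.10, 0.90]])
pi = np.array([0.80, 0.15, 0.05])
means = np.array([[0.0, 0.0], [2.0, 2.0], [5.0, 5.0]])
variances = np.ones((3, 2))
B = np.array([[0.99, 0.01], [0.95, 0.05], [0.05, 0.95]])
X = np.array([[0.0, 0.0], [0.1, np.nan], [5.0, 5.0], [5.2, np.nan]])
y = np.array([-1, -1, -1, -1])

probs, flags = predict_sepsis(X, y, A, pi, means, variances, B, threshold=0.5)
print(np.round(probs, 3), flags)
```

In the tuning phase, `threshold` would be swept over a set of candidate values and the value giving the best cross-validated utility would be kept.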