Preliminary Program

Sepsis Detection Using Matrix Factorization and LSTM Networks

Sven Schellenberger¹, Kilin Shi², Jan Philipp Wiedemann², Fabian Lurz², Robert Weigel², Alexander Koelpin¹
¹Chair for Electronics and Sensor Systems, Brandenburg University of Technology, ²Institute for Electronic Engineering, Friedrich-Alexander University Erlangen-Nuremberg

Abstract

Sepsis is a highly lethal and very cost-intensive disease. Hospitals spend more money on treating sepsis than on any other illness. A major problem is that many sepsis patients are not correctly diagnosed at admission. Early detection is the most critical factor: each hour of delayed diagnosis increases mortality by about 4-8%. This year’s PhysioNet/Computing in Cardiology Challenge addresses this problem and calls for algorithms that detect sepsis from data gathered in an intensive care unit (ICU). A major difficulty is that sepsis has to be predicted from many different vital signs and laboratory values that are not sampled at regular intervals, so the dataset available for training and testing is very sparse. To handle this problem, we first project the data into a latent space through supervised matrix factorization. Using our training dataset, we generate a matrix V, which is used to obtain the latent representation U of our data X. U is continuous, contains no missing values, and is used for further training. At test time, the data is transformed into the latent space using the pre-trained matrix V. The latent representation is then fed into an LSTM network for training and classification. It is important to note that the transformation from X to U gives the same result when it is performed sample by sample, and that the LSTM predicts new samples in chronological order. This is crucial because in practice we do not have access to data points that will be measured at a later point in time; the algorithm must therefore not use any future information. On a separate holdout set, our algorithm achieves a utility score of around 0.29.
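The projection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain (unsupervised) ridge-regularized factorization X ≈ UV fitted only on observed entries by alternating least squares, whereas the abstract describes a supervised variant. The function names, the latent dimension k, and the regularization weight lam are hypothetical. The sketch shows why a fixed V allows sample-by-sample projection: each latent row depends only on that sample's observed values and on V, never on later samples.

```python
import numpy as np

def project_row(x, V, lam=0.1):
    """Latent code for one sample x (NaN marks a missing value), V fixed (k x d)."""
    obs = ~np.isnan(x)                       # which features were measured
    Vo = V[:, obs]                           # columns of V for observed features
    A = Vo @ Vo.T + lam * np.eye(V.shape[0]) # ridge term keeps A invertible
    return np.linalg.solve(A, Vo @ x[obs])

def project(X, V, lam=0.1):
    """Project samples one at a time; no row uses information from another row."""
    return np.vstack([project_row(x, V, lam) for x in X])

def fit_V(X, k=3, lam=0.1, n_iter=20, seed=0):
    """Alternating least squares on observed entries only (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    V = rng.normal(scale=0.1, size=(k, d))
    mask = ~np.isnan(X)
    for _ in range(n_iter):
        U = project(X, V, lam)               # update U with V fixed
        for j in range(d):                   # update each column of V with U fixed
            rows = mask[:, j]
            if not rows.any():
                continue                     # feature never observed: leave as is
            Uo = U[rows]
            A = Uo.T @ Uo + lam * np.eye(k)
            V[:, j] = np.linalg.solve(A, Uo.T @ X[rows, j])
    return V

# Synthetic stand-in for sparse ICU data: low-rank signal with 60% of entries missing.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8))
X[rng.random(X.shape) < 0.6] = np.nan

V = fit_V(X, k=3)      # learned once on training data
U = project(X, V)      # dense latent representation, no missing values
```

In this sketch a sample with no observed values maps to the zero vector, and projecting the first t rows of a record yields exactly the first t rows of the full projection. A sequence model such as an LSTM can therefore consume the latent rows in chronological order without seeing future measurements.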