Sepsis Detection Using Matrix Factorization and LSTM Networks

Sven Schellenberger1, Kilin Shi2, Jan Philipp Wiedemann2, Fabian Lurz2, Robert Weigel2, Alexander Koelpin1
1Chair for Electronics and Sensor Systems, Brandenburg University of Technology, 2Institute for Electronic Engineering, Friedrich-Alexander University Erlangen-Nuremberg


Abstract

Sepsis is a highly lethal and very cost-intensive disease. Hospitals spend more money on treating sepsis than on any other illness. A major problem is that many sepsis patients are not correctly diagnosed at admission. Early detection is the most critical factor: each hour of delayed diagnosis increases mortality by about 4-8%. This year's PhysioNet/Computing in Cardiology Challenge addresses this problem by asking for an algorithm that detects sepsis from data gathered in an ICU. A major difficulty is that sepsis must be predicted from many different vital signs and laboratory values that are not sampled at regular intervals, so training and testing rely on a very sparse dataset. To handle this problem, we first project the data into a latent space through supervised matrix factorization. Using our training dataset, we generate a matrix V, which is used to obtain the latent representation U of our data X. U is continuous, contains no missing values, and is used for further training. For testing, the data is transformed into the latent space using the pre-trained matrix V. The latent data is then fed into an LSTM network for training and classification. It is important to note that the transformation from X to U gives the same result when it is done sample by sample, and that the LSTM predicts new samples in chronological order. This is crucial because, in practice, data points that will be measured at a later point in time are not available, so the algorithm must not use any future information. On a separate holdout set, our algorithm achieves a utility score of around 0.29.
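
The projection of sparse data X into a dense latent representation U with a pre-trained matrix V can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes an unsupervised, ridge-regularized alternating least squares factorization for the supervised factorization named in the abstract, it omits the LSTM stage, and the function names (`fit_V`, `project_row`, `project`), the latent dimension `k`, the regularization weight `lam`, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def project_row(x, V, lam=0.1):
    """Latent vector u for one sample x (NaN = missing), with V (k x d) held fixed.

    Solves min_u ||x_obs - u @ V_obs||^2 + lam * ||u||^2 over observed entries only.
    """
    obs = ~np.isnan(x)
    Vo = V[:, obs]                                  # k x n_observed
    A = Vo @ Vo.T + lam * np.eye(V.shape[0])        # k x k, always invertible
    return np.linalg.solve(A, Vo @ x[obs])

def project(X, V, lam=0.1):
    """Project every row of X independently; no row sees any other row."""
    return np.vstack([project_row(x, V, lam) for x in X])

def fit_V(X, k=3, lam=0.1, n_iter=20, seed=0):
    """Assumed stand-in for training: masked alternating least squares."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    V = rng.normal(size=(k, d))
    for _ in range(n_iter):
        U = project(X, V, lam)                      # update U with V fixed
        for j in range(d):                          # update V column by column
            obs = ~np.isnan(X[:, j])
            Uo = U[obs]                             # n_observed x k
            A = Uo.T @ Uo + lam * np.eye(k)
            V[:, j] = np.linalg.solve(A, Uo.T @ X[obs, j])
    return V

# Synthetic low-rank data with 60% of the entries removed (illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8))
X[rng.random(X.shape) < 0.6] = np.nan

V = fit_V(X[:150])            # learn V on the training rows
U_test = project(X[150:], V)  # dense latent representation of unseen rows
```

Because each row is projected using only its own observed values and the fixed V, projecting a sample as it arrives gives the same latent vector as projecting the whole recording at once, which is the property that keeps future measurements out of the prediction.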