Sepsis is a life-threatening condition that endangers millions of people around the world. With the widespread availability of electronic health records (EHR), predictive models that handle clinical sequential data effectively make it more feasible to predict sepsis early and begin timely preventive treatment. Timely prediction remains challenging, however, because patients’ sequential EHR data encode long temporal dependencies and have imbalanced labels. Rather than following mainstream methods such as LSTM-based models, we build on the Transformer architecture, which is faster and better able to capture long temporal dependencies. We propose an end-to-end neural network that adapts to data with imbalanced labels. In this model, we replace the input embedding with an interpolation of the sparse raw data and follow the subsequent structure of SAnD (Simply Attend and Diagnose), namely positional encoding, self-attention, and dense interpolation, to obtain low-dimensional representation vectors of patients. In addition, we apply oversampling techniques such as SMOTE in the patient representation space to compensate for the lack of positive samples during training. We randomly divided the training data into training, test, and validation sets in a 7:3:1 ratio. The test set is used to evaluate our model, and the validation set is used to estimate the utility score. At the present stage, running the official scoring code yields the following results: AUROC (0.891), AUPRC (0.121), Accuracy (0.936), and Utility (0.557).
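To make the oversampling step concrete, the following is a minimal sketch of SMOTE-style interpolation applied to patient representation vectors. It is an illustration under stated assumptions, not the implementation used for the reported results: the function name `smote_oversample`, the neighbour count `k`, and the toy 2-D vectors are all hypothetical, and only the Python standard library is used so that the example is self-contained.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(n)
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Hypothetical low-dimensional representations of positive (septic) patients.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_oversample(positives, n_new=6, k=2)
```

Because every synthetic vector is a convex combination of two real positive samples, the new points stay inside the region already occupied by the positive class in the representation space.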