Automatic Diagnosis Detection in Free-Text ECG Medical Reports

Derick Oliveira1, Gabriela Paixão2, Eder Figueiredo1, Paulo Gomes1, Milton Ferreira1, Jéssica Canazart1, Antonio Ribeiro1, Wagner Meira1
1Federal University of Minas Gerais, 2Doctor


Abstract

Introduction: Electrocardiogram (ECG) medical reports are an important source of information for health research, but most of them are stored as free text. This work presents an algorithm that automatically extracts ECG diagnoses from free-text medical reports written in Portuguese.

Methods: We present a hierarchical machine learning classifier for free text. First, we preprocess the text by removing stop words and generating n-grams from the conclusions of each medical report, where each conclusion is a possible diagnosis. We then apply a Lazy Associative Classifier (LAC), built from a 2800-sample dictionary that specialists created manually from the text of real diagnoses. The final result is obtained by feeding the LAC output into a decision tree for class disambiguation. The decision tree is trained on the original dataset together with the LAC output, and it is used to correct the errors of the LAC.
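The preprocessing and LAC stages described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the stop-word list, the tokenization regex, the toy Portuguese dictionary entries in `TRAIN`, and the helper names `preprocess` and `lac_predict` are all hypothetical, and the final scoring step is simplified to pick the label of the single highest-confidence rule. The decision-tree disambiguation stage is not shown.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, tiny Portuguese stop-word list (the authors' list is not given).
STOP_WORDS = {"de", "da", "do", "e", "com", "a", "o", "em"}


def preprocess(conclusion, n_max=2):
    """Lowercase, tokenize, drop stop words, and return all 1..n_max-grams."""
    tokens = [t for t in re.findall(r"\w+", conclusion.lower()) if t not in STOP_WORDS]
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def lac_predict(train, features, max_len=2, min_support=1):
    """Simplified lazy associative classification.

    Rules X -> label are mined on demand, only for itemsets X drawn from the
    features of the instance being classified. The label of the rule with the
    highest confidence wins (ties broken by support). Returns None if no rule fires.
    """
    # Project every training example onto the test instance's features.
    projected = [(feats & features, label) for feats, label in train]
    best = {}  # label -> best (confidence, support) seen so far
    for k in range(1, max_len + 1):
        for itemset in combinations(sorted(features), k):
            items = set(itemset)
            covering = [label for feats, label in projected if items <= feats]
            if len(covering) < max(min_support, 1):
                continue
            for label, count in Counter(covering).items():
                score = (count / len(covering), count)
                if score > best.get(label, (0.0, 0)):
                    best[label] = score
    if not best:
        return None
    return max(best, key=lambda lbl: best[lbl])


# Hypothetical dictionary entries: one conclusion text per diagnosis.
TRAIN = [
    (preprocess(text), label)
    for text, label in [
        ("fibrilação atrial", "AF"),
        ("flutter atrial", "AFL"),
        ("bloqueio de ramo direito", "RBBB"),
        ("bloqueio de ramo esquerdo", "LBBB"),
        ("bloqueio atrioventricular de primeiro grau", "AVB"),
        ("bradicardia sinusal", "SB"),
        ("taquicardia sinusal", "ST"),
    ]
]

if __name__ == "__main__":
    print(lac_predict(TRAIN, preprocess("Bloqueio completo do ramo esquerdo")))
```

With this toy dictionary, `lac_predict(TRAIN, preprocess("Bloqueio completo do ramo esquerdo"))` returns `"LBBB"`: the rule `{esquerdo} -> LBBB` has confidence 1.0, whereas rules built only from `bloqueio` or `ramo` are shared with RBBB and AVB and have lower confidence. Mining rules lazily, restricted to the n-grams of the report being classified, keeps the rule set small even when the dictionary is large. Note that in this sketch n-grams are formed after stop-word removal, so "bloqueio de ramo" yields the bigram "bloqueio ramo".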

Results: The classification model was tested on 4557 manually labeled medical reports covering 7 possible diagnoses: atrial fibrillation (AF), atrial flutter (AFL), right bundle branch block (RBBB), left bundle branch block (LBBB), first-degree atrioventricular block (AVB), sinus bradycardia (SB), and sinus tachycardia (ST). Using the macro F1 metric, we obtained: (1) AF = 0.993; (2) AFL = 0.909; (3) RBBB = 0.849; (4) LBBB = 0.838; (5) AVB = 0.729; (6) SB = 0.991; (7) ST = 0.974. Notably, the model distinguished well even closely related diagnoses, such as RBBB and LBBB.
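The abstract does not state how a macro F1 value was computed for each individual diagnosis. One common reading, assumed here, treats each diagnosis as a binary present/absent task and averages the F1 of the positive and negative classes without weighting. The minimal Python sketch below implements that assumption; the function names `f1` and `binary_macro_f1` are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_macro_f1(y_true, y_pred):
    """Macro F1 for one diagnosis treated as a binary (present/absent) task.

    Returns the unweighted mean of the positive-class F1 and the
    negative-class F1. Labels are truthy (present) or falsy (absent).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    f1_pos = f1(tp, fp, fn)
    f1_neg = f1(tn, fn, fp)  # for the negative class, FP and FN swap roles
    return (f1_pos + f1_neg) / 2


if __name__ == "__main__":
    print(binary_macro_f1([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

In this example, with 6 reports, TP = 1, FN = 1, FP = 1 and TN = 3. The positive-class F1 is 2/(2+1+1) = 0.5, the negative-class F1 is 6/(6+1+1) = 0.75, and the macro F1 is 0.625. Because the negative class usually dominates in per-diagnosis evaluation, this averaging can yield higher values than the positive-class F1 alone.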

Conclusions and future work: We presented a machine learning model that accurately extracts 7 diagnoses from free-text ECG medical reports. In future work, we plan to improve this model and extend it to more diagnoses.