Session S42.4
Similarity Retrieval of Cardiac Reports for Automated Decision Support
T Syeda-Mahmood*, SN Hashmi
IBM Almaden Research Center
San Jose, CA, USA
Diagnostic decision support is still very much an art for physicians in their practices today. With integrated information becoming available through large patient repositories, newer decision support systems are emerging that let physicians benefit from the consensus opinions of other physicians who have seen similar patients. The key to such decision support is the search for similar patients based on their available data. In cardiology, an important source of diagnostic information is the textual reports summarizing the findings of ECG and echocardiographic exams. These reports often document important measurements taken during the exams. They also contain written descriptions of cardiac structures such as the main chambers (atria and ventricles) and valves (mitral, tricuspid, etc.). Mining the reports can therefore reveal important information correlating diagnoses with these descriptions and measurements. Given a new patient’s cardiology reports, if we can find similar reports of previously diagnosed patients, we can use their recorded diagnoses either to validate the diagnosis of the new patient or to present physicians with alternative diagnoses drawn from similar patients.
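As a rough illustration of what mining the reports can involve, the sketch below turns one free-text report into a flat feature dictionary: numeric measurements are pulled out with regular expressions and descriptive findings become binary flags. The abstract does not describe the authors' extraction step, so the feature names (`lvidd_cm`, `ef_percent`, `qrs_ms`, and the finding flags), the patterns, the phrase lists and the function `extract_features` are all hypothetical. Real reports would need a far richer vocabulary and negation handling (as written, "no mitral regurgitation" would still set the flag).

```python
import re

# Hypothetical numeric measurements and the patterns used to find them.
# Names and patterns are illustrative assumptions, not the authors' schema.
MEASUREMENT_PATTERNS = {
    "lvidd_cm": re.compile(r"LVIDd\s*[:=]?\s*(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE),
    "ef_percent": re.compile(r"\bEF\s*[:=]?\s*(\d+(?:\.\d+)?)\s*%", re.IGNORECASE),
    "qrs_ms": re.compile(r"\bQRS\s*(?:duration)?\s*[:=]?\s*(\d+(?:\.\d+)?)\s*ms", re.IGNORECASE),
}

# Hypothetical descriptive findings, each mapped to a 0/1 feature.
FINDING_TERMS = {
    "mitral_regurgitation": ("mitral regurgitation", "mr"),
    "lv_dilated": ("dilated lv", "lv dilated", "lv is dilated"),
    "bundle_branch_block": ("bundle branch block", "bbb"),
}


def extract_features(report: str) -> dict:
    """Turn one free-text report into a flat feature dictionary."""
    features = {}
    # Measurements: keep the first match; absent measurements are left out.
    for name, pattern in MEASUREMENT_PATTERNS.items():
        match = pattern.search(report)
        if match:
            features[name] = float(match.group(1))
    # Findings: whole-word phrase match so that "mr" does not fire inside "mri".
    text = report.lower()
    for name, phrases in FINDING_TERMS.items():
        found = any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases)
        features[name] = float(found)
    return features
```

For the made-up report "LV is dilated. LVIDd: 6.1 cm. EF 30%. Mild mitral regurgitation." this yields `lvidd_cm` 6.1, `ef_percent` 30.0, `lv_dilated` and `mitral_regurgitation` set to 1.0, `bundle_branch_block` set to 0.0, and no `qrs_ms` entry because that measurement is absent.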
In this paper we present a method for finding similar cardiology reports and use it to infer the similarity of patients and their associated diagnoses. It uses a statistical machine learning approach to build multi-dimensional kernels that capture the common values of the features extracted from the cardiac reports of each disease, so that every disease is represented by its kernel model. Given a new cardiac report, we extract the same features and project them into the multi-dimensional kernels of the learned model for each category. We developed a similarity metric that takes into account all relevant features that are both commonly observed within a class and discriminative between classes. The extent to which the query report agrees with each stored model is then used to rank the matches and retrieve similar cardiac reports from the database.
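The per-disease kernel models can be sketched with a simple stand-in, assuming one Gaussian kernel density estimate per feature and per disease (bandwidth from Silverman's rule of thumb), combined across features as if they were independent. A query is scored against each disease by its mean per-feature log-density over the features that both the query and the model contain, and diseases are ranked by that score. The class `KernelDiseaseModels`, its methods and the `min_bandwidth` parameter are hypothetical, and the authors' weighting of discriminative features is not reproduced here.

```python
import math
from collections import defaultdict


class KernelDiseaseModels:
    """Per-disease Gaussian kernel density models over report features (illustrative sketch)."""

    def __init__(self, min_bandwidth: float = 1e-2):
        self.min_bandwidth = min_bandwidth
        # disease label -> feature name -> list of training values
        self.samples = defaultdict(lambda: defaultdict(list))

    def fit(self, features_list, labels):
        """Store every training value under its disease and feature name."""
        for features, label in zip(features_list, labels):
            for name, value in features.items():
                self.samples[label][name].append(value)
        return self

    def _bandwidth(self, values):
        n = len(values)
        mean = sum(values) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        # Silverman's rule of thumb, floored so constant features still get a kernel.
        return max(1.06 * std * n ** (-1 / 5), self.min_bandwidth)

    def _log_density(self, x, values):
        """Log of a 1-D Gaussian kernel density estimate evaluated at x."""
        h = self._bandwidth(values)
        total = sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        density = total / (len(values) * h * math.sqrt(2 * math.pi))
        return math.log(density + 1e-300)  # guard against log(0) on underflow

    def score(self, query, label):
        """Mean per-feature log-density of the query under one disease model."""
        model = self.samples[label]
        shared = [name for name in query if name in model]
        if not shared:
            return float("-inf")
        return sum(self._log_density(query[n], model[n]) for n in shared) / len(shared)

    def rank(self, query):
        """Disease labels ordered from best to worst agreement with the query."""
        return sorted(self.samples, key=lambda label: self.score(query, label), reverse=True)
```

Averaging rather than summing the log-densities (a geometric mean over features) keeps queries with few extracted features comparable to those with many; features a query lacks are skipped. Stored reports of the top-ranked diseases would then be returned as the similar cases.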
We tested this approach on a large corpus of medical reports assembled from Indian hospitals. The patients covered a variety of diseases, ranging from myocardial infarction (MI), conduction defects (e.g. BBB) and valvular abnormalities (MR, TR) to hypertrophies and other cardiomyopathies. There were 32 disease categories with about 50 reports per category, for a total of 1500 cardiology reports. We used 25 reports per category to train the kernel models and the remaining reports to test similarity retrieval. The system achieved 89% accuracy in retrieving reports of the same disease as the query. For the queries it missed, the retrieved matches were reasonably close; for example, cases of dilated LV and LV dysfunction matched cases of dilated cardiomyopathy.
(Abstract Control Number: 323)
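The evaluation protocol described above (25 training reports per category, the remainder used as queries, accuracy taken as the share of queries whose best match has the query's disease) might be sketched as below. Taking the first 25 reports of each category in input order, and counting only the top-ranked category as the retrieved match, are assumptions: the abstract does not say how training reports were chosen or exactly how a retrieval was scored. The functions `split_per_category` and `top1_accuracy` are hypothetical, and `rank_fn` stands for any function that returns categories ordered best first, such as a ranking from per-disease kernel models.

```python
from collections import defaultdict


def split_per_category(reports, labels, n_train=25):
    """First n_train reports of each category go to training; the rest become test queries.

    Input order decides the split (an assumption); shuffle beforehand for a random split.
    """
    seen = defaultdict(int)
    train, test = [], []
    for report, label in zip(reports, labels):
        (train if seen[label] < n_train else test).append((report, label))
        seen[label] += 1
    return train, test


def top1_accuracy(test, rank_fn):
    """Fraction of test reports whose best-ranked category equals their true category."""
    if not test:
        return 0.0
    hits = sum(1 for report, label in test if rank_fn(report)[0] == label)
    return hits / len(test)
```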