Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge

Cormack J, Nath C, Milward D, Raja K, Jonnalagadda SR

J Biomed Inform. 2015 Dec; 58 Suppl:S120-7

PMID: 26209007


This paper describes the use of an agile text mining platform (Linguamatics’ Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors from patient records, as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven, rule-based methodology with the addition of a simple supervised classifier.
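As a rough illustration of what document-level, rule-based extraction can look like, the sketch below maps a few keyword patterns to risk-factor labels and reports each label at most once per document. It is not I2E query syntax and not the authors' rule set: the `RULES` table, the label names, and the `document_level_tags` helper are all hypothetical, and a realistic system would also need to handle negation, temporality, and family history.

```python
import re

# Hypothetical patterns for illustration only; not taken from the paper or from I2E.
RULES = {
    "HYPERTENSION": re.compile(r"\b(hypertension|htn)\b", re.IGNORECASE),
    "DIABETES": re.compile(r"\b(diabetes( mellitus)?|t2dm)\b", re.IGNORECASE),
    "SMOKER": re.compile(r"\b(smoker|smokes)\b", re.IGNORECASE),
}


def document_level_tags(text):
    """Return the set of labels whose pattern matches anywhere in the text.

    Using a set means a risk factor mentioned many times in one record
    still yields a single document-level tag.
    """
    return {label for label, pattern in RULES.items() if pattern.search(text)}


note = "Pt with HTN and type 2 diabetes mellitus. HTN is well controlled."
print(sorted(document_level_tags(note)))
```

Keeping the rules in a plain table like this makes them quick to revise against training data, which is the kind of rapid iteration the abstract attributes to agile text mining.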

We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percentage point behind the top-performing system.
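To make the reported metric concrete, the following minimal sketch computes an F-score over document-level tags. It assumes a micro-averaged F1 (counts pooled across all documents before computing precision and recall); the `micro_f1` function and the toy gold and predicted tag sets are hypothetical and do not reproduce the official challenge evaluation script or any data from the paper.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-document tag sets.

    gold, predicted: dicts mapping a document id to a set of tags.
    """
    tp = fp = fn = 0
    for doc_id in gold.keys() | predicted.keys():
        g = gold.get(doc_id, set())
        p = predicted.get(doc_id, set())
        tp += len(g & p)   # tags found in both gold and prediction
        fp += len(p - g)   # predicted tags absent from gold
        fn += len(g - p)   # gold tags that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: 2 true positives, 1 false positive, 1 false negative.
gold = {"d1": {"HYPERTENSION", "DIABETES"}, "d2": {"SMOKER"}}
pred = {"d1": {"HYPERTENSION"}, "d2": {"SMOKER", "DIABETES"}}
print(round(micro_f1(gold, pred), 3))
```

Because counts are pooled, frequent tags dominate a micro-averaged score, which is one reason class imbalance in the training data matters when interpreting a single headline figure.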