The Heart of the Problem: Unstructured EMR Data and Cardiovascular Disease

Better diagnosis needs more than diagnosis codes

Cardiovascular diseases are among the leading causes of death, both in the US and globally. This burden puts great pressure on health systems to manage the patient load, at both the population level and the individual level. As with all diseases, treatment is more effective and less costly if patients are diagnosed earlier in their care journey. One barrier is that diagnosis codes for conditions such as valvular heart disease can be inaccurate and can vary across health systems. More information resides in the unstructured text of medical records, but extracting it manually is slow and tedious.

Fast, accurate diagnosis of aortic stenosis with Natural Language Processing

A recent short paper by Solomon et al. from Kaiser Permanente Northern California (KPNC) used Natural Language Processing (NLP) algorithms to extract detailed clinical information from echocardiography (ECG) reports. NLP is an Artificial Intelligence (AI) technology that transforms free, unstructured text in documents and databases into normalized, structured data suitable for analysis. In a cohort of over 500,000 patients, their approach identified aortic stenosis more accurately than diagnosis codes did.

The KPNC team used the Linguamatics NLP platform to develop and validate NLP queries in an iterative process. First, queries were developed from a set of ~100 ECG reports that had been manually reviewed to confirm aortic stenosis (AS). Then, the queries were refined with additional development sets of ~100 reports until the NLP algorithm achieved both positive and negative predictive values (PPV and NPV) of >95%. After this, the team ran the query over ~960,000 ECG reports from 2008-2018 (about 522,000 patients). The results were compared to reports coded with an AS diagnosis by ICD-9/10 codes.
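To make the >95% stopping rule concrete, here is a minimal Python sketch of how PPV and NPV could be computed for one development set. It is not the KPNC team's code or part of the Linguamatics platform: the function names, the boolean-list inputs, and the toy counts are all assumptions made for illustration.

```python
# Illustrative sketch only: names and data are hypothetical, not from the study.

def predictive_values(nlp_flags, chart_review):
    """Return (PPV, NPV) for paired lists of booleans.

    nlp_flags[i]    -> True if the NLP query flagged report i as AS
    chart_review[i] -> True if manual review confirmed AS in report i
    """
    tp = fp = tn = fn = 0
    for predicted, actual in zip(nlp_flags, chart_review):
        if predicted and actual:
            tp += 1          # flagged and confirmed
        elif predicted and not actual:
            fp += 1          # flagged but not confirmed
        elif not predicted and not actual:
            tn += 1          # not flagged, correctly
        else:
            fn += 1          # missed by the query
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


def meets_threshold(ppv, npv, threshold=0.95):
    """True when both predictive values exceed the target (>95% in the text)."""
    return ppv > threshold and npv > threshold


# Toy development set of 100 reports (made-up counts):
# 48 true positives, 2 false positives, 49 true negatives, 1 false negative.
nlp_flags = [True] * 50 + [False] * 50
chart_review = [True] * 48 + [False] * 2 + [False] * 49 + [True] * 1

ppv, npv = predictive_values(nlp_flags, chart_review)
ready = meets_threshold(ppv, npv)
```

In an iterative workflow like the one described, a check of this kind would be repeated on each new development set, with the query refined until both values clear the threshold.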

From the 960,000 ECG reports, the NLP algorithm identified 104,000 ECG reports with AS, of which only 34.7% also had diagnosis codes for aortic stenosis. This means around 35,000 patients had AS that was not captured by diagnosis codes. The authors point out that using NLP to access information buried in the unstructured text of medical records can facilitate more effective individual and population management than relying on administrative data alone.
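The comparison between NLP-identified reports and coded reports amounts to a set overlap. The sketch below shows one way to express it in Python, assuming each report has an identifier. The identifiers and the function name are hypothetical, and the toy data is not drawn from the study.

```python
# Illustrative sketch only: report IDs and counts are made up.

def coding_coverage(nlp_positive_ids, coded_ids):
    """Return (share of NLP-positive reports that also have a code,
    set of NLP-positive report IDs with no code)."""
    nlp_positive = set(nlp_positive_ids)
    both = nlp_positive & set(coded_ids)      # flagged by NLP and coded
    uncoded = nlp_positive - both             # flagged by NLP, no code
    share = len(both) / len(nlp_positive) if nlp_positive else float("nan")
    return share, uncoded


# Toy example: five reports flagged by NLP, three reports carrying an AS code.
nlp_ids = ["r1", "r2", "r3", "r4", "r5"]
coded = ["r2", "r5", "r9"]

share, uncoded = coding_coverage(nlp_ids, coded)
```

Here `share` plays the role of the 34.7% figure above, and `uncoded` lists the reports that structured codes alone would have missed.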

Better use of Real World Data for patient care and outcomes

This study is one part of recent research suggesting that using the unstructured information in EMRs gives significantly higher diagnostic accuracy than structured fields alone. The ability to find and diagnose patients effectively, for cardiovascular and other diseases, can advance personalized and population-based care strategies for surveillance and treatment. NLP has demonstrated these capabilities, and we look forward to seeing how else it can advance care strategies.


Read more about NLP for Patient Safety and Real World Data, or contact us for a demo.