Systematic case identification is critical to improving population health, but widely used diagnosis code-based approaches for conditions like valvular heart disease are inaccurate and lack specificity.
We developed and validated natural language processing (NLP) algorithms to identify aortic stenosis (AS) cases and associated parameters from semi-structured echocardiogram reports and compared their accuracy to that of administrative diagnosis codes.
Using 1,003 physician-adjudicated echocardiogram reports from Kaiser Permanente Northern California, a large, integrated healthcare system (>4.5 million members), we developed and validated NLP algorithms to achieve positive and negative predictive values >95% for identifying AS and associated echocardiographic parameters. The final NLP algorithms were applied to all adult echocardiography reports from studies performed between 2008 and 2018 (N>900,000) and compared to ICD-9/10 diagnosis code-based definitions for identification of AS.
We found that validated NLP algorithms were substantially more accurate than diagnosis codes for identifying AS, and provided richer clinical detail on ascertained cases.
Applying machine learning-based approaches to unstructured electronic health record (EHR) data can facilitate more effective individual and population management than using administrative data alone.