
Using NLP to Improve Discrete Data Capture at Kaiser Permanente

Using NLP to Improve Discrete Data Capture From Interpretive Cervical Biopsy Diagnoses at a Large Health Care Organization
7th Apr 2022
Soora Wi


Soora Wi, MPH; Patricia E. Goldhoff, MD; Laurie A. Fuller, MHA, CT(ASCP); Kiranjit Grewal, MS, MLS(ASCP); Nicolas Wentzensen, MD, PhD; Megan A. Clarke, PhD; Thomas S. Lorey, MD


The terminology used by pathologists to describe and grade dysplasia and premalignant changes of the cervical epithelium has evolved over time. Unfortunately, coexistence of different classification systems combined with non-standardized interpretive text has created multiple layers of interpretive ambiguity.


The objective was to use natural language processing (NLP) to automate and expedite translation of interpretive text to a single most severe, and thus actionable, cervical intraepithelial neoplasia (CIN) diagnosis.
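As a rough illustration of what reducing interpretive text to a single most severe CIN grade can involve, the sketch below scans a report for CIN grades and a few older dysplasia terms and keeps the highest grade found. It is not the study's algorithm, which this summary does not describe: the function name `most_severe_cin`, the regular expression, and the term table are all hypothetical, and the sketch ignores negation, which a real system would need to handle.

```python
import re

# Hypothetical grade vocabulary; the study's actual rules are not given here.
ROMAN = {"I": 1, "II": 2, "III": 3}
GRADE = r"(III|II|I|[123])"
# Matches "CIN 2", "CIN-III", and ranges such as "CIN 2-3" or "CIN II/III".
CIN_RE = re.compile(rf"\bCIN[\s-]*{GRADE}(?:\s*[-/]\s*{GRADE})?\b", re.IGNORECASE)
# Older dysplasia terminology mapped to its usual CIN equivalent.
LEGACY_TERMS = {
    "mild dysplasia": 1,
    "moderate dysplasia": 2,
    "severe dysplasia": 3,
    "carcinoma in situ": 3,
}


def to_grade(token):
    """Convert a roman or arabic grade token to an integer 1-3."""
    token = token.upper()
    return ROMAN[token] if token in ROMAN else int(token)


def most_severe_cin(text):
    """Return the highest CIN grade mentioned in text, or None if none is found.

    Assumption: every mention is affirmative. Phrases such as
    "no evidence of CIN 2" would be misread by this sketch.
    """
    grades = []
    for match in CIN_RE.finditer(text):
        grades.extend(to_grade(g) for g in match.groups() if g)
    lowered = text.lower()
    grades.extend(g for term, g in LEGACY_TERMS.items() if term in lowered)
    return max(grades, default=None)


if __name__ == "__main__":
    print(most_severe_cin("CIN 1 and focal CIN 2-3"))
    print(most_severe_cin("Moderate dysplasia (CIN II)"))
    print(most_severe_cin("Benign squamous mucosa"))
```

Taking the maximum over all mentions reflects the stated goal of reporting one actionable diagnosis per report, whichever classification system the pathologist used.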


We developed and applied NLP algorithms to 35,847 unstructured cervical pathology reports and assessed NLP performance in identifying the most severe diagnosis, compared with expert manual review. NLP performance was determined by calculating precision, recall, and F score.


The NLP algorithms yielded a precision of 0.957, a recall of 0.925, and an F score of 0.94. Additionally, we estimated that the time to evaluate each monthly biopsy file was significantly reduced, from 30 hours to 0.5 hours.
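The reported figures can be checked with a few lines of arithmetic. The sketch below assumes the F score is the balanced F1 measure (the harmonic mean of precision and recall); the summary does not state the weighting, and the helper name `f_score` is illustrative only.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the results above.
precision, recall = 0.957, 0.925
f1 = f_score(precision, recall)

# Estimated review time per monthly biopsy file, in hours.
manual_hours, nlp_hours = 30, 0.5
speedup = manual_hours / nlp_hours

if __name__ == "__main__":
    print(f"F1 = {f1:.2f}")
    print(f"Review time reduced {speedup:.0f}-fold")
```

Under that assumption the computed F1 rounds to the reported 0.94, and the drop from 30 hours to 0.5 hours corresponds to a 60-fold reduction in review time.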


A set of validated NLP algorithms applied to pathology reports can rapidly and efficiently assign a discrete, actionable diagnosis using CIN classification to assist with clinical management of cervical pathology and disease. Moreover, discrete diagnostic data encoded as CIN terminology can enhance the efficiency of clinical research.
