The terminology used by pathologists to describe and grade dysplasia and premalignant changes of the cervical epithelium has evolved over time. Unfortunately, the coexistence of different classification systems, combined with non-standardized interpretive text, has created multiple layers of interpretive ambiguity. We used natural language processing (NLP) to automate and expedite translation of interpretive text to a single most severe, and thus actionable, cervical intraepithelial neoplasia (CIN) diagnosis.
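As a rough illustration of reducing free text to a single most severe diagnosis, the sketch below scans a report for a few grade-bearing phrases and keeps the highest grade found. It is not the study's algorithm: the pattern table, the function name `most_severe_cin`, and the term-to-grade pairings (mild, moderate, and severe dysplasia mapped to CIN 1, 2, and 3) are assumptions chosen for the example, and it ignores negation, ranges such as "CIN 2/3", and other terminology systems.

```python
import re

# Hypothetical pattern table for illustration only; not taken from the study.
# Each entry pairs a regular expression with the CIN grade it is assumed to indicate.
PATTERNS = [
    (re.compile(r"\bCIN[\s-]*(?:3|III)\b|\bsevere dysplasia\b|\bcarcinoma in situ\b", re.I), 3),
    (re.compile(r"\bCIN[\s-]*(?:2|II)\b|\bmoderate dysplasia\b", re.I), 2),
    (re.compile(r"\bCIN[\s-]*(?:1|I)\b|\bmild dysplasia\b", re.I), 1),
]


def most_severe_cin(text):
    """Return the highest CIN grade mentioned in the text, or None if none is found."""
    grades = [grade for pattern, grade in PATTERNS if pattern.search(text)]
    return max(grades) if grades else None


if __name__ == "__main__":
    report = "Focal mild dysplasia; separate fragment shows CIN III."
    print(most_severe_cin(report))
```

Keeping the grades as integers makes "most severe" a plain `max` over the matches, so extending the sketch to more phrasings only means adding rows to the table.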
We developed NLP algorithms and applied them to 35,847 unstructured cervical pathology reports, then assessed how well they identified the most severe diagnosis compared with expert manual review. Performance was measured by precision (0.957), recall (0.925), and F score (0.94). Using NLP also significantly reduced the time needed to evaluate each monthly biopsy file, from 30 hours to 0.5 hours.
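The reported F score is consistent with the balanced F measure, the harmonic mean of precision and recall. The short sketch below checks that arithmetic; the assumption that the F score here is the balanced (beta = 1) form, and the helper name `f_score`, are ours rather than stated in the text.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # convention when both precision and recall are zero
    return (1 + b2) * precision * recall / denominator


if __name__ == "__main__":
    # Precision and recall as reported above; beta=1 is an assumption.
    print(round(f_score(0.957, 0.925), 2))  # 0.94
```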
NLP rapidly and efficiently assigned a discrete, actionable diagnosis under the CIN classification, which can assist with clinical management of cervical pathology and disease. Moreover, discrete diagnostic data encoded in CIN terminology can enhance the efficiency of clinical research.