Phenome Extraction - Philip Payne

Philip Payne described how Washington University is leveraging the Linguamatics NLP platform in a large precision medicine initiative in partnership with Centene Corporation (the largest operator of Medicare Advantage plans in the U.S.) and BJC Healthcare (a 15-hospital system with ca. 6.4 million active covered lives). Centene funded the research to address variations in the quality, outcomes, and cost of care across its patient populations through precision medicine research in critical areas, including diabetes, obesity, neurodegeneration, breast cancer, and other solid tumors.

The aim was to establish disease-specific patient cohorts from the 6.5 million patients and to formalize phenotypic registries that support basic, clinical, and translational research. NLP is central to the project because ca. 80% of the high-value phenotypic data is encoded in the clinical narrative rather than in structured or discrete fields in the electronic health record (EHR).

Philip drilled into the use of the Linguamatics NLP platform to process patient data in the Alzheimer’s cohort, where the team has done the most interrogation of the rich, longitudinal data set. They assessed the NLP output against gold standards created by subject matter experts (SMEs), then combined the structured data with the NLP-extracted unstructured data. With machine learning (ML) assistance, they could then find patients with a given sub-phenotype, e.g. Alzheimer’s disease severity, likely outcome, or implication of family history. These techniques are now being applied in other areas, including solid tumors, cardiovascular disease, obesity, metabolic syndrome, and type 2 diabetes.
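
To make the gold-standard assessment concrete, here is a minimal sketch of how NLP output might be scored against SME annotations using precision, recall, and F1. The talk did not state which metrics the team used, so the choice of metrics, the function name, the (patient, concept) pair representation, and the toy data are all assumptions for illustration, not the team's actual method.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted (patient_id, concept) pairs against a gold standard.

    Hypothetical helper: both arguments are iterables of hashable pairs.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented toy data: SME-annotated findings vs. NLP-extracted findings.
gold_standard = {
    ("patient_1", "memory loss"),
    ("patient_1", "family history"),
    ("patient_2", "memory loss"),
    ("patient_3", "agitation"),
}
nlp_output = {
    ("patient_1", "memory loss"),
    ("patient_2", "memory loss"),
    ("patient_3", "agitation"),
    ("patient_3", "family history"),  # not in the gold standard
}

precision, recall, f1 = precision_recall_f1(gold_standard, nlp_output)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case three of the four extracted findings match the gold standard and three of the four gold findings are recovered, so precision and recall both come out at 0.75.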

"We think [Linguamatics NLP] is a powerful tool. The performance of our machine learning algorithms is substantially improved when we use these combined data sets as opposed to when we use the structured or discrete data in the EHR alone."