Using natural language processing (NLP) to improve patient cohort selection

May 29 2014

A recent customer project highlighted for me the importance of being able to apply NLP to cohort selection in support of medical research, clinical trial recruitment and outcomes analysis.

A new customer of ours was setting up a study of patients with HIV and Hepatitis C and needed to identify potential subjects from their AllScripts EHR. As many organizations do, they had five medical students spend four months trawling through patient records, identifying 700 potential study candidates.

The process was particularly painful because simply searching structured fields for the ICD-9 codes for HIV and Hepatitis C missed significant numbers of potential subjects. The cause was variation in where the data was recorded: sometimes the diagnosis was coded in structured fields, sometimes the patient narrative stated that he or she was positive for HIV or Hepatitis C, and sometimes both.

Assessing the narrative is always a problem: telling patient history apart from family history, and “tested for HIV, negative result” apart from “positive for HIV”, requires careful reading.
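As a rough illustration of why keyword matching alone is not enough, the following Python sketch labels a single sentence as an affirmed, negated or family-history mention of a condition, loosely in the spirit of NegEx-style trigger rules. It is a hypothetical example, not how I2E works: the trigger lists, the `CONDITIONS` patterns and the `classify_mention` function are assumptions made for illustration and are far from complete.

```python
import re

# Hypothetical, deliberately tiny trigger lists; real systems use far larger ones.
NEGATION_TRIGGERS = [r"\bnegative\b", r"\bno evidence of\b", r"\bdenies\b", r"\bruled out\b"]
FAMILY_TRIGGERS = [r"\bmother\b", r"\bfather\b", r"\bsister\b", r"\bbrother\b", r"\bfamily history\b"]

# Hypothetical surface forms for each condition.
CONDITIONS = {
    "HIV": r"\bHIV\b",
    "Hepatitis C": r"\b(?:hepatitis C|hep C|HCV)\b",
}


def _matches_any(patterns, text):
    """True if any regex in patterns occurs in text (case-insensitive)."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def classify_mention(sentence, condition):
    """Label one sentence as 'affirmed', 'negated' or 'family' for a condition.

    Returns None when the condition is not mentioned. Family-history triggers
    are checked first, so "mother tested negative for HIV" counts as family.
    """
    if not re.search(CONDITIONS[condition], sentence, re.IGNORECASE):
        return None
    if _matches_any(FAMILY_TRIGGERS, sentence):
        return "family"
    if _matches_any(NEGATION_TRIGGERS, sentence):
        return "negated"
    return "affirmed"


print(classify_mention("Tested for HIV, negative result.", "HIV"))  # negated
print(classify_mention("Positive for HIV.", "HIV"))                 # affirmed
print(classify_mention("Mother has hepatitis C.", "Hepatitis C"))   # family
print(classify_mention("Denies chest pain.", "HIV"))                # None
```

Because this sketch checks the whole sentence, a line such as “positive for HIV, negative for hepatitis B” would wrongly be marked as negated for HIV. Practical tools limit each trigger's scope to nearby words, which is one reason the narrative demands careful handling.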

Our customer had recently installed our I2E NLP platform and had indexed a large collection of patient records by extracting documents from AllScripts via their analytical data warehouse.

The data sets were indexed with the usual domain ontologies covering diseases, medications, procedures, etc., to support rapid searching in I2E.

As a test of the new platform, we reproduced the HIV and Hepatitis C project by writing queries that combined disease terms and ICD-9 codes to find these patterns in both the structured and unstructured patient data. Building and testing the queries took around two hours and produced a potential cohort of 1,100 subjects, compared with the 700 found manually over 20 man-months (five students for four months). This is an amazing improvement in terms of both speed and patient recall, and one I feel highlights why NLP is key to supporting analysis of such patient data.
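The query logic described above can be sketched as combining two evidence sources for each patient: structured ICD-9 codes and conditions affirmed in the narrative. This minimal Python example assumes the study needed evidence of both HIV and Hepatitis C. The `Patient` record, the sample patients and the short code lists are hypothetical and illustrative, not a validated code set or the customer's actual I2E queries.

```python
from dataclasses import dataclass, field

# Illustrative ICD-9-CM codes only (hypothetical selection, not a validated code set).
HIV_CODES = {"042", "V08"}
HCV_CODES = {"070.41", "070.44", "070.51", "070.54", "070.70", "070.71"}


@dataclass
class Patient:
    """Hypothetical per-patient record combining both evidence sources."""
    patient_id: str
    icd9_codes: set = field(default_factory=set)
    # Conditions affirmed in the narrative, e.g. by a negation-aware classifier.
    narrative_conditions: set = field(default_factory=set)


def has_condition(patient, codes, condition):
    """True if the condition is coded in a structured field OR affirmed in the narrative."""
    return bool(patient.icd9_codes & codes) or condition in patient.narrative_conditions


def select_cohort(patients):
    """IDs of patients with evidence of both HIV and Hepatitis C from either source."""
    return [
        p.patient_id
        for p in patients
        if has_condition(p, HIV_CODES, "HIV")
        and has_condition(p, HCV_CODES, "Hepatitis C")
    ]


def select_cohort_codes_only(patients):
    """Baseline: structured ICD-9 codes only, ignoring the narrative."""
    return [
        p.patient_id
        for p in patients
        if p.icd9_codes & HIV_CODES and p.icd9_codes & HCV_CODES
    ]


# Made-up sample patients for illustration.
patients = [
    Patient("A", {"042", "070.54"}),              # both conditions coded
    Patient("B", {"042"}, {"Hepatitis C"}),       # Hepatitis C only in the narrative
    Patient("C", set(), {"HIV", "Hepatitis C"}),  # both only in the narrative
    Patient("D", {"042"}),                        # no Hepatitis C evidence
]

print(select_cohort(patients))             # ['A', 'B', 'C']
print(select_cohort_codes_only(patients))  # ['A']
```

In this toy data, a codes-only search finds just patient A, while accepting narrative evidence as well brings in B and C. That is the same kind of gap the structured-field search left in the original project.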

Fast and accurate cohort selection is vital for academic medical centers to get the best subjects for their studies and support clinical trial accrual.

Currently, the manual effort needed to assess patient records is prohibitive, meaning that only well-funded groups have the teams to do this.

Using NLP can help speed up this process, making it easier for groups studying diseases with limited structured data, such as mental health conditions, to use patient data in their studies.

If you'd like to find out more about how NLP can be used to gain insight from unstructured text, view our webinar: Advanced NLP for Electronic Health Records.