Using Natural Language Processing (NLP) to Improve Patient Cohort Selection

May 29 2017

A project with Drexel University highlighted how valuable Natural Language Processing (NLP) can be for cohort selection in support of medical research, clinical trial recruitment and outcomes analysis.

Drexel was setting up a study into patients with HIV and Hepatitis C and needed to identify potential subjects from their AllScripts EHR. As many organizations do, they had five medical students spend four months (working part-time) trawling through patient records to identify 700 potential study candidates.

The process was particularly painful because simply looking for the ICD codes for HIV and Hepatitis C in structured fields missed significant numbers of potential subjects. This was caused by variations in where the data was recorded: sometimes it was coded in structured fields, sometimes the patient narrative stated that he or she was positive for HIV or Hepatitis C, and sometimes it was both.

Assessing the narrative is always difficult because it requires careful reading: a mention may belong to the patient's own history or to family history, and “tested for HIV, negative result” means something very different from “positive for HIV”.
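To make the distinction concrete, here is a minimal sketch of sentence-level rules that separate affirmed, negated and family-history mentions. It is not how I2E works; the cue lists and the function name classify_mention are hypothetical and far too small for real clinical text.

```python
import re

# Hypothetical cue patterns for illustration only; real notes need much broader coverage.
CONDITION = re.compile(r"\b(hiv|hepatitis c|hep c|hcv)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(negative|denies|no evidence of|ruled out|not)\b", re.IGNORECASE)
FAMILY = re.compile(r"\b(family history|mother|father|brother|sister)\b", re.IGNORECASE)


def classify_mention(sentence: str) -> str:
    """Label one sentence as 'none', 'family', 'negated' or 'affirmed'."""
    if not CONDITION.search(sentence):
        return "none"      # no condition of interest mentioned
    if FAMILY.search(sentence):
        return "family"    # mention refers to a relative, not the patient
    if NEGATION.search(sentence):
        return "negated"   # tested or discussed, but not positive
    return "affirmed"


print(classify_mention("Tested for HIV, negative result"))  # negated
print(classify_mention("Patient is positive for HIV"))      # affirmed
print(classify_mention("Family history of hepatitis C"))    # family
```

Working at the level of whole sentences is crude: a sentence such as "positive for HIV, negative for hepatitis C" would be labeled negated for both conditions, which is exactly the kind of case that needs a proper linguistic treatment.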

Drexel was using our I2E NLP platform and had indexed a large collection of patient records, extracting the documents from AllScripts via its analytical data warehouse.

The data sets were indexed with the usual domain ontologies covering diseases, medications, procedures etc. to support rapid searching in I2E.

As a test to see how augmented intelligence can cut down on manual labor, we reproduced the HIV and Hepatitis C project by writing queries to find these patterns in the structured and unstructured patient data using combinations of disease terms and ICD codes. This took around two hours of building and testing but resulted in a potential cohort of 1,100 subjects, compared to the 700 subjects found by five students over four months working part-time.
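The idea of combining both sources can be sketched as follows. This is an illustrative outline, not the actual I2E queries: the PatientRecord type, the function names and the ICD-10-CM codes (B20 for HIV disease, B18.2 for chronic viral hepatitis C) are assumptions, since the article does not say which code set was used. Requiring evidence of every listed condition is also an assumption about the study's criteria.

```python
from dataclasses import dataclass, field

# Assumed ICD-10-CM codes for illustration; the source does not name the code set.
ICD_CODES = {"hiv": {"B20"}, "hepatitis_c": {"B18.2"}}


@dataclass
class PatientRecord:
    patient_id: str
    icd_codes: set = field(default_factory=set)           # from structured fields
    narrative_findings: set = field(default_factory=set)  # conditions affirmed in free text


def has_condition(record: PatientRecord, condition: str) -> bool:
    """True if the condition is coded OR affirmed in the narrative."""
    coded = bool(record.icd_codes & ICD_CODES[condition])
    narrated = condition in record.narrative_findings
    return coded or narrated


def select_cohort(records, conditions):
    """Return IDs of patients with evidence of every listed condition."""
    return [r.patient_id for r in records
            if all(has_condition(r, c) for c in conditions)]


records = [
    PatientRecord("p1", {"B20", "B18.2"}),               # both coded
    PatientRecord("p2", {"B20"}, {"hepatitis_c"}),       # one coded, one in narrative
    PatientRecord("p3", set(), {"hiv", "hepatitis_c"}),  # narrative only
    PatientRecord("p4", {"B20"}),                        # HIV only
]
print(select_cohort(records, ["hiv", "hepatitis_c"]))  # ['p1', 'p2', 'p3']
```

A search over structured codes alone would return only p1 here, which mirrors how the code-only approach missed candidates whose diagnosis appeared in the narrative.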

Working part-time, these five students might only equate to one person working full-time at 40 hours per week. But even if this is the case:

I2E = 2 hours vs. manual review = 640 hours

This is an amazing improvement both in terms of speed and recall of patients and one I feel highlights why NLP is key to supporting analysis of such patient data.
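The comparison above can be checked with a few lines of arithmetic. The sketch assumes four months is roughly 16 weeks, which is what makes one full-time equivalent come to 640 hours; the variable names are illustrative.

```python
HOURS_PER_WEEK = 40
WEEKS = 16  # assumption: four months is roughly 16 weeks

manual_hours = HOURS_PER_WEEK * WEEKS  # one full-time equivalent
nlp_hours = 2                          # query building and testing

manual_cohort = 700
nlp_cohort = 1100

speedup = manual_hours / nlp_hours
extra_candidates = nlp_cohort - manual_cohort
relative_gain = extra_candidates / manual_cohort

print(manual_hours)             # 640
print(speedup)                  # 320.0
print(extra_candidates)         # 400
print(f"{relative_gain:.1%}")   # 57.1%
```

Note that both figures count potential subjects, so the 400 additional candidates would still need to be reviewed for eligibility.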

Fast and accurate cohort selection is vital for any clinical trial. Many clinical trials fail due to recruitment issues such as (1) not finding enough eligible participants and (2) spending a vast amount of time and money on finding participants manually.

Use of NLP can help speed up this process. It is especially useful for groups working in disease areas with limited structured data, such as mental health, who can then more readily use patient data in their studies.

If you'd like to find out more about how NLP can be used to gain insight from unstructured text, view our webinar on the use of Cancer NLP to support Precision Medicine, Clinical Research and Population Health.

You can also watch our webinar on improving patient recruitment and engagement with clinical trials using NLP.