A recent customer project showed me how important it is to be able to apply NLP to cohort selection in support of medical research, clinical trial recruitment and outcomes analysis.
A new customer of ours was setting up a study into patients with HIV and Hepatitis C and needed to identify potential subjects from their AllScripts EHR. As many organizations do, they had five medical students spend four months trawling through patient records to identify 700 potential study candidates.
The process was particularly painful because simply searching structured fields for the ICD-9 codes for HIV and Hepatitis C missed significant numbers of potential subjects. The cause was variation in where the data was recorded: sometimes the diagnosis was coded in structured fields, sometimes the patient narrative stated that he or she was positive for HIV or Hepatitis C, and sometimes both.
Assessing the narrative is always a problem: patient history has to be distinguished from family history, and phrases such as “tested for HIV, negative result” and “positive for HIV” require careful reading.
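To make the difficulty concrete, here is a minimal Python sketch of the kind of logic a reviewer is applying by hand: check the structured codes, then read the narrative sentence by sentence while discounting negated results and family history. It is a toy illustration, not how I2E works. The code lists, cue words, function names and the assumption that the study wants patients with both conditions are all mine, and the ICD-9 codes shown are an illustrative, incomplete selection.

```python
import re

# Illustrative ICD-9-CM codes only (assumption); a real study needs a vetted list.
HIV_CODES = {"042", "V08"}
HCV_CODES = {"070.41", "070.44", "070.51", "070.54", "070.70", "070.71"}

# Hypothetical surface patterns for each condition in free text.
CONDITION_PATTERNS = {
    "hiv": re.compile(r"\bHIV\b", re.IGNORECASE),
    "hcv": re.compile(r"\b(?:hepatitis\s+C|hep\s+C|HCV)\b", re.IGNORECASE),
}

# Crude cue words (assumption): any sentence containing one is ignored entirely.
NEGATION = re.compile(
    r"\b(?:negative|no evidence of|denies|ruled out|non-?reactive)\b",
    re.IGNORECASE,
)
FAMILY = re.compile(
    r"\b(?:family history|mother|father|brother|sister|sibling)\b",
    re.IGNORECASE,
)


def narrative_mentions(note):
    """Return the conditions mentioned in sentences with no negation or family cue."""
    found = set()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if NEGATION.search(sentence) or FAMILY.search(sentence):
            continue  # skip "tested for HIV, negative result", "family history of ..."
        for name, pattern in CONDITION_PATTERNS.items():
            if pattern.search(sentence):
                found.add(name)
    return found


def is_candidate(icd9_codes, note):
    """Flag a patient when both conditions appear in codes, narrative, or a mix."""
    codes = set(icd9_codes)
    found = narrative_mentions(note)
    has_hiv = bool(codes & HIV_CODES) or "hiv" in found
    has_hcv = bool(codes & HCV_CODES) or "hcv" in found
    return has_hiv and has_hcv
```

For example, `is_candidate(["042"], "Positive for hepatitis C.")` flags a patient whose HIV is coded but whose Hepatitis C appears only in the narrative, which a structured-field search alone would miss. The sentence-level filter is deliberately blunt: it drops a whole sentence that mixes a positive and a negative finding, and it treats any non-negated mention as positive, which is exactly why careful reading (or a proper NLP platform) is needed.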
Our customer had recently installed our I2E NLP platform and had indexed a large collection of patient records by extracting documents from AllScripts via their analytical data warehouse.
The data sets were indexed with the usual domain ontologies, covering diseases, medications, procedures and so on, to support rapid searching in I2E.