
I2E NLP in action - Extracting Laboratory Eligibility Criteria Data Elements from Internal Review Board Protocols

“It is easy to lie with statistics. It is hard to tell the truth without it.” -Andrejs Dunkels

This is a quote I first heard long ago, but was recently re-introduced to by a beloved colleague of mine. Anyone with a background in research can attest to just how true it is. Without good statistical power, life-saving pharmaceuticals never make it to the market. The ones that do make it get there at a hefty cost. In 2012, Forbes published an article reporting that the average cost to develop a new pharmaceutical was $4 billion, and could reach upwards of $11 billion. Those are staggering numbers, and that was 4 years ago[1]. Without any hesitation, I can confidently say, “those numbers aren’t going down.”

But WHY do pharmaceuticals cost so much?

There are genuine factors that contribute to these huge costs, and one of the most expensive phases of drug development is the clinical trial. Those of us who have worked in research know that clinical trial recruitment takes an exorbitant amount of time and money. If you don’t get enough eligible people successfully recruited and through to the end of the study, the study won’t have the all-powerful “n”: the number of participants statistically needed to show whether the study drug was safe and effective (or not).

How can Natural Language Processing (NLP) help in recruitment?

It’s somewhat of a Catch-22. As a patient, you can’t participate in a trial if you don’t know one exists, and how can the trial find you? It’s time to rethink clinical research recruitment. Luckily, some of us already are. Thanks to the innovative efforts of groups like the researchers at the Medical University of South Carolina (MUSC) Biomedical Informatics Center (BMIC), and its close affiliation with the South Carolina Clinical & Translational Research Institute (SCTR), recruitment efforts can be initiated by making better use of information that is already at the clinical researcher’s fingertips: the Electronic Health Record (EHR).

Under this alliance, MUSC researchers built a registry of patients who agreed to be contacted about research studies[2]. Of course, this is only part of the answer, as every willing participant would need a matching clinical trial. Dr. Vivienne Zhu was able to use I2E’s NLP to extract relevant laboratory criteria from research protocols within MUSC’s Internal Review Board (IRB). She was able to annotate her test set of 180 protocols in 0.2 seconds. In her results, she reported 94.5% precision, 90.2% recall, and a 92.3% F-measure.
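For readers who want to see how those three numbers relate, here is a minimal sketch. It assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the post itself does not say which variant was used, and the function name `f_measure` is just an illustrative choice.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Assumption: the F-measure quoted in the post is the balanced F1 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the post.
f1 = f_measure(0.945, 0.902)
print(f"{f1:.1%}")  # prints "92.3%"
```

Under that assumption, the reported precision and recall reproduce the reported F-measure.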

Dr. Zhu was able to show that I2E can robustly and precisely abstract vital laboratory eligibility criteria from free-text IRB protocols into a structured format for subsequent comparison to the structured laboratory data in an EHR. This would allow patients in the EHR to be automatically matched to the clinical trials detailed in the IRB protocols. Of course, Dr. Zhu and I2E can tackle any unstructured laboratory data in the EHR too, if she discovers that obstacle in her way!
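To make the matching idea concrete, here is a rough sketch of comparing structured criteria against structured lab results. Everything in it is hypothetical: the `LabCriterion` fields, the `meets` and `eligible` helpers, the lab names, thresholds, and patient values are invented for illustration and are not I2E's output format, an EHR schema, or criteria from Dr. Zhu's protocols.

```python
import operator
from dataclasses import dataclass

# Comparators a criterion may use (an illustrative subset).
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class LabCriterion:
    """One structured eligibility criterion (hypothetical shape)."""
    test: str          # lab test name, e.g. "hemoglobin"
    comparator: str    # one of the keys in _OPS
    threshold: float
    unit: str


def meets(criterion: LabCriterion, labs: dict) -> bool:
    """True if the patient's latest result satisfies the criterion.

    `labs` maps a test name to a (value, unit) pair. A missing result or a
    unit mismatch counts as "not met" rather than being guessed at.
    """
    result = labs.get(criterion.test)
    if result is None:
        return False
    value, unit = result
    if unit != criterion.unit:
        return False
    return _OPS[criterion.comparator](value, criterion.threshold)


def eligible(criteria: list, labs: dict) -> bool:
    """A patient is a candidate only if every criterion is met."""
    return all(meets(c, labs) for c in criteria)


# Made-up criteria and patients, for illustration only.
criteria = [
    LabCriterion("hemoglobin", ">=", 9.0, "g/dL"),
    LabCriterion("creatinine", "<=", 1.5, "mg/dL"),
]
patients = {
    "patient_a": {"hemoglobin": (12.1, "g/dL"), "creatinine": (0.9, "mg/dL")},
    "patient_b": {"hemoglobin": (8.2, "g/dL"), "creatinine": (1.1, "mg/dL")},
    "patient_c": {"hemoglobin": (10.4, "g/dL")},  # no creatinine result on file
}
candidates = [pid for pid, labs in patients.items() if eligible(criteria, labs)]
print(candidates)  # prints "['patient_a']"
```

A real pipeline would also need unit conversion, result dates, and handling for missing values; this sketch treats anything it cannot compare as not eligible.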

The overall ramifications of using I2E NLP for this are astounding, as traditional recruitment measures can often take years. Manual abstraction of information from patient charts is labor-intensive work (and often fruitless! I can say this personally, as I have done it myself for a number of years in the name of clinical research). Using I2E NLP, this process can be more systematic, more effective, and less time-consuming.

1. Herper M. The Truly Staggering Cost Of Inventing New Drugs. In: Forbes [Internet]. 10 Feb 2012 [cited 16 Nov 2016]. Available:

2. Gainer C. Clinical Trials and Research at MUSC | Charleston SC [Internet]. [cited 17 Nov 2016]. Available:
