One Data Source, Three Novel Clinical Trial Analytics Use Cases: Eric Su Shares Why Eli Lilly Loves NLP

At a recent webinar, Eric Su, a consultant at Eli Lilly, shared three distinct ways his organization is using Linguamatics’ Natural Language Processing (NLP) technology to make employees’ jobs easier. Notably, all three of Eric’s examples draw on a single data source, which illustrates the breadth of insight NLP technology can deliver even from one source. With diverse information on almost 350,000 clinical trials, this data source has always been rich in information. However, much of the important information on trial design, efficacy, and adverse events is buried in free-text fields and is hard to extract manually. Linguamatics addresses that challenge with advanced text analytics powered by our trusted NLP technology, delivering value from bench to bedside. Eli Lilly is using our NLP for clinical trial analytics in three interesting ways:

Answer a question on trial design strategy

Eli Lilly wanted to map the landscape of Phase I and Phase II clinical trials testing two or three drugs in the autoimmune area. Answering that question manually would have meant reviewing thousands of autoimmune disease trials, an impractical amount of work. The company already knew about a few relevant trials, but to get a comprehensive picture, it needed to mine the free text in this data source.

Eric used Linguamatics NLP to build a query that could identify such trials. The query found more than 300 trials that met all the search criteria. The output displayed company and drug names, along with a variety of other fields the clinical trial team wanted to explore. The results table ranked companies by number of matching trials, and users could drill down to see trial names, the specific drugs tested, the disease condition, and more.
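The hard part of this use case is the NLP step itself (pulling phase, drug names, and condition out of free text), which happens inside Linguamatics’ tooling and is not shown here. Once extraction has produced structured records, the filtering and company ranking described above take only a few lines. The sketch below is illustrative only: the record fields (`sponsor`, `phase`, `condition_area`, `drugs`), the function names, and the sample trials are hypothetical and do not reflect the I2E output format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Trial:
    # Hypothetical, simplified record produced by an upstream NLP extraction step.
    trial_id: str
    sponsor: str
    phase: str
    condition_area: str
    drugs: list[str] = field(default_factory=list)


def matches(trial: Trial) -> bool:
    """Phase I or II autoimmune trial testing two or three distinct drugs."""
    distinct_drugs = {d.lower() for d in trial.drugs}
    return (
        trial.phase in {"Phase 1", "Phase 2"}
        and trial.condition_area == "autoimmune"
        and 2 <= len(distinct_drugs) <= 3
    )


def rank_sponsors(trials: list[Trial]) -> list[tuple[str, int]]:
    """Count matching trials per sponsor, most matches first."""
    return Counter(t.sponsor for t in trials if matches(t)).most_common()


def drill_down(trials: list[Trial], sponsor: str) -> list[Trial]:
    """Return one sponsor's matching trials (the drill-down view)."""
    return [t for t in trials if t.sponsor == sponsor and matches(t)]


# Made-up sample data for illustration only.
SAMPLE = [
    Trial("T1", "Company A", "Phase 1", "autoimmune", ["Drug X", "Drug Y"]),
    Trial("T2", "Company A", "Phase 2", "autoimmune", ["Drug X", "Drug Y", "Drug Z"]),
    Trial("T3", "Company B", "Phase 2", "autoimmune", ["Drug Y", "Drug Z"]),
    Trial("T4", "Company B", "Phase 3", "autoimmune", ["Drug X", "Drug Z"]),  # wrong phase
    Trial("T5", "Company C", "Phase 1", "autoimmune", ["Drug X"]),  # single drug
    Trial("T6", "Company C", "Phase 2", "oncology", ["Drug X", "Drug Y"]),  # wrong area
]

if __name__ == "__main__":
    for sponsor, count in rank_sponsors(SAMPLE):
        print(sponsor, count)
```

Counting first and drilling down second mirrors the results table: companies ranked by number of matching trials, with the underlying trials available on request.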

The Linguamatics I2E query automated this manual work and saved time for Eli Lilly’s competitive intelligence team. Once the query strategy had been tailored to the team’s needs to achieve both high precision and high recall, they were able to reuse the approach in other disease areas.

Extract efficacy data for meta-analysis

Eli Lilly wanted to extract efficacy data for a specific disease area so its clinical statisticians could perform a meta-analysis. Historically, the only way to do this was to pay vendors to extract the data manually, cutting and pasting it from this data source into spreadsheets and reports. However, vendors are expensive and the process is slow (typically around six months), so the data quickly go out of date.

Eric and the Eli Lilly team solved this challenge by building I2E queries that extract the right data rapidly. Once a query is built, users can simply rerun it weekly to pick up newly posted trials and data.

Efficacy data is usually spread across many tables, and the appropriate metrics and values must be extracted carefully. With Linguamatics’ NLP technology, Eric achieved 100 percent accuracy by taking advantage of the XML structure underlying the data in this source. He built a query that matches each data point to the correct study arm and published it in an easy-to-use, self-serve NLP web portal. With this intuitive GUI, Eli Lilly users can run the query for any disease and return efficacy results without relying on Eric to run it for them.
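To illustrate why XML structure helps with arm matching: when results carry explicit group identifiers, each value can be tied to its study arm by ID rather than by its position in a table. The element and attribute names below (`group`, `value`, `group_id`) form a hypothetical, simplified schema, not the actual format of the data source or of I2E, and `values_by_arm` is an illustrative helper. The sketch uses only Python’s standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified outcome record; real registry XML is more complex.
OUTCOME_XML = """
<outcome>
  <title>Change from baseline in disease activity score</title>
  <groups>
    <group id="G1">Drug X 10 mg</group>
    <group id="G2">Placebo</group>
  </groups>
  <measurements>
    <value group_id="G1">-12.4</value>
    <value group_id="G2">-3.1</value>
  </measurements>
</outcome>
"""


def values_by_arm(xml_text: str) -> dict[str, float]:
    """Map each measured value to the title of the study arm it belongs to."""
    root = ET.fromstring(xml_text.strip())
    # Lookup table from group ID to human-readable arm title.
    arm_titles = {g.get("id"): (g.text or "").strip() for g in root.iter("group")}
    results: dict[str, float] = {}
    for value in root.iter("value"):
        group_id = value.get("group_id")
        if group_id not in arm_titles:
            # Refuse to guess: an unmatched value would be assigned to the wrong arm.
            raise KeyError(f"value refers to unknown group {group_id!r}")
        results[arm_titles[group_id]] = float(value.text)
    return results


if __name__ == "__main__":
    print(values_by_arm(OUTCOME_XML))
```

Failing loudly on an unknown group ID, rather than skipping the value, reflects the priority on precision: a value silently attached to the wrong arm would corrupt a meta-analysis.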

NLP makes it feasible to extract data from this source rapidly, with high recall (aided by Linguamatics’ powerful disease and drug ontologies) and 100 percent precision (versus manual input, which is prone to error).

Extract adverse event data for drug repurposing

Eric was interested in the potential for repurposing existing drugs for cancer indications and wanted to explore the adverse event data in this source as a way to generate novel drug repurposing hypotheses. It was a two-part challenge: 1) how to extract all the data from the relevant tables, and 2) with the data in hand, how to find drugs associated with significantly fewer cancer events than control.

Eric designed an I2E query to extract adverse events from non-oncology clinical trials, distinguishing adverse events in the treatment arms from those in the control (placebo) arms. The output from this query was a large table of about 120,000 rows from more than 3,200 clinical trials. With all the relevant data at hand, Eric ranked the results statistically to find the highest (most significant) Z-scores among the thousands of rows. The output led him to hypothesize that vitamin K1, telmisartan, and aliskiren (among other candidates) could potentially be repurposed for cancer.
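The article does not say which statistic produced the Z-scores. One common choice for comparing event rates between two arms is the pooled two-proportion z-test, sketched here as an assumption. With x events among n patients per arm, the pooled rate is p = (x_treat + x_ctrl) / (n_treat + n_ctrl), and the score is (p_ctrl - p_treat) / sqrt(p(1 - p)(1/n_treat + 1/n_ctrl)), oriented so that a larger positive value means fewer cancer events on treatment than on control. The row fields, function names, and sample counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AdverseEventRow:
    # Hypothetical row: cancer-event counts for one drug vs. control in one trial.
    trial_id: str
    drug: str
    treat_events: int
    treat_n: int
    ctrl_events: int
    ctrl_n: int


def protective_z(row: AdverseEventRow) -> float:
    """Pooled two-proportion z-score; positive means fewer events on treatment."""
    p_treat = row.treat_events / row.treat_n
    p_ctrl = row.ctrl_events / row.ctrl_n
    pooled = (row.treat_events + row.ctrl_events) / (row.treat_n + row.ctrl_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / row.treat_n + 1 / row.ctrl_n))
    if se == 0:
        # No events (or only events) in both arms: no evidence either way.
        return 0.0
    return (p_ctrl - p_treat) / se


def rank_rows(rows: list[AdverseEventRow]) -> list[AdverseEventRow]:
    """Sort rows so the strongest apparent protective effects come first."""
    return sorted(rows, key=protective_z, reverse=True)


# Made-up sample counts for illustration only.
SAMPLE = [
    AdverseEventRow("T1", "Drug B", 5, 100, 5, 100),
    AdverseEventRow("T2", "Drug A", 2, 100, 10, 100),
    AdverseEventRow("T3", "Drug C", 8, 100, 4, 100),
]

if __name__ == "__main__":
    for row in rank_rows(SAMPLE):
        print(row.trial_id, row.drug, round(protective_z(row), 2))
```

When thousands of rows are scored, some high values will occur by chance, which is why a ranking like this yields hypotheses to investigate rather than conclusions.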

Adverse event data has long been available in this source, but sifting through 120,000 rows of data has rarely made sense from a cost/benefit standpoint. With NLP, users can now glean these kinds of insights rapidly and painlessly.

While Eric’s examples focused on clinical trial analytics, NLP can also support early discovery, regulatory review, gene mapping, key opinion leader identification, and much more. For more information on how Linguamatics’ NLP technology can help mobilize and transform your clinical trial data, visit our clinical trials analytics page.
