Clinical Trial Analytics

The Linguamatics NLP platform is invaluable in clinical trial analysis, assisting with trial design, optimization, site selection and competitive intelligence.

Clinical trials are used to gather safety and efficacy data on new drugs in development, or existing drugs being tested for new indications. Although some information in clinical trial reports is well structured and searchable using keywords, much of the information lies buried in unstructured text.

Our solution is an essential tool for extracting and synthesizing the high-value information found in this unstructured text. That information can then be used in future study design and site selection, or to gain actionable insight into competitors' worldwide clinical development activities.

Customers report that, using I2E, the time for site selection can be reduced by over 80%. For patient recruitment, time spent can be reduced by at least 25%.


There are currently nearly 200,000 study records in, testing over 70,000 unique pharmacotherapies in approximately 190 countries. Other cancer registries, both public and commercial, also provide a rich source of clinical trial data. The case for using advanced text mining over clinical trials is particularly compelling as the industry looks to cut the costs and time required for trials.

Text mining with I2E is used widely by our pharma and biotech customers to aid in clinical trial site selection and study design. The outcome is significant time and cost savings: for example, as patient recruitment in the mature markets becomes increasingly difficult, I2E enables sponsors to locate clinical trial sites abroad. 

Customers can run queries over the detailed unstructured text fields in databases such as Cortellis Clinical Trials Intelligence, WHO ICTRP, or Citeline's TrialTrove to rapidly identify, extract, synthesize and analyze relevant information, such as clinical trial sites, selection criteria, study characteristics, and patient numbers and characteristics, in ways that would not be possible using other approaches. These data can be used to answer key questions such as:

  • What clinical endpoints would be appropriate to measure for xyz diseases?
  • Which investigators are expert in running clinical trials for diseases xyz?
  • Who else has drugs in clinical trials for indication xyz?
  • Which trials (in a given disease area) use drug xyz in combination with another drug?
  • What potential due diligence information can I find for in-licensing opportunities for disease area xyz?
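
As a rough illustration of one question from the list above (which trials use a drug in combination with another drug), the following Python sketch scans free-text trial descriptions. It is not I2E and does not use any Linguamatics API: the records, field names, drug names and the `find_combination_trials` helper are all hypothetical, and simple keyword co-occurrence matching stands in for real NLP-based extraction.

```python
import re

# Hypothetical trial records; real registries use different schemas.
TRIALS = [
    {"id": "T1", "description": "Drug A in combination with Drug B in adults with type 2 diabetes."},
    {"id": "T2", "description": "Drug A monotherapy versus placebo."},
    {"id": "T3", "description": "Drug C plus Drug A for advanced solid tumours."},
]

# Phrases that loosely signal combination therapy (an assumption, not an exhaustive list).
COMBO_CUES = re.compile(
    r"\b(in combination with|combined with|plus|together with)\b",
    re.IGNORECASE,
)


def find_combination_trials(trials, drug, known_drugs):
    """Return (trial id, partner drugs) for trials whose text mentions `drug`
    together with a combination cue and at least one other known drug."""
    hits = []
    for trial in trials:
        text = trial["description"]
        lowered = text.lower()
        # Skip records that do not mention the drug or any combination cue.
        if drug.lower() not in lowered or not COMBO_CUES.search(text):
            continue
        partners = [
            d for d in known_drugs
            if d.lower() != drug.lower() and d.lower() in lowered
        ]
        if partners:
            hits.append((trial["id"], partners))
    return hits


if __name__ == "__main__":
    drugs = ["Drug A", "Drug B", "Drug C"]
    for trial_id, partners in find_combination_trials(TRIALS, "Drug A", drugs):
        print(trial_id, partners)
```

Plain substring matching like this misses synonyms, brand names and negation ("not in combination with"), which is the gap that terminology-aware text mining is meant to close.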

Use Cases and Blogs for Clinical Trial Analysis

Fast-Tracking Clinical Trials at Eli Lilly and Company
Eli Lilly needed a solution for expediting the extraction, analysis, and synthesis of specific outcome statistics from oncology and diabetes clinical trials records. Doing so would enable the organization to understand the competitive landscape, and focus ongoing research and trial planning.



Comparing Real-World Evidence to clinical trial events at AstraZeneca
AstraZeneca set out to test the hypothesis that adverse reaction information from patients could effectively supplement information from clinical trials, to give scientists and clinicians a more accurate, well-rounded description of safety data for particular treatments. The Linguamatics I2E text-mining solution played a key role in rapidly assembling comparable data sets.


Mining clinical trials reports with I2E at AstraZeneca
Use of I2E for knowledge discovery from clinical trials reports. This case study outlines two investigations using I2E to provide answers for clinical decision makers; the first identifies the blind status of trials with differing intravenous (IV) drug doses, and the second examines the dose durations of follow-on clinical trials.


Network graph view of Phase 1 MAD (multiple ascending dose) studies and follow-on Phase 2 three-month dosing studies in the infectious disease area, linked via text-mining for sponsor, disease area and intervention.

Clinical trial protocol design and optimization at Merck
Merck use Linguamatics I2E for text analytics over public domain clinical trial data, to improve clinical trial site selection. In this example, Merck Experimental Medicine division (EMS) needed to locate a clinical trial site that would be able to conduct gastric bypass trials with the ability to measure gut peptides before and after surgery [...]

