Generating Clinical Trial Design Insights using NLP

30th Nov 2023

Improve efficiency 10-100x and enhance key decision making support

Conducting clinical trials for drug development is an expensive but critical step in getting patients the treatments they need. On average, each trial undergoes more than two amendments, which can add $0.2 million to more than $1 million in costs. Improving the design process from the start can lay the foundation for preventing these additional costs. The design process is tedious and complex, and it requires data from a wide range of sources. AI, specifically NLP, can unearth the required data effectively and at scale to help support, plan, and design clinical trials.

Linguamatics NLP combines award-winning NLP technology with 20+ years of domain expertise. It consists of a blend of tools, including linguistic rules, ontologies, machine learning (ML), and large language models (LLMs), which are used to find insights in text. It can be applied to multiple sources, including protocols (public or proprietary), publications, social media, and more. The insights can be standardized and unified across these sources using Linguamatics proprietary and public ontologies, breaking down silos in the process.

Client results

  • Speed up data generation by 10-100x
  • Increase clinical endpoint extraction by 12x
  • Find up to 42x more relevant trials
  • 93-96% F-scores

Accurately identified information from unstructured text. Linguamatics NLP can find information with high accuracy in difficult-to-parse literature. This example is from the endpoint benchmarking work, where the endpoint was correctly linked to the endpoint type and time point.

Identifying Key Endpoints for Benchmarking. These examples contain endpoint outcomes that are challenging for automated methods to extract correctly. Many tools can identify that these sentences contain endpoint metrics but cannot go much further. Our NLP can correctly find and extract the endpoint outcomes with the corresponding time points (e.g. 1-year 84.3% OS rate, 2-year 63.0% LPFS rate, etc.).
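
To make the idea of linking an endpoint to its value and time point concrete, here is a minimal Python sketch. It is not the Linguamatics NLP implementation: the regular expression, the function name extract_endpoints, and the sample sentence are all assumptions made for illustration, and the pattern only handles phrases shaped like the two examples quoted above.

```python
import re

# Hypothetical pattern for phrases such as "1-year 84.3% OS rate".
# Illustrative only; real literature needs far more robust handling.
ENDPOINT_PATTERN = re.compile(
    r"(?P<time>\d+)-(?P<unit>year|month)\s+"   # time point, e.g. "1-year"
    r"(?P<value>\d+(?:\.\d+)?)%\s+"            # outcome value, e.g. "84.3%"
    r"(?P<endpoint>[A-Z]{2,5})\s+rate"         # endpoint abbreviation, e.g. "OS"
)


def extract_endpoints(text):
    """Return one record per match, linking endpoint, value, and time point."""
    records = []
    for match in ENDPOINT_PATTERN.finditer(text):
        records.append({
            "endpoint": match.group("endpoint"),
            "value_percent": float(match.group("value")),
            "timepoint": f"{match.group('time')} {match.group('unit')}",
        })
    return records


# Made-up sentence built from the two example phrases in the text.
sample = "The 1-year 84.3% OS rate and 2-year 63.0% LPFS rate were reported."
for record in extract_endpoints(sample):
    print(record)
```

A single fixed pattern like this breaks as soon as the wording varies, which is the gap that combining linguistic rules, ontologies, and ML is meant to close.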


NLP use cases within clinical trials

Among our top-20 pharma clients, Linguamatics NLP has been used for identifying new trial sites, endpoint benchmarking and design, protocol digitalization, and competitive landscaping.

Site Selection

Discover new trial sites

Identifying many sites to find enough participants is vital to trial success. Sites need to provide patients who fit specific inclusion criteria. Using award-winning linguistic capabilities, the team was able to find sites based on the selection criteria of the trials they supported. As a result, three ideal sites were found, one of which was unknown to the client.

Endpoint Benchmarking

F-scores of 93-96%

Benchmarks help set clinical trial performance goals, but they are time-consuming to extract from literature. Two NLP methods (rules-based queries and LLMs) were combined to extract efficacy endpoints such as objective, overall, and partial response rates. This approach made it possible to quickly train highly accurate LLMs that achieved F-scores of 93-96%.
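
For readers unfamiliar with the metric, the sketch below shows how an F-score is computed from extraction counts. It assumes the figures quoted here are the balanced F1 score (the harmonic mean of precision and recall), which the text does not state explicitly. The function name f_score and the counts in the example are hypothetical and are not client results.

```python
def f_score(true_pos, false_pos, false_neg, beta=1.0):
    """Compute the F-beta score from counts; beta=1.0 gives the balanced F1."""
    if true_pos == 0:
        return 0.0  # no correct extractions, so precision and recall are both 0
    precision = true_pos / (true_pos + false_pos)  # share of extractions that are correct
    recall = true_pos / (true_pos + false_neg)     # share of true endpoints that were found
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 93 correct extractions, 7 spurious, 7 missed.
print(round(f_score(93, 7, 7), 2))
```

Because the score combines precision and recall, a high value indicates that the extraction both finds most endpoints and rarely returns wrong ones.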

Endpoint design

Understanding patient experience

Designing trials for new therapeutic areas that aren’t well understood can present unique challenges. To incorporate the patient experience, IQVIA NLP was used on patient forums discussing life with early Parkinson’s. Analysis of the comments confirmed many signs and symptoms published in the literature and identified four new ones: vertigo, voice change, arm swing asymmetry, and tingling.

Protocol digitalization

Transform data for analytics

Clinical trial protocols and supporting documents contain significant amounts of information. One ambitious client utilized Linguamatics NLP to digitalize and standardize the information from legacy protocols. This enabled them to apply analytics and feed ML models. Scientists also gained unprecedented access to search across legacy data and get insights that were previously difficult to find or even unobtainable.

Competitive intelligence

42x larger competitive landscape

Knowing which therapeutics are in the trial phase provides a preview of potential future treatments for a therapeutic area. A top 10 pharma wanted a comprehensive list of combination therapies in phase 2 and 3 trials for specific therapeutic areas. Using IQVIA NLP, they found 300+ trials with minimal effort, compared to 7 through previous methods, representing a 42x increase in trials found.

Diversity planning

Population distribution

Regulatory agencies are requiring the addition of diversity planning for new trials. A top 10 pharma used IQVIA NLP to extract the incidence and prevalence of metastatic and recurrent cancers for various races and ethnicities from literature. This data helped determine the appropriate distribution of underrepresented groups for cancer trials.


The success of a clinical trial begins with a comprehensive design. Linguamatics NLP has a proven and flexible solution to provide information for this complex process and key decision support.

Get in touch today to see how we can make your clinical trial processes more efficient and cost effective.