
How NLP is Disrupting Clinical Trial Design by Unlocking Historical Insights


Clinical trials, which account for a third of the cost and time involved in drug development, are ripe for disruption. Regulatory requirements are stringent, costs are steep, and even small mistakes in protocol design can significantly delay timelines and slow approval. Across the industry, sponsors are looking to move beyond traditional approaches and embrace novel, smarter trial designs to mitigate these challenges. Their instinct is right; new approaches are merited. One powerful pathway to building better protocols is not to leave the past buried, but to examine it and carry its lessons forward to inform the future.

Consider that many pharmaceutical companies have 20-plus years of study protocols, each of which can be mined for trial design insights that apply to today's challenges: which biomarkers or lab tests were planned across the assessment schedules, which inclusion/exclusion criteria were used for a given therapeutic area or population, and more. And if these protocols can be digitalized and made accessible, connecting them to overall trial results adds further value, for example, revealing whether a particular trial design succeeded or failed, and where and how patients with a specific condition were recruited (and where recruitment lagged).

Is there any doubt that the ability to mine these insights would improve the success rate of future protocol designs? Of course not. But until now, these insights have been largely ignored because they are locked away across disparate systems in mixed-format documents that would be far too resource-intensive to parse manually for every new study.

The NLP solution to free locked insights

Advances in Natural Language Processing (NLP) over the past few years have been massive, and Linguamatics and IQVIA sit at the cutting edge of these solutions. Our best-in-class text mining (also referred to as text analytics) technology can now effectively "read" documents and rapidly transform unstructured text, like that contained in clinical trial protocols, into normalized, structured data suitable for analysis or machine learning (ML) algorithms. Importantly, the outputs from today's NLP are not complex, hard-to-understand code that requires a data expert to translate. The results of intuitive keyword searches on digitalized documents can be displayed in easily understandable visualizations, with the ability to drill down into specific data sources to better understand what you are seeing.
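To make "normalized, structured data" more concrete, here is a minimal Python sketch of dictionary-based term normalization: free-text mentions in an eligibility criterion are mapped to concept identifiers, and their character offsets in the original text are kept. The synonym table, the concept IDs (C001 and so on) and the `normalize` function are hypothetical placeholders, not Linguamatics' engine and not real MeSH or CDISC codes; production NLP also handles negation, context-dependent abbreviations and linguistic variation that a simple lookup table cannot.

```python
import re
from dataclasses import dataclass

# Hypothetical synonym table: surface form -> (concept_id, preferred_label).
# IDs and labels are illustrative placeholders, not real MeSH or CDISC codes.
SYNONYMS = {
    "breast cancer": ("C001", "Breast cancer"),
    "breast carcinoma": ("C001", "Breast cancer"),
    "rheumatoid arthritis": ("C002", "Rheumatoid arthritis"),
    "ra": ("C002", "Rheumatoid arthritis"),
    "hba1c": ("C003", "HbA1c"),
}

@dataclass(frozen=True)
class Mention:
    start: int        # character offset in the original text
    end: int
    text: str         # exact span as written in the document
    concept_id: str
    label: str

def normalize(text: str) -> list[Mention]:
    """Find whole-word, case-insensitive synonym matches and map them to concepts."""
    mentions = []
    # Try longer surface forms first so a shorter key cannot claim part of a longer phrase.
    for surface in sorted(SYNONYMS, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Skip spans that overlap a match recorded earlier (equal or longer surface form).
            if any(m.start() < x.end and x.start < m.end() for x in mentions):
                continue
            cid, label = SYNONYMS[surface]
            mentions.append(Mention(m.start(), m.end(), m.group(0), cid, label))
    return sorted(mentions, key=lambda x: x.start)

# Note: "no history of RA" is negated, but this lookup still tags it;
# negation detection is out of scope for this sketch.
criterion = "Adults with Breast Carcinoma and HbA1c below 7%; no history of RA."
for mention in normalize(criterion):
    print(mention.concept_id, mention.label, repr(mention.text))
```

Keeping offsets alongside concept IDs means every structured value can be traced back to the exact wording in the source document, which supports the drill-down behavior described above.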

For sponsors, that means a host of new insights can now be readily available to inform protocol design. What might you learn by joining your historical protocol insights with current operational execution data? Answers to questions like: Which study designs were most efficient for us in breast cancer therapy? Which inclusion/exclusion criteria helped us meet objectives in a 75+ population? Which sites failed to recruit patients with arthritis? The potential is huge.
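As a rough illustration of the last question, the Python sketch below joins hypothetical NLP-extracted protocol metadata with hypothetical site-level enrollment figures and flags sites that enrolled less than half of their planned patients on arthritis protocols. All records, field names, the 50% threshold and the `under_recruiting_sites` function are illustrative assumptions, not a description of any real sponsor's data model.

```python
from collections import defaultdict

# Hypothetical protocol-level records, as an NLP extraction step might produce them.
protocols = [
    {"protocol_id": "P-101", "indication": "rheumatoid arthritis", "phase": "2"},
    {"protocol_id": "P-102", "indication": "breast cancer", "phase": "3"},
    {"protocol_id": "P-103", "indication": "osteoarthritis", "phase": "3"},
]

# Hypothetical operational data: planned vs. actual enrollment per site and protocol.
enrollment = [
    {"protocol_id": "P-101", "site": "Site A", "planned": 20, "enrolled": 21},
    {"protocol_id": "P-101", "site": "Site B", "planned": 20, "enrolled": 6},
    {"protocol_id": "P-102", "site": "Site B", "planned": 15, "enrolled": 15},
    {"protocol_id": "P-103", "site": "Site C", "planned": 10, "enrolled": 3},
]

def under_recruiting_sites(keyword, threshold=0.5):
    """Return sites whose total enrollment fell below `threshold` x plan
    on protocols whose indication contains `keyword`."""
    matching = {p["protocol_id"] for p in protocols if keyword in p["indication"]}
    totals = defaultdict(lambda: [0, 0])  # site -> [planned, enrolled]
    for row in enrollment:
        if row["protocol_id"] in matching:
            totals[row["site"]][0] += row["planned"]
            totals[row["site"]][1] += row["enrolled"]
    return sorted(site for site, (planned, enrolled) in totals.items()
                  if planned and enrolled / planned < threshold)

print(under_recruiting_sites("arthritis"))
```

Matching on the substring "arthritis" is deliberately crude (it catches both rheumatoid arthritis and osteoarthritis); a real pipeline would more likely join on normalized concept identifiers produced by the NLP step.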

Driving value from historical clinical trial protocols with NLP

The good news is that the promise of NLP for clinical trial design is not a pie-in-the-sky concept. It has already been put into practice by a team at Novartis, as presented at the IQVIA Virtual NLP Summit in 2021.

The Novartis team wanted to bring all of their historical clinical trial protocol records together, virtually, and enable queries across all of them so that scientists could ask questions they weren't able to ask before. They wanted to:

  • Have a common set of structured files in a digital format that people could access easily and search rapidly using metadata and tags
  • Mine and use historical data to find new signals in the organization's existing data and to support other analyses it may want to conduct
  • Examine historical data in the context of planning new trials that are more targeted, more efficient, and have a better chance of success
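A searchable store of structured files with metadata and tags, as described in the first goal above, can be sketched as a simple inverted index. The protocol IDs, tags, metadata fields and the `search` helper below are hypothetical; a production repository would sit on a proper search engine or database rather than on in-memory Python structures.

```python
from collections import defaultdict

# Hypothetical structured records for digitalized protocols: metadata plus tags.
documents = {
    "P-101": {"phase": "2", "indication": "rheumatoid arthritis",
              "tags": {"inclusion_criteria", "schedule_of_assessments", "CRP"}},
    "P-102": {"phase": "3", "indication": "breast cancer",
              "tags": {"inclusion_criteria", "HER2", "tumor_response"}},
    "P-103": {"phase": "3", "indication": "osteoarthritis",
              "tags": {"schedule_of_assessments", "pain_score"}},
}

# Inverted index: tag -> set of protocol IDs carrying that tag.
index = defaultdict(set)
for doc_id, record in documents.items():
    for tag in record["tags"]:
        index[tag].add(doc_id)

def search(tags, **metadata):
    """Return IDs of protocols that carry every tag and match every metadata field exactly."""
    hits = set(documents)
    for tag in tags:
        hits &= index.get(tag, set())
    return sorted(d for d in hits
                  if all(documents[d].get(k) == v for k, v in metadata.items()))

print(search(["schedule_of_assessments"], phase="3"))
```

Intersecting per-tag ID sets keeps tag queries fast as the collection grows, while metadata filters are applied only to the (usually small) set of tag hits.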

Linguamatics worked with the Novartis team to extract the key data from clinical trial protocols. We focused initially on final protocols rather than those still in progress. The protocols all differed slightly, reflecting individual teams' preferences for versioning documents, so we had to correct for that manually. From there, we deployed our NLP to normalize the structure of each protocol, breaking it into standard parts using public-domain document ontologies (such as the Document Components Ontology, DoCO), being careful not to alter the existing text but simply to tag it in a way that made the documents easier to navigate. We worked with the Novartis experts to build up capabilities over time and unlock richer search, using other ontologies and schemas such as CDISC, MeSH, openEHR and others. These enabled the capture of overall trial metadata (when a trial took place, the indication, the phase) as well as specific data, for example, outcomes in schedule of assessments tables. After manual review, all of this data could be loaded into a central repository of searchable files and metadata.
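The "tag, don't alter" approach described above is commonly implemented as stand-off annotation: labels are stored as character offsets that point into the untouched source text. The Python sketch below assumes that numbered headings mark section boundaries and uses DoCO-style labels (`doco:Section`, `doco:SectionTitle`). The sample protocol text, the regular expressions and the helper names `annotate` and `extract_metadata` are hypothetical, and the segmentation is flat (a subsection is not nested inside its parent).

```python
import re

# Hypothetical protocol excerpt; real protocols are far longer and less regular.
PROTOCOL = """1 INTRODUCTION
This Phase 3 study evaluates drug X in rheumatoid arthritis.
2 STUDY POPULATION
2.1 Inclusion Criteria
Adults aged 18 to 75 years.
"""

# Assumption: a line starting with a section number (e.g. "2.1") followed by text is a heading.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$", re.MULTILINE)
# Assumption: phase is written as "Phase 1-4" or "Phase I-IV".
PHASE = re.compile(r"\bPhase\s+([1-4]|IV|I{1,3})\b", re.IGNORECASE)

def annotate(text):
    """Return stand-off annotations as (label, start, end); the text itself is never modified."""
    annotations = []
    headings = list(HEADING.finditer(text))
    for i, h in enumerate(headings):
        # Flat segmentation: a section runs from its heading to the next heading (or end of text).
        section_end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        annotations.append(("doco:Section", h.start(), section_end))
        annotations.append(("doco:SectionTitle", h.start(2), h.end(2)))
    return annotations

def extract_metadata(text):
    """Pull simple trial-level metadata; only the phase is extracted in this sketch."""
    m = PHASE.search(text)
    return {"phase": m.group(1) if m else None}

for label, start, end in annotate(PROTOCOL):
    print(label, repr(PROTOCOL[start:end][:40]))
print(extract_metadata(PROTOCOL))
```

Because each annotation is only a pair of offsets, the original wording stays byte-for-byte intact, and further annotation layers (for example, ontology concepts) can be added later without touching earlier ones.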

Future implications for clinical trial design

The use of NLP for clinical trial protocol digitalization is just one example of how unlocking historical data silos can provide rich insights. This project has allowed Novartis to power a connected data environment in which decades of historical data are centralized, harmonized and accessible for researchers to interrogate in ways that drive healthcare forward. If you would like to see the power of Linguamatics' NLP solutions to surface hidden insights in your unstructured data, including clinical trial protocols and more, reach out to us today for a demo.

