How NLP is Disrupting Clinical Trial Design by Unlocking Historical Insights

Clinical trials, which account for a third of the cost and time involved in drug development, are ready for disruption. Regulatory demands are high, costs are steep, and even small mistakes in protocol design can significantly hurt timelines and slow approval. Across the industry, sponsors are looking to move on from traditional approaches and embrace novel, smarter trial designs to mitigate these challenges. Their instinct is right; new approaches are merited. One powerful pathway to building better protocols is not to leave the past buried, but to examine it and carry its lessons forward to inform the future.

Consider that many pharmaceutical companies have 20-plus years of study protocols and trial data representing thousands or millions of patients. Each of those protocols holds rich insights that are applicable to today’s challenges: which trial designs succeeded or failed, where and how patients for a specific condition were recruited (and where recruitment lagged), and which inclusion/exclusion criteria were used for a given therapeutic area or population. Is there any doubt that the ability to mine these insights would bolster the success rate for sponsors’ protocol designs? Of course not. But, until now, these insights have been largely ignored because they are locked away across disparate systems in mixed-format documents that would be far too resource-intensive to parse for every new study.

The NLP solution to free locked insights

Advances in NLP, or Natural Language Processing, over the past few years have been massive, and Linguamatics and IQVIA sit at the cutting edge of these solutions. Our award-winning text mining (also referred to as text analytics) platform can effectively “read” documents and rapidly transform unstructured text, like that contained in clinical trial protocols, into normalized, structured data suitable for analysis or machine learning (ML) algorithms. Importantly, the outputs from today’s NLP are not complex, hard-to-understand code that requires a data expert to translate. The results of intuitive keyword searches on digitized documents can be displayed in easily understandable visualizations, with the ability to drill down into specific data sources to better understand what you are seeing.
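To make “normalized, structured data” a little more concrete, here is a minimal Python sketch of one small piece of the idea: mapping the different ways a condition is written in free text onto a single canonical label. It is an illustration only, not how the Linguamatics platform works. The synonym table, the canonical labels (written in the style of MeSH headings) and the function name `normalize_conditions` are all assumptions made for the example.

```python
import re
from typing import List

# Hypothetical synonym table: free-text variants mapped to one canonical label.
# A real system would draw on a full ontology rather than a hand-written dict.
SYNONYMS = {
    "breast cancer": "Breast Neoplasms",
    "breast carcinoma": "Breast Neoplasms",
    "rheumatoid arthritis": "Arthritis, Rheumatoid",
}


def normalize_conditions(text: str) -> List[str]:
    """Return the canonical labels whose variants appear in the text."""
    lowered = text.lower()
    found: List[str] = []
    for phrase, canonical in SYNONYMS.items():
        # Whole-phrase match, so "breast cancer" does not match inside a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, lowered) and canonical not in found:
            found.append(canonical)
    return found


if __name__ == "__main__":
    sentence = "Eligible patients have breast carcinoma or metastatic Breast Cancer."
    print(normalize_conditions(sentence))
```

Because both spellings resolve to the same label, a later search or analysis can treat them as one condition instead of two.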

For sponsors, that means that a host of new insights can now be readily available to inform your protocol design. What might you learn by joining your historical protocol insights with your current operational execution information? Answers to questions like: Which study designs were most efficient for us in breast cancer therapy? Which retention tactics best helped us meet objectives in a 75+ population? Which sites failed to recruit patients with arthritis? What wording was misunderstood on our consent form for pediatric trials? The potential is huge.

Data42: The Novartis “moonshot” that is now a blueprint for sponsors

The good news is that the promise of NLP for clinical trial design is not a pie-in-the-sky concept. It has already been achieved by Novartis (watch their presentation available on our community site). In 2018, the global drug developer launched what their leadership called the company’s “moonshot”. They wanted a massive digitization of 20-plus years of historical protocols and two million patient-years of data. Novartis knew what they were asking was unprecedented and had big implications. In fact, they called the project Data42, which is a reference to the iconic sci-fi novel series “The Hitchhiker's Guide to the Galaxy” where 42 is "the answer to the ultimate question of life, the universe, and everything".

As one part of this huge endeavour, Novartis wanted to put all of their historical clinical trial protocol records and data sets together, virtually, to enable queries on all of it with the capability to be specific to disease areas so that scientists can ask the questions they weren't able to ask before. Specifically, they wanted to:

  • Have a common set of structured files in a digital format that people could access easily and search rapidly with meta information and tags
  • Mine and use historical data to find new signals in the organization’s existing data and other analyses that it may want to conduct
  • Examine historical data in the context of planning new trials that are more targeted, more efficient, and have a better chance of success

Taking action

Over the past two years, Linguamatics worked with Novartis to extract the key data from clinical trial protocols. We focused initially on final protocols rather than those in progress. The protocols all differed slightly, reflecting individual teams’ preferences for versioning documents, so we had to manually correct for that. From there, we deployed our NLP to normalize each protocol into specific parts using public domain document ontologies (such as the Document Components Ontology, DoCO), being careful not to alter the existing text, but rather to tag it in a way that helped navigate the documents moving forward. We worked with Novartis to continually build up capabilities over time to unlock richer search capabilities, using other ontologies and schemas such as CDISC, MeSH, OpenEHR and others. These enabled the capture of overall trial metadata (when a trial took place, what indication, what phase), and also specific data, for example, outcomes in schedule of assessments tables. After manual review, all these data could be loaded into a central repository of searchable files and metadata.
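The idea of tagging a protocol without altering its text can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the pipeline used in the Novartis project: it assumes protocols are plain text with numbered headings such as "1. Objectives", and the names `SectionTag`, `tag_sections` and `extract_phase` are hypothetical. Each tag stores only a label and character offsets into the original string, so the source text itself is never modified.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed heading format: a number, a dot, then the heading text on one line.
# Real protocols vary far more; a numbered list item in the body would also match.
HEADING = re.compile(r"^(\d+)\.[ \t]+(.+)$", re.MULTILINE)
# Trial phase written as Roman or Arabic numerals, e.g. "Phase II" or "phase 3".
PHASE = re.compile(r"\bphase\s+(I{1,3}|IV|[1-4])\b", re.IGNORECASE)


@dataclass(frozen=True)
class SectionTag:
    label: str  # heading text, used as the section tag
    start: int  # offset where the section body begins in the original text
    end: int    # offset where the section body ends


def tag_sections(text: str) -> List[SectionTag]:
    """Tag each numbered section by offsets, leaving the text untouched."""
    matches = list(HEADING.finditer(text))
    tags = []
    for i, match in enumerate(matches):
        # A section body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        tags.append(SectionTag(match.group(2).strip(), match.end(), end))
    return tags


def extract_phase(text: str) -> Optional[str]:
    """Return the first trial phase mentioned, or None if none is found."""
    match = PHASE.search(text)
    return match.group(1).upper() if match else None


if __name__ == "__main__":
    protocol = (
        "A Phase II study of Drug X\n"
        "1. Objectives\n"
        "Assess efficacy.\n"
        "2. Inclusion Criteria\n"
        "Adults aged 18 or older.\n"
    )
    for tag in tag_sections(protocol):
        print(tag.label, "->", protocol[tag.start:tag.end].strip())
    print("Phase:", extract_phase(protocol))
```

Keeping tags separate from the text (often called standoff annotation) means the same document can later be re-tagged against other ontologies or schemas without touching the original file.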

This project has allowed Novartis to power a connected data environment wherein decades of historical data are centralized, harmonized and accessible for Novartis associates to interrogate in ways that drive healthcare forward.

Future implications for clinical trial design

The use of NLP by Novartis for clinical trial protocol digitization is just one example of the ability to unlock historical data silos and surface rich insights. While many other sponsors are looking to embark on similar data digitization journeys, Novartis is looking ahead to what it sees as the next step in digitization: the ability to design a drug completely in silico, meaning not in the lab, but rather on a computer.

Will the application of NLP to historic protocol designs yield “the answer to the ultimate question of life, the universe, and everything”? Probably not. But it will no doubt unlock a valuable treasure trove of insights that sponsors would be remiss to ignore. If you would like to see the power of Linguamatics’ NLP solutions to surface hidden insights in your unstructured data, including clinical trial protocols and more, watch the webinar on demand.