Comparative Effectiveness Research - reuse of clinical trial data using text analytics

With the ongoing focus on outcomes-based payment models in healthcare, pharmaceutical companies face strong pressure to demonstrate not just the safety and efficacy of a new treatment, but also its cost effectiveness and comparative effectiveness. This means they must show that their agent is better not only than placebo but also than other agents. Comparative effectiveness of a treatment can be established by interventional clinical trials, observational real-world evidence studies, or systematic review and meta-analysis. Access to ongoing and past clinical trials via trial registries provides much valuable information, but effective search can be hindered by issues such as differences in search vocabularies and the difficulty of searching unstructured text.

Merck recently published a paper demonstrating the success of a text-mining pipeline that overcomes these issues and extracts key information for comparative effectiveness research from clinical trial registries. Researchers in the Informatics IT group wanted to search three clinical trial registries (NIH, WHO International Clinical Trials Registry Platform (ICTRP), and Citeline Trialtrove) and synthesize comparative effectiveness data for a set of Merck drugs, in order to:

  • Proactively identify opportunities and risks
  • Improve the understanding of the needs of external stakeholders
  • Make faster informed decisions on inline and development products

They developed a Pipeline Pilot workflow that used Linguamatics I2E natural language processing to extract key fields from relevant records across the three registries, remove duplicates, and present an integrated, structured output to the product development teams for decision support.

I2E extracted and harmonized information from each clinical trial record, drawing on both the structured and the unstructured text fields.
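The harmonize-and-deduplicate step can be illustrated with a minimal Python sketch. This is not the Pipeline Pilot or I2E implementation described in the paper. It assumes that each extracted record carries a trial identifier that can be normalized into a shared key. The record fields, registry labels, and identifiers below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TrialRecord:
    """One record as extracted from a single registry (hypothetical fields)."""
    source: str       # label of the registry the record came from
    trial_id: str     # identifier as written in that registry
    drug: str
    comparator: str


@dataclass
class MergedTrial:
    """One trial after records from several registries have been merged."""
    trial_id: str
    drug: str
    comparator: str
    sources: set = field(default_factory=set)


def normalise_id(raw: str) -> str:
    """Harmonize an identifier: trim, upper-case, drop spaces and hyphens."""
    return raw.strip().upper().replace(" ", "").replace("-", "")


def merge_records(records):
    """Collapse records that share a normalized ID, remembering every source."""
    merged = {}
    for rec in records:
        key = normalise_id(rec.trial_id)
        if key not in merged:
            # The first record seen supplies the harmonized field values.
            merged[key] = MergedTrial(
                key, rec.drug.strip().lower(), rec.comparator.strip().lower()
            )
        merged[key].sources.add(rec.source)
    return merged


# Made-up example records: the first two describe the same trial.
example = [
    TrialRecord("registry_a", "TRIAL-0001", "DrugX", "Placebo"),
    TrialRecord("registry_b", "trial 0001", "drugx ", "placebo"),
    TrialRecord("registry_c", "TRIAL-0002", "DrugX", "DrugY"),
]
for trial in merge_records(example).values():
    print(trial.trial_id, sorted(trial.sources))
```

Keeping the set of sources on each merged trial preserves the information needed later to see which registries contributed which trials.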

Interestingly, the researchers found that each of the three clinical trial data sources provided some, but not all, of the relevant trials. Many of the comparative effectiveness research trials identified for each of the six Merck drugs were found in all three sources, but a significant number were found in only one or two, emphasizing the need to search multiple sources for the most complete results. Thus the text mining pipeline provided a more comprehensive and accurate data set, as well as a considerable speed-up, compared to the equivalent manual process.

The timely information alerts provided by this system have enabled Merck to stay at the forefront of emerging clinical research.

Overlap of Clinical Trial reports. Venn diagram showing the number of clinical trials retrieved from each of the three clinical trial registries and the overlap between the different sources.
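The region counts behind a three-way Venn diagram of this kind can be computed with plain set operations. The sketch below is only an illustration of that arithmetic, not the method used in the paper. It assumes each registry's results have already been reduced to a set of harmonized trial identifiers. The function name `venn_regions` and the identifiers in the example are hypothetical, and the example numbers are not the published results.

```python
def venn_regions(a: set, b: set, c: set) -> dict:
    """Count trials in each of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),   # in a and b, but not in c
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Made-up identifier sets standing in for three registries.
registry_a = {"T1", "T2", "T3", "T4"}
registry_b = {"T2", "T3", "T5"}
registry_c = {"T3", "T4", "T6"}

regions = venn_regions(registry_a, registry_b, registry_c)
print(regions)

# The seven regions partition the union, so their counts sum to its size.
print(sum(regions.values()), len(registry_a | registry_b | registry_c))
```

Any trial counted outside `all_three` would be missed by a search of at least one registry on its own, which is the argument for searching several sources.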