Text Mining for Drug Safety Issues

November 29, 2016

Uncovering new toxicities from chronic non-rodent studies

Preclinical toxicology studies are an essential part of the drug discovery and development pipeline because they support the safe conduct of clinical trials. Drug safety is, of course, one of the most critical aspects to ensure during drug development.

We were pleased to see the recent publication by Merck on a text-mining approach to assess the value of chronic non-rodent toxicology studies. 

Preclinical safety assessment groups employ a variety of animal models and assays to satisfy regulatory agency requirements to identify and characterize drug toxicities, describe drug exposures, and provide qualitative and quantitative risk assessments for human exposure. These studies require considerable resource investment; however, the results are often “locked away” in internal reports, which makes re-use of these valuable data difficult and costly.

This is a common situation within the pharmaceutical industry, where critical information is locked away in textual reports, such as the informed scientific conclusions of pathologists, histologists, and safety experts. Natural language processing can overcome these barriers by extracting structured facts from unstructured documents, and Merck’s paper describes an evaluation of a text mining workflow to access these important data.

Merck used I2E to identify the additional toxicities observed in 32 chronic (9- or 12-month) toxicology studies in dogs or monkeys and 27 chronic (6-month) toxicology studies in rats. I2E was used to find conclusions and interpretations from final study reports, antemortem reports, postmortem reports, and protocols stored in a Documentum-based file repository. The I2E queries developed were able to identify, extract, and normalize study annotation metadata and organ pathology findings.

I2E enabled the use of standard ontologies such as NCI Thesaurus, MedDRA, and MeSH, with additions to handle specific issues. For example, organ adjectives and substructure terms were used to map findings to key target organs (myocardial or valve for heart, renal or tubular for kidney).
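
As a rough illustration of that kind of term mapping, the short Python sketch below normalizes organ adjectives and substructure terms in a finding to a target organ. It is not I2E's API or Merck's actual vocabulary: the ORGAN_TERMS dictionary and the target_organs function are hypothetical names, and only the heart and kidney terms are taken from the examples above.

```python
import re
from typing import List

# Hypothetical vocabulary: target organ -> organ name, adjectives, substructure terms.
# Only the terms named in the article are included; a real ontology extension
# would be far larger.
ORGAN_TERMS = {
    "heart": ["heart", "myocardial", "valve"],
    "kidney": ["kidney", "renal", "tubular"],
}

# Invert the vocabulary so each term points at its target organ.
TERM_TO_ORGAN = {
    term: organ for organ, terms in ORGAN_TERMS.items() for term in terms
}

# One case-insensitive pattern that matches any known term as a whole word.
_TERM_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in TERM_TO_ORGAN) + r")\b",
    re.IGNORECASE,
)


def target_organs(finding: str) -> List[str]:
    """Return the sorted, de-duplicated target organs mentioned in a finding."""
    matches = _TERM_PATTERN.finditer(finding)
    return sorted({TERM_TO_ORGAN[m.group(1).lower()] for m in matches})


if __name__ == "__main__":
    print(target_organs("Minimal myocardial degeneration; renal tubular dilatation"))
```

Whole-word matching keeps the sketch simple but misses inflected forms such as plurals, which a production workflow would handle through the ontology and linguistic processing rather than a regular expression.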

One of the key conclusions of the study was that this text mining pipeline enabled the authors to more effectively identify toxicities that were not seen in 3-month studies but became apparent only in chronic testing.

The authors state: “this report exemplifies the benefit of investing in such internal knowledge mining capability”.

Wendy Cornell, one of the authors, has stated that this study had significant business impact:

  • Driving regulatory change to take work out of [safety assessment] system without compromising human safety
  • Prioritizing capability development (biomarkers, imaging)
  • Identifying compounds with specific profiles to develop and qualify assays
  • Providing historical summary of findings to assess significance of new findings on pipeline compounds