Understanding drug-drug interactions can improve drug safety

A considerable proportion of adverse drug events are caused by interactions between drugs. With an ageing population, and the associated rise in multiple age-related illnesses, the risk of drug-drug interactions (DDIs) is increasing. One way of avoiding some DDIs is to ensure that potentially interacting drugs are taken a suitable time interval apart. But what is the best interval to recommend?

In a recent seminar, Keith Burkhardt of the FDA described a project using text mining to survey the landscape of information on DDIs in FDA Drug Labels. In particular, the FDA review division wanted to find labelling for drugs where the time separation was stated, in order to prevent potential drug safety events.

Mining Data from FDA Drug Labels: dosing regimens and time separation

The drug classes of interest included bile acid sequestrants and exchange resins (such as cholestyramine, colestipol, colesevelam, all LDL cholesterol lowering drugs), phosphate binders (e.g. sevelamer; used for patients with chronic kidney failure), and chelators (used to treat excessively high levels of lead, iron or copper in the blood; e.g. deferasirox, deferiprone). These drug classes can all alter the bioavailability of other drugs, particularly for those with a narrow therapeutic range such as warfarin or antiepileptic drugs.
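To make the idea of finding stated time separations concrete, here is a minimal sketch using a plain regular expression. It is not the FDA's or I2E's method, which the source does not describe in detail; the pattern, the function name `find_time_separations`, and the sample sentence are all assumptions made up for illustration.

```python
import re

# Hypothetical pattern for phrases such as "at least 4 hours before",
# "2 hours after" or "1 to 2 hours apart". Real label text is far more
# varied, so this is illustrative only.
SEPARATION = re.compile(
    r"(?:at least\s+)?"
    r"(?P<min>\d+(?:\.\d+)?)"
    r"(?:\s*(?:to|-)\s*(?P<max>\d+(?:\.\d+)?))?"
    r"\s*(?P<unit>hours?|hrs?|minutes?|mins?)\s+"
    r"(?P<direction>before|after|prior to|apart)",
    re.IGNORECASE,
)


def find_time_separations(text):
    """Return one dict per time-separation phrase found in the text."""
    results = []
    for m in SEPARATION.finditer(text):
        results.append({
            "min": float(m.group("min")),
            "max": float(m.group("max")) if m.group("max") else None,
            "unit": m.group("unit").lower(),
            "direction": m.group("direction").lower(),
            "phrase": m.group(0),
        })
    return results


# Invented example sentence, not quoted from any real drug label.
sample = ("Administer other drugs at least 4 hours before or 1 to 2 hours "
          "after the resin.")
hits = find_time_separations(sample)
for hit in hits:
    print(hit["phrase"], "->", hit["min"], hit["max"], hit["direction"])
```

A pattern like this misses paraphrases ("separate dosing by four hours") and cannot tell which drug the interval applies to, which is one reason linguistic and ontology-based text mining is used for this kind of task.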


In this world of ever-increasing volume and variety of textual data, there is a growing variety of tools and technologies to handle and get value from these data. We hear about a potentially bewildering barrage of AI technologies, including natural language processing (NLP), machine learning, and other textual data science applications. A blog I read recently highlighted this with a Venn diagram covering over a dozen different disciplines (see figure below). These techniques all bring benefits, but often we just need straightforward, simple access to our unstructured text data.

Empower a wide variety of users to find relevant data with high recall and precision

Linguamatics I2E brings a combination of powerful text mining tools to many pharma, biotech and healthcare users. We recognize that users’ demands vary, and so we have created I2E Web Portals. I2E Web Portals aim to engage users who want rapid, easy access to scientific knowledge from both public-domain knowledge bases (e.g. MEDLINE, ClinicalTrials.gov) and internal data silos, including regulatory dossiers, preclinical safety data, patient/customer call transcripts, and many more.


There’s growing interest in the use of machine learning to solve challenges across the drug-discovery pipeline within the biopharmaceutical community. The availability of high quality data for training algorithms is vital to machine learning success - but much of this information is tied up in unstructured, or semi-structured text sources. Natural language processing (NLP) is the key to extracting the wealth of data hidden in unstructured text, and Linguamatics’ customers have been finding out first-hand what this approach can do for them.

Using Linguamatics I2E NLP text mining:

  • Eli Lilly researchers mine adverse event data to identify potential new uses for existing drugs.
  • A top-10 pharma company processes and understands unstructured “voice of the customer” call feeds, to categorize the feeds and help build predictive models.
  • Roche and Humboldt University of Berlin identified MEDLINE abstracts containing both the protein target and specific disease indication of a known set of cancer therapeutics, and applied machine learning to predict the success or failure of drugs in Phase II or III with high accuracy.

Read the full “Data-driven NLP plus machine learning” application note to find out more about how NLP can support effective machine learning projects.


IDMP compliance will require Market Authorization Holders to submit and maintain a broad range of data elements about medicinal products with the EMA - 70% of which currently exist in unstructured text, hidden in multiple document formats, styles, and languages. These data play a key role in the core operation of a pharmaceutical company, and are re-used for multiple purposes across the business.

The challenge for such companies is to find a quick, accurate, and affordable way to search, extract, standardize, and structure the 300–2,000 data elements required per product for IDMP compliance.

Linguamatics NLP extracts IDMP data elements

Mundipharma Research Limited implemented a pilot project using I2E, Linguamatics’ natural language processing-based text mining solution, to find, highlight and extract data elements for Iteration 1 from unstructured documents such as the EMA Summary of Product Characteristics (SmPC).

I2E queries were developed to extract the individual data elements using standard and customized ontologies, as well as linguistic features of SmPCs. Accuracy was evaluated against a ‘gold standard’ data set that had been manually extracted by an independent expert.
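As an illustration of what evaluating against a gold standard can look like, the sketch below compares extracted data elements with a manually curated set and reports precision, recall and F1. The source does not say which metrics Mundipharma used, so the choice of metrics, the function name `score_extraction`, and the placeholder element names and values are all assumptions.

```python
def score_extraction(extracted, gold):
    """Compare extracted (element, value) pairs with a gold-standard set.

    Both arguments are iterables of (element_name, value) tuples; a pair
    counts as correct only if it matches a gold pair exactly.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder data invented for illustration, not taken from any real SmPC.
gold = [
    ("product_name", "Drug A"),
    ("strength", "10 mg"),
    ("dose_form", "tablet"),
    ("route", "oral"),
]
extracted = [
    ("product_name", "Drug A"),
    ("strength", "10 mg"),
    ("dose_form", "capsule"),  # wrong value, counts against precision
    ("route", "oral"),
]
scores = score_extraction(extracted, gold)
print(scores)
```

Exact matching is a deliberately strict choice for the sketch; a real evaluation might first normalize values (case, units, synonyms) so that equivalent answers are not counted as errors.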

Jon Sanford, Head of Regulatory Information Management and Operations at Mundipharma Research: “We were really impressed when we saw the accuracy with which I2E had been able to extract data elements from the documents”.

Productizing text-mining workflows for IDMP compliance

Mundipharma Research is now planning to productize the text-mining workflow for IDMP, as well as exploring how to expand the use of I2E across the wider enterprise, including:


Pfizer improves Patent Search 10-fold with Linguamatics I2E

Intellectual property is critical in the drug discovery process. Before initiating any new project it is important to understand the patent landscape around the disease area, check whether there is freedom-to-operate, and assess patentability. The business case for a project's commercial viability must cover not just the biology (“is there an unmet medical need?”) but also the IP position.

Streamlining patent research with natural language processing (NLP) text mining

Scientists and researchers therefore need to be able to access the information on genes and diseases in patents. But patents can be hundreds of pages long and contain complex constructions and interconnected facts. Manual patent research is a time-consuming and costly process. More and more pharma companies, such as Pfizer, are looking to NLP text mining to keep up to date with their patent literature.

Pfizer researchers use Linguamatics Life Science Platform powered by I2E to find patents relating to specific diseases. The results feed a database to visualize gene targets, invention type, competitor organizations and overall patent “relevancy”.