Tracking and reporting adverse events

In recent years, regulatory authorities such as the FDA and EMA have placed an increased emphasis on drug safety of marketed products, particularly the tracking and reporting of adverse events. Pharmaceutical companies are expected to regularly screen the worldwide scientific literature for potential adverse drug reactions, at least every two weeks. The use of text mining and other tools to streamline the literature review process for pharmacovigilance is more crucial than ever in order to ensure patient safety, without overloading drug safety teams.

Manual review of adverse events is time-consuming

Eric Lewis (Safety Development Leader at GlaxoSmithKline) spoke at the Linguamatics Text Mining Summit about the challenges of reviewing medical literature for safety signals. For example, he searched the literature for a sample of just 20 marketed products across a 300-day period. Eric found that there were on average 60 new references per day (with a total of over 11,000 documents), and that manual review took 1.2 to 1.6 minutes per abstract. Extrapolating to a typical pharma company portfolio of 200 marketed products, he showed that this volume of literature would take over 2,200 hours to review – hugely time-consuming.
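The extrapolation can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not Eric's actual calculation: it assumes that literature volume scales linearly with the number of products, and it treats "over 11,000" as exactly 11,000, so the results are lower bounds. The function and variable names are illustrative.

```python
# Figures quoted in the talk summary above.
SAMPLE_PRODUCTS = 20
SAMPLE_DOCUMENTS = 11_000          # "over 11,000", so treated as a lower bound
PORTFOLIO_PRODUCTS = 200
MINUTES_PER_ABSTRACT = (1.2, 1.6)  # fastest and slowest manual review times


def review_hours(documents: int, minutes_per_abstract: float) -> float:
    """Convert a document count into hours of manual review."""
    return documents * minutes_per_abstract / 60


# Assumption: literature volume scales linearly with the number of products.
scale = PORTFOLIO_PRODUCTS / SAMPLE_PRODUCTS
portfolio_documents = round(SAMPLE_DOCUMENTS * scale)

low_hours, high_hours = (
    review_hours(portfolio_documents, minutes) for minutes in MINUTES_PER_ABSTRACT
)
print(f"{portfolio_documents:,} abstracts: "
      f"{low_hours:,.0f} to {high_hours:,.0f} hours of review")
```

At the faster review speed this gives the 2,200-hour figure quoted above; at the slower speed the estimate rises to more than 2,900 hours.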


Understanding drug-drug interactions can improve drug safety

A considerable proportion of adverse drug events are caused by interactions between drugs. With an ageing population, and the associated rise in multiple age-related illnesses, the risk of drug-drug interactions (DDIs) is increasing. One way of alleviating some DDIs is to ensure that potentially interacting drugs are taken a suitable time interval apart. But what is the best interval to recommend?

In a recent seminar, Keith Burkhardt of the FDA described a project using text mining to survey the landscape of information on DDIs in FDA Drug Labels. In particular, the FDA review division wanted to find labelling for drugs where the time separation was stated, in order to prevent potential drug safety events.

Mining Data from FDA Drug Labels: dosing regimens and time separation

The drug classes of interest included bile acid sequestrants and exchange resins (such as cholestyramine, colestipol, colesevelam, all LDL cholesterol lowering drugs), phosphate binders (e.g. sevelamer; used for patients with chronic kidney failure), and chelators (used to treat excessively high levels of lead, iron or copper in the blood; e.g. deferasirox, deferiprone). These drug classes can all alter the bioavailability of other drugs, particularly for those with a narrow therapeutic range such as warfarin or antiepileptic drugs.
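To give a feel for what "finding a stated time separation" means in practice, here is a deliberately simple sketch using a regular expression. This is not the method used in the FDA project, nor an I2E query; it is a plain pattern-matching stand-in, and a real NLP approach would handle far more linguistic variation. The pattern, the function name, and the sample sentences (including the placeholder name DrugX) are all invented for illustration and are not quoted from any drug label.

```python
import re

# Hypothetical pattern: an amount (or range), a time unit, then a relation word.
TIME_SEPARATION = re.compile(
    r"(?:at least\s+)?"
    r"(?P<amount>\d+(?:\.\d+)?)"
    r"(?:\s*(?:to|-)\s*(?P<upper>\d+(?:\.\d+)?))?"
    r"\s*(?P<unit>hours?|hrs?|minutes?|mins?)\s+"
    r"(?P<relation>before|after|apart)",
    re.IGNORECASE,
)


def find_time_separations(text):
    """Return every time-separation statement found in a piece of label text."""
    results = []
    for match in TIME_SEPARATION.finditer(text):
        upper = match.group("upper")
        results.append({
            "amount": float(match.group("amount")),
            "upper": float(upper) if upper else None,  # set only for ranges
            "unit": match.group("unit").lower(),
            "relation": match.group("relation").lower(),
        })
    return results


# Invented sentences for illustration only; not taken from real labelling.
samples = [
    "Take other medications at least 4 hours before DrugX.",
    "Administer 1 hour before or 4 to 6 hours after DrugX.",
    "No dose adjustment is required.",
]

for sentence in samples:
    print(sentence, "->", find_time_separations(sentence))
```

A pattern like this captures only the simplest phrasings, which is one reason linguistic processing and ontologies (for drug names and classes) matter when mining labels at scale.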


In this world of ever-increasing volume and variety of textual data, there is a growing range of tools and technologies to handle these data and extract value from them. We hear about a potentially bewildering barrage of AI technologies, including natural language processing (NLP), machine learning, and other textual data science applications. A blog I read recently highlighted this with a Venn diagram covering over a dozen different disciplines (see figure below). These techniques all bring benefits, but often we just need straightforward, simple access to our unstructured text data.

Empower a wide variety of users to find relevant data with high recall and precision

Linguamatics I2E brings a combination of powerful text mining tools to many pharma, biotech and healthcare users. We recognize that users' demands vary, and so we have created I2E Web Portals. I2E Web Portals aim to engage users who want rapid, easy access to scientific knowledge from both public-domain knowledge bases (e.g. MEDLINE, ClinicalTrials.gov) and internal data silos, including regulatory dossiers, preclinical safety data, patient/customer call transcripts, and many more.


There’s growing interest within the biopharmaceutical community in using machine learning to solve challenges across the drug-discovery pipeline. The availability of high-quality data for training algorithms is vital to machine learning success, but much of this information is tied up in unstructured or semi-structured text sources. Natural language processing (NLP) is the key to extracting the wealth of data hidden in unstructured text, and Linguamatics’ customers have been finding out first-hand what this approach can do for them.

Using Linguamatics I2E NLP text mining:

  • Eli Lilly researchers mine adverse event data to identify potential new uses for existing drugs.
  • A top-10 pharma company processes and interprets unstructured “voice of the customer” call feeds, to categorize the feeds and help build predictive models.
  • Roche and Humboldt University of Berlin identified MEDLINE abstracts containing both the protein target and specific disease indication of a known set of cancer therapeutics, and applied machine learning to predict the success or failure of drugs in Phase II or III with high accuracy.

Read the full 'Data-driven NLP plus machine learning' application note to find out more about how NLP can support effective machine learning projects.


IDMP compliance will require Market Authorization Holders to submit and maintain a broad range of data elements about medicinal products with the EMA - 70% of which currently exist in unstructured text, hidden in multiple document formats, styles, and languages. These data play a key role in the core operation of a pharmaceutical company, and are re-used for multiple purposes across the business.

The challenge for such companies is to find a quick, accurate, and affordable way to search, extract, standardize, and structure the 300–2,000 data elements required per product for IDMP compliance.

Linguamatics NLP extracts IDMP data elements

Mundipharma Research Limited implemented a pilot project using I2E, Linguamatics' natural language processing-based text mining solution, to find, highlight and extract data elements for Iteration 1 from unstructured documents such as the EMA Summary of Product Characteristics (SmPC) documents.

I2E queries were developed to extract the individual data elements using standard and customized ontologies, as well as linguistic features of SmPCs. Accuracy was evaluated against a ‘gold standard’ data set that had been manually extracted by an independent expert. Find out more about I2E's Extract Transform Load (ETL) solutions here.
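Evaluation against a gold standard is typically expressed as precision and recall. The sketch below shows one common way to compute them; it is an assumption about how such a comparison could be scored, not a description of how the Mundipharma pilot was actually evaluated. The function name, document identifiers, element names and values are all invented for illustration.

```python
def score_extraction(extracted, gold):
    """Return (precision, recall, F1) for extracted items versus a gold standard."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented (document, data element, value) triples; not real pilot data.
gold = {
    ("smpc_001", "route_of_administration", "oral"),
    ("smpc_001", "pharmaceutical_form", "tablet"),
    ("smpc_001", "strength", "10 mg"),
    ("smpc_002", "route_of_administration", "intravenous"),
}
extracted = {
    ("smpc_001", "route_of_administration", "oral"),
    ("smpc_001", "pharmaceutical_form", "tablet"),
    ("smpc_001", "strength", "20 mg"),  # wrong value: counts against both scores
    ("smpc_002", "route_of_administration", "intravenous"),
}

precision, recall, f1 = score_extraction(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Comparing whole triples means a value extracted for the right element but with the wrong content is penalized twice: once as a spurious extraction and once as a missed gold item.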

Jon Sanford, Head of Regulatory Information Management and Operations at Mundipharma Research: “We were really impressed when we saw the accuracy with which I2E had been able to extract data elements from the documents”.