IDMP compliance will require Marketing Authorisation Holders (MAHs) to submit to, and maintain with, the EMA a broad range of data elements about their medicinal products. 70% of these data elements currently exist only as unstructured text, scattered across documents in multiple formats, styles, and languages. These data are central to a pharmaceutical company's core operations and are reused for multiple purposes across the business.

The challenge for such companies is to find a quick, accurate, and affordable way to search, extract, standardize, and structure the 300–2,000 data elements required per product for IDMP compliance.

Linguamatics NLP extracts IDMP data elements

Mundipharma Research Limited ran a pilot project using I2E, Linguamatics' natural language processing (NLP)-based text mining solution, to find, highlight, and extract the data elements for IDMP Iteration 1 from unstructured documents such as EMA Summary of Product Characteristics (SmPC) documents.

I2E queries were developed to extract the individual data elements using standard and customized ontologies, as well as linguistic features of SmPCs. Accuracy was evaluated against a ‘gold standard’ data set that had been manually extracted by an independent expert.
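The source does not say which accuracy measure was used. As a hedged illustration only, one common way to score an extraction run against a manually built gold standard is precision, recall, and F1 over (document, element, value) triples. The triple layout, the `normalize`/`evaluate` helpers, and the example values below are all hypothetical; this is not I2E's evaluation method.

```python
# Hypothetical layout: (document_id, element_name, value)
Triple = tuple[str, str, str]


def normalize(triples: set[Triple]) -> set[Triple]:
    """Lower-case values and collapse whitespace so trivial formatting differences still match."""
    return {(doc, element, " ".join(value.lower().split())) for doc, element, value in triples}


def evaluate(extracted: set[Triple], gold: set[Triple]) -> dict[str, float]:
    """Precision, recall and F1 of extracted triples against the gold standard."""
    ex, gd = normalize(extracted), normalize(gold)
    true_pos = len(ex & gd)
    precision = true_pos / len(ex) if ex else 0.0
    recall = true_pos / len(gd) if gd else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Fictional example: one SmPC, three IDMP-style elements.
gold_example = {
    ("smpc_001", "pharmaceutical_form", "film-coated tablet"),
    ("smpc_001", "route_of_administration", "oral"),
    ("smpc_001", "strength", "10 mg"),
}
extracted_example = {
    ("smpc_001", "pharmaceutical_form", "Film-coated  tablet"),
    ("smpc_001", "route_of_administration", "oral"),
    ("smpc_001", "strength", "20 mg"),  # wrong value: counts against both precision and recall
}

print({k: round(v, 3) for k, v in evaluate(extracted_example, gold_example).items()})
# {'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```

Exact matching after light normalization is a deliberately strict choice; a real evaluation might also accept synonyms or controlled-vocabulary codes as equivalent.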

Jon Sanford, Head of Regulatory Information Management and Operations at Mundipharma Research, said: “We were really impressed when we saw the accuracy with which I2E had been able to extract data elements from the documents.”


Pfizer improves Patent Search 10-fold with Linguamatics I2E

Intellectual property is critical in the drug discovery process. Before starting a new project, it is important to understand the patent landscape around the disease area, check for freedom to operate, and assess patentability. The business case for a project's commercial viability must cover not just the biology, such as “Is there an unmet medical need?”, but also “What is the IP position?”

Streamlining patent research with natural language processing (NLP) text mining

Scientists and researchers therefore need to be able to access the information on genes and diseases contained in patents. But patents can be hundreds of pages long, with complex language and interconnected facts, so manual patent research is time-consuming and costly. More and more pharma companies, such as Pfizer, are turning to NLP text mining to keep up to date with the patent literature.

Pfizer researchers use the Linguamatics Life Science Platform, powered by I2E, to find patents relating to specific diseases. The results feed a database used to visualize gene targets, invention types, competitor organizations, and overall patent “relevancy”.


The combined value of NLP and Machine Learning – a concrete example

With the rising costs of de novo drug discovery, and an increasing focus on rare diseases, there is continual innovation in methods for finding new uses for existing drugs. I was interested to hear of a novel approach, published recently by Eric Su and Todd Sanger at Eli Lilly. In their paper, “Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov”, the authors describe the combined use of Natural Language Processing (NLP) and Machine Learning (ML) to identify potential new uses for existing drugs.

It’s quite astonishing how often in recent weeks and months I’ve been asked about the interplay between NLP, Artificial Intelligence (AI), and ML. Everyone seems to want to understand the real potential of these tools (rather than the hype being shouted from the rooftops) to transform healthcare, research, and many other areas of our lives over the next decade.

So, let’s look more closely at this concrete example of the combined value of NLP and ML. The innovative step was to exclude trials for a specific indication, such as cancer, and then, among the remaining trials, find those with Serious Adverse Events (SAEs) classified as cancerous. The researchers then checked whether the placebo arm had more cancer-related SAEs than the treatment arm. Where it did, they hypothesized that the treatment may have an anti-cancer effect.
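The screening logic just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Su and Sanger's implementation: the `Trial`/`Arm` data model, the keyword set `CANCER_TERMS` (a crude stand-in for the NLP and ontology-based classification of SAE terms), and the example records are all hypothetical, and the ML step from the paper is not modelled.

```python
from dataclasses import dataclass, field

# Hypothetical keyword list standing in for the NLP/ontology step that
# classifies an SAE term as cancer-related; not the study's actual vocabulary.
CANCER_TERMS = {"cancer", "carcinoma", "lymphoma", "melanoma", "neoplasm", "leukaemia", "leukemia"}


@dataclass
class Arm:
    name: str
    is_placebo: bool
    sae_terms: list[str] = field(default_factory=list)  # reported SAE terms


@dataclass
class Trial:
    trial_id: str
    indication: str
    drug: str
    arms: list[Arm]


def is_cancer_related(term: str) -> bool:
    """Crude word match against CANCER_TERMS."""
    return any(word in CANCER_TERMS for word in term.lower().split())


def count_cancer_saes(arms: list[Arm]) -> int:
    """Total number of cancer-related SAE terms across the given arms."""
    return sum(is_cancer_related(t) for arm in arms for t in arm.sae_terms)


def repositioning_candidates(trials: list[Trial]) -> list[tuple[str, str, int, int]]:
    """Flag drugs whose placebo arm(s) had more cancer-related SAEs than the treatment arm(s)."""
    candidates = []
    for trial in trials:
        # Step 1: exclude trials run for the indication of interest (here, cancer).
        if is_cancer_related(trial.indication):
            continue
        placebo = [a for a in trial.arms if a.is_placebo]
        treated = [a for a in trial.arms if not a.is_placebo]
        if not placebo or not treated:
            continue  # both arm types are needed for a comparison
        # Step 2: count cancer-related SAEs in each arm type.
        placebo_n, treated_n = count_cancer_saes(placebo), count_cancer_saes(treated)
        # Step 3: more cancer SAEs on placebo -> hypothesize an anti-cancer effect.
        if placebo_n > treated_n:
            candidates.append((trial.drug, trial.trial_id, placebo_n, treated_n))
    return candidates


# Fictional example data (not real trials).
example_trials = [
    Trial("NCT-A", "Type 2 diabetes", "drug_x", [
        Arm("placebo", True, ["Breast carcinoma", "Lymphoma", "Pneumonia"]),
        Arm("drug_x 10 mg", False, ["Pneumonia"]),
    ]),
    Trial("NCT-B", "Lung cancer", "drug_y", [
        Arm("placebo", True, ["Lung neoplasm"]),
        Arm("drug_y", False, []),
    ]),
]

print(repositioning_candidates(example_trials))  # [('drug_x', 'NCT-A', 2, 0)]
```

Following the description, the sketch compares raw SAE counts. Arms of different sizes would, in practice, call for per-participant rates, and a flagged drug is only a hypothesis to investigate, not evidence of efficacy.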


What challenges facing life sciences and healthcare organisations can text analytics help to address? This is one of the key questions that I regularly ask myself and others. There is so much buzz at the moment around big data, real-world data, healthcare informatics, and wearables, but what is really working, and what is just hype?

One of the ways we get input on this question is, of course, by meeting our customers and hearing about their successes. Linguamatics hosts two user group meetings every year, and our European Spring Text Mining Conference is coming up soon. Held over 3 days in April, the conference gives scientists and clinicians interested in text mining the opportunity to attend hands-on training workshops, round-table discussions, and a day of talks from both Linguamatics staff and our customers.

This year, our customer speakers cover a wide range of use cases, spanning the discovery, development, and delivery of therapeutics.


Clinical trials text mining can speed key decisions and support effective site selection and trial design

Clinical trials form the cornerstone of evidence-based medicine and are essential to establishing the safety and efficacy of new drugs. Before approval by regulatory agencies, each new drug must pass through a series of gates. At the most basic level, these are phase 1 for first-in-human safety; phase 2 for efficacy and biological activity against the target; and phase 3 for the safety, efficacy, and effectiveness of the new therapeutic.

At each of these phases, careful planning is essential for a successful study. The clinical study protocol sets out the objective(s), design, methodology, statistical considerations, and organization of a clinical trial, and is intended to ensure the safety of the trial subjects and the integrity of the data collected.

Over recent years, clinical trial designs and procedures have become more diverse and more complex. Because of precision medicine, trials have to be planned more carefully to ensure adequate statistical power for smaller patient groups, and adaptive, umbrella, basket, and n-of-1 trials are now more frequent.

The regulatory requirements and growing complexity of clinical trials translate into more numerous and more complex eligibility criteria for study enrolment, more site visits and required procedures, longer study durations, and more rigorous data collection requirements (source: PhRMA Biopharmaceutical Industry Profile 2016).