The pharmaceutical industry is among the most heavily regulated in the world. Linguamatics NLP can bring time-saving benefits for regulatory compliance compared to manual efforts, which can be slow and expensive.


With both increased globalization and the enhanced understanding of risk comes a growing requirement for regulation. Changes in regulation, both over the past few years and looking to the future, mean companies need tools and solutions to assist with regulatory review and compliance. In some cases, meeting the regulators’ requirements is straightforward; in others, accessing the necessary data takes significant time, money and effort, all of which increase costs without necessarily increasing revenue.

Linguamatics NLP provides a text analytics solution that can be deployed to find, highlight, and extract key data within regulatory documents, check for MedDRA coding, mark up inconsistencies across documents, and more.

Use Cases for Regulatory Compliance

Text Mining for Identification of Medicinal Products

IDMP (IDentification of Medicinal Products) is a set of international standards, developed by the ISO, which will become mandatory in Europe in a phased approach, effective from 2018, and will also be adopted by the FDA and globally over the next few years.

Capturing the 300 to 2,000 data entities required per product, 70% of which lie in unconnected silos of unstructured text, demands time, resources, and investment. Thanks to recent developments in text mining, manual curation is no longer the only option for extracting the necessary data attributes.

I2E, Linguamatics’ world-leading text mining solution, can save organizations time and money by finding, extracting, standardizing and structuring the required data elements from IDMP-relevant unstructured text documents, including:

  • Summary of Product Characteristics (SmPC)
  • Manufacturing licenses
  • Chemistry, Manufacturing and Control (CMC) documents
  • Regulatory and compliance documents, such as eCTDs (electronic common technical documents)
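
As a rough illustration of what "finding, extracting, standardizing and structuring" can mean in practice, the sketch below pulls a strength and dose form out of a free-text product description and turns them into a structured record. It is a minimal, pattern-based stand-in written in plain Python, not I2E or its API; the pattern, the `ProductRecord` fields, the unit mapping and the product name are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative pattern: a number, a unit, then a dose form,
# e.g. "500 mg film-coated tablets". The lookbehind refuses to start
# inside a longer number such as "1,000", which this sketch does not handle.
STRENGTH_PATTERN = re.compile(
    r"(?<![\d.,])(?P<amount>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>mg|g|ml|microgram(?:s)?)\s+"
    r"(?P<form>[a-z\- ]*?(?:tablet|capsule|solution|suspension)s?)",
    re.IGNORECASE,
)

# Hypothetical standardization table for unit spellings.
UNIT_SYNONYMS = {"microgram": "ug", "micrograms": "ug"}


@dataclass
class ProductRecord:
    strength_value: float
    strength_unit: str
    dose_form: str


def extract_product_record(text: str) -> Optional[ProductRecord]:
    """Find the first strength/dose-form mention and return it in structured form."""
    match = STRENGTH_PATTERN.search(text)
    if match is None:
        return None
    amount = float(match.group("amount"))
    unit = match.group("unit").lower()
    unit = UNIT_SYNONYMS.get(unit, unit)
    # Standardize the dose form: lower case, single spaces, singular.
    form = " ".join(match.group("form").lower().split())
    if form.endswith("s"):
        form = form[:-1]
    return ProductRecord(amount, unit, form)


if __name__ == "__main__":
    # "Examplamab" is an invented product name.
    print(extract_product_record("Examplamab 500 mg film-coated tablets"))
```

A production system would rely on curated terminologies and linguistic analysis rather than a single regular expression, particularly for older, scanned or non-English documents.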

Johnson & Johnson NLP Analytics for Regulatory Compliance and IDMP Master Data Management

IDMP provides a common language to connect currently siloed data across R&D and supply chain systems and is being used by many pharma companies to assist with master data management across the enterprise. Christopher Dunn & Costas Mistrellides (Johnson & Johnson Consumer) presented on the value of I2E for extracting ~30 standard IDMP data elements from regulatory documents, including Summaries of Product Characteristics and regulatory dossiers (eCTD sections 3.2.S and 3.2.P). The challenges included a varied set of documents, some up to 50 years old, in mixed formats (DOC, DOCX, image and text PDFs), across five languages (English, French, Spanish, German, Italian). The output needed to be mapped to the J&J schema for their IDMP submission and internal business use. Over 1,300 documents were processed, with an overall accuracy above 94%, saving J&J Consumer significant time and resources.

Mundipharma Research's Journey Towards IDMP Compliance

Mundipharma found great value in a project using the Linguamatics NLP platform to find, highlight and extract data elements for Iteration 1 from unstructured documents.


Response to Regulatory Questions

Responding to regulatory questions can be a challenge. Companies often aim to respond to questions within a set time frame, and short turnaround cycles can lead to a messy submission with lots of appended information and orphaned responses. Capturing and analyzing the Responses to Questions (RTQs) sent to regulatory agencies around the world enables pharmaceutical companies to gain insights into:

  • frequently asked questions based on current new drug submissions;
  • current regulatory questions and concerns;
  • trends in regulatory concerns by product type (e.g. antibody-drug conjugates) and therapeutic area; and
  • geographical patterns of regulatory concerns.

By mining past questions, product-development teams can anticipate future regulatory questions and concerns, and proactively address them in the initial submission, thereby shortening approval times. Linguamatics I2E is used to mine RTQs in order to answer complex questions such as: “Was a request made for more information on the mechanisms of action of a product?” “Was product quality a concern?” “Were there questions around stability, clearance, model validation?”

Effective use of these RTQ insights helps product-development teams reduce errors before submission and anticipate the differing regulatory requirements, and how those requirements may influence current and future development.
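
The kind of question mining described above can be sketched, in very simplified form, as tagging each regulatory question with the concerns it raises and counting those tags by region. The example below is plain Python using keyword matching as a stand-in for NLP queries; it is not I2E, and the concern categories, trigger terms and sample questions are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical concern categories and trigger terms. A real deployment would
# use curated vocabularies and linguistic patterns rather than bare keywords.
CONCERN_TERMS = {
    "mechanism_of_action": ["mechanism of action", "mechanisms of action"],
    "product_quality": ["product quality", "impurity", "impurities"],
    "stability": ["stability", "shelf life"],
    "clearance": ["clearance"],
    "model_validation": ["model validation"],
}


def tag_question(text):
    """Return the sorted list of concern categories mentioned in one question."""
    lowered = text.lower()
    found = set()
    for category, terms in CONCERN_TERMS.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(category)
                break  # one hit is enough to tag the category
    return sorted(found)


def concern_counts(questions):
    """Count how many questions raise each concern, grouped by region."""
    counts = {}
    for question in questions:
        region_counter = counts.setdefault(question["region"], Counter())
        region_counter.update(tag_question(question["text"]))
    return counts


# Invented sample questions, for illustration only.
SAMPLE_QUESTIONS = [
    {"region": "EU", "text": "Please provide further data on the mechanism of action."},
    {"region": "EU", "text": "Justify the proposed shelf life with additional stability data."},
    {"region": "US", "text": "Clarify the model validation approach and the clearance estimates."},
]

if __name__ == "__main__":
    for region, counter in concern_counts(SAMPLE_QUESTIONS).items():
        print(region, dict(counter))
```

Grouping by product type or therapeutic area instead of region would only require a different grouping key on each question record.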

Access our application note on Text Analytics for Regulatory Affairs. 


If you would like to know more about whether your organization is IDMP ready, access our application note.




Regulatory Quality Assurance (QA)

Linguamatics has worked with pharmaceutical customers to develop an automated process to improve quality control of regulatory document submission.

Consolidating the various reports and documents into the overview document set required by the regulator involves extensive checking and cross-checking between the subsidiary documents and the master. The process is generally manual and is therefore both slow and error-prone, and errors can delay applications.

Using I2E to identify inconsistencies within submissions can save weeks of tedious manual checking and prevent a re-submission request, potentially saving millions of dollars.
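
To make the idea of an automated cross-check concrete, the sketch below compares values for the same field as reported in different documents and flags any field where the documents disagree. It assumes the values have already been extracted into (document, field, value) records; the extraction step, the field names and the sample values are hypothetical, and this is plain Python rather than I2E.

```python
from collections import defaultdict


def find_inconsistencies(extracted):
    """Group (document, field, value) records by field and return the
    fields whose normalized values differ between documents."""
    by_field = defaultdict(dict)
    for document, field, value in extracted:
        # Normalize case and whitespace so formatting differences are ignored.
        normalized = " ".join(value.lower().split())
        by_field[field][document] = normalized
    return {
        field: documents
        for field, documents in by_field.items()
        if len(set(documents.values())) > 1
    }


# Invented records, for illustration only.
RECORDS = [
    ("overview", "dose", "10 mg"),
    ("study_report_1", "dose", "10  mg"),
    ("overview", "patients_enrolled", "412"),
    ("study_report_1", "patients_enrolled", "421"),
]

if __name__ == "__main__":
    for field, documents in find_inconsistencies(RECORDS).items():
        print(field, documents)
```

Here the two "dose" values differ only in spacing and are treated as consistent, while the mismatched enrollment figures are reported for review.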

To learn more about accelerating drug approvals with better regulatory QA, read our blog.