
Regulatory Compliance

The pharmaceutical industry is among the most heavily regulated in the world. Linguamatics NLP can bring time-saving benefits for regulatory compliance compared to manual efforts, which are often slow and expensive.




With both increased globalization and the enhanced understanding of risk comes a growing requirement for regulation. Changes in regulation, both over the past few years and looking to the future, mean companies need tools and solutions to assist with regulatory review and compliance. In some cases, meeting the regulators’ requirements is straightforward, whilst in other cases, accessing the necessary data can take a significant amount of time, money and effort, all of which increases costs but does not necessarily increase revenue. 

Linguamatics NLP provides a text analytics solution that can be deployed to find, highlight, and extract key data within regulatory documents, check for MedDRA coding, mark up inconsistencies across documents, and more.

Use Cases for Regulatory Compliance

Text Mining for Identification of Medicinal Products

IDMP (IDentification of Medicinal Products) is a set of international standards, developed by the ISO, that defines the rules for uniquely identifying medicinal products and the elements relevant to identifying them. IDMP is being adopted globally by health regulatory agencies. The EMA, FDA and other agencies are committing to implement it, with the EMA aiming for implementation by 2021. It will become mandatory as the standard for transmitting regulatory data to authorities throughout a medicine’s lifecycle, from clinical trials through marketing application and approval to pharmacovigilance. The effort required to prepare for IDMP is high, for both the pharmaceutical industry and regulators, so there is a strong need to build capability and readiness to respond to agency timelines.

Capturing the many data entities required — 70% of which lie in unconnected silos of unstructured text — demands time, resources, and investment. Thanks to recent developments in text mining, manual curation is no longer the only option for extracting the necessary data attributes.

I2E, Linguamatics’ world-leading text mining solution, can save organizations time and money by finding, extracting, standardizing and structuring the required data elements from IDMP-relevant unstructured text documents, including:

  • Summary of Product Characteristics (SmPC)
  • Manufacturing licenses
  • Chemistry, Manufacturing and Control (CMC) documents
  • Regulatory and compliance documents, such as eCTDs (electronic common technical documents)

Johnson & Johnson NLP Analytics for Regulatory Compliance and IDMP Master Data Management

IDMP provides a common language to connect currently siloed data across R&D and supply chain systems and is being used by many pharma companies to assist with master data management across the enterprise. Christopher Dunn & Costas Mistrellides (Johnson & Johnson Consumer) gave a presentation on the value of I2E to extract ~30 standard IDMP data elements from regulatory documents including Summary of Product Characteristics and regulatory dossiers (eCTD sections 3.2.S and 3.2.P). The challenges included a varied set of documents, some up to 50 years old, in mixed formats (Doc, Docx, image and text PDFs), across 5 different languages (English, French, Spanish, German, Italian). The output needed to be mapped to the J&J schema for their IDMP submission and internal business use. Over 1300 documents were processed, with an overall accuracy above 94%, saving J&J Consumer significant time and resources.

Mundipharma Research's Journey Towards IDMP Compliance

Mundipharma found great value in a project using the Linguamatics NLP platform to find, highlight and extract data elements for Iteration 1 from unstructured documents.


If you would like to know more about whether your organization is IDMP ready, access our application note.


If you would prefer to watch our webinar on text mining for your IDMP compliance journey, access the recording below.


Response to Regulatory Questions

Responding to regulatory questions can be a challenge. Companies often aim to respond to questions within a set time frame, and short turnaround cycles can lead to a messy submission with lots of appended information and orphaned responses. Capturing and analyzing the Responses to Questions (RTQs) sent to regulatory agencies around the world enables pharmaceutical companies to gain insights on:

  • frequently asked questions based on current new drug submissions;
  • current regulatory questions and concerns;
  • trends in regulatory concerns by product type (e.g. antibody-drug conjugates) and therapeutic area; and
  • geographical patterns of regulatory concerns.

By mining past questions, product-development teams can anticipate future regulatory questions and concerns, and proactively address them in the initial submission, thereby shortening approval times. Linguamatics I2E is used to mine RTQs in order to answer complex questions such as: “Was a request made for more information on the mechanisms of action of a product?” “Was product quality a concern?” “Were there questions around stability, clearance, model validation?”

Effective use of these RTQs by product-development teams reduces the number of errors prior to submission, and allows the teams to anticipate the different regulatory requirements and how those requirements can influence current and future development.
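As a rough illustration of the kind of trend analysis described above, the sketch below tags past regulatory questions with topics and counts them by region. It uses plain keyword matching in Python, which is an assumption made for illustration and not how I2E works; the topic list, the record fields (`region`, `question`) and the sample questions are all hypothetical.

```python
from collections import Counter

# Hypothetical topic keywords; a real deployment would rely on curated
# vocabularies and linguistic patterns rather than plain substrings.
TOPIC_KEYWORDS = {
    "mechanism of action": ["mechanism of action", "mechanisms of action"],
    "product quality": ["product quality", "impurity", "impurities"],
    "stability": ["stability", "shelf life"],
    "model validation": ["model validation", "validated model"],
}


def tag_topics(question: str) -> set[str]:
    """Return the topics whose keywords appear in the question text."""
    text = question.lower()
    return {
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def topic_counts_by_region(rtqs: list[dict]) -> dict[str, Counter]:
    """Count how often each topic is raised, grouped by region."""
    counts: dict[str, Counter] = {}
    for rtq in rtqs:
        region_counter = counts.setdefault(rtq["region"], Counter())
        # Each topic is counted at most once per question.
        region_counter.update(tag_topics(rtq["question"]))
    return counts


# Invented sample records, for illustration only.
SAMPLE_RTQS = [
    {"region": "EU", "question": "Please provide further data on the mechanism of action."},
    {"region": "EU", "question": "Justify the proposed shelf life with additional stability data."},
    {"region": "US", "question": "Clarify the impurity profile and its impact on product quality."},
]

if __name__ == "__main__":
    for region, counter in topic_counts_by_region(SAMPLE_RTQS).items():
        print(region, dict(counter))
```

Grouping the counts by region, product type or therapeutic area is what turns individual questions into the trends and geographical patterns listed earlier.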

Text Analytics to Enable Data-Driven Quality Risk Management and Compliance

At GSK, Biopharm Product Development and Supply (BPDS) uses a data-driven approach to risk management, consolidating internal and external regulatory data feeds (e.g. RTQs, FDA Warning letters, BLA review reports) into a Data Lake. This data is then structured using NLP, which enables the extraction of intelligence embedded in the documents.

Chaya Duraiswami presented at Linguamatics TMS 2018, and you can read more here.

Access our application note on Text Analytics for Regulatory Affairs. 


Regulatory Quality Assurance (QA)

Linguamatics has worked with pharmaceutical customers to develop an automated process to improve quality control of regulatory document submission.

Consolidating the various reports and documents into the overview document set required by the regulator necessitates a significant amount of checking and cross-checking, from the subsidiary documents to the master. The process is generally manual and therefore both slow and error-prone, and errors can result in applications being delayed.

Using I2E to identify inconsistencies within submissions can save weeks of tedious manual checking and prevent a re-submission request, potentially saving millions of dollars.
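To make the idea of cross-checking concrete, here is a minimal Python sketch that compares values reported in subsidiary documents with those in a master document and flags mismatches. It assumes the values have already been extracted into simple field-to-value dictionaries; the field names, document names and values are hypothetical, and this is not a description of how I2E performs the check.

```python
def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def find_inconsistencies(
    master: dict[str, str],
    subsidiaries: dict[str, dict[str, str]],
) -> list[tuple[str, str, str, str]]:
    """Compare each subsidiary document's values with the master.

    Returns (document, field, master_value, subsidiary_value) tuples
    for every field whose normalized values differ.
    """
    issues = []
    for doc_name, fields in subsidiaries.items():
        for field, value in fields.items():
            if field not in master:
                continue  # nothing in the master to compare against
            if normalize(value) != normalize(master[field]):
                issues.append((doc_name, field, master[field], value))
    return issues


# Invented example values, for illustration only.
MASTER = {"strength": "50 mg", "shelf_life": "24 months"}
SUBSIDIARIES = {
    "stability_report": {"shelf_life": "36 months"},
    "product_spec": {"strength": "50  MG"},
}

if __name__ == "__main__":
    for issue in find_inconsistencies(MASTER, SUBSIDIARIES):
        print(issue)
```

In this example only the shelf-life mismatch is reported; the differently formatted strength value is treated as consistent after normalization.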

To learn more about accelerating drug approvals with better regulatory QA, read our blog.
