Compliance with the Identification of Medicinal Products (IDMP) standards will require Marketing Authorization Holders to submit a broad range of data elements about their medicinal products to the European Medicines Agency (EMA), and to keep them up to date. 70% of these data elements currently exist only as unstructured text, scattered across multiple document formats, styles, and languages. The same data play a key role in the core operations of a pharmaceutical company and are reused for multiple purposes across the business.
The challenge for such companies is to find a quick, accurate, and affordable way to search, extract, standardize, and structure the 300–2,000 data elements required per product for IDMP compliance.
Linguamatics NLP extracts IDMP data elements
Mundipharma Research Limited ran a pilot project using I2E, Linguamatics' natural language processing (NLP)-based text-mining solution, to find, highlight, and extract the data elements required for IDMP Iteration 1 from unstructured documents such as EMA Summary of Product Characteristics (SmPC) documents.
I2E queries were developed to extract the individual data elements using standard and customized ontologies, as well as the linguistic features of SmPCs. Accuracy was evaluated against a ‘gold standard’ data set that had been manually extracted by an independent expert.
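Evaluating extraction against a gold standard is commonly reported as precision (how many extracted values are correct) and recall (how many gold-standard values were found). The case study does not say which metrics were used, so the following is only a minimal sketch, assuming each data element can be represented as a (field, value) pair and lightly normalized before comparison. The field names, values, and the `normalize` and `evaluate` helpers are hypothetical illustrations; they are not part of I2E or of Mundipharma's data.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a data element is a (field name, value) pair.
Element = Tuple[str, str]


def normalize(element: Element) -> Element:
    """Lower-case the value and collapse whitespace so trivial differences don't count as errors."""
    name, value = element
    return name, " ".join(value.lower().split())


def evaluate(extracted: Set[Element], gold: Set[Element]) -> Dict[str, float]:
    """Score extracted elements against a manually curated gold standard."""
    ext = {normalize(e) for e in extracted}
    ref = {normalize(e) for e in gold}
    true_positives = len(ext & ref)
    precision = true_positives / len(ext) if ext else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold standard for one product, as an expert might record it.
gold = {
    ("active_substance", "paracetamol"),
    ("pharmaceutical_form", "film-coated tablet"),
    ("route_of_administration", "oral"),
    ("strength", "500 mg"),
}

# Hypothetical automated output: one formatting variant, one wrong value.
extracted = {
    ("active_substance", "Paracetamol"),
    ("pharmaceutical_form", "Film-coated  tablet"),  # matches after normalization
    ("route_of_administration", "oral"),
    ("strength", "50 mg"),  # wrong value: a false positive and a missed gold element
}

scores = evaluate(extracted, gold)
print(scores)
```

Here three of four extracted elements match, so precision, recall, and F1 are all 0.75. In practice, the normalization step would likely map values onto controlled vocabularies rather than simply lower-casing text.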
Jon Sanford, Head of Regulatory Information Management and Operations at Mundipharma Research, said: “We were really impressed when we saw the accuracy with which I2E had been able to extract data elements from the documents.”
Productizing text-mining workflows for IDMP compliance
Mundipharma Research is now planning to productize the text-mining workflow for IDMP and is exploring how to expand the use of I2E across the wider enterprise, including:
- improving MedDRA coding consistency across clinical and regulatory data;
- improving efficiency and effectiveness of Quality Control and Assurance across regulatory and other documents;
- capturing regulatory Response to Questions and global regulatory guideline changes;
- enhancing pharmacovigilance risk-monitoring capabilities; and
- improving Clinical Development Program (CDP) design by providing more realistic CDP timelines and cost estimates to the Board, to support investment decisions.
To learn how NLP platforms such as Linguamatics Life Science can automatically extract IDMP data elements from compliance documents and regulatory dossiers, streamlining data collection, download the full case study.